
Generative Adversarial Networks for Temporally-Marked Event Sequences

Giuseppe Manco; Francesco Folino; Luigi Pontieri; Ettore Ritacco
2019

Abstract

Increasing amounts of data are becoming available in the form of "asynchronous" sequences of event records, each associated with a content and a temporal mark, e.g. sequences of activities on social media, clickstream data, user interaction logs, point-of-interest trajectories, business process logs, application logs and IoT logs, to name a few. This kind of data is more general than classic time series, as the lapse of time between consecutive events in a sequence may be an arbitrary continuous value. Usually, events in the same sequence exhibit hidden correlations (e.g., an event can cause or prevent the occurrence of certain kinds of events in the future). Generative models constitute a powerful and versatile means for analysing such data (e.g., by supporting various key tasks, such as data completion, data denoising and simulation analyses), as well as for enabling the generation of new data instances (e.g., for preventing information leakage). In particular, if devised in a conditional fashion, these models can be exploited to predict which events will happen in the remainder of a given (unfinished) sequence, and when, based on the sequence's history. Different kinds of neural generative models have been used in recent years for analysing sequence data, ranging from Recurrent Neural Networks (RNNs) and self-attention models to more sophisticated frameworks such as Variational Autoencoders (VAEs) and Generative Adversarial Networks (GANs). Roughly speaking, basic GAN frameworks implement a sort of min-max game, where a "discriminator" sub-net is trained to distinguish real data instances from those produced by a "generator" sub-net, which is in turn trained to fool the former.
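As a concrete illustration of the min-max game sketched above, the classic GAN value function V(D, G) = E_x[log D(x)] + E_z[log(1 - D(G(z)))] can be estimated on toy data. This is a minimal numpy sketch, not the authors' architecture: the logistic discriminator, affine generator, and 1-D Gaussian "real" data are illustrative assumptions. The discriminator is trained to ascend V, the generator to descend it.

```python
import numpy as np

rng = np.random.default_rng(0)

def discriminate(phi, x):
    # Logistic discriminator: estimated probability that x is a real sample.
    w, c = phi
    return 1.0 / (1.0 + np.exp(-(w * x + c)))

def generate(theta, z):
    # Affine generator mapping noise z to a "fake" sample.
    a, b = theta
    return a * z + b

def value(phi, theta, x_real, z, eps=1e-8):
    # Monte-Carlo estimate of V(D, G) = E[log D(x)] + E[log(1 - D(G(z)))].
    # In the min-max game, D maximises this quantity while G minimises it.
    d_real = discriminate(phi, x_real)
    d_fake = discriminate(phi, generate(theta, z))
    return np.log(d_real + eps).mean() + np.log(1.0 - d_fake + eps).mean()

x_real = rng.normal(3.0, 1.0, size=256)   # toy "real" data: N(3, 1)
z = rng.normal(size=256)                  # generator input noise
v = value(np.array([1.0, -1.5]), np.array([1.0, 0.0]), x_real, z)
```

Alternating gradient updates on `phi` (ascent) and `theta` (descent) would implement the adversarial training loop; both log terms are logarithms of probabilities, so each summand of V is non-positive.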
In principle, compared to previous solutions, GANs could yield models that are more general (owing to their capability to learn implicit data distributions) and more robust to "exposure bias" issues (i.e., to the risk of accumulating errors at inference time). However, they are notoriously difficult to train and tune optimally; in particular, they may converge slowly, and eventually reach an equilibrium that does not ensure a good enough generator (e.g., the latter may suffer from severe mode collapse). Moreover, the discrete nature of temporally-marked event sequences calls for extending traditional GAN schemes, so as to avoid breaking the differentiability of the discriminator's output w.r.t. the generator's parameters (which would arise when sampling an event sequence from the generator's distribution).
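The differentiability issue mentioned above arises because a hard argmax/sampling step over discrete event types has zero gradient. A common workaround in the literature (not necessarily the scheme adopted by the authors) is the Gumbel-softmax (Concrete) relaxation, which replaces the hard categorical sample with a temperature-controlled soft one through which gradients can flow. A minimal numpy sketch, with illustrative logits for three hypothetical event types:

```python
import numpy as np

rng = np.random.default_rng(0)

def gumbel_softmax(logits, tau=0.5):
    # Differentiable relaxation of sampling one event type from a
    # categorical distribution: perturb the logits with Gumbel noise,
    # then apply a temperature-controlled softmax instead of argmax.
    g = -np.log(-np.log(rng.uniform(size=logits.shape)))  # Gumbel(0, 1) noise
    y = (logits + g) / tau
    y = y - y.max()           # shift for numerical stability
    e = np.exp(y)
    return e / e.sum()        # soft, almost-one-hot sample

logits = np.array([2.0, 0.5, -1.0])   # unnormalised scores for 3 event types
soft_sample = gumbel_softmax(logits)  # sums to 1; gradients reach `logits`
hard_choice = int(np.argmax(soft_sample))
```

As `tau` approaches 0 the soft sample approaches a one-hot vector, recovering hard sampling; larger temperatures trade sample fidelity for smoother gradients.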
Istituto di Calcolo e Reti ad Alte Prestazioni - ICAR
Generative models; Temporally-Marked Event Sequences; Generative Adversarial Networks

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/369265