Boeva, Galina
Label Attention Network for sequential multi-label classification: you were looking at a wrong self-attention
Kovtun, Elizaveta, Boeva, Galina, Zabolotnyi, Artem, Burnaev, Evgeny, Spindler, Martin, Zaytsev, Alexey
Most of the available user information can be represented as a sequence of timestamped events. Each event is assigned a set of categorical labels whose future structure is of great interest. For instance, our goal is to predict a group of items in the next customer's purchase or tomorrow's client transactions. This is a multi-label classification problem for sequential data. Modern approaches focus on transformer architecture for sequential data introducing self-attention for the elements in a sequence. In that case, we take into account events' time interactions but lose information on label inter-dependencies. Motivated by this shortcoming, we propose leveraging a self-attention mechanism over labels preceding the predicted step. As our approach is a Label-Attention NETwork, we call it LANET. Experimental evidence suggests that LANET outperforms the established models' performance and greatly captures interconnections between labels. For example, the micro-AUC of our approach is $0.9536$ compared to $0.7501$ for a vanilla transformer. We provide an implementation of LANET to facilitate its wider usage.
Continuous-time convolutions model of event sequences
Zhuzhel, Vladislav, Grabar, Vsevolod, Boeva, Galina, Zabolotnyi, Artem, Stepikin, Alexander, Zholobov, Vladimir, Ivanova, Maria, Orlov, Mikhail, Kireev, Ivan, Burnaev, Evgeny, Rivera-Castro, Rodrigo, Zaytsev, Alexey
Massive samples of event sequences data occur in various domains, including e-commerce, healthcare, and finance. There are two main challenges regarding inference of such data: computational and methodological. The amount of available data and the length of event sequences per client are typically large, thus it requires long-term modelling. Moreover, this data is often sparse and non-uniform, making classic approaches for time series processing inapplicable. Existing solutions include recurrent and transformer architectures in such cases. To allow continuous time, the authors introduce specific parametric intensity functions defined at each moment on top of existing models. Due to the parametric nature, these intensities represent only a limited class of event sequences. We propose the COTIC method based on a continuous convolution neural network suitable for non-uniform occurrence of events in time. In COTIC, dilations and multi-layer architecture efficiently handle dependencies between events. Furthermore, the model provides general intensity dynamics in continuous time - including self-excitement encountered in practice. The COTIC model outperforms existing approaches on majority of the considered datasets, producing embeddings for an event sequence that can be used to solve downstream tasks - e.g. predicting next event type and return time. The code of the proposed method can be found in the GitHub repository (https://github.com/VladislavZh/COTIC).