Anomaly detection for time-series data has been an important research field for a long time. Seminal work on anomaly detection methods has been focussing on statistical approaches. In recent years an increasing number of machine learning algorithms have been developed to detect anomalies on time-series. Subsequently, researchers tried to improve these techniques using (deep) neural networks. In the light of the increasing number of anomaly detection methods, the body of research lacks a broad comparative evaluation of statistical, machine learning and deep learning methods. This paper studies 20 univariate anomaly detection methods from the all three categories. The evaluation is conducted on publicly available datasets, which serve as benchmarks for time-series anomaly detection. By analyzing the accuracy of each method as well as the computation time of the algorithms, we provide a thorough insight about the performance of these anomaly detection approaches, alongside some general notion of which method is suited for a certain type of data.
Its impact is drastic and real: Youtube's AIdriven recommendation system would present sports videos for days if one happens to watch a live baseball game on the platform ; email writing becomes much faster with machine learning (ML) based auto-completion ; many businesses have adopted natural language processing based chatbots as part of their customer services . AI has also greatly advanced human capabilities in complex decision-making processes ranging from determining how to allocate security resources to protect airports  to games such as poker  and Go . All such tangible and stunning progress suggests that an "AI summer" is happening. As some put it, "AI is the new electricity" . Meanwhile, in the past decade, an emerging theme in the AI research community is the so-called "AI for social good" (AI4SG): researchers aim at developing AI methods and tools to address problems at the societal level and improve the wellbeing of the society.
Multi-horizon forecasting problems often contain a complex mix of inputs -- including static (i.e. time-invariant) covariates, known future inputs, and other exogenous time series that are only observed historically -- without any prior information on how they interact with the target. While several deep learning models have been proposed for multi-step prediction, they typically comprise black-box models which do not account for the full range of inputs present in common scenarios. In this paper, we introduce the Temporal Fusion Transformer (TFT) -- a novel attention-based architecture which combines high-performance multi-horizon forecasting with interpretable insights into temporal dynamics. To learn temporal relationships at different scales, the TFT utilizes recurrent layers for local processing and interpretable self-attention layers for learning long-term dependencies. The TFT also uses specialized components for the judicious selection of relevant features and a series of gating layers to suppress unnecessary components, enabling high performance in a wide range of regimes. On a variety of real-world datasets, we demonstrate significant performance improvements over existing benchmarks, and showcase three practical interpretability use-cases of TFT.
Kairouz, Peter, McMahan, H. Brendan, Avent, Brendan, Bellet, Aurélien, Bennis, Mehdi, Bhagoji, Arjun Nitin, Bonawitz, Keith, Charles, Zachary, Cormode, Graham, Cummings, Rachel, D'Oliveira, Rafael G. L., Rouayheb, Salim El, Evans, David, Gardner, Josh, Garrett, Zachary, Gascón, Adrià, Ghazi, Badih, Gibbons, Phillip B., Gruteser, Marco, Harchaoui, Zaid, He, Chaoyang, He, Lie, Huo, Zhouyuan, Hutchinson, Ben, Hsu, Justin, Jaggi, Martin, Javidi, Tara, Joshi, Gauri, Khodak, Mikhail, Konečný, Jakub, Korolova, Aleksandra, Koushanfar, Farinaz, Koyejo, Sanmi, Lepoint, Tancrède, Liu, Yang, Mittal, Prateek, Mohri, Mehryar, Nock, Richard, Özgür, Ayfer, Pagh, Rasmus, Raykova, Mariana, Qi, Hang, Ramage, Daniel, Raskar, Ramesh, Song, Dawn, Song, Weikang, Stich, Sebastian U., Sun, Ziteng, Suresh, Ananda Theertha, Tramèr, Florian, Vepakomma, Praneeth, Wang, Jianyu, Xiong, Li, Xu, Zheng, Yang, Qiang, Yu, Felix X., Yu, Han, Zhao, Sen
Federated learning (FL) is a machine learning setting where many clients (e.g. mobile devices or whole organizations) collaboratively train a model under the orchestration of a central server (e.g. service provider), while keeping the training data decentralized. FL embodies the principles of focused data collection and minimization, and can mitigate many of the systemic privacy risks and costs resulting from traditional, centralized machine learning and data science approaches. Motivated by the explosive growth in FL research, this paper discusses recent advances and presents an extensive collection of open problems and challenges.
Within the Private Equity (PE) market, the event of a private company undertaking an Initial Public Offering (IPO) is usually a very high-return one for the investors in the company. For this reason, an effective predictive model for the IPO event is considered as a valuable tool in the PE market, an endeavor in which publicly available quantitative information is generally scarce. In this paper, we describe a data-analytic procedure for predicting the probability with which a company will go public in a given forward period of time. The proposed method is based on the interplay of a neural network (NN) model for estimating the overall event probability, and Survival Analysis (SA) for further modeling the probability of the IPO event in any given interval of time. The proposed neuro-survival model is tuned and tested across nine industrial sectors using real data from the Thomson Reuters Eikon PE database.
Design and build personalization engines/learning systems using advanced machine learning and statistical techniques Help the company in identifying tools and components, and building the infrastructure for AI/ML Research and brainstorm with internal partners to identify advanced analytics opportunities to advance automation, help with knowledge discovery, support decision-making, gain insights from data, streamline business processes, and enable new capabilities Perform hands-on data exploration and modeling work on massive data sets. Perform feature engineering, train the algorithms, back-test models, compare model performances and communicate the results Work with senior leaders from all functions to explore opportunities for using advance analytics Provide technical leadership mentoring to talented data scientists and analytics professionals Guide data scientists and engineers in the use of advanced statistical, machine learning, and artificial intelligence methodologies Provide thought leadership by researching best practices, extending and building new machine learning and statistical methodologies, conducting experiments, and collaborating with cross functional teams Develop end-to-end efficient model solutions that drive measurable outcomes. These technical skills include, but not limited to, regression techniques, neural networks, decision trees, clustering, pattern recognition, probability theory, stochastic systems, Bayesian inference, statistical techniques, deep learning, supervised learning, unsupervised learning Solid understanding and hands on experience working with big data, and the related ecosystem, both relational and unstructured. Executing on complex projects, extracting, cleansing, and manipulating large, diverse structured and unstructured data sets on relational – SQL, NOSQL databases Working in an agile environment with iterative development & business feedback Providing insights to support strategic decisions, including offering and delivering insights and recommendations Experience in statistics & analytical modeling, time-series data analysis, forecasting modeling, machine learning algorithms, and deep learning approaches and frameworks. Deliver robust, scale and quality data analytical applications in a cloud environment.
Some recent studies have suggested using GANs for numeric data generation such as to generate data for completing the imbalanced numeric data. Considering the significant difference between the dimensions of the numeric data and images, as well as the strong correlations between features of numeric data, the conventional GANs normally face an overfitting problem, consequently leads to an ill-conditioning problem in generating numeric and structured data. This paper studies the constrained network structures between generator G and discriminator D in WGAN, designs several structures including isomorphic, mirror and self-symmetric structures. We evaluates the performances of the constrained WGANs in data augmentations, taking the non-constrained GANs and WGANs as the baselines. Experiments prove the constrained structures have been improved in 17/20 groups of experiments. In twenty experiments on four UCI Machine Learning Repository datasets, Australian Credit Approval data, German Credit data, Pima Indians Diabetes data and SPECT heart data facing five conventional classifiers. Especially, Isomorphic WGAN is the best in 15/20 experiments. Finally, we theoretically proves that the effectiveness of constrained structures by the directed graphic model (DGM) analysis.
Financial fraud detection represents the challenge of finding anomalies in networks of financial transactions. In general, the anomaly detection (AD) is the problem of distinguishing between normal data samples with well defined patterns or signatures and those that do not conform to the expected profiles. The fraudulent behaviour in money laundering may manifest itself through unusual patterns in financial transaction networks. In such networks, nodes represents customers and the edges are transactions: a directed edge between two nodes illustrates that there is a money transfer in the respective direction, where the weight on the edge is the transferred amount. In this paper we present a survey on the fundamental anomaly detection techniques and then present briefly the relevant literature in connection with fraud detection context.
Many irregular domains such as social networks, financial transactions, neuron connections, and natural language structures are represented as graphs. In recent years, a variety of graph neural networks (GNNs) have been successfully applied for representation learning and prediction on such graphs. However, in many of the applications, the underlying graph changes over time and existing GNNs are inadequate for handling such dynamic graphs. In this paper we propose a novel technique for learning embeddings of dynamic graphs based on a tensor algebra framework. Our method extends the popular graph convolutional network (GCN) for learning representations of dynamic graphs using the recently proposed tensor M-product technique. Theoretical results that establish the connection between the proposed tensor approach and spectral convolution of tensors are developed. Numerical experiments on real datasets demonstrate the usefulness of the proposed method for an edge classification task on dynamic graphs.
Fraud causes substantial costs and losses for companies and clients in the finance and insurance industries. Examples are fraudulent credit card transactions or fraudulent claims. It has been estimated that roughly $10$ percent of the insurance industry's incurred losses and loss adjustment expenses each year stem from fraudulent claims. The rise and proliferation of digitization in finance and insurance have lead to big data sets, consisting in particular of text data, which can be used for fraud detection. In this paper, we propose architectures for text embeddings via deep learning, which help to improve the detection of fraudulent claims compared to other machine learning methods. We illustrate our methods using a data set from a large international health insurance company. The empirical results show that our approach outperforms other state-of-the-art methods and can help make the claims management process more efficient. As (unstructured) text data become increasingly available to economists and econometricians, our proposed methods will be valuable for many similar applications, particularly when variables have a large number of categories as is typical for example of the International Classification of Disease (ICD) codes in health economics and health services.