Clustering
Data Driven Aircraft Trajectory Prediction with Deep Imitation Learning
Bastas, Alevizos, Kravaris, Theocharis, Vouros, George A.
The current Air Traffic Management (ATM) system worldwide has reached its limits in terms of predictability, efficiency and cost effectiveness. Different initiatives worldwide propose trajectory-oriented transformations that require high fidelity aircraft trajectory planning and prediction capabilities, supporting the trajectory life cycle at all stages efficiently. Recently proposed data-driven trajectory prediction approaches provide promising results. In this paper we approach the data-driven trajectory prediction problem as an imitation learning task, where we aim to imitate experts "shaping" the trajectory. Towards this goal we present a comprehensive framework comprising the state of the art Generative Adversarial Imitation Learning method, in a pipeline with trajectory clustering and classification methods. Compared to state of the art approaches, this approach can provide accurate predictions for the whole trajectory (i.e. with a prediction horizon until reaching the destination) both at the pre-tactical (i.e. starting at the departure airport at a specific time instant) and at the tactical (i.e. from any state while flying) stages.
Unsupervised Severe Weather Detection Via Joint Representation Learning Over Textual and Weather Data
Davvetas, Athanasios, Klampanos, Iraklis A.
When observing a phenomenon, severe cases or anomalies are often characterised by deviation from the expected data distribution. However, non-deviating data samples may also implicitly lead to severe outcomes. In the case of unsupervised severe weather detection, these data samples can lead to mispredictions, since the predictors of severe weather are often not directly observed as features. We posit that incorporating external or auxiliary information, such as the outcome of an external task or an observation, can improve the decision boundaries of an unsupervised detection algorithm. In this paper, we increase the effectiveness of a clustering method to detect cases of severe weather by learning augmented and linearly separable latent representations. We evaluate our solution against three individual cases of severe weather, namely windstorms, floods and tornado outbreaks.
Patient Similarity Analysis with Longitudinal Health Data
Allam, Ahmed, Dittberner, Matthias, Sintsova, Anna, Brodbeck, Dominique, Krauthammer, Michael
Healthcare professionals have long envisioned using the enormous processing powers of computers to discover new facts and medical knowledge locked inside electronic health records. These vast medical archives contain time-resolved information about medical visits, tests and procedures, as well as outcomes, which together form individual patient journeys. By assessing the similarities among these journeys, it is possible to uncover clusters of common disease trajectories with shared health outcomes. The assignment of patient journeys to specific clusters may in turn serve as the basis for personalized outcome prediction and treatment selection. This procedure is a non-trivial computational problem, as it requires the comparison of patient data with multi-dimensional and multi-modal features that are captured at different times and resolutions. In this review, we provide a comprehensive overview of the tools and methods that are used in patient similarity analysis with longitudinal data and discuss its potential for improving clinical decision making.
Know Your Clients' behaviours: a cluster analysis of financial transactions
Thompson, John R. J., Feng, Longlong, Reesor, R. Mark, Grace, Chuck
In Canada, financial advisors and dealers are required by provincial securities commissions and self-regulatory organizations--charged with direct regulation over investment dealers and mutual fund dealers--to respectively collect and maintain Know Your Client (KYC) information, such as their age or risk tolerance, for investor accounts. With this information, investors, under their advisor's guidance, make decisions on their investments which are presumed to be beneficial to their investment goals. Our unique dataset is provided by a financial investment dealer with over 50,000 accounts for over 23,000 clients. We use a modified behavioural finance recency, frequency, monetary model for engineering features that quantify investor behaviours, and machine learning clustering algorithms to find groups of investors that behave similarly. We show that the KYC information collected does not explain client behaviours, whereas trade and transaction frequency and volume are most informative. We believe the results shown herein encourage financial regulators and advisors to use more advanced metrics to better understand and predict investor behaviours.
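The recency, frequency, monetary (RFM) feature engineering mentioned above can be illustrated with a minimal sketch. This is not the authors' modified model: the function name `rfm_features`, the tuple layout of a transaction, and the toy data are all hypothetical, and the sketch assumes the plain textbook RFM definition (days since last transaction, number of transactions, total absolute amount).

```python
from collections import defaultdict
from datetime import date


def rfm_features(transactions, as_of):
    """Return {client_id: (recency_days, frequency, monetary)}.

    transactions: iterable of (client_id, date, amount) tuples (hypothetical layout).
    as_of: reference date used to measure recency.
    """
    last_seen = {}
    count = defaultdict(int)
    total = defaultdict(float)
    for client_id, when, amount in transactions:
        # Recency is driven by the most recent transaction date per client.
        if client_id not in last_seen or when > last_seen[client_id]:
            last_seen[client_id] = when
        count[client_id] += 1
        # Buys and sells both count as activity, so use the absolute amount.
        total[client_id] += abs(amount)
    return {
        cid: ((as_of - last_seen[cid]).days, count[cid], total[cid])
        for cid in last_seen
    }


# Hypothetical toy data, not taken from the dataset described in the abstract.
transactions = [
    ("A", date(2020, 1, 10), 500.0),
    ("A", date(2020, 3, 1), -200.0),
    ("B", date(2019, 12, 5), 1000.0),
]
features = rfm_features(transactions, as_of=date(2020, 3, 31))
```

Feature vectors of this kind could then be passed to any clustering algorithm to group clients whose trading behaviour is similar.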
Enabling Edge Cloud Intelligence for Activity Learning in Smart Home
Huang, Bing, Bouguettaya, Athman, Dong, Hai
We propose a novel activity learning framework based on Edge Cloud architecture for the purpose of recognizing and predicting human activities. Although activity recognition has been widely studied, the temporal features that constitute an activity, which can provide useful insights for activity models, have not been exploited to their full potential by mining algorithms. In this paper, we utilize temporal features for activity recognition and prediction in a single smart home setting. We discover activity patterns and temporal relations such as the order of activities from real data to develop a prompting system. Analysis of real data collected from smart homes was used to validate the proposed method.
Many-Objective Software Remodularization using NSGA-III
Mkaouer, Mohamed Wiem, Kessentini, Marouane, Shaout, Adnan, Koligheu, Patrice, Bechikh, Slim, Deb, Kalyanmoy, Ouni, Ali
Software systems nowadays are complex and difficult to maintain due to continuous changes and bad design choices. To handle the complexity of systems, software products are, in general, decomposed in terms of packages/modules containing classes that are dependent. However, it is challenging to automatically remodularize systems to improve their maintainability. The majority of existing remodularization work targets a single objective: improving the structure of packages by optimizing coupling and cohesion. In addition, most existing studies are limited to only a few operation types, such as move class and split package. Many other objectives, such as preserving the design semantics, reducing the number of changes and maximizing the consistency with development change history, are important to improve the quality of the software by remodularizing it. In this paper, we propose a novel many-objective search-based approach using NSGA-III. The process aims at finding the optimal remodularization solutions that improve the structure of packages, minimize the number of changes, preserve semantic coherence, and re-use the history of changes. We evaluate the efficiency of our approach using four different open-source systems and one automotive industry project, provided by our industrial partner, through a quantitative and qualitative study conducted with software engineers.
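Many-objective search of the kind described above rests on Pareto dominance between candidate solutions. The sketch below shows only that dominance relation and a brute-force non-dominated filter; it is not NSGA-III itself, which additionally uses reference points and a full evolutionary loop. The objective vectors are hypothetical and assume every objective is expressed as a quantity to minimize (cohesion is negated for that reason).

```python
def dominates(a, b):
    """True if a is no worse than b on every objective and strictly better on one.

    All objectives are assumed to be minimized.
    """
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def non_dominated(solutions):
    """Return the solutions that no other solution dominates (the Pareto front)."""
    return [
        s for s in solutions
        if not any(dominates(other, s) for other in solutions if other is not s)
    ]


# Hypothetical objective vectors: (coupling, -cohesion, number_of_changes).
candidates = [
    (0.30, -0.70, 12),
    (0.25, -0.75, 20),
    (0.35, -0.60, 15),
    (0.30, -0.70, 10),
]
front = non_dominated(candidates)
```

In this toy set, the first candidate is dominated by the fourth (same structure, fewer changes) and the third by the first, so only the second and fourth remain as trade-offs between structural quality and number of changes.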
SimpleMKKM: Simple Multiple Kernel K-means
Liu, Xinwang, Zhu, En, Liu, Jiyuan, Hospedales, Timothy, Wang, Yang, Wang, Meng
We propose a simple yet effective multiple kernel clustering algorithm, termed simple multiple kernel k-means (SimpleMKKM). It extends the widely used supervised kernel alignment criterion to multi-kernel clustering. Our criterion is given by an intractable minimization-maximization problem in the kernel coefficient and clustering partition matrix. To optimize it, we re-formulate the problem as a smooth minimization one, which can be solved efficiently using a reduced gradient descent algorithm. We theoretically analyze the performance of SimpleMKKM in terms of its clustering generalization error. Comprehensive experiments on 11 benchmark datasets demonstrate that SimpleMKKM outperforms state of the art multi-kernel clustering alternatives.
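To make the minimization-maximization structure concrete, one way such a criterion could be written is sketched below. The notation is an assumption for illustration and is not quoted from the paper: $K_p$ denotes the $m$ base kernel matrices, $\gamma$ the kernel coefficients constrained to the simplex $\Delta$, $H$ a relaxed $n \times k$ clustering partition matrix, and the squared weighting $\gamma_p^2$ is one common choice in multiple kernel k-means.

```latex
\min_{\gamma \in \Delta} \;
\max_{H \in \mathbb{R}^{n \times k},\; H^\top H = I_k}
\operatorname{Tr}\!\left( K_\gamma H H^\top \right),
\qquad
K_\gamma = \sum_{p=1}^{m} \gamma_p^2 K_p,
\qquad
\Delta = \Big\{ \gamma \in \mathbb{R}^m : \sum_{p=1}^{m} \gamma_p = 1,\ \gamma_p \ge 0 \Big\}.
```

Under this assumed form, the inner maximization for a fixed $\gamma$ is attained by the eigenvectors of $K_\gamma$ belonging to its $k$ largest eigenvalues, so the problem can be viewed as minimizing $J(\gamma) = \max_H \operatorname{Tr}(K_\gamma H H^\top)$ over the kernel coefficients alone, in line with the reformulation the abstract describes.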
Learning the Associations of MITRE ATT&CK Adversarial Techniques
Al-Shaer, Rawan, Spring, Jonathan M., Christou, Eliana
The MITRE ATT&CK Framework provides a rich and actionable repository of adversarial tactics, techniques, and procedures (TTP). However, this information would be highly useful for attack diagnosis (i.e., forensics) and mitigation (i.e., intrusion response) if we can reliably construct technique associations that will enable predicting unobserved attack techniques based on observed ones. In this paper, we present our statistical machine learning analysis on APT and Software attack data reported by MITRE ATT&CK to infer the technique clustering that represents the significant correlation that can be used for technique prediction. Due to the complex multidimensional relationships between techniques, many of the traditional clustering methods could not obtain usable associations. Our approach, using hierarchical clustering for inferring attack technique associations with 95% confidence, provides statistically significant and explainable technique correlations. Our analysis discovers 98 different technique associations (i.e., clusters) for both APT and Software attacks. Our evaluation results show that 78% of the techniques associated by our algorithm exhibit significant mutual information that indicates reasonably high predictability.
Exchangeability, Conformal Prediction, and Rank Tests
Although these two concepts are very closely related, the fact that exchangeability allows for a specific type of dependence between the random variables leads to numerous implications/applications of this concept. One of the most important implications of exchangeability is that the indexing of random variables is immaterial. In technical words, this means that the ranks of real-valued exchangeable random variables are uniform over the set of all permutations. Just this one implication has pioneered two very different fields in statistics and machine learning, namely, nonparametric rank tests and conformal prediction. The main purpose of this article is to define exchangeability, discuss its implications (rigorously), and then exposit the uses of this concept for conformal prediction and rank tests. To our knowledge, conformal prediction (starting from Vovk et al. (2005)) is the first field to apply the full strength of exchangeability.
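The rank property described above can be sketched in a few lines. If $n$ calibration scores and one new score are exchangeable and ties have probability zero, the rank of the new score among all $n + 1$ is uniform, which yields a valid p-value. The function names and the simulation setup below are hypothetical, and the sketch uses i.i.d. uniform scores as a simple special case of exchangeability.

```python
import random


def conformal_p_value(calibration_scores, new_score):
    """Rank-based p-value: share of the n + 1 scores at least as large as new_score."""
    n = len(calibration_scores)
    at_least = sum(1 for s in calibration_scores if s >= new_score)
    # The +1 counts the new score itself.
    return (1 + at_least) / (n + 1)


def simulate_rejection_rate(n=19, trials=20000, alpha=0.1, seed=0):
    """Estimate how often the p-value falls at or below alpha under exchangeability."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        # i.i.d. draws are exchangeable, so the rank of the last one is uniform.
        scores = [rng.random() for _ in range(n + 1)]
        if conformal_p_value(scores[:n], scores[n]) <= alpha:
            rejections += 1
    return rejections / trials


rate = simulate_rejection_rate()
```

With `n = 19` the p-value takes each of the values 1/20, 2/20, ..., 20/20 with equal probability, so the estimated rate should be close to `alpha = 0.1`. The same rank argument underlies both nonparametric rank tests and the coverage guarantee of conformal prediction.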
Visual Analytics and Human Involvement in Machine Learning
Eisler, Salomon, Meyer, Joachim
AI systems and applications are developing rapidly, yet they still require human involvement in practically all parts of the analytics process. Human decisions are largely based on visualizations, which give data scientists details of data properties and the results of analytical procedures. Different visualizations are used in the different steps of the Machine Learning (ML) process. The choice of visualization depends on factors such as the data domain, the data model and the step in the ML process. In this chapter, we describe the seven steps in the ML process and review different visualization techniques that are relevant for the different steps for different types of data, models and purposes.