Performance Analysis
Integrative conformal p-values for powerful out-of-distribution testing with labeled outliers
Liang, Ziyi, Sesia, Matteo, Sun, Wenguang
This paper develops novel conformal methods to test whether a new observation was sampled from the same distribution as a reference set. Blending inductive and transductive conformal inference in an innovative way, the described methods can re-weight standard conformal p-values based on dependent side information from known out-of-distribution data in a principled way, and can automatically take advantage of the most powerful model from any collection of one-class and binary classifiers. The solution can be implemented either through sample splitting or via a novel transductive cross-validation+ scheme which may also be useful in other applications of conformal inference, due to tighter guarantees compared to existing cross-validation approaches. After studying false discovery rate control and power within a multiple testing framework with several possible outliers, the proposed solution is shown to outperform standard conformal p-values through simulations as well as applications to image recognition and tabular data.
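For orientation, the standard inductive conformal p-value that the paper uses as its baseline can be sketched as follows. This is not the integrative method from the abstract; the function name and the convention that larger scores mean "more outlier-like" are assumptions made for illustration.

```python
import numpy as np

def conformal_p_value(calibration_scores, test_score):
    """Standard (baseline) inductive conformal p-value.

    Assumes nonconformity scores where larger means more outlier-like,
    computed on a calibration set of inliers held out from model fitting.
    """
    calibration_scores = np.asarray(calibration_scores, dtype=float)
    n = calibration_scores.size
    # Count calibration points at least as extreme as the test point.
    n_geq = int(np.sum(calibration_scores >= test_score))
    # The +1 terms account for the test point itself.
    return (1 + n_geq) / (n + 1)
```

A small p-value suggests that the test point does not look like the reference data; the smallest attainable value is 1/(n+1).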
Learning to predict synchronization of coupled oscillators on randomly generated graphs
Bassi, Hardeep, Yim, Richard, Kodukula, Rohith, Vendrow, Joshua, Zhu, Cherlin, Lyu, Hanbaek
Suppose we are given a system of coupled oscillators on an unknown graph along with the trajectory of the system during some period. Can we predict whether the system will eventually synchronize? Even with a known underlying graph structure, this is an important yet analytically intractable question in general. In this work, we take an alternative approach to the synchronization prediction problem by viewing it as a classification problem based on the fact that any given system will eventually synchronize or converge to a non-synchronizing limit cycle. By only using some basic statistics of the underlying graphs such as edge density and diameter, our method can achieve perfect accuracy when there is a significant difference in the topology of the underlying graphs between the synchronizing and the non-synchronizing examples. However, in the problem setting where these graph statistics cannot distinguish the two classes very well (e.g., when the graphs are generated from the same random graph model), we find that pairing a few iterations of the initial dynamics along with the graph statistics as the input to our classification algorithms can lead to significant improvement in accuracy, far exceeding what is known from classical oscillator theory. More surprisingly, we find that in almost all such settings, dropping the basic graph statistics and training our algorithms with only initial dynamics achieves nearly the same accuracy. We demonstrate our method on three models of continuous and discrete coupled oscillators: the Kuramoto model, Firefly Cellular Automata, and the Greenberg-Hastings model. Finally, we also propose an "ensemble prediction" algorithm that successfully scales our method to large graphs by training on dynamics observed from multiple random subgraphs.
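As a rough illustration of the kind of data involved, the sketch below simulates Kuramoto dynamics on a graph and measures phase synchrony. It is a minimal sketch, not the authors' pipeline: the explicit Euler discretization, the function names, and the use of the order parameter as a synchrony measure are assumptions chosen for illustration.

```python
import numpy as np

def kuramoto_step(theta, adjacency, coupling, dt, omega=None):
    """One explicit Euler step of Kuramoto dynamics on a graph (assumed discretization)."""
    if omega is None:
        # Identical natural frequencies, taken as zero in a rotating frame.
        omega = np.zeros_like(theta)
    # diff[i, j] = theta[j] - theta[i]
    diff = theta[None, :] - theta[:, None]
    dtheta = omega + coupling * np.sum(adjacency * np.sin(diff), axis=1)
    return (theta + dt * dtheta) % (2 * np.pi)

def order_parameter(theta):
    """Magnitude of the mean phase vector; 1.0 means full phase synchrony."""
    return float(np.abs(np.mean(np.exp(1j * theta))))

def initial_trajectory(theta0, adjacency, coupling, dt, n_steps):
    """Stack the first n_steps phase configurations as rows (candidate classifier input)."""
    rows = [np.asarray(theta0, dtype=float)]
    for _ in range(n_steps):
        rows.append(kuramoto_step(rows[-1], adjacency, coupling, dt))
    return np.vstack(rows)
```

The first few rows of `initial_trajectory` play the role of the "initial dynamics" features, while a long simulation followed by `order_parameter` gives one possible way to assign a synchronizing or non-synchronizing label.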
Covy: An AI-powered Robot with a Compound Vision System for Detecting Breaches in Social Distancing
Saaybi, Serge, Majid, Amjad Yousef, Prasad, R Venkatesha, Koubaa, Anis, Verhoeven, Chris
This paper introduces a compound vision system that enables robots to localize people up to 15m away using a cheap camera. It also proposes a robust navigation stack that combines Deep Reinforcement Learning (DRL) and a probabilistic localization method. To test the efficacy of these systems, we prototyped a low-cost mobile robot that we call Covy. Covy can be used for applications such as promoting social distancing during pandemics or estimating the density of a crowd. We evaluated Covy's performance through extensive sets of experiments in both simulated and realistic environments. Our results show that Covy's compound vision algorithm doubles the range of the depth camera used, and that its hybrid navigation stack is more robust than a pure DRL-based one.
Transfer Learning Application of Self-supervised Learning in ARPES
Ekahana, Sandy Adhitia, Winata, Genta Indra, Soh, Y., Aeppli, Gabriel, Milan, Radovic, Shi, Ming
Recent developments in angle-resolved photoemission spectroscopy (ARPES) involve spatially resolving samples while maintaining the high resolution of momentum space. This development readily increases the size and complexity of the data to be analyzed; one of the analysis tasks is to label similar dispersion cuts and map them spatially. In this work, we demonstrate that a recent representation learning (self-supervised learning) model combined with k-means clustering can help automate that part of the data analysis and save precious time, albeit with low performance. Finally, we introduce few-shot learning (k-nearest neighbour, or kNN) in representational space, where we select one (k=1) reference image for each known label and subsequently label the rest of the data according to the nearest reference image. This last approach demonstrates the strength of self-supervised learning for automating image analysis in ARPES in particular, and it can be generalized to any scientific data analysis that relies heavily on image data.
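The one-reference-per-label step can be sketched as a 1-nearest-neighbour lookup in embedding space. This is a minimal sketch under stated assumptions: the embeddings are taken as given (the self-supervised encoder is not shown), Euclidean distance is an assumed choice of metric, and the function name is hypothetical.

```python
import numpy as np

def label_by_nearest_reference(embeddings, reference_embeddings, reference_labels):
    """Assign each embedding the label of its nearest reference (k=1).

    Assumes Euclidean distance in representation space; the abstract does
    not state which metric is used.
    """
    embeddings = np.asarray(embeddings, dtype=float)
    reference_embeddings = np.asarray(reference_embeddings, dtype=float)
    # Pairwise distances, shape (n_samples, n_references).
    dists = np.linalg.norm(
        embeddings[:, None, :] - reference_embeddings[None, :, :], axis=2
    )
    nearest = np.argmin(dists, axis=1)
    return [reference_labels[i] for i in nearest]
```

With one reference image per known label, `reference_embeddings` has as many rows as there are labels, and every remaining image inherits the label of its closest reference.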
Link prediction with continuous-time classical and quantum walks
Goldsmith, Mark, García-Pérez, Guillermo, Malmi, Joonas, Rossi, Matteo A. C., Saarinen, Harto, Maniscalco, Sabrina
Protein-protein interaction (PPI) networks consist of the physical and/or functional interactions between the proteins of an organism. Since the biophysical and high-throughput methods used to form PPI networks are expensive, time-consuming, and often contain inaccuracies, the resulting networks are usually incomplete. In order to infer missing interactions in these networks, we propose a novel class of link prediction methods based on continuous-time classical and quantum random walks. In the case of quantum walks, we examine the usage of both the network adjacency and Laplacian matrices for controlling the walk dynamics. We define a score function based on the corresponding transition probabilities and perform tests on four real-world PPI datasets. Our results show that continuous-time classical random walks and quantum walks using the network adjacency matrix can successfully predict missing protein-protein interactions, with performance rivalling the state of the art.
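The idea of scoring candidate links by walk transition probabilities can be sketched as follows. This is an illustrative sketch only: using the raw transition probability at a single time `t` as the score is an assumption (the abstract does not give the paper's score function), and all function names are hypothetical.

```python
import numpy as np

def classical_walk_probabilities(adjacency, t):
    """Transition probabilities exp(-t L) of a continuous-time random walk,
    with L = D - A the graph Laplacian of an undirected graph."""
    A = np.asarray(adjacency, dtype=float)
    L = np.diag(A.sum(axis=1)) - A
    eigvals, eigvecs = np.linalg.eigh(L)
    # V diag(exp(-t * lambda)) V^T
    return (eigvecs * np.exp(-t * eigvals)) @ eigvecs.T

def quantum_walk_probabilities(hamiltonian, t):
    """Transition probabilities |exp(-i H t)|^2 for a real symmetric H
    (for example, the adjacency or Laplacian matrix)."""
    H = np.asarray(hamiltonian, dtype=float)
    eigvals, eigvecs = np.linalg.eigh(H)
    U = (eigvecs * np.exp(-1j * t * eigvals)) @ eigvecs.T
    return np.abs(U) ** 2

def rank_missing_links(adjacency, probabilities):
    """Rank non-adjacent node pairs by transition probability, highest first."""
    A = np.asarray(adjacency)
    n = A.shape[0]
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n) if A[i, j] == 0]
    scored = [(pair, float(probabilities[pair])) for pair in pairs]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

Pairs near the top of the ranking are the candidate missing interactions; in practice the walk time `t` would need to be tuned on held-out edges.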
20 Most Asked Interview Questions of Machine Learning - Analytics Vidhya
This article was published as a part of the Data Science Blogathon. Companies are trying to disrupt the technological and business market by introducing new and smart products and techniques built on new-age technologies like artificial intelligence and machine learning. Each organization is searching for talented and experienced people who can meet its demands. Today, data scientists, data analysts, machine learning engineers, and computer vision engineers are among the most in-demand roles. If you wish to apply for and land a job in the tech domain, it's crucial to know the common machine learning interview questions that recruiters ask. The article covers some popular machine learning interview questions that will push you to think one step beyond your current knowledge and help you land your dream job.
Understanding the use of ROC Curves (Artificial Intelligence)
Likelihood ratio ordering has been identified as a reasonable assumption in the two-sample problem in many practical scenarios. Under this assumption, statisticians have proposed various methods for estimating the distributions of subpopulations, which in turn benefit downstream inferences such as estimation of the ROC curve and its associated summary statistics. In this paper, under the likelihood ratio ordering assumption, we first propose a Bernstein polynomial method to model the distributions of both samples; we then estimate the distributions by the maximum empirical likelihood principle. The ROC curve estimate and the associated summary statistics are obtained subsequently. We compare the performance of our method with existing methods by extensive simulation studies.
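For reference, the simplest ROC summary statistic, the area under the curve (AUC), can be estimated from two samples without any model. The sketch below is the plain empirical (Mann-Whitney) estimator, not the Bernstein polynomial method described in the abstract; the function name is hypothetical.

```python
import numpy as np

def empirical_auc(negative_scores, positive_scores):
    """Empirical AUC: the fraction of (positive, negative) pairs in which
    the positive sample scores higher, counting ties as one half."""
    neg = np.asarray(negative_scores, dtype=float)
    pos = np.asarray(positive_scores, dtype=float)
    greater = (pos[:, None] > neg[None, :]).mean()
    ties = (pos[:, None] == neg[None, :]).mean()
    return float(greater + 0.5 * ties)
```

A value of 1.0 means the two samples are perfectly separated, and 0.5 means the scores carry no ranking information.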
Graph-Embedded Subspace Support Vector Data Description
Sohrab, Fahad, Iosifidis, Alexandros, Gabbouj, Moncef, Raitoharju, Jenni
In this paper, we propose a novel subspace learning framework for one-class classification. The proposed framework presents the problem in the form of graph embedding. It includes the previously proposed subspace one-class techniques as special cases and provides further insight into what these techniques actually optimize. The framework allows other meaningful optimization goals to be incorporated via the graph preserving criterion and reveals a spectral solution and a spectral regression-based solution as alternatives to the previously used gradient-based technique. We combine the subspace learning framework iteratively with Support Vector Data Description applied in the subspace to formulate Graph-Embedded Subspace Support Vector Data Description. We experimentally analyze the performance of the newly proposed variants. We demonstrate improved performance against the baselines and the recently proposed subspace learning methods for one-class classification.
Evaluation of group fairness measures in student performance prediction problems
Quy, Tai Le, Nguyen, Thi Huyen, Friege, Gunnar, Ntoutsi, Eirini
Predicting students' academic performance is one of the key tasks of educational data mining (EDM). Traditionally, the high forecasting quality of such models was deemed critical. More recently, the issues of fairness and discrimination w.r.t. protected attributes, such as gender or race, have gained attention. Although there are several fairness-aware learning approaches in EDM, a comparative evaluation of these measures is still missing. In this paper, we evaluate different group fairness measures for student performance prediction problems on various educational datasets and fairness-aware learning models. Our study shows that the choice of fairness measure is important, as is the choice of the grade threshold.
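To make the role of the grade threshold concrete, the sketch below computes one widely used group fairness measure, statistical parity difference, after binarizing grades. It is a minimal sketch: the abstract does not list which measures are evaluated, so the choice of measure, the pass/fail binarization, and the function names are assumptions made for illustration.

```python
import numpy as np

def binarize_grades(grades, threshold):
    """Map grades to pass (1) or fail (0) using an assumed pass threshold."""
    return (np.asarray(grades, dtype=float) >= threshold).astype(int)

def statistical_parity_difference(y_pred, protected):
    """Positive-outcome rate of the non-protected group minus that of the
    protected group; 0 means parity."""
    y_pred = np.asarray(y_pred, dtype=float)
    protected = np.asarray(protected, dtype=bool)
    return float(y_pred[~protected].mean() - y_pred[protected].mean())
```

Applying the same measure to the same grades with two different thresholds can give different values, which is one way to see why the threshold matters.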
A semi-supervised methodology for fishing activity detection using the geometry behind the trajectory of multiple vessels
Ferreira, Martha Dais, Spadon, Gabriel, Soares, Amilcar, Matwin, Stan
Automatic Identification System (AIS) messages are useful for tracking vessel activity across oceans worldwide using radio links and satellite transceivers. Such data plays a significant role in tracking vessel activity and mapping mobility patterns such as those found in fishing. Accordingly, this paper proposes a geometric-driven semi-supervised approach for fishing activity detection from AIS data. Through the proposed methodology, we show how to explore the information included in the messages to extract features describing the geometry of the vessel route. To this end, we leverage the unsupervised nature of cluster analysis to label the trajectory geometry, highlighting the changes in the vessel's moving pattern, which tend to indicate fishing activity. The labels obtained by the proposed unsupervised approach are used to detect fishing activities, which we approach as a time-series classification task. In this context, we propose a solution using recurrent neural networks on AIS data streams, achieving an overall $F$-score of roughly 87% on the whole trajectories of 50 different unseen fishing vessels. These results are accompanied by a broad benchmark study assessing the performance of different Recurrent Neural Network (RNN) architectures. In conclusion, this work contributes a thorough process that includes data preparation, labeling, data modeling, and model validation. We thus present a novel solution for mobility pattern detection that relies upon unfolding the trajectory in time and observing its inherent geometry.