Goto

Collaborating Authors

 Performance Analysis


Testing Conditional Independence on Discrete Data using Stochastic Complexity

arXiv.org Machine Learning

Testing for conditional independence is a core aspect of constraint-based causal discovery. Although commonly used tests are perfect in theory, they often fail to reject independence in practice, especially when conditioning on multiple variables. We focus on discrete data and propose a new test based on the notion of algorithmic independence that we instantiate using stochastic complexity. Amongst others, we show that our proposed test, SCI, is an asymptotically unbiased as well as $L_2$ consistent estimator for conditional mutual information (CMI). Further, we show that SCI can be reformulated to find a sensible threshold for CMI that works well on limited samples. Empirical evaluation shows that SCI has a lower type II error than commonly used tests. As a result, we obtain a higher recall when we use SCI in causal discovery algorithms, without compromising the precision.


Exploiting Reuse in Pipeline-Aware Hyperparameter Tuning

arXiv.org Machine Learning

Hyperparameter tuning of multistage pipelines introduces a significant computational burden. Motivated by the observation that work can be reused across pipelines if the intermediate computations are the same, we propose a pipeline-aware approach to hyperparameter tuning. Our approach optimizes both the design and execution of pipelines to maximize reuse. We design pipelines amenable for reuse by (i) introducing a novel hybrid hyperparameter tuning method called gridded random search, and (ii) reducing the average training time in pipelines by adapting early-stopping hyperparameter tuning approaches. We then realize the potential for reuse during execution by introducing a novel caching problem for ML workloads which we pose as a mixed integer linear program (ILP), and subsequently evaluating various caching heuristics relative to the optimal solution of the ILP. We conduct experiments on simulated and real-world machine learning pipelines to show that a pipeline-aware approach to hyperparameter tuning can offer over an order-of-magnitude speedup over independently evaluating pipeline configurations. Modern machine learning workflows combine multiple stages of data-preprocessing, feature extraction, and supervised and unsupervised learning (Sรกnchez et al., 2013; The methods in each of these stages typically have configuration parameters, or hyperparameters, that influence their output and ultimately predictive accuracy.


Detection of LDDoS Attacks Based on TCP Connection Parameters

arXiv.org Machine Learning

Low-rate application layer distributed denial of service (LDDoS) attacks are both powerful and stealthy. They force vulnerable webservers to open all available connections to the adversary, denying resources to real users. Mitigation advice focuses on solutions that potentially degrade quality of service for legitimate connections. Furthermore, without accurate detection mechanisms, distributed attacks can bypass these defences. A methodology for detection of LDDoS attacks, based on characteristics of malicious TCP flows, is proposed within this paper. Research will be conducted using combinations of two datasets: one generated from a simulated network, the other from the publically available CIC DoS dataset. Both contain the attacks slowread, slowheaders and slowbody, alongside legitimate web browsing. TCP flow features are extracted from all connections. Experimentation was carried out using six supervised AI algorithms to categorise attack from legitimate flows. Decision trees and k-NN accurately classified up to 99.99% of flows, with exceptionally low false positive and false negative rates, demonstrating the potential of AI in LDDoS detection.


An Exponential Efron-Stein Inequality for Lq Stable Learning Rules

arXiv.org Machine Learning

There is accumulating evidence in the literature that stability of learning algorithms is a key characteristic that permits a learning algorithm to generalize. Despite various insightful results in this direction, there seems to be an overlooked dichotomy in the type of stability-based generalization bounds we have in the literature. On one hand, the literature seems to suggest that exponential generalization bounds for the estimated risk, which are optimal, can be only obtained through stringent, distribution independent and computationally intractable notions of stability such as uniform stability. On the other hand, it seems that weaker notions of stability such as hypothesis stability, although it is distribution dependent and more amenable to computation, can only yield polynomial generalization bounds for the estimated risk, which are suboptimal. In this paper, we address the gap between these two regimes of results. In particular, the main question we address here is whether it is possible to derive exponential generalization bounds for the estimated risk using a notion of stability that is computationally tractable and distribution dependent, but weaker than uniform stability. Using recent advances in concentration inequalities, and using a notion of stability that is weaker than uniform stability but distribution dependent and amenable to computation, we derive an exponential tail bound for the concentration of the estimated risk of a hypothesis returned by a general learning rule, where the estimated risk is expressed in terms of either the resubstitution estimate (empirical error), or the deleted (or, leave-one-out) estimate. As an illustration, we derive exponential tail bounds for ridge regression with unbounded responses -- a setting where uniform stability results of Bousquet and Elisseeff (2002) are not applicable.


Generalized Sparse Additive Models

arXiv.org Machine Learning

We present a unified framework for estimation and analysis of generalized additive models in high dimensions. The framework defines a large class of penalized regression estimators, encompassing many existing methods. An efficient computational algorithm for this class is presented that easily scales to thousands of observations and features. We prove minimax optimal convergence bounds for this class under a weak compatibility condition. In addition, we characterize the rate of convergence when this compatibility condition is not met. Finally, we also show that the optimal penalty parameters for structure and sparsity penalties in our framework are linked, allowing cross-validation to be conducted over only a single tuning parameter. We complement our theoretical results with empirical studies comparing some existing methods within this framework.


Detecting drug-drug interactions using artificial neural networks and classic graph similarity measures

arXiv.org Machine Learning

Drug-drug interactions are preventable causes of medical injuries and often result in doctor and emergency room visits. Computational techniques can be used to predict potential drug-drug interactions. We approach the drug-drug interaction prediction problem as a link prediction problem and present two novel methods for drug-drug interaction prediction based on artificial neural networks and factor propagation over graph nodes: adjacency matrix factorization (AMF) and adjacency matrix factorization with propagation (AMFP). We conduct a retrospective analysis by training our models on a previous release of the DrugBank database with 1,141 drugs and 45,296 drug-drug interactions and evaluate the results on a later version of DrugBank with 1,440 drugs and 248,146 drug-drug interactions. Additionally, we perform a holdout analysis using DrugBank. We report an area under the receiver operating characteristic curve score of 0.807 and 0.990 for the retrospective and holdout analyses respectively. Finally, we create an ensemble-based classifier using AMF, AMFP, and existing link prediction methods and obtain an area under the receiver operating characteristic curve of 0.814 and 0.991 for the retrospective and the holdout analyses. We demonstrate that AMF and AMFP provide state of the art results compared to existing methods and that the ensemble-based classifier improves the performance by combining various predictors. These results suggest that AMF, AMFP, and the proposed ensemble-based classifier can provide important information during drug development and regarding drug prescription given only partial or noisy data. These methods can also be used to solve other link prediction problems. Drug embeddings (compressed representations) created when training our models using the interaction network have been made public.


Nuanced Metrics for Measuring Unintended Bias with Real Data for Text Classification

arXiv.org Machine Learning

Machine learning systems, if not constrained, will compounding existing challenges to fairness in society often learn the simplest associations that can predict the labels, so at large. In this paper, we introduce a suite of threshold-agnostic any incorrect associations present in the training data can produce metrics that provide a nuanced view of this unintended bias, by unintended associations in the final model. Toxicity models specifically considering the various ways that a classifier's score distribution have been shown to capture and reproduce biases common can vary across designated groups. We also introduce a large new in society, for example mis-associating the names of frequently test set of online comments with crowd-sourced annotations for attacked identity groups (such as "gay", and "muslim" etc.) with identity references. We use this to show how our metrics can be toxicity [5, 17]. This unintended model bias could be due to the used to find new and potentially subtle unintended bias in existing demographic composition of the online user pool, the latent or public models.


Pragmatic classification of movement primitives for stroke rehabilitation

arXiv.org Machine Learning

Rehabilitation training is the primary intervention to improve motor recovery after stroke, but a tool to measure functional training does not currently exist. To bridge this gap, we previously developed an approach to classify functional movement primitives using wearable sensors and a machine learning (ML) algorithm. We found that this approach had encouraging classification performance but had computational and practical limitations, such as training time, sensor cost, and magnetic drift. Here, we sought to refine this approach and determine the algorithm, sensor configurations, and data requirements needed to maximize computational and practical performance. Motion data had been previously collected from 6 stroke patients wearing 11 inertial measurement units (IMUs) as they moved objects on a target array. To identify optimal ML performance, we evaluated 4 algorithms that are commonly used in activity recognition (linear discriminant analysis (LDA), na\"ive Bayes, support vector machine, and k-nearest neighbors). We compared their classification accuracy, computational complexity, and tuning requirements. To identify optimal sensor configuration, we progressively sampled fewer sensors and compared classification accuracy. To identify optimal data requirements, we compared accuracy using data from IMUs versus accelerometers. We found that LDA had the highest classification accuracy (92%) of the algorithms tested. It also was the most pragmatic, with low training and testing times and modest tuning requirements. We found that 7 sensors on the paretic arm and back resulted in the best accuracy. Using this array, accelerometers had a lower accuracy (84%). We refined strategies to accurately and pragmatically quantify functional movement primitives in stroke patients. We propose that this optimized ML-sensor approach could be a means to quantify training dose after stroke.


Variational Inference of Joint Models using Multivariate Gaussian Convolution Processes

arXiv.org Machine Learning

We present a non-parametric prognostic framework for individualized event prediction based on joint modeling of both longitudinal and time-to-event data. Our approach exploits a multivariate Gaussian convolution process (MGCP) to model the evolution of longitudinal signals and a Cox model to map time-to-event data with longitudinal data modeled through the MGCP. Taking advantage of the unique structure imposed by convolved processes, we provide a variational inference framework to simultaneously estimate parameters in the joint MGCP-Cox model. This significantly reduces computational complexity and safeguards against model overfitting. Experiments on synthetic and real world data show that the proposed framework outperforms state-of-the art approaches built on two-stage inference and strong parametric assumptions.


Cause Identification of Electromagnetic Transient Events using Spatiotemporal Feature Learning

arXiv.org Machine Learning

This paper presents a spatiotemporal unsupervised feature learning method for cause identification of electromagnetic transient events (EMTE) in power grids. The proposed method is formulated based on the availability of time-synchronized high-frequency measurement, and using the convolutional neural network (CNN) as the spatiotemporal feature representation along with softmax function. Despite the existing threshold-based, or energy-based events analysis methods, such as support vector machine (SVM), autoencoder, and tapered multi-layer perception (t-MLP) neural network, the proposed feature learning is carried out with respect to both time and space. The effectiveness of the proposed feature learning and the subsequent cause identification is validated through the EMTP simulation of different events such as line energization, capacitor bank energization, lightning, fault, and high-impedance fault in the IEEE 30-bus, and the real-time digital simulation (RTDS) of the WSCC 9-bus system.