Goto

Collaborating Authors

 Country


Generative ODE Modeling with Known Unknowns

arXiv.org Machine Learning

In several crucial applications, domain knowledge is encoded by a system of ordinary differential equations (ODE). A motivating example is intensive care unit patients: The dynamics of some vital physiological variables such as heart rate, blood pressure and arterial compliance can be approximately described by a known system of ODEs. Typically, some of the ODE variables are directly observed while some are unobserved, and in addition many other variables are observed but not modeled by the ODE, for example body temperature. Importantly, the unobserved ODE variables are ``known-unknowns'': We know they exist and their functional dynamics, but cannot measure them directly, nor do we know the function tying them to all observed measurements. Estimating these known-unknowns is often highly valuable to physicians. Under this scenario we wish to: (i) learn the static parameters of the ODE generating each observed time-series (ii) infer the dynamic sequence of all ODE variables including the known-unknowns, and (iii) extrapolate the future of the ODE variables and the observations of the time-series. We address this task with a variational autoencoder incorporating the known ODE function, called GOKU-net for Generative ODE modeling with Known Unknowns. We test our method on videos of pendulums with unknown length, and a model of the cardiovascular system.


Mirrored Autoencoders with Simplex Interpolation for Unsupervised Anomaly Detection

arXiv.org Machine Learning

Use of deep generative models for unsupervised anomaly detection has shown great promise partially owing to their ability to learn proper representations of complex input data distributions. Current methods, however, lack a strong latent representation of the data, thereby resulting in sub-optimal unsupervised anomaly detection results. In this work, we propose a novel representation learning technique using deep autoencoders to tackle the problem of unsupervised anomaly detection. Our approach replaces the $L_{p}$ reconstruction loss in the autoencoder optimization objective with a novel adversarial loss to enforce semantic-level reconstruction. In addition, we propose a novel simplex interpolation loss to improve the structure of the latent space representation in the autoencoder. Our technique improves the state-of-the-art unsupervised anomaly detection performance by a large margin on several image datasets including MNIST, fashion MNIST, CIFAR and Coil-100 as well as on several non-image datasets including KDD99, Arrhythmia and Thyroid. For example, On the CIFAR-10 dataset, using a standard leave-one-out evaluation protocol, our method achieves a substantial performance gain of 0.23 AUC points compared to the state-of-the-art.


Towards Safer Self-Driving Through Great PAIN (Physically Adversarial Intelligent Networks)

arXiv.org Machine Learning

Automated vehicles' neural networks suffer from overfit, poor generalizability, and untrained edge cases due to limited data availability. Researchers synthesize randomized edge-case scenarios to assist in the training process, though simulation introduces potential for overfit to latent rules and features. Automating worst-case scenario generation could yield informative data for improving self driving. To this end, we introduce a "Physically Adversarial Intelligent Network" (PAIN), wherein self-driving vehicles interact aggressively in the CARLA simulation environment. We train two agents, a protagonist and an adversary, using dueling double deep Q networks (DDDQNs) with prioritized experience replay. The coupled networks alternately seek-to-collide and to avoid collisions such that the "defensive" avoidance algorithm increases the mean-time-to-failure and distance traveled under non-hostile operating conditions. The trained protagonist becomes more resilient to environmental uncertainty and less prone to corner case failures resulting in collisions than the agent trained without an adversary.


Training a U-Net based on a random mode-coupling matrix model to recover acoustic interference striations

arXiv.org Machine Learning

A U-Net is trained to recover acoustic interference striations (AISs) from distorted ones. A random mode-coupling matrix model is introduced to generate a large number of training data quickly, which are used to train the U-Net. The performance of AIS recovery of the U-Net is tested in range-dependent waveguides with nonlinear internal waves (NLIWs). Although the random mode-coupling matrix model is not an accurate physical model, the test results show that the U-Net successfully recovers AISs under different signal-to-noise ratios (SNRs) and different amplitudes and widths of NLIWs for different shapes.


Unsupervised Domain Adaptation Through Transferring both the Source-Knowledge and Target-Relatedness Simultaneously

arXiv.org Machine Learning

Unsupervised domain adaptation (UDA) is an emerging research topic in the field of machine learning and pattern recognition, which aims to help the learning of unlabeled target domain by transferring knowledge from the source domain. To perform UDA, a variety of methods have been proposed, most of which concentrate on the scenario of single source and single target domain (1S1T). However, in real applications, usually single source domain with multiple target domains are involved (1SmT), which cannot be handled directly by those 1S1T models. Unfortunately, although a few related works on 1SmT UDA have been proposed, nearly none of them model the source domain knowledge and leverage the target-relatedness jointly. To overcome these shortcomings, we herein propose a more general 1SmT UDA model through transferring both the Source-Knowledge and Target-Relatedness, UDA-SKTR for short. In this way, not only the supervision knowledge from the source domain, but also the potential relatedness among the target domains are simultaneously modeled for exploitation in the process of 1SmT UDA. In addition, we construct an alternating optimization algorithm to solve the variables of the proposed model with convergence guarantee. Finally, through extensive experiments on both benchmark and real datasets, we validate the effectiveness and superiority of the proposed method.


Diagonal Preconditioning: Theory and Algorithms

arXiv.org Machine Learning

Diagonal preconditioning has been a staple technique in optimization and machine learning. It often reduces the condition number of the design or Hessian matrix it is applied to, thereby speeding up convergence. However, rigorous analyses of how well various diagonal preconditioning procedures improve the condition number of the preconditioned matrix and how that translates into improvements in optimization are rare. In this paper, we first provide an analysis of a popular diagonal preconditioning technique based on column standard deviation and its effect on the condition number using random matrix theory. Then we identify a class of design matrices whose condition numbers can be reduced significantly by this procedure. We then study the problem of optimal diagonal preconditioning to improve the condition number of any full-rank matrix and provide a bisection algorithm and a potential reduction algorithm with $O(\log(\frac{1}{\epsilon}))$ iteration complexity, where each iteration consists of an SDP feasibility problem and a Newton update using the Nesterov-Todd direction, respectively. Finally, we extend the optimal diagonal preconditioning algorithm to an adaptive setting and compare its empirical performance at reducing the condition number and speeding up convergence for regression and classification problems with that of another adaptive preconditioning technique, namely batch normalization, that is essential in training machine learning models.


Merge-split Markov chain Monte Carlo for community detection

arXiv.org Machine Learning

We present a Markov chain Monte Carlo scheme based on merges and splits of groups that is capable of efficiently sampling from the posterior distribution of network partitions, defined according to the stochastic block model (SBM). We demonstrate how schemes based on the move of single nodes between groups systematically fail at correctly sampling from the posterior distribution even on small networks, and how our merge-split approach behaves significantly better, and improves the mixing time of the Markov chain by several orders of magnitude in typical cases. We also show how the scheme can be straightforwardly extended to nested versions of the SBM, yielding asymptotically exact samples of hierarchical network partitions.


Planning with Brain-inspired AI

arXiv.org Artificial Intelligence

This article surveys engineering and neuroscientific models of planning as a cognitive function, which is regarded as a typical function of fluid intelligence in the discussion of general intelligence. It aims to present existing planning models as references for realizing the planning function in brain-inspired AI or artificial general intelligence (AGI). It also proposes themes for the research and development of brain-inspired AI from the viewpoint of tasks and architecture.


Scalable Deployment of AI Time-series Models for IoT

arXiv.org Artificial Intelligence

IBM Research Castor, a cloud-native system for managing and deploying large numbers of AI time-series models in IoT applications, is described. Modelling code templates, in Python and R, following a typical machine-learning workflow are supported. A knowledge-based approach to managing model and time-series data allows the use of general semantic concepts for expressing feature engineering tasks. Model templates can be programmatically deployed against specific instances of semantic concepts, thus supporting model reuse and automated replication as the IoT application grows. Deployed models are automatically executed in parallel leveraging a serverless cloud computing framework. The complete history of trained model versions and rolling-horizon predictions is persisted, thus enabling full model lineage and traceability. Results from deployments in real-world smart-grid live forecasting applications are reported. Scalability of executing up to tens of thousands of AI modelling tasks is also evaluated.


TRACER: A Framework for Facilitating Accurate and Interpretable Analytics for High Stakes Applications

arXiv.org Artificial Intelligence

In high stakes applications such as healthcare and finance analytics, the interpretability of predictive models is required and necessary for domain practitioners to trust the predictions. Traditional machine learning models, e.g., logistic regression (LR), are easy to interpret in nature. However, many of these models aggregate time-series data without considering the temporal correlations and variations. Therefore, their performance cannot match up to recurrent neural network (RNN) based models, which are nonetheless difficult to interpret. In this paper, we propose a general framework TRACER to facilitate accurate and interpretable predictions, with a novel model TITV devised for healthcare analytics and other high stakes applications such as financial investment and risk management. Different from LR and other existing RNN-based models, TITV is designed to capture both the time-invariant and the time-variant feature importance using a feature-wise transformation subnetwork and a self-attention subnetwork, for the feature influence shared over the entire time series and the time-related importance respectively. Healthcare analytics is adopted as a driving use case, and we note that the proposed TRACER is also applicable to other domains, e.g., fintech. We evaluate the accuracy of TRACER extensively in two real-world hospital datasets, and our doctors/clinicians further validate the interpretability of TRACER in both the patient level and the feature level. Besides, TRACER is also validated in a high stakes financial application and a critical temperature forecasting application. The experimental results confirm that TRACER facilitates both accurate and interpretable analytics for high stakes applications.