Melnyk, Igor
Optimizing Mode Connectivity via Neuron Alignment
Tatro, N. Joseph, Chen, Pin-Yu, Das, Payel, Melnyk, Igor, Sattigeri, Prasanna, Lai, Rongjie
The loss landscapes of deep neural networks are not well understood due to their high nonconvexity. Empirically, the local minima of these loss functions can be connected by a learned curve in model space, along which the loss remains nearly constant, a feature known as mode connectivity. Yet, current curve-finding algorithms do not consider the influence of symmetry in the loss surface created by model weight permutations. We propose a more general framework to investigate the effect of symmetry on landscape connectivity by accounting for the weight permutations of the networks being connected. To approximate the optimal permutation, we introduce an inexpensive heuristic referred to as neuron alignment. Neuron alignment promotes similarity between the distribution of intermediate activations of models along the curve. We provide theoretical analysis establishing the benefit of alignment to mode connectivity based on this simple heuristic. We empirically verify that the permutation given by alignment is locally optimal via a proximal alternating minimization scheme. Empirically, optimizing the weight permutation is critical for efficiently learning a simple, planar, low-loss curve between networks that successfully generalizes. Our alignment method can significantly alleviate the recently identified robust loss barrier on the path connecting two adversarially robust models and can find more robust and accurate models on the path.
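A minimal sketch of the neuron-alignment heuristic described in the abstract: match the units of a layer in model B to the units of the same layer in model A by maximizing the cross-correlation of their activations on a common batch, then permute B's weights accordingly. The function name, shapes, and omission of biases are illustrative assumptions, not the paper's code.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def align_layer(acts_a, acts_b, W_b, W_b_next):
    """acts_*: (n_samples, n_units) activations of models A and B on the same batch;
    W_b: (n_units, n_in) incoming weights of the layer in model B;
    W_b_next: (n_out, n_units) weights of the next layer in model B."""
    # Correlation between every unit of A and every unit of B.
    a = (acts_a - acts_a.mean(0)) / (acts_a.std(0) + 1e-8)
    b = (acts_b - acts_b.mean(0)) / (acts_b.std(0) + 1e-8)
    corr = a.T @ b / len(a)                      # (n_units, n_units)
    # Permutation of B's units that maximizes total correlation (assignment problem).
    _, perm = linear_sum_assignment(-corr)       # unit i of A <-> unit perm[i] of B
    # Permute B's incoming weights and the next layer's input columns; this leaves
    # the function computed by model B unchanged (biases would be permuted the same way).
    return W_b[perm], W_b_next[:, perm], perm
```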
Wasserstein Barycenter Model Ensembling
Dognin, Pierre, Melnyk, Igor, Mroueh, Youssef, Ross, Jerret, Santos, Cicero Dos, Sercu, Tom
In this paper we propose to perform model ensembling in a multiclass or a multilabel learning setting using Wasserstein (W.) barycenters. Optimal transport metrics, such as the Wasserstein distance, allow incorporating semantic side information such as word embeddings. Using W. barycenters to find the consensus between models allows us to balance confidence and semantics in finding the agreement between the models. We show applications of Wasserstein ensembling in attribute-based classification, multilabel learning, and image captioning. These results show that Wasserstein ensembling is a viable alternative to basic geometric or arithmetic mean ensembling.
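A minimal sketch of the ensembling idea: combine the class-probability vectors of several models via an entropic Wasserstein barycenter whose ground cost comes from embeddings of the class labels. This sketch assumes the POT package (`pip install pot`); the function names, regularization value, and random data are illustrative.

```python
import numpy as np
import ot  # Python Optimal Transport

def wasserstein_ensemble(probs, label_embeddings, reg=0.05, weights=None):
    """probs: (n_classes, n_models) column-stochastic predictions of the models;
    label_embeddings: (n_classes, d) embedding of each class label."""
    # Ground cost between classes: squared Euclidean distance between label embeddings.
    M = ot.dist(label_embeddings, label_embeddings)
    M /= M.max()
    # Entropic-regularized Wasserstein barycenter of the model predictions.
    return ot.bregman.barycenter(probs, M, reg, weights=weights)

# Example: three models predicting over 5 classes with random 16-d label embeddings.
rng = np.random.default_rng(0)
probs = rng.random((5, 3)); probs /= probs.sum(axis=0, keepdims=True)
consensus = wasserstein_ensemble(probs, rng.normal(size=(5, 16)))
```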
Estimating Information Flow in Neural Networks
Goldfeld, Ziv, Berg, Ewout van den, Greenewald, Kristjan, Melnyk, Igor, Nguyen, Nam, Kingsbury, Brian, Polyanskiy, Yury
We study the flow of information and the evolution of internal representations during deep neural network (DNN) training, aiming to demystify the compression aspect of the information bottleneck theory. The theory suggests that DNN training comprises a rapid fitting phase followed by a slower compression phase, in which the mutual information $I(X;T)$ between the input $X$ and internal representations $T$ decreases. Several papers observe compression of estimated mutual information on different DNN models, but the true $I(X;T)$ over these networks is provably either constant (discrete $X$) or infinite (continuous $X$). This work explains the discrepancy between theory and experiments, and clarifies what was actually measured by these past works. To this end, we introduce an auxiliary (noisy) DNN framework for which $I(X;T)$ is a meaningful quantity that depends on the network's parameters. This noisy framework is shown to be a good proxy for the original (deterministic) DNN both in terms of performance and the learned representations. We then develop a rigorous estimator for $I(X;T)$ in noisy DNNs and observe compression in various models. By relating $I(X;T)$ in the noisy DNN to an information-theoretic communication problem, we show that compression is driven by the progressive clustering of hidden representations of inputs from the same class. Several methods to directly monitor clustering of hidden representations, both in noisy and deterministic DNNs, are used to show that meaningful clusters form in the $T$ space. Finally, we return to the estimator of $I(X;T)$ employed in past works, and demonstrate that while it fails to capture the true (vacuous) mutual information, it does serve as a measure for clustering. This clarifies the past observations of compression and isolates the geometric clustering of hidden representations as the true phenomenon of interest.
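To illustrate why $I(X;T)$ becomes a meaningful, finite quantity in the noisy framework: with $T = f(X) + Z$ and $Z \sim \mathcal{N}(0, \beta^2 I)$, $I(X;T) = h(T) - h(Z)$, where $h(T)$ is the entropy of a Gaussian mixture centered at the hidden activations. The sketch below approximates $h(T)$ by naive Monte Carlo; it is an illustrative proxy, not the paper's rigorous estimator.

```python
import numpy as np
from scipy.special import logsumexp

def noisy_layer_mi(hidden, beta, n_noise=8, seed=0):
    """hidden: (n, d) deterministic activations f(x_i); beta: noise standard deviation."""
    rng = np.random.default_rng(seed)
    n, d = hidden.shape
    h_Z = 0.5 * d * np.log(2 * np.pi * np.e * beta**2)          # exact h(Z)
    # Monte Carlo samples of T and log-density of the mixture (1/n) sum_i N(f(x_i), beta^2 I).
    T = np.repeat(hidden, n_noise, axis=0) + beta * rng.standard_normal((n * n_noise, d))
    sq = ((T[:, None, :] - hidden[None, :, :]) ** 2).sum(-1)    # (n*n_noise, n)
    log_comp = -0.5 * sq / beta**2 - 0.5 * d * np.log(2 * np.pi * beta**2)
    h_T = -np.mean(logsumexp(log_comp, axis=1) - np.log(n))     # plug-in entropy estimate
    return h_T - h_Z
```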
Improved Image Captioning with Adversarial Semantic Alignment
Melnyk, Igor, Sercu, Tom, Dognin, Pierre L., Ross, Jarret, Mroueh, Youssef
In this paper we propose a new conditional GAN for image captioning that enforces semantic alignment between images and captions through a co-attentive discriminator and a context-aware LSTM sequence generator. In order to train these sequence GANs, we empirically study two algorithms: Self-critical Sequence Training (SCST) and Gumbel Straight-Through. Both techniques are confirmed to be viable for training sequence GANs. However, SCST displays better gradient behavior despite not directly leveraging gradients from the discriminator. This leads to more stable training of sequence GANs and ultimately produces models with improved results under human evaluation. Automatic evaluation of GAN-trained captioning models is an open question. To remedy this, we introduce a new semantic score with strong correlation to human judgement. As a paradigm for evaluation, we suggest that the ability of the captioner to generalize to Out of Context (OOC) scenes is an important criterion for assessing generalization and composition. To this end, we propose an OOC dataset which, combined with our automatic metric of semantic score, provides a new benchmark for the captioning community to measure the generalization ability of automatic image captioning. Under this new OOC benchmark, and on the traditional MSCOCO dataset, our models trained with SCST show strong performance in both semantic score and human evaluation.
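A minimal PyTorch sketch of the Self-critical Sequence Training (SCST) update mentioned above: the reward of a sampled caption is baselined by the reward of the greedy caption, so no learned critic is needed. Here `generator.sample` and `reward_fn` are placeholder interfaces (e.g., a discriminator score or a semantic score), not the paper's actual code.

```python
import torch

def scst_loss(generator, images, reward_fn):
    # Sample a caption and keep the per-token log-probabilities.
    sample_ids, sample_logprobs = generator.sample(images)          # stochastic decoding
    with torch.no_grad():
        greedy_ids, _ = generator.sample(images, greedy=True)       # baseline decoding
        advantage = reward_fn(sample_ids) - reward_fn(greedy_ids)   # r(w^s) - r(w^hat)
    # REINFORCE with the self-critical baseline: a positive advantage raises the
    # probability of the sampled caption, a negative advantage lowers it.
    return -(advantage * sample_logprobs.sum(dim=1)).mean()
```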
Deep learning algorithm for data-driven simulation of noisy dynamical system
Yeo, Kyongmin, Melnyk, Igor
We present a deep learning model, DE-LSTM, for the simulation of a stochastic process with underlying nonlinear dynamics. The deep learning model aims to approximate the probability density function of a stochastic process via numerical discretization, and the underlying nonlinear dynamics is modeled by a Long Short-Term Memory (LSTM) network. After the numerical discretization by a softmax function, the function estimation problem is solved as a multi-label classification problem. A penalized maximum log-likelihood method is proposed to impose smoothness in the predicted probability distribution. It is shown that the LSTM is a state-space model, where the internal dynamics consists of a system of relaxation processes. A sequential Monte Carlo method is outlined to compute the time evolution of the probability distribution. The behavior of DE-LSTM is investigated using the Ornstein-Uhlenbeck process and noisy observations of the Mackey-Glass equation and a forced Van der Pol oscillator. While the probability distribution computed by the conventional maximum log-likelihood method makes a good prediction of the first and second moments, the Kullback-Leibler divergence shows that the penalized maximum log-likelihood method results in a probability distribution closer to the ground truth. It is shown that DE-LSTM makes a good prediction of the probability distribution without assuming any distributional properties of the noise. For a multiple-step forecast, it is found that the prediction uncertainty, denoted by the 95% confidence interval, does not grow monotonically in time. For a chaotic system, the Mackey-Glass time series, the 95% confidence interval first grows and then exhibits an oscillatory behavior, instead of growing indefinitely, while for the forced Van der Pol oscillator, the prediction uncertainty does not grow in time even for a 3,000-step forecast.
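A minimal PyTorch sketch of the idea described above: discretize the target variable into bins, let an LSTM predict a softmax distribution over the bins, and penalize differences between adjacent bin probabilities to keep the predicted density smooth. The bin count, penalty weight, and shapes are assumptions of this sketch, not the paper's configuration.

```python
import torch
import torch.nn as nn

class DiscretizedLSTM(nn.Module):
    def __init__(self, n_in, n_hidden=64, n_bins=100):
        super().__init__()
        self.lstm = nn.LSTM(n_in, n_hidden, batch_first=True)
        self.head = nn.Linear(n_hidden, n_bins)

    def forward(self, x):                                  # x: (batch, time, n_in)
        h, _ = self.lstm(x)
        return torch.log_softmax(self.head(h), dim=-1)     # log-probability over bins

def penalized_nll(log_probs, target_bins, lam=1.0):
    """Penalized maximum log-likelihood: cross-entropy over the discretized target
    plus a smoothness penalty on neighboring bin probabilities."""
    nll = nn.functional.nll_loss(log_probs.flatten(0, 1), target_bins.flatten())
    probs = log_probs.exp()
    smooth = ((probs[..., 1:] - probs[..., :-1]) ** 2).sum(-1).mean()
    return nll + lam * smooth
```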
R2N2: Residual Recurrent Neural Networks for Multivariate Time Series Forecasting
Goel, Hardik, Melnyk, Igor, Banerjee, Arindam
Multivariate time-series modeling and forecasting is an important problem with numerous applications. Traditional approaches such as VAR (vector auto-regressive) models and more recent approaches such as RNNs (recurrent neural networks) are indispensable tools in modeling time-series data. In many multivariate time series modeling problems, there is usually a significant linear dependency component, for which VARs are suitable, and a nonlinear component, for which RNNs are suitable. Modeling such time series with only a VAR or only an RNN can lead to poor predictive performance or complex models with long training times. In this work, we propose a hybrid model called R2N2 (Residual RNN), which first models the time series with a simple linear model (like VAR) and then models its residual errors using RNNs. R2N2s can be trained using existing algorithms for VARs and RNNs. Through an extensive empirical evaluation on two real-world datasets (aviation and climate domains), we show that R2N2 is competitive, usually better than either VAR or RNN used alone. We also show that R2N2 is faster to train than an RNN, while requiring fewer hidden units.
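A minimal sketch of the two-stage R2N2 procedure described above: fit a linear VAR by least squares, then train an RNN on the VAR residuals, and forecast with the sum of the two components. The toy data and helper names are illustrative, not the paper's code.

```python
import numpy as np

def fit_var(X, p):
    """Least-squares VAR(p).  X: (T, d) series; returns lagged design Z and coefficients B."""
    Y = X[p:]
    Z = np.hstack([X[p - k - 1 : -(k + 1)] for k in range(p)])   # (T-p, p*d) lagged design
    B, *_ = np.linalg.lstsq(Z, Y, rcond=None)                    # Z @ B ~ Y
    return Z, B

# Toy bivariate series standing in for the aviation / climate data used in the paper.
rng = np.random.default_rng(0)
X = np.cumsum(rng.standard_normal((500, 2)), axis=0)
Z, B = fit_var(X, p=2)
linear_pred = Z @ B                     # linear (VAR) component of the forecast
residuals = X[2:] - linear_pred         # nonlinear component, used to train the RNN
# rnn.fit(windows(residuals), residuals)
# final_forecast = VAR one-step prediction + RNN prediction of the residual
```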
SenGen: Sentence Generating Neural Variational Topic Model
Nallapati, Ramesh, Melnyk, Igor, Kumar, Abhishek, Zhou, Bowen
We present a new topic model that generates documents by sampling a topic for one whole sentence at a time, and generating the words in the sentence using an RNN decoder that is conditioned on the topic of the sentence. We argue that this novel formalism will not only help us visualize and model the topical discourse structure in a document better, but also potentially lead to more interpretable topics, since we can now illustrate topics by sampling representative sentences instead of bags of words or phrases. We present a variational auto-encoder approach for learning in which we use a factorized variational encoder that independently models the posterior over topical mixture vectors of documents using a feed-forward network, and the posterior over topic assignments to sentences using an RNN. Our preliminary experiments on two different datasets indicate early promise, but also expose many challenges that remain to be addressed.
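A minimal PyTorch sketch of the generative story described above: a document samples a topic-mixture vector, each sentence samples a single topic from it, and a topic-conditioned RNN decodes the words of that sentence. The dimensions, GRU decoder, and Dirichlet prior below are illustrative assumptions, not the paper's exact parameterization.

```python
import torch
import torch.nn as nn

class SentenceTopicDecoder(nn.Module):
    def __init__(self, vocab, n_topics=20, d_emb=128, d_hid=256):
        super().__init__()
        self.topic_emb = nn.Embedding(n_topics, d_hid)   # topic -> initial decoder state
        self.word_emb = nn.Embedding(vocab, d_emb)
        self.rnn = nn.GRU(d_emb, d_hid, batch_first=True)
        self.out = nn.Linear(d_hid, vocab)

    def forward(self, words, topic):                     # words: (B, L), topic: (B,)
        h0 = self.topic_emb(topic).unsqueeze(0)          # condition the decoder on the sentence topic
        h, _ = self.rnn(self.word_emb(words), h0)
        return self.out(h)                               # next-word logits

# Generative story for one document (ancestral sampling):
#   theta ~ Dirichlet(alpha); for each sentence: z ~ Categorical(theta);
#   words ~ SentenceTopicDecoder conditioned on z.
```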
A Spectral Algorithm for Inference in Hidden Semi-Markov Models
Melnyk, Igor, Banerjee, Arindam
Hidden semi-Markov models (HSMMs) are latent variable models which allow latent state persistence and can be viewed as a generalization of the popular hidden Markov models (HMMs). In this paper, we introduce a novel spectral algorithm to perform inference in HSMMs. Unlike expectation maximization (EM), our approach correctly estimates the probability of a given observation sequence based on a set of training sequences. Our approach is based on estimating moments from the sample, whose number of dimensions depends only logarithmically on the maximum length of the hidden state persistence. Moreover, the algorithm requires only a few matrix inversions and is therefore computationally efficient. Empirical evaluations on synthetic and real data demonstrate the advantage of the algorithm over EM in terms of speed and accuracy, especially for large datasets.
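For intuition on the moment-based approach, here is a minimal sketch of the classical spectral (observable-operator) recipe for the plain-HMM special case, in the style of Hsu, Kakade, and Zhang; the paper extends this kind of moment estimation to HSMMs with latent state persistence. The moment tensors P1, P21, P3x1 are empirical co-occurrence statistics estimated from training sequences; this is not the paper's HSMM algorithm.

```python
import numpy as np

def spectral_hmm_operators(P1, P21, P3x1, m):
    """P1: (n,) unigram moments; P21: (n, n) bigram moments; P3x1: (n_symbols, n, n)
    trigram moments; m: number of hidden states."""
    U, _, _ = np.linalg.svd(P21)
    U = U[:, :m]                                           # top-m left singular vectors
    pinv = np.linalg.pinv(U.T @ P21)
    b1 = U.T @ P1
    binf = np.linalg.pinv(P21.T @ U) @ P1
    Bx = np.stack([U.T @ P3x1[x] @ pinv for x in range(P3x1.shape[0])])
    return b1, binf, Bx

def sequence_probability(seq, b1, binf, Bx):
    b = b1
    for x in seq:                                          # one observable operator per symbol
        b = Bx[x] @ b
    return float(binf @ b)
```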
Estimating Structured Vector Autoregressive Model
Melnyk, Igor, Banerjee, Arindam
While considerable advances have been made in estimating high-dimensional structured models from independent data using Lasso-type models, limited progress has been made for settings where the samples are dependent. We consider estimating structured VAR (vector auto-regressive) models, where the structure can be captured by any suitable norm, e.g., Lasso, group Lasso, order weighted Lasso, sparse group Lasso, etc. In the VAR setting with correlated noise, although there is strong dependence over time and covariates, we establish bounds on the non-asymptotic estimation error of structured VAR parameters. Surprisingly, the estimation error is of the same order as that of the corresponding Lasso-type estimator with independent samples, and the analysis holds for any norm. Our analysis relies on results in generic chaining, sub-exponential martingales, and the spectral representation of VAR models. Experimental results on synthetic data with a variety of structures as well as real aviation data are presented, validating the theoretical results.
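A minimal sketch of the estimator analyzed above, for the Lasso norm: form the lagged design matrix from the dependent samples and solve a regularized least-squares problem per output coordinate. Other norms (group Lasso, order weighted Lasso, sparse group Lasso) would swap in a different penalty; the use of scikit-learn, the lag order, and the regularization value are assumptions of this sketch.

```python
import numpy as np
from sklearn.linear_model import Lasso

def lasso_var(X, p=2, alpha=0.1):
    """X: (T, d) time series; returns the (d, p*d) estimated VAR transition matrix."""
    Y = X[p:]
    Z = np.hstack([X[p - k - 1 : -(k + 1)] for k in range(p)])   # lagged, dependent samples
    A = np.vstack([Lasso(alpha=alpha, fit_intercept=False).fit(Z, Y[:, j]).coef_
                   for j in range(X.shape[1])])
    return A                                                      # row j: coefficients for series j
```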
Semi-Markov Switching Vector Autoregressive Model-based Anomaly Detection in Aviation Systems
Melnyk, Igor, Banerjee, Arindam, Matthews, Bryan, Oza, Nikunj
In this work we consider the problem of anomaly detection in heterogeneous, multivariate, variable-length time series datasets. Our focus is on the aviation safety domain, where data objects are flights and time series are sensor readings and pilot switches. In this context the goal is to detect anomalous flight segments due to mechanical, environmental, or human factors in order to identify operationally significant events, provide insights into flight operations, and highlight otherwise unavailable potential safety risks and precursors to accidents. For this purpose, we propose a framework which represents each flight using a semi-Markov switching vector autoregressive (SMS-VAR) model. Detection of anomalies is then based on measuring dissimilarities between the model's predictions and the data observations. The framework is scalable, due to the inherent parallel nature of most computations, and can be used to perform online anomaly detection. Extensive experimental results on simulated and real datasets illustrate that the framework can detect various types of anomalies along with the key parameters involved.
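A minimal sketch of the detection principle described above, using a plain VAR in place of the full SMS-VAR: fit the model on nominal flights, then score each time step of a new flight by the dissimilarity between the model's one-step prediction and the observation. The lag order, thresholding rule, and reuse of the hypothetical lasso_var helper from the previous sketch are assumptions, not the paper's pipeline.

```python
import numpy as np

def anomaly_scores(X, A, p):
    """X: (T, d) flight record; A: (d, p*d) fitted VAR coefficients (e.g. from lasso_var above)."""
    Z = np.hstack([X[p - k - 1 : -(k + 1)] for k in range(p)])
    pred = Z @ A.T
    return np.linalg.norm(X[p:] - pred, axis=1)       # per-time-step prediction error

# A segment is flagged as anomalous when its scores stay above a threshold
# calibrated on nominal data, e.g. a high percentile of the training scores.
```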