AITopics

Generative adversarial networks (GANs) have shown promise in image generation and classification given limited supervision. Existing methods extend the unsupervised GAN framework to incorporate supervision heuristically. Specifically, a single discriminator plays two incompatible roles of identifying fake samples and predicting labels and it only estimates the data without considering the labels. The formulation intrinsically causes two problems: (1) the generator and the discriminator (i.e., the classifier) may not converge to the data distribution at the same time; and (2) the generator cannot control the semantics of the generated samples. In this paper, we present the triple generative adversarial network (Triple-GAN), which consists of three players---a generator, a classifier, and a discriminator. The generator and the classifier characterize the conditional distributions between images and labels, and the discriminator solely focuses on identifying fake image-label pairs. We design compatible objective functions to ensure that the distributions characterized by the generator and the classifier converge to the data distribution. We evaluate Triple-GAN in two challenging settings, namely, semi-supervised learning and the extreme low data regime. In both settings, Triple-GAN can achieve state-of-the-art classification results among deep generative models and generate meaningful samples in a specific class simultaneously.

arxiv preprint arxiv, dataset, learning, (14 more...)

1912.09784

Country:

Asia > China > Beijing > Beijing (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.68)

Industry:

Information Technology (0.48)
Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

An adaptive simulated annealing EM algorithm for inference on non-homogeneous hidden Markov models

Hubin, Aliaksandr

Non-homogeneous hidden Markov models (NHHMM) are a subclass of dependent mixture models used for semi-supervised learning, where both transition probabilities between the latent states and mean parameter of the probability distribution of the responses (for a given state) depend on the set of $p$ covariates. A priori we do not know which (and how) covariates influence the transition probabilities and the mean parameters. This induces a complex combinatorial optimization problem for model selection with $4^p$ potential configurations. To address the problem, in this article we propose an adaptive (A) simulated annealing (SA) expectation maximization (EM) algorithm (ASA-EM) for joint optimization of models and their parameters with respect to a criterion of interest.

algorithm, markov model, probability, (17 more...)

doi: 10.1145/3371425.3371641

1912.09733

Country:

Asia > China (0.05)
North America > United States > California > San Francisco County > San Francisco (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Norway > Eastern Norway > Oslo (0.04)

Genre: Research Report (1.00)

Industry:

Health & Medicine (0.68)
Banking & Finance (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Robust Data Preprocessing for Machine-Learning-Based Disk Failure Prediction in Cloud Production Environments

Han, Shujie, Wu, Jun, Xu, Erci, He, Cheng, Lee, Patrick P. C., Qiang, Yi, Zheng, Qixing, Huang, Tao, Huang, Zixi, Li, Rui

To provide proactive fault tolerance for modern cloud data centers, extensive studies have proposed machine learning (ML) approaches to predict imminent disk failures for early remedy and evaluated their approaches directly on public datasets (e.g., Backblaze SMART logs). However, in real-world production environments, the data quality is imperfect (e.g., inaccurate labeling, missing data samples, and complex failure types), thereby degrading the prediction accuracy. We present RODMAN, a robust data preprocessing pipeline that refines data samples before feeding them into ML models. We start with a large-scale trace-driven study of over three million disks from Alibaba Cloud's data centers, and motivate the practical challenges in ML-based disk failure prediction. We then design RODMAN with three data preprocessing echniques, namely failure-type filtering, spline-based data filling, and automated pre-failure backtracking, that are applicable for general ML models. Evaluation on both the Alibaba and Backblaze datasets shows that RODMAN improves the prediction accuracy compared to without data preprocessing under various settings.

dataset, disk failure, failure type, (14 more...)

1912.09722

Country:

Europe > Romania > Sud - Muntenia Development Region > Giurgiu County > Giurgiu (0.04)
Asia > China > Hong Kong (0.04)

Genre: Research Report > New Finding (0.93)

Industry: Information Technology > Services (0.69)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Cloud Computing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)

Measuring Compositional Generalization: A Comprehensive Method on Realistic Data

Keysers, Daniel, Schärli, Nathanael, Scales, Nathan, Buisman, Hylke, Furrer, Daniel, Kashubin, Sergii, Momchev, Nikola, Sinopalnikov, Danila, Stafiniak, Lukasz, Tihon, Tibor, Tsarkov, Dmitry, Wang, Xiao, van Zee, Marc, Bousquet, Olivier

State-of-the-art machine learning methods exhibit limited compositional generalization. At the same time, there is a lack of realistic benchmarks that comprehensively measure this ability, which makes it challenging to find and evaluate improvements. We introduce a novel method to systematically construct such benchmarks by maximizing compound divergence while guaranteeing a small atom divergence between train and test sets, and we quantitatively compare this method to other approaches for creating compositional generalization benchmarks. We present a large and realistic natural language question answering dataset that is constructed according to this method, and we use it to analyze the compositional generalization ability of three machine learning architectures. We find that they fail to generalize compositionally and that there is a surprisingly strong negative correlation between compound divergence and accuracy. We also demonstrate how our method can be used to create new compositionality benchmarks on top of the existing SCAN dataset, which confirms these findings.

compound divergence, conference paper, logical form, (12 more...)

1912.09713

Country:

Oceania > Australia (0.04)
North America > United States > Tennessee (0.04)
North America > United States > Massachusetts > Norfolk County > Brookline (0.04)
(6 more...)

Genre: Research Report > New Finding (0.45)

Industry:

Media > Film (1.00)
Leisure & Entertainment (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Yuan, Deming, Proutiere, Alexandre, Shi, Guodong

Distributed Online Optimization with Long-Term Constraints

We consider distributed online convex optimization problems, where the distributed system consists of various computing units connected through a time-varying communication graph. In each time step, each computing unit selects a constrained vector, experiences a loss equal to an arbitrary convex function evaluated at this vector, and may communicate to its neighbors in the graph. The objective is to minimize the system-wide loss accumulated over time. We propose a decentralized algorithm with regret and cumulative constraint violation in $\mathcal{O}(T^{\max\{c,1-c\} })$ and $\mathcal{O}(T^{1-c/2})$, respectively, for any $c\in (0,1)$, where $T$ is the time horizon. When the loss functions are strongly convex, we establish improved regret and constraint violation upper bounds in $\mathcal{O}(\log(T))$ and $\mathcal{O}(\sqrt{T\log(T)})$. These regret scalings match those obtained by state-of-the-art algorithms and fundamental limits in the corresponding centralized online optimization problem (for both convex and strongly convex loss functions). In the case of bandit feedback, the proposed algorithms achieve a regret and constraint violation in $\mathcal{O}(T^{\max\{c,1-c/3 \} })$ and $\mathcal{O}(T^{1-c/2})$ for any $c\in (0,1)$. We numerically illustrate the performance of our algorithms for the particular case of distributed online regularized linear regression problems.

algorithm, constraint violation, optimization, (15 more...)

1912.09705

Country:

Oceania > Australia > New South Wales > Sydney (0.14)
Asia > China > Jiangsu Province > Nanjing (0.04)
North America > Canada > Quebec > Montreal (0.04)
Europe > Sweden > Stockholm > Stockholm (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.86)

Granziol, Diego, Wan, Xingchen, Garipov, Timur, Vetrov, Dmitry, Roberts, Stephen

MLRG Deep Curvature

We present MLRG Deep Curvature suite, a PyTorch-based, open-source package for analysis and visualisation of neural network curvature and loss landscape. Despite of providing rich information into properties of neural network and useful for a various designed tasks, curvature information is still not made sufficient use for various reasons, and our method aims to bridge this gap. We present a primer, including its main practical desiderata and common misconceptions, of \textit{Lanczos algorithm}, the theoretical backbone of our package, and present a series of examples based on synthetic toy examples and realistic modern neural networks tested on CIFAR datasets, and show the superiority of our package against existing competing approaches for the similar purposes.

eigenvalue, matrix, vector, (13 more...)

1912.09656

Country:

North America > Canada > Ontario > Toronto (0.14)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
Europe > Russia > Central Federal District > Moscow Oblast > Moscow (0.04)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Cyanure: An Open-Source Toolbox for Empirical Risk Minimization for Python, C++, and soon more

Mairal, Julien

Cyanure is an open-source C software package with a Python interface. The goal of Cyanure is to provide state-of-the-art solvers for learning linear models, based on stochastic variance-reduced stochastic optimization with acceleration mechanisms. It provides a simple Python API, which is very close to that of scikit-learn, which should be extended to other languages such as R or Matlab in a near future. Cyanure is distributed under BSD-3-Clause license. Even though this is non-legally binding, the author kindly ask users to cite the present arXiv document in their publications, as well as the publication related to the algorithm they have chosen (see Section 4 for the related publications).

best relative duality gap, objective, relative duality gap, (14 more...)

1912.08165

Country: Europe > France > Auvergne-Rhône-Alpes > Isère > Grenoble (0.05)

Genre: Research Report (0.54)

Technology:

Information Technology > Software (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.49)

Demko, Christophe, Bertet, Karell, Faucher, Cyril, Viaud, Jean-François, Kuznetsov, Sergeï

Next Priority Concept: A new and generic algorithm computing concepts from complex and heterogeneous data

arXiv.org Artificial IntelligenceDec-20-2019

In this article, we present a new data type agnostic algorithm calculating a concept lattice from heterogeneous and complex data. Our NextPriorityConcept algorithm is first introduced and proved in the binary case as an extension of Bordat's algorithm with the notion of strategies to select only some predecessors of each concept, avoiding the generation of unreasonably large lattices. The algorithm is then extended to any type of data in a generic way. It is inspired from pattern structure theory, where data are locally described by predicates independent of their types, allowing the management of heterogeneous data.

algorithm, predecessor, predicate, (16 more...)

arXiv.org Artificial Intelligence

1912.11038

Country:

Oceania > Australia > Victoria > Melbourne (0.04)
Europe > Russia > Central Federal District > Moscow Oblast > Moscow (0.04)
Europe > France > Auvergne-Rhône-Alpes > Lyon > Lyon (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Databases (0.91)
Information Technology > Artificial Intelligence > Machine Learning (0.46)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.46)

Lourenço, Inês, Wahlberg, Bo, Ventura, Rodrigo

Teaching robots to perceive time -- A reinforcement learning approach (Extended version)

arXiv.org Artificial IntelligenceDec-20-2019

Time perception is the phenomenological experience of time by an individual. In this paper, we study how to replicate neural mechanisms involved in time perception, allowing robots to take a step towards temporal cognition. Our framework follows a twofold biologically inspired approach. The first step consists of estimating the passage of time from sensor measurements, since environmental stimuli influence the perception of time. Sensor data is modeled as Gaussian processes that represent the second-order statistics of the natural environment. The estimated elapsed time between two events is computed from the maximum likelihood estimate of the joint distribution of the data collected between them. Moreover, exactly how time is encoded in the brain remains unknown, but there is strong evidence of the involvement of dopaminergic neurons in timing mechanisms. Since their phasic activity has a similar behavior to the reward prediction error of temporal-difference learning models, the latter are used to replicate this behavior. The second step of this approach consists therefore of applying the agent's estimate of the elapsed time in a reinforcement learning problem, where a feature representation called Microstimuli is used. We validate our framework by applying it to an experiment that was originally conducted with mice, and conclude that a robot using this framework is able to reproduce the timing mechanisms of the animal's brain.

agent, mechanism, robot, (16 more...)

arXiv.org Artificial Intelligence

1912.10113

Country:

Europe > Sweden > Stockholm > Stockholm (0.04)
North America > United States > New Jersey (0.04)
North America > United States > California > Santa Clara County > Stanford (0.04)
(3 more...)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.48)

Butz, Cory J., Oliveira, Jhonatan S., Peharz, Robert

Sum-Product Network Decompilation

arXiv.org Artificial IntelligenceDec-20-2019

There exists a dichotomy between classical probabilistic graphical models, such as Bayesian networks (BNs), and modern tractable models, such as sum-product networks (SPNs). The former have generally intractable inference, but allow a high level of interpretability, while the latter admits a wide range of tractable inference routines, but are typically harder to interpret. Due to this dichotomy, tools to convert between BNs and SPNs are desirable. While one direction -- compiling BNs into SPNs -- is well discussed in Darwiche's seminal work on arithmetic circuit compilation, the converse direction -- decompiling SPNs into BNs -- has received surprisingly little attention. In this paper, we fill this gap by proposing SPN2BN, an algorithm that decompiles an SPN into a BN. SPN2BN has several salient features when compared to the only other two works decompiling SPNs. Most significantly, the BNs returned by SPN2BN are minimal independence-maps. Secondly, SPN2BN is more parsimonious with respect to the introduction of latent variables. Thirdly, the output BN produced by SPN2BN can be precisely characterized with respect to the compiled BN. More specifically, a certain set of directed edges will be added to the input BN, giving what we will call the moral-closure. It immediately follows that there is a set of BNs related to the input BN that will also return the same moral closure. Lastly, it is established that our compilation-decompilation process is idempotent. We confirm our results with systematic experiments on a number of synthetic BNs.

bn2spn2bn, spn, sum node, (13 more...)

arXiv.org Artificial Intelligence

1912.10092

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.28)
North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > California > Los Angeles County > Los Angeles (0.14)
(3 more...)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.88)