AITopics

The Statistical Learning Theory (SLT) provides the theoretical background to ensure that a supervised algorithm generalizes the mapping $f: \mathcal{X} \to \mathcal{Y}$ given $f$ is selected from its search space bias $\mathcal{F}$. This formal result depends on the Shattering coefficient function $\mathcal{N}(\mathcal{F},2n)$ to upper bound the empirical risk minimization principle, from which one can estimate the necessary training sample size to ensure the probabilistic learning convergence and, most importantly, the characterization of the capacity of $\mathcal{F}$, including its under and overfitting abilities while addressing specific target problems. In this context, we propose a new approach to estimate the maximal number of hyperplanes required to shatter a given sample, i.e., to separate every pair of points from one another, based on the recent contributions by Har-Peled and Jones in the dataset partitioning scenario, and use such foundation to analytically compute the Shattering coefficient function for both binary and multi-class problems. As main contributions, one can use our approach to study the complexity of the search space bias $\mathcal{F}$, estimate training sample sizes, and parametrize the number of hyperplanes a learning algorithm needs to address some supervised task, what is specially appealing to deep neural networks. Experiments were performed to illustrate the advantages of our approach while studying the search space $\mathcal{F}$ on synthetic and one toy datasets and on two widely-used deep learning benchmarks (MNIST and CIFAR-10). In order to permit reproducibility and the use of our approach, our source code is made available at~\url{https://bitbucket.org/rodrigo_mello/shattering-rcode}.

hyperplane, sample size, scenario, (14 more...)

1911.05461

Country:

North America > Canada > Ontario > Toronto (0.14)
Europe > Austria > Vienna (0.14)
South America > Brazil (0.04)
(3 more...)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.87)

Zaïem, Mohamed Salah, Bennequin, Etienne

Learning to Communicate in Multi-Agent Reinforcement Learning : A Review

We consider the issue of multiple agents learning to communicate through reinforcement learning within partially observable environments, with a focus on information asymmetry in the second part of our work. We provide a review of the recent algorithms developed to improve the agents' policy by allowing the sharing of information between agents and the learning of communication strategies, with a focus on Deep Recurrent Q-Network-based models. We also describe recent efforts to interpret the languages generated by these agents and study their properties in an attempt to generate human-language-like sentences. We discuss the metrics used to evaluate the generated communication strategies and propose a novel entropy-based evaluation metric. Finally, we address the issue of the cost of communication and introduce the idea of an experimental setup to expose this cost in cooperative-competitive game.

agent, algorithm, communication, (14 more...)

1911.05438

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > District of Columbia > Washington (0.04)

Genre: Research Report (0.82)

Industry: Leisure & Entertainment > Games (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Banville, Hubert, Albuquerque, Isabela, Hyvärinen, Aapo, Moffat, Graeme, Engemann, Denis-Alexander, Gramfort, Alexandre

Self-supervised representation learning from electroencephalography signals

The supervised learning paradigm is limited by the cost - and sometimes the impracticality - of data collection and labeling in multiple domains. Self-supervised learning, a paradigm which exploits the structure of unlabeled data to create learning problems that can be solved with standard supervised approaches, has shown great promise as a pretraining or feature learning approach in fields like computer vision and time series processing. In this work, we present self-supervision strategies that can be used to learn informative representations from multivariate time series. One successful approach relies on predicting whether time windows are sampled from the same temporal context or not. As demonstrated on a clinically relevant task (sleep scoring) and with two electroencephalography datasets, our approach outperforms a purely supervised approach in low data regimes, while capturing important physiological information without any access to labels.

dataset, representation, ssl task, (14 more...)

1911.05419

Country:

North America > Canada > Quebec > Montreal (0.04)
North America > United States > New Mexico > Bernalillo County > Albuquerque (0.04)
North America > Canada > Ontario > Toronto (0.04)
(3 more...)

Genre: Research Report > New Finding (0.46)

Industry: Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.61)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.88)

Vignac, Clément, Ortiz-Jiménez, Guillermo, Frossard, Pascal

On the choice of graph neural network architectures

Seminal works on graph neural networks have primarily targeted semi-supervised node classification problems with few observed labels and high-dimensional signals. With the development of graph networks, this setup has become a de facto benchmark for a significant body of research. Interestingly, several works have recently shown that graph neural networks do not perform much better than predefined low-pass filters followed by a linear classifier in these particular settings. However, when learning with little data in a high-dimensional space, it is not surprising that simple and heavily regularized learning methods are near-optimal. In this paper, we show empirically that in settings with fewer features and more training data, more complex graph networks significantly outperform simpler architectures, and propose a few insights towards to the proper choice of graph neural networks architectures. We finally outline the importance of using sufficiently diverse benchmarks (including lower dimensional signals as well) when designing and studying new types of graph neural networks.

architecture, graph neural network, neural network, (14 more...)

1911.05384

Country:

Europe > Switzerland > Vaud > Lausanne (0.05)
North America > United States > California > Los Angeles County > Long Beach (0.04)

Genre: Research Report (0.65)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Yashima, Shingo, Nitanda, Atsushi, Suzuki, Taiji

Exponential Convergence Rates of Classification Errors on Learning with SGD and Random Features

Although kernel methods are widely used in many learning problems, they have poor scalability to large datasets. To address this problem, sketching and stochastic gradient methods are the most commonly used techniques to derive efficient large-scale learning algorithms. In this study, we consider solving a binary classification problem using random features and stochastic gradient descent. In recent research, an exponential convergence rate of the expected classification error under the strong low-noise condition has been shown. We extend these analyses to a random features setting, analyzing the error induced by the approximation of random features in terms of the distance between the generated hypothesis including population risk minimizers and empirical risk minimizers when using general Lipschitz loss functions, to show that an exponential convergence of the expected classification error is achieved even if random features approximation is applied. Additionally, we demonstrate that the convergence rate does not depend on the number of features and there is a significant computational benefit in using random features in classification problems because of the strong low-noise condition.

approximation, classification error, random feature, (13 more...)

1911.0535

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.05)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.67)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.75)

Bacry, Emmanuel, Gaïffas, Stéphane, Kabeshova, Anastasiia, Yu, Yiyang

ZiMM: a deep learning model for long term adverse events with non-clinical claims data

This paper considers the problem of modeling long-term adverse events following prostatic surgery performed on patients with urination problems, using the French national health insurance database (SNIIRAM), which is a non-clinical claims database built around healthcare reimbursements of more than 65 million people. This makes the problem particularly challenging compared to what could be done using clinical hospital data, albeit a much smaller sample, while we exploit here the claims of almost all French citizens diagnosed with prostatic problems (with between 1.5 and 5 years of history). We introduce a new model, called ZiMM (Zero-inflated Mixture of Multinomial distributions) to capture such long-term adverse events, and we build a deep-learning architecture on top of it to deal with the complex, highly heterogeneous and sparse patterns observable in such a large claims database. This architecture combines several ingredients: embedding layers for drugs, medical procedures, and diagnosis codes; embeddings aggregation through a self-attention mechanism; recurrent layers to encode the health pathways of patients before their surgery and a final decoder layer which outputs the ZiMM's parameters.

architecture, database, surgery, (14 more...)

1911.05346

Country:

Europe > France > Île-de-France > Paris > Paris (0.05)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area > Psychiatry/Psychology (0.46)
Health & Medicine > Health Care Technology > Medical Record (0.31)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Convergence to minima for the continuous version of Backtracking Gradient Descent

Truong, Tuyen Trung

The main result of this paper is: {\bf Theorem.} Let $f:\mathbb{R}^k\rightarrow \mathbb{R}$ be a $C^{1}$ function, so that $\nabla f$ is locally Lipschitz continuous. Assume moreover that $f$ is $C^2$ near its generalised saddle points. Fix real numbers $\delta_0>0$ and $0<\alpha <1$. Then there is a smooth function $h:\mathbb{R}^k\rightarrow (0,\delta_0]$ so that the map $H:\mathbb{R}^k\rightarrow \mathbb{R}^k$ defined by $H(x)=x-h(x)\nabla f(x)$ has the following property: (i) For all $x\in \mathbb{R}^k$, we have $f(H(x)))-f(x)\leq -\alpha h(x)||\nabla f(x)||^2$. (ii) For every $x_0\in \mathbb{R}^k$, the sequence $x_{n+1}=H(x_n)$ either satisfies $\lim_{n\rightarrow\infty}||x_{n+1}-x_n||=0$ or $ \lim_{n\rightarrow\infty}||x_n||=\infty$. Each cluster point of $\{x_n\}$ is a critical point of $f$. If moreover $f$ has at most countably many critical points, then $\{x_n\}$ either converges to a critical point of $f$ or $\lim_{n\rightarrow\infty}||x_n||=\infty$. (iii) There is a set $\mathcal{E}_1\subset \mathbb{R}^k$ of Lebesgue measure $0$ so that for all $x_0\in \mathbb{R}^k\backslash \mathcal{E}_1$, the sequence $x_{n+1}=H(x_n)$, {\bf if converges}, cannot converge to a {\bf generalised} saddle point. (iv) There is a set $\mathcal{E}_2\subset \mathbb{R}^k$ of Lebesgue measure $0$ so that for all $x_0\in \mathbb{R}^k\backslash \mathcal{E}_2$, any cluster point of the sequence $x_{n+1}=H(x_n)$ is not a saddle point, and more generally cannot be an isolated generalised saddle point. Some other results are proven.

critical point, generalised saddle point, saddle point, (15 more...)

1911.04221

Country:

Europe > Norway > Eastern Norway > Oslo (0.04)
North America > United States > New York (0.04)
North America > United States > Massachusetts > Middlesex County > Belmont (0.04)
(4 more...)

Genre:

Research Report (0.50)
Overview (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.48)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.41)

Aiken, Emily L., Nguyen, Andre T., Santillana, Mauricio

Towards the Use of Neural Networks for Influenza Prediction at Multiple Spatial Resolutions

We introduce the use of a Gated Recurrent Unit (GRU) for influenza prediction at the state- and city-level in the US, and experiment with the inclusion of real-time flu-related Internet search data. We find that a GRU has lower prediction error than current state-of-the-art methods for data-driven influenza prediction at time horizons of over two weeks. In contrast with other machine learning approaches, the inclusion of real-time Internet search data does not improve GRU predictions.

gt data, neural network, prediction, (14 more...)

1911.02673

Country:

North America > United States > Pennsylvania (0.05)
Oceania > New Zealand (0.04)
Oceania > Australia (0.04)
(4 more...)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

Semi-Supervised Natural Language Approach for Fine-Grained Classification of Medical Reports

Deshmukh, Neil, Gumustop, Selin, Gauriau, Romane, Buch, Varun, Wright, Bradley, Bridge, Christopher, Naidu, Ram, Andriole, Katherine, Bizzo, Bernardo

Although machine learning has become a powerful tool to augment doctors in clinical analysis, the immense amount of labeled data that is necessary to train supervised learning approaches burdens each development task as time and resource intensive. The vast majority of dense clinical information is stored in written reports, detailing pertinent patient information. The challenge with utilizing natural language data for standard model development is due to the complex nature of the modality. In this research, a model pipeline was developed to utilize an unsupervised approach to train an encoder-language model, a recurrent network, to generate document encodings; which then can be used as features passed into a decoder-classifier model that requires magnitudes less labeled data than previous approaches to differentiate between fine-grained disease classes accurately. The language model was trained on unlabeled radiology reports from the Massachusetts General Hospital Radiology Department (n=218,159) and terminated with a loss of 1.62. The classification models were trained on three labeled datasets of head CT studies of reported patients, presenting large vessel occlusion (n=1403), acute ischemic strokes (n=331), and intracranial hemorrhage (n=4350), to identify a variety of different findings directly from the radiology report data; resulting in AUCs of 0.98, 0.95, and 0.99, respectively, for the large vessel occlusion, acute ischemic stroke, and intracranial hemorrhage datasets. The output encodings are able to be used in conjunction with imaging data, to create models that can process a multitude of different modalities. The ability to automatically extract relevant features from textual data allows for faster model development and integration of textual modality, overall, allowing clinical reports to become a more viable input for more encompassing and accurate deep learning models.

classification, dataset, information, (13 more...)

1910.13573

Country: North America > United States > Massachusetts > Suffolk County > Boston (0.04)

Genre: Research Report (0.64)

Industry:

Health & Medicine > Therapeutic Area > Neurology (1.00)
Health & Medicine > Therapeutic Area > Hematology (1.00)
Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

McDaid, Edward, McDaid, Sarah

Zoea -- Composable Inductive Programming Without Limits

arXiv.org Artificial IntelligenceNov-13-2019

The abstraction levels represent a general progression from the test cases through available and derived values to partial and complete solutions. The abstraction levels include: - test cases; - input and output elements; - derived values (symbolic and numeric); - code fragments; - target values; - case solutions; - case set solutions; - program solutions; - solution code. The data on the blackboard represents a set of more or less promising solution fragments at different stages of identification, characterisation and elaboration. It is worth noting that progression from test cases to solution code is not a strictly linear process. Instead knowledge sources respond to changes at one or more specific abstraction levels to produce, enhance or remove elements on different levels. The blackboard model allows this to happen in more or less any order.

software, test case, zoea, (17 more...)

arXiv.org Artificial Intelligence

1911.08286

Country: North America > United States > California > San Francisco County > San Francisco (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Software > Programming Languages (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.69)
Information Technology > Artificial Intelligence > Representation & Reasoning > Blackboard Systems (0.47)