AITopics | Fischer, Asja

Plotting

Fischer, Asja

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Predictive Uncertainty Quantification with Compound Density Networks

Kristiadi, Agustinus, Fischer, Asja

arXiv.org Machine LearningFeb-4-2019

Despite the huge success of deep neural networks (NNs), finding good mechanisms for quantifying their prediction uncertainty is still an open problem. Bayesian neural networks are one of the most popular approaches to uncertainty quantification. On the other hand, it was recently shown that ensembles of NNs, which belong to the class of mixture models, can be used to quantify prediction uncertainty. In this paper, we build upon these two approaches. First, we increase the mixture model's flexibility by replacing the fixed mixing weights by an adaptive, input-dependent distribution (specifying the probability of each component) represented by NNs, and by considering uncountably many mixture components. The resulting class of models can be seen as the continuous counterpart to mixture density networks and is therefore referred to as compound density networks (CDNs). We employ both maximum likelihood and variational Bayesian inference to train CDNs, and empirically show that they yield better uncertainty estimates on out-of-distribution data and are more robust to adversarial examples than the previous approaches.

bayesian inference, neural network, uncertainty quantification, (17 more...)

arXiv.org Machine Learning

1902.0108

Country:

Europe > Germany (0.14)
North America > Canada > Ontario > Toronto (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Add feedback

On the Relation Between the Sharpest Directions of DNN Loss and the SGD Step Length

Jastrzębski, Stanisław, Kenton, Zachary, Ballas, Nicolas, Fischer, Asja, Bengio, Yoshua, Storkey, Amos

arXiv.org Machine LearningDec-13-2018

The training of deep neural networks with Stochastic Gradient Descent (SGD) with a large learning rate or a small batch-size typically ends in flat regions of the weight space, as indicated by small eigenvalues of the Hessian of the training loss. This was found to correlate with a good final generalization performance. In this paper we extend previous work by investigating the curvature of the loss surface along the whole training trajectory, rather than only at the endpoint. We find that initially SGD visits increasingly sharp regions, reaching a maximum sharpness determined by both the learning rate and the batch-size of SGD. At this peak value SGD starts to fail to minimize the loss along directions in the loss surface corresponding to the largest curvature (sharpest directions). To further investigate the effect of these dynamics in the training process, we study a variant of SGD using a reduced learning rate along the sharpest directions which we show can improve training speed while finding both a sharper and better generalizing solution, compared to vanilla SGD. Overall, our results show that the SGD dynamics in the subspace of the sharpest directions influence the regions that SGD steers to (where larger learning rate or smaller batch size result in wider regions visited), the overall training speed, and the generalization ability of the final model.

deep learning, neural network, sharpest direction, (21 more...)

arXiv.org Machine Learning

1807.05031

Country: North America > Canada > Quebec > Montreal (0.14)

Genre: Research Report > New Finding (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Translating Natural Language to SQL using Pointer-Generator Networks and How Decoding Order Matters

Lukovnikov, Denis, Chakraborty, Nilesh, Lehmann, Jens, Fischer, Asja

arXiv.org Artificial IntelligenceNov-13-2018

Translating natural language to SQL queries for table-based question answering is a challenging problem and has received significant attention from the research community. In this work, we extend a pointer-generator and investigate the order-matters problem in semantic parsing for SQL. Even though our model is a straightforward extension of a general-purpose pointer-generator, it outperforms early works for WikiSQL and remains competitive to concurrently introduced, more complex models. Moreover, we provide a deeper investigation of the potential order-matters problem that could arise due to having multiple correct decoding paths, and investigate the use of REINFORCE as well as a dynamic oracle in this context.

deep learning, neural network, oracle, (18 more...)

arXiv.org Artificial Intelligence

1811.05303

Country: Europe > Germany (0.14)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.90)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.30)

Add feedback

Learning to Rank Query Graphs for Complex Question Answering over Knowledge Graphs

Maheshwari, Gaurav, Trivedi, Priyansh, Lukovnikov, Denis, Chakraborty, Nilesh, Fischer, Asja, Lehmann, Jens

arXiv.org Artificial IntelligenceNov-2-2018

In this paper, we conduct an empirical investigation of neural query graph ranking approaches for the task of complex question answering over knowledge graphs. We experiment with six different ranking models and propose a novel self-attention based slot matching model which exploits the inherent structure of query graphs, our logical form of choice. Our proposed model generally outperforms the other models on two QA datasets over the DBpedia knowledge graph, evaluated in different settings. In addition, we show that transfer learning from the larger of those QA datasets to the smaller dataset yields substantial improvements, effectively offsetting the general lack of training data.

core chain, deep learning, neural network, (22 more...)

arXiv.org Artificial Intelligence

1811.01118

Country: Europe > Germany (0.29)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Retrieval > Query Processing (0.95)
Information Technology > Artificial Intelligence > Representation & Reasoning > Semantic Networks (0.91)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.72)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.72)

Add feedback

Improving Response Selection in Multi-turn Dialogue Systems

Chaudhuri, Debanjan, Kristiadi, Agustinus, Lehmann, Jens, Fischer, Asja

arXiv.org Artificial IntelligenceSep-19-2018

Building systems that can communicate with humans is a core problem in Artificial Intelligence. This work proposes a novel neural network architecture for response selection in an end-to-end multi-turn conversational dialogue setting. The architecture applies context level attention and incorporates additional external knowledge provided by descriptions of domain-specific words. It uses a bi-directional Gated Recurrent Unit (GRU) for encoding context and responses and learns to attend over the context words given the latent response representation and vice versa.In addition, it incorporates external domain specific information using another GRU for encoding the domain keyword descriptions. This allows better representation of domain-specific keywords in responses and hence improves the overall performance. Experimental results show that our model outperforms all other state-of-the-art methods for response selection in multi-turn conversations.

deep learning, dialogue system, neural network, (20 more...)

arXiv.org Artificial Intelligence

1809.03194

Country: Europe > Germany (0.14)

Genre: Research Report > New Finding (0.88)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

On the regularization of Wasserstein GANs

Petzka, Henning, Fischer, Asja, Lukovnicov, Denis

arXiv.org Machine LearningMar-5-2018

Since their invention, generative adversarial networks (GANs) have become a popular approach for learning to model a distribution of real (unlabeled) data. Convergence problems during training are overcome by Wasserstein GANs which minimize the distance between the model and the empirical distribution in terms of a different metric, but thereby introduce a Lipschitz constraint into the optimization problem. A simple way to enforce the Lipschitz constraint on the class of functions, which can be modeled by the neural network, is weight clipping. It was proposed that training can be improved by instead augmenting the loss by a regularization term that penalizes the deviation of the gradient of the critic (as a function of the network's input) from one. We present theoretical arguments why using a weaker regularization term enforcing the Lipschitz constraint is preferable. These arguments are supported by experimental results on toy data sets.

artificial intelligence, neural network, regularization term, (18 more...)

arXiv.org Machine Learning

1709.08894

Country:

Europe > Germany (0.14)
Oceania > Australia (0.14)

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Incorporating Literals into Knowledge Graph Embeddings

Kristiadi, Agustinus, Khan, Mohammad Asif, Lukovnikov, Denis, Lehmann, Jens, Fischer, Asja

arXiv.org Machine LearningFeb-3-2018

Knowledge graphs, on top of entities and their relationships, contain another important element: literals. Literals encode interesting properties (e.g. the height) of entities that are not captured by links between entities alone. Most of the existing work on embedding (or latent feature) based knowledge graph modeling focuses mainly on the relations between entities. In this work, we study the effect of incorporating literal information into existing knowledge graph models. Our approach, which we name LiteralE, is an extension that can be plugged into existing latent feature methods. LiteralE merges entity embeddings with their literal information using a learnable, parametrized function, such as a simple linear or nonlinear transformation, or a multilayer neural network. We extend several popular embedding models using LiteralE and evaluate the performance on the task of link prediction. Despite its simplicity, LiteralE proves to be an effective way to incorporate literal information into existing embedding based models, improving their performance on different standard datasets, which we augmented with their literals and provide as testbed for further research.

artificial intelligence, literale, neural network, (16 more...)

arXiv.org Machine Learning

1802.00934

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Semantic Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

Add feedback

Three Factors Influencing Minima in SGD

Jastrzębski, Stanisław, Kenton, Zachary, Arpit, Devansh, Ballas, Nicolas, Fischer, Asja, Bengio, Yoshua, Storkey, Amos

arXiv.org Machine LearningNov-13-2017

We study the properties of the endpoint of stochastic gradient descent (SGD). By approximating SGD as a stochastic differential equation (SDE) we consider the Boltzmann-Gibbs equilibrium distribution of that SDE under the assumption of isotropic variance in loss gradients. Through this analysis, we find that three factors - learning rate, batch size and the variance of the loss gradients - control the trade-off between the depth and width of the minima found by SGD, with wider minima favoured by a higher ratio of learning rate to batch size. We have direct control over the learning rate and batch size, while the variance is determined by the choice of model architecture, model parameterization and dataset. In the equilibrium distribution only the ratio of learning rate to batch size appears, implying that the equilibrium distribution is invariant under a simultaneous rescaling of learning rate and batch size by the same amount. We then explore experimentally how learning rate and batch size affect SGD from two perspectives: the endpoint of SGD and the dynamics that lead up to it. For the endpoint, the experiments suggest the endpoint of SGD is invariant under simultaneous rescaling of batch size and learning rate, and also that a higher ratio leads to flatter minima, both findings are consistent with our theoretical analysis. We note experimentally that the dynamics also seem to be invariant under the same rescaling of learning rate and batch size, which we explore showing that one can exchange batch size and learning rate for cyclical learning rate schedule. Next, we illustrate how noise affects memorization, showing that high noise levels lead to better generalization. Finally, we find experimentally that the invariance under simultaneous rescaling of learning rate and batch size breaks down if the learning rate gets too large or the batch size gets too small.

deep learning, learning rate, neural network, (21 more...)

arXiv.org Machine Learning

1711.04623

Country:

Oceania > Australia (0.14)
Europe (0.14)
North America (0.14)

Genre: Research Report (0.84)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.55)

Add feedback

A Closer Look at Memorization in Deep Networks

Arpit, Devansh, Jastrzębski, Stanisław, Ballas, Nicolas, Krueger, David, Bengio, Emmanuel, Kanwal, Maxinder S., Maharaj, Tegan, Fischer, Asja, Courville, Aaron, Bengio, Yoshua, Lacoste-Julien, Simon

arXiv.org Machine LearningJul-1-2017

We examine the role of memorization in deep learning, drawing connections to capacity, generalization, and adversarial robustness. While deep networks are capable of memorizing noise data, our results suggest that they tend to prioritize learning simple patterns first. In our experiments, we expose qualitative differences in gradient-based optimization of deep neural networks (DNNs) on noise vs. real data. We also demonstrate that for appropriately tuned explicit regularization (e.g., dropout) we can degrade DNN training performance on noise datasets without compromising generalization on real data. Our analysis suggests that the notions of effective capacity which are dataset independent are unlikely to explain the generalization performance of deep networks when trained with gradient based methods because training data itself plays an important role in determining the degree of memorization.

deep learning, neural network, real data, (17 more...)

arXiv.org Machine Learning

1706.05394

Country:

North America > United States (0.46)
North America > Canada > Quebec > Montreal (0.28)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Rote Learning (0.86)

Add feedback

Population-Contrastive-Divergence: Does Consistency help with RBM training?

Krause, Oswin, Fischer, Asja, Igel, Christian

arXiv.org Machine LearningJun-28-2017

Estimating the log-likelihood gradient with respect to the parameters of a Restricted Boltzmann Machine (RBM) typically requires sampling using Markov Chain Monte Carlo (MCMC) techniques. To save computation time, the Markov chains are only run for a small number of steps, which leads to a biased estimate. This bias can cause RBM training algorithms such as Contrastive Divergence (CD) learning to deteriorate. We adopt the idea behind Population Monte Carlo (PMC) methods to devise a new RBM training algorithm termed Population-Contrastive-Divergence (pop-CD). Compared to CD, it leads to a consistent estimate and may have a significantly lower bias. Its computational overhead is negligible compared to CD. However, the variance of the gradient estimate increases. We experimentally show that pop-CD can significantly outperform CD. In many cases, we observed a smaller bias and achieved higher log-likelihood values. However, when the RBM distribution has many hidden neurons, the consistent estimate of pop-CD may still have a considerable bias and the variance of the gradient estimate requires a smaller learning rate. Thus, despite its superior theoretical properties, it is not advisable to use pop-CD in its current form on large problems.

artificial intelligence, experiment, neural network, (18 more...)

arXiv.org Machine Learning

1510.01624

Country:

Europe (0.28)
North America > Canada > Quebec > Montreal (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.91)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.89)

Add feedback