AITopics

Disentanglement with Hyperspherical Latent Spaces using Diffusion Variational Autoencoders

Rey, Luis A. Pérez

A disentangled representation of a data set should be capable of recovering the underlying factors that generated it. One question that arises is whether using Euclidean space for latent variable models can produce a disentangled representation when the underlying generating factors have a certain geometrical structure. Take for example the images of a car seen from different angles. The angle has a periodic structure, but a 1-dimensional representation would fail to capture this topology. How can we address this problem? The submissions presented for the first stage of the NeurIPS 2019 Disentanglement Challenge consist of a Diffusion Variational Autoencoder ($\Delta$VAE) with a hyperspherical latent space, which can, for example, recover periodic true factors. The training of the $\Delta$VAE is enhanced by incorporating a modified version of the Evidence Lower Bound (ELBO) for tailoring the encoding capacity of the approximate posterior.
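The topological point about the car angle can be illustrated with a small sketch. It is not the $\Delta$VAE itself; it only assumes a hypothetical helper, `circle_embed`, that places an angle on the unit circle, and compares distances for two nearly identical viewing angles (1 and 359 degrees).

```python
import math

def circle_embed(theta_deg):
    """Map an angle in degrees to a point on the unit circle (illustrative helper)."""
    t = math.radians(theta_deg)
    return (math.cos(t), math.sin(t))

def euclid(a, b):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

a, b = 1.0, 359.0  # two views that differ by only 2 degrees

# On a line, the two angles look almost maximally far apart.
line_dist = abs(a - b)

# On the circle, they are neighbours: the chord spans just 2 degrees.
circle_dist = euclid(circle_embed(a), circle_embed(b))

print(line_dist, circle_dist)
```

The 1-dimensional code separates the two views by 358 units, while the circular code keeps them close, which is the kind of structure a hyperspherical latent space is meant to preserve.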

artificial intelligence, machine learning, variational autoencoder, (15 more...)

2003.08996

Country:

Europe > Netherlands > North Brabant > Eindhoven (0.05)
Europe > Netherlands > North Holland > Amsterdam (0.05)

Genre: Research Report (0.43)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Journal of Artificial Intelligence Research, Mar-19-2020

Agreement on Target-Bidirectional Recurrent Neural Networks for Sequence-to-Sequence Learning

Liu, Lemao, Finch, Andrew, Utiyama, Masao, Sumita, Eiichiro

Recurrent neural networks are extremely appealing for sequence-to-sequence learning tasks. Despite their great success, they typically suffer from a shortcoming: they are prone to generate unbalanced targets with good prefixes but bad suffixes, and thus performance suffers when dealing with long sequences. We propose a simple yet effective approach to overcome this shortcoming. Our approach relies on the agreement between a pair of target-directional RNNs, which generates more balanced targets. In addition, we develop two efficient approximate search methods for agreement that are empirically shown to be almost optimal in terms of either sequence level or non-sequence level metrics. Extensive experiments were performed on three standard sequence-to-sequence transduction tasks: machine transliteration, grapheme-to-phoneme transformation and machine translation. The results show that the proposed approach achieves consistent and substantial improvements, compared to many state-of-the-art systems.

machine translation, proceedings, sequence, (13 more...)

doi: 10.1613/jair.1.12008

AI Access Foundation

12008

Country:

Oceania > Australia > New South Wales > Sydney (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > Michigan > Washtenaw County > Ann Arbor (0.04)
(6 more...)

Genre: Research Report > New Finding (0.88)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Castro, Margarita Paz (University of Toronto) | Piacentini, Chiara (Autodesk Research) | Cire, Andre Augusto (Dept. of Management, University of Toronto Scarborough and Rotman School of Management) | Beck, J. Christopher (Department of Mechanical and Industrial Engineering, University of Toronto)

Solving Delete Free Planning with Relaxed Decision Diagram Based Heuristics

Journal of Artificial Intelligence Research, Mar-19-2020

We investigate the use of relaxed decision diagrams (DDs) for computing admissible heuristics for the cost-optimal delete-free planning (DFP) problem. Our main contributions are the introduction of two novel DD encodings for a DFP task: a multivalued decision diagram that includes the sequencing aspect of the problem and a binary decision diagram representation of its sequential relaxation. We present construction algorithms for each DD that leverage these different perspectives of the DFP task and provide theoretical and empirical analyses of the associated heuristics. We further show that relaxed DDs can be used beyond heuristic computation to extract delete-free plans, find action landmarks, and identify redundant actions. Our empirical analysis shows that while DD-based heuristics trail the state of the art, even small relaxed DDs are competitive with the linear programming heuristic for the DFP task, thus revealing novel ways of designing admissible heuristics.

cost-optimal plan, procedure, proposition, (14 more...)

doi: 10.1613/jair.1.11659

AI Access Foundation

11659

Country:

North America > Canada > Ontario > Toronto (0.14)
Europe > France (0.04)
South America > Chile (0.04)

Genre: Research Report (0.92)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.93)

Kulkarni, Viraj, Kulkarni, Milind, Pant, Aniruddha

Survey of Personalization Techniques for Federated Learning

Federated learning enables machine learning models to learn from private decentralized data without compromising privacy. The standard formulation of federated learning produces one shared model for all clients. Statistical heterogeneity due to non-IID distribution of data across devices often leads to scenarios where, for some clients, the local models trained solely on their private data perform better than the global shared model, thus taking away their incentive to participate in the process. Several techniques have been proposed to personalize global models to work better for individual clients. This paper highlights the need for personalization and surveys recent research on this topic.

global model, learning, personalization, (14 more...)

2003.08673

Country: North America > Aruba (0.04)

Genre: Research Report (0.70)

Industry: Information Technology > Security & Privacy (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.70)

Lyu, Xiong, Ludkovski, Mike

Adaptive Batching for Gaussian Process Surrogates with Application in Noisy Level Set Estimation

Metamodels offer a cheap statistical representation of complex and/or expensive stochastic simulators that arise in applications ranging from engineering to environmental science and finance [Santner et al., 2013]. Gaussian process (GP) frameworks have emerged as the leading family of metamodels thanks to their flexibility, analytical tractability and superior empirical performance. However, for GP metamodels to be fast, it is imperative to keep the respective design size $A$ manageable. In particular, unless the simulator is truly expensive or the input domain is vast, the typical recommendation is to restrict to hundreds of inputs, $A < 10^3$. This creates a major tension as frequently the stochastic simulator has low signal-to-noise ratio or a complex noise structure. A prototypical example is where the simulator $Y(x) = F(X_{[0,t]}) \mid X_0 = x$ involves functionals of a continuous-time Markov chain or stochastic differential equation solution $(X_t)$, whereby the stochasticity tends to dominate the trend/drift term for short $t$, and moreover simulation noise is non-Gaussian and state-dependent (heteroskedastic). Both authors are partially supported by NSF DMS-1521743.

experiment, simulation, tgp, (16 more...)

2003.08579

Country:

North America > United States > California > Santa Barbara County > Santa Barbara (0.04)
North America > United States > Florida > Broward County > Fort Lauderdale (0.04)
Europe > France > Occitanie > Haute-Garonne > Toulouse (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.34)

Causal Interpretability for Machine Learning -- Problems, Methods and Evaluation

Moraffah, Raha, Karami, Mansooreh, Guo, Ruocheng, Raglin, Adrienne, Liu, Huan

Machine learning models have had discernible achievements in a myriad of applications. However, most of these models are black boxes, and it is unclear how they make their decisions. This makes the models unreliable and untrustworthy. To provide insights into the decision-making processes of these models, a variety of traditional interpretable models have been proposed. Moreover, to generate more human-friendly explanations, recent work on interpretability tries to answer questions related to causality such as "Why does this model make such decisions?" or "Was it a specific feature that caused the decision made by the model?". In this work, models that aim to answer causal questions are referred to as causal interpretable models. The existing surveys have covered concepts and methodologies of traditional interpretability. In this work, we present a comprehensive survey on causal interpretable models from the aspects of the problems and methods. In addition, this survey provides in-depth insights into the existing evaluation metrics for measuring interpretability, which can help practitioners understand for what scenarios each evaluation metric is suitable.

counterfactual explanation, explanation, interpretability, (14 more...)

2003.03934

Country:

North America > United States > New York > New York County > New York City (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)
(5 more...)

Genre:

Research Report (1.00)
Overview (1.00)

Industry:

Health & Medicine (0.93)
Information Technology (0.67)
Education > Focused Education > Special Education (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Odense, Simon, Garcez, Artur d'Avila

Layerwise Knowledge Extraction from Deep Convolutional Networks

arXiv.org Artificial Intelligence, Mar-19-2020

Knowledge extraction is used to convert neural networks into symbolic descriptions with the objective of producing more comprehensible learning models. The central challenge is to find an explanation which is more comprehensible than the original model while still representing that model faithfully. The distributed nature of deep networks has led many to believe that the hidden features of a neural network cannot be explained by logical descriptions simple enough to be comprehensible. In this paper, we propose a novel layerwise knowledge extraction method using M-of-N rules which seeks to obtain the best trade-off between the complexity and accuracy of rules describing the hidden features of a deep network. We show empirically that this approach produces rules close to an optimal complexity-error tradeoff. We apply this method to a variety of deep networks and find that in the internal layers we often cannot find rules with a satisfactory complexity and accuracy, suggesting that rule extraction as a general-purpose method for explaining the internal logic of a neural network may be impossible. However, we also find that the softmax layer in Convolutional Neural Networks and Autoencoders using either tanh or relu activation functions is highly explainable by rule extraction, with compact rules consisting of as few as 3 units out of 128 often reaching over 99% accuracy. This shows that rule extraction can be a useful component for explaining parts (or modules) of a deep neural network.
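For readers unfamiliar with the rule format, an M-of-N rule holds when at least M of N listed literals are satisfied. The sketch below shows only that definition, not the paper's extraction procedure; the helper `m_of_n`, the unit indices, and the assumption that hidden units have already been binarized are all illustrative.

```python
def m_of_n(m, literals):
    """Build a rule that holds when at least m of the given literals are satisfied.

    Each literal is a pair (unit_index, expected_value) over binarized units.
    """
    def rule(units):
        # Count how many literals match the observed unit values.
        hits = sum(1 for idx, want in literals if units[idx] == want)
        return hits >= m
    return rule

# Hypothetical 2-of-3 rule over units 0, 3 and 5 of a binarized layer.
rule = m_of_n(2, [(0, True), (3, True), (5, False)])

active = rule([True, False, False, True, False, True])     # units 0 and 3 match
inactive = rule([False, False, False, True, False, True])  # only unit 3 matches
print(active, inactive)
```

A rule such as this stays readable because its complexity is bounded by N, which is the quantity the abstract trades off against accuracy.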

complexity, extraction, rule extraction, (17 more...)

2003.09

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > Wisconsin > Dane County > Madison (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

arXiv.org Artificial Intelligence, Mar-19-2020

VisuoSpatial Foresight for Multi-Step, Multi-Task Fabric Manipulation

Hoque, Ryan, Seita, Daniel, Balakrishna, Ashwin, Ganapathi, Aditya, Tanwani, Ajay Kumar, Jamali, Nawid, Yamane, Katsu, Iba, Soshi, Goldberg, Ken

Robotic fabric manipulation has applications in cloth and cable management, senior care, surgery and more. Existing fabric manipulation techniques, however, are designed for specific tasks, making it difficult to generalize across different but related tasks. We address this problem by extending the recently proposed Visual Foresight framework to learn fabric dynamics, which can be efficiently reused to accomplish a variety of different fabric manipulation tasks with a single goal-conditioned policy. We introduce VisuoSpatial Foresight (VSF), which extends prior work by learning visual dynamics on domain randomized RGB images and depth maps simultaneously and completely in simulation. We experimentally evaluate VSF on multi-step fabric smoothing and folding tasks both in simulation and on the da Vinci Research Kit (dVRK) surgical robot without any demonstrations at train or test time. Furthermore, we find that leveraging depth significantly improves performance for cloth manipulation tasks, and results suggest that leveraging RGBD data for video prediction and planning yields an 80% improvement in fabric folding success rate over pure RGB data. Supplementary material is available at https://sites.google.com/view/fabric-vsf/.

fabric, international conference, learning, (15 more...)

2003.09044

Country:

Europe > Spain > Aragón (0.04)
North America > United States > California > Alameda County > Berkeley (0.04)
Europe > Netherlands > North Holland > Amsterdam (0.04)
Asia (0.04)

Genre: Research Report (0.84)

Industry:

Health & Medicine (0.48)
Energy (0.47)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Efficient Deep Representation Learning by Adaptive Latent Space Sampling

Mo, Yuanhan, Wang, Shuo, Dai, Chengliang, Zhou, Rui, Bai, Wenjia, Guo, Yike

Supervised deep learning requires a large number of annotated training samples (e.g. class labels for classification tasks, pixel- or voxel-wise label maps for segmentation tasks), which are expensive and time-consuming to obtain. During the training of a deep neural network, the annotated samples are fed into the network in mini-batches, where they are often regarded as equally important. However, some of the samples may become less informative during training, as the magnitude of the gradient starts to vanish for these samples. Meanwhile, other samples of higher utility or hardness may be in greater demand for the training process to proceed and require more exploitation. To address the challenges of expensive annotations and loss of sample informativeness, here we propose a novel training framework which adaptively selects informative samples that are fed to the training process. The adaptive selection or sampling is performed based on a hardness-aware strategy in the latent space constructed by a generative model. To evaluate the proposed training framework, we perform experiments on three different datasets, including MNIST and CIFAR-10 for the image classification task and a medical image dataset, IVUS, for a biophysical simulation task. On all three datasets, the proposed framework outperforms a random sampling method, which demonstrates the effectiveness of the proposed framework.

batch normalization, relu activation, training sample, (14 more...)

2004.02757

Country:

North America > United States > Wisconsin > Dane County > Madison (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.40)

Industry:

Health & Medicine > Diagnostic Medicine > Imaging (0.67)
Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.87)