AITopics | Schwing, Alexander

Collaborating Authors

Schwing, Alexander

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

GridToPix: Training Embodied Agents with Minimal Supervision

Jain, Unnat, Liu, Iou-Jen, Lazebnik, Svetlana, Kembhavi, Aniruddha, Weihs, Luca, Schwing, Alexander

arXiv.org Artificial IntelligenceApr-14-2021

While deep reinforcement learning (RL) promises freedom from hand-labeled data, great successes, especially for Embodied AI, require significant work to create supervision via carefully shaped rewards. Indeed, without shaped rewards, i.e., with only terminal rewards, present-day Embodied AI results degrade significantly across Embodied AI problems from single-agent Habitat-based PointGoal Navigation (SPL drops from 55 to 0) and two-agent AI2-THOR-based Furniture Moving (success drops from 58% to 1%) to three-agent Google Football-based 3 vs. 1 with Keeper (game score drops from 0.6 to 0.1). As training from shaped rewards doesn't scale to more realistic tasks, the community needs to improve the success of training with terminal rewards. For this we propose GridToPix: 1) train agents with terminal rewards in gridworlds that generically mirror Embodied AI environments, i.e., they are independent of the task; 2) distill the learned policy into agents that reside in complex visual worlds. Despite learning from only terminal rewards with identical models and RL algorithms, GridToPix significantly improves results across tasks: from PointGoal Navigation (SPL improves from 0 to 64) and Furniture Moving (success improves from 1% to 25%) to football gameplay (game score improves from 0.1 to 0.6). GridToPix even helps to improve the results of shaped reward training.

agent, deep learning, neural network, (19 more...)

arXiv.org Artificial Intelligence

2105.00931

Genre: Research Report (0.82)

Industry: Leisure & Entertainment > Games (0.87)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

NCP-VAE: Variational Autoencoders with Noise Contrastive Priors

Aneja, Jyoti, Schwing, Alexander, Kautz, Jan, Vahdat, Arash

arXiv.org Machine LearningOct-6-2020

Variational autoencoders (VAEs) are one of the powerful likelihood-based generative models with applications in various domains. However, they struggle to generate high-quality images, especially when samples are obtained from the prior without any tempering. One explanation for VAEs' poor generative quality is the prior hole problem: the prior distribution fails to match the aggregate approximate posterior. Due to this mismatch, there exist areas in the latent space with high density under the prior that do not correspond to any encoded image. Samples from those areas are decoded to corrupted images. To tackle this issue, we propose an energy-based prior defined by the product of a base prior distribution and a reweighting factor, designed to bring the base closer to the aggregate posterior. We train the reweighting factor by noise contrastive estimation, and we generalize it to hierarchical VAEs with many latent variable groups. Our experiments confirm that the proposed noise contrastive priors improve the generative performance of state-of-the-art VAEs by a large margin on the MNIST, CIFAR-10, CelebA 64, and CelebA HQ 256 datasets.

deep learning, international conference, neural network, (15 more...)

arXiv.org Machine Learning

2010.02917

Country: North America > United States > Illinois (0.14)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Bridging the Imitation Gap by Adaptive Insubordination

Weihs, Luca, Jain, Unnat, Salvador, Jordi, Lazebnik, Svetlana, Kembhavi, Aniruddha, Schwing, Alexander

arXiv.org Artificial IntelligenceJul-23-2020

Why do agents often obtain better reinforcement learning policies when imitating a worse expert? We show that privileged information used by the expert is marginalized in the learned agent policy, resulting in an "imitation gap." Prior work bridges this gap via a progression from imitation learning to reinforcement learning. While often successful, gradual progression fails for tasks that require frequent switches between exploration and memorization skills. To better address these tasks and alleviate the imitation gap we propose 'Adaptive Insubordination' (ADVISOR), which dynamically reweights imitation and reward-based reinforcement learning losses during training, enabling switching between imitation and exploration. On a suite of challenging tasks, we show that ADVISOR outperforms pure imitation, pure reinforcement learning, as well as sequential combinations of these approaches.

deep learning, exp, neural network, (19 more...)

arXiv.org Artificial Intelligence

2007.12173

Country: North America > United States (0.14)

Genre: Research Report > New Finding (0.46)

Industry: Leisure & Entertainment (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

A Cordial Sync: Going Beyond Marginal Policies for Multi-Agent Embodied Tasks

Jain, Unnat, Weihs, Luca, Kolve, Eric, Farhadi, Ali, Lazebnik, Svetlana, Kembhavi, Aniruddha, Schwing, Alexander

arXiv.org Artificial IntelligenceJul-9-2020

Autonomous agents must learn to collaborate. It is not scalable to develop a new centralized agent every time a task's difficulty outpaces a single agent's abilities. While multi-agent collaboration research has flourished in gridworld-like environments, relatively little work has considered visually rich domains. Addressing this, we introduce the novel task FurnMove in which agents work together to move a piece of furniture through a living room to a goal. Unlike existing tasks, FurnMove requires agents to coordinate at every timestep. We identify two challenges when training agents to complete FurnMove: existing decentralized action sampling procedures do not permit expressive joint action policies and, in tasks requiring close coordination, the number of failed actions dominates successful actions. To confront these challenges we introduce SYNC-policies (synchronize your actions coherently) and CORDIAL (coordination loss). Using SYNC-policies and CORDIAL, our agents achieve a 58% completion rate on FurnMove, an impressive absolute gain of 25 percentage points over competitive decentralized baselines. Our dataset, code, and pretrained models are available at https://unnat.github.io/cordial-sync .

agent, artificial intelligence, survey article, (16 more...)

arXiv.org Artificial Intelligence

2007.04979

Country:

North America > United States > Illinois (0.14)
Europe > Austria > Vienna (0.14)

Genre: Research Report (1.00)

Industry: Leisure & Entertainment > Games (0.92)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.46)

Add feedback

Pipe-SGD: A Decentralized Pipelined SGD Framework for Distributed Deep Net Training

Li, Youjie, Yu, Mingchao, Li, Songze, Avestimehr, Salman, Kim, Nam Sung, Schwing, Alexander

Neural Information Processing SystemsFeb-15-2020, 19:42:46 GMT

Distributed training of deep nets is an important technique to address some of the present day computing challenges like memory consumption and computational demands. Classical distributed approaches, synchronous or asynchronous, are based on the parameter server architecture, i.e., worker nodes compute gradients which are communicated to the parameter server while updated parameters are returned. Recently, distributed training with AllReduce operations gained popularity as well. While many of those operations seem appealing, little is reported about wall-clock training time improvements. In this paper, we carefully analyze the AllReduce based setup, propose timing models which include network latency, bandwidth, cluster size and compute time, and demonstrate that a pipelined training with a width of two combines the best of both synchronous and asynchronous training.

artificial intelligence, decentralized pipelined sgd framework, machine learning, (3 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.51)

Add feedback

Asynchronous Parallel Coordinate Minimization for MAP Inference

Meshi, Ofer, Schwing, Alexander

Neural Information Processing SystemsFeb-14-2020, 17:56:32 GMT

Finding the maximum a-posteriori (MAP) assignment is a central task in graphical models. Since modern applications give rise to very large problem instances, there is increasing need for efficient solvers. In this work we propose to improve the efficiency of coordinate-minimization-based dual-decomposition solvers by running their updates asynchronously in parallel. In this case message-passing inference is performed by multiple processing units simultaneously without coordination, all reading and writing to shared memory. We analyze the convergence properties of the resulting algorithms and identify settings where speedup gains can be expected.

artificial intelligence, asynchronous parallel coordinate minimization, map inference, (1 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Vision (0.50)

Add feedback

Interpretable and Globally Optimal Prediction for Textual Grounding using Image Concepts

Yeh, Raymond, Xiong, Jinjun, Hwu, Wen-Mei, Do, Minh, Schwing, Alexander

Neural Information Processing SystemsFeb-14-2020, 09:13:35 GMT

Textual grounding is an important but challenging task for human-computer inter- action, robotics and knowledge mining. Existing algorithms generally formulate the task as selection from a set of bounding box proposals obtained from deep net based systems. In this work, we demonstrate that we can cast the problem of textual grounding into a unified framework that permits efficient search over all possible bounding boxes. Hence, the method is able to consider significantly more proposals and doesn't rely on a successful first stage hypothesizing bounding box proposals. Beyond, we demonstrate that the trained parameters of our model can be used as word-embeddings which capture spatial-image relationships and provide interpretability.

artificial intelligence, interpretable and globally optimal prediction, textual, (2 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence (0.52)

Add feedback

Sequential Latent Spaces for Modeling the Intention During Diverse Image Captioning

Aneja, Jyoti, Agrawal, Harsh, Batra, Dhruv, Schwing, Alexander

arXiv.org Machine LearningAug-22-2019

Diverse and accurate vision+language modeling is an important goal to retain creative freedom and maintain user engagement. However, adequately capturing the intricacies of diversity in language models is challenging. Recent works commonly resort to latent variable models augmented with more or less supervision from object detectors or part-of-speech tags. Common to all those methods is the fact that the latent variable either only initializes the sentence generation process or is identical across the steps of generation. Both methods offer no fine-grained control. To address this concern, we propose Seq-CVAE which learns a latent space for every word position. We encourage this temporal latent space to capture the 'intention' about how to complete the sentence by mimicking a representation which summarizes the future. We illustrate the efficacy of the proposed approach to anticipate the sentence continuation on the challenging MSCOCO dataset, significantly improving diversity metrics compared to baselines while performing on par w.r.t sentence quality.

caption, deep learning, neural network, (20 more...)

arXiv.org Machine Learning

1908.08529

Country: North America > United States > Illinois (0.14)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.77)

Add feedback

Factor Graph Attention

Schwartz, Idan, Yu, Seunghak, Hazan, Tamir, Schwing, Alexander

arXiv.org Artificial IntelligenceApr-11-2019

Dialog is an effective way to exchange information, but subtle details and nuances are extremely important. While significant progress has paved a path to address visual dialog with algorithms, details and nuances remain a challenge. Attention mechanisms have demonstrated compelling results to extract details in visual question answering and also provide a convincing framework for visual dialog due to their interpretability and effectiveness. However, the many data utilities that accompany visual dialog challenge existing attention techniques. We address this issue and develop a general attention mechanism for visual dialog which operates on any number of data utilities. To this end, we design a factor graph based attention mechanism which combines any number of utility representations. We illustrate the applicability of the proposed approach on the challenging and recently introduced VisDial datasets, outperforming recent state-of-the-art methods by 1.1% for VisDial0.9 and by 2% for VisDial1.0 on MRR. Our ensemble model improved the MRR score on VisDial1.0 by more than 6%.

deep learning, neural network, question answering, (19 more...)

arXiv.org Artificial Intelligence

1904.0588

Country: North America > United States (0.28)

Genre: Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Vision (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

Add feedback

Max-Sliced Wasserstein Distance and its use for GANs

Deshpande, Ishan, Hu, Yuan-Ting, Sun, Ruoyu, Pyrros, Ayis, Siddiqui, Nasir, Koyejo, Sanmi, Zhao, Zhizhen, Forsyth, David, Schwing, Alexander

arXiv.org Machine LearningApr-11-2019

Generative adversarial nets (GANs) and variational auto-encoders have significantly improved our distribution modeling capabilities, showing promise for dataset augmentation, image-to-image translation and feature learning. However, to model high-dimensional distributions, sequential training and stacked architectures are common, increasing the number of tunable hyper-parameters as well as the training time. Nonetheless, the sample complexity of the distance metrics remains one of the factors affecting GAN training. We first show that the recently proposed sliced Wasserstein distance has compelling sample complexity properties when compared to the Wasserstein distance. To further improve the sliced Wasserstein distance we then analyze its `projection complexity' and develop the max-sliced Wasserstein distance which enjoys compelling sample complexity while reducing projection complexity, albeit necessitating a max estimation. We finally illustrate that the proposed distance trains GANs on high-dimensional images up to a resolution of 256x256 easily.

artificial intelligence, neural network, wasserstein distance, (18 more...)

arXiv.org Machine Learning

1904.05877

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback