 Schwing, Alexander


Two Body Problem: Collaborative Visual Task Completion

arXiv.org Artificial Intelligence

Collaboration is a necessary skill to perform tasks that are beyond one agent's capabilities. Addressed extensively in both conventional and modern AI, multi-agent collaboration has often been studied in the context of simple grid worlds. We argue that there are inherently visual aspects to collaboration which should be studied in visually rich environments. A key element in collaboration is communication that can be either explicit, through messages, or implicit, through perception of the other agents and the visual world. Learning to collaborate in a visual environment entails learning (1) to perform the task, (2) when and what to communicate, and (3) how to act based on these communications and the perception of the visual world. In this paper we study the problem of learning to collaborate directly from pixels in AI2-THOR and demonstrate the benefits of explicit and implicit modes of communication to perform visual tasks. Refer to our project page for more details: https://prior.allenai.org/projects/two-body-problem
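
The abstract lays out what each collaborating agent has to learn: act from pixels, decide what to say, and condition on what it hears. A minimal sketch of one such agent, assuming PyTorch; the architecture and all names (TalkingAgent, msg_dim, the CNN sizes) are illustrative assumptions, not the paper's model:

```python
import torch
import torch.nn as nn

class TalkingAgent(nn.Module):
    """One agent: encodes its egocentric frame, reads the other
    agent's previous message, and emits an action plus a new message."""
    def __init__(self, n_actions=6, msg_dim=16, hid=128):
        super().__init__()
        self.vision = nn.Sequential(                # tiny CNN over raw pixels
            nn.Conv2d(3, 16, 5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.core = nn.GRUCell(32 + msg_dim, hid)   # fuse vision + incoming message
        self.policy = nn.Linear(hid, n_actions)     # what to do
        self.speaker = nn.Linear(hid, msg_dim)      # what to say next

    def forward(self, frame, incoming_msg, h):
        z = self.vision(frame)
        h = self.core(torch.cat([z, incoming_msg], dim=-1), h)
        return self.policy(h), torch.tanh(self.speaker(h)), h

# One synchronous step for two agents exchanging explicit messages.
a, b = TalkingAgent(), TalkingAgent()
ha = hb = torch.zeros(1, 128)
msg_a = msg_b = torch.zeros(1, 16)
frame_a, frame_b = torch.rand(1, 3, 84, 84), torch.rand(1, 3, 84, 84)
logits_a, msg_a, ha = a(frame_a, msg_b, ha)   # A conditions on B's last message
logits_b, msg_b, hb = b(frame_b, msg_a, hb)
```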


A Simple Baseline for Audio-Visual Scene-Aware Dialog

arXiv.org Artificial Intelligence

The recently proposed audio-visual scene-aware dialog task paves the way to a more data-driven way of learning virtual assistants, smart speakers and car navigation systems. However, very little is known to date about how to effectively extract meaningful information from a plethora of sensors that pound the computational engine of those devices. Therefore, in this paper, we provide and carefully analyze a simple baseline for audio-visual scene-aware dialog which is trained end-to-end. Our method differentiates in a data-driven manner useful signals from distracting ones using an attention mechanism. We evaluate the proposed approach on the recently introduced and challenging audio-visual scene-aware dataset, and demonstrate the key features that permit it to outperform the current state of the art by more than 20% on CIDEr.
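
The central mechanism is an attention that weights useful sensor streams against distracting ones. A minimal sketch, assuming PyTorch; the module name ModalityAttention, the bilinear scoring, and all dimensions are hypothetical, not the paper's implementation:

```python
import torch
import torch.nn as nn

class ModalityAttention(nn.Module):
    """Question-conditioned attention over per-modality features:
    the question embedding scores each modality, a softmax turns the
    scores into weights, and the fused vector is the weighted sum."""
    def __init__(self, dim=256):
        super().__init__()
        self.score = nn.Bilinear(dim, dim, 1)

    def forward(self, question, modalities):
        # question: (B, dim); modalities: (B, M, dim) for M sensor streams
        B, M, D = modalities.shape
        q = question.unsqueeze(1).expand(B, M, D)
        scores = self.score(q.reshape(-1, D), modalities.reshape(-1, D))
        w = torch.softmax(scores.view(B, M), dim=1)
        return (w.unsqueeze(-1) * modalities).sum(dim=1)   # (B, dim)

att = ModalityAttention()
q = torch.rand(2, 256)          # encoded question
feats = torch.rand(2, 3, 256)   # e.g. audio, video, dialog history
fused = att(q, feats)           # distracting streams receive low weight
```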


GradiVeQ: Vector Quantization for Bandwidth-Efficient Gradient Aggregation in Distributed CNN Training

Neural Information Processing Systems

Data parallelism can boost the training speed of convolutional neural networks (CNNs), but can suffer from significant communication costs caused by gradient aggregation. To alleviate this problem, several scalar quantization techniques have been developed to compress the gradients. However, these techniques can perform poorly when used together with decentralized aggregation protocols like ring all-reduce (RAR), mainly due to their inability to directly aggregate compressed gradients. In this paper, we empirically demonstrate the strong linear correlations between CNN gradients, and propose a gradient vector quantization technique, named GradiVeQ, to exploit these correlations through principal component analysis (PCA) for substantial gradient dimension reduction. GradiVeQ enables direct aggregation of compressed gradients, hence allowing us to build a distributed learning system that parallelizes GradiVeQ gradient compression and RAR communications. Extensive experiments on popular CNNs demonstrate that applying GradiVeQ slashes the wall-clock gradient aggregation time of the original RAR by more than 5x without noticeable accuracy loss, and reduces the end-to-end training time by almost 50%. The results also show that GradiVeQ is compatible with scalar quantization techniques such as QSGD (Quantized SGD), and achieves a much higher speed-up gain under the same compression ratio.
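
The property that makes direct aggregation possible is linearity: a PCA projection commutes with summation, so workers can add up compressed vectors during ring all-reduce and decompress once at the end. A small NumPy sketch of this idea; the sizes, the synthetic gradients, and the calibration procedure are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate correlated gradients: each worker's flattened gradient lies
# near a low-dimensional subspace, as GradiVeQ observes empirically.
basis = rng.standard_normal((1000, 8))                  # hidden structure
grads = [basis @ rng.standard_normal(8) + 0.01 * rng.standard_normal(1000)
         for _ in range(4)]                             # 4 workers

# Fit principal components on a calibration batch of past gradients.
calib = np.stack([basis @ rng.standard_normal(8) for _ in range(256)])
_, _, Vt = np.linalg.svd(calib - calib.mean(0), full_matrices=False)
P = Vt[:8]                                              # (8, 1000) projection

# Compress, aggregate in the compressed domain, decompress once.
# The projection is linear, so sum(P @ g_i) == P @ sum(g_i): compressed
# vectors can be ring-all-reduced directly, unlike scalar-quantized ones.
compressed_sum = sum(P @ g for g in grads)              # what RAR would add up
recovered = P.T @ compressed_sum                        # single decompression
exact = sum(grads)
print("relative error:", np.linalg.norm(recovered - exact) / np.linalg.norm(exact))
```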


Out of the Box: Reasoning with Graph Convolution Nets for Factual Visual Question Answering

Neural Information Processing Systems

Accurately answering a question about a given image requires combining observations with general knowledge. While this is effortless for humans, reasoning with general knowledge remains an algorithmic challenge. To advance research in this direction, a novel 'fact-based' visual question answering (FVQA) task has been introduced recently, along with a large set of curated facts which link two entities, i.e., two possible answers, via a relation. Given a question-image pair, deep network techniques have been employed to successively reduce the large set of facts until one of the two entities of the final remaining fact is predicted as the answer. We observe that such a successive process, which considers one fact at a time to form a local decision, is sub-optimal. Instead, we develop an entity graph and use a graph convolutional network to 'reason' about the correct answer by jointly considering all entities. We show on the challenging FVQA dataset that this leads to an improvement in accuracy of around 7% compared to the state of the art.
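
The joint decision rests on graph convolutions over all candidate entities at once rather than one fact at a time. A minimal sketch of one such layer, assuming PyTorch; the adjacency normalization, the scoring head, and all dimensions are illustrative, not taken from the paper:

```python
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    """One graph-convolution step: every entity updates its feature
    from degree-normalized neighbors, so answer prediction can take
    all candidate entities into account jointly."""
    def __init__(self, d_in, d_out):
        super().__init__()
        self.lin = nn.Linear(d_in, d_out)

    def forward(self, X, A):
        A_hat = A + torch.eye(A.size(0))                    # add self-loops
        d = A_hat.sum(dim=1)
        A_norm = A_hat / torch.outer(d.sqrt(), d.sqrt())    # D^-1/2 A_hat D^-1/2
        return torch.relu(self.lin(A_norm @ X))

# Entities are fact endpoints; features would concatenate image,
# question, and fact embeddings (dimensions here are placeholders).
n_entities, d = 12, 64
X = torch.rand(n_entities, d)
A = (torch.rand(n_entities, n_entities) > 0.7).float()     # toy entity graph
A = ((A + A.t()) > 0).float()                               # symmetrize
H = GCNLayer(d, d)(X, A)
scores = nn.Linear(d, 1)(H).squeeze(-1)                     # per-entity answer score
answer = scores.argmax()
```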


Pipe-SGD: A Decentralized Pipelined SGD Framework for Distributed Deep Net Training

Neural Information Processing Systems

Distributed training of deep nets is an important technique to address some of the present-day computing challenges like memory consumption and computational demands. Classical distributed approaches, synchronous or asynchronous, are based on the parameter-server architecture, i.e., worker nodes compute gradients which are communicated to the parameter server while updated parameters are returned. Recently, distributed training with AllReduce operations has gained popularity as well. While many of those operations seem appealing, little is reported about wall-clock training time improvements. In this paper, we carefully analyze the AllReduce-based setup, propose timing models which include network latency, bandwidth, cluster size and compute time, and demonstrate that pipelined training with a width of two combines the best of both synchronous and asynchronous training. Specifically, for a setup consisting of a four-node GPU cluster, we show wall-clock training time improvements of up to 5.4x compared to conventional approaches.
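
A pipeline of width two means step t's gradient computation overlaps with the AllReduce of step t-1's gradient, and each update consumes a one-step-stale result. A single-process sketch of that schedule, assuming PyTorch; a real system would launch torch.distributed.all_reduce with async_op=True, and the toy model and sizes are placeholders:

```python
import torch

# While step t computes its gradient, the gradient of step t-1 is
# (conceptually) in flight through AllReduce; the update applies the
# one-step-stale result. One process stands in for the cluster here.
model = torch.nn.Linear(10, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
in_flight = None                                   # gradient being "reduced"

for step in range(100):
    x, y = torch.randn(32, 10), torch.randn(32, 1)
    loss = torch.nn.functional.mse_loss(model(x), y)
    grads = torch.autograd.grad(loss, model.parameters())

    if in_flight is not None:                      # "reduced" grads arrive
        for p, g in zip(model.parameters(), in_flight):
            p.grad = g                             # one step stale
        opt.step()
        opt.zero_grad()
    in_flight = grads                              # hand off to the "network"
```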


Deep Structured Prediction with Nonlinear Output Transformations

Neural Information Processing Systems

Deep structured models are widely used for tasks like semantic segmentation, where explicit correlations between variables provide important prior information which generally helps to reduce the data needs of deep nets. However, current deep structured models are restricted by an oftentimes very local neighborhood structure, which cannot be enlarged for reasons of computational complexity, and by the fact that the output configuration, or a representation thereof, cannot be transformed further. Very recent approaches which address those issues include graphical model inference inside deep nets so as to permit subsequent non-linear output space transformations. However, optimization of those formulations is challenging and not well understood. Here, we develop a novel model which generalizes existing approaches, such as structured prediction energy networks, and discuss a formulation which maintains applicability of existing inference techniques.
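
The flavor of such models can be seen in a SPEN-style sketch: the energy adds a nonlinear transformation of the relaxed output configuration to the unary scores, and inference is gradient descent on the relaxation. A minimal sketch assuming PyTorch; all names and dimensions are illustrative, not the paper's formulation:

```python
import torch
import torch.nn as nn

n_vars, n_labels = 5, 3
unary = nn.Linear(16, n_vars * n_labels)     # per-variable scores from input
T = nn.Sequential(nn.Linear(n_vars * n_labels, 32), nn.ReLU(),
                  nn.Linear(32, 1))          # nonlinear output transformation

def energy(x, y_soft):
    # Low energy = good configuration: unary agreement plus a learned,
    # nonlinear score of the whole (relaxed) output at once.
    u = unary(x).view(n_vars, n_labels)
    return -(u * y_soft).sum() + T(y_soft.flatten().unsqueeze(0)).sum()

x = torch.rand(16)
logits = torch.zeros(n_vars, n_labels, requires_grad=True)
inf_opt = torch.optim.Adam([logits], lr=0.1)
for _ in range(50):                          # inference = descent on outputs
    y_soft = torch.softmax(logits, dim=1)    # keep relaxation on the simplex
    e = energy(x, y_soft)
    inf_opt.zero_grad(); e.backward(); inf_opt.step()
prediction = logits.argmax(dim=1)            # discretize at the end
```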

