Collaborating Authors

Jianwei Yang



Cross-channel Communication Networks

Neural Information Processing Systems

While a lot of progress has been made by making networks deeper, filters at each layer independently generate responses given the input and do not communicate with each other. In this paper, we introduce a novel network unit called the Cross-channel Communication (C3) block, a simple yet effective module that encourages communication across filters within the same layer. The C3 block enables filters to exchange information through a micro neural network, which consists of a feature encoder, a message passer, and a feature decoder, before sending the information to the next layer. With the C3 block, each channel response is modulated by accounting for the responses at other channels. Extensive experiments on multiple vision tasks show that our proposed block brings improvements for different CNN architectures and learns more diverse and complementary representations.
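
The abstract describes the C3 block as a micro neural network with a feature encoder, a message passer, and a feature decoder that lets channels within a layer modulate each other. The PyTorch sketch below illustrates that general recipe under simplifying assumptions: the class name, layer sizes, and the similarity-weighted message-passing step are illustrative choices, not the paper's exact design.

```python
# Minimal sketch of a cross-channel communication block in the spirit of C3:
# each channel's spatial response is encoded, channels exchange messages, and
# the decoded result modulates the original responses. All details below are
# illustrative assumptions.
import torch
import torch.nn as nn


class ChannelCommunicationBlock(nn.Module):
    def __init__(self, spatial_dim: int, hidden_dim: int = 64):
        super().__init__()
        # Feature encoder: compresses each channel's flattened H*W response.
        self.encoder = nn.Sequential(nn.Linear(spatial_dim, hidden_dim), nn.ReLU())
        # Feature decoder: maps the communicated encoding back to spatial size.
        self.decoder = nn.Linear(hidden_dim, spatial_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        flat = x.view(b, c, h * w)               # one vector per channel
        enc = self.encoder(flat)                 # (b, c, hidden_dim)
        # Message passing: a similarity-weighted average over channels,
        # a lightweight stand-in for the paper's message passer.
        attn = torch.softmax(enc @ enc.transpose(1, 2), dim=-1)  # (b, c, c)
        msg = attn @ enc                         # each channel aggregates the others
        out = self.decoder(msg).view(b, c, h, w)
        return x + out                           # modulate the original responses


if __name__ == "__main__":
    feat = torch.randn(2, 32, 8, 8)
    block = ChannelCommunicationBlock(spatial_dim=8 * 8)
    print(block(feat).shape)  # torch.Size([2, 32, 8, 8])
```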


Hierarchical Question-Image Co-Attention for Visual Question Answering

Neural Information Processing Systems

A number of recent works have proposed attention models for Visual Question Answering (VQA) that generate spatial maps highlighting image regions relevant to answering the question. In this paper, we argue that in addition to modeling "where to look" or visual attention, it is equally important to model "what words to listen to" or question attention. We present a novel co-attention model for VQA that jointly reasons about image and question attention.
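
As a rough illustration of the co-attention idea (attending to image regions and question words jointly), the sketch below computes an affinity matrix between the two modalities and uses it to produce an attention-weighted summary of each. The bilinear form, the max-pooling over the affinity matrix, and all dimensions are illustrative assumptions rather than the paper's hierarchical formulation.

```python
# Minimal sketch of parallel co-attention between question words and image
# regions: an affinity matrix links the two modalities, and each side is
# attended using the other. Shapes and the bilinear form are assumptions.
import torch
import torch.nn as nn


class ParallelCoAttention(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.bilinear = nn.Parameter(torch.randn(dim, dim) * 0.01)

    def forward(self, image: torch.Tensor, question: torch.Tensor):
        # image: (b, n_regions, dim), question: (b, n_words, dim)
        affinity = torch.tanh(question @ self.bilinear @ image.transpose(1, 2))
        # Attend to image regions given words, and to words given regions.
        img_attn = torch.softmax(affinity.max(dim=1).values, dim=-1)    # (b, n_regions)
        ques_attn = torch.softmax(affinity.max(dim=2).values, dim=-1)   # (b, n_words)
        attended_image = (img_attn.unsqueeze(-1) * image).sum(dim=1)        # (b, dim)
        attended_question = (ques_attn.unsqueeze(-1) * question).sum(dim=1)  # (b, dim)
        return attended_image, attended_question


if __name__ == "__main__":
    v = torch.randn(2, 36, 512)   # image region features
    q = torch.randn(2, 14, 512)   # question word features
    att_v, att_q = ParallelCoAttention(512)(v, q)
    print(att_v.shape, att_q.shape)  # torch.Size([2, 512]) torch.Size([2, 512])
```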


Best of Both Worlds: Transferring Knowledge from Discriminative Learning to a Generative Visual Dialog Model

Neural Information Processing Systems

We present a novel training framework for neural sequence models, particularly for grounded dialog generation. The standard training paradigm for these models is maximum likelihood estimation (MLE), or minimizing the cross-entropy of the human responses. Across a variety of domains, a recurring problem with MLE-trained generative neural dialog models (G) is that they tend to produce 'safe' and generic responses ('I don't know', 'I can't tell'). In contrast, discriminative dialog models (D) that are trained to rank a list of candidate human responses outperform their generative counterparts in terms of automatic metrics, diversity, and informativeness of the responses. However, D is not useful in practice since it cannot be deployed to have real conversations with users. Our work aims to achieve the best of both worlds - the practical usefulness of G and the strong performance of D - via knowledge transfer from D to G. Our primary contribution is an end-to-end trainable generative visual dialog model, where G receives gradients from D as a perceptual (not adversarial) loss on the sequence sampled from G. We leverage the recently proposed Gumbel-Softmax (GS) approximation to the discrete distribution - specifically, an RNN augmented with a sequence of GS samplers, coupled with the straight-through gradient estimator to enable end-to-end differentiability. We also introduce a stronger encoder for visual dialog, and employ a self-attention mechanism for answer encoding along with a metric learning loss to aid D in better capturing semantic similarities in answer responses. Overall, our proposed model outperforms the state-of-the-art on the VisDial dataset by a significant margin (2.67% on recall@10).
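
The straight-through Gumbel-Softmax trick mentioned in the abstract can be sketched in a few lines of PyTorch: perturb the logits with Gumbel noise, take a hard arg-max in the forward pass, and route gradients through the soft relaxation. The temperature and the toy usage below are illustrative assumptions; PyTorch's built-in torch.nn.functional.gumbel_softmax(logits, tau, hard=True) provides the same behavior.

```python
# Minimal sketch of straight-through Gumbel-Softmax sampling: approximately
# discrete samples in the forward pass, differentiable soft gradients in the
# backward pass, so gradients from a discriminator can flow into a generator.
import torch
import torch.nn.functional as F


def gumbel_softmax_straight_through(logits: torch.Tensor, tau: float = 1.0) -> torch.Tensor:
    """Return a one-hot-like sample whose gradient uses the soft relaxation."""
    gumbel = -torch.log(-torch.log(torch.rand_like(logits) + 1e-20) + 1e-20)
    soft = F.softmax((logits + gumbel) / tau, dim=-1)                 # relaxed sample
    hard = F.one_hot(soft.argmax(dim=-1), logits.size(-1)).to(soft.dtype)
    # Straight-through: forward value equals the hard one-hot sample,
    # gradients flow through the soft relaxation.
    return hard + soft - soft.detach()


if __name__ == "__main__":
    vocab_logits = torch.randn(2, 10, requires_grad=True)  # (batch, vocab), illustrative
    sample = gumbel_softmax_straight_through(vocab_logits, tau=0.5)
    sample.sum().backward()                                # gradients reach the logits
    print(sample.argmax(dim=-1), vocab_logits.grad.shape)
```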