AITopics | Jiajun Wu

Modeling Expectation Violation in Intuitive Physics with Coarse Probabilistic Object Representations

Kevin Smith, Lingjie Mei, Shunyu Yao, Jiajun Wu, Elizabeth Spelke, Josh Tenenbaum, Tomer Ullman

Neural Information Processing SystemsMar-27-2025, 02:51:59 GMT

From infancy, humans have expectations about how objects will move and interact. Even young children expect objects not to move through one another, teleport, or disappear. They are surprised by mismatches between physical expectations and perceptual observations, even in unfamiliar scenes with completely novel objects. A model that exhibits human-like understanding of physics should be similarly surprised, and adjust its beliefs accordingly. We propose ADEPT, a model that uses a coarse (approximate geometry) object-centric representation for dynamic 3D scene understanding. Inference integrates deep recognition networks, extended probabilistic physical simulation, and particle filtering for forming predictions and expectations across occlusion. We also present a new test set for measuring violations of physical expectations, using a range of scenarios derived from developmental psychology. We systematically compare ADEPT, baseline models, and human expectations on this test set. ADEPT outperforms standard network architectures in discriminating physically implausible scenes, and often performs this discrimination at the same level as people.

artificial intelligence, machine learning, object-oriented architecture, (17 more...)

Neural Information Processing Systems

Country: North America (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (0.90)
(2 more...)

Add feedback

Visual Object Networks: Image Generation with Disentangled 3D Representations

Jun-Yan Zhu, Zhoutong Zhang, Chengkai Zhang, Jiajun Wu, Antonio Torralba, Josh Tenenbaum, Bill Freeman

Neural Information Processing SystemsMar-26-2025, 14:26:50 GMT

Neural Information Processing Systems http://nips.cc/

artificial intelligence, machine learning, texture, (18 more...)

Neural Information Processing Systems

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Visual Concept-Metaconcept Learning

Chi Han, Jiayuan Mao, Chuang Gan, Josh Tenenbaum, Jiajun Wu

Neural Information Processing SystemsMar-26-2025, 07:41:10 GMT

Humans reason with concepts and metaconcepts: we recognize red and green from visual input; we also understand that they describe the same property of objects (i.e., the color). In this paper, we propose the visual concept-metaconcept learner (VCML) for joint learning of concepts and metaconcepts from images and associated question-answer pairs. The key is to exploit the bidirectional connection between visual concepts and metaconcepts. Visual representations provide grounding cues for predicting relations between unseen pairs of concepts. Knowing that red and green describe the same property of objects, we generalize to the fact that cube and sphere also describe the same property of objects, since they both categorize the shape of objects. Meanwhile, knowledge about metaconcepts empowers visual concept learning from limited, noisy, and even biased data. From just a few examples of purple cubes we can understand a new color purple, which resembles the hue of the cubes instead of the shape of them.

machine learning, metaconcept, natural language, (17 more...)

Neural Information Processing Systems

Country: North America (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)

Add feedback

3D-Aware Scene Manipulation via Inverse Graphics

Shunyu Yao, Tzu Ming Hsu, Jun-Yan Zhu, Jiajun Wu, Antonio Torralba, Bill Freeman, Josh Tenenbaum

Neural Information Processing SystemsMar-26-2025, 03:53:28 GMT

Neural Information Processing Systems http://nips.cc/

artificial intelligence, machine learning, representation, (19 more...)

Neural Information Processing Systems

Industry: Leisure & Entertainment (0.46)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)

Add feedback

Neural-Symbolic VQA: Disentangling Reasoning from Vision and Language Understanding

Kexin Yi, Jiajun Wu, Chuang Gan, Antonio Torralba, Pushmeet Kohli, Josh Tenenbaum

Neural Information Processing SystemsMar-26-2025, 02:56:27 GMT

We marry two powerful ideas: deep representation learning for visual recognition and language understanding, and symbolic program execution for reasoning. Our neural-symbolic visual question answering (NS-VQA) system first recovers a structural scene representation from the image and a program trace from the question. It then executes the program on the scene representation to obtain an answer. Incorporating symbolic structure as prior knowledge offers three unique advantages. First, executing programs on a symbolic space is more robust to long program traces; our model can solve complex reasoning tasks better, achieving an accuracy of 99.8% on the CLEVR dataset. Second, the model is more data-and memory-efficient: it performs well after learning on a small number of training data; it can also encode an image into a compact representation, requiring less storage than existing methods for offline question answering. Third, symbolic program execution offers full transparency to the reasoning process; we are thus able to interpret and diagnose each execution step.

artificial intelligence, machine learning, natural language, (20 more...)

Neural Information Processing Systems

Country: North America (0.28)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Learning to Exploit Stability for 3D Scene Parsing

Yilun Du, Zhijian Liu, Hector Basevi, Ales Leonardis, Bill Freeman, Josh Tenenbaum, Jiajun Wu

Neural Information Processing SystemsMar-25-2025, 10:15:39 GMT

Human scene understanding uses a variety of visual and non-visual cues to perform inference on object types, poses, and relations. Physics is a rich and universal cue that we exploit to enhance scene understanding. In this paper, we integrate the physical cue of stability into the learning process by looping in a physics engine into bottom-up recognition models, and apply it to the problem of 3D scene parsing. We first show that applying physics supervision to an existing scene understanding model increases performance, produces more stable predictions, and allows training to an equivalent performance level with fewer annotated training examples. We then present a novel architecture for 3D scene parsing named Prim R-CNN, learning to predict bounding boxes as well as their 3D size, translation, and rotation. With physics supervision, Prim R-CNN outperforms existing scene understanding approaches on this problem. Finally, we show that finetuning with physics supervision on unlabeled real images improves real domain transfer of models training on synthetic data.

machine learning, natural language, prediction, (19 more...)

Neural Information Processing Systems

Country: North America > United States (0.28)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.81)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Learning to Reconstruct Shapes from Unseen Classes

Xiuming Zhang, Zhoutong Zhang, Chengkai Zhang, Josh Tenenbaum, Bill Freeman, Jiajun Wu

Neural Information Processing SystemsMar-23-2025, 18:39:23 GMT

Neural Information Processing Systems http://nips.cc/

artificial intelligence, image understanding, machine learning, (20 more...)

Neural Information Processing Systems

Country: North America (0.28)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Sensing and Signal Processing > Image Processing (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)

Add feedback

Probabilistic Modeling of Future Frames from a Single Image

Tianfan Xue, Jiajun Wu, Katherine Bouman, Bill Freeman

Neural Information Processing SystemsFeb-11-2025, 18:33:28 GMT

We study the problem of synthesizing a number of likely future frames from a single input image. In contrast to traditional methods, which have tackled this problem in a deterministic or non-parametric way, we propose to model future frames in a probabilistic manner. Our probabilistic model makes it possible for us to sample and synthesize many possible future frames from a single input image. To synthesize realistic movement of objects, we propose a novel network structure, namely a Cross Convolutional Network; this network encodes image and motion information as feature maps and convolutional kernels, respectively. In experiments, our model performs well on synthetic data, such as 2D shapes and animated game sprites, as well as on real-world video frames. We also show that our model can be applied to visual analogy-making, and present an analysis of the learned network representations.

Add feedback

Modeling Expectation Violation in Intuitive Physics with Coarse Probabilistic Object Representations

Kevin Smith, Lingjie Mei, Shunyu Yao, Jiajun Wu, Elizabeth Spelke, Josh Tenenbaum, Tomer Ullman

Neural Information Processing SystemsJan-27-2025, 20:03:09 GMT

From infancy, humans have expectations about how objects will move and interact. Even young children expect objects not to move through one another, teleport, or disappear. They are surprised by mismatches between physical expectations and perceptual observations, even in unfamiliar scenes with completely novel objects. A model that exhibits human-like understanding of physics should be similarly surprised, and adjust its beliefs accordingly. We propose ADEPT, a model that uses a coarse (approximate geometry) object-centric representation for dynamic 3D scene understanding. Inference integrates deep recognition networks, extended probabilistic physical simulation, and particle filtering for forming predictions and expectations across occlusion. We also present a new test set for measuring violations of physical expectations, using a range of scenarios derived from developmental psychology. We systematically compare ADEPT, baseline models, and human expectations on this test set. ADEPT outperforms standard network architectures in discriminating physically implausible scenes, and often performs this discrimination at the same level as people.

artificial intelligence, machine learning, object-oriented architecture, (17 more...)

Neural Information Processing Systems

Country: North America (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (0.90)
(2 more...)

Add feedback

Visual Concept-Metaconcept Learning

Chi Han, Jiayuan Mao, Chuang Gan, Josh Tenenbaum, Jiajun Wu

Neural Information Processing SystemsJan-25-2025, 22:31:12 GMT

Humans reason with concepts and metaconcepts: we recognize red and green from visual input; we also understand that they describe the same property of objects (i.e., the color). In this paper, we propose the visual concept-metaconcept learner (VCML) for joint learning of concepts and metaconcepts from images and associated question-answer pairs. The key is to exploit the bidirectional connection between visual concepts and metaconcepts. Visual representations provide grounding cues for predicting relations between unseen pairs of concepts. Knowing that red and green describe the same property of objects, we generalize to the fact that cube and sphere also describe the same property of objects, since they both categorize the shape of objects. Meanwhile, knowledge about metaconcepts empowers visual concept learning from limited, noisy, and even biased data. From just a few examples of purple cubes we can understand a new color purple, which resembles the hue of the cubes instead of the shape of them.

machine learning, metaconcept, natural language, (17 more...)

Neural Information Processing Systems

Country: North America (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)

Add feedback

Filters

Collaborating Authors

Jiajun Wu

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

Modeling Expectation Violation in Intuitive Physics with Coarse Probabilistic Object Representations

Visual Object Networks: Image Generation with Disentangled 3D Representations

Visual Concept-Metaconcept Learning

3D-Aware Scene Manipulation via Inverse Graphics

Neural-Symbolic VQA: Disentangling Reasoning from Vision and Language Understanding

Learning to Exploit Stability for 3D Scene Parsing

Learning to Reconstruct Shapes from Unseen Classes

Probabilistic Modeling of Future Frames from a Single Image

Modeling Expectation Violation in Intuitive Physics with Coarse Probabilistic Object Representations

Visual Concept-Metaconcept Learning