AITopics | Zhou, Zheming

Collaborating Authors

Zhou, Zheming

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

The Power of Next-Frame Prediction for Learning Physical Laws

Winterbottom, Thomas, Hudson, G. Thomas, Kluvanec, Daniel, Slack, Dean, Sterling, Jamie, Shentu, Junjie, Xiao, Chenghao, Zhou, Zheming, Moubayed, Noura Al

arXiv.org Artificial IntelligenceMay-21-2024

Next-frame prediction is a useful and powerful method for modelling and understanding the dynamics of video data. Inspired by the empirical success of causal language modelling and next-token prediction in language modelling, we explore the extent to which next-frame prediction serves as a strong foundational learning strategy (analogous to language modelling) for inducing an understanding of the visual world. In order to quantify the specific visual understanding induced by next-frame prediction, we introduce six diagnostic simulation video datasets derived from fundamental physical laws created by varying physical constants such as gravity and mass. We demonstrate that our models trained only on next-frame prediction are capable of predicting the value of these physical constants (e.g. gravity) without having been trained directly to learn these constants via a regression task. We find that the generative training phase alone induces a model state that can predict physical constants significantly better than that of a random model, improving the loss by a factor of between 1.28 to 6.24. We conclude that next-frame prediction shows great promise as a general learning strategy to induce understanding of the many `laws' that govern the visual domain without the need for explicit labelling.

artificial intelligence, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2405.1745

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Sensing and Signal Processing > Image Processing (0.94)

Add feedback

Tabletop Transparent Scene Reconstruction via Epipolar-Guided Optical Flow with Monocular Depth Completion Prior

Chen, Xiaotong, Zhou, Zheming, Deng, Zhuo, Ghasemalizadeh, Omid, Sun, Min, Kuo, Cheng-Hao, Sen, Arnie

arXiv.org Artificial IntelligenceOct-15-2023

Reconstructing transparent objects using affordable RGB-D cameras is a persistent challenge in robotic perception due to inconsistent appearances across views in the RGB domain and inaccurate depth readings in each single-view. We introduce a two-stage pipeline for reconstructing transparent objects tailored for mobile platforms. In the first stage, off-the-shelf monocular object segmentation and depth completion networks are leveraged to predict the depth of transparent objects, furnishing single-view shape prior. Subsequently, we propose Epipolar-guided Optical Flow (EOF) to fuse several single-view shape priors from the first stage to a cross-view consistent 3D reconstruction given camera poses estimated from opaque part of the scene. Our key innovation lies in EOF which employs boundary-sensitive sampling and epipolar-line constraints into optical flow to accurately establish 2D correspondences across multiple views on transparent objects. Quantitative evaluations demonstrate that our pipeline significantly outperforms baseline methods in 3D reconstruction quality, paving the way for more adept robotic perception and interaction with transparent objects.

artificial intelligence, conference, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2310.09956

Country: North America > United States > Michigan (0.14)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Plenoptic Monte Carlo Object Localization for Robot Grasping under Layered Translucency

Zhou, Zheming, Sui, Zhiqiang, Jenkins, Odest Chadwicke

arXiv.org Artificial IntelligenceJun-25-2018

In order to fully function in human environments, robot perception will need to account for the uncertainty caused by translucent materials. Translucency poses several open challenges in the form of transparent objects (e.g., drinking glasses), refractive media (e.g., water), and diffuse partial occlusions (e.g., objects behind stained glass panels). This paper presents Plenoptic Monte Carlo Localization (PMCL) as a method for localizing object poses in the presence of translucency using plenoptic (light-field) observations. We propose a new depth descriptor, the Depth Likelihood Volume (DLV), and its use within a Monte Carlo object localization algorithm. We present results of localizing and manipulating objects with translucent materials and objects occluded by layers of translucency. Our PMCL implementation uses observations from a Lytro first generation light field camera to allow a Michigan Progress Fetch robot to perform grasping.

artificial intelligence, localization, translucency, (15 more...)

arXiv.org Artificial Intelligence

1806.09769

Country:

North America > United States > Oklahoma > Beaver County (0.50)
North America > United States > Michigan (0.48)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Robots > Manipulation (0.82)

Add feedback