AITopics | Wang, Pengchao

Collaborating Authors

Wang, Pengchao

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Video Prediction Policy: A Generalist Robot Policy with Predictive Visual Representations

Hu, Yucheng, Guo, Yanjiang, Wang, Pengchao, Chen, Xiaoyu, Wang, Yen-Jen, Zhang, Jianke, Sreenath, Koushil, Lu, Chaochao, Chen, Jianyu

arXiv.org Artificial IntelligenceDec-19-2024

Recent advancements in robotics have focused on developing generalist policies capable of performing multiple tasks. Typically, these policies utilize pre-trained vision encoders to capture crucial information from current observations. However, previous vision encoders, which trained on two-image contrastive learning or single-image reconstruction, can not perfectly capture the sequential information essential for embodied tasks. Recently, video diffusion models (VDMs) have demonstrated the capability to accurately predict future image sequences, exhibiting a good understanding of physical dynamics. Motivated by the strong visual prediction capabilities of VDMs, we hypothesize that they inherently possess visual representations that reflect the evolution of the physical world, which we term predictive visual representations. Building on this hypothesis, we propose the Video Prediction Policy (VPP), a generalist robotic policy conditioned on the predictive visual representations from VDMs. To further enhance these representations, we incorporate diverse human or robotic manipulation datasets, employing unified video-generation training objectives. VPP consistently outperforms existing methods across two simulated and two real-world benchmarks. Notably, it achieves a 28.1\% relative improvement in the Calvin ABC-D benchmark compared to the previous state-of-the-art and delivers a 28.8\% increase in success rates for complex real-world dexterous manipulation tasks.

artificial intelligence, machine learning, representation, (14 more...)

arXiv.org Artificial Intelligence

2412.14803

Genre: Research Report > New Finding (0.68)

Industry: Leisure & Entertainment > Sports (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Robots > Manipulation (0.34)

Add feedback

Multimodal Short Video Rumor Detection System Based on Contrastive Learning

Yang, Yuxing, Zhao, Junhao, Wang, Siyi, Min, Xiangyu, Wang, Pengchao, Wang, Haizhou

arXiv.org Artificial IntelligenceMay-17-2023

With the rise of short video platforms as prominent channels for news dissemination, major platforms in China have gradually evolved into fertile grounds for the proliferation of fake news. However, distinguishing short video rumors poses a significant challenge due to the substantial amount of information and shared features among videos, resulting in homogeneity. To address the dissemination of short video rumors effectively, our research group proposes a methodology encompassing multimodal feature fusion and the integration of external knowledge, considering the merits and drawbacks of each algorithm. The proposed detection approach entails the following steps: (1) creation of a comprehensive dataset comprising multiple features extracted from short videos; (2) development of a multimodal rumor detection model: first, we employ the Temporal Segment Networks (TSN) video coding model to extract video features, followed by the utilization of Optical Character Recognition (OCR) and Automatic Speech Recognition (ASR) to extract textual features. Subsequently, the BERT model is employed to fuse textual and video features; (3) distinction is achieved through contrast learning: we acquire external knowledge by crawling relevant sources and leverage a vector database to incorporate this knowledge into the classification output. Our research process is driven by practical considerations, and the knowledge derived from this study will hold significant value in practical scenarios, such as short video rumor identification and the management of social opinions.

information, machine learning, pattern recognition, (22 more...)

arXiv.org Artificial Intelligence

2304.08401

Country: Asia > China (0.34)

Genre: Research Report (0.50)

Industry: Media > News (0.34)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
(3 more...)

Add feedback

Self-supervised Pretraining of Visual Features in the Wild

Goyal, Priya, Caron, Mathilde, Lefaudeux, Benjamin, Xu, Min, Wang, Pengchao, Pai, Vivek, Singh, Mannat, Liptchinsky, Vitaliy, Misra, Ishan, Joulin, Armand, Bojanowski, Piotr

arXiv.org Artificial IntelligenceMar-5-2021

Recently, self-supervised learning methods like MoCo, SimCLR, BYOL and SwAV have reduced the gap with supervised methods. These results have been achieved in a control environment, that is the highly curated ImageNet dataset. However, the premise of self-supervised learning is that it can learn from any random image and from any unbounded dataset. In this work, we explore if self-supervision lives to its expectation by training large models on random, uncurated images with no supervision. Our final SElf-supERvised (SEER) model, a RegNetY with 1.3B parameters trained on 1B random images with 512 GPUs achieves 84.2% top-1 accuracy, surpassing the best self-supervised pretrained model by 1% and confirming that self-supervised learning works in a real world setting. Interestingly, we also observe that self-supervised models are good few-shot learners achieving 77.9% top-1 with access to only 10% of ImageNet. Code: https://github.com/facebookresearch/vissl

architecture, deep learning, neural network, (19 more...)

arXiv.org Artificial Intelligence

2103.01988

Country: Europe > France (0.14)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Artificial Intelligence > Vision > Image Understanding (0.65)

Add feedback