Wang, Weiyao
OmniPose6D: Towards Short-Term Object Pose Tracking in Dynamic Scenes from Monocular RGB
Lin, Yunzhi, Zhao, Yipu, Chu, Fu-Jen, Chen, Xingyu, Wang, Weiyao, Tang, Hao, Vela, Patricio A., Feiszli, Matt, Liang, Kevin
To address the challenge of short-term object pose tracking in dynamic environments with monocular RGB input, we introduce OmniPose6D, a large-scale synthetic dataset crafted to mirror the diversity of real-world conditions. We additionally present a benchmarking framework for comprehensive comparison of pose tracking algorithms. We propose a pipeline featuring an uncertainty-aware keypoint refinement network that employs probabilistic modeling to refine pose estimates. Comparative evaluations demonstrate that our approach outperforms existing baselines on real datasets, underscoring the effectiveness of our synthetic dataset and refinement technique in enhancing tracking precision in dynamic contexts. Our contributions set a new precedent for the development and assessment of object pose tracking methodologies in complex scenes.
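As a hedged illustration of the probabilistic refinement idea (the paper's exact architecture is not reproduced here; module and parameter names below are hypothetical), a keypoint head can predict a per-keypoint offset together with a log-variance and be trained with a Gaussian negative log-likelihood, so that uncertain keypoints contribute less to the pose update:

```python
# Minimal sketch of an uncertainty-aware keypoint refinement head.
# Hypothetical architecture: predicts a 2D offset and a per-axis
# log-variance for each keypoint, trained with a heteroscedastic
# Gaussian negative log-likelihood.
import torch
import torch.nn as nn

class KeypointRefiner(nn.Module):
    def __init__(self, feat_dim: int = 256, num_keypoints: int = 8):
        super().__init__()
        self.num_keypoints = num_keypoints
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim, 128), nn.ReLU(),
            nn.Linear(128, num_keypoints * 4),  # (dx, dy, log_var_x, log_var_y)
        )

    def forward(self, feats: torch.Tensor):
        out = self.mlp(feats).view(-1, self.num_keypoints, 4)
        return out[..., :2], out[..., 2:]  # offsets, log-variances

def gaussian_nll(offset, log_var, target_offset):
    # Large predicted variance down-weights hard keypoints but pays a
    # log-variance penalty, yielding calibrated per-keypoint uncertainty.
    return (0.5 * torch.exp(-log_var) * (offset - target_offset) ** 2
            + 0.5 * log_var).mean()
```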
VIHE: Virtual In-Hand Eye Transformer for 3D Robotic Manipulation
Wang, Weiyao, Lei, Yutian, Jin, Shiyu, Hager, Gregory D., Zhang, Liangjun
In this work, we introduce the Virtual In-Hand Eye Transformer (VIHE), a novel method designed to enhance 3D manipulation capabilities through action-aware view rendering. VIHE autoregressively refines actions over multiple stages by conditioning on rendered views posed at the action predictions of earlier stages. These virtual in-hand views provide a strong inductive bias for recognizing the correct hand pose, especially for challenging high-precision tasks such as peg insertion. On 18 manipulation tasks in RLBench simulated environments, VIHE achieves a new state of the art, improving success from 65% to 77% (a 12-point absolute gain) over the existing state-of-the-art model using 100 demonstrations per task. In real-world scenarios, VIHE can learn manipulation tasks with just a handful of demonstrations, highlighting its practical utility. Videos and code implementation can be found at our project site: https://vihe-3d.github.io.
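A schematic of the autoregressive refinement loop, under stated assumptions (render_view, backbone, and action_head are stand-ins, not the released implementation): each stage re-renders virtual in-hand views posed at the current action estimate and predicts a residual update.

```python
# Sketch of VIHE-style multi-stage action refinement (illustrative only).
import torch

def refine_action(obs_images, render_view, backbone, action_head, num_stages=3):
    action = action_head(backbone(obs_images))  # coarse initial prediction
    for _ in range(num_stages - 1):
        # Render virtual "in-hand" views posed at the current action estimate.
        virtual_views = render_view(obs_images, action)
        # Refine with a residual update conditioned on the new views.
        action = action + action_head(backbone(virtual_views))
    return action
```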
Open-world Instance Segmentation: Top-down Learning with Bottom-up Supervision
Kalluri, Tarun, Wang, Weiyao, Wang, Heng, Chandraker, Manmohan, Torresani, Lorenzo, Tran, Du
Many top-down architectures for instance segmentation achieve significant success when trained and tested on a pre-defined, closed-world taxonomy. However, when deployed in the open world, they exhibit notable bias toward seen classes and suffer a significant performance drop. In this work, we propose a novel approach to open-world instance segmentation called bottom-Up and top-Down Open-world Segmentation (UDOS), which combines classical bottom-up segmentation algorithms within a top-down learning framework. UDOS first predicts parts of objects using a top-down network trained with weak supervision from bottom-up segmentations. The bottom-up segmentations are class-agnostic and do not overfit to specific taxonomies. The part masks are then fed into affinity-based grouping and refinement modules to predict robust instance-level segmentations. UDOS enjoys both the speed and efficiency of top-down architectures and the generalization to unseen categories afforded by bottom-up supervision. We validate the strengths of UDOS on multiple cross-category and cross-dataset transfer tasks across 5 challenging datasets, including MS-COCO, LVIS, ADE20k, UVO, and OpenImages, achieving significant improvements over the state of the art across the board. Our code and models are available on our project page.
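A minimal sketch of the affinity-based grouping step (illustrative; UDOS's learned affinity and refinement modules are more involved, and the threshold here is a hypothetical parameter): parts whose pairwise affinity exceeds a threshold are merged with union-find, and each group is OR-ed into one instance mask.

```python
# Group class-agnostic part masks into instance masks via pairwise affinity.
import numpy as np

def group_parts(part_masks: np.ndarray, affinity: np.ndarray, thresh: float = 0.5):
    # part_masks: (N, H, W) boolean part masks; affinity: (N, N) pairwise scores.
    n = len(part_masks)
    parent = list(range(n))

    def find(i):  # union-find with path halving
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if affinity[i, j] > thresh:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    # Union the part masks of each group into a single instance mask.
    return [np.any(part_masks[idx], axis=0) for idx in groups.values()]
```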
Learn Proportional Derivative Controllable Latent Space from Pixels
Wang, Weiyao, Kobilarov, Marin, Hager, Gregory D.
Recent advances in latent-space dynamics models learned from pixels show promising progress in vision-based model predictive control (MPC). However, executing MPC in real time can be challenging due to its intensive computational cost at each timestep. We propose introducing additional learning objectives to enforce that the learned latent space is proportional-derivative (PD) controllable. At execution time, a simple PD controller can be applied directly to the latent space encoded from pixels, producing simple and effective control for systems with visual observations. We show that our method outperforms baseline methods, producing robust goal reaching and trajectory tracking in various environments.
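Once such a latent space is learned, the execution-time controller is just a PD law on encoded observations. A minimal sketch, assuming a trained encoder and a finite-difference latent velocity (gains and timestep are illustrative):

```python
# PD control directly in a learned latent space (sketch).
import torch

def pd_control(encoder, img_prev, img_curr, img_goal, Kp=1.0, Kd=0.1, dt=0.05):
    z_curr = encoder(img_curr)
    z_goal = encoder(img_goal)
    z_dot = (z_curr - encoder(img_prev)) / dt  # finite-difference latent velocity
    # Proportional term drives the latent state toward the goal;
    # derivative term damps the motion. No per-step planning is needed.
    return Kp * (z_goal - z_curr) - Kd * z_dot
```

Compared with MPC, which solves an optimization problem at every timestep, this controller costs one encoder pass and a linear combination per step.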
Stabilized SVRG: Simple Variance Reduction for Nonconvex Optimization
Ge, Rong, Li, Zhize, Wang, Weiyao, Wang, Xiang
Nonconvex optimization is widely used in machine learning. Recently, for problems like matrix sensing (Bhojanapalli et al., 2016), matrix completion (Ge et al., 2016), and certain objectives for neural networks (Ge et al., 2017b), it was shown that all local minima are also globally optimal, so simple local search algorithms can be used to solve these problems. For a convex function $f(x)$, a local (and global) minimum is achieved whenever the point has zero gradient: $\nabla f(x) = 0$. However, for nonconvex functions, a point with zero gradient can also be a saddle point. To avoid converging to saddle points, recent results (Ge et al., 2015; Jin et al., 2017a,b) show that local search algorithms converge to $\epsilon$-approximate second-order stationary points - points with small gradients and almost positive semi-definite Hessians (see Definition 1). In theory, Xu et al. (2018) and Allen-Zhu and Li (2017) independently showed that finding a second-order stationary point is not much harder than finding a first-order stationary point: they give reduction algorithms, Neon/Neon2, that converge to second-order stationary points when combined with algorithms that find first-order stationary points.
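For reference, a standard form of this notion (the exact constants in the paper's Definition 1 may differ): for $f$ with $\rho$-Lipschitz Hessian, a point $x$ is an $\epsilon$-approximate second-order stationary point if

$$\|\nabla f(x)\| \le \epsilon \qquad \text{and} \qquad \lambda_{\min}\!\left(\nabla^2 f(x)\right) \ge -\sqrt{\rho\epsilon}.$$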
How You Act Tells a Lot: Privacy-Leakage Attack on Deep Reinforcement Learning
Pan, Xinlei, Wang, Weiyao, Zhang, Xiaoshuai, Li, Bo, Yi, Jinfeng, Song, Dawn
Machine learning has been widely applied to various applications, some of which involve training with privacy-sensitive data. A modest number of data breaches have been studied, including credit card information in natural language data and identities from face datasets. However, most of these studies focus on supervised learning models. As deep reinforcement learning (DRL) has been deployed in a number of real-world systems, such as indoor robot navigation, whether trained DRL policies can leak private information requires in-depth study. To explore such privacy breaches in general, we propose two methods: environment dynamics search via a genetic algorithm and candidate inference based on shadow policies. We conduct extensive experiments to demonstrate such privacy vulnerabilities in DRL under various settings. We leverage the proposed algorithms to infer floor plans from trained Grid World navigation DRL agents with LiDAR perception. The proposed algorithm correctly infers most of the floor plans, reaching an average recovery rate of 95.83% with policy-gradient-trained agents. In addition, we are able to recover the robot configuration in continuous control environments and an autonomous driving simulator with high accuracy. To the best of our knowledge, this is the first work to investigate privacy leakage in DRL settings, and we show that DRL-based agents can leak privacy-sensitive information from their trained policies.
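A hedged sketch of the first method, environment dynamics search with a genetic algorithm (the encoding of candidate environments and the fitness function, which would score agreement between the attacked policy's behavior and rollouts in a candidate environment, are simplified stand-ins):

```python
# Generic genetic search over candidate environment dynamics (sketch).
import random

def genetic_search(init_candidates, fitness, mutate, crossover,
                   generations=50, elite_frac=0.2):
    pop = list(init_candidates)
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)  # higher fitness = better behavior match
        elites = pop[:max(2, int(elite_frac * len(pop)))]
        children = []
        while len(elites) + len(children) < len(pop):
            a, b = random.sample(elites, 2)
            children.append(mutate(crossover(a, b)))  # breed new candidates
        pop = elites + children
    return max(pop, key=fitness)  # best-matching candidate, e.g. a floor plan
```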
JointGAN: Multi-Domain Joint Distribution Learning with Generative Adversarial Nets
Pu, Yunchen, Dai, Shuyang, Gan, Zhe, Wang, Weiyao, Wang, Guoyin, Zhang, Yizhe, Henao, Ricardo, Carin, Lawrence
A new generative adversarial network is developed for joint distribution matching. Distinct from most existing approaches, which only learn conditional distributions, the proposed model aims to learn a joint distribution of multiple random variables (domains). This is achieved by learning to sample from the conditional distributions between the domains while simultaneously learning to sample from the marginals of each individual domain. The proposed framework consists of multiple generators and a single softmax-based critic, all jointly trained via adversarial learning. From a simple noise source, the proposed framework allows synthesis of draws from the marginals, conditional draws given observations from a subset of the random variables, or complete draws from the full joint distribution. Most examples considered are for joint analysis of two domains, with examples for three domains also presented.
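A simplified sketch of the single softmax critic (illustrative; names and the exact training objective are assumptions): the critic classifies which source, the real joint data or one of the generators, produced a given sample pair, and each generator is trained to be classified as the real source.

```python
# Softmax critic over multiple sample sources (sketch of the JointGAN idea).
import torch
import torch.nn.functional as F

def critic_loss(critic, pairs_by_source):
    # pairs_by_source[k] holds sample pairs from source k; source 0 is real data.
    losses = []
    for k, pair in enumerate(pairs_by_source):
        logits = critic(pair)  # (batch, num_sources)
        target = torch.full((logits.size(0),), k,
                            dtype=torch.long, device=logits.device)
        losses.append(F.cross_entropy(logits, target))
    return torch.stack(losses).mean()
```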
Adversarial Symmetric Variational Autoencoder
Pu, Yunchen, Wang, Weiyao, Henao, Ricardo, Chen, Liqun, Gan, Zhe, Li, Chunyuan, Carin, Lawrence
A new form of variational autoencoder (VAE) is developed, in which the joint distribution of data and codes is considered in two (symmetric) forms: (i) from observed data fed through the encoder to yield codes, and (ii) from latent codes drawn from a simple prior and propagated through the decoder to manifest data. Lower bounds are learned for the marginal log-likelihoods of observed data and latent codes. When learning with the variational bound, one seeks to minimize the symmetric Kullback-Leibler divergence between the joint density functions from (i) and (ii), while simultaneously seeking to maximize the two marginal log-likelihoods. To facilitate learning, a new form of adversarial training is developed. An extensive set of experiments is performed, in which we demonstrate state-of-the-art data reconstruction and generation on several image benchmark datasets.
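Schematically (notation ours, not the paper's), with $q_\phi(x,z) = q(x)\,q_\phi(z|x)$ the encoder path over observed data and $p_\theta(x,z) = p(z)\,p_\theta(x|z)$ the decoder path from the prior, the symmetric objective described above reads

$$\min_{\theta,\phi}\ \mathrm{KL}\big(q_\phi(x,z)\,\|\,p_\theta(x,z)\big) + \mathrm{KL}\big(p_\theta(x,z)\,\|\,q_\phi(x,z)\big),$$

whose minimization matches the two joint densities; the adversarial training mentioned above makes this objective tractable.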
Triangle Generative Adversarial Networks
Gan, Zhe, Chen, Liqun, Wang, Weiyao, Pu, Yunchen, Zhang, Yizhe, Liu, Hao, Li, Chunyuan, Carin, Lawrence
A Triangle Generative Adversarial Network ($\Delta$-GAN) is developed for semi-supervised cross-domain joint distribution matching, where the training data consists of samples from each domain, and supervision of domain correspondence is provided by only a few paired samples. $\Delta$-GAN consists of four neural networks: two generators and two discriminators. The generators are designed to learn the two-way conditional distributions between the two domains, while the discriminators implicitly define a ternary discriminative function trained to distinguish real data pairs from two kinds of fake data pairs. The generators and discriminators are trained together using adversarial learning. Under mild assumptions, the joint distributions characterized by the two generators are shown in theory to concentrate to the data distribution. In experiments, three different kinds of domain pairs are considered: image-label, image-image, and image-attribute pairs. Experiments on semi-supervised image classification, image-to-image translation, and attribute-based image generation demonstrate the superiority of the proposed approach.
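As a hedged illustration of the ternary discrimination (a single three-way classifier standing in for the two discriminators that implicitly define it; all names are hypothetical):

```python
# Three-way real/fake discrimination in a Delta-GAN-style setup (sketch).
import torch
import torch.nn.functional as F

REAL, FAKE_X, FAKE_Y = 0, 1, 2  # real pair, pair from G_x, pair from G_y

def ternary_d_loss(classifier, real_pair, fake_pair_x, fake_pair_y):
    loss = 0.0
    for pair, label in [(real_pair, REAL), (fake_pair_x, FAKE_X),
                        (fake_pair_y, FAKE_Y)]:
        logits = classifier(pair)  # (batch, 3)
        target = torch.full((logits.size(0),), label,
                            dtype=torch.long, device=logits.device)
        loss = loss + F.cross_entropy(logits, target)
    return loss / 3
```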