AITopics | Wang, Yongliang

Collaborating Authors

Wang, Yongliang

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Stick to Facts: Towards Fidelity-oriented Product Description Generation

Chan, Zhangming, Chen, Xiuying, Wang, Yongliang, Li, Juntao, Zhang, Zhiqiang, Gai, Kun, Zhao, Dongyan, Yan, Rui

arXiv.org Artificial IntelligenceMar-12-2025

Different from other text generation tasks, in product description generation, it is of vital importance to generate faithful descriptions that stick to the product attribute information. However, little attention has been paid to this problem. To bridge this gap, we propose a model named Fidelity-oriented Product Description Generator (FPDG). FPDG takes the entity label of each word into account, since the product attribute information is always conveyed by entity words. Specifically, we first propose a Recurrent Neural Network (RNN) decoder based on the Entity-label-guided Long Short-Term Memory (ELSTM) cell, taking both the embedding and the entity label of each word as input. Second, we establish a keyword memory that stores the entity labels as keys and keywords as values, allowing FPDG to attend to keywords by attending to their entity labels. Experiments conducted on a large-scale real-world product description dataset show that our model achieves state-of-the-art performance in terms of both traditional generation metrics and human evaluations. Specifically, FPDG increases the fidelity of the generated descriptions by 25%.

artificial intelligence, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2503.08454

Country: Asia > China (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.91)

Add feedback

ImageRef-VL: Enabling Contextual Image Referencing in Vision-Language Models

Yi, Jingwei, Yin, Junhao, Xu, Ju, Bao, Peng, Wang, Yongliang, Fan, Wei, Wang, Hao

arXiv.org Artificial IntelligenceJan-20-2025

Vision-Language Models (VLMs) have demonstrated remarkable capabilities in understanding multimodal inputs and have been widely integrated into Retrieval-Augmented Generation (RAG) based conversational systems. While current VLM-powered chatbots can provide textual source references in their responses, they exhibit significant limitations in referencing contextually relevant images during conversations. In this paper, we introduce Contextual Image Reference -- the ability to appropriately reference relevant images from retrieval documents based on conversation context -- and systematically investigate VLMs' capability in this aspect. We conduct the first evaluation for contextual image referencing, comprising a dedicated testing dataset and evaluation metrics. Furthermore, we propose ImageRef-VL, a method that significantly enhances open-source VLMs' image referencing capabilities through instruction fine-tuning on a large-scale, manually curated multimodal conversation dataset. Experimental results demonstrate that ImageRef-VL not only outperforms proprietary models but also achieves an 88% performance improvement over state-of-the-art open-source VLMs in contextual image referencing tasks. Our code is available at https://github.com/bytedance/ImageRef-VL.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2501.12418

Country: Asia (0.14)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Learning Dual-Arm Push and Grasp Synergy in Dense Clutter

Wang, Yongliang, Kasaei, Hamidreza

arXiv.org Artificial IntelligenceDec-5-2024

Robotic grasping in densely cluttered environments is challenging due to scarce collision-free grasp affordances. Non-prehensile actions can increase feasible grasps in cluttered environments, but most research focuses on single-arm rather than dual-arm manipulation. Policies from single-arm systems fail to fully leverage the advantages of dual-arm coordination. We propose a target-oriented hierarchical deep reinforcement learning (DRL) framework that learns dual-arm push-grasp synergy for grasping objects to enhance dexterous manipulation in dense clutter. Our framework maps visual observations to actions via a pre-trained deep learning backbone and a novel CNN-based DRL model, trained with Proximal Policy Optimization (PPO), to develop a dual-arm push-grasp strategy. The backbone enhances feature mapping in densely cluttered environments. A novel fuzzy-based reward function is introduced to accelerate efficient strategy learning. Our system is developed and trained in Isaac Gym and then tested in simulations and on a real robot. Experimental results show that our framework effectively maps visual data to dual push-grasp motions, enabling the dual-arm system to grasp target objects in complex environments. Compared to other methods, our approach generates 6-DoF grasp candidates and enables dual-arm push actions, mimicking human behavior. Results show that our method efficiently completes tasks in densely cluttered environments. https://sites.google.com/view/pg4da/home

artificial intelligence, machine learning, reinforcement learning, (16 more...)

arXiv.org Artificial Intelligence

2412.04052

Genre: Research Report > New Finding (0.54)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.88)

Add feedback

IPPO: Obstacle Avoidance for Robotic Manipulators in Joint Space via Improved Proximal Policy Optimization

Wang, Yongliang, Kasaei, Hamidreza

arXiv.org Artificial IntelligenceFeb-9-2023

Reaching tasks with random targets and obstacles is a challenging task for robotic manipulators. In this study, we propose a novel model-free reinforcement learning approach based on proximal policy optimization (PPO) for training a deep policy to map the task space to the joint space of a 6-DoF manipulator. To facilitate the training process in a large workspace, we develop an efficient representation of environmental inputs and outputs. The calculation of the distance between obstacles and manipulator links is incorporated into the state representation using a geometry-based method. Additionally, to enhance the performance of the model in reaching tasks, we introduce the action ensembles method and design the policy to directly participate in value function updates in PPO. To overcome the challenges associated with training in real-robot environments, we develop a simulation environment in Gazebo to train the model as it produces a smaller Sim-to-Real gap compared to other simulators. However, training in Gazebo is time-intensive. To address this issue, we propose a Sim-to-Sim method to significantly reduce the training time. The trained model is then directly applied in a real-robot setup without fine-tuning. To evaluate the performance of the proposed approach, we perform several rounds of experiments in both simulated and real robots. We also compare the performance of the proposed approach with six baselines. The experimental results demonstrate the effectiveness of the proposed method in performing reaching tasks with and without obstacles. our method outperformed the selected baselines by a large margin in different reaching task scenarios. A video of these experiments has been attached to the paper as supplementary material.

machine learning, obstacle, reinforcement learning, (16 more...)

arXiv.org Artificial Intelligence

2210.00803

Genre: Research Report > New Finding (1.00)

Industry:

Energy > Oil & Gas > Upstream (0.46)
Leisure & Entertainment > Games > Computer Games (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Robots > Manipulation (0.84)

Add feedback

Towards Personalized Review Summarization by Modeling Historical Reviews from Customer and Product Separately

Cheng, Xin, Gao, Shen, Zhang, Yuchi, Wang, Yongliang, Chen, Xiuying, Li, Mingzhe, Zhao, Dongyan, Yan, Rui

arXiv.org Artificial IntelligenceJan-27-2023

Review summarization is a non-trivial task that aims to summarize the main idea of the product review in the E-commerce website. Different from the document summary which only needs to focus on the main facts described in the document, review summarization should not only summarize the main aspects mentioned in the review but also reflect the personal style of the review author. Although existing review summarization methods have incorporated the historical reviews of both customer and product, they usually simply concatenate and indiscriminately model this two heterogeneous information into a long sequence. Moreover, the rating information can also provide a high-level abstraction of customer preference, it has not been used by the majority of methods. In this paper, we propose the Heterogeneous Historical Review aware Review Summarization Model (HHRRS) which separately models the two types of historical reviews with the rating information by a graph reasoning module with a contrastive loss. We employ a multi-task framework that conducts the review sentiment classification and summarization jointly. Extensive experiments on four benchmark datasets demonstrate the superiority of HHRRS on both tasks.

artificial intelligence, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2301.11682

Country:

Europe (1.00)
North America > United States > Minnesota (0.28)

Genre:

Overview (1.00)
Research Report > Experimental Study (0.46)
Research Report > New Finding (0.46)

Industry: Information Technology > Services > e-Commerce Services (0.54)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

3D LiDAR Aided GNSS NLOS Mitigation for Reliable GNSS-RTK Positioning in Urban Canyons

Liu, Xikun, Wen, Weisong, Huang, Feng, Gao, Han, Wang, Yongliang, Hsu, Li-Ta

arXiv.org Artificial IntelligenceDec-11-2022

GNSS and LiDAR odometry are complementary as they provide absolute and relative positioning, respectively. Their integration in a loosely-coupled manner is straightforward but is challenged in urban canyons due to the GNSS signal reflections. Recent proposed 3D LiDAR-aided (3DLA) GNSS methods employ the point cloud map to identify the non-line-of-sight (NLOS) reception of GNSS signals. This facilitates the GNSS receiver to obtain improved urban positioning but not achieve a sub-meter level. GNSS real-time kinematics (RTK) uses carrier phase measurements to obtain decimeter-level positioning. In urban areas, the GNSS RTK is not only challenged by multipath and NLOS-affected measurement but also suffers from signal blockage by the building. The latter will impose a challenge in solving the ambiguity within the carrier phase measurements. In the other words, the model observability of the ambiguity resolution (AR) is greatly decreased. This paper proposes to generate virtual satellite (VS) measurements using the selected LiDAR landmarks from the accumulated 3D point cloud maps (PCM). These LiDAR-PCM-made VS measurements are tightly-coupled with GNSS pseudorange and carrier phase measurements. Thus, the VS measurements can provide complementary constraints, meaning providing low-elevation-angle measurements in the across-street directions. The implementation is done using factor graph optimization to solve an accurate float solution of the ambiguity before it is fed into LAMBDA. The effectiveness of the proposed method has been validated by the evaluation conducted on our recently open-sourced challenging dataset, UrbanNav. The result shows the fix rate of the proposed 3DLA GNSS RTK is about 30% while the conventional GNSS-RTK only achieves about 14%. In addition, the proposed method achieves sub-meter positioning accuracy in most of the data collected in challenging urban areas.

artificial intelligence, machine learning, satellite, (17 more...)

arXiv.org Artificial Intelligence

2212.05477

Country: Asia > China (0.94)

Genre: Research Report > New Finding (0.48)

Industry: Transportation > Air (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Sensing and Signal Processing (0.93)
(2 more...)

Add feedback

HeteroQA: Learning towards Question-and-Answering through Multiple Information Sources via Heterogeneous Graph Modeling

Gao, Shen, Zhang, Yuchi, Wang, Yongliang, Dong, Yang, Chen, Xiuying, Zhao, Dongyan, Yan, Rui

arXiv.org Artificial IntelligenceDec-27-2021

Community Question Answering (CQA) is a well-defined task that can be used in many scenarios, such as E-Commerce and online user community for special interests. In these communities, users can post articles, give comment, raise a question and answer it. These data form the heterogeneous information sources where each information source have their own special structure and context (comments attached to an article or related question with answers). Most of the CQA methods only incorporate articles or Wikipedia to extract knowledge and answer the user's question. However, various types of information sources in the community are not fully explored by these CQA methods and these multiple information sources (MIS) can provide more related knowledge to user's questions. Thus, we propose a question-aware heterogeneous graph transformer to incorporate the MIS in the user community to automatically generate the answer. To evaluate our proposed method, we conduct the experiments on two datasets: $\text{MSM}^{\text{plus}}$ the modified version of benchmark dataset MS-MARCO and the AntQA dataset which is the first large-scale CQA dataset with four types of MIS. Extensive experiments on two datasets show that our model outperforms all the baselines in terms of all the metrics.

machine learning, mis, question answering, (22 more...)

arXiv.org Artificial Intelligence

2112.13597

Country: North America > United States > Michigan (0.14)

Genre: Research Report (1.00)

Industry: Information Technology > Services > e-Commerce Services (0.35)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(2 more...)

Add feedback

RIO: Rotation-equivariance supervised learning of robust inertial odometry

Zhou, Caifa, Cao, Xiya, Zeng, Dandan, Wang, Yongliang

arXiv.org Machine LearningNov-23-2021

This paper introduces rotation-equivariance as a self-supervisor to train inertial odometry models. We demonstrate that the self-supervised scheme provides a powerful supervisory signal at training phase as well as at inference stage. It reduces the reliance on massive amounts of labeled data for training a robust model and makes it possible to update the model using various unlabeled data. Further, we propose adaptive Test-Time Training (TTT) based on uncertainty estimations in order to enhance the generalizability of the inertial odometry to various unseen data. We show in experiments that the Rotation-equivariance-supervised Inertial Odometry (RIO) trained with 30% data achieves on par performance with a model trained with the whole database. Adaptive TTT improves models performance in all cases and makes more than 25% improvements under several scenarios.

artificial intelligence, estimation, machine learning, (18 more...)

arXiv.org Machine Learning

2111.11676

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.46)

Add feedback

Measuring Uncertainty in Signal Fingerprinting with Gaussian Processes Going Deep

Guan, Ran, Zhang, Andi, Li, Mengchao, Wang, Yongliang

arXiv.org Artificial IntelligenceSep-13-2021

In indoor positioning, signal fluctuation is highly location-dependent. However, signal uncertainty is one critical yet commonly overlooked dimension of the radio signal to be fingerprinted. This paper reviews the commonly used Gaussian Processes (GP) for probabilistic positioning and points out the pitfall of using GP to model signal fingerprint uncertainty. This paper also proposes Deep Gaussian Processes (DGP) as a more informative alternative to address the issue. How DGP better measures uncertainty in signal fingerprinting is evaluated via simulated and realistically collected datasets.

artificial intelligence, fingerprint, machine learning, (12 more...)

arXiv.org Artificial Intelligence

2109.0436

Country: Europe (0.15)

Genre:

Research Report (0.90)
Overview (0.74)

Technology:

Information Technology > Modeling & Simulation (1.00)
Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science > Data Mining (0.68)

Add feedback

SeaD: End-to-end Text-to-SQL Generation with Schema-aware Denoising

Xuan, Kuan, Wang, Yongbo, Wang, Yongliang, Wen, Zujie, Dong, Yang

arXiv.org Machine LearningMay-17-2021

In text-to-SQL task, seq-to-seq models often lead to sub-optimal performance due to limitations in their architecture. In this paper, we present a simple yet effective approach that adapts transformer-based seq-to-seq model to robust text-to-SQL generation. Instead of inducing constraint to decoder or reformat the task as slot-filling, we propose to train seq-to-seq model with Schema aware Denoising (SeaD), which consists of two denoising objectives that train model to either recover input or predict output from two novel erosion and shuffle noises. These denoising objectives acts as the auxiliary tasks for better modeling the structural data in S2S generation. In addition, we improve and propose a clause-sensitive execution guided (EG) decoding strategy to overcome the limitation of EG decoding for generative model. The experiments show that the proposed method improves the performance of seq-to-seq model in both schema linking and grammar correctness and establishes new state-of-the-art on WikiSQL benchmark. The results indicate that the capacity of vanilla seq-to-seq architecture for text-to-SQL may have been under-estimated.

arxiv preprint arxiv, deep learning, neural network, (18 more...)

arXiv.org Machine Learning

2105.07911

Country: Asia > China (0.14)

Genre: Research Report (0.82)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.89)

Add feedback