AITopics | Wang, Yanan

Wang, Yanan

Strawberry Robotic Operation Interface: An Open-Source Device for Collecting Dexterous Manipulation Data in Robotic Strawberry Farming

Hou, Linsheng, Lu, Wenwu, Wang, Yanan, Peng, Chen, Fei, Zhenghao

arXiv.org Artificial Intelligence, Jan-28-2025

Strawberry farming is labor-intensive, particularly in tasks requiring dexterous manipulation such as picking occluded strawberries. To address this challenge, we present the Strawberry Robotic Operation Interface (SROI), an open-source device designed for collecting dexterous manipulation data in robotic strawberry farming. The SROI features a handheld unit with a modular end effector and a stereo robotic camera, enabling the easy collection of demonstration data in field environments. A data post-processing pipeline is introduced to extract spatial trajectories and gripper states from the collected data. Additionally, we release an open-source dataset of strawberry picking demonstrations to facilitate research in dexterous robotic manipulation. The SROI represents a step toward automating complex strawberry farming tasks, reducing reliance on manual labor.

artificial intelligence, strawberry, trajectory, (12 more...)

2501.16717

Country:

North America (0.46)
Asia > China > Zhejiang Province (0.29)

Genre: Research Report (0.50)

Industry: Food & Agriculture > Agriculture (0.47)

Technology: Information Technology > Artificial Intelligence > Robots > Manipulation (0.82)

Learn from Foundation Model: Fruit Detection Model without Manual Annotation

Wang, Yanan, Fei, Zhenghao, Li, Ruichen, Ying, Yibin

arXiv.org Artificial Intelligence, Nov-25-2024

Recent breakthroughs in large foundation models have made it possible to transfer knowledge pre-trained on vast datasets to domains with limited data availability. Agriculture is one of the domains that lack sufficient data. This study proposes a framework to train effective, domain-specific, small models from foundation models without manual annotation. Our approach begins with SDM (Segmentation-Description-Matching), a stage that leverages two foundation models: SAM2 (Segment Anything in Images and Videos) for segmentation and OpenCLIP (Open Contrastive Language-Image Pretraining) for zero-shot open-vocabulary classification. In the second stage, a novel knowledge distillation mechanism is utilized to distill compact, edge-deployable models from SDM, enhancing both inference speed and perception accuracy. The complete method, termed SDM-D (Segmentation-Description-Matching-Distilling), demonstrates strong performance across various fruit detection tasks (object detection, semantic segmentation, and instance segmentation) without manual annotation. It nearly matches the performance of models trained with abundant labels. Notably, SDM-D outperforms open-set detection methods such as Grounding SAM and YOLO-World on all tested fruit detection datasets. Additionally, we introduce MegaFruits, a comprehensive fruit segmentation dataset encompassing over 25,000 images, and all code and datasets are made publicly available at https://github.com/AgRoboticsResearch/SDM-D.git.
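The matching step of zero-shot open-vocabulary classification can be illustrated with a minimal sketch: each segment embedding is assigned the text description with the highest cosine similarity. This is not the SDM-D code. The function names, the label strings, and the 3-dimensional vectors are hypothetical stand-ins for the outputs of real image and text encoders.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def match_segments(segment_embeddings, text_embeddings):
    """Assign each segment the label whose text embedding is most similar."""
    labels = []
    for seg in segment_embeddings:
        best = max(text_embeddings, key=lambda name: cosine(seg, text_embeddings[name]))
        labels.append(best)
    return labels


# Hypothetical embeddings; a real pipeline would obtain these from encoders.
text_embeddings = {
    "ripe strawberry": [0.9, 0.1, 0.0],
    "unripe strawberry": [0.1, 0.9, 0.0],
    "leaf": [0.0, 0.1, 0.9],
}
segment_embeddings = [[0.8, 0.2, 0.1], [0.0, 0.2, 1.0]]
labels = match_segments(segment_embeddings, text_embeddings)
print(labels)
```

In this toy setup the first segment is matched to "ripe strawberry" and the second to "leaf"; the resulting labels could then serve as pseudo-annotations for training a smaller model.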

machine learning, natural language, segmentation, (21 more...)

2411.16196

Country:

North America > United States (0.46)
Europe (0.46)
Asia > China > Zhejiang Province (0.14)

Genre: Research Report (1.00)

Industry: Food & Agriculture > Agriculture (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Vision (0.91)

TTT4Rec: A Test-Time Training Approach for Rapid Adaption in Sequential Recommendation

Yang, Zhaoqi, Wang, Yanan, Ge, Yong

arXiv.org Artificial Intelligence, Sep-27-2024

Sequential recommendation tasks, which aim to predict the next item a user will interact with, typically rely on models trained solely on historical data. However, in real-world scenarios, user behavior can fluctuate over long interaction sequences, and training data may be too limited to model these dynamics. To address this, Test-Time Training (TTT) offers a novel approach by using self-supervised learning during inference to dynamically update model parameters. This allows the model to adapt to new user interactions in real time, leading to more accurate recommendations. In this paper, we propose TTT4Rec, a sequential recommendation framework that integrates TTT to better capture dynamic user behavior. By continuously updating model parameters during inference, TTT4Rec is particularly effective in scenarios where user interaction sequences are long, training data is limited, or user behavior is highly variable. We evaluate TTT4Rec on three widely used recommendation datasets, demonstrating that it achieves performance on par with or exceeding state-of-the-art models. The code is available at https://github.com/ZhaoqiZachYang/TTT4Rec.
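The general idea of test-time training can be sketched on a toy problem: a model adapts its parameter on the test sequence itself, using next-value prediction as the self-supervised objective, before making its prediction. This is an illustrative assumption, not the TTT4Rec architecture. The scalar model `w * x`, the learning rate, and the step count are all hypothetical.

```python
def ttt_predict(sequence, w, lr=0.01, steps=50):
    """Adapt w on the test sequence itself, then predict the next value."""
    # Self-supervised pairs: each value is used to predict its successor.
    pairs = list(zip(sequence[:-1], sequence[1:]))
    for _ in range(steps):
        # Gradient of the mean squared error of w * x against the next value.
        grad = sum(2 * (w * x - y) * x for x, y in pairs) / len(pairs)
        w -= lr * grad
    return w * sequence[-1], w


sequence = [1.0, 2.0, 4.0, 8.0]    # toy sequence in which each value doubles
static_pred = 1.0 * sequence[-1]    # prediction with the frozen parameter w = 1.0
adapted_pred, adapted_w = ttt_predict(sequence, w=1.0)
print(static_pred, adapted_pred, adapted_w)
```

The frozen model predicts 8.0 for the next value, whereas the adapted parameter moves close to 2.0 and the adapted prediction approaches 16.0, which is the behavior the toy sequence actually follows.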

artificial intelligence, machine learning, recommendation, (19 more...)

2409.19142

Country: North America > United States > Arizona (0.30)

Genre: Research Report > Promising Solution (0.54)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Anchor-aware Deep Metric Learning for Audio-visual Retrieval

Zeng, Donghuo, Wang, Yanan, Ikeda, Kazushi, Yu, Yi

arXiv.org Artificial Intelligence, Apr-21-2024

Metric learning minimizes the gap between similar (positive) pairs of data points and increases the separation of dissimilar (negative) pairs, aiming at capturing the underlying data structure and enhancing the performance of tasks like audio-visual cross-modal retrieval (AV-CMR). Recent works employ sampling methods to select impactful data points from the embedding space during training. However, the model training fails to fully explore the space due to the scarcity of training data points, resulting in an incomplete representation of the overall positive and negative distributions. In this paper, we propose an innovative Anchor-aware Deep Metric Learning (AADML) method to address this challenge by uncovering the underlying correlations among existing data points, which enhances the quality of the shared embedding space. Specifically, our method establishes a correlation graph-based manifold structure by considering the dependencies between each sample as the anchor and its semantically similar samples. Through dynamic weighting of the correlations within this underlying manifold structure using an attention-driven mechanism, Anchor Awareness (AA) scores are obtained for each anchor. These AA scores serve as data proxies to compute relative distances in metric learning approaches. Extensive experiments conducted on two audio-visual benchmark datasets demonstrate the effectiveness of our proposed AADML method, significantly surpassing state-of-the-art models. Furthermore, we investigate the integration of AA proxies with various metric learning methods, further highlighting the efficacy of our approach.
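The basic metric-learning objective described in the first sentence can be illustrated with the standard triplet loss, which pulls a positive sample toward an anchor and pushes a negative sample away by at least a margin. This is a generic sketch, not the AADML method: the Anchor Awareness proxies are not modeled, and the function names, margin, and 2-dimensional points are hypothetical.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge loss that reaches zero once the negative is at least
    `margin` farther from the anchor than the positive is."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


anchor = [0.0, 0.0]
positive = [0.0, 1.0]
easy_negative = [3.0, 4.0]   # far from the anchor, so no penalty
hard_negative = [0.0, 1.5]   # nearly as close as the positive

easy_loss = triplet_loss(anchor, positive, easy_negative)
hard_loss = triplet_loss(anchor, positive, hard_negative)
print(easy_loss, hard_loss)
```

Only the hard negative contributes a non-zero loss, which is why the choice of which data points (or proxies) to compare against matters during training.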

artificial intelligence, deep learning, machine learning, (17 more...)

2404.13789

Country:

Europe (0.93)
Asia (0.71)
North America > United States > California (0.28)

Genre: Research Report > Promising Solution (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Amplitude-Phase Fusion for Enhanced Electrocardiogram Morphological Analysis

Hu, Shuaicong, Wang, Yanan, Liu, Jian, Lin, Jingyu, Qin, Shengmei, Nie, Zhenning, Yao, Zhifeng, Cai, Wenjie, Yang, Cuiwei

arXiv.org Artificial Intelligence, Apr-15-2024

Considering the variability of amplitude and phase patterns in electrocardiogram (ECG) signals due to cardiac activity and individual differences, existing entropy-based studies have not fully utilized these two patterns and do not integrate them. To address this gap, this paper proposes, for the first time, a novel fusion entropy metric, morphological ECG entropy (MEE), specifically designed for ECG morphology, to comprehensively describe the fusion of amplitude and phase patterns. MEE is computed based on beat-level samples, enabling detailed analysis of each cardiac cycle. Experimental results demonstrate that MEE achieves rapid, accurate, and label-free localization of abnormal ECG arrhythmia regions. Furthermore, MEE provides a method for assessing sample diversity, facilitating compression of imbalanced training sets (via representative sample selection), and outperforms random pruning. Additionally, MEE exhibits the ability to describe areas of poor quality. Further discussion demonstrates that the MEE calculation is robust to noise interference and has low computational complexity. Finally, we integrate this method into a clinical interactive interface to provide a more convenient and intuitive user experience. These findings indicate that MEE serves as a valuable clinical descriptor for ECG characterization. The implementation code is available at https://github.com/fdu-harry/ECG-MEE-metric.

artificial intelligence, information, machine learning, (18 more...)

2404.09729

Country: Asia > China (0.15)

Genre: Research Report > New Finding (0.88)

Industry:

Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)
Health & Medicine > Diagnostic Medicine (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

TimeGraphs: Graph-based Temporal Reasoning

Maheshwari, Paridhi, Ren, Hongyu, Wang, Yanan, Sosic, Rok, Leskovec, Jure

arXiv.org Artificial Intelligence, Jan-6-2024

Many real-world systems exhibit temporal, dynamic behaviors, which are captured as time series of complex agent interactions. To perform temporal reasoning, current methods primarily encode temporal dynamics through simple sequence-based models. However, in general these models fail to efficiently capture the full spectrum of rich dynamics in the input, since the dynamics are not uniformly distributed. In particular, relevant information might be harder to extract and computing power is wasted on processing all individual timesteps, even if they contain no significant changes or no new information. Here we propose TimeGraphs, a novel approach that characterizes dynamic interactions as a hierarchical temporal graph, diverging from traditional sequential representations. Our approach models the interactions using a compact graph-based representation, enabling adaptive reasoning across diverse time scales. Adopting a self-supervised method, TimeGraphs constructs a multi-level event hierarchy from a temporal input, which is then used to efficiently reason about the unevenly distributed dynamics. This construction process is scalable and incremental to accommodate streaming data. We evaluate TimeGraphs on multiple datasets with complex, dynamic agent interactions, including a football simulator, the Resistance game, and the MOMA human activity dataset. The results demonstrate both robustness and efficiency of TimeGraphs on a range of temporal reasoning tasks. Our approach obtains state-of-the-art performance and leads to a performance increase of up to 12.2% on event prediction and recognition tasks over current approaches. Our experiments further demonstrate a wide array of capabilities including zero-shot generalization, robustness in case of data sparsity, and adaptability to streaming data flow.

artificial intelligence, machine learning, temporal reasoning, (18 more...)

2401.03134

Genre: Research Report > New Finding (0.48)

Industry: Leisure & Entertainment > Sports (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Temporal Reasoning (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

EVBattery: A Large-Scale Electric Vehicle Dataset for Battery Health and Capacity Estimation

He, Haowei, Zhang, Jingzhao, Wang, Yanan, Jiang, Benben, Huang, Shaobo, Wang, Chen, Zhang, Yang, Xiong, Gengang, Han, Xuebing, Guo, Dongxu, He, Guannan, Ouyang, Minggao

arXiv.org Artificial Intelligence, Nov-1-2023

Electric vehicles (EVs) play an important role in reducing carbon emissions. As EV adoption accelerates, safety issues caused by EV batteries have become an important research topic. In order to benchmark and develop data-driven methods for this task, we introduce a large and comprehensive dataset of EV batteries. Our dataset includes charging records collected from hundreds of EVs from three manufacturers over several years. Our dataset is the first large-scale public dataset on real-world battery data, as existing datasets either include only a few vehicles or are collected in lab environments. Our dataset also features two types of labels, corresponding to two key tasks: battery health estimation and battery capacity estimation. In addition to demonstrating how existing deep learning algorithms can be applied to this task, we further develop an algorithm that exploits the data structure of battery systems. Our algorithm achieves better results and shows that a customized method can improve model performance. We hope that this public dataset provides valuable resources for researchers, policymakers, and industry professionals to better understand the dynamics of EV battery aging and support the transition toward a sustainable transportation system.

artificial intelligence, deep learning, machine learning, (18 more...)

2201.12358

Country:

Europe (0.28)
North America > United States > Pennsylvania (0.14)

Genre: Research Report > New Finding (0.46)

Industry:

Transportation > Ground > Road (1.00)
Transportation > Electric Vehicle (1.00)
Energy > Energy Storage (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.94)

What are Public Concerns about ChatGPT? A Novel Self-Supervised Neural Topic Model Tells You

Wang, Rui, Liu, Xing, Wang, Yanan, Huang, Haiping

arXiv.org Artificial Intelligence, Oct-26-2023

The recently released artificial intelligence conversational agent, ChatGPT, has gained significant attention in academia and real life. A multitude of early ChatGPT users eagerly explore its capabilities and share their opinions on it via social media. Both user queries and social media posts express public concerns regarding this advanced dialogue system. To mine public concerns about ChatGPT, this paper proposes a novel Self-Supervised neural Topic Model (SSTM), which formalizes topic modeling as a representation learning procedure. Extensive experiments have been conducted on Twitter posts about ChatGPT and queries asked by ChatGPT users. Experimental results demonstrate that the proposed approach can extract higher-quality public concerns with improved interpretability and diversity, surpassing the performance of state-of-the-art approaches.

large language model, machine learning, natural language, (6 more...)

2309.01522

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

VideoAdviser: Video Knowledge Distillation for Multimodal Transfer Learning

Wang, Yanan, Zeng, Donghuo, Wada, Shinya, Kurihara, Satoshi

arXiv.org Artificial Intelligence, Sep-27-2023

Multimodal transfer learning aims to transform pretrained representations of diverse modalities into a common domain space for effective multimodal fusion. However, conventional systems are typically built on the assumption that all modalities exist, and the lack of modalities always leads to poor inference performance. Furthermore, extracting pretrained embeddings for all modalities is computationally inefficient for inference. In this work, to achieve high efficiency-performance multimodal transfer learning, we propose VideoAdviser, a video knowledge distillation method to transfer multimodal knowledge of video-enhanced prompts from a multimodal fundamental model (teacher) to a specific modal fundamental model (student). With an intuition that the best learning performance comes with professional advisers and smart students, we use a CLIP-based teacher model to provide expressive multimodal knowledge supervision signals to a RoBERTa-based student model via optimizing a step-distillation objective loss. In the first step, the teacher distills multimodal knowledge of video-enhanced prompts from classification logits to a regression logit; in the second step, the multimodal knowledge is distilled from the regression logit of the teacher to the student. We evaluate our method in two challenging multimodal tasks: video-level sentiment analysis (MOSI and MOSEI datasets) and audio-visual retrieval (VEGAS dataset). The student (requiring only the text modality as input) achieves an MAE score improvement of up to 12.3% for MOSI and MOSEI. Our method further enhances the state-of-the-art method by 3.4% mAP score for VEGAS without additional computations for inference. These results suggest the strengths of our method for achieving high efficiency-performance multimodal transfer learning.
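Distilling from a teacher's regression output to a student can be sketched as a loss that mixes a supervised term with a teacher-matching term. The abstract does not give the exact step-distillation objective, so the following is a generic illustration under assumptions: both terms use mean squared error, and the function name and the weighting parameter `alpha` are hypothetical.

```python
def distillation_loss(student_out, teacher_out, target, alpha=0.5):
    """Weighted sum of a supervised term and a teacher-matching term.

    Both terms are mean squared errors; alpha sets the weight of the
    teacher-matching term and (1 - alpha) the weight of the supervised term.
    """
    n = len(target)
    task = sum((s - y) ** 2 for s, y in zip(student_out, target)) / n
    distill = sum((s - t) ** 2 for s, t in zip(student_out, teacher_out)) / n
    return (1 - alpha) * task + alpha * distill


# Hypothetical regression outputs (for example, sentiment scores) for two samples.
student_out = [1.0, 2.0]
teacher_out = [1.5, 2.0]
target = [2.0, 2.0]
loss = distillation_loss(student_out, teacher_out, target)
print(loss)
```

With these toy values the supervised term is 0.5 and the teacher-matching term is 0.125, giving a combined loss of 0.3125 at `alpha=0.5`. At inference time only the student is evaluated, which is where the efficiency gain of distillation comes from.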

machine learning, multimodal transfer learning, video knowledge distillation, (2 more...)

doi: 10.1109/ACCESS.2023.3280187.

2309.15494

Genre: Research Report (1.00)

Industry: Education (0.53)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Transfer Learning (1.00)

VQA-GNN: Reasoning with Multimodal Knowledge via Graph Neural Networks for Visual Question Answering

Wang, Yanan, Yasunaga, Michihiro, Ren, Hongyu, Wada, Shinya, Leskovec, Jure

arXiv.org Artificial Intelligence, Sep-15-2023

Visual question answering (VQA) requires systems to perform concept-level reasoning by unifying unstructured (e.g., the context in question and answer; "QA context") and structured (e.g., knowledge graph for the QA context and scene; "concept graph") multimodal knowledge. Existing works typically combine a scene graph and a concept graph of the scene by connecting corresponding visual nodes and concept nodes, then incorporate the QA context representation to perform question answering. However, these methods only perform a unidirectional fusion from unstructured knowledge to structured knowledge, limiting their potential to capture joint reasoning over the heterogeneous modalities of knowledge. To perform more expressive reasoning, we propose VQA-GNN, a new VQA model that performs bidirectional fusion between unstructured and structured multimodal knowledge to obtain unified knowledge representations. Specifically, we inter-connect the scene graph and the concept graph through a super node that represents the QA context, and introduce a new multimodal GNN technique to perform inter-modal message passing for reasoning that mitigates representational gaps between modalities. On two challenging VQA tasks (VCR and GQA), our method outperforms strong baseline VQA methods by 3.2% on VCR (Q-AR) and 4.6% on GQA, suggesting its strength in performing concept-level reasoning. Ablation studies further demonstrate the efficacy of the bidirectional fusion and multimodal GNN method in unifying unstructured and structured multimodal knowledge.
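The role of a super node that links two graphs can be illustrated with a toy round of message passing. This sketch uses plain mean aggregation with self-loops, which is an assumption for illustration and not the multimodal GNN proposed in the paper. The node names, features, and edges are hypothetical.

```python
def message_passing_round(features, edges):
    """One round of mean aggregation over an undirected graph with self-loops."""
    neighbors = {node: [node] for node in features}
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)
    updated = {}
    for node, nbrs in neighbors.items():
        dim = len(features[node])
        updated[node] = [sum(features[n][i] for n in nbrs) / len(nbrs) for i in range(dim)]
    return updated


# A scene-graph node and a concept-graph node, connected only through
# a super node standing in for the QA context.
features = {
    "scene_a": [1.0, 0.0],
    "concept_a": [0.0, 1.0],
    "qa_context": [0.0, 0.0],
}
edges = [("scene_a", "qa_context"), ("concept_a", "qa_context")]

after_one = message_passing_round(features, edges)
after_two = message_passing_round(after_one, edges)
print(after_one)
print(after_two)
```

After one round the super node holds a mix of scene and concept features; after the second round that mix flows back, so the scene node carries concept information and vice versa. This is the sense in which fusion through a shared node is bidirectional.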

machine learning, natural language, question answering, (19 more...)

2205.11501

Country: North America > United States (0.14)

Genre: Research Report > New Finding (0.68)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.83)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.66)