
Collaborating Authors

 Chen, Junyi


Aether: Geometric-Aware Unified World Modeling

arXiv.org Artificial Intelligence

The integration of geometric reconstruction and generative modeling remains a critical challenge in developing AI systems capable of human-like spatial reasoning. This paper proposes Aether, a unified framework that enables geometry-aware reasoning in world models by jointly optimizing three core capabilities: (1) 4D dynamic reconstruction, (2) action-conditioned video prediction, and (3) goal-conditioned visual planning. Through task-interleaved feature learning, Aether achieves synergistic knowledge sharing across reconstruction, prediction, and planning objectives. Building upon video generation models, our framework demonstrates unprecedented synthetic-to-real generalization despite never observing real-world data during training. Furthermore, our approach achieves zero-shot generalization in both action following and reconstruction tasks, thanks to its intrinsic geometric modeling. Remarkably, even without real-world data, its reconstruction performance is comparable to, or even better than, that of domain-specific models. Additionally, Aether employs camera trajectories as geometry-informed action spaces, enabling effective action-conditioned prediction and visual planning. We hope our work inspires the community to explore new frontiers in physically reasonable world modeling and its applications.
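
A minimal sketch of the idea above of using camera trajectories as a geometry-informed action space for action-conditioned prediction. This is not the authors' implementation; the module, its dimensions, and the 7-dimensional pose format (translation plus quaternion) are illustrative assumptions.

# Illustrative sketch only: a video predictor conditioned on camera poses as actions.
import torch.nn as nn

class TrajectoryConditionedPredictor(nn.Module):
    def __init__(self, latent_dim=256, pose_dim=7):  # pose: 3D translation + quaternion (assumed)
        super().__init__()
        self.pose_encoder = nn.Sequential(
            nn.Linear(pose_dim, latent_dim), nn.GELU(), nn.Linear(latent_dim, latent_dim))
        layer = nn.TransformerEncoderLayer(d_model=latent_dim, nhead=8, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=4)
        self.head = nn.Linear(latent_dim, latent_dim)

    def forward(self, frame_latents, camera_trajectory):
        # frame_latents:     (B, T, latent_dim) latents of observed frames
        # camera_trajectory: (B, T, pose_dim)   per-frame camera poses, used as the action
        action_tokens = self.pose_encoder(camera_trajectory)
        fused = self.backbone(frame_latents + action_tokens)  # action-conditioned features
        return self.head(fused)  # latents from which future frames would be decoded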


Make Full Use of Testing Information: An Integrated Accelerated Testing and Evaluation Method for Autonomous Driving Systems

arXiv.org Artificial Intelligence

Testing and evaluation are important steps before the large-scale application of autonomous driving systems (ADSs). Based on the three-level scenario-abstraction theory, testing can be performed within a logical scenario, followed by an evaluation stage that takes as input the testing results of the concrete scenarios generated from the logical parameter space. During this process, abundant testing information is produced, which is beneficial for a comprehensive and accurate evaluation. To make full use of this information, this paper proposes an Integrated accelerated Testing and Evaluation Method (ITEM). Building on a Monte Carlo Tree Search (MCTS) paradigm and the dual-surrogates testing framework proposed in our previous work, this paper carries the intermediate information generated during the testing stage (i.e., the tree structure, including the affiliation of each historical sampled point with the subspaces and the parent-child relationships between subspaces) into the evaluation stage to achieve accurate hazardous domain identification. Moreover, to better serve this purpose, the UCB calculation is improved so that the search algorithm focuses more on hazardous domain boundaries. Further, a stopping condition is constructed based on the convergence of the search algorithm. Ablation and comparative experiments are conducted to verify the effectiveness of the improvements and the superiority of the proposed method. The experimental results show that ITEM identifies hazardous domains well in both low- and high-dimensional cases, regardless of the shape of the hazardous domains, indicating its generality and potential for the safety evaluation of ADSs.
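
As a rough illustration of the boundary-focused UCB modification mentioned above, the sketch below scores a subspace node with a standard UCB term plus a bonus for subspaces whose sampled points are a mix of hazardous and safe outcomes (i.e., likely boundary regions). The attribute names and the form of the bonus are assumptions, not ITEM's exact formula.

# Illustrative sketch only: a boundary-biased UCB score for subspace selection.
import math

def boundary_biased_ucb(node, total_visits, c=1.41, boundary_weight=0.5):
    """node.value: mean criticality in this subspace; node.visits: samples drawn;
    node.hazard_ratio: fraction of sampled points labeled hazardous (0..1)."""
    if node.visits == 0:
        return float("inf")  # always expand unvisited subspaces first
    exploit = node.value
    explore = c * math.sqrt(math.log(total_visits) / node.visits)
    # Subspaces with a mixed hazardous/safe ratio (around 0.5) likely straddle a
    # hazardous-domain boundary, so they receive an extra bonus.
    boundary_bonus = boundary_weight * (1.0 - abs(2.0 * node.hazard_ratio - 1.0))
    return exploit + explore + boundary_bonus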


Two Birds with One Stone: Improving Rumor Detection by Addressing the Unfairness Issue

arXiv.org Artificial Intelligence

The performance degradation and group unfairness caused by confounding sensitive attributes in rumor detection remain relatively unexplored. To address this, we propose a two-step framework. First, it identifies the confounding sensitive attributes that limit rumor detection performance and cause unfairness across groups. Second, it learns equally informative representations through invariant learning. Our method handles diverse sets of groups without requiring sensitive attribute annotations. Experiments show that our method integrates easily with existing rumor detectors, significantly improving both their detection performance and fairness.
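
The invariant learning step above can be illustrated with a generic group-invariance penalty; the sketch below adds the variance of per-group risks to the usual cross-entropy (a V-REx-style surrogate). It assumes group identifiers have already been inferred and is not the paper's exact objective.

# Illustrative sketch only: cross-entropy plus a penalty on risk differences across groups.
import torch
import torch.nn.functional as F

def group_invariant_loss(logits, labels, group_ids, lambda_inv=1.0):
    ce = F.cross_entropy(logits, labels, reduction="none")
    group_risks = torch.stack([ce[group_ids == g].mean() for g in group_ids.unique()])
    if group_risks.numel() < 2:
        return ce.mean()  # a single group: nothing to equalize
    # Penalize differences in risk across groups so no group is under-served.
    return ce.mean() + lambda_inv * group_risks.var()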


LAMBDA: Covering the Multimodal Critical Scenarios for Automated Driving Systems by Search Space Quantization

arXiv.org Artificial Intelligence

Scenario-based virtual testing is one of the most significant methods for testing and evaluating the safety of automated driving systems (ADSs). However, it is impractical to enumerate all concrete scenarios in a logical scenario space and test them exhaustively. Recently, Black-Box Optimization (BBO) was introduced to accelerate scenario-based testing of ADSs by utilizing historical test information to generate new test cases. However, a single optimum found by a BBO algorithm is insufficient for a comprehensive safety evaluation of ADSs in a logical scenario. In fact, all the subspaces representing danger in the logical scenario space, rather than only the most critical concrete scenario, matter for the safety evaluation. Covering as many critical concrete scenarios as possible in a logical scenario space with a limited number of tests is defined as the Black-Box Coverage (BBC) problem in this paper. We formalize this problem in a sample-based search paradigm and construct a coverage criterion with Confusion Matrix Analysis. Furthermore, we propose LAMBDA (Latent-Action Monte-Carlo Beam Search with Density Adaption) to solve BBC problems. LAMBDA quickly focuses on critical subspaces by recursively partitioning the logical scenario space into accepted and rejected parts. Compared with its predecessor LaMCTS, LAMBDA introduces sampling density to overcome the sampling bias from optimization, and Beam Search to obtain more parallelizability. Experimental results show that LAMBDA achieves state-of-the-art performance among all baselines and is up to 33 and 6,000 times faster than Random Search at reaching 95% coverage of the critical areas in 2- and 5-dimensional synthetic functions, respectively. The experiments also demonstrate that LAMBDA has a promising future in the safety evaluation of ADSs in virtual tests.
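
To make the recursive-partitioning idea concrete, the sketch below splits a logical scenario space into subspaces, records each subspace's critical rate and sampling density, and descends toward critical but sparsely sampled regions. It is a simplified stand-in for LAMBDA (no latent actions, beam search, or learned classifier), with all thresholds chosen arbitrarily.

# Illustrative sketch only: recursive partitioning with density-aware descent.
import numpy as np

def partition(samples, labels, depth=0, max_depth=5, min_samples=20):
    """samples: (N, D) scenario parameters; labels: (N,) 1 if critical, 0 otherwise."""
    node = {"low": samples.min(axis=0), "high": samples.max(axis=0),
            "critical_rate": float(labels.mean()), "density": len(samples), "children": []}
    if depth >= max_depth or len(samples) < min_samples or labels.min() == labels.max():
        return node  # subspace is small or pure: stop splitting
    dim = int(np.argmax(samples.std(axis=0)))      # split along the widest dimension
    threshold = np.median(samples[:, dim])
    left = samples[:, dim] <= threshold
    if left.all() or not left.any():
        return node  # degenerate split: keep as a leaf
    for mask in (left, ~left):
        node["children"].append(
            partition(samples[mask], labels[mask], depth + 1, max_depth, min_samples))
    return node

def next_subspace(node, density_weight=1e-3):
    """Descend toward critical but sparsely sampled subspaces (density adaption)."""
    while node["children"]:
        node = max(node["children"],
                   key=lambda c: c["critical_rate"] - density_weight * c["density"])
    return node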


A Survey on Large Language Models for Personalized and Explainable Recommendations

arXiv.org Artificial Intelligence

In recent years, Recommender Systems (RS) have witnessed a transformative shift with the advent of Large Language Models (LLMs) in the field of Natural Language Processing (NLP). These models, such as OpenAI's GPT-3.5/4 and Meta's Llama, have demonstrated unprecedented capabilities in understanding and generating human-like text. This has led to a paradigm shift in personalized and explainable recommendations, as LLMs offer a versatile toolset for processing vast amounts of textual data to enhance user experiences. To provide a comprehensive understanding of existing LLM-based recommendation systems, this survey analyzes how RS can benefit from LLM-based methodologies. Furthermore, we describe the major challenges in Personalized Explanation Generation (PEG) tasks, namely cold-start, unfairness, and bias problems in RS.


TinyFormer: Efficient Transformer Design and Deployment on Tiny Devices

arXiv.org Artificial Intelligence

Developing deep learning models on tiny devices (e.g., microcontroller units, MCUs) has attracted much attention in various embedded IoT applications. However, it is challenging to efficiently design and deploy recent advanced models (e.g., transformers) on tiny devices due to their severe hardware resource constraints. In this work, we propose TinyFormer, a framework specifically designed to develop and deploy resource-efficient transformers on MCUs. TinyFormer mainly consists of SuperNAS, SparseNAS, and SparseEngine. Specifically, SuperNAS searches for an appropriate supernet within a vast search space, SparseNAS then identifies the best sparse single-path model, including a transformer architecture, from that supernet, and SparseEngine efficiently deploys the searched sparse models onto MCUs. To the best of our knowledge, SparseEngine is the first deployment framework capable of performing inference with sparse transformer models on MCUs. Evaluation results on the CIFAR-10 dataset demonstrate that TinyFormer can develop efficient transformers with an accuracy of 96.1% while adhering to hardware constraints of 1 MB storage and 320 KB memory. Additionally, TinyFormer achieves significant speedups in sparse inference, up to 12.2x, when compared to the CMSIS-NN library. We believe TinyFormer will bring powerful transformers into TinyML scenarios and greatly expand the scope of deep learning applications.
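
A toy sketch of the search-under-hardware-constraints idea behind the SuperNAS/SparseNAS stages: sample single-path candidates from a hypothetical search space, discard those whose estimated storage exceeds the MCU budget, and keep the best-scoring survivor. The search space, the parameter-count estimate, and the evaluation hook are all illustrative assumptions, not TinyFormer's actual procedure.

# Illustrative sketch only: constrained random sampling of single-path candidates.
import random

SEARCH_SPACE = {                      # hypothetical per-model choices
    "embed_dim":  [64, 96, 128],
    "num_heads":  [2, 4],
    "ffn_ratio":  [2, 4],
    "num_layers": [2, 4, 6],
    "sparsity":   [0.5, 0.7, 0.9],    # fraction of weights pruned
}

def sample_candidate():
    return {k: random.choice(v) for k, v in SEARCH_SPACE.items()}

def estimate_storage_kb(c):
    # Rough parameter-count proxy: attention + FFN weights per layer, scaled by
    # the kept (non-pruned) fraction, assuming 1 byte per int8 parameter.
    per_layer = 4 * c["embed_dim"] ** 2 + 2 * c["ffn_ratio"] * c["embed_dim"] ** 2
    params = per_layer * c["num_layers"] * (1.0 - c["sparsity"])
    return params / 1024.0

def search(n_trials=1000, storage_budget_kb=1024, evaluate=lambda c: random.random()):
    best, best_acc = None, -1.0
    for _ in range(n_trials):
        cand = sample_candidate()
        if estimate_storage_kb(cand) > storage_budget_kb:
            continue                  # violates the 1 MB storage constraint
        acc = evaluate(cand)          # placeholder for supernet-based accuracy estimation
        if acc > best_acc:
            best, best_acc = cand, acc
    return best, best_acc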


EVE: Efficient Vision-Language Pre-training with Masked Prediction and Modality-Aware MoE

arXiv.org Artificial Intelligence

Building scalable vision-language models that learn from diverse, multimodal data remains an open challenge. In this paper, we introduce an Efficient Vision-languagE foundation model, namely EVE, a unified multimodal Transformer pre-trained solely with one unified pre-training task. Specifically, EVE encodes both vision and language within a shared Transformer network integrated with modality-aware sparse Mixture-of-Experts (MoE) modules, which capture modality-specific information by selectively switching to different experts. To unify the pre-training of vision and language, EVE performs masked signal modeling on image-text pairs, reconstructing masked signals (i.e., image pixels and text tokens) from the visible ones. This simple yet effective pre-training objective accelerates training by 3.5x compared to a model pre-trained with Image-Text Contrastive and Image-Text Matching losses. Owing to the combination of the unified architecture and pre-training task, EVE is easy to scale up, enabling better downstream performance with fewer resources and faster training. Despite its simplicity, EVE achieves state-of-the-art performance on various vision-language downstream tasks, including visual question answering, visual reasoning, and image-text retrieval.
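
The modality-aware MoE idea above can be sketched as a feed-forward block that routes image tokens and text tokens to separate expert sets inside a shared layer. The sketch below uses soft routing over two experts per modality for brevity; it is an assumption-laden illustration, not EVE's implementation.

# Illustrative sketch only: separate expert sets per modality within one shared block.
import torch
import torch.nn as nn

class ModalityAwareMoE(nn.Module):
    def __init__(self, dim=512, num_experts_per_modality=2):
        super().__init__()
        def expert():
            return nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
        self.vision_experts = nn.ModuleList(expert() for _ in range(num_experts_per_modality))
        self.text_experts = nn.ModuleList(expert() for _ in range(num_experts_per_modality))
        self.vision_router = nn.Linear(dim, num_experts_per_modality)
        self.text_router = nn.Linear(dim, num_experts_per_modality)

    def forward(self, tokens, is_image):
        # tokens: (B, T, dim); is_image: (B, T) bool mask marking image tokens
        out = torch.zeros_like(tokens)
        for mask, experts, router in (
            (is_image, self.vision_experts, self.vision_router),
            (~is_image, self.text_experts, self.text_router),
        ):
            x = tokens[mask]                       # (N, dim) tokens of one modality
            if x.numel() == 0:
                continue
            weights = router(x).softmax(dim=-1)    # soft routing over this modality's experts
            expert_out = torch.stack([e(x) for e in experts], dim=-1)  # (N, dim, E)
            out[mask] = (expert_out * weights.unsqueeze(1)).sum(-1)
        return out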


Evolving Testing Scenario Generation Method and Intelligence Evaluation Framework for Automated Vehicles

arXiv.org Artificial Intelligence

Interaction between background vehicles (BVs) and automated vehicles (AVs) in scenario-based testing plays a critical role in evaluating the intelligence of AVs. Current testing scenarios typically employ predefined or scripted BVs, which inadequately reflect the complexity of human-like social behaviors in real-world driving, and they also lack a systematic metric for evaluating the comprehensive intelligence of AVs. Therefore, this paper proposes an evolving scenario generation method that utilizes deep reinforcement learning (DRL) to create human-like BVs for testing and intelligence evaluation of AVs. First, a class of driver models with human-like competitive, cooperative, and mutual driving motivations is designed. Then, using an improved "level-k" training procedure, the three distinct driver models acquire game-based interactive driving policies. These models are assigned to BVs to generate evolving scenarios in which all BVs interact continuously and evolve diverse content. Next, a framework covering safety, driving efficiency, and interaction utility is presented to evaluate and quantify the intelligence of three systems under test (SUTs), indicating the effectiveness of the evolving scenario for intelligence testing. Finally, the complexity and fidelity of the proposed evolving testing scenario are validated. The results demonstrate that the proposed evolving scenario exhibits the highest level of complexity compared to other baseline scenarios and has more than 85% similarity to naturalistic driving data. This highlights the potential of the proposed method to facilitate the development and evaluation of high-level AVs in a realistic and challenging environment.
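
The "level-k" training procedure above can be pictured as an iterative loop in which the level-k driver model is trained by DRL against opponents frozen at level k-1. The sketch below shows only that control flow; make_env and train_policy are hypothetical hooks, and the level-0 policy is a placeholder, so this is not the paper's actual training code.

# Illustrative sketch only: iterative level-k training against frozen lower-level opponents.
def train_level_k(make_env, train_policy, max_level=3):
    """make_env(opponent_policies) -> env; train_policy(env) -> trained policy."""
    # Level-0: a simple non-strategic policy (e.g., rule-based car following).
    policies = {0: lambda obs: "keep_lane"}
    for k in range(1, max_level + 1):
        # Opponents act with the level-(k-1) policy while the level-k agent learns
        # a best response via DRL; repeating this yields game-based interactive policies.
        env = make_env(opponent_policies=[policies[k - 1]])
        policies[k] = train_policy(env)
    return policies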


PiggyBack: Pretrained Visual Question Answering Environment for Backing up Non-deep Learning Professionals

arXiv.org Artificial Intelligence

We propose PiggyBack, a Visual Question Answering platform that allows users to easily apply state-of-the-art visual-language pretrained models. PiggyBack supports the full stack of visual question answering tasks, specifically data processing, model fine-tuning, and result visualisation. We integrate visual-language models pretrained and hosted on HuggingFace, an open-source API platform for deep learning technologies; however, these models cannot be run without programming skills or an understanding of deep learning. Hence, PiggyBack provides an easy-to-use, browser-based user interface with several pretrained deep learning visual-language models for general users and domain experts. PiggyBack offers the following benefits: free availability under the MIT License; portability, as it is web-based and thus runs on almost any platform; a comprehensive data creation and processing workflow; and ease of use with deep learning-based visual-language pretrained models. The demo video is available on YouTube and can be found at https://youtu.be/iz44RZ1lF4s.