AITopics | Yang, Xu

Collaborating Authors

Yang, Xu

DTA: Dual Temporal-channel-wise Attention for Spiking Neural Networks

Kim, Minje, Kim, Minjun, Yang, Xu

arXiv.org Artificial Intelligence, Mar-13-2025

Spiking Neural Networks (SNNs) present a more energy-efficient alternative to Artificial Neural Networks (ANNs) by harnessing spatio-temporal dynamics and event-driven spikes. Effective utilization of temporal information is crucial for SNNs, leading to the exploration of attention mechanisms to enhance this capability. Conventional attention operations either apply an identical operation or employ non-identical operations across target dimensions. We identify that these approaches provide distinct perspectives on temporal information. To leverage the strengths of both operations, we propose a novel Dual Temporal-channel-wise Attention (DTA) mechanism that integrates both identical and non-identical attention strategies. To the best of our knowledge, this is the first attempt to concentrate on both the correlation and dependency of temporal-channel using both identical and non-identical attention operations. Experimental results demonstrate that the DTA mechanism achieves state-of-the-art performance on both static datasets (CIFAR10, CIFAR100, ImageNet-1k) and a dynamic dataset (CIFAR10-DVS), elevating spike representation and capturing complex temporal-channel relationships. We open-source our code: https://github.com/MnJnKIM/DTA-SNN.

artificial intelligence, machine learning, neural network, (18 more...)

arXiv: 2503.10052

Genre: Research Report > New Finding (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

LMM-R1: Empowering 3B LMMs with Strong Reasoning Abilities Through Two-Stage Rule-Based RL

Peng, Yingzhe, Zhang, Gongrui, Zhang, Miaosen, You, Zhiyuan, Liu, Jie, Zhu, Qipeng, Yang, Kai, Xu, Xingzhong, Geng, Xin, Yang, Xu

arXiv.org Artificial Intelligence, Mar-10-2025

Enhancing reasoning in Large Multimodal Models (LMMs) faces unique challenges from the complex interplay between visual perception and logical reasoning, particularly in compact 3B-parameter architectures where architectural constraints limit reasoning capacity and modality alignment. While rule-based reinforcement learning (RL) excels in text-only domains, its multimodal extension confronts two critical barriers: (1) data limitations due to ambiguous answers and scarce complex reasoning examples, and (2) degraded foundational reasoning induced by multimodal pretraining. To address these challenges, we propose LMM-R1, a two-stage framework adapting rule-based RL for multimodal reasoning through Foundational Reasoning Enhancement (FRE) followed by Multimodal Generalization Training (MGT). The FRE stage first strengthens reasoning abilities using text-only data with rule-based RL, then the MGT stage generalizes these reasoning capabilities to multimodal domains. Experiments on Qwen2.5-VL-Instruct-3B demonstrate that LMM-R1 achieves 4.83% and 4.5% average improvements over baselines in multimodal and text-only benchmarks, respectively, with a 3.63% gain in complex Football Game tasks. These results validate that text-based reasoning enhancement enables effective multimodal generalization, offering a data-efficient paradigm that bypasses costly high-quality multimodal training data.

large language model, machine learning, natural language, (20 more...)

arXiv: 2503.07536

Country:

North America > United States > California (0.14)
Asia > China (0.14)

Genre: Research Report > New Finding (0.92)

Industry: Education (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Rule-Based Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
(2 more...)

Speculative Ensemble: Fast Large Language Model Ensemble via Speculation

Fu, Jiale, Jiang, Yuchu, Chen, Junkai, Fan, Jiaming, Geng, Xin, Yang, Xu

arXiv.org Artificial Intelligence, Feb-1-2025

Ensemble methods enhance Large Language Models (LLMs) by combining multiple models but suffer from high computational costs. In this paper, we introduce Speculative Ensemble (SE), a novel framework that accelerates LLM ensembles without sacrificing performance. It is inspired by speculative decoding, where a small proposal model generates tokens sequentially and a larger target model verifies them in parallel. Our approach builds on two key insights: (1) the verification distribution can be the ensemble distribution of both the proposal and target models, and (2) alternating each model as the proposer and verifier can further enhance efficiency. We generalize this method to ensembles with n models and theoretically prove that SE is never slower than a standard ensemble and is typically faster. Extensive experiments demonstrate speed improvements of 1.11x-2.23x over standard ensemble techniques without compromising generation quality. Our code is available at https://github.com/Kamichanw/Speculative-Ensemble/
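The first insight can be illustrated with a single-token sketch. It is not the authors' implementation: it uses the standard speculative-sampling accept/reject rule, which the abstract does not spell out, and it assumes a simple weighted average as the ensemble distribution. The function names and the `weight` parameter are hypothetical.

```python
import random

def ensemble_distribution(q, p, weight=0.5):
    """Weighted average of proposal distribution q and target distribution p.
    The averaging rule and `weight` are assumptions made for illustration."""
    return [weight * qi + (1.0 - weight) * pi for qi, pi in zip(q, p)]

def residual_distribution(q, r):
    """Normalized max(0, r - q), used to resample after a rejected draft."""
    diff = [max(0.0, ri - qi) for qi, ri in zip(q, r)]
    total = sum(diff)
    if total == 0.0:  # q equals r, so a draft is never rejected
        return list(r)
    return [d / total for d in diff]

def speculative_sample(q, r, rng=random):
    """Draw one token distributed according to r, using a draft drawn from q."""
    tokens = range(len(q))
    draft = rng.choices(tokens, weights=q)[0]
    # Accept the draft with probability min(1, r[draft] / q[draft]).
    if rng.random() < min(1.0, r[draft] / q[draft]):
        return draft
    # Otherwise resample from the residual distribution.
    return rng.choices(tokens, weights=residual_distribution(q, r))[0]

def output_distribution(q, r):
    """Exact distribution of speculative_sample, computed analytically."""
    accepted = [min(qi, ri) for qi, ri in zip(q, r)]
    rejected_mass = 1.0 - sum(accepted)
    residual = residual_distribution(q, r)
    return [a + rejected_mass * s for a, s in zip(accepted, residual)]
```

Verifying against the ensemble distribution `r` rather than the target model's own distribution means the accepted tokens follow the ensemble, which `output_distribution(q, r)` confirms by reproducing `r` up to floating-point error.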

large language model, machine learning, natural language, (17 more...)

arXiv: 2502.01662

Country:

Europe > Austria > Vienna (0.14)
North America > United States > Hawaii (0.14)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)

Reinforcement learning Based Automated Design of Differential Evolution Algorithm for Black-box Optimization

Yang, Xu, Wang, Rui, Li, Kaiwen, Wang, Ling

arXiv.org Artificial Intelligence, Jan-22-2025

The differential evolution (DE) algorithm is recognized as one of the most effective evolutionary algorithms, demonstrating remarkable efficacy in black-box optimization due to its derivative-free nature. Numerous enhancements to the fundamental DE have been proposed, incorporating innovative mutation strategies and sophisticated parameter tuning techniques to improve performance. However, no single variant has proven universally superior across all problems. To address this challenge, we introduce a novel framework that employs reinforcement learning (RL) to automatically design DE for black-box optimization through meta-learning. RL acts as an advanced meta-optimizer, generating a customized DE configuration that includes an optimal initialization strategy, update rule, and hyperparameters tailored to a specific black-box optimization problem. This process is informed by a detailed analysis of the problem characteristics. In this proof-of-concept study, we utilize a double deep Q-network for implementation, considering a subset of 40 possible strategy combinations and parameter optimizations simultaneously. The framework's performance is evaluated against black-box optimization benchmarks and compared with state-of-the-art algorithms. The experimental results highlight the promising potential of our proposed framework.
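For readers unfamiliar with the fundamental DE that the framework configures, a minimal sketch of the classic DE/rand/1/bin variant follows. It is not the RL-designed algorithm from the paper. The uniform initialization, the bound clipping, and the defaults for `pop_size`, `F`, and `CR` are illustrative assumptions; they are the kind of choices the abstract says the meta-optimizer selects.

```python
import random

def differential_evolution(f, bounds, pop_size=20, F=0.5, CR=0.9,
                           generations=100, seed=0):
    """Minimize f over box bounds with classic DE/rand/1/bin."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Initialization strategy (assumed here): uniform sampling inside the bounds.
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    fitness = [f(x) for x in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Mutation: pick three distinct individuals other than i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)  # guarantees one mutated coordinate
            trial = []
            for d in range(dim):
                if rng.random() < CR or d == j_rand:
                    # Binomial crossover takes the mutant coordinate.
                    v = pop[a][d] + F * (pop[b][d] - pop[c][d])
                    lo, hi = bounds[d]
                    v = min(max(v, lo), hi)  # clip to the feasible box
                else:
                    v = pop[i][d]
                trial.append(v)
            # Greedy selection: keep the trial only if it is no worse.
            trial_fitness = f(trial)
            if trial_fitness <= fitness[i]:
                pop[i], fitness[i] = trial, trial_fitness
    best = min(range(pop_size), key=fitness.__getitem__)
    return pop[best], fitness[best]
```

Only function values are used, never gradients, which is what makes the method applicable to black-box problems.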

evolutionary algorithm, machine learning, reinforcement learning, (14 more...)

arXiv: 2501.12881

Country:

Asia > China (0.28)
Europe > United Kingdom > England (0.14)

Genre:

Overview (0.68)
Research Report (0.50)

Industry:

Transportation > Air (1.00)
Energy (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (1.00)

STHFL: Spatio-Temporal Heterogeneous Federated Learning

Guo, Shunxin, Wang, Hongsong, Lin, Shuxia, Yang, Xu, Geng, Xin

arXiv.org Artificial Intelligence, Jan-10-2025

Federated learning is a new framework that protects data privacy and allows multiple devices to cooperate in training machine learning models. Previous studies have proposed multiple approaches to address the challenges posed by non-iid data and inter-domain heterogeneity. However, they ignore the spatio-temporal heterogeneity formed by the different data distributions of growing task data within a domain. Moreover, in practical applications the global data generally follows a long-tailed distribution rather than being balanced, as is often assumed. To tackle the spatio-temporal dilemma, we propose a novel setting named Spatio-Temporal Heterogeneity Federated Learning (STHFL). Specifically, the Global-Local Dynamic Prototype (GLDP) framework is designed for STHFL. In GLDP, the model in each client contains personalized layers that can dynamically adapt to different data distributions. For long-tailed data distributions, global prototypes serve as complementary knowledge for training on classes with few samples in clients, without leaking privacy. As tasks increase in clients, the knowledge of local prototypes generated in previous tasks guides training on the current task to address catastrophic forgetting. Meanwhile, the global-local prototypes are updated through the moving average method after training local prototypes in clients. Finally, we evaluate the effectiveness of GLDP, which achieves remarkable results compared to state-of-the-art methods in STHFL scenarios.
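The moving-average prototype update can be sketched as follows. The abstract does not give the exact rule, so this assumes that a prototype is the mean feature vector of a class and that the update is an exponential moving average. The function names and the `momentum` parameter are hypothetical.

```python
def class_prototype(features):
    """Mean feature vector of the samples belonging to one class."""
    count = len(features)
    return [sum(column) / count for column in zip(*features)]

def update_prototype(old, new, momentum=0.9):
    """Exponential moving average of an existing prototype and a fresh one.
    The `momentum` value is an assumption, not a setting from the paper."""
    return [momentum * o + (1.0 - momentum) * n for o, n in zip(old, new)]

# Example: blend a stored prototype with one computed from new local features.
stored = [0.0, 0.0]
fresh = class_prototype([[1.0, 2.0], [3.0, 4.0]])
blended = update_prototype(stored, fresh, momentum=0.5)
```

A higher `momentum` keeps more of the knowledge from earlier tasks, which is the property the abstract relies on to counter forgetting.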

artificial intelligence, machine learning, prototype, (14 more...)

arXiv: 2501.05775

Genre: Research Report (0.70)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

VinT-6D: A Large-Scale Object-in-hand Dataset from Vision, Touch and Proprioception

Wan, Zhaoliang, Ling, Yonggen, Yi, Senlin, Qi, Lu, Lee, Wangwei, Lu, Minglei, Yang, Sicheng, Teng, Xiao, Lu, Peng, Yang, Xu, Yang, Ming-Hsuan, Cheng, Hui

arXiv.org Artificial Intelligence, Jan-6-2025

This paper addresses the scarcity of large-scale datasets for accurate object-in-hand pose estimation, which is crucial for robotic in-hand manipulation within the "Perception-Planning-Control" paradigm. Specifically, we introduce VinT-6D, the first extensive multi-modal dataset integrating vision, touch, and proprioception, to enhance robotic manipulation. VinT-6D comprises 2 million VinT-Sim and 0.1 million VinT-Real splits, collected via simulations in MuJoCo and Blender and a custom-designed real-world platform. This dataset is tailored for robotic hands, offering models with whole-hand tactile perception and high-quality, well-aligned data. To the best of our knowledge, VinT-Real is the largest such dataset given the difficulty of collection in real-world environments, so it can help bridge the simulation-to-real gap compared with previous works. Built upon VinT-6D, we present a benchmark method that shows significant improvements in performance by fusing multi-modal information. The project is available at https://VinT-6D.github.io/.

artificial intelligence, large-scale object-in-hand dataset, sensor, (14 more...)

arXiv: 2501.0051

Country:

North America > United States > California (0.14)
Europe > Austria > Vienna (0.14)
Asia > China > Guangdong Province (0.14)

Genre: Research Report > New Finding (0.46)

Industry: Health & Medicine (0.93)

Technology: Information Technology > Artificial Intelligence > Robots > Manipulation (1.00)

BPQP: A Differentiable Convex Optimization Framework for Efficient End-to-End Learning

Pan, Jianming, Ye, Zeqi, Yang, Xiao, Yang, Xu, Liu, Weiqing, Wang, Lewen, Bian, Jiang

arXiv.org Artificial Intelligence, Dec-29-2024

Data-driven decision-making processes increasingly utilize end-to-end learnable deep neural networks to render final decisions. Sometimes, the output of the forward functions in certain layers is determined by the solutions to mathematical optimization problems, leading to the emergence of differentiable optimization layers that permit gradient back-propagation. However, real-world scenarios often involve large-scale datasets and numerous constraints, presenting significant challenges. Current methods for differentiating optimization problems typically rely on implicit differentiation, which necessitates costly computations on the Jacobian matrices, resulting in low efficiency. In this paper, we introduce BPQP, a differentiable convex optimization framework designed for efficient end-to-end learning. To enhance efficiency, we reformulate the backward pass as a simplified and decoupled quadratic programming problem by leveraging the structural properties of the KKT matrix. This reformulation enables the use of first-order optimization algorithms in calculating the backward pass gradients, allowing our framework to potentially utilize any state-of-the-art solver. As solver technologies evolve, BPQP can continuously adapt and improve its efficiency. Extensive experiments on both simulated and real-world datasets demonstrate that BPQP achieves a significant improvement in efficiency, typically an order of magnitude faster in overall execution time compared to other differentiable optimization layers. Our results not only highlight the efficiency gains of BPQP but also underscore its superiority over differentiable optimization layer baselines.
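To see why a backward pass can itself be posed as a quadratic program, consider the simplest special case: an equality-constrained QP layer with positive definite P. This is a textbook derivation under that assumption, not BPQP's general formulation, and the symbols P, q, A, b, K, g, and d are introduced here for illustration.

```latex
% Forward problem (assumed special case): equality-constrained QP, P \succ 0
z^\star(q) = \arg\min_z \; \tfrac{1}{2} z^\top P z + q^\top z
\quad \text{s.t.} \quad A z = b

% KKT conditions, written with the symmetric KKT matrix K
K \begin{bmatrix} z^\star \\ \nu^\star \end{bmatrix}
  = \begin{bmatrix} -q \\ b \end{bmatrix},
\qquad
K = \begin{bmatrix} P & A^\top \\ A & 0 \end{bmatrix}

% Differentiating the KKT system with respect to q
K \begin{bmatrix} \mathrm{d}z \\ \mathrm{d}\nu \end{bmatrix}
  = \begin{bmatrix} -\mathrm{d}q \\ 0 \end{bmatrix}

% Backward pass for a loss \ell(z^\star) with upstream gradient g = \partial \ell / \partial z^\star
K \begin{bmatrix} d_z \\ d_\nu \end{bmatrix}
  = \begin{bmatrix} g \\ 0 \end{bmatrix},
\qquad
\nabla_q \ell = -\, d_z

% The backward linear system is the KKT system of another QP
d_z = \arg\min_d \; \tfrac{1}{2} d^\top P d - g^\top d
\quad \text{s.t.} \quad A d = 0
```

Because the backward system is the optimality condition of a QP in d, it can be handed to a QP solver instead of being solved by factorizing K directly.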

institute of electrical and electronics engineers (ieee), machine learning, nvidia corporation, (28 more...)

arXiv: 2411.19285

Country: North America > United States > California (0.14)

Genre: Research Report > New Finding (0.34)

Industry: Energy > Oil & Gas > Upstream (0.77)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)

RL2: Reinforce Large Language Model to Assist Safe Reinforcement Learning for Energy Management of Active Distribution Networks

Yang, Xu, Lin, Chenhui, Liu, Haotian, Wu, Wenchuan

arXiv.org Artificial Intelligence, Dec-2-2024

As large-scale distributed energy resources are integrated into active distribution networks (ADNs), effective energy management in ADNs becomes increasingly prominent compared to traditional distribution networks. Although advanced reinforcement learning (RL) methods, which alleviate the burden of complicated modelling and optimization, have greatly improved the efficiency of energy management in ADNs, safety becomes a critical concern for RL applications in real-world problems. Since the design and adjustment of penalty functions, which correspond to operational safety constraints, require extensive domain knowledge in RL and power system operation, the emerging ADN operators call for a more flexible and customized approach to address the penalty functions so that operational safety and efficiency can be further enhanced. Empowered with strong comprehension, reasoning, and in-context learning capabilities, large language models (LLMs) provide a promising way to assist safe RL for energy management in ADNs. In this paper, we introduce the LLM to comprehend operational safety requirements in ADNs and generate corresponding penalty functions. In addition, we propose an RL2 mechanism to refine the generated functions iteratively and adaptively through multi-round dialogues, in which the LLM agent adjusts the functions' pattern and parameters based on the training and test performance of the downstream RL agent. The proposed method significantly reduces the intervention of the ADN operators. Comprehensive test results demonstrate the effectiveness of the proposed method.
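A penalty function of the kind described above can be sketched as a reward-shaping term. The abstract does not specify any constraint, so everything here is a hypothetical illustration: the voltage-limit constraint, the per-unit bounds of 0.95 and 1.05, the linear penalty pattern, and the `weight` parameter are all assumptions.

```python
def violation(value, lower, upper):
    """Amount by which a value falls outside the interval [lower, upper]."""
    return max(0.0, lower - value) + max(0.0, value - upper)

def penalized_reward(base_reward, voltages, lower=0.95, upper=1.05, weight=10.0):
    """Subtract a linear penalty for each bus voltage outside its limits.
    Bounds, weight, and the linear pattern are illustrative assumptions."""
    total_violation = sum(violation(v, lower, upper) for v in voltages)
    return base_reward - weight * total_violation
```

In the setting the abstract describes, the pattern (for example linear versus quadratic) and parameters such as `weight` are what the LLM agent would adjust between dialogue rounds, based on the downstream RL agent's performance.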

large language model, machine learning, reinforcement learning, (16 more...)

arXiv: 2412.01303

Country: Asia > China (0.28)

Genre: Research Report > New Finding (0.34)

Industry: Energy > Power Industry (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

DaDu-E: Rethinking the Role of Large Language Model in Robotic Computing Pipeline

Sun, Wenhao, Hou, Sai, Wang, Zixuan, Yu, Bo, Liu, Shaoshan, Yang, Xu, Liang, Shuai, Gan, Yiming, Han, Yinhe

arXiv.org Artificial Intelligence, Dec-2-2024

Performing complex tasks in open environments remains challenging for robots, even when using large language models (LLMs) as the core planner. Many LLM-based planners are inefficient due to their large number of parameters and prone to inaccuracies because they operate in open-loop systems. We think the reason is that only applying LLMs as planners is insufficient. In this work, we propose DaDu-E, a robust closed-loop planning framework for embodied AI robots. Specifically, DaDu-E is equipped with a relatively lightweight LLM, a set of encapsulated robot skill instructions, a robust feedback system, and memory augmentation. Together, these components enable DaDu-E to (i) actively perceive and adapt to dynamic environments, (ii) optimize computational costs while maintaining high performance, and (iii) recover from execution failures using its memory and feedback mechanisms. Extensive experiments on real-world and simulated tasks show that DaDu-E achieves task success rates comparable to embodied AI robots with larger models as planners like COME-Robot, while reducing computational requirements by 6.6×. Users are encouraged to explore our system at: https://rlc-lab.github.io/dadu-e/.

large language model, machine learning, natural language, (19 more...)

arXiv: 2412.01663

Country: Asia > China (0.28)

Genre: Research Report > New Finding (0.93)

Industry:

Information Technology (0.67)
Consumer Products & Services (0.47)
Health & Medicine > Consumer Health (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Redefining in Dictionary: Towards an Enhanced Semantic Understanding of Creative Generation

Feng, Fu, Xie, Yucheng, Yang, Xu, Wang, Jing, Geng, Xin

arXiv.org Artificial Intelligence, Nov-20-2024

Given the challenge that diffusion models face in directly generating creativity, existing methods typically rely on synthesizing reference prompts or images to achieve creative effects. For instance, to combine "Lettuce" and "Mantis" creatively, ConceptLab [43] merges tokens representing these concepts into a new composite token, while BASS [22] uses predefined sampling rules to search for creative outcomes from a large pool of candidate images. [...] each generation, which leads to high computational costs and limited practicality for online applications. In contrast, "a blue banana" can be generated directly without additional training, due to its clear and concrete semantics, especially [...]. Inspired by this, we may ask: Can we awaken the creativity of diffusion models by enhancing their semantic understanding of "creative"? To achieve this, we propose CreTok, which redefines "creative" as a new specialized token, [...], allowing it [...]. This redefinition enhances the model's semantic understanding for combinatorial creativity, as shown in Figure 1c. Specifically, CreTok builds on the definition of "creativity" from the TP2O task [22] for combinatorial object generation, [...] and an adaptive prompt (e.g., "A photo of a mixture").

[...] generated using [...]. Furthermore, this meta-creativity enables direct concept combinations without requiring additional training, much like generating "a [...]". This significantly reduces both time and computational complexity compared to state-of-the-art (SOTA) creative generation methods, such as ConceptLab [43] (4s vs. 120s per image, 30× speedup) and BASS [22] (4s vs. 2400s per image, 600× speedup), while maintaining linguistic [...]. Further evaluations using GPT-4o [1] and user studies indicate superior performance of CreTok in terms of integration, originality, and aesthetics, underscoring its effectiveness in fostering combinatorial creativity.

Our contributions are as follows: (1) We propose CreTok, a method designed to enhance models' meta-ability by enabling an enhanced understanding of abstract and ambiguous adjectives (e.g., "creative" or "beautiful") through their redefinition as new tokens. [...] we redefine the abstract term "creative" within our proposed CangJie dataset for the TP2O task, and introduce [...] text-to-image (T2I) models and creative generation methods in terms of computational complexity, human preference ratings, text-image alignment, and other key metrics.

[...] human-like creativity, a critical yet underexplored aspect of AI research [28, 29].

large language model, machine learning, natural language, (20 more...)

arXiv: 2410.2416

Genre:

Research Report > Promising Solution (0.46)
Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
