Yu, Hang
Every Sample Matters: Leveraging Mixture-of-Experts and High-Quality Data for Efficient and Accurate Code LLM
Codefuse, Ling Team: Cai, Wenting, Cao, Yuchen, Chen, Chaoyu, Chen, Chen, Chen, Siba, Cui, Qing, Di, Peng, Fang, Junpeng, Gong, Zi, Guo, Ting, He, Zhengyu, Huang, Yang, Li, Cong, Li, Jianguo, Li, Zheng, Lian, Shijie, Liu, BingChang, Luo, Songshan, Mao, Shuo, Shen, Min, Wu, Jian, Yang, Jiaolong, Yang, Wenjie, Ye, Tong, Yu, Hang, Zhang, Wei, Zhang, Zhenduo, Zhao, Hailin, Zheng, Xunjin, Zhou, Jun
Recent advancements in code large language models (LLMs) have demonstrated remarkable capabilities in code generation and understanding. However, building a code LLM that combines comprehensive performance with top efficiency remains challenging. The open-source community has released many attempts to break this performance-efficiency trade-off, such as the Qwen Coder series and the DeepSeek Coder series. This paper introduces yet another attempt in this area, namely Ling-Coder-Lite. We leverage the efficient Mixture-of-Experts (MoE) architecture along with a set of high-quality data curation methods (especially those based on program analytics) to build an efficient yet powerful code LLM. Ling-Coder-Lite exhibits on-par performance on 12 representative coding benchmarks compared to state-of-the-art models of similar size, such as Qwen2.5-Coder-7B and DeepSeek-Coder-V2-Lite, while offering competitive latency and throughput. In practice, we achieve a 50% reduction in deployment resources compared to a similar-sized dense model without performance loss. To facilitate further research and development in this area, we open-source our models as well as a substantial portion of the high-quality data used for the annealing and post-training stages. The models and data can be accessed at https://huggingface.co/inclusionAI/Ling-Coder-lite.
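To make the MoE idea concrete, below is a minimal sketch of sparse top-k expert routing, the architectural pattern the abstract credits for the model's efficiency. The layer sizes, expert count, and router design are illustrative assumptions, not Ling-Coder-Lite's actual configuration.

```python
# Minimal top-k Mixture-of-Experts layer: each token is processed by only
# its top-k experts, so compute per token stays small even as total
# parameters grow. Sizes here are illustrative, not the paper's.
import torch
import torch.nn as nn

class MoELayer(nn.Module):
    def __init__(self, d_model=512, n_experts=8, top_k=2):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        )
        self.top_k = top_k

    def forward(self, x):  # x: (tokens, d_model)
        gate = self.router(x).softmax(dim=-1)          # routing probabilities
        weights, idx = gate.topk(self.top_k, dim=-1)   # keep top-k experts per token
        weights = weights / weights.sum(-1, keepdim=True)
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e                  # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, k:k+1] * expert(x[mask])
        return out
```

Only tokens routed to an expert pay for its forward pass, which is the source of the deployment-cost reduction relative to a dense model of comparable quality.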
Seeing Sarcasm Through Different Eyes: Analyzing Multimodal Sarcasm Perception in Large Vision-Language Models
Chen, Junjie, Liu, Xuyang, Huang, Subin, Zhang, Linfeng, Yu, Hang
With the advent of large vision-language models (LVLMs) demonstrating increasingly human-like abilities, a pivotal question emerges: do different LVLMs interpret multimodal sarcasm differently, and can a single model grasp sarcasm from multiple perspectives like humans? To explore this, we introduce an analytical framework using systematically designed prompts on existing multimodal sarcasm datasets. Evaluating 12 state-of-the-art LVLMs over 2,409 samples, we examine interpretive variations within and across models, focusing on confidence levels, alignment with dataset labels, and recognition of ambiguous "neutral" cases. Our findings reveal notable discrepancies -- across LVLMs and within the same model under varied prompts. While classification-oriented prompts yield higher internal consistency, models diverge markedly when tasked with interpretive reasoning. These results challenge binary labeling paradigms by highlighting sarcasm's subjectivity. We advocate moving beyond rigid annotation schemes toward multi-perspective, uncertainty-aware modeling, offering deeper insights into multimodal sarcasm comprehension. Our code and data are available at: https://github.com/CoderChen01/LVLMSarcasmAnalysis
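The analysis the abstract describes can be illustrated with a short sketch: query one LVLM with several prompt phrasings per sample and measure how often its sarcasm judgments agree. The prompts below are illustrative, and `query_lvlm` is a hypothetical stand-in for a real model API; the paper's exact prompt designs and metrics are not reproduced here.

```python
# Sketch of per-model consistency analysis across prompt variants.
from collections import Counter

PROMPTS = [
    "Is this image-text pair sarcastic? Answer yes or no.",
    "Classify the post as sarcastic or not sarcastic.",
    "Explain the intent of this post, then judge whether it is sarcastic.",
]

def query_lvlm(image_path: str, text: str, prompt: str) -> str:
    """Hypothetical placeholder for an actual LVLM call."""
    raise NotImplementedError("replace with a real LVLM backend")

def internal_consistency(samples) -> float:
    """Fraction of samples on which all prompt variants yield the same label."""
    agree = 0
    for image_path, text in samples:
        labels = [query_lvlm(image_path, text, p) for p in PROMPTS]
        agree += len(Counter(labels)) == 1  # all variants agreed
    return agree / len(samples)
```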
An Efficient Learning Method to Connect Observables
Yu, Hang, Miyagi, Takayuki
Constructing fast and accurate surrogate models is a key ingredient for making robust predictions in many fields. We introduce a new model, the Multiparameter Eigenvalue Problem (MEP) emulator. The new method connects emulators and can make predictions directly from observables to observables. We show that the MEP emulator can be trained with data from Eigenvector Continuation (EC) and Parametric Matrix Model (PMM) emulators. A simple simulation on a one-dimensional lattice confirms the performance of the MEP emulator. Using $^{28}$O as an example, we also demonstrate that the predictive probability distribution of the target observables can be easily obtained through the new emulator.
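For orientation, the generic linear form of a two-parameter eigenvalue problem reads (a textbook formulation; the operator structure actually used in the paper may differ):

$$(A_1 + \lambda B_1 + \mu C_1)\, x_1 = 0, \qquad (A_2 + \lambda B_2 + \mu C_2)\, x_2 = 0,$$

where the eigenvalue pair $(\lambda, \mu)$ is shared across both equations. Read this way, a coupled system of such equations can trade known eigenvalues for unknown ones, which is one way to understand the abstract's claim of predicting directly from observables to observables.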
Advancing Out-of-Distribution Detection via Local Neuroplasticity
Canevaro, Alessandro, Schmidt, Julian, Marvi, Mohammad Sajad, Yu, Hang, Martius, Georg, Jordan, Julian
In the domain of machine learning, the assumption that training and test data share the same distribution is often violated in real-world scenarios, requiring effective out-of-distribution (OOD) detection. This paper presents a novel OOD detection method that leverages the unique local neuroplasticity property of Kolmogorov-Arnold Networks (KANs). Unlike traditional multilayer perceptrons, KANs exhibit local plasticity, allowing them to preserve learned information while adapting to new tasks. Our method compares the activation patterns of a trained KAN against its untrained counterpart to detect OOD samples. We validate our approach on benchmarks from image and medical domains, demonstrating superior performance and robustness compared to state-of-the-art techniques. These results underscore the potential of KANs in enhancing the reliability of machine learning systems in diverse environments.
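The detection rule described in the abstract can be sketched as follows. The `kan_activations` hook is hypothetical; the score only illustrates the mechanism of comparing a trained KAN against its untrained counterpart, not the paper's exact statistic.

```python
# Schematic OOD score exploiting KANs' local plasticity: training reshapes
# only the spline regions visited by in-distribution data, so trained and
# untrained copies differ there but stay close on untouched (OOD) regions.
import numpy as np

def kan_activations(model, x: np.ndarray) -> np.ndarray:
    """Hypothetical: return per-unit activations of a KAN on input x."""
    raise NotImplementedError("replace with hooks into a real KAN library")

def ood_score(trained_kan, untrained_kan, x: np.ndarray) -> float:
    a_trained = kan_activations(trained_kan, x)
    a_init = kan_activations(untrained_kan, x)
    # Small distance means the input hit regions untouched by training,
    # i.e. the sample is likely out-of-distribution.
    return float(np.linalg.norm(a_trained - a_init))
```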
Traffic and Safety Rule Compliance of Humans in Diverse Driving Situations
Kurenkov, Michael, Marvi, Sajad, Schmidt, Julian, Rist, Christoph B., Canevaro, Alessandro, Yu, Hang, Jordan, Julian, Schildbach, Georg, Valada, Abhinav
In recent years, autonomous vehicles (AVs) have gained significant attention due to their potential to reduce traffic fatalities. The widespread adoption of AV technology is contingent not only on technical performance but also on public trust, with concerns centering on safety and potential technological malfunctions [1, 2]. A key factor in improving trust in autonomous systems is the ability to understand and replicate human driving behavior. However, road accidents cause over 1.19 million deaths worldwide annually, with the majority resulting from human error [3]; blindly following human driving patterns is therefore not always desirable. Precisely because most accidents stem from human error, analyzing human driving data allows us to identify common mistakes and undesirable driving patterns. This understanding is crucial for training machine learning models, such as those used in behavior cloning, where the goal is to mimic human driving behavior. Identifying undesirable driving patterns is especially useful for achieving defensive driving behavior, which has been shown to play a significant role in increasing passenger comfort and trust in AVs [4].
CoBa: Convergence Balancer for Multitask Finetuning of Large Language Models
Gong, Zi, Yu, Hang, Liao, Cong, Liu, Bingchang, Chen, Chaoyu, Li, Jianguo
Multi-task learning (MTL) benefits the fine-tuning of large language models (LLMs) by providing a single model with improved performance and generalization ability across tasks, presenting a resource-efficient alternative to developing separate models for each task. Yet, existing MTL strategies for LLMs often fall short by either being computationally intensive or failing to ensure simultaneous task convergence. This paper presents CoBa, a new MTL approach designed to effectively manage task convergence balance with minimal computational overhead. Utilizing Relative Convergence Scores (RCS), Absolute Convergence Scores (ACS), and a Divergence Factor (DF), CoBa dynamically adjusts task weights during training, ensuring that the validation losses of all tasks progress toward convergence at an even pace while mitigating individual task divergence. The results of our experiments on three disparate datasets underscore that this approach not only fosters equilibrium in task convergence but also enhances the LLMs' performance by up to 13% relative to the second-best baselines. Code is open-sourced at https://github.com/codefuse-ai/MFTCoder.
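A loose sketch of the general mechanism (per-task weights derived from validation-loss trends) is given below. It is not CoBa's actual RCS/ACS/DF formulation, which is defined in the paper; only the idea of balancing convergence pace while damping divergence is illustrated.

```python
# Convergence-aware task weighting in the spirit of CoBa: estimate each
# task's validation-loss trend and reweight so slow-converging tasks catch
# up while diverging tasks are damped. Formulas here are illustrative.
import numpy as np

def task_weights(val_loss_history: np.ndarray) -> np.ndarray:
    """val_loss_history: (n_tasks, n_steps) of recent validation losses."""
    steps = np.arange(val_loss_history.shape[1])
    # Per-task trend of the validation loss (negative slope = still improving).
    slopes = np.polyfit(steps, val_loss_history.T, deg=1)[0]
    scale = np.abs(slopes).mean() + 1e-8
    # Slow-converging tasks (slope near zero) get larger weights so all
    # tasks approach convergence at an even pace ...
    raw = np.exp(slopes / scale)
    # ... while tasks whose validation loss is rising are damped instead.
    raw = np.where(slopes > 0, np.exp(-slopes / scale), raw)
    return raw / raw.sum()
```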
Rodimus*: Breaking the Accuracy-Efficiency Trade-Off with Efficient Attentions
He, Zhihao, Yu, Hang, Gong, Zi, Liu, Shizhan, Li, Jianguo, Lin, Weiyao
Recent advancements in Transformer-based large language models (LLMs) have set new standards in natural language processing. However, the classical softmax attention incurs significant computational costs, leading to an $O(T)$ complexity for per-token generation, where $T$ denotes the context length. This work explores reducing LLMs' complexity while maintaining performance by introducing Rodimus and its enhanced version, Rodimus$+$. Rodimus employs an innovative data-dependent tempered selection (DDTS) mechanism within a linear attention-based, purely recurrent framework, maintaining high accuracy while drastically reducing the memory usage typically associated with recurrent models. This method exemplifies semantic compression by preserving essential input information in fixed-size hidden states. Building on this, Rodimus$+$ combines Rodimus with the innovative Sliding Window Shared-Key Attention (SW-SKA) in a hybrid approach, effectively leveraging complementary semantic, token, and head compression techniques. Our experiments demonstrate that Rodimus$+$-1.6B, trained on 1 trillion tokens, achieves superior downstream performance against models trained on more tokens, including Qwen2-1.5B and RWKV6-1.6B, underscoring its potential to redefine the accuracy-efficiency balance in LLMs. Model code and pre-trained checkpoints will be available soon.
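The fixed-state recurrence that linear attention models of this family build on can be sketched in a few lines. The generic input-dependent decay below only mimics the role of a gating mechanism like DDTS; the actual selection mechanism is specified in the paper.

```python
# Per-token generation with a fixed-size (d_k, d_v) state instead of
# attending over the whole context: O(1) work and memory per token.
import torch

def linear_attention_step(state, k, v, decay):
    """state: (d_k, d_v); k: (d_k,); v: (d_v,); decay: (d_k,) in (0, 1)."""
    # Decay the compressed history, then write the new token as a rank-1 update.
    return decay.unsqueeze(-1) * state + torch.outer(k, v)

def readout(state, q):
    """q: (d_k,) -> output (d_v,) read from the compressed history."""
    return q @ state
```

Because the state size never grows with context length, memory stays constant during generation, which is the efficiency side of the trade-off the title refers to.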
Focus On What Matters: Separated Models For Visual-Based RL Generalization
Zhang, Di, Lv, Bowen, Zhang, Hai, Yang, Feifan, Zhao, Junqiao, Yu, Hang, Huang, Chang, Zhou, Hongtu, Ye, Chen, Jiang, Changjun
A primary challenge for visual-based Reinforcement Learning (RL) is to generalize effectively across unseen environments. Although previous studies have explored different auxiliary tasks to enhance generalization, few adopt image reconstruction due to concerns about exacerbating overfitting to task-irrelevant features during training. Recognizing the pre-eminence of image reconstruction in representation learning, we propose SMG (Separated Models for Generalization), a novel approach that exploits image reconstruction for generalization. SMG introduces two model branches to extract task-relevant and task-irrelevant representations separately from visual observations via cooperative reconstruction. Built upon this architecture, we further emphasize the importance of task-relevant features for generalization. Specifically, SMG incorporates two additional consistency losses to guide the agent's focus toward task-relevant areas across different scenarios, thereby avoiding overfitting. Extensive experiments in DMC demonstrate the SOTA generalization performance of SMG, particularly excelling in video-background settings. Evaluations on robotic manipulation tasks further confirm the robustness of SMG in real-world applications. Source code is available at https://anonymous.4open.science/r/SMG/.
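The two-branch idea from the abstract can be sketched as below: separate encoders produce task-relevant and task-irrelevant features, and a shared decoder must reconstruct the observation from both, so neither branch is forced to absorb everything. Module shapes are illustrative assumptions; SMG's actual architecture and consistency losses are defined in the paper.

```python
# Cooperative-reconstruction skeleton with separated representation branches.
import torch
import torch.nn as nn

class SeparatedModels(nn.Module):
    def __init__(self, feat_dim=64):
        super().__init__()
        self.task_encoder = nn.Sequential(nn.Flatten(), nn.LazyLinear(feat_dim))
        self.bg_encoder = nn.Sequential(nn.Flatten(), nn.LazyLinear(feat_dim))
        self.decoder = nn.LazyLinear(3 * 84 * 84)  # reconstruct an 84x84 RGB frame

    def forward(self, obs):  # obs: (batch, 3, 84, 84)
        z_task = self.task_encoder(obs)   # fed to the policy/critic
        z_bg = self.bg_encoder(obs)       # absorbs task-irrelevant detail
        recon = self.decoder(torch.cat([z_task, z_bg], dim=-1))
        return z_task, recon.view(obs.shape)

def reconstruction_loss(model, obs):
    _, recon = model(obs)
    return ((recon - obs) ** 2).mean()
```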
SQLfuse: Enhancing Text-to-SQL Performance through Comprehensive LLM Synergy
Zhang, Tingkai, Chen, Chaoyu, Liao, Cong, Wang, Jun, Zhao, Xudong, Yu, Hang, Wang, Jianchao, Li, Jianguo, Shi, Wenhui
Text-to-SQL conversion is a critical innovation, simplifying the transition from intuitive natural language queries to complex SQL, especially significant given SQL's prevalence in the job market across various roles. The rise of Large Language Models (LLMs) like GPT-3.5 and GPT-4 has greatly advanced this field, offering improved natural language understanding and the ability to generate nuanced SQL statements. However, the potential of open-source LLMs in Text-to-SQL applications remains underexplored, with many frameworks failing to leverage their full capabilities, particularly in handling complex database queries and incorporating feedback for iterative refinement. Addressing these limitations, this paper introduces SQLfuse, a robust system integrating open-source LLMs with a suite of tools to enhance Text-to-SQL translation's accuracy and usability. SQLfuse features four modules: schema mining, schema linking, SQL generation, and a SQL critic module, to not only generate but also continuously enhance SQL query quality. Demonstrated by its leading performance on the Spider Leaderboard and deployment by Ant Group, SQLfuse showcases the practical merits of open-source LLMs in diverse business contexts.
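The four-stage pipeline named in the abstract can be outlined as a skeleton. Every function body below is a hypothetical placeholder; the abstract names the stages but not their APIs.

```python
# Stage-by-stage skeleton of a schema-mining / linking / generation / critic
# pipeline in the spirit of SQLfuse. All signatures are illustrative.
from dataclasses import dataclass

@dataclass
class Schema:
    tables: dict  # table name -> list of column names

def mine_schema(db_conn) -> Schema:
    """Extract tables, columns, and keys from the live database."""
    ...

def link_schema(schema: Schema, question: str) -> Schema:
    """Keep only the tables/columns relevant to the question."""
    ...

def generate_sql(schema: Schema, question: str, llm) -> str:
    """Draft a SQL query conditioned on the linked schema."""
    ...

def critique_sql(sql: str, schema: Schema, llm) -> str:
    """Review the draft; return a revised query (or the original if it passes)."""
    ...

def text_to_sql(db_conn, question: str, llm) -> str:
    schema = mine_schema(db_conn)
    linked = link_schema(schema, question)
    sql = generate_sql(linked, question, llm)
    return critique_sql(sql, linked, llm)
```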
How Much Progress Did I Make? An Unexplored Human Feedback Signal for Teaching Robots
Yu, Hang, Fang, Qidi, Fang, Shijie, Aronson, Reuben M., Short, Elaine Schaertl
Enhancing the expressiveness of human teaching is vital both for improving robots' learning from humans and for the human-teaching-robot experience. In this work, we characterize and test a little-used teaching signal: progress, designed to represent the completion percentage of a task. We conducted two online studies with 76 crowd-sourced participants and one public-space study with 40 non-expert participants to validate the capability of this progress signal. We find that progress indicates whether the task is successfully performed, reflects the degree of task completion, identifies unproductive but harmless behaviors, and is likely to be more consistent across participants. Furthermore, our results show that giving progress does not require extra workload or time. An additional contribution of our work is a dataset of 40 non-expert demonstrations from the public-space study of an ice cream topping-adding task, which we observe to be multi-policy and sub-optimal, with sub-optimality arising not only from teleoperation errors but also from exploratory actions and attempts.