AITopics | Wu, Wei

Collaborating Authors

Wu, Wei

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

From 1,000,000 Users to Every User: Scaling Up Personalized Preference for User-level Alignment

Li, Jia-Nan, Guan, Jian, Wu, Songhao, Wu, Wei, Yan, Rui

arXiv.org Artificial IntelligenceMar-21-2025

Large language models (LLMs) have traditionally been aligned through one-size-fits-all approaches that assume uniform human preferences, fundamentally overlooking the diversity in user values and needs. This paper introduces a comprehensive framework for scalable personalized alignment of LLMs. We establish a systematic preference space characterizing psychological and behavioral dimensions, alongside diverse persona representations for robust preference inference in real-world scenarios. Building upon this foundation, we introduce \textsc{AlignX}, a large-scale dataset of over 1.3 million personalized preference examples, and develop two complementary alignment approaches: \textit{in-context alignment} directly conditioning on persona representations and \textit{preference-bridged alignment} modeling intermediate preference distributions. Extensive experiments demonstrate substantial improvements over existing methods, with an average 17.06\% accuracy gain across four benchmarks while exhibiting a strong adaptation capability to novel preferences, robustness to limited user data, and precise preference controllability. These results validate our framework's effectiveness, advancing toward truly user-adaptive AI systems.

dimension, large language model, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2503.15463

Country: Asia > China (0.14)

Genre: Research Report (0.50)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

A Survey on Personalized Alignment -- The Missing Piece for Large Language Models in Real-World Applications

Guan, Jian, Wu, Junfei, Li, Jia-Nan, Cheng, Chuanqi, Wu, Wei

arXiv.org Artificial IntelligenceMar-21-2025

Large Language Models (LLMs) have demonstrated remarkable capabilities, yet their transition to real-world applications reveals a critical limitation: the inability to adapt to individual preferences while maintaining alignment with universal human values. Current alignment techniques adopt a one-size-fits-all approach that fails to accommodate users' diverse backgrounds and needs. This paper presents the first comprehensive survey of personalized alignment-a paradigm that enables LLMs to adapt their behavior within ethical boundaries based on individual preferences. We propose a unified framework comprising preference memory management, personalized generation, and feedback-based alignment, systematically analyzing implementation approaches and evaluating their effectiveness across various scenarios. By examining current techniques, potential risks, and future challenges, this survey provides a structured foundation for developing more adaptable and ethically-aligned LLMs.

large language model, natural language, personalized alignment, (4 more...)

arXiv.org Artificial Intelligence

2503.17003

Genre:

Research Report (0.69)
Overview (0.53)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

LEMMA: Learning from Errors for MatheMatical Advancement in LLMs

Pan, Zhuoshi, Li, Yu, Lin, Honglin, Pei, Qizhi, Tang, Zinan, Wu, Wei, Ming, Chenlin, Zhao, H. Vicky, He, Conghui, Wu, Lijun

arXiv.org Artificial IntelligenceMar-21-2025

Large language models (LLMs) have demonstrated remarkable reasoning capability in solving mathematical problems. However, existing approaches primarily focus on improving the quality of correct training data, e.g., distilling high-quality correct solutions from advanced models, neglecting the value contained in error data, potentially hindering the model's reflective ability. Though some studies attempt to leverage error data, they often involve complex mechanisms, such as Monte Carlo Tree Search (MCTS) to explore error nodes. In this work, we propose to enhance LLMs' reasoning ability by Learning from Errors for Mathematical Advancement (LEMMA). LEMMA constructs data consisting of an incorrect solution with an erroneous step and a reflection connection to a correct solution for fine-tuning. Specifically, we systematically analyze the model-generated error types and introduce an error-type grounded mistake augmentation method to collect diverse and representative errors. Correct solutions are either from fixing the errors or generating a fresh start. Through a model-aware smooth reflection connection, the erroneous solution is transferred to the correct one. By fine-tuning on the constructed dataset, the model is able to self-correct errors autonomously within the generation process without relying on external critique models. Experimental results demonstrate that LEMMA achieves significant performance improvements over other strong baselines.

large language model, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2503.17439

Country: Asia > China (0.28)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Add feedback

Every FLOP Counts: Scaling a 300B Mixture-of-Experts LING LLM without Premium GPUs

Ling Team, null, Zeng, Binwei, Huang, Chao, Zhang, Chao, Tian, Changxin, Chen, Cong, Jin, Dingnan, Yu, Feng, Zhu, Feng, Yuan, Feng, Wang, Fakang, Wang, Gangshan, Zhai, Guangyao, Zhang, Haitao, Li, Huizhong, Zhou, Jun, Liu, Jia, Fang, Junpeng, Ou, Junjie, Hu, Jun, Luo, Ji, Zhang, Ji, Liu, Jian, Sha, Jian, Qian, Jianxue, Wu, Jiewei, Zhao, Junping, Li, Jianguo, Feng, Jubao, Di, Jingchao, Xu, Junming, Yao, Jinghua, Xu, Kuan, Du, Kewei, Li, Longfei, Liang, Lei, Yu, Lu, Tang, Li, Ju, Lin, Xu, Peng, Cui, Qing, Liu, Song, Li, Shicheng, Song, Shun, Yan, Song, Cai, Tengwei, Chen, Tianyi, Guo, Ting, Huang, Ting, Feng, Tao, Wu, Tao, Wu, Wei, Zhang, Xiaolu, Yang, Xueming, Zhao, Xin, Hu, Xiaobo, Lin, Xin, Zhao, Yao, Wang, Yilong, Guo, Yongzhen, Wang, Yuanyuan, Yang, Yue, Cao, Yang, Fu, Yuhao, Xiong, Yi, Li, Yanzhe, Li, Zhe, Zhang, Zhiqiang, Liu, Ziqi, Huan, Zhaoxin, Wen, Zujie, Sun, Zhenhang, Du, Zhuoxuan, He, Zhengyu

arXiv.org Artificial IntelligenceMar-10-2025

In this technical report, we tackle the challenges of training large-scale Mixture of Experts (MoE) models, focusing on overcoming cost inefficiency and resource limitations prevalent in such systems. To address these issues, we present two differently sized MoE large language models (LLMs), namely Ling-Lite and Ling-Plus (referred to as "Bailing" in Chinese, spelled B\v{a}il\'ing in Pinyin). Ling-Lite contains 16.8 billion parameters with 2.75 billion activated parameters, while Ling-Plus boasts 290 billion parameters with 28.8 billion activated parameters. Both models exhibit comparable performance to leading industry benchmarks. This report offers actionable insights to improve the efficiency and accessibility of AI development in resource-constrained settings, promoting more scalable and sustainable technologies. Specifically, to reduce training costs for large-scale MoE models, we propose innovative methods for (1) optimization of model architecture and training processes, (2) refinement of training anomaly handling, and (3) enhancement of model evaluation efficiency. Additionally, leveraging high-quality data generated from knowledge graphs, our models demonstrate superior capabilities in tool use compared to other models. Ultimately, our experimental findings demonstrate that a 300B MoE LLM can be effectively trained on lower-performance devices while achieving comparable performance to models of a similar scale, including dense and MoE models. Compared to high-performance devices, utilizing a lower-specification hardware system during the pre-training phase demonstrates significant cost savings, reducing computing costs by approximately 20%. The models can be accessed at https://huggingface.co/inclusionAI.

arxiv preprint arxiv, large language model, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2503.05139

Genre: Research Report > New Finding (1.00)

Industry:

Education (0.46)
Energy (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Exploring the Performance Improvement of Tensor Processing Engines through Transformation in the Bit-weight Dimension of MACs

Wu, Qizhe, Liang, Huawen, Gui, Yuchen, Zeng, Zhichen, He, Zerong, Tao, Linfeng, Wang, Xiaotian, Zhao, Letian, Zeng, Zhaoxi, Yuan, Wei, Wu, Wei, Jin, Xi

arXiv.org Artificial IntelligenceMar-8-2025

General matrix-matrix multiplication (GEMM) is a cornerstone of AI computations, making tensor processing engines (TPEs) increasingly critical in GPUs and domain-specific architectures. Existing architectures primarily optimize dataflow or operand reuse strategies. However, considering the interaction between matrix multiplication and multiply-accumulators (MACs) offers greater optimization potential. This work introduces a novel hardware perspective on matrix multiplication, focusing on the bit-weight dimension of MACs. We propose a finer-grained TPE notation using matrix triple loops as an example, introducing new methods for designing and optimizing PE microarchitectures. Based on this notation and its transformations, we propose four optimization techniques that improve timing, area, and power consumption. Implementing our design in RTL using the SMIC-28nm process, we evaluate its effectiveness across four classic TPE architectures: systolic array, 3D-Cube, multiplier-adder tree, and 2D-Matrix. Our techniques achieve area efficiency improvements of 1.27x, 1.28x, 1.56x, and 1.44x, and energy efficiency gains of 1.04x, 1.56x, 1.49x, and 1.20x, respectively. Applied to a bit-slice architecture, our approach achieves a 12.10x improvement in energy efficiency and 2.85x in area efficiency compared to Laconic. Our Verilog HDL code, along with timing, area, and power reports, is available at https://github.com/wqzustc/High-Performance-Tensor-Processing-Engines

dimension, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2503.06342

Country: North America > United States (0.28)

Genre: Research Report (0.64)

Industry: Semiconductors & Electronics (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Architecture (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.87)
Information Technology > Artificial Intelligence > Natural Language (0.68)

Add feedback

Neuron Platonic Intrinsic Representation From Dynamics Using Contrastive Learning

Wu, Wei, Liao, Can, Deng, Zizhen, Guo, Zhengrui, Wang, Jinzhuo

arXiv.org Artificial IntelligenceFeb-18-2025

The Platonic Representation Hypothesis suggests a universal, modality-independent reality representation behind different data modalities. Inspired by this, we view each neuron as a system and detect its multi-segment activity data under various peripheral conditions. We assume there's a time-invariant representation for the same neuron, reflecting its intrinsic properties like molecular profiles, location, and morphology. The goal of obtaining these intrinsic neuronal representations has two criteria: (I) segments from the same neuron should have more similar representations than those from different neurons; (II) the representations must generalize well to out-of-domain data. To meet these, we propose the NeurPIR (Neuron Platonic Intrinsic Representation) framework. It uses contrastive learning, with segments from the same neuron as positive pairs and those from different neurons as negative pairs. In implementation, we use VICReg, which focuses on positive pairs and separates dissimilar samples via regularization. We tested our method on Izhikevich model-simulated neuronal population dynamics data. The results accurately identified neuron types based on preset hyperparameters. We also applied it to two real-world neuron dynamics datasets with neuron type annotations from spatial transcriptomics and neuron locations. Our model's learned representations accurately predicted neuron types and locations and were robust on out-of-domain data (from unseen animals). This shows the potential of our approach for understanding neuronal systems and future neuroscience research.

artificial intelligence, machine learning, neuron, (18 more...)

arXiv.org Artificial Intelligence

2502.10425

Country: Asia > China (0.28)

Genre: Research Report > New Finding (0.46)

Industry: Health & Medicine > Therapeutic Area > Neurology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Neuroscience (0.46)

Add feedback

Classifying the Stoichiometry of Virus-like Particles with Interpretable Machine Learning

Zhang, Jiayang, Liu, Xianyuan, Wu, Wei, Tabakhi, Sina, Fan, Wenrui, Zhou, Shuo, Tee, Kang Lan, Wong, Tuck Seng, Lu, Haiping

arXiv.org Artificial IntelligenceFeb-17-2025

Virus-like particles (VLPs) are valuable for vaccine development due to their immune-triggering properties. Understanding their stoichiometry, the number of protein subunits to form a VLP, is critical for vaccine optimisation. However, current experimental methods to determine stoichiometry are time-consuming and require highly purified proteins. To efficiently classify stoichiometry classes in proteins, we curate a new dataset and propose an interpretable, data-driven pipeline leveraging linear machine learning models. We also explore the impact of feature encoding on model performance and interpretability, as well as methods to identify key protein sequence features influencing classification. The evaluation of our pipeline demonstrates that it can classify stoichiometry while revealing protein features that possibly influence VLP assembly. The data and code used in this work are publicly available at https://github.com/Shef-AIRE/StoicIML.

artificial intelligence, classification, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2502.12049

Country: Europe (0.28)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area > Vaccines (0.89)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)

Add feedback

Theoretical Benefit and Limitation of Diffusion Language Model

Feng, Guhao, Geng, Yihan, Guan, Jian, Wu, Wei, Wang, Liwei, He, Di

arXiv.org Machine LearningFeb-13-2025

Diffusion language models have emerged as a promising approach for text generation. One would naturally expect this method to be an efficient replacement for autoregressive models since multiple tokens can be sampled in parallel during each diffusion step. However, its efficiency-accuracy trade-off is not yet well understood. In this paper, we present a rigorous theoretical analysis of a widely used type of diffusion language model, the Masked Diffusion Model (MDM), and find that its effectiveness heavily depends on the target evaluation metric. Under mild conditions, we prove that when using perplexity as the metric, MDMs can achieve near-optimal perplexity in sampling steps regardless of sequence length, demonstrating that efficiency can be achieved without sacrificing performance. However, when using the sequence error rate--which is important for understanding the "correctness" of a sequence, such as a reasoning chain--we show that the required sampling steps must scale linearly with sequence length to obtain "correct" sequences, thereby eliminating MDM's efficiency advantage over autoregressive models. Our analysis establishes the first theoretical foundation for understanding the benefits and limitations of MDMs. All theoretical findings are supported by empirical studies.

large language model, machine learning, natural language, (16 more...)

arXiv.org Machine Learning

2502.09622

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.93)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Human Decision-making is Susceptible to AI-driven Manipulation

Sabour, Sahand, Liu, June M., Liu, Siyang, Yao, Chris Z., Cui, Shiyao, Zhang, Xuanming, Zhang, Wen, Cao, Yaru, Bhat, Advait, Guan, Jian, Wu, Wei, Mihalcea, Rada, Althoff, Tim, Lee, Tatia M. C., Huang, Minlie

arXiv.org Artificial IntelligenceFeb-11-2025

Artificial Intelligence (AI) systems are increasingly intertwined with daily life, assisting users in executing various tasks and providing guidance on decision-making. This integration introduces risks of AI-driven manipulation, where such systems may exploit users' cognitive biases and emotional vulnerabilities to steer them toward harmful outcomes. Through a randomized controlled trial with 233 participants, we examined human susceptibility to such manipulation in financial (e.g., purchases) and emotional (e.g., conflict resolution) decision-making contexts. Participants interacted with one of three AI agents: a neutral agent (NA) optimizing for user benefit without explicit influence, a manipulative agent (MA) designed to covertly influence beliefs and behaviors, or a strategy-enhanced manipulative agent (SEMA) employing explicit psychological tactics to reach its hidden objectives. By analyzing participants' decision patterns and shifts in their preference ratings post-interaction, we found significant susceptibility to AI-driven manipulation. Particularly, across both decision-making domains, participants interacting with the manipulative agents shifted toward harmful options at substantially higher rates (financial, MA: 62.3%, SEMA: 59.6%; emotional, MA: 42.3%, SEMA: 41.5%) compared to the NA group (financial, 35.8%; emotional, 12.8%). Notably, our findings reveal that even subtle manipulative objectives (MA) can be as effective as employing explicit psychological strategies (SEMA) in swaying human decision-making. By revealing the potential for covert AI influence, this study highlights a critical vulnerability in human-AI interactions, emphasizing the need for ethical safeguards and regulatory frameworks to ensure responsible deployment of AI technologies and protect human autonomy.

artificial intelligence, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2502.07663

Country:

North America > United States > Washington > King County > Seattle (0.14)
North America > United States > Michigan (0.14)

Genre:

Research Report > Strength High (1.00)
Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine > Therapeutic Area > Psychiatry/Psychology > Mental Health (1.00)
Health & Medicine > Consumer Health (1.00)
Government (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (1.00)
(2 more...)

Add feedback

GENERator: A Long-Context Generative Genomic Foundation Model

Wu, Wei, Li, Qiuyi, Li, Mingyang, Fu, Kun, Feng, Fuli, Ye, Jieping, Xiong, Hui, Wang, Zheng

arXiv.org Artificial IntelligenceFeb-11-2025

Advancements in DNA sequencing technologies have significantly improved our ability to decode genomic sequences. However, the prediction and interpretation of these sequences remain challenging due to the intricate nature of genetic material. Large language models (LLMs) have introduced new opportunities for biological sequence analysis. Recent developments in genomic language models have underscored the potential of LLMs in deciphering DNA sequences. Nonetheless, existing models often face limitations in robustness and application scope, primarily due to constraints in model structure and training data scale. To address these limitations, we present GENERator, a generative genomic foundation model featuring a context length of 98k base pairs (bp) and 1.2B parameters. Trained on an expansive dataset comprising 386B bp of eukaryotic DNA, the GENERator demonstrates state-of-the-art performance across both established and newly proposed benchmarks. The model adheres to the central dogma of molecular biology, accurately generating protein-coding sequences that translate into proteins structurally analogous to known families. It also shows significant promise in sequence optimization, particularly through the prompt-responsive generation of promoter sequences with specific activity profiles. These capabilities position the GENERator as a pivotal tool for genomic research and biotechnological advancement, enhancing our ability to interpret and predict complex biological systems and enabling precise genomic interventions.

bioinformatics, large language model, machine learning, (21 more...)

arXiv.org Artificial Intelligence

2502.07272

Country: Asia > China (0.68)

Genre:

Research Report (1.00)
Overview (0.67)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Biomedical Informatics > Translational Bioinformatics (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback