AITopics | Lin, Pengxiao

Collaborating Authors

Lin, Pengxiao

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Reasoning Bias of Next Token Prediction Training

Lin, Pengxiao, Zhang, Zhongwang, Xu, Zhi-Qin John

arXiv.org Artificial IntelligenceFeb-3-2025

Since the inception of Large Language Models (LLMs), the quest to efficiently train them for superior reasoning capabilities has been a pivotal challenge. The dominant training paradigm for LLMs is based on next token prediction (NTP). Alternative methodologies, called Critical Token Prediction (CTP), focused exclusively on specific critical tokens (such as the answer in Q\&A dataset), aiming to reduce the overfitting of extraneous information and noise. Contrary to initial assumptions, our research reveals that despite NTP's exposure to noise during training, it surpasses CTP in reasoning ability. We attribute this counterintuitive outcome to the regularizing influence of noise on the training dynamics. Our empirical analysis shows that NTP-trained models exhibit enhanced generalization and robustness across various benchmark reasoning datasets, demonstrating greater resilience to perturbations and achieving flatter loss minima. These findings illuminate that NTP is instrumental in fostering reasoning abilities during pretraining, whereas CTP is more effective for finetuning, thereby enriching our comprehension of optimal training strategies in LLM development.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2502.02007

Country:

Europe (0.67)
Asia > China (0.28)
North America > United States (0.28)
Asia > Middle East > UAE (0.14)

Genre: Research Report > New Finding (0.93)

Industry: Leisure & Entertainment (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.88)

Add feedback

Complexity Control Facilitates Reasoning-Based Compositional Generalization in Transformers

Zhang, Zhongwang, Lin, Pengxiao, Wang, Zhiwei, Zhang, Yaoyu, Xu, Zhi-Qin John

arXiv.org Artificial IntelligenceJan-14-2025

Transformers have demonstrated impressive capabilities across various tasks, yet their performance on compositional problems remains a subject of debate. In this study, we investigate the internal mechanisms underlying Transformers' behavior in compositional tasks. We find that complexity control strategies significantly influence whether the model learns primitive-level rules that generalize out-of-distribution (reasoning-based solutions) or relies solely on memorized mappings (memory-based solutions). By applying masking strategies to the model's information circuits and employing multiple complexity metrics, we reveal distinct internal working mechanisms associated with different solution types. Further analysis reveals that reasoning-based solutions exhibit a lower complexity bias, which aligns with the well-studied neuron condensation phenomenon. This lower complexity bias is hypothesized to be the key factor enabling these solutions to learn reasoning rules. We validate these conclusions across multiple real-world datasets, including image generation and natural language processing tasks, confirming the broad applicability of our findings.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2501.08537

Country: Asia > China (0.15)

Genre: Research Report > New Finding (0.68)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.94)

Add feedback

M$^{2}$M: Learning controllable Multi of experts and multi-scale operators are the Partial Differential Equations need

Liang, Aoming, Mu, Zhaoyang, Lin, Pengxiao, Wang, Cong, Ge, Mingming, Shao, Ling, Fan, Dixia, Tang, Hao

arXiv.org Artificial IntelligenceOct-1-2024

Learning the evolutionary dynamics of Partial Differential Equations (PDEs) is critical in understanding dynamic systems, yet current methods insufficiently learn their representations. This is largely due to the multi-scale nature of the solution, where certain regions exhibit rapid oscillations while others evolve more slowly. This paper introduces a framework of multi-scale and multi-expert (M$^2$M) neural operators designed to simulate and learn PDEs efficiently. We employ a divide-and-conquer strategy to train a multi-expert gated network for the dynamic router policy. Our method incorporates a controllable prior gating mechanism that determines the selection rights of experts, enhancing the model's efficiency. To optimize the learning process, we have implemented a PI (Proportional, Integral) control strategy to adjust the allocation rules precisely. This universal controllable approach allows the model to achieve greater accuracy. We test our approach on benchmark 2D Navier-Stokes equations and provide a custom multi-scale dataset. M$^2$M can achieve higher simulation accuracy and offer improved interpretability compared to baseline methods.

artificial intelligence, deep learning, machine learning, (13 more...)

arXiv.org Artificial Intelligence

2410.11617

Country: Asia > China (0.69)

Genre: Research Report (0.64)

Industry: Energy (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Initialization is Critical to Whether Transformers Fit Composite Functions by Inference or Memorizing

Zhang, Zhongwang, Lin, Pengxiao, Wang, Zhiwei, Zhang, Yaoyu, Xu, Zhi-Qin John

arXiv.org Artificial IntelligenceMay-24-2024

Transformers have shown impressive capabilities across various tasks, but their performance on compositional problems remains a topic of debate. In this work, we investigate the mechanisms of how transformers behave on unseen compositional tasks. We discover that the parameter initialization scale plays a critical role in determining whether the model learns inferential solutions, which capture the underlying compositional primitives, or symmetric solutions, which simply memorize mappings without understanding the compositional structure. By analyzing the information flow and vector representations within the model, we reveal the distinct mechanisms underlying these solution types. We further find that inferential solutions exhibit low complexity bias, which we hypothesize is a key factor enabling them to learn individual mappings for single anchors. Building upon the understanding of these mechanisms, we can predict the learning behavior of models with different initialization scales when faced with data of varying complexity. Our findings provide valuable insights into the role of initialization scale in shaping the type of solution learned by transformers and their ability to learn and generalize compositional tasks.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2405.05409

Country: Asia > China (0.14)

Genre: Research Report > New Finding (0.66)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Add feedback