AITopics | Lu, Xingyu

Collaborating Authors

Lu, Xingyu

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

RLCAD: Reinforcement Learning Training Gym for Revolution Involved CAD Command Sequence Generation

Yin, Xiaolong, Lu, Xingyu, Shen, Jiahang, Ni, Jingzhe, Li, Hailong, Tong, Ruofeng, Tang, Min, Du, Peng

arXiv.org Artificial IntelligenceMar-24-2025

A CAD command sequence is a typical parametric design paradigm in 3D CAD systems where a model is constructed by overlaying 2D sketches with operations such as extrusion, revolution, and Boolean operations. Although there is growing academic interest in the automatic generation of command sequences, existing methods and datasets only support operations such as 2D sketching, extrusion,and Boolean operations. This limitation makes it challenging to represent more complex geometries. In this paper, we present a reinforcement learning (RL) training environment (gym) built on a CAD geometric engine. Given an input boundary representation (B-Rep) geometry, the policy network in the RL algorithm generates an action. This action, along with previously generated actions, is processed within the gym to produce the corresponding CAD geometry, which is then fed back into the policy network. The rewards, determined by the difference between the generated and target geometries within the gym, are used to update the RL network. Our method supports operations beyond sketches, Boolean, and extrusion, including revolution operations. With this training gym, we achieve state-of-the-art (SOTA) quality in generating command sequences from B-Rep geometries. In addition, our method can significantly improve the efficiency of command sequence generation by a factor of 39X compared with the previous training gym.

artificial intelligence, machine learning, reinforcement learning, (19 more...)

arXiv.org Artificial Intelligence

2503.18549

Country: Asia > China (0.14)

Genre: Research Report (0.50)

Industry: Education (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Kwai-STaR: Transform LLMs into State-Transition Reasoners

Lu, Xingyu, Hu, Yuhang, Liu, Changyi, Zhang, Tianke, Yang, Zhenyu, Ding, Zhixiang, Qian, Shengsheng, Du, Meng, Kang, Ruiwen, Tang, Kaiyu, Yang, Fan, Gao, Tingting, Zhang, Di, Zheng, Hai-Tao, Wen, Bin

arXiv.org Artificial IntelligenceNov-12-2024

Mathematical reasoning presents a significant challenge to the cognitive capabilities of LLMs. Various methods have been proposed to enhance the mathematical ability of LLMs. However, few recognize the value of state transition for LLM reasoning. In this work, we define mathematical problem-solving as a process of transiting from an initial unsolved state to the final resolved state, and propose Kwai-STaR framework, which transforms LLMs into State-Transition Reasoners to improve their intuitive reasoning capabilities. Our approach comprises three main steps: (1) Define the state space tailored to the mathematical reasoning. (2) Generate state-transition data based on the state space. (3) Convert original LLMs into State-Transition Reasoners via a curricular training strategy. Our experiments validate the effectiveness of Kwai-STaR in enhancing mathematical reasoning: After training on the small-scale Kwai-STaR dataset, general LLMs, including Mistral-7B and LLaMA-3, achieve considerable performance gain on the GSM8K and GSM-Hard dataset. Additionally, the state transition-based design endows Kwai-STaR with remarkable training and inference efficiency. Further experiments are underway to establish the generality of Kwai-STaR.

kwai-star, large language model, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2411.04799

Country: Asia > China (0.14)

Genre: Research Report (0.40)

Industry: Education (0.71)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

Add feedback

LLMOPT: Learning to Define and Solve General Optimization Problems from Scratch

Jiang, Caigao, Shu, Xiang, Qian, Hong, Lu, Xingyu, Zhou, Jun, Zhou, Aimin, Yu, Yang

arXiv.org Artificial IntelligenceOct-17-2024

Optimization problems are prevalent across various scenarios. Formulating and then solving optimization problems described by natural language often requires highly specialized human expertise, which could block the widespread application of optimization-based decision making. To make problem formulating and solving automated, leveraging large language models (LLMs) has emerged as a potential way. However, this kind of way suffers from the issue of optimization generalization. Namely, the accuracy of most current LLM-based methods and the generality of optimization problem types that they can model are still limited. In this paper, we propose a unified learning-based framework called LLMOPT to boost optimization generalization. Starting from the natural language descriptions of optimization problems and a pre-trained LLM, LLMOPT constructs the introduced five-element formulation as a universal model for learning to define diverse optimization problem types. Then, LLMOPT employs the multi-instruction tuning to enhance both problem formalization and solver code generation accuracy and generality. After that, to prevent hallucinations in LLMs, such as sacrificing solving accuracy to avoid execution errors, model alignment and self-correction mechanism are adopted in LLMOPT. We evaluate the optimization generalization ability of LLMOPT and compared methods across six real-world datasets covering roughly 20 fields such as health, environment, energy and manufacturing, etc. Extensive experiment results show that LLMOPT is able to model various optimization problem types such as linear/nonlinear programming, mixed integer programming and combinatorial optimization, and achieves a notable 11.08% average solving accuracy improvement compared with the state-of-the-art methods. The code is available at https://github.com/caigaojiang/LLMOPT.

large language model, machine learning, natural language, (14 more...)

arXiv.org Artificial Intelligence

2410.13213

Country:

Europe > Austria (0.29)
Europe > United Kingdom > England (0.28)
Asia > China (0.28)

Genre: Research Report > New Finding (0.88)

Industry: Banking & Finance (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.96)

Add feedback

Scaling Laws for Fact Memorization of Large Language Models

Lu, Xingyu, Li, Xiaonan, Cheng, Qinyuan, Ding, Kai, Huang, Xuanjing, Qiu, Xipeng

arXiv.org Artificial IntelligenceJun-21-2024

Fact knowledge memorization is crucial for Large Language Models (LLM) to generate factual and reliable responses. However, the behaviors of LLM fact memorization remain under-explored. In this paper, we analyze the scaling laws for LLM's fact knowledge and LLMs' behaviors of memorizing different types of facts. We find that LLMs' fact knowledge capacity has a linear and negative exponential law relationship with model size and training epochs, respectively. Estimated by the built scaling law, memorizing the whole Wikidata's facts requires training an LLM with 1000B non-embed parameters for 100 epochs, suggesting that using LLMs to memorize all public facts is almost implausible for a general pre-training setting. Meanwhile, we find that LLMs can generalize on unseen fact knowledge and its scaling law is similar to general pre-training. Additionally, we analyze the compatibility and preference of LLMs' fact memorization. For compatibility, we find LLMs struggle with memorizing redundant facts in a unified way. Only when correlated facts have the same direction and structure, the LLM can compatibly memorize them. This shows the inefficiency of LLM memorization for redundant facts. For preference, the LLM pays more attention to memorizing more frequent and difficult facts, and the subsequent facts can overwrite prior facts' memorization, which significantly hinders low-frequency facts memorization. Our findings reveal the capacity and characteristics of LLMs' fact knowledge learning, which provide directions for LLMs' fact knowledge augmentation.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2406.1572

Country:

Asia > China (0.28)
North America (0.28)

Genre: Research Report > New Finding (0.87)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Rote Learning (1.00)

Add feedback

MoleculeQA: A Dataset to Evaluate Factual Accuracy in Molecular Comprehension

Lu, Xingyu, Cao, He, Liu, Zijing, Bai, Shengyuan, Chen, Leqing, Yao, Yuan, Zheng, Hai-Tao, Li, Yu

arXiv.org Artificial IntelligenceMar-12-2024

Large language models are playing an increasingly significant role in molecular research, yet existing models often generate erroneous information, posing challenges to accurate molecular comprehension. Traditional evaluation metrics for generated content fail to assess a model's accuracy in molecular understanding. To rectify the absence of factual evaluation, we present MoleculeQA, a novel question answering (QA) dataset which possesses 62K QA pairs over 23K molecules. Each QA pair, composed of a manual question, a positive option and three negative options, has consistent semantics with a molecular description from authoritative molecular corpus. MoleculeQA is not only the first benchmark for molecular factual bias evaluation but also the largest QA dataset for molecular research. A comprehensive evaluation on MoleculeQA for existing molecular LLMs exposes their deficiencies in specific areas and pinpoints several particularly crucial factors for molecular understanding.

large language model, machine learning, question answering, (22 more...)

arXiv.org Artificial Intelligence

2403.08192

Country: North America > United States (0.93)

Genre: Research Report (0.64)

Industry:

Materials > Chemicals (1.00)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)
(5 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.96)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.96)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.89)

Add feedback

Bidirectional End-to-End Learning of Retriever-Reader Paradigm for Entity Linking

Li, Yinghui, Jiang, Yong, Huang, Shen, Lu, Xingyu, Li, Yangning, Xie, Pengjun, Huang, Fei, Zheng, Hai-Tao, Shen, Ying

arXiv.org Artificial IntelligenceJan-10-2024

Entity Linking (EL) is a fundamental task for Information Extraction and Knowledge Graphs. The general form of EL (i.e., end-to-end EL) aims to first find mentions in the given input document and then link the mentions to corresponding entities in a specific knowledge base. Recently, the paradigm of retriever-reader promotes the progress of end-to-end EL, benefiting from the advantages of dense entity retrieval and machine reading comprehension. However, the existing study only trains the retriever and the reader separately in a pipeline manner, which ignores the benefit that the interaction between the retriever and the reader can bring to the task. To advance the retriever-reader paradigm to perform more perfectly on end-to-end EL, we propose BEER$^2$, a Bidirectional End-to-End training framework for Retriever and Reader. Through our designed bidirectional end-to-end training, BEER$^2$ guides the retriever and the reader to learn from each other, make progress together, and ultimately improve EL performance. Extensive experiments on benchmarks of multiple domains demonstrate the effectiveness of our proposed BEER$^2$.

data mining, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2306.12245

Country:

Europe (1.00)
Asia > China (0.94)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)

Genre:

Research Report (0.64)
Overview (0.46)

Industry: Education > Educational Setting > Higher Education (0.47)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Data Science > Data Mining (0.87)

Add feedback

GreenFlow: A Computation Allocation Framework for Building Environmentally Sound Recommendation System

Lu, Xingyu, Liu, Zhining, Guan, Yanchu, Zhang, Hongxuan, Zhuang, Chenyi, Ma, Wenqi, Tan, Yize, Gu, Jinjie, Zhang, Guannan

arXiv.org Artificial IntelligenceDec-15-2023

Given the enormous number of users and items, industrial cascade recommendation systems (RS) are continuously expanded in size and complexity to deliver relevant items, such as news, services, and commodities, to the appropriate users. In a real-world scenario with hundreds of thousands requests per second, significant computation is required to infer personalized results for each request, resulting in a massive energy consumption and carbon emission that raises concern. This paper proposes GreenFlow, a practical computation allocation framework for RS, that considers both accuracy and carbon emission during inference. For each stage (e.g., recall, pre-ranking, ranking, etc.) of a cascade RS, when a user triggers a request, we define two actions that determine the computation: (1) the trained instances of models with different computational complexity; and (2) the number of items to be inferred in the stage. We refer to the combinations of actions in all stages as action chains. A reward score is estimated for each action chain, followed by dynamic primal-dual optimization considering both the reward and computation budget. Extensive experiments verify the effectiveness of the framework, reducing computation consumption by 41% in an industrial mobile application while maintaining commercial revenue. Moreover, the proposed framework saves approximately 5000kWh of electricity and reduces 3 tons of carbon emissions per day.

action chain, artificial intelligence, machine learning, (18 more...)

arXiv.org Artificial Intelligence

doi: 10.24963/ijcai.2023/677

2312.16176

Genre: Research Report (0.40)

Industry: Energy (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

InstructMol: Multi-Modal Integration for Building a Versatile and Reliable Molecular Assistant in Drug Discovery

Cao, He, Liu, Zijing, Lu, Xingyu, Yao, Yuan, Li, Yu

arXiv.org Artificial IntelligenceNov-27-2023

The rapid evolution of artificial intelligence in drug discovery encounters challenges with generalization and extensive training, yet Large Language Models (LLMs) offer promise in reshaping interactions with complex molecular data. Our novel contribution, InstructMol, a multi-modal LLM, effectively aligns molecular structures with natural language via an instruction-tuning approach, utilizing a two-stage training strategy that adeptly combines limited domain-specific data with molecular and textual information. InstructMol showcases substantial performance improvements in drug discovery-related molecular tasks, surpassing leading LLMs and significantly reducing the gap with specialized models, thereby establishing a robust foundation for a versatile and dependable drug discovery assistant.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2311.16208

Country:

North America > United States (0.14)
Asia (0.14)

Genre: Research Report (0.82)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area > Immunology (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

OptScaler: A Hybrid Proactive-Reactive Framework for Robust Autoscaling in the Cloud

Zou, Ding, Lu, Wei, Zhu, Zhibo, Lu, Xingyu, Zhou, Jun, Wang, Xiaojin, Liu, Kangyu, Wang, Haiqing, Wang, Kefan, Sun, Renen

arXiv.org Artificial IntelligenceOct-26-2023

Autoscaling is a vital mechanism in cloud computing that supports the autonomous adjustment of computing resources under dynamic workloads. A primary goal of autoscaling is to stabilize resource utilization at a desirable level, thus reconciling the need for resource-saving with the satisfaction of Service Level Objectives (SLOs). Existing proactive autoscaling methods anticipate the future workload and scale the resources in advance, whereas the reliability may suffer from prediction deviations arising from the frequent fluctuations and noise of cloud workloads; reactive methods rely on real-time system feedback, while the hysteretic nature of reactive methods could cause violations of the rigorous SLOs. To this end, this paper presents OptScaler, a hybrid autoscaling framework that integrates the power of both proactive and reactive methods for regulating CPU utilization. Specifically, the proactive module of OptScaler consists of a sophisticated workload prediction model and an optimization model, where the former provides reliable inputs to the latter for making optimal scaling decisions. The reactive module provides a self-tuning estimator of CPU utilization to the optimization model. We embed Model Predictive Control (MPC) mechanism and robust optimization techniques into the optimization model to further enhance its reliability. Numerical results have demonstrated the superiority of both the workload prediction model and the hybrid framework of OptScaler in the scenario of online services compared to prevalent reactive, proactive, or hybrid autoscalers. OptScaler has been successfully deployed at Alipay, supporting the autoscaling of applets in the world-leading payment platform.

artificial intelligence, cloud computing, workload, (17 more...)

arXiv.org Artificial Intelligence

2311.12864

Country:

Asia > China (0.14)
North America > United States (0.14)
Europe > Italy (0.14)

Genre: Research Report > New Finding (0.66)

Industry:

Information Technology > Services (0.93)
Energy > Oil & Gas (0.56)

Technology:

Information Technology > Cloud Computing (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Architecture (1.00)

Add feedback

MISSRec: Pre-training and Transferring Multi-modal Interest-aware Sequence Representation for Recommendation

Wang, Jinpeng, Zeng, Ziyun, Wang, Yunxiao, Wang, Yuting, Lu, Xingyu, Li, Tianxiang, Yuan, Jun, Zhang, Rui, Zheng, Hai-Tao, Xia, Shu-Tao

arXiv.org Artificial IntelligenceOct-23-2023

The goal of sequential recommendation (SR) is to predict a user's potential interested items based on her/his historical interaction sequences. Most existing sequential recommenders are developed based on ID features, which, despite their widespread use, often underperform with sparse IDs and struggle with the cold-start problem. Besides, inconsistent ID mappings hinder the model's transferability, isolating similar recommendation domains that could have been co-optimized. This paper aims to address these issues by exploring the potential of multi-modal information in learning robust and generalizable sequence representations. We propose MISSRec, a multi-modal pre-training and transfer learning framework for SR. On the user side, we design a Transformer-based encoder-decoder model, where the contextual encoder learns to capture the sequence-level multi-modal user interests while a novel interest-aware decoder is developed to grasp item-modality-interest relations for better sequence representation. On the candidate item side, we adopt a dynamic fusion module to produce user-adaptive item representation, providing more precise matching between users and items. We pre-train the model with contrastive learning objectives and fine-tune it in an efficient manner. Extensive experiments demonstrate the effectiveness and flexibility of MISSRec, promising a practical solution for real-world recommendation scenarios. Data and code are available on \url{https://github.com/gimpong/MM23-MISSRec}.

artificial intelligence, machine learning, recommendation, (13 more...)

arXiv.org Artificial Intelligence

doi: 10.1145/3581783.3611967

2308.11175

Country:

North America (0.47)
Asia > China > Guangdong Province (0.15)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Add feedback