AITopics | Bao, Yu

Collaborating Authors

Bao, Yu

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Unveiling the Mathematical Reasoning in DeepSeek Models: A Comparative Study of Large Language Models

Jahin, Afrar, Zidan, Arif Hassan, Bao, Yu, Liang, Shizhe, Liu, Tianming, Zhang, Wei

arXiv.org Artificial IntelligenceMar-13-2025

With the rapid evolution of Artificial Intelligence (AI), Large Language Models (LLMs) have reshaped the frontiers of various fields, spanning healthcare, public health, engineering, science, agriculture, education, arts, humanities, and mathematical reasoning. Among these advancements, DeepSeek models have emerged as noteworthy contenders, demonstrating promising capabilities that set them apart from their peers. While previous studies have conducted comparative analyses of LLMs, few have delivered a comprehensive evaluation of mathematical reasoning across a broad spectrum of LLMs. In this work, we aim to bridge this gap by conducting an in-depth comparative study, focusing on the strengths and limitations of DeepSeek models in relation to their leading counterparts. In particular, our study systematically evaluates the mathematical reasoning performance of two DeepSeek models alongside five prominent LLMs across three independent benchmark datasets. The findings reveal several key insights: 1). DeepSeek-R1 consistently achieved the highest accuracy on two of the three datasets, demonstrating strong mathematical reasoning capabilities. 2). The distilled variant of LLMs significantly underperformed compared to its peers, highlighting potential drawbacks in using distillation techniques. 3). In terms of response time, Gemini 2.0 Flash demonstrated the fastest processing speed, outperforming other models in efficiency, which is a crucial factor for real-time applications. Beyond these quantitative assessments, we delve into how architecture, training, and optimization impact LLMs' mathematical reasoning. Moreover, our study goes beyond mere performance comparison by identifying key areas for future advancements in LLM-driven mathematical reasoning. This research enhances our understanding of LLMs' mathematical reasoning and lays the groundwork for future advancements

arxiv preprint arxiv, large language model, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2503.10573

Country: North America > United States > Georgia > Clarke County > Athens (0.14)

Genre:

Overview (1.00)
Research Report > New Finding (0.93)

Industry:

Health & Medicine (1.00)
Education > Educational Setting (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Practical quantum federated learning and its experimental demonstration

Liu, Zhi-Ping, Cao, Xiao-Yu, Liu, Hao-Wen, Sun, Xiao-Ran, Bao, Yu, Lu, Yu-Shuo, Yin, Hua-Lei, Chen, Zeng-Bing

arXiv.org Artificial IntelligenceJan-22-2025

Federated learning is essential for decentralized, privacy-preserving model training in the data-driven era. Quantum-enhanced federated learning leverages quantum resources to address privacy and scalability challenges, offering security and efficiency advantages beyond classical methods. However, practical and scalable frameworks addressing privacy concerns in the quantum computing era remain undeveloped. Here, we propose a practical quantum federated learning framework on quantum networks, utilizing distributed quantum secret keys to protect local model updates and enable secure aggregation with information-theoretic security. We experimentally validate our framework on a 4-client quantum network with a scalable structure. Extensive numerical experiments on both quantum and classical datasets show that adding a quantum client significantly enhances the trained global model's ability to classify multipartite entangled and non-stabilizer quantum datasets. Simulations further demonstrate scalability to 200 clients with classical models trained on the MNIST dataset, reducing communication costs by $75\%$ through advanced model compression techniques and achieving rapid training convergence. Our work provides critical insights for building scalable, efficient, and quantum-secure machine learning systems for the coming quantum internet era.

artificial intelligence, deep learning, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2501.12709

Country:

Asia > China (0.14)
North America > United States (0.14)

Genre: Research Report (0.40)

Industry: Information Technology > Security & Privacy (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Evaluation of OpenAI o1: Opportunities and Challenges of AGI

Zhong, Tianyang, Liu, Zhengliang, Pan, Yi, Zhang, Yutong, Zhou, Yifan, Liang, Shizhe, Wu, Zihao, Lyu, Yanjun, Shu, Peng, Yu, Xiaowei, Cao, Chao, Jiang, Hanqi, Chen, Hanxu, Li, Yiwei, Chen, Junhao, Hu, Huawen, Liu, Yihen, Zhao, Huaqin, Xu, Shaochen, Dai, Haixing, Zhao, Lin, Zhang, Ruidong, Zhao, Wei, Yang, Zhenyuan, Chen, Jingyuan, Wang, Peilong, Ruan, Wei, Wang, Hui, Zhao, Huan, Zhang, Jing, Ren, Yiming, Qin, Shihuan, Chen, Tong, Li, Jiaxi, Zidan, Arif Hassan, Jahin, Afrar, Chen, Minheng, Xia, Sichen, Holmes, Jason, Zhuang, Yan, Wang, Jiaqi, Xu, Bochen, Xia, Weiran, Yu, Jichao, Tang, Kaibo, Yang, Yaxuan, Sun, Bolun, Yang, Tao, Lu, Guoyu, Wang, Xianqiao, Chai, Lilong, Li, He, Lu, Jin, Sun, Lichao, Zhang, Xin, Ge, Bao, Hu, Xintao, Zhang, Lian, Zhou, Hua, Zhang, Lu, Zhang, Shu, Liu, Ninghao, Jiang, Bei, Kong, Linglong, Xiang, Zhen, Ren, Yudan, Liu, Jun, Jiang, Xi, Bao, Yu, Zhang, Wei, Li, Xiang, Li, Gang, Liu, Wei, Shen, Dinggang, Sikora, Andrea, Zhai, Xiaoming, Zhu, Dajiang, Liu, Tianming

arXiv.org Artificial IntelligenceSep-27-2024

This comprehensive study evaluates the performance of OpenAI's o1-preview large language model across a diverse array of complex reasoning tasks, spanning multiple domains, including computer science, mathematics, natural sciences, medicine, linguistics, and social sciences. Through rigorous testing, o1-preview demonstrated remarkable capabilities, often achieving human-level or superior performance in areas ranging from coding challenges to scientific reasoning and from language processing to creative problem-solving. Key findings include: -83.3% success rate in solving complex competitive programming problems, surpassing many human experts. -Superior ability in generating coherent and accurate radiology reports, outperforming other evaluated models. -100% accuracy in high school-level mathematical reasoning tasks, providing detailed step-by-step solutions. -Advanced natural language inference capabilities across general and specialized domains like medicine. -Impressive performance in chip design tasks, outperforming specialized models in areas such as EDA script generation and bug analysis. -Remarkable proficiency in anthropology and geology, demonstrating deep understanding and reasoning in these specialized fields. -Strong capabilities in quantitative investing. O1 has comprehensive financial knowledge and statistical modeling skills. -Effective performance in social media analysis, including sentiment analysis and emotion recognition. The model excelled particularly in tasks requiring intricate reasoning and knowledge integration across various fields. While some limitations were observed, including occasional errors on simpler problems and challenges with certain highly specialized concepts, the overall results indicate significant progress towards artificial general intelligence.

chip design-engineering assistant chatbot, large language model, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2409.18486

Country:

Asia > China (1.00)
North America > United States > Texas (0.47)
North America > United States > Georgia > Clarke County > Athens (0.14)
(3 more...)

Genre:

Research Report > Promising Solution (1.00)
Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)
(2 more...)

Industry:

Leisure & Entertainment (1.00)
Information Technology (1.00)
Health & Medicine > Therapeutic Area > Neurology (1.00)
(12 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.70)

Add feedback

EDT: Improving Large Language Models' Generation by Entropy-based Dynamic Temperature Sampling

Zhang, Shimao, Bao, Yu, Huang, Shujian

arXiv.org Artificial IntelligenceApr-3-2024

Recently, Large Language Models (LLMs) have demonstrated outstanding performance across a wide range of downstream language tasks. Temperature sampling is a commonly used decoding strategy for LLMs' generation process. However, a fixed temperature parameter is used in most cases, which may not always be an optimal choice for balancing generation quality and diversity. In this paper, we propose an effective Entropy-based Dynamic Temperature (EDT) Sampling method, to achieve a more balanced performance in terms of both generation quality and diversity by dynamically selecting the temperature parameter. Additionally, we also show model performance and comprehensive analyses for 4 different generation benchmarks. Our experiments show that EDT significantly outperforms the existing strategies across different tasks.

artificial intelligence, entropy-based dynamic temperature sampling, large language model, (1 more...)

arXiv.org Artificial Intelligence

2403.14541

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

DecompOpt: Controllable and Decomposed Diffusion Models for Structure-based Molecular Optimization

Zhou, Xiangxin, Cheng, Xiwei, Yang, Yuwei, Bao, Yu, Wang, Liang, Gu, Quanquan

arXiv.org Artificial IntelligenceMar-6-2024

Recently, 3D generative models have shown promising performances in structure-based drug design by learning to generate ligands given target binding sites. However, only modeling the target-ligand distribution can hardly fulfill one of the main goals in drug discovery -- designing novel ligands with desired properties, e.g., high binding affinity, easily synthesizable, etc. This challenge becomes particularly pronounced when the target-ligand pairs used for training do not align with these desired properties. Moreover, most existing methods aim at solving \textit{de novo} design task, while many generative scenarios requiring flexible controllability, such as R-group optimization and scaffold hopping, have received little attention. In this work, we propose DecompOpt, a structure-based molecular optimization method based on a controllable and decomposed diffusion model. DecompOpt presents a new generation paradigm which combines optimization with conditional diffusion models to achieve desired properties while adhering to the molecular grammar. Additionally, DecompOpt offers a unified framework covering both \textit{de novo} design and controllable generation. To achieve so, ligands are decomposed into substructures which allows fine-grained control and local optimization. Experiments show that DecompOpt can efficiently generate molecules with improved properties than strong de novo baselines, and demonstrate great potential in controllable generation tasks.

artificial intelligence, machine learning, optimization problem, (21 more...)

arXiv.org Artificial Intelligence

2403.13829

Country: North America > United States > California (0.14)

Genre: Research Report (0.64)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

DecompDiff: Diffusion Models with Decomposed Priors for Structure-Based Drug Design

Guan, Jiaqi, Zhou, Xiangxin, Yang, Yuwei, Bao, Yu, Peng, Jian, Ma, Jianzhu, Liu, Qiang, Wang, Liang, Gu, Quanquan

arXiv.org Artificial IntelligenceFeb-26-2024

Designing 3D ligands within a target binding site is a fundamental task in drug discovery. Existing structured-based drug design methods treat all ligand atoms equally, which ignores different roles of atoms in the ligand for drug design and can be less efficient for exploring the large drug-like molecule space. In this paper, inspired by the convention in pharmaceutical practice, we decompose the ligand molecule into two parts, namely arms and scaffold, and propose a new diffusion model, DecompDiff, with decomposed priors over arms and scaffold. In order to facilitate the decomposed generation and improve the properties of the generated molecules, we incorporate both bond diffusion in the model and additional validity guidance in the sampling phase. Extensive experiments on CrossDocked2020 show that our approach achieves state-of-the-art performance in generating high-affinity molecules while maintaining proper molecular properties and conformational stability, with up to -8.39 Avg. Vina Dock score and 24.5 Success Rate. The code is provided at https://github.com/bytedance/DecompDiff

artificial intelligence, machine learning, molecule, (15 more...)

arXiv.org Artificial Intelligence

2403.07902

Country:

North America > United States > Illinois (0.14)
North America > United States > Hawaii (0.14)

Genre: Research Report (0.50)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)

Add feedback

Binding-Adaptive Diffusion Models for Structure-Based Drug Design

Huang, Zhilin, Yang, Ling, Zhang, Zaixi, Zhou, Xiangxin, Bao, Yu, Zheng, Xiawu, Yang, Yuwei, Wang, Yu, Yang, Wenming

arXiv.org Artificial IntelligenceJan-14-2024

Structure-based drug design (SBDD) aims to generate 3D ligand molecules that bind to specific protein targets. Existing 3D deep generative models including diffusion models have shown great promise for SBDD. However, it is complex to capture the essential protein-ligand interactions exactly in 3D space for molecular generation. To address this problem, we propose a novel framework, namely Binding-Adaptive Diffusion Models (BindDM). In BindDM, we adaptively extract subcomplex, the essential part of binding sites responsible for protein-ligand interactions. Then the selected protein-ligand subcomplex is processed with SE(3)-equivariant neural networks, and transmitted back to each atom of the complex for augmenting the target-aware 3D molecule diffusion generation with binding interaction information. We iterate this hierarchical complex-subcomplex process with cross-hierarchy interaction node for adequately fusing global binding context between the complex and its corresponding subcomplex. Empirical studies on the CrossDocked2020 dataset show BindDM can generate molecules with more realistic 3D structures and higher binding affinities towards the protein targets, with up to -5.92 Avg. Vina Score, while maintaining proper molecular properties. Our code is available at https://github.com/YangLing0818/BindDM

artificial intelligence, machine learning, molecule, (19 more...)

arXiv.org Artificial Intelligence

2402.18583

Country:

Asia > China (0.14)
North America > United States (0.14)

Genre: Research Report (0.50)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)

Add feedback

Diffusion Language Models Can Perform Many Tasks with Scaling and Instruction-Finetuning

Ye, Jiasheng, Zheng, Zaixiang, Bao, Yu, Qian, Lihua, Gu, Quanquan

arXiv.org Artificial IntelligenceAug-25-2023

The recent surge of generative AI has been fueled by the generative power of diffusion probabilistic models and the scalable capabilities of large language models. Despite their potential, it remains elusive whether diffusion language models can solve general language tasks comparable to their autoregressive counterparts. This paper demonstrates that scaling diffusion models w.r.t. data, sizes, and tasks can effectively make them strong language learners. We build competent diffusion language models at scale by first acquiring knowledge from massive data via masked language modeling pretraining thanks to their intrinsic connections. We then reprogram pretrained masked language models into diffusion language models via diffusive adaptation, wherein task-specific finetuning and instruction finetuning are explored to unlock their versatility in solving general language tasks. Experiments show that scaling diffusion language models consistently improves performance across downstream language tasks. We further discover that instruction finetuning can elicit zero-shot and few-shot in-context learning abilities that help tackle many unseen tasks by following natural language instructions, and show promise in advanced and challenging abilities such as reasoning.

artificial intelligence, machine learning, natural language, (15 more...)

arXiv.org Artificial Intelligence

2308.12219

Country:

North America > United States > Louisiana (0.14)
North America > United States > Arizona (0.14)

Genre: Research Report > New Finding (0.92)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.34)

Add feedback

Selective Knowledge Distillation for Non-Autoregressive Neural Machine Translation

Liu, Min, Bao, Yu, Zhao, Chengqi, Huang, Shujian

arXiv.org Artificial IntelligenceAug-4-2023

Benefiting from the sequence-level knowledge distillation, the Non-Autoregressive Transformer (NAT) achieves great success in neural machine translation tasks. However, existing knowledge distillation has side effects, such as propagating errors from the teacher to NAT students, which may limit further improvements of NAT models and are rarely discussed in existing research. In this paper, we introduce selective knowledge distillation by introducing an NAT evaluator to select NAT-friendly targets that are of high quality and easy to learn. In addition, we introduce a simple yet effective progressive distillation method to boost NAT performance. Experiment results on multiple WMT language directions and several representative NAT models show that our approach can realize a flexible trade-off between the quality and complexity of training data for NAT models, achieving strong performances. Further analysis shows that distilling only 5% of the raw translations can help an NAT outperform its counterpart trained on raw data by about 2.4 BLEU.

artificial intelligence, natural language, translation, (15 more...)

arXiv.org Artificial Intelligence

2303.1791

Country:

North America > United States (0.14)
Europe > Czechia (0.14)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

Add feedback

DINOISER: Diffused Conditional Sequence Learning by Manipulating Noises

Ye, Jiasheng, Zheng, Zaixiang, Bao, Yu, Qian, Lihua, Wang, Mingxuan

arXiv.org Artificial IntelligenceFeb-20-2023

While diffusion models have achieved great success in generating continuous signals such as images and audio, it remains elusive for diffusion models in learning discrete sequence data like natural languages. Although recent advances circumvent this challenge of discreteness by embedding discrete tokens as continuous surrogates, they still fall short of satisfactory generation quality. To understand this, we first dive deep into the denoised training protocol of diffusion-based sequence generative models and determine their three severe problems, i.e., 1) failing to learn, 2) lack of scalability, and 3) neglecting source conditions. We argue that these problems can be boiled down to the pitfall of the not completely eliminated discreteness in the embedding space, and the scale of noises is decisive herein. In this paper, we introduce DINOISER to facilitate diffusion models for sequence generation by manipulating noises. We propose to adaptively determine the range of sampled noise scales for counter-discreteness training; and encourage the proposed diffused sequence learner to leverage source conditions with amplified noise scales during inference. Experiments show that DINOISER enables consistent improvement over the baselines of previous diffusion-based sequence generative models on several conditional sequence modeling benchmarks thanks to both effective training and inference strategies. Analyses further verify that DINOISER can make better use of source conditions to govern its generative process.

machine learning, natural language, noise scale, (11 more...)

arXiv.org Artificial Intelligence

2302.10025

Country: Europe (0.67)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Generation (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.46)

Add feedback