AITopics | Yuan, Yang

Collaborating Authors

Yuan, Yang

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Transformer-Enhanced Variational Autoencoder for Crystal Structure Prediction

Chen, Ziyi, Yuan, Yang, Zheng, Siming, Guo, Jialong, Liang, Sihan, Wang, Yangang, Wang, Zongguo

arXiv.org Artificial IntelligenceFeb-13-2025

Crystal structure forms the foundation for understanding the physical and chemical properties of materials. Generative models have emerged as a new paradigm in crystal structure prediction(CSP), however, accurately capturing key characteristics of crystal structures, such as periodicity and symmetry, remains a significant challenge. In this paper, we propose a Transformer-Enhanced Variational Autoencoder for Crystal Structure Prediction (TransVAE-CSP), who learns the characteristic distribution space of stable materials, enabling both the reconstruction and generation of crystal structures. TransVAE-CSP integrates adaptive distance expansion with irreducible representation to effectively capture the periodicity and symmetry of crystal structures, and the encoder is a transformer network based on an equivariant dot product attention mechanism. Experimental results on the carbon_24, perov_5, and mp_20 datasets demonstrate that TransVAE-CSP outperforms existing methods in structure reconstruction and generation tasks under various modeling metrics, offering a powerful tool for crystal structure design and optimization.

artificial intelligence, deep learning, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2502.09423

Country: Asia > China (0.28)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

CrySPAI: A new Crystal Structure Prediction Software Based on Artificial Intelligence

Wang, Zongguo, Chen, Ziyi, Yuan, Yang, Wang, Yangang

arXiv.org Artificial IntelligenceJan-27-2025

Crystal structure predictions based on the combination of first-principles calculations and machine learning have achieved significant success in materials science. However, most of these approaches are limited to predicting specific systems, which hinders their application to unknown or unexplored domains. In this paper, we present CrySPAI, a crystal structure prediction package developed using artificial intelligence (AI) to predict energetically stable crystal structures of inorganic materials given their chemical compositions. The software consists of three key modules, an evolutionary optimization algorithm (EOA) that searches for all possible crystal structure configurations, density functional theory (DFT) that provides the accurate energy values for these structures, and a deep neural network (DNN) that learns the relationship between crystal structures and their corresponding energies. To optimize the process across these modules, a distributed framework is implemented to parallelize tasks, and an automated workflow has been integrated into CrySPAI for seamless execution. This paper reports the development and implementation of AI AI-based CrySPAI Crystal Prediction Software tool and its unique features.

artificial intelligence, evolutionary algorithm, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2501.15838

Country: Asia > China (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (1.00)

Add feedback

Tensor Product Attention Is All You Need

Zhang, Yifan, Liu, Yifeng, Yuan, Huizhuo, Qin, Zhen, Yuan, Yang, Gu, Quanquan, Yao, Andrew Chi-Chih

arXiv.org Artificial IntelligenceJan-10-2025

Scaling language models to handle longer input sequences typically necessitates large key-value (KV) caches, resulting in substantial memory overhead during inference. In this paper, we propose Tensor Product Attention (TPA), a novel attention mechanism that uses tensor decompositions to represent queries, keys, and values compactly, significantly shrinking KV cache size at inference time. By factorizing these representations into contextual low-rank components (contextual factorization) and seamlessly integrating with RoPE, TPA achieves improved model quality alongside memory efficiency. Based on TPA, we introduce the Tensor ProducT ATTenTion Transformer (T6), a new model architecture for sequence modeling. Through extensive empirical evaluation of language modeling tasks, we demonstrate that T6 exceeds the performance of standard Transformer baselines including MHA, MQA, GQA, and MLA across various metrics, including perplexity and a range of renowned evaluation benchmarks. Notably, TPA's memory efficiency enables the processing of significantly longer sequences under fixed resource constraints, addressing a critical scalability challenge in modern language models. The code is available at https://github.com/tensorgi/T6.

artificial intelligence, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2501.06425

Country:

Asia (0.93)
Europe > Austria > Vienna (0.15)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > California (0.14)

Genre: Research Report (0.81)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.88)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.69)

Add feedback

CatCode: A Comprehensive Evaluation Framework for LLMs On the Mixture of Code and Text

Lin, Zhenru, Yao, Yiqun, Yuan, Yang

arXiv.org Artificial IntelligenceMar-4-2024

Large language models (LLMs) such as ChatGPT are increasingly proficient in understanding and generating a mixture of code and text. Evaluation based on such $\textit{mixture}$ can lead to a more comprehensive understanding of the models' abilities in solving coding problems. However, in this context, current evaluation methods are either limited in task coverage or lack standardization. To address this issue, we propose using category theory as a framework for evaluation. Specifically, morphisms within a code category can represent code debugging and transformation, functors between two categories represent code translation, and functors between a code category and a natural language category represent code generation, explanation, and reproduction. We present an automatic evaluation framework called $\textbf{CatCode}$ ($\textbf{Cat}$egory $\textbf{Code}$) that can comprehensively assess the coding abilities of LLMs, including ChatGPT, Text-Davinci, and CodeGeeX.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2403.01784

Country:

North America > United States (0.28)
Asia (0.28)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.72)

Add feedback

AutoMathText: Autonomous Data Selection with Language Models for Mathematical Texts

Zhang, Yifan, Luo, Yifan, Yuan, Yang, Yao, Andrew Chi-Chih

arXiv.org Artificial IntelligenceFeb-12-2024

To improve language models' proficiency in mathematical reasoning via continual pretraining, we introduce a novel strategy that leverages base language models for autonomous data selection. Departing from conventional supervised fine-tuning or trained classifiers with human-annotated data, our approach utilizes meta-prompted language models as zero-shot verifiers to autonomously evaluate and select high-quality mathematical content, and we release the curated open-source AutoMathText dataset encompassing over 200GB of data. To demonstrate the efficacy of our method, we continuously pretrained a 7B-parameter Mistral language model on the AutoMathText dataset, achieving substantial improvements in downstream performance on the MATH dataset with a token amount reduced by orders of magnitude compared to previous continuous pretraining works. Our method showcases a 2 times increase in pretraining token efficiency compared to baselines, underscoring the potential of our approach in enhancing models' mathematical reasoning capabilities. The AutoMathText dataset is available at https://huggingface.co/datasets/math-ai/AutoMathText. The code is available at https://github.com/yifanzhang-pro/AutoMathText.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2402.07625

Genre: Research Report (0.64)

Industry: Education (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Matrix Information Theory for Self-Supervised Learning

Zhang, Yifan, Tan, Zhiquan, Yang, Jingqin, Huang, Weiran, Yuan, Yang

arXiv.org Artificial IntelligenceFeb-6-2024

The maximum entropy encoding framework provides a unified perspective for many non-contrastive learning methods like SimSiam, Barlow Twins, and MEC. Inspired by this framework, we introduce Matrix-SSL, a novel approach that leverages matrix information theory to interpret the maximum entropy encoding loss as matrix uniformity loss. Furthermore, Matrix-SSL enhances the maximum entropy encoding method by seamlessly incorporating matrix alignment loss, directly aligning covariance matrices in different branches. Experimental results reveal that Matrix-SSL outperforms state-of-the-art methods on the ImageNet dataset under linear evaluation settings and on MS-COCO for transfer learning tasks. Specifically, when performing transfer learning tasks on MS-COCO, our method outperforms previous SOTA methods such as MoCo v2 and BYOL up to 3.3% with only 400 epochs compared to 800 epochs pre-training. We also try to introduce representation learning into the language modeling regime, achieving 72.3% on the GSM8K dataset by fine-tuning a 7B model using matrix cross-entropy loss, with a margin of 3.1% over the standard cross-entropy loss. Code available at https://github.com/yifanzhang-pro/Matrix-SSL.

artificial intelligence, machine learning, matrix, (17 more...)

arXiv.org Artificial Intelligence

2305.17326

Country: Asia > China (0.14)

Genre:

Research Report > Promising Solution (0.54)
Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Add feedback

Meta Prompting for AGI Systems

Zhang, Yifan, Yuan, Yang, Yao, Andrew Chi-Chih

arXiv.org Artificial IntelligenceJan-31-2024

This paper presents a comprehensive study of Meta Prompting, an innovative technique reshaping the utilization of large language models (LLMs), multi-modal foundation models, and AI systems in problem-solving and data interaction. Grounded in type theory and category theory, Meta Prompting emphasizes the structure and syntax of information over traditional content-centric methods. The paper explores the formal definitions of Meta Prompting (MP), sets it apart from Few-Shot Prompting, and underlines its effectiveness in various AI applications. A key focus is applying Meta Prompting for complex reasoning (MP-CR) tasks, showing how it effectively deconstructs intricate problems into simpler sub-problems, enhancing token efficiency, and enabling more equitable problem-solving comparisons, especially against few-shot prompting methods. Additionally, the paper introduces Meta Prompting for prompting tasks, allowing LLMs to self-generate new prompts in a recursive, metaprogramming-like manner. This approach marks a significant leap in AI's autonomous and adaptive capabilities. The paper also introduces the integration of Meta Prompting into multi-modal foundation model settings, tackling the challenges and opportunities of incorporating varied data types such as images, audio, and video within the structured Meta Prompting framework. Empirical experiments, including solving the Game of 24 tasks with 100% success rate, demonstrate the MP-CR Agent's enhanced reasoning capabilities, achieving high accuracy and efficiency, and showcasing Meta Prompting's transformative impact on AI problem-solving. (The code is available at https://github.com/meta-prompting/meta-prompting)

artificial intelligence, large language model, natural language, (14 more...)

arXiv.org Artificial Intelligence

2311.11482

Country:

North America > United States (0.14)
Europe (0.14)

Genre: Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (1.00)

Add feedback

Cumulative Reasoning with Large Language Models

Zhang, Yifan, Yang, Jingqin, Yuan, Yang, Yao, Andrew Chi-Chih

arXiv.org Artificial IntelligenceDec-1-2023

While language models are powerful and versatile, they often fail to address highly complex problems. This is because solving complex problems requires deliberate thinking, which has been only minimally guided during training. In this paper, we propose a new method called Cumulative Reasoning (CR), which employs language models in a cumulative and iterative manner to emulate human thought processes. By decomposing tasks into smaller components, CR streamlines the problem-solving process, rendering it both more manageable and effective. For logical inference tasks, CR consistently outperforms existing methods with an improvement up to 9.3%, and achieves an accuracy of 98.04% on the curated FOLIO wiki dataset. In the context of the Game of 24, CR achieves an accuracy of 98%, which signifies a substantial enhancement of 24% over the previous state-of-the-art method. Finally, on the MATH dataset, we establish new state-of-the-art results with 58.0% overall accuracy, surpassing the previous best approach by a margin of 4.2%, and achieving 43% relative improvement on the hardest level 5 problems (22.4% to 32.1%). Additionally, we expand the concept of Cumulative Reasoning to incorporate a Python code environment, deliberately omitting external aids such as retrieval and web browsing and focusing solely on the LLM's intrinsic reasoning capabilities within a Python code environment. Our experiments in this setting yielded impressive results, with an overall accuracy of 72.2% on the MATH dataset, significantly outperforming the PAL method with 38.8% relative improvement. Code is available at https://github.com/iiis-ai/cumulative-reasoning.

arxiv preprint arxiv, large language model, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2308.04371

Country:

Europe (0.68)
North America > United States > New York (0.15)
North America > United States > Ohio (0.14)
Asia > Middle East > Iraq (0.14)

Genre: Research Report (1.00)

Industry:

Media (1.00)
Leisure & Entertainment (1.00)
Health & Medicine > Therapeutic Area > Oncology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)

Add feedback

Trade-off Between Efficiency and Consistency for Removal-based Explanations

Zhang, Yifan, He, Haowei, Tan, Zhiquan, Yuan, Yang

arXiv.org Machine LearningOct-20-2023

In the current landscape of explanation methodologies, most predominant approaches, such as SHAP and LIME, employ removal-based techniques to evaluate the impact of individual features by simulating various scenarios with specific features omitted. Nonetheless, these methods primarily emphasize efficiency in the original context, often resulting in general inconsistencies. In this paper, we demonstrate that such inconsistency is an inherent aspect of these approaches by establishing the Impossible Trinity Theorem, which posits that interpretability, efficiency, and consistency cannot hold simultaneously. Recognizing that the attainment of an ideal explanation remains elusive, we propose the utilization of interpretation error as a metric to gauge inefficiencies and inconsistencies. To this end, we present two novel algorithms founded on the standard polynomial basis, aimed at minimizing interpretation error. Our empirical findings indicate that the proposed methods achieve a substantial reduction in interpretation error, up to 31.8 times lower when compared to alternative techniques. Code is available at https://github.com/trusty-ai/efficient-consistent-explanations.

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Machine Learning

2210.17426

Country:

Europe (0.28)
Asia > China (0.14)
North America > United States (0.14)

Genre: Research Report (0.81)

Industry: Leisure & Entertainment (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
(4 more...)

Add feedback

MatChat: A Large Language Model and Application Service Platform for Materials Science

Chen, Ziyi, Xie, Fankai, Wan, Meng, Yuan, Yang, Liu, Miao, Wang, Zongguo, Meng, Sheng, Wang, Yangang

arXiv.org Artificial IntelligenceOct-11-2023

The prediction of chemical synthesis pathways plays a pivotal role in materials science research. Challenges, such as the complexity of synthesis pathways and the lack of comprehensive datasets, currently hinder our ability to predict these chemical processes accurately. However, recent advancements in generative artificial intelligence (GAI), including automated text generation and question-answering systems, coupled with fine-tuning techniques, have facilitated the deployment of large-scale AI models tailored to specific domains. In this study, we harness the power of the LLaMA2-7B model and enhance it through a learning process that incorporates 13,878 pieces of structured material knowledge data. This specialized AI model, named MatChat, focuses on predicting inorganic material synthesis pathways. MatChat exhibits remarkable proficiency in generating and reasoning with knowledge in materials science. Although MatChat requires further refinement to meet the diverse material design needs, this research undeniably highlights its impressive reasoning capabilities and innovative potential in the field of materials science. MatChat is now accessible online and open for use, with both the model and its application framework available as open source. This study establishes a robust foundation for collaborative innovation in the integration of generative AI in materials science.

machine learning, natural language, question answering, (6 more...)

arXiv.org Artificial Intelligence

doi: 10.1088/1674-1056/ad04cb

2310.07197

Genre: Research Report (0.69)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.53)
Information Technology > Artificial Intelligence > Natural Language > Generation (0.53)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.53)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.40)

Add feedback