
Non-Markovian Discrete Diffusion with Causal Language Models

arXiv.org Artificial Intelligence

Discrete diffusion models have emerged as a flexible and controllable paradigm for structured sequence modeling, yet they still lag behind causal language models in expressiveness. To bridge the gap between the two paradigms, we introduce CaDDi, a causal discrete diffusion model that unifies sequential and temporal modeling within a non-Markovian diffusion framework. Unlike conventional diffusion models that operate step by step with no access to prior states, CaDDi conditions on the full temporal trajectory of noisy states, enabling more expressive and controllable generation. Our approach also treats causal language models as a special case, allowing pretrained large language models (LLMs) to be adopted for discrete diffusion without architectural modifications. Empirically, we demonstrate that CaDDi outperforms state-of-the-art discrete diffusion models on both natural language and biological sequence tasks, narrowing the gap between diffusion-based methods and large-scale autoregressive transformers.
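To make the non-Markovian setup concrete, here is a minimal sketch of how such a training context could be assembled: a toy absorbing-state forward process produces a trajectory of progressively less corrupted sequences, which are concatenated so a causal LM can attend to every earlier noisy state while learning to emit the clean sequence. The mask-token scheme, corruption schedule, and helper names (corrupt, caddi_context) are illustrative assumptions, not CaDDi's exact formulation.

```python
import torch

MASK_ID = 0  # reserved mask token id (illustrative choice)

def corrupt(x0, t, T):
    # Toy absorbing-state forward process: mask each token independently
    # with probability t / T (an assumed kernel, not the paper's exact one).
    drop = torch.rand(x0.shape) < t / T
    return torch.where(drop, torch.full_like(x0, MASK_ID), x0)

def caddi_context(x0, T):
    # Non-Markovian context: concatenate the trajectory x_T, ..., x_1 so a
    # causal LM attends to every earlier (noisier) state when it is trained
    # to produce the clean sequence x_0 as the next segment.
    traj = [corrupt(x0, t, T) for t in range(T, 0, -1)]
    return torch.cat(traj, dim=-1)  # shape: (batch, T * seq_len)

x0 = torch.randint(1, 100, (2, 8))   # toy batch, vocab ids 1..99
ctx = caddi_context(x0, T=4)         # shape (2, 32); next-segment target: x0
```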


COAST: Intelligent Time-Adaptive Neural Operators

arXiv.org Artificial Intelligence

We introduce the Causal Operator with Adaptive Solver Transformer (COAST), a novel neural operator learning method that leverages a causal language model (CLM) framework to dynamically adapt time steps. Our method predicts both the evolution of a system and its optimal time step, intelligently balancing computational efficiency and accuracy. We find that COAST generates variable step sizes that correlate with the intrinsic complexity of the underlying system, both within and across dynamical systems. Within a single trajectory, smaller steps are taken in regions of high complexity, while larger steps are employed in simpler regions. Across different systems, more complex dynamics receive more granular time steps. Benchmarked on diverse systems with varied dynamics, COAST consistently outperforms state-of-the-art methods in both efficiency and accuracy. This work underscores the potential of CLM-based intelligent adaptive solvers for scalable operator learning of dynamical systems.
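A minimal sketch of the time-adaptive rollout idea: the solver returns both the next state and its own step size, and the driver simply advances by whatever step is proposed. The toy_model stub (assumed linear dynamics and a hand-written step heuristic) is a stand-in for the learned transformer, and the function names are hypothetical.

```python
import numpy as np

def adaptive_rollout(model, u0, t0, t_end):
    # `model` is any callable (u, t) -> (u_next, dt): it predicts the
    # system's evolution *and* its own step size, the COAST idea.
    t, u = t0, u0
    times, states = [t], [u]
    while t < t_end - 1e-12:
        u, dt = model(u, t)
        t = min(t + dt, t_end)       # clamp so the rollout stops at t_end
        times.append(t)
        states.append(u)
    return np.array(times), np.array(states)

def toy_model(u, t):
    # Hand-written stand-in for the learned solver: linear dynamics
    # du/dt = -2u, with smaller steps where the state changes quickly.
    du = -2.0 * u
    dt = 0.2 / (1.0 + abs(du))
    return u + dt * du, dt

times, states = adaptive_rollout(toy_model, u0=1.0, t0=0.0, t_end=3.0)
# `times` is non-uniform: dense near t = 0 (fast decay), sparser later.
```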


Intelligence at the Edge of Chaos

arXiv.org Artificial Intelligence

We explore the emergence of intelligent behavior in artificial systems by investigating how the complexity of rule-based systems influences the capabilities of models trained to predict these rules. Our study focuses on elementary cellular automata (ECA), simple yet powerful one-dimensional systems that generate behaviors ranging from trivial to highly complex. By training distinct Large Language Models (LLMs) on different ECAs, we evaluated the relationship between the complexity of the rules' behavior and the intelligence exhibited by the LLMs, as reflected in their performance on downstream tasks. Our findings reveal that rules with higher complexity lead to models exhibiting greater intelligence, as demonstrated by their performance on reasoning and chess move prediction tasks. Uniform and periodic systems, and often highly chaotic systems as well, resulted in poorer downstream performance, highlighting a sweet spot of complexity conducive to intelligence. We conjecture that intelligence arises from the ability to predict complexity, and that creating intelligence may require only exposure to complexity.
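The training substrate here is easy to reproduce. The sketch below generates ECA trajectories for a given Wolfram rule number; serializing such binary grids into token sequences is one plausible way to build the LM training data (the serialization step itself is an assumption, not detailed in the abstract).

```python
import numpy as np

def eca_step(state, rule):
    # One synchronous update of an elementary cellular automaton with
    # periodic boundaries; `rule` is the Wolfram rule number (0-255).
    left, right = np.roll(state, 1), np.roll(state, -1)
    code = 4 * left + 2 * state + right   # 3-bit neighborhood code
    table = (rule >> np.arange(8)) & 1    # rule bits as a lookup table
    return table[code]

def eca_trajectory(rule, width=64, steps=128, seed=0):
    rng = np.random.default_rng(seed)
    state = rng.integers(0, 2, width)
    rows = [state]
    for _ in range(steps):
        state = eca_step(state, rule)
        rows.append(state)
    return np.stack(rows)                 # (steps + 1, width) grid of 0/1

# Rule 110 sits in Wolfram's complex class 4 ("edge of chaos");
# rule 30 is chaotic (class 3), rule 4 periodic (class 2), rule 0 trivial.
grid = eca_trajectory(110)
```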


CaLMFlow: Volterra Flow Matching using Causal Language Models

arXiv.org Artificial Intelligence

We introduce CaLMFlow (Causal Language Models for Flow Matching), a novel framework that casts flow matching as a Volterra integral equation (VIE), leveraging the power of large language models (LLMs) for continuous data generation. CaLMFlow enables the direct application of LLMs to learn complex flows by formulating flow matching as a sequence modeling task, bridging discrete language modeling and continuous generative modeling. Our method tokenizes across space and time, thereby solving a VIE over these domains. This approach enables efficient handling of high-dimensional data and outperforms ODE-solver-dependent methods like conditional flow matching (CFM). We demonstrate CaLMFlow's effectiveness on synthetic and real-world data, including single-cell perturbation response prediction, showcasing its ability to incorporate textual context and generalize to unseen conditions. Our results highlight LLM-driven flow matching as a promising paradigm in generative modeling, offering improved scalability, flexibility, and context-awareness. Recent advances in deep learning have revolutionized generative modeling for complex, high-dimensional data. In particular, methods based on ordinary differential equations (ODEs), such as continuous normalizing flows (CNFs) (Chen et al., 2018) and flow matching (Lipman et al., 2022), have emerged as efficient tools for modeling continuous data distributions. However, many ODE systems suffer from stiffness, making them numerically unstable and computationally expensive to solve accurately (Kushnir & Rokhlin, 2012; Zappala et al., 2024). Recent work in operator learning (Xiong et al., 2021; Cao, 2021; Zappala et al., 2024) has also connected solving integral equations with transformers, the foundational architecture of LLMs, inspiring the use of LLMs to model dynamical systems through the lens of integral equations.
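As a worked illustration of the VIE view: the state at each token time is written as the initial condition plus a quadrature sum over the entire history, which is exactly the history dependence a causal LM captures with attention over earlier space-time tokens. The left-endpoint quadrature, the toy straight-line velocity, and the name vie_rollout are assumptions for this sketch, not CaLMFlow's trained model.

```python
import numpy as np

def vie_rollout(v, x0, ts):
    # Discretize the VIE  x(t) = x(0) + ∫_0^t v(x(s), s) ds  with a
    # left-endpoint quadrature over the token grid `ts`. Each new state is
    # an explicit sum over the *whole* history, mirroring causal attention.
    xs = [np.asarray(x0, dtype=float)]
    for i in range(len(ts) - 1):
        hist = sum((ts[j + 1] - ts[j]) * v(xs[j], ts[j]) for j in range(i + 1))
        xs.append(xs[0] + hist)
    return xs

# Toy straight-line flow-matching velocity v(x, t) = x1 - x0 (a stand-in
# for the learned model): integrating to t = 1 recovers the target x1.
x0, x1 = np.zeros(2), np.array([1.0, -2.0])
path = vie_rollout(lambda x, t: x1 - x0, x0, np.linspace(0.0, 1.0, 11))
```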


Operator Learning Meets Numerical Analysis: Improving Neural Networks through Iterative Methods

arXiv.org Artificial Intelligence

Deep neural networks have become essential tools in domains such as computer vision, natural language processing, and physical system simulations, consistently delivering impressive empirical results. However, a deeper theoretical understanding of these networks remains an open challenge. This study seeks to bridge this gap by examining the connections between deep learning and classical numerical analysis. By interpreting neural networks as operators that transform input functions to output functions, discretized on a grid, we establish parallels with numerical methods designed for operator equations. This approach facilitates a new iterative learning framework for neural networks, inspired by established techniques like Picard iteration. Our findings indicate that certain prominent architectures, including diffusion models, AlphaFold, and Graph Neural Networks (GNNs), inherently utilize iterative operator learning. Empirical evaluations show that adopting a more explicit iterative approach in these models can enhance performance. Building on this, we introduce the Picard Iterative Graph Neural Network (PIGN), an iterative GNN model, and demonstrate its effectiveness on node classification tasks.
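A minimal sketch of the Picard-style update the abstract describes: one shared propagation step is applied repeatedly with the input re-injected each round, h_{k+1} = x + f(h_k), driving the representation toward a fixed point. The row-normalized adjacency, damping factor, and linear message function below are illustrative choices (as is the name pign_sketch), not PIGN's exact formulation.

```python
import numpy as np

def pign_sketch(adj, x, weight, alpha=0.5, iters=8):
    # Picard-style fixed-point update  h_{k+1} = x + alpha * A_hat h_k W,
    # re-applying one *shared* message-passing step with the input features
    # re-injected at every iteration (illustrative update rule).
    a_hat = adj / np.clip(adj.sum(axis=1, keepdims=True), 1.0, None)
    h = x
    for _ in range(iters):
        h = x + alpha * a_hat @ h @ weight
    return h

rng = np.random.default_rng(0)
adj = (rng.random((5, 5)) < 0.4).astype(float)      # toy graph
h = pign_sketch(adj,
                rng.standard_normal((5, 3)),        # node features
                0.1 * rng.standard_normal((3, 3)))  # small W, so the
                                                    # iteration contracts
```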