van Dijk, David
Non-Markovian Discrete Diffusion with Causal Language Models
Zhang, Yangtian, He, Sizhuang, Levine, Daniel, Zhao, Lawrence, Zhang, David, Rizvi, Syed A, Zappala, Emanuele, Ying, Rex, van Dijk, David
Discrete diffusion models have emerged as a flexible and controllable paradigm for structured sequence modeling, yet they still lag behind causal language models in expressiveness. To bridge the gap between the two paradigms, we introduce CaDDi, a causal discrete diffusion model that unifies sequential and temporal modeling within a non-Markovian diffusion framework. Unlike conventional diffusion models, which operate step by step with no access to prior states, CaDDi conditions on the full temporal trajectory, enabling more expressive and controllable generation. Our approach also treats causal language models as a special case, allowing pretrained large language models (LLMs) to be adopted for discrete diffusion without architectural modifications. Empirically, we demonstrate that CaDDi outperforms state-of-the-art discrete diffusion models on both natural language and biological sequence tasks, narrowing the gap between diffusion-based methods and large-scale autoregressive transformers.
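As a rough illustration of the core mechanism (a minimal PyTorch sketch; the class and argument names are ours, not the paper's), a causal transformer can serve as the denoiser by reading the concatenated noisy states of the diffusion trajectory rather than only the current state:

import torch
import torch.nn as nn

class CausalDenoiser(nn.Module):
    # Sketch: a causal LM reads the noisy states x_T ... x_t concatenated
    # left-to-right and emits per-position logits for the next, cleaner state.
    def __init__(self, vocab_size, dim=256, n_heads=4, n_layers=4, max_len=2048):
        super().__init__()
        self.tok = nn.Embedding(vocab_size, dim)
        self.pos = nn.Embedding(max_len, dim)
        layer = nn.TransformerEncoderLayer(dim, n_heads, 4 * dim, batch_first=True)
        self.body = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(dim, vocab_size)

    def forward(self, trajectory):  # (B, L) token ids of the concatenated trajectory
        B, L = trajectory.shape
        h = self.tok(trajectory) + self.pos(torch.arange(L, device=trajectory.device))
        causal = torch.triu(torch.full((L, L), float("-inf"), device=h.device), diagonal=1)
        h = self.body(h, mask=causal)  # causal attention over the entire past trajectory
        return self.head(h)

Because the computation is purely causal, a pretrained LLM backbone could in principle stand in for self.body, consistent with the abstract's claim that causal language models are a special case.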
COAST: Intelligent Time-Adaptive Neural Operators
Wu, Zhikai, Zhang, Shiyang, He, Sizhuang, Wang, Sifan, Zhu, Min, Jiao, Anran, Lu, Lu, van Dijk, David
We introduce the Causal Operator with Adaptive Solver Transformer (COAST), a novel neural operator learning method that leverages a causal language model (CLM) framework to dynamically adapt time steps. Our method predicts both the evolution of a system and its optimal time step, intelligently balancing computational efficiency and accuracy. We find that COAST generates variable step sizes that correlate with the intrinsic properties of the underlying systems, both within and across dynamical systems. Within a single trajectory, smaller steps are taken in regions of high complexity, while larger steps are employed in simpler regions. Across different systems, more complex dynamics receive more granular time steps. Benchmarked on diverse systems with varied dynamics, COAST consistently outperforms state-of-the-art methods in both efficiency and accuracy. This work underscores the potential of CLM-based intelligent adaptive solvers for scalable operator learning of dynamical systems.
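A minimal sketch of the two-headed design the abstract describes, with our own naming and a toy MLP backbone standing in for the paper's transformer: the model returns both a next state and a strictly positive time step, so a rollout advances by learned, variable increments.

import torch
import torch.nn as nn

class AdaptiveStepOperator(nn.Module):
    # Sketch: one backbone, two heads -- the next state and the time step
    # used to reach it (Softplus keeps dt positive).
    def __init__(self, state_dim, hidden=128):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.GELU(),
            nn.Linear(hidden, hidden), nn.GELU())
        self.state_head = nn.Linear(hidden, state_dim)
        self.dt_head = nn.Sequential(nn.Linear(hidden, 1), nn.Softplus())

    def forward(self, state):
        h = self.backbone(state)
        return self.state_head(h), self.dt_head(h)

# Rollout: the model itself decides how far to step at each point,
# taking small steps in complex regions and large steps in simple ones.
model = AdaptiveStepOperator(state_dim=3)
state, t = torch.randn(1, 3), 0.0
with torch.no_grad():
    for _ in range(10):
        state, dt = model(state)
        t += dt.item()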
Intelligence at the Edge of Chaos
Zhang, Shiyang, Patel, Aakash, Rizvi, Syed A, Liu, Nianchen, He, Sizhuang, Karbasi, Amin, Zappala, Emanuele, van Dijk, David
We explore the emergence of intelligent behavior in artificial systems by investigating how the complexity of rule-based systems influences the capabilities of models trained to predict those rules. Our study focuses on elementary cellular automata (ECA), simple yet powerful one-dimensional systems that generate behaviors ranging from trivial to highly complex. By training distinct Large Language Models (LLMs) on different ECAs, we evaluate the relationship between the complexity of a rule's behavior and the intelligence exhibited by the LLM, as reflected in its performance on downstream tasks. Our findings reveal that rules with higher complexity lead to models exhibiting greater intelligence, as demonstrated by their performance on reasoning and chess move prediction tasks. In contrast, uniform and periodic systems, and often highly chaotic systems as well, result in poorer downstream performance, highlighting a sweet spot of complexity conducive to intelligence. We conjecture that intelligence arises from the ability to predict complexity, and that creating intelligence may require only exposure to complexity.
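For concreteness, a short NumPy sketch of the data-generating process: simulating an elementary cellular automaton (here rule 110, a classic complex rule near the edge of chaos). Serializing the rows into token sequences and training the LLMs are omitted.

import numpy as np

def eca_step(state, rule):
    # One synchronous update of a 1-D ECA with periodic boundaries.
    # The Wolfram rule number encodes the lookup table over 3-cell neighborhoods.
    left, right = np.roll(state, 1), np.roll(state, -1)
    neighborhood = 4 * left + 2 * state + right  # binary code 0..7, left cell is MSB
    table = (rule >> np.arange(8)) & 1           # bit i of the rule = output for code i
    return table[neighborhood]

rng = np.random.default_rng(0)
state = rng.integers(0, 2, size=64)
trajectory = [state]
for _ in range(32):
    state = eca_step(state, rule=110)
    trajectory.append(state)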
CaLMFlow: Volterra Flow Matching using Causal Language Models
He, Sizhuang, Levine, Daniel, Vrkic, Ivan, Bressana, Marco Francesco, Zhang, David, Rizvi, Syed Asad, Zhang, Yangtian, Zappala, Emanuele, van Dijk, David
We introduce CaLMFlow (Causal Language Models for Flow Matching), a novel framework that casts flow matching as a Volterra integral equation (VIE), leveraging the power of large language models (LLMs) for continuous data generation. CaLMFlow enables the direct application of LLMs to learn complex flows by formulating flow matching as a sequence modeling task, bridging discrete language modeling and continuous generative modeling. Our method implements tokenization across space and time, thereby solving a VIE over these domains. This approach enables efficient handling of high-dimensional data and outperforms ODE solver-dependent methods like conditional flow matching (CFM). We demonstrate CaLMFlow's effectiveness on synthetic and real-world data, including single-cell perturbation response prediction, showcasing its ability to incorporate textual context and generalize to unseen conditions. Our results highlight LLM-driven flow matching as a promising paradigm in generative modeling, offering improved scalability, flexibility, and context-awareness.

Recent advances in deep learning have revolutionized generative modeling for complex, high-dimensional data. In particular, methods based on ordinary differential equations (ODEs), such as continuous normalizing flows (CNFs) (Chen et al., 2018) and flow matching (Lipman et al., 2022), have emerged as efficient tools for modeling continuous data distributions. However, many ODE systems suffer from stiffness, making them numerically unstable and computationally expensive to solve accurately (Kushnir & Rokhlin, 2012; Zappala et al., 2024). Recent work in operator learning (Xiong et al., 2021; Cao, 2021; Zappala et al., 2024) has also connected solving integral equations (IEs) with transformers, the foundational architecture of LLMs, inspiring the use of LLMs to model dynamical systems through the lens of IEs.
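A rough sketch of the causal-attention reading of a Volterra equation, with our own class name and a toy transformer backbone in place of a pretrained LLM (the paper also tokenizes across space, which this sketch omits): since a Volterra kernel only integrates over s <= t, each predicted point may attend to the entire history of earlier points, so one causal pass replaces an ODE-solver rollout.

import torch
import torch.nn as nn

class CausalFlow(nn.Module):
    # Sketch: tokens are trajectory points over time; causal attention lets the
    # update at time t depend on all earlier times, as in a Volterra integral.
    def __init__(self, dim, width=128, heads=4, layers=3, max_len=256):
        super().__init__()
        self.inp = nn.Linear(dim, width)
        self.pos = nn.Embedding(max_len, width)
        block = nn.TransformerEncoderLayer(width, heads, 4 * width, batch_first=True)
        self.body = nn.TransformerEncoder(block, layers)
        self.out = nn.Linear(width, dim)

    def forward(self, xs):  # xs: (B, T, dim), points x_0 ... x_{T-1}
        B, T, _ = xs.shape
        h = self.inp(xs) + self.pos(torch.arange(T, device=xs.device))
        causal = torch.triu(torch.full((T, T), float("-inf"), device=h.device), diagonal=1)
        return self.out(self.body(h, mask=causal))  # per-prefix prediction of the next point

# Teacher-forced training on toy trajectories: predict x_{t+1} from x_0 ... x_t.
model = CausalFlow(dim=2)
traj = torch.randn(8, 32, 2)
loss = torch.mean((model(traj[:, :-1]) - traj[:, 1:]) ** 2)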
AMPNet: Attention as Message Passing for Graph Neural Networks
Rizvi, Syed Asad, Nguyen, Nhi, Lyu, Haoran, Christensen, Benjamin, Caro, Josue Ortega, Fonseca, Antonio H. O., Zappala, Emanuele, Bagherian, Maryam, Averill, Christopher, Abdallah, Chadi G., Ying, Rex, Brbic, Maria, Dhodapkar, Rahul Madhav, van Dijk, David
Graph Neural Networks (GNNs) have emerged as a powerful representation learning framework for graph-structured data. A key limitation of conventional GNNs is that they represent each node with a single feature vector, potentially overlooking intricate details about individual node features. Here, we propose an Attention-based Message-Passing layer for GNNs (AMPNet) that encodes individual features per node and models feature-level interactions through cross-node attention during message-passing steps. We demonstrate the capabilities of AMPNet through extensive benchmarking on real-world biological systems such as fMRI brain activity recordings and spatial genomic data, improving over existing baselines by 20% on fMRI signal reconstruction, and by a further 8% when positional embeddings are added. Finally, we validate the ability of AMPNet to uncover meaningful feature-level interactions through case studies on biological systems. We anticipate that our architecture will be highly applicable to graph-structured data where node entities encompass rich feature-level information.
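A minimal sketch of the feature-level message-passing idea, assuming each node carries a matrix of per-feature embeddings; the class name and the simple sum aggregation are our choices, not necessarily the paper's.

import torch
import torch.nn as nn

class AMPLayer(nn.Module):
    # Sketch: a message along edge (u -> v) is computed by cross-attention
    # from v's feature embeddings (queries) onto u's (keys/values), so
    # individual features, not whole nodes, interact.
    def __init__(self, d, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d, heads, batch_first=True)

    def forward(self, x, edge_index):
        # x: (N, F, d) per-node feature embeddings; edge_index: (2, E), rows (src, dst)
        src, dst = edge_index
        msg, _ = self.attn(query=x[dst], key=x[src], value=x[src])  # (E, F, d)
        out = torch.zeros_like(x)
        out.index_add_(0, dst, msg)  # sum incoming messages at each destination node
        return out

layer = AMPLayer(d=32)
x = torch.randn(5, 8, 32)                                # 5 nodes, 8 features each
edge_index = torch.tensor([[0, 1, 2, 3], [1, 2, 3, 4]])  # 4 directed edges
h = layer(x, edge_index)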
Operator Learning Meets Numerical Analysis: Improving Neural Networks through Iterative Methods
Zappala, Emanuele, Levine, Daniel, He, Sizhuang, Rizvi, Syed, Levy, Sacha, van Dijk, David
Deep neural networks have become essential tools in domains such as computer vision, natural language processing, and physical system simulations, consistently delivering impressive empirical results. However, a deeper theoretical understanding of these networks remains an open challenge. This study seeks to bridge this gap by examining the connections between deep learning and classical numerical analysis. By interpreting neural networks as operators that transform input functions to output functions, discretized on some grid, we establish parallels with numerical methods designed for operator equations. This approach facilitates a new iterative learning framework for neural networks, inspired by established techniques such as Picard iteration. Our findings indicate that certain prominent architectures, including diffusion models, AlphaFold, and Graph Neural Networks (GNNs), inherently utilize iterative operator learning. Empirical evaluations show that adopting a more explicit iterative approach in these models can enhance performance. Building on this, we introduce the Picard Iterative Graph Neural Network (PIGN), an iterative GNN model, and demonstrate its effectiveness in node classification tasks.
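A minimal sketch of Picard-style iteration applied to a toy GNN, with our own naming and a simplified propagation operator (the paper's PIGN may differ in details): the same operator is applied repeatedly, so the forward pass approaches a fixed point of h = h_0 + G(h).

import torch
import torch.nn as nn

class PicardGNN(nn.Module):
    # Sketch: Picard iteration h_{k+1} = h_0 + G(h_k) with a shared-weight
    # graph propagation step G(h) = tanh(A @ (h W)).
    def __init__(self, dim, n_iters=8):
        super().__init__()
        self.lin = nn.Linear(dim, dim)
        self.n_iters = n_iters

    def forward(self, h0, adj):  # h0: (N, dim) node features, adj: (N, N) normalized adjacency
        h = h0
        for _ in range(self.n_iters):
            h = h0 + torch.tanh(adj @ self.lin(h))
        return h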
Continuous Spatiotemporal Transformers
Fonseca, Antonio H. de O., Zappala, Emanuele, Caro, Josue Ortega, van Dijk, David
Modeling spatiotemporal dynamical systems is a fundamental challenge in machine learning. Transformer models have been very successful in NLP and computer vision, where they provide interpretable representations of data. However, a limitation of transformers in modeling continuous dynamical systems is that they are fundamentally discrete-time and discrete-space models, and thus provide no guarantees regarding continuous sampling. To address this challenge, we present the Continuous Spatiotemporal Transformer (CST), a new transformer architecture designed for modeling continuous systems. This framework guarantees a continuous and smooth output via optimization in Sobolev space. We benchmark CST against traditional transformers as well as other spatiotemporal dynamics modeling methods and achieve superior performance on a number of tasks on synthetic and real systems, including learning brain dynamics from calcium imaging data.
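One simple way to realize a Sobolev-space objective is sketched below (our naming, with finite differences standing in for true derivatives): the loss penalizes mismatch in both the signal and its time derivative, which encourages smooth outputs that remain sensible between training samples.

import torch

def sobolev_loss(pred, target, dt, lam=0.1):
    # Sketch of a first-order Sobolev-type loss: L2 error on values plus
    # L2 error on forward-difference time derivatives.
    mse = torch.mean((pred - target) ** 2)
    d_pred = (pred[:, 1:] - pred[:, :-1]) / dt
    d_true = (target[:, 1:] - target[:, :-1]) / dt
    return mse + lam * torch.mean((d_pred - d_true) ** 2)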
Neural Integral Equations
Zappala, Emanuele, Fonseca, Antonio Henrique de Oliveira, Caro, Josue Ortega, van Dijk, David
Integral equations (IEs) are equations that model spatiotemporal systems with non-local interactions. They have found important applications throughout the theoretical and applied sciences, including physics, chemistry, biology, and engineering. While efficient algorithms exist for solving given IEs, no method exists that can learn an IE and its associated dynamics from data alone. In this paper, we introduce Neural Integral Equations (NIE), a method that learns an unknown integral operator from data through an IE solver. We also introduce Attentional Neural Integral Equations (ANIE), in which the integral is replaced by self-attention, which improves scalability and capacity and yields an interpretable model. We demonstrate that (A)NIE outperforms other methods in both speed and accuracy on several benchmark tasks involving ODE, PDE, and IE systems, on both synthetic and real-world data.
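A minimal sketch of the attentional variant (our naming and simplifications): self-attention over the discretized domain plays the role of the integral operator, and the equation y = f + ∫ K y is solved by a short fixed-point iteration.

import torch
import torch.nn as nn

class AttentionalIE(nn.Module):
    # Sketch: solve y = f + integral_term(y) by Picard-style iteration,
    # with self-attention standing in for the integral operator.
    def __init__(self, dim, heads=4, n_iters=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.n_iters = n_iters

    def forward(self, f):  # f: (B, T, dim), the free term on the discretized domain
        y = f
        for _ in range(self.n_iters):
            integral, _ = self.attn(y, y, y)  # attention weights act like a learned kernel
            y = f + integral
        return y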
Neural Integro-Differential Equations
Zappala, Emanuele, Fonseca, Antonio Henrique de Oliveira, Moberly, Andrew Henry, Higley, Michael James, Abdallah, Chadi, Cardin, Jessica, van Dijk, David
Modeling continuous dynamical systems from discretely sampled observations is a fundamental problem in data science. Often, such dynamics are the result of non-local processes that involve an integral over time. These systems are therefore modeled with Integro-Differential Equations (IDEs), generalizations of differential equations that comprise both an integral and a differential component. For example, brain dynamics are not accurately modeled by differential equations alone, since their behavior is non-Markovian, i.e., the dynamics are in part dictated by history. Here, we introduce the Neural IDE (NIDE), a novel deep learning framework based on the theory of IDEs in which integral operators are learned using neural networks. We test NIDE on several toy and brain activity datasets and demonstrate that it outperforms other models. These tasks include time extrapolation as well as predicting dynamics from unseen initial conditions, which we test on whole-cortex activity recordings of freely behaving mice. Further, we show that NIDE can decompose dynamics into their Markovian and non-Markovian constituents via the learned integral operator, which we test on fMRI recordings of people on ketamine. Finally, the integrand of the integral operator provides a latent space that gives insight into the underlying dynamics, which we demonstrate on wide-field brain imaging recordings. Altogether, NIDE is a novel approach that enables modeling of complex non-local dynamics with neural networks.
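A rough Euler-discretized sketch of such an update rule, with our own class name and toy MLPs for the two components: F(y) is the Markovian term, while the running integral of K(y) carries the non-Markovian history (zeroing it out recovers a plain neural ODE, which is how a Markovian/non-Markovian decomposition becomes possible).

import torch
import torch.nn as nn

class NeuralIDE(nn.Module):
    # Sketch: integrate dy/dt = F(y(t)) + integral_0^t K(y(s)) ds with Euler steps,
    # accumulating the integral term as a running sum over the trajectory so far.
    def __init__(self, dim, hidden=64):
        super().__init__()
        self.F = nn.Sequential(nn.Linear(dim, hidden), nn.Tanh(), nn.Linear(hidden, dim))
        self.K = nn.Sequential(nn.Linear(dim, hidden), nn.Tanh(), nn.Linear(hidden, dim))

    def forward(self, y0, n_steps=100, dt=0.01):  # y0: (B, dim)
        y, integral, ys = y0, torch.zeros_like(y0), [y0]
        for _ in range(n_steps):
            integral = integral + self.K(y) * dt  # non-Markovian memory term
            y = y + (self.F(y) + integral) * dt   # Euler step
            ys.append(y)
        return torch.stack(ys, dim=1)             # (B, n_steps + 1, dim)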
Permutation invariant networks to learn Wasserstein metrics
Sehanobish, Arijit, Ravindra, Neal, van Dijk, David
Understanding the space of probability measures on a metric space equipped with a Wasserstein distance is one of the fundamental questions in mathematical analysis. The Wasserstein metric has received considerable attention in the machine learning community, especially for its principled way of comparing distributions. In this work, we use a permutation invariant network to map samples from probability measures into a low-dimensional space such that the Euclidean distance between the encoded samples reflects the Wasserstein distance between the underlying probability measures. We show that our network can generalize to correctly compute distances between unseen densities. We also show that these networks can learn the first and second moments of probability distributions.
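A minimal sketch of this setup under our own naming: a DeepSets-style permutation-invariant encoder maps a set of samples to a vector, and training regresses Euclidean distances between encodings onto precomputed Wasserstein distances between the corresponding sample sets.

import torch
import torch.nn as nn

class SetEncoder(nn.Module):
    # Sketch: phi embeds each sample, mean pooling makes the encoding
    # invariant to sample order, and rho maps the pooled vector down.
    def __init__(self, dim, hidden=64, out=16):
        super().__init__()
        self.phi = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU(), nn.Linear(hidden, hidden))
        self.rho = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(), nn.Linear(hidden, out))

    def forward(self, samples):  # samples: (B, n_points, dim)
        return self.rho(self.phi(samples).mean(dim=1))

def metric_loss(enc_x, enc_y, w_dist):
    # w_dist: (B,) Wasserstein distances precomputed with an exact OT solver.
    return torch.mean((torch.norm(enc_x - enc_y, dim=-1) - w_dist) ** 2)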