Adaptive Physics-informed Neural Networks: A Survey
Torres, Edgar, Schiefer, Jonathan, Niepert, Mathias
Advances in machine learning have led to important applications in various fields, such as computer vision (enabling technologies like self-driving cars), natural language processing (powering intelligent agents and chatbots), and image generation (facilitating media creation). Motivated by this success, there has been growing interest in developing machine learning (ML) solutions to problems in science and engineering. Unlike fields where data is abundant or easily obtained, however, science and engineering often face data scarcity because generating data through experiments or simulations is expensive. Facilitating the development of ML approaches in these disciplines therefore requires methods that are both data-efficient and computationally efficient. Other domains have tackled similar problems with techniques such as transfer learning, meta-learning, and few-shot learning, suggesting significant potential for applying these techniques in science and engineering. One application where such efficient ML models can be particularly beneficial is approximating the solutions of partial differential equations (PDEs). PDEs are fundamental for modeling and describing natural phenomena across scientific and engineering domains. Traditionally, these equations are solved numerically, which can become prohibitively expensive, especially for nonlinear and high-dimensional problems [Han et al., 2018]. This challenge limits their application in areas where fast evaluation of a PDE is required.
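To make the setting concrete, here is a minimal PyTorch sketch of the physics-informed loss underlying the PINN family the survey covers: a network is penalized on the residual of a PDE at sampled collocation points. The 1D heat equation, the diffusivity `alpha`, and the tiny `u_net` are illustrative assumptions, not details taken from the survey.

```python
# Minimal sketch of a physics-informed loss for the 1D heat equation
# u_t = alpha * u_xx. The network u_net, the equation, and alpha are
# illustrative choices, not taken from the survey itself.
import torch

u_net = torch.nn.Sequential(
    torch.nn.Linear(2, 64), torch.nn.Tanh(), torch.nn.Linear(64, 1)
)
alpha = 0.1  # assumed diffusivity

def pde_residual_loss(x, t):
    x = x.requires_grad_(True)
    t = t.requires_grad_(True)
    u = u_net(torch.cat([x, t], dim=1))
    u_t = torch.autograd.grad(u.sum(), t, create_graph=True)[0]
    u_x = torch.autograd.grad(u.sum(), x, create_graph=True)[0]
    u_xx = torch.autograd.grad(u_x.sum(), x, create_graph=True)[0]
    return ((u_t - alpha * u_xx) ** 2).mean()

# Collocation points sampled from the domain; no simulation data needed.
loss = pde_residual_loss(torch.rand(256, 1), torch.rand(256, 1))
loss.backward()
```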
Preference-Based Gradient Estimation for ML-Based Approximate Combinatorial Optimization
Mielke, Arman, Bauknecht, Uwe, Strauss, Thilo, Niepert, Mathias
Combinatorial optimization (CO) problems arise in a wide range of fields, from medicine to logistics and manufacturing. While exact solutions are often not necessary, many applications require finding high-quality solutions quickly. For this purpose, we propose a data-driven approach to improve existing non-learned approximation algorithms for CO. We parameterize the approximation algorithm and train a graph neural network (GNN) to predict parameter values that lead to the best possible solutions. Our pipeline is trained end-to-end in a self-supervised fashion using gradient estimation, treating the approximation algorithm as a black box. We propose a novel gradient estimation scheme for this purpose, which we call preference-based gradient estimation. Our approach combines the benefits of the neural network and the non-learned approximation algorithm: the GNN leverages the information from the dataset to allow the approximation algorithm to find better solutions, while the approximation algorithm guarantees that the solution is feasible. We validate our approach on two well-known combinatorial optimization problems, the travelling salesman problem and the minimum k-cut problem, and show that our method is competitive with state-of-the-art learned CO solvers.
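The abstract does not spell out the preference-based estimator itself, so the sketch below substitutes a generic finite-difference black-box gradient to illustrate the structure of such a training loop: a network predicts parameters, a non-differentiable solver scores them, and an estimated gradient is pushed back through the network. `blackbox` and its toy cost are hypothetical stand-ins, not the paper's method.

```python
# A generic black-box gradient sketch (finite differences), standing in for
# the paper's preference-based estimator, whose exact update rule is not
# described in this abstract. `blackbox` is a hypothetical non-differentiable
# approximation algorithm mapping predicted parameters to a solution cost.
import torch

def blackbox(params: torch.Tensor) -> float:
    return float(((params - 1.0) ** 2).sum())  # toy stand-in cost

net = torch.nn.Linear(8, 4)           # stand-in for the GNN
x = torch.randn(8)                    # stand-in for instance features
params = net(x)

# Two-point estimate of d cost / d params, one coordinate at a time.
eps = 1e-2
grad = torch.zeros_like(params)
with torch.no_grad():
    for i in range(params.numel()):
        e = torch.zeros_like(params)
        e[i] = eps
        grad[i] = (blackbox(params + e) - blackbox(params - e)) / (2 * eps)

# Backpropagate the estimated gradient through the network's parameters.
params.backward(grad)
```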
Position: Graph Learning Will Lose Relevance Due To Poor Benchmarks
Bechler-Speicher, Maya, Finkelshtein, Ben, Frasca, Fabrizio, Müller, Luis, Tönshoff, Jan, Siraudin, Antoine, Zaverkin, Viktor, Bronstein, Michael M., Niepert, Mathias, Perozzi, Bryan, Galkin, Mikhail, Morris, Christopher
While machine learning on graphs has demonstrated promise in drug design and molecular property prediction, significant benchmarking challenges hinder its further progress and relevance. Current benchmarking practices often lack focus on transformative, real-world applications, favoring narrow domains like two-dimensional molecular graphs over broader, impactful areas such as combinatorial optimization, relational databases, or chip design. Additionally, many benchmark datasets poorly represent the underlying data, leading to inadequate abstractions and misaligned use cases. Fragmented evaluations and an excessive focus on accuracy further exacerbate these issues, incentivizing overfitting rather than fostering generalizable insights. These limitations have prevented the development of truly useful graph foundation models. This position paper calls for a paradigm shift toward more meaningful benchmarks, rigorous evaluation protocols, and stronger collaboration with domain experts to drive impactful and reliable advances in graph learning research and unlock its potential.
Symmetry-Preserving Diffusion Models via Target Symmetrization
Tong, Vinh, Ye, Yun, Hoang, Trung-Dung, Liu, Anji, Broeck, Guy Van den, Niepert, Mathias
Diffusion models are powerful tools for capturing complex distributions, but modeling data with inherent symmetries, such as molecular structures, remains challenging. Equivariant denoisers are commonly used to address this, but they introduce architectural complexity and optimization challenges, including noisy gradients and convergence issues. We propose a novel approach that enforces equivariance through a symmetrized loss function, which applies a time-dependent weighted averaging operation over group actions to the model's prediction target. This ensures equivariance without explicit architectural constraints and reduces gradient variance, leading to more stable and efficient optimization. Our method uses Monte Carlo sampling to estimate the average, incurring minimal computational overhead. We provide theoretical guarantees of equivariance for the minimizer of our loss function and demonstrate its effectiveness on synthetic datasets and the molecular conformation generation task using the GEOM-QM9 dataset. Experiments show improved sample quality compared to existing methods, highlighting the potential of our approach to enhance the scalability and practicality of equivariant diffusion models in generative tasks.
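A rough PyTorch illustration of the Monte Carlo group-averaging machinery follows, using random 2D rotations as the group; the `weight` function is a placeholder, since the paper's specific time-dependent weighting is not given in the abstract.

```python
# Monte Carlo sketch of a weighted average over group actions (random 2D
# rotations) applied to a prediction target. The weight function here is a
# placeholder; the paper specifies a particular time-dependent weighting.
import torch

def random_rotation_2d() -> torch.Tensor:
    theta = torch.rand(()) * 2 * torch.pi
    c, s = torch.cos(theta), torch.sin(theta)
    return torch.stack([torch.stack([c, -s]), torch.stack([s, c])])

def weight(rotated: torch.Tensor, x_t: torch.Tensor, t: float) -> torch.Tensor:
    # Placeholder: weight group elements by closeness to the noisy input.
    return torch.exp(-((rotated - x_t) ** 2).sum() / max(t, 1e-3))

def symmetrized_target(target, x_t, t, num_samples=16):
    num = torch.zeros_like(target)
    den = torch.zeros(())
    for _ in range(num_samples):
        R = random_rotation_2d()
        g_target = target @ R.T          # apply the group action
        w = weight(g_target, x_t, t)
        num, den = num + w * g_target, den + w
    return num / den

target = torch.randn(5, 2)   # e.g., clean coordinates of 5 points
x_t = torch.randn(5, 2)      # noisy input at diffusion time t
sym_target = symmetrized_target(target, x_t, t=0.5)
```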
Tractable Transformers for Flexible Conditional Generation
Liu, Anji, Liu, Xuejie, Zhao, Dayuan, Niepert, Mathias, Liang, Yitao, Broeck, Guy Van den
Non-autoregressive (NAR) generative models are valuable because they can handle diverse conditional generation tasks in a more principled way than their autoregressive (AR) counterparts, which are constrained by sequential dependency requirements. Recent advancements in NAR models, such as diffusion language models, have demonstrated superior performance in unconditional generation compared to AR models (e.g., GPTs) of similar sizes. However, such improvements do not always lead to improved conditional generation performance. We show that a key reason for this gap is the difficulty in generalizing to conditional probability queries unseen during training. As a result, strong unconditional generation performance does not guarantee high-quality conditional generation. This paper proposes Tractable Transformers (Tracformer), a Transformer-based generative model that is more robust to different conditional generation tasks. Unlike existing models that rely solely on global contextual features derived from full inputs, Tracformers incorporate a sparse Transformer encoder to capture both local and global contextual information. This information is routed through a decoder for conditional generation. Empirical results demonstrate that Tracformers achieve state-of-the-art conditional generation performance on text modeling compared to recent diffusion and AR model baselines.
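As a loose illustration of mixing local and global context in a sparse encoder, the sketch below builds a boolean attention mask with a local band plus designated global positions; the window size and global indices are arbitrary choices, not Tracformer's actual design.

```python
# Illustrative sparse attention mask mixing local and global context; the
# window size and global positions are arbitrary, not Tracformer's design.
import torch

def local_global_mask(seq_len: int, window: int, global_idx: list[int]) -> torch.Tensor:
    i = torch.arange(seq_len)
    # Local band: each token attends to neighbors within `window`.
    mask = (i[:, None] - i[None, :]).abs() <= window
    # Global tokens attend everywhere and are attended to by everyone.
    for g in global_idx:
        mask[g, :] = True
        mask[:, g] = True
    return mask

mask = local_global_mask(seq_len=12, window=2, global_idx=[0])
# `mask` can be passed (inverted as required) to an attention implementation.
```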
On Zero-Initialized Attention: Optimal Prompt and Gating Factor Estimation
Diep, Nghiem T., Nguyen, Huy, Nguyen, Chau, Le, Minh, Nguyen, Duy M. H., Sonntag, Daniel, Niepert, Mathias, Ho, Nhat
The LLaMA-Adapter has recently emerged as an efficient fine-tuning technique for LLaMA models, leveraging zero-initialized attention to stabilize training and enhance performance. However, despite its empirical success, the theoretical foundations of zero-initialized attention remain largely unexplored. In this paper, we provide a rigorous theoretical analysis, establishing a connection between zero-initialized attention and mixture-of-expert models. We prove that both linear and non-linear prompts, along with gating functions, can be optimally estimated, with non-linear prompts offering greater flexibility for future applications. Empirically, we validate our findings on the open LLM benchmarks, demonstrating that non-linear prompts outperform linear ones. Notably, even with limited training data, both prompt types consistently surpass vanilla attention, highlighting the robustness and adaptability of zero-initialized attention.
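For readers unfamiliar with the mechanism being analyzed, here is a simplified PyTorch variant of zero-initialized attention: a learnable gate, initialized to zero, scales the contribution of learnable prompt tokens, so training starts exactly from the frozen model's behavior. The shapes, the prompt length, and the additive (rather than joint-softmax) combination are simplifications, not the LLaMA-Adapter's exact formulation.

```python
# Simplified sketch of zero-initialized gating for prompt attention: at
# initialization the gate is zero, so the layer reduces to vanilla attention.
import torch

class ZeroInitPromptAttention(torch.nn.Module):
    def __init__(self, dim: int, prompt_len: int):
        super().__init__()
        self.prompt = torch.nn.Parameter(torch.randn(prompt_len, dim) * 0.02)
        self.gate = torch.nn.Parameter(torch.zeros(1))  # zero-initialized

    def forward(self, q, k, v):
        scale = k.shape[-1] ** 0.5
        # Standard attention over the original keys/values ...
        base = torch.softmax(q @ k.T / scale, dim=-1) @ v
        # ... plus a gated contribution from the learnable prompt tokens.
        prompt_attn = torch.softmax(q @ self.prompt.T / scale, dim=-1)
        return base + torch.tanh(self.gate) * (prompt_attn @ self.prompt)

layer = ZeroInitPromptAttention(dim=32, prompt_len=4)
q = k = v = torch.randn(10, 32)
out = layer(q, k, v)  # at init, gate == 0, so out equals vanilla attention
```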
Adaptive Width Neural Networks
Errica, Federico, Christiansen, Henrik, Zaverkin, Viktor, Niepert, Mathias, Alesiani, Francesco
For almost 70 years, researchers have mostly relied on hyper-parameter tuning to pick the width of neural networks' layers out of many possible choices. This paper challenges the status quo by introducing an easy-to-use technique to learn an unbounded width of a neural network's layer during training. The technique does not rely on alternating optimization or hand-crafted gradient heuristics; rather, it jointly optimizes the width and the parameters of each layer via simple backpropagation. We apply the technique to a broad range of data domains such as tables, images, texts, and graphs, showing how the width adapts to the task's difficulty. By imposing a soft ordering of importance among neurons, it is possible to truncate the trained network at virtually zero cost, achieving a smooth trade-off between performance and compute resources in a structured way. Alternatively, one can dynamically compress the network with no performance degradation. In light of recent foundation models trained on large datasets, where billions of parameters are believed to be required and hyper-parameter tuning is infeasible due to huge training costs, our approach stands as a viable alternative for width learning.
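One hypothetical way to realize such a soft importance ordering is sketched below in PyTorch: each hidden unit is scaled by a learnable score under a fixed decaying envelope, so low-importance tail units can be truncated after training. This illustrates the idea only; the paper's actual parameterization may differ.

```python
# Hypothetical width-adaptive layer: hidden units are scaled by learnable
# importance scores under a fixed decaying envelope, imposing a soft order
# so that the tail of the layer can be truncated after training.
import torch

class SoftOrderedLinear(torch.nn.Module):
    def __init__(self, in_dim: int, max_width: int, decay: float = 0.97):
        super().__init__()
        self.linear = torch.nn.Linear(in_dim, max_width)
        self.scores = torch.nn.Parameter(torch.zeros(max_width))
        # Fixed decaying envelope makes later neurons cheaper to prune.
        self.register_buffer("envelope", decay ** torch.arange(max_width).float())

    def importance(self) -> torch.Tensor:
        return torch.sigmoid(self.scores) * self.envelope

    def forward(self, x):
        return torch.relu(self.linear(x)) * self.importance()

layer = SoftOrderedLinear(in_dim=16, max_width=128)
h = layer(torch.randn(4, 16))
# After training, truncate: keep only units whose importance exceeds a tol.
keep = layer.importance() > 1e-3
```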
MolMix: A Simple Yet Effective Baseline for Multimodal Molecular Representation Learning
Manolache, Andrei, Tantaru, Dragos, Niepert, Mathias
In this work, we propose a simple transformer-based baseline for multimodal molecular representation learning, integrating three distinct modalities: SMILES strings, 2D graph representations, and 3D conformers of molecules. A key aspect of our approach is the aggregation of 3D conformers, allowing the model to account for the fact that molecules can adopt multiple conformations, an important factor for accurate molecular representation. The tokens for each modality are extracted using modality-specific encoders: a transformer for SMILES strings, a message-passing neural network for 2D graphs, and an equivariant neural network for 3D conformers. The flexibility and modularity of this framework enable easy adaptation and replacement of these encoders, making the model highly versatile for different molecular tasks. The extracted tokens are then combined into a unified multimodal sequence, which is processed by a downstream transformer for prediction tasks. To efficiently scale our model for large multimodal datasets, we utilize Flash Attention 2 and bfloat16 precision. Despite its simplicity, our approach achieves state-of-the-art results across multiple datasets, demonstrating its effectiveness as a strong baseline for multimodal molecular representation learning.
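The fusion step can be pictured with the following schematic PyTorch snippet, where stand-in token tensors from the three modality encoders are concatenated into one sequence for a downstream transformer; the dimensions and token counts are arbitrary assumptions.

```python
# Schematic of the fusion step: tokens from per-modality encoders are
# concatenated into one sequence for a downstream transformer. The token
# tensors here are random stand-ins for the outputs of a SMILES transformer,
# a message-passing GNN, and an equivariant 3D network.
import torch

dim = 64
smiles_tokens = torch.randn(1, 20, dim)   # from a SMILES transformer
graph_tokens = torch.randn(1, 15, dim)    # from a 2D message-passing GNN
conf_tokens = torch.randn(1, 30, dim)     # pooled over 3D conformers

fusion = torch.nn.TransformerEncoder(
    torch.nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True),
    num_layers=2,
)
multimodal = torch.cat([smiles_tokens, graph_tokens, conf_tokens], dim=1)
pred = fusion(multimodal).mean(dim=1)     # pool for a property-prediction head
```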
LoGra-Med: Long Context Multi-Graph Alignment for Medical Vision-Language Model
Nguyen, Duy M. H., Diep, Nghiem T., Nguyen, Trung Q., Le, Hoang-Bao, Nguyen, Tai, Nguyen, Tien, Nguyen, TrungTin, Ho, Nhat, Xie, Pengtao, Wattenhofer, Roger, Zhou, James, Sonntag, Daniel, Niepert, Mathias
State-of-the-art medical multi-modal large language models (med-MLLMs), like LLaVA-Med or BioMedGPT, leverage instruction-following data in pre-training. However, these models primarily focus on scaling the model size and data volume to boost performance while relying mainly on autoregressive learning objectives. Surprisingly, we reveal that such learning schemes might result in a weak alignment between the vision and language modalities, making these models highly reliant on extensive pre-training datasets, which poses a significant challenge in medical domains given the expensive and time-consuming nature of curating high-quality instruction-following instances. We address this with LoGra-Med, a new multi-graph alignment algorithm that enforces triplet correlations across image modalities, conversation-based descriptions, and extended captions. This helps the model capture contextual meaning, handle linguistic variability, and build cross-modal associations between visuals and text. To scale our approach, we designed an efficient end-to-end learning scheme using black-box gradient estimation, enabling faster LLaMA 7B training. Our results show that LoGra-Med matches LLaVA-Med's performance on 600K image-text pairs for Medical VQA and significantly outperforms it when trained on 10% of the data. For example, on VQA-RAD, we exceed LLaVA-Med by 20.13% and nearly match the 100% pre-training score (72.52% vs. 72.64%). We also surpass SOTA methods like BioMedGPT on visual chatbots and RadFM on zero-shot image classification with VQA, highlighting the effectiveness of multi-graph alignment.
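Since the abstract leaves the multi-graph alignment objective at a high level, the sketch below shows a generic pairwise InfoNCE stand-in for enforcing agreement across the three modalities (image, conversation description, extended caption); the paper's triplet-correlation objective is more involved than this.

```python
# Generic contrastive stand-in for aligning three modalities; the abstract's
# multi-graph alignment objective is more involved than pairwise InfoNCE.
import torch
import torch.nn.functional as F

def info_nce(a: torch.Tensor, b: torch.Tensor, tau: float = 0.07) -> torch.Tensor:
    logits = F.normalize(a, dim=-1) @ F.normalize(b, dim=-1).T / tau
    labels = torch.arange(a.shape[0])  # matching pairs lie on the diagonal
    return F.cross_entropy(logits, labels)

img = torch.randn(32, 256)   # image embeddings
conv = torch.randn(32, 256)  # conversation-description embeddings
cap = torch.randn(32, 256)   # extended-caption embeddings

# Enforce agreement across all three pairs of modalities.
loss = info_nce(img, conv) + info_nce(img, cap) + info_nce(conv, cap)
```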
Discrete Copula Diffusion
Liu, Anji, Broadrick, Oliver, Niepert, Mathias, Broeck, Guy Van den
Discrete diffusion models have recently shown significant progress in modeling complex data, such as natural languages and DNA sequences. However, unlike diffusion models for continuous data, which can generate high-quality samples in just a few denoising steps, modern discrete diffusion models still require hundreds or even thousands of denoising steps to perform well. In this paper, we identify a fundamental limitation that prevents discrete diffusion models from achieving strong performance with fewer steps: they fail to capture dependencies between output variables at each denoising step. To address this issue, we provide a formal explanation and introduce a general approach to supplement the missing dependency information by incorporating another deep generative model, termed the copula model. Our method does not require fine-tuning either the diffusion model or the copula model, yet it enables high-quality sample generation with significantly fewer denoising steps. When we apply this approach to autoregressive copula models, the combined model outperforms both models individually in unconditional and conditional text generation. Specifically, the hybrid model achieves better (un)conditional text generation using 8 to 32 times fewer denoising steps than the diffusion model alone. In addition to presenting an effective discrete diffusion generation algorithm, this paper emphasizes the importance of modeling inter-variable dependencies in discrete diffusion.
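As a toy picture of why dependency information helps, the snippet below samples left to right from the renormalized product of per-position marginals (as an independent denoising step would produce) and an autoregressive model's conditionals; the paper's actual combination rule differs, and `ar_conditional` here is a random stand-in.

```python
# Toy illustration of supplementing per-position marginals with an
# autoregressive model's dependency information: sample left to right from
# the renormalized product of the two distributions. The paper's exact
# combination rule differs from this simple product of experts.
import torch

vocab, length = 50, 8
diff_marginals = torch.softmax(torch.randn(length, vocab), dim=-1)

def ar_conditional(prefix: torch.Tensor) -> torch.Tensor:
    # Stand-in for an autoregressive copula model p(x_i | x_<i).
    return torch.softmax(torch.randn(vocab), dim=-1)

tokens = []
for i in range(length):
    joint = diff_marginals[i] * ar_conditional(torch.tensor(tokens))
    joint = joint / joint.sum()  # renormalize the product of experts
    tokens.append(int(torch.multinomial(joint, 1)))
```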