Optimal Linear Baseline Models for Scientific Machine Learning

DeLise, Alexander, Loh, Kyle, Patel, Krish, Teague, Meredith, Arnold, Andrea, Chung, Matthias

arXiv.org Artificial Intelligence 

In nearly every scientific discipline, a central challenge lies in modeling, computing, and understanding the functional relationships between signals, measurements, and their underlying physical processes. These mappings typically manifest in three fundamental forms: forward modeling, inference, and autoencoding. While mathematical models often provide insight into these relationships, they are frequently inadequate for real-world prediction and analysis due to limitations in analytical tractability, computational feasibility, or algorithmic robustness. The advent of scientific machine learning (ML) has led to a paradigm shift, where data-driven methods, particularly neural networks, have emerged as powerful tools for learning complex input-output relations directly from data. Unlike traditional model-based approaches, neural networks are capable of overcoming longstanding challenges such as computational complexity and scalability, model misspecification, and the ill-posedness inherent to many scientific problems [1]. A central strength of neural networks is their capacity to project inputs into a lower-dimensional latent space before mapping to targets, a principle commonly realized in autoencoder and encoder-decoder architectures.
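To make the latent-space idea concrete, the sketch below (an illustrative example, not code from the paper) builds the optimal *linear* autoencoder for a data matrix via the truncated SVD: by the Eckart-Young theorem, projecting onto the top-k right singular vectors minimizes the least-squares reconstruction error over all rank-k linear encoder-decoder pairs. The synthetic data and the choice k = 10 are assumptions for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic 50-D data with an intrinsic 10-D linear structure (assumed for illustration).
X = rng.standard_normal((200, 10)) @ rng.standard_normal((10, 50))

k = 10                                 # latent dimension
mu = X.mean(axis=0)
Xc = X - mu                            # center the data
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

W_enc = Vt[:k]                         # encoder: maps 50-D inputs to k-D latents
Z = Xc @ W_enc.T                       # latent codes, shape (200, k)
X_hat = Z @ W_enc + mu                 # decoder: linear map back to 50-D

# Relative reconstruction error; near zero here since rank(X) <= k.
err = np.linalg.norm(X - X_hat) / np.linalg.norm(X)
```

Because the encoder and decoder share the singular-vector basis, this is exactly principal component analysis, which is why PCA is the natural linear baseline against which nonlinear autoencoders are compared.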