AITopics | Harrington, Peter

Collaborating Authors

Harrington, Peter

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Hierarchical Conditional Multi-Task Learning for Streamflow Modeling

Xu, Shaoming, Renganathan, Arvind, Khandelwal, Ankush, Ghosh, Rahul, Li, Xiang, Liu, Licheng, Tayal, Kshitij, Harrington, Peter, Jia, Xiaowei, Jin, Zhenong, Nieber, Jonh, Kumar, Vipin

arXiv.org Artificial IntelligenceOct-17-2024

Streamflow, vital for water resource management, is governed by complex hydrological systems involving intermediate processes driven by meteorological forces. While deep learning models have achieved state-of-the-art results of streamflow prediction, their end-to-end single-task learning approach often fails to capture the causal relationships within these systems. To address this, we propose Hierarchical Conditional Multi-Task Learning (HCMTL), a hierarchical approach that jointly models soil water and snowpack processes based on their causal connections to streamflow. HCMTL utilizes task embeddings to connect network modules, enhancing flexibility and expressiveness while capturing unobserved processes beyond soil water and snowpack. It also incorporates the Conditional Mini-Batch strategy to improve long time series modeling. We compare HCMTL with five baselines on a global dataset. HCMTL's superior performance across hundreds of drainage basins over extended periods shows that integrating domain-specific causal knowledge into deep learning enhances both prediction accuracy and interpretability. This is essential for advancing our understanding of complex hydrological systems and supporting efficient water resource management to mitigate natural disasters like droughts and floods.

artificial intelligence, basin, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2410.14137

Country: North America > United States (0.95)

Genre: Research Report > New Finding (0.46)

Industry: Energy (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Comprehensive Performance Modeling and System Design Insights for Foundation Models

Subramanian, Shashank, Rrapaj, Ermal, Harrington, Peter, Chheda, Smeet, Farrell, Steven, Austin, Brian, Williams, Samuel, Wright, Nicholas, Bhimji, Wahid

arXiv.org Artificial IntelligenceSep-30-2024

Generative AI, in particular large transformer models, are increasingly driving HPC system design in science and industry. We analyze performance characteristics of such transformer models and discuss their sensitivity to the transformer type, parallelization strategy, and HPC system features (accelerators and interconnects). We utilize a performance model that allows us to explore this complex design space and highlight its key components. We find that different transformer types demand different parallelism and system characteristics at different training regimes. Large Language Models are performant with 3D parallelism and amplify network needs only at pre-training scales with reduced dependence on accelerator capacity and bandwidth. On the other hand, long-sequence transformers, representative of scientific foundation models, place a more uniform dependence on network and capacity with necessary 4D parallelism. Our analysis emphasizes the need for closer performance modeling of different transformer types keeping system features in mind and demonstrates a path towards this. Our code is available as open-source.

large language model, machine learning, natural language, (22 more...)

arXiv.org Artificial Intelligence

2410.00273

Country: North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report (0.50)

Technology:

Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.88)

Add feedback

Analyzing and Exploring Training Recipes for Large-Scale Transformer-Based Weather Prediction

Willard, Jared D., Harrington, Peter, Subramanian, Shashank, Mahesh, Ankur, O'Brien, Travis A., Collins, William D.

arXiv.org Artificial IntelligenceApr-30-2024

The rapid rise of deep learning (DL) in numerical weather prediction (NWP) has led to a proliferation of models which forecast atmospheric variables with comparable or superior skill than traditional physics-based NWP. However, among these leading DL models, there is a wide variance in both the training settings and architecture used. Further, the lack of thorough ablation studies makes it hard to discern which components are most critical to success. In this work, we show that it is possible to attain high forecast skill even with relatively off-the-shelf architectures, simple training procedures, and moderate compute budgets. Specifically, we train a minimally modified SwinV2 transformer on ERA5 data, and find that it attains superior forecast skill when compared against IFS. We present some ablations on key aspects of the training pipeline, exploring different loss functions, model sizes and depths, and multi-step fine-tuning to investigate their effect. We also examine the model performance with metrics beyond the typical ACC and RMSE, and investigate how the performance scales with model size.

artificial intelligence, deep learning, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2404.1963

Country: North America > United States > California (0.14)

Genre: Research Report (0.40)

Industry:

Energy (0.94)
Government > Regional Government (0.47)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

A Practical Probabilistic Benchmark for AI Weather Models

Brenowitz, Noah D., Cohen, Yair, Pathak, Jaideep, Mahesh, Ankur, Bonev, Boris, Kurth, Thorsten, Durran, Dale R., Harrington, Peter, Pritchard, Michael S.

arXiv.org Artificial IntelligenceJan-27-2024

Since the weather is chaotic, forecasts aim to predict the distribution of future states rather than make a single prediction. Recently, multiple data driven weather models have emerged claiming breakthroughs in skill. However, these have mostly been benchmarked using deterministic skill scores, and little is known about their probabilistic skill. Unfortunately, it is hard to fairly compare AI weather models in a probabilistic sense, since variations in choice of ensemble initialization, definition of state, and noise injection methodology become confounding. Moreover, even obtaining ensemble forecast baselines is a substantial engineering challenge given the data volumes involved. We sidestep both problems by applying a decades-old idea -- lagged ensembles -- whereby an ensemble can be constructed from a moderately-sized library of deterministic forecasts. This allows the first parameter-free intercomparison of leading AI weather models' probabilistic skill against an operational baseline. The results reveal that two leading AI weather models, i.e. GraphCast and Pangu, are tied on the probabilistic CRPS metric even though the former outperforms the latter in deterministic scoring. We also reveal how multiple time-step loss functions, which many data-driven weather models have employed, are counter-productive: they improve deterministic metrics at the cost of increased dissipation, deteriorating probabilistic skill. This is confirmed through ablations applied to a spherical Fourier Neural Operator (SFNO) approach to AI weather forecasting. Separate SFNO ablations modulating effective resolution reveal it has a useful effect on ensemble dispersion relevant to achieving good ensemble calibration. We hope these and forthcoming insights from lagged ensembles can help guide the development of AI weather forecasts and have thus shared the diagnostic code.

artificial intelligence, machine learning, modeling & simulation, (18 more...)

arXiv.org Artificial Intelligence

2401.15305

Country: North America > United States > California > Alameda County > Berkeley (0.14)

Genre: Research Report > New Finding (0.46)

Industry: Energy (0.93)

Technology:

Information Technology > Modeling & Simulation (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Towards Stability of Autoregressive Neural Operators

McCabe, Michael, Harrington, Peter, Subramanian, Shashank, Brown, Jed

arXiv.org Artificial IntelligenceDec-10-2023

Neural operators have proven to be a promising approach for modeling spatiotemporal systems in the physical sciences. However, training these models for large systems can be quite challenging as they incur significant computational and memory expense -- these systems are often forced to rely on autoregressive time-stepping of the neural network to predict future temporal states. While this is effective in managing costs, it can lead to uncontrolled error growth over time and eventual instability. We analyze the sources of this autoregressive error growth using prototypical neural operator models for physical systems and explore ways to mitigate it. We introduce architectural and application-specific improvements that allow for careful control of instability-inducing operations within these models without inflating the compute/memory expense. We present results on several scientific systems that include Navier-Stokes fluid flow, rotating shallow water, and a high-resolution global weather forecasting system. We demonstrate that applying our design principles to neural operators leads to significantly lower errors for long-term forecasts as well as longer time horizons without qualitative signs of divergence compared to the original models for these systems. We open-source our \href{https://github.com/mikemccabe210/stabilizing_neural_operators}{code} for reproducibility.

artificial intelligence, convolution, natural language, (14 more...)

arXiv.org Artificial Intelligence

2306.10619

Country:

North America > United States > Colorado (0.14)
North America > United States > New York (0.14)

Genre: Research Report > Promising Solution (0.54)

Industry: Energy > Oil & Gas > Upstream (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.92)

Add feedback

Towards Foundation Models for Scientific Machine Learning: Characterizing Scaling and Transfer Behavior

Subramanian, Shashank, Harrington, Peter, Keutzer, Kurt, Bhimji, Wahid, Morozov, Dmitriy, Mahoney, Michael, Gholami, Amir

arXiv.org Artificial IntelligenceMay-31-2023

Pre-trained machine learning (ML) models have shown great performance for a wide range of applications, in particular in natural language processing (NLP) and computer vision (CV). Here, we study how pre-training could be used for scientific machine learning (SciML) applications, specifically in the context of transfer learning. We study the transfer behavior of these models as (i) the pre-trained model size is scaled, (ii) the downstream training dataset size is scaled, (iii) the physics parameters are systematically pushed out of distribution, and (iv) how a single model pre-trained on a mixture of different physics problems can be adapted to various downstream applications. We find that-when fine-tuned appropriately-transfer learning can help reach desired accuracy levels with orders of magnitude fewer downstream examples (across different tasks that can even be out-of-distribution) than training from scratch, with consistent behavior across a wide range of downstream examples. We also find that fine-tuning these models yields more performance gains as model size increases, compared to training from scratch on new downstream tasks. These results hold for a broad range of PDE learning tasks. All in all, our results demonstrate the potential of the "pre-train and fine-tune" paradigm for SciML problems, demonstrating a path towards building SciML foundation models. We open-source our code for reproducibility.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2306.00258

Country: North America > United States (0.46)

Genre: Research Report > New Finding (0.86)

Industry: Energy > Oil & Gas > Upstream (0.92)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Model-Based Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Estimating Galactic Distances From Images Using Self-supervised Representation Learning

Hayat, Md Abul, Harrington, Peter, Stein, George, Lukić, Zarija, Mustafa, Mustafa

arXiv.org Artificial IntelligenceJan-11-2021

We use a contrastive self-supervised learning framework to estimate distances to galaxies from their photometric images. We incorporate data augmentations from computer vision as well as an application-specific augmentation accounting for galactic dust. We find that the resulting visual representations of galaxy images are semantically useful and allow for fast similarity searches, and can be successfully fine-tuned for the task of redshift estimation. We show that (1) pretraining on a large corpus of unlabeled data followed by fine-tuning on some labels can attain the accuracy of a fully-supervised model which requires 2-4x more labeled data, and (2) that by fine-tuning our self-supervised representations using all available data labels in the Main Galaxy Sample of the Sloan Digital Sky Survey (SDSS), we outperform the state-of-the-art supervised learning method.

artificial intelligence, galaxy, inductive learning, (15 more...)

arXiv.org Artificial Intelligence

2101.04293

Country: North America > United States > Arkansas (0.14)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (1.00)

Add feedback

Self-Supervised Representation Learning for Astronomical Images

Hayat, Md Abul, Stein, George, Harrington, Peter, Lukić, Zarija, Mustafa, Mustafa

arXiv.org Artificial IntelligenceDec-23-2020

Submitted to The Astrophysical Journal Letters ABSTRACT Sky surveys are the largest data generators in astronomy, making automated tools for extracting meaningful scientific information an absolute necessity. We show that, without the need for labels, self-supervised learning recovers representations of sky survey images that are semantically useful for a variety of scientific tasks. These representations can be directly used as features, or fine-tuned, to outperform supervised methods trained only on labeled data. We apply a contrastive learning framework on multi-band galaxy photometry from the Sloan Digital Sky Survey (SDSS), to learn image representations. We then use them for galaxy morphology classification, and fine-tune them for photometric redshift estimation, using labels from the Galaxy Zoo 2 dataset and SDSS spectroscopy. In both downstream tasks, using the same learned representations, we outperform the supervised stateof-the-art results, and we show that our approach can achieve the accuracy of supervised models while using 2-4 times fewer labels for training. INTRODUCTION the quantity and quality of (manually assigned) image labels. Observing and imaging objects in the sky has been Serendipitous discovery of an ionization echo from a the main driver of the scientific discovery process in astronomy, recently faded quasar (Lintott et al. 2009), and the cumbersome because doing controlled experiments is not a search for similar systems that followed (Keel viable option. It in the 1990s, spearheaded by SDSS (Gunn et al. 1998, demonstrates the need for methods which allow for the 2006), has rendered obsolete the approach of manual discovery of truly unusual and previously unseen objects, inspection of images by an expert.

artificial intelligence, inductive learning, representation, (20 more...)

arXiv.org Artificial Intelligence

2012.13083

Country:

North America > United States > California > Alameda County > Berkeley (0.14)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report (1.00)

Industry: Energy (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.67)

Add feedback