Gibbs, Tom
MOFA: Discovering Materials for Carbon Capture with a GenAI- and Simulation-Based Workflow
Yan, Xiaoli, Hudson, Nathaniel, Park, Hyun, Grzenda, Daniel, Pauloski, J. Gregory, Schwarting, Marcus, Pan, Haochen, Harb, Hassan, Foreman, Samuel, Knight, Chris, Gibbs, Tom, Chard, Kyle, Chaudhuri, Santanu, Tajkhorshid, Emad, Foster, Ian, Moosavi, Mohamad, Ward, Logan, Huerta, E. A.
We present MOFA, an open-source generative AI (GenAI) plus simulation workflow for high-throughput generation of metal-organic frameworks (MOFs) on large-scale high-performance computing (HPC) systems. MOFA addresses key challenges in integrating GPU-accelerated computing for GPU-intensive GenAI tasks, including distributed training and inference, alongside CPU- and GPU-optimized tasks for screening and filtering AI-generated MOFs using molecular dynamics, density functional theory, and Monte Carlo simulations. These heterogeneous tasks are unified within an online learning framework that optimizes the utilization of available CPU and GPU resources across HPC systems. Performance metrics from a 450-node supercomputer run (14,400 AMD Zen 3 CPU cores + 1,800 NVIDIA A100 GPUs) demonstrate that MOFA achieves high-throughput generation of novel MOF structures, with CO$_2$ adsorption capacities ranking among the top 10 in the hypothetical MOF (hMOF) dataset. Furthermore, the production of high-quality MOFs scales linearly with the number of nodes utilized. The modular architecture of MOFA will facilitate its integration into other scientific applications that dynamically combine GenAI with large-scale simulations.
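The generate-screen-retrain loop the abstract describes can be pictured as below. This is a minimal sketch, not the MOFA implementation: generate_mofs, screen_mof, and retrain are hypothetical stand-ins for the GenAI inference, simulation-based screening, and training tasks, and ProcessPoolExecutor stands in for the HPC-scale scheduler that balances CPU and GPU work.

    import random
    from concurrent.futures import ProcessPoolExecutor

    def generate_mofs(model_state, n):
        # GenAI inference (a GPU task in MOFA): propose n candidate structures.
        return [f"mof-{model_state}-{i}" for i in range(n)]

    def screen_mof(mof):
        # Simulation screening (MD/DFT/Monte Carlo in MOFA): return a score.
        return mof, random.random()

    def retrain(model_state, good):
        # Online-learning step: fold high-quality MOFs back into the generator.
        return model_state + len(good)

    if __name__ == "__main__":
        model_state, accepted = 0, []
        with ProcessPoolExecutor() as pool:
            for _ in range(3):  # online-learning rounds
                candidates = generate_mofs(model_state, n=8)
                scored = pool.map(screen_mof, candidates)
                good = [m for m, s in scored if s > 0.7]  # filter threshold
                accepted.extend(good)
                model_state = retrain(model_state, good)
        print(f"accepted {len(accepted)} candidate MOFs")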
A Simulation System Towards Solving Societal-Scale Manipulation
Touzel, Maximilian Puelma, Sarangi, Sneheel, Welch, Austin, Krishnakumar, Gayatri, Zhao, Dan, Yang, Zachary, Yu, Hao, Kosak-Hine, Ethan, Gibbs, Tom, Musulan, Andreea, Thibault, Camille, Gurbuz, Busra Tugce, Rabbany, Reihaneh, Godbout, Jean-François, Pelrine, Kellin
The rise of AI-driven manipulation poses significant risks to societal trust and democratic processes. Yet studying these effects in real-world settings at scale is ethically and logistically impractical, highlighting the need for simulation tools that can model these dynamics in controlled settings to enable experimentation with possible defenses. We present a simulation environment designed to address this need. We build on the Concordia framework, which simulates offline, 'real life' activity, by adding online social-media interactions to the simulation through the integration of a Mastodon server. We improve simulation efficiency and information flow, and add a set of measurement tools, notably longitudinal surveys. We demonstrate the simulator with a tailored example in which we track agents' political positions and show how partisan manipulation of agents can affect election results.
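A toy sketch of the kind of loop such a simulator runs is shown below. The in-memory feed stands in for the Mastodon server, and the opinion-update rule, manipulator account, and survey are illustrative assumptions rather than the paper's method.

    import random

    agents = {f"agent{i}": random.uniform(-1.0, 1.0) for i in range(10)}
    feed = []     # (author, stance) pairs; a real run would use a Mastodon API
    surveys = []  # longitudinal record of agent positions per step

    for step in range(5):
        feed.append(("manipulator", 1.0))  # partisan account floods the feed
        for name, position in agents.items():
            visible = feed[-5:]  # each agent sees only the most recent posts
            mean_stance = sum(s for _, s in visible) / len(visible)
            position += 0.1 * (mean_stance - position)  # assumed update rule
            agents[name] = position
            feed.append((name, position))  # agent posts its current stance
        # Longitudinal survey: record every agent's position at this step.
        surveys.append({name: round(p, 2) for name, p in agents.items()})

    votes = sum(1 for p in agents.values() if p > 0)  # toy "election"
    print(f"{votes}/{len(agents)} agents vote for the promoted side")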
CodeTrans: Towards Cracking the Language of Silicon's Code Through Self-Supervised Deep Learning and High Performance Computing
Elnaggar, Ahmed, Ding, Wei, Jones, Llion, Gibbs, Tom, Feher, Tamas, Angerer, Christoph, Severini, Silvia, Matthes, Florian, Rost, Burkhard
Currently, a growing number of mature natural language processing (NLP) applications make people's lives more convenient. Such applications are built from source code, the language of software engineering. However, applications that understand source code and thereby ease the software engineering process remain under-researched. At the same time, the transformer model, especially in combination with transfer learning, has proven to be a powerful technique for NLP tasks. These breakthroughs point to a promising direction for processing source code and cracking software engineering tasks. This paper describes CodeTrans, an encoder-decoder transformer model for the software engineering domain, and explores the effectiveness of encoder-decoder transformer models on six software engineering tasks comprising thirteen sub-tasks. Moreover, we investigate the effect of different training strategies, including single-task learning, transfer learning, multi-task learning, and multi-task learning with fine-tuning. CodeTrans outperforms the state-of-the-art models on all tasks. To expedite future work in the software engineering domain, we have published our pre-trained CodeTrans models: https://github.com/agemagician/CodeTrans
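The released checkpoints can be loaded with the Hugging Face transformers library. The sketch below assumes one of the CodeTrans checkpoint ids published under the SEBIS organization (verify the exact id against the GitHub repository), and real use typically pre-tokenizes the input code with a language-specific tokenizer first.

    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

    # Assumed checkpoint id; check the CodeTrans repository for the full list.
    name = "SEBIS/code_trans_t5_small_source_code_summarization_python"
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModelForSeq2SeqLM.from_pretrained(name)

    code = "def add(a, b): return a + b"  # toy input
    inputs = tokenizer(code, return_tensors="pt")
    output = model.generate(**inputs, max_length=32)
    print(tokenizer.decode(output[0], skip_special_tokens=True))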
ProtTrans: Towards Cracking the Language of Life's Code Through Self-Supervised Deep Learning and High Performance Computing
Elnaggar, Ahmed, Heinzinger, Michael, Dallago, Christian, Rihawi, Ghalia, Wang, Yu, Jones, Llion, Gibbs, Tom, Feher, Tamas, Angerer, Christoph, Steinegger, Martin, Bhowmik, Debsindhu, Rost, Burkhard
Computational biology and bioinformatics provide vast gold mines of data in the form of protein sequences, ideal for the Language Models (LMs) developed in Natural Language Processing (NLP). These LMs reach new prediction frontiers at low inference cost. Here, we trained two auto-regressive language models (Transformer-XL, XLNet) and two auto-encoder models (BERT, ALBERT) on data from UniRef and BFD containing up to 393 billion amino acids (words) from 2.1 billion protein sequences (22 and 112 times the entire English Wikipedia). The LMs were trained on the Summit supercomputer at Oak Ridge National Laboratory (ORNL), using 936 nodes (5,616 GPUs in total) and one TPU Pod (V3-512 or V3-1024). We validated the advantage of up-scaling LMs to larger models, supported by bigger data, by predicting secondary structure (3-state: Q3=76-84; 8-state: Q8=65-73), sub-cellular localization for 10 cellular compartments (Q10=74), and whether a protein is membrane-bound or water-soluble (Q2=89). Dimensionality reduction revealed that the LM embeddings learned from unlabeled data (protein sequences only) captured important biophysical properties governing protein shape, implying that the LMs learned some of the grammar of the language of life as realized in protein sequences. The successful up-scaling of protein LMs through HPC to larger datasets slightly reduced the gap between models trained on evolutionary information and LMs. Official GitHub repository: https://github.com/agemagician/ProtTrans
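Per-residue embeddings can be extracted from the released models with transformers. The sketch below assumes the Rostlab/prot_bert checkpoint id from the project's public releases and follows the convention of space-separating residues and mapping rare amino acids to X; mean-pooling to a per-protein vector is one common choice, not the only one.

    import re
    import torch
    from transformers import BertModel, BertTokenizer

    # Assumed checkpoint id; check the ProtTrans repository for current models.
    tokenizer = BertTokenizer.from_pretrained("Rostlab/prot_bert", do_lower_case=False)
    model = BertModel.from_pretrained("Rostlab/prot_bert")
    model.eval()

    sequence = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"
    # ProtBert convention: space-separate residues, map rare residues to X.
    sequence = " ".join(re.sub(r"[UZOB]", "X", sequence))
    inputs = tokenizer(sequence, return_tensors="pt")
    with torch.no_grad():
        residue_emb = model(**inputs).last_hidden_state  # (1, length+2, 1024)
    protein_emb = residue_emb[0, 1:-1].mean(dim=0)  # drop [CLS]/[SEP], mean-pool
    print(protein_emb.shape)  # per-protein embedding vector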