AITopics

2406.08068

Country:

Asia (0.67)
Europe (0.67)
North America > United States > California (0.28)
North America > United States > Washington > King County > Seattle (0.14)

Genre:

Research Report (1.00)
Overview (1.00)

Industry:

Media (1.00)
Information Technology (1.00)
Health & Medicine > Therapeutic Area (0.92)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
(2 more...)

arXiv.org Artificial Intelligence, Feb-6-2024

ClimSim: A large multi-scale dataset for hybrid physics-ML climate emulation

Yu, Sungduk, Hannah, Walter, Peng, Liran, Lin, Jerry, Bhouri, Mohamed Aziz, Gupta, Ritwik, Lütjens, Björn, Will, Justus Christopher, Behrens, Gunnar, Busecke, Julius, Loose, Nora, Stern, Charles I, Beucler, Tom, Harrop, Bryce, Hillman, Benjamin R, Jenney, Andrea, Ferretti, Savannah, Liu, Nana, Anandkumar, Anima, Brenowitz, Noah D, Eyring, Veronika, Geneva, Nicholas, Gentine, Pierre, Mandt, Stephan, Pathak, Jaideep, Subramaniam, Akshay, Vondrick, Carl, Yu, Rose, Zanna, Laure, Zheng, Tian, Abernathey, Ryan, Ahmed, Fiaz, Bader, David C, Baldi, Pierre, Barnes, Elizabeth, Bretherton, Christopher, Caldwell, Peter, Chuang, Wayne, Han, Yilun, Huang, Yu, Iglesias-Suarez, Fernando, Jantre, Sanket, Kashinath, Karthik, Khairoutdinov, Marat, Kurth, Thorsten, Lutsko, Nicholas, Ma, Po-Lun, Mooers, Griffin, Neelin, J. David, Randall, David, Shamekh, Sara, Taylor, Mark A, Urban, Nathan, Yuval, Janni, Zhang, Guang, Pritchard, Michael

Modern climate projections lack adequate spatial and temporal resolution due to computational constraints. A consequence is inaccurate and imprecise predictions of critical processes such as storms. Hybrid methods that combine physics with machine learning (ML) have introduced a new generation of higher fidelity climate simulators that can sidestep Moore's Law by outsourcing compute-hungry, short, high-resolution simulations to ML emulators. However, this hybrid ML-physics simulation approach requires domain-specific treatment and has been inaccessible to ML experts because of lack of training data and relevant, easy-to-use workflows. We present ClimSim, the largest-ever dataset designed for hybrid ML-physics research. It comprises multi-scale climate simulations, developed by a consortium of climate scientists and ML researchers. It consists of 5.7 billion pairs of multivariate input and output vectors that isolate the influence of locally-nested, high-resolution, high-fidelity physics on a host climate simulator's macro-scale physical state. The dataset is global in coverage, spans multiple years at high sampling frequency, and is designed such that resulting emulators are compatible with downstream coupling into operational climate simulators. We implement a range of deterministic and stochastic regression baselines to highlight the ML challenges and their scoring.

artificial intelligence, deep learning, machine learning, (18 more...)

2306.08754

Country: North America > United States (1.00)

Genre: Research Report (1.00)

Industry:

Government > Regional Government > North America Government > United States Government (1.00)
Energy > Oil & Gas > Upstream (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
(2 more...)

arXiv.org Artificial Intelligence, Oct-25-2023

An Early Evaluation of GPT-4V(ision)

Wu, Yang, Wang, Shilong, Yang, Hao, Zheng, Tian, Zhang, Hongbo, Zhao, Yanyan, Qin, Bing

In this paper, we evaluate different abilities of GPT-4V including visual understanding, language understanding, visual puzzle solving, and understanding of other modalities such as depth, thermal, video, and audio. To estimate GPT-4V's performance, we manually construct 656 test instances and carefully evaluate the results of GPT-4V. The highlights of our findings are as follows: (1) GPT-4V exhibits impressive performance on English visual-centric benchmarks but fails to recognize simple Chinese text in images; (2) GPT-4V shows inconsistent refusal behavior when answering questions related to sensitive traits such as gender, race, and age; (3) GPT-4V obtains worse results than GPT-4 (API) on language understanding tasks including general language understanding benchmarks and visual commonsense knowledge evaluation benchmarks; (4) Few-shot prompting can improve GPT-4V's performance on both visual understanding and language understanding; (5) GPT-4V struggles to find the nuanced differences between two similar images and to solve easy math picture puzzles; (6) GPT-4V shows non-trivial performance on tasks in modalities similar to images, such as video and thermal. Our experimental results reveal the abilities and limitations of GPT-4V and we hope our paper can provide some insights into the application and research of GPT-4V.

large language model, machine learning, natural language, (6 more...)

2310.16534

Genre: Research Report (0.69)

Technology: Information Technology > Artificial Intelligence > Natural Language (1.00)

arXiv.org Artificial Intelligence, Feb-5-2023

FastFold: Reducing AlphaFold Training Time from 11 Days to 67 Hours

Cheng, Shenggan, Zhao, Xuanlei, Lu, Guangyang, Fang, Jiarui, Yu, Zhongming, Zheng, Tian, Wu, Ruidong, Zhang, Xiwen, Peng, Jian, You, Yang

Protein structure prediction helps to understand gene translation and protein function, which is of growing interest and importance in structural biology. The AlphaFold model, which used transformer architecture to achieve atomic-level accuracy in protein structure prediction, was a significant breakthrough. However, training and inference of the AlphaFold model are challenging due to its high computation and memory cost. In this work, we present FastFold, an efficient implementation of AlphaFold for both training and inference. We propose Dynamic Axial Parallelism and Duality Async Operations to improve the scaling efficiency of model parallelism. Besides, AutoChunk is proposed to reduce memory cost by over 80% during inference by automatically determining the chunk strategy. Experimental results show that FastFold reduces overall training time from 11 days to 67 hours and achieves 7.5X - 9.5X speedup for long-sequence inference. Furthermore, we scale FastFold to 512 GPUs and achieve an aggregate throughput of 6.02 PetaFLOP/s with 90.1% parallel efficiency.

artificial intelligence, deep learning, machine learning, (18 more...)

2203.00854

Country:

North America > United States (0.28)
Asia (0.28)

Genre: Research Report > New Finding (0.34)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Machine Learning, Sep-11-2022

Wasserstein Distributional Learning

Tang, Chengliang, Lenssen, Nathan, Wei, Ying, Zheng, Tian

Learning conditional densities and identifying factors that influence the entire distribution are vital tasks in data-driven applications. Conventional approaches work mostly with summary statistics, and are hence inadequate for a comprehensive investigation. Recently, there have been developments on functional regression methods to model density curves as functional outcomes. A major challenge for developing such models lies in the inherent constraint of non-negativity and unit integral for the functional space of density outcomes. To overcome this fundamental issue, we propose Wasserstein Distributional Learning (WDL), a flexible density-on-scalar regression modeling framework that starts with the Wasserstein distance $W_2$ as a proper metric for the space of density outcomes. We then introduce a heterogeneous and flexible class of Semi-parametric Conditional Gaussian Mixture Models (SCGMM) as the model class $\mathfrak{F} \otimes \mathcal{T}$. The resulting metric space $(\mathfrak{F} \otimes \mathcal{T}, W_2)$ satisfies the required constraints and offers a dense and closed functional subspace. For fitting the proposed model, we further develop an efficient algorithm based on Majorization-Minimization optimization with boosted trees. Compared with methods in the previous literature, WDL better characterizes and uncovers the nonlinear dependence of the conditional densities, and their derived summary statistics. We demonstrate the effectiveness of the WDL framework through simulations and real-world applications.
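As a minimal illustration of the $W_2$ metric mentioned in the abstract above (not the WDL fitting algorithm itself), the sketch below computes the 2-Wasserstein distance in one dimension. It relies on two standard facts: for univariate Gaussians the distance has a closed form, and for two equal-size samples it reduces to matching sorted values. The function names `w2_gaussian` and `w2_empirical` are hypothetical and chosen only for this example.

```python
import math


def w2_gaussian(mu1, sigma1, mu2, sigma2):
    """Closed-form W_2 distance between N(mu1, sigma1^2) and N(mu2, sigma2^2)."""
    return math.sqrt((mu1 - mu2) ** 2 + (sigma1 - sigma2) ** 2)


def w2_empirical(xs, ys):
    """W_2 distance between two equal-size 1-D samples.

    In one dimension the optimal transport plan pairs the i-th smallest
    value of one sample with the i-th smallest value of the other.
    """
    if len(xs) != len(ys):
        raise ValueError("samples must have the same length")
    a, b = sorted(xs), sorted(ys)
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)) / len(a))


if __name__ == "__main__":
    print(w2_gaussian(0.0, 1.0, 3.0, 5.0))
    print(w2_empirical([0.0, 1.0, 2.0], [1.0, 2.0, 3.0]))
```

Because the distance is computed on quantiles rather than on density values, it never needs to enforce non-negativity or unit integral directly, which is one way to read the abstract's motivation for working in a Wasserstein metric space.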

artificial intelligence, machine learning, wasserstein distance, (18 more...)

2209.04991

Country: North America > United States (0.93)

Genre: Research Report (0.81)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.86)

arXiv.org Machine Learning, Jun-2-2021

Weakly Supervised Learning Creates a Fusion of Modeling Cultures

Tang, Chengliang, Yuan, Gan, Zheng, Tian

The past two decades have witnessed the great success of the algorithmic modeling framework advocated by Breiman et al. (2001). Nevertheless, the excellent prediction performance of these black-box models relies heavily on the availability of strong supervision, i.e. a large set of accurate and exact ground-truth labels. In practice, strong supervision can be unavailable or expensive, which calls for modeling techniques under weak supervision. In this comment, we summarize the key concepts in weakly supervised learning and discuss some recent developments in the field. Using algorithmic modeling alone under weak supervision might lead to unstable and misleading results. A promising direction would be integrating the data modeling culture into such a framework.

inductive learning, supervision, survey article, (14 more...)

2106.01485

Country: North America > United States > Wisconsin (0.15)

Genre:

Overview (0.70)
Research Report (0.64)

Industry: Transportation (0.35)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

arXiv.org Machine Learning, Sep-3-2020

Online Community Detection for Event Streams on Networks

Fang, Guanhua, Ward, Owen G., Zheng, Tian

A common goal in network modeling is to uncover the latent community structure present among nodes. For many real-world networks, observed connections consist of events arriving as streams, which are then aggregated to form edges, ignoring the temporal dynamic component. A natural way to take account of this temporal dynamic component of interactions is to use point processes as the foundation of the network models for community detection. Computational complexity hampers the scalability of such approaches to large sparse networks. To circumvent this challenge, we propose a fast online variational inference algorithm for learning the community structure underlying dynamic event arrivals on a network using continuous-time point process latent network models. We provide regret bounds on the loss function of this procedure, giving theoretical guarantees on performance. The proposed algorithm is illustrated, using both simulation studies and real data, to have comparable performance in terms of community recovery to non-online variants. Our proposed framework can also be readily modified to incorporate other popular network structures.

artificial intelligence, data mining, exp, (20 more...)

2009.01742

Country: North America > United States (0.14)

Genre: Research Report (0.63)

Industry: Education (0.46)

Technology:

Information Technology > Communications > Networks (1.00)
Information Technology > Data Science > Data Mining (0.85)
Information Technology > Communications > Social Media (0.82)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)

arXiv.org Machine Learning, Jul-10-2020

Next Waves in Veridical Network Embedding

Ward, Owen G., Huang, Zhen, Davison, Andrew, Zheng, Tian

Embedding nodes of a large network into a metric (e.g., Euclidean) space has become an area of active research in statistical machine learning, which has found applications in natural and social sciences. Generally, a representation of a network object is learned in a Euclidean geometry and is then used for subsequent tasks regarding the nodes and/or edges of the network, such as community detection, node classification and link prediction. Network embedding algorithms have been proposed in multiple disciplines, often with domain-specific notations and details. In addition, different measures and tools have been adopted to evaluate and compare the methods proposed under different settings, often dependent on the downstream tasks. As a result, it is challenging to study these algorithms in the literature systematically. Motivated by the recently proposed Veridical Data Science (VDS) framework, we propose a framework for network embedding algorithms and discuss how the principles of predictability, computability and stability apply in this context. The utilization of this framework in network embedding holds the potential to motivate and point to new directions for future research.

neural network, representation, survey article, (21 more...)

2007.05385

Country: North America > United States (0.14)

Genre: Research Report (0.64)

Industry: Information Technology (0.47)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(3 more...)

arXiv.org Machine Learning, May-8-2017

Stabilized Sparse Online Learning for Sparse Data

Ma, Yuting, Zheng, Tian

Stochastic gradient descent (SGD) is commonly used for optimization in large-scale machine learning problems. Langford et al. (2009) introduce a sparse online learning method to induce sparsity via truncated gradient. With high-dimensional sparse data, however, the method suffers from slow convergence and high variance due to the heterogeneity in feature sparsity. To mitigate this issue, we introduce a stabilized truncated stochastic gradient descent algorithm. We employ a soft-thresholding scheme on the weight vector where the imposed shrinkage is adaptive to the amount of information available in each feature. The variability in the resulting sparse weight vector is further controlled by stability selection integrated with the informative truncation. To facilitate better convergence, we adopt an annealing strategy on the truncation rate, which leads to a balanced trade-off between exploration and exploitation in learning a sparse weight vector. Numerical experiments show that our algorithm compares favorably with the original algorithm in terms of prediction accuracy, achieved sparsity and stability.
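The soft-thresholding idea named in the abstract above can be sketched generically as follows. This is plain per-feature soft-thresholding applied after a gradient step, not the authors' stabilized algorithm: the stability selection and annealing components are omitted, and the per-feature thresholds `lam` are simply supplied by the caller as an assumption standing in for the adaptive shrinkage the abstract describes. The function names are hypothetical.

```python
def soft_threshold(weights, lam):
    """Shrink each weight toward zero by its own threshold.

    Weights whose magnitude does not exceed the threshold become exactly 0,
    which is what produces a sparse weight vector.
    """
    out = []
    for w, t in zip(weights, lam):
        mag = abs(w) - t
        if mag <= 0:
            out.append(0.0)
        else:
            out.append(mag if w > 0 else -mag)
    return out


def sgd_step(weights, grad, lr, lam):
    """One plain gradient step followed by per-feature soft-thresholding."""
    stepped = [w - lr * g for w, g in zip(weights, grad)]
    return soft_threshold(stepped, lam)


if __name__ == "__main__":
    print(soft_threshold([1.0, -0.5, 0.2], [0.25, 0.25, 0.25]))
    print(sgd_step([1.0, 0.0], [2.0, 0.5], 0.5, [0.25, 0.25]))
```

Passing a separate threshold per feature is the hook where an information-dependent shrinkage rule could be plugged in, for example a smaller threshold for features that are rarely nonzero.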

algorithm, computer based training, educational technology, (19 more...)

1604.06498

Genre:

Research Report > New Finding (0.67)
Research Report > Experimental Study (0.45)

Industry: Education > Educational Setting > Online (0.63)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.97)

arXiv.org Machine Learning, Dec-10-2015

Boosted Sparse Non-linear Distance Metric Learning

Ma, Yuting, Zheng, Tian

This paper proposes a boosting-based solution addressing metric learning problems for high-dimensional data. Distance measures have been used as natural measures of (dis)similarity and served as the foundation of various learning methods. The efficiency of distance-based learning methods heavily depends on the chosen distance metric. With increasing dimensionality and complexity of data, however, traditional metric learning methods suffer from poor scalability and the limitation due to linearity as the true signals are usually embedded within a low-dimensional nonlinear subspace. In this paper, we propose a nonlinear sparse metric learning algorithm via boosting. We restructure a global optimization problem into a forward stage-wise learning of weak learners based on a rank-one decomposition of the weight matrix in the Mahalanobis distance metric. A gradient boosting algorithm is devised to obtain a sparse rank-one update of the weight matrix at each step. Nonlinear features are learned by a hierarchical expansion of interactions incorporated within the boosting algorithm. Meanwhile, an early stopping rule is imposed to control the overall complexity of the learned metric. As a result, our approach guarantees three desirable properties of the final metric: positive semi-definiteness, low rank and element-wise sparsity. Numerical experiments show that our learning model compares favorably with the state-of-the-art methods in the current literature of metric learning.
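To make the rank-one decomposition in the abstract above concrete, here is a small sketch of a Mahalanobis-type squared distance whose weight matrix is a non-negative sum of rank-one terms, W = sum_k a_k u_k u_k^T. It only illustrates why such a construction is positive semi-definite by design (each term contributes a non-negative squared projection); it does not implement the boosting procedure, and the function name and the `components` representation are assumptions made for this example.

```python
def rank_one_metric_distance(x, y, components):
    """Squared distance (x - y)^T W (x - y) with W = sum_k a_k * u_k u_k^T.

    `components` is a list of (a_k, u_k) pairs. Requiring a_k >= 0 keeps W
    positive semi-definite, so the returned value is never negative.
    """
    diff = [xi - yi for xi, yi in zip(x, y)]
    total = 0.0
    for a, u in components:
        if a < 0:
            raise ValueError("weights must be non-negative to keep W PSD")
        proj = sum(ui * di for ui, di in zip(u, diff))
        total += a * proj * proj
    return total


if __name__ == "__main__":
    comps = [(1.0, [1.0, 0.0, 0.0]), (0.5, [0.0, 1.0, 1.0])]
    print(rank_one_metric_distance([1.0, 2.0, 3.0], [0.0, 0.0, 0.0], comps))
```

In this representation, using few components bounds the rank of W, and using sparse vectors u_k keeps W element-wise sparse, which mirrors the three properties the abstract lists.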

algorithm, artificial intelligence, optimization problem, (16 more...)

1512.03396

Country: North America > United States > California (0.28)

Genre: Research Report > Promising Solution (0.34)

Industry: Education (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)