
Collaborating Authors: Lu, Fei


Self-test loss functions for learning weak-form operators and gradient flows

arXiv.org Machine Learning

The construction of loss functions presents a major challenge in data-driven modeling involving weak-form operators in PDEs and gradient flows, particularly due to the need to select test functions appropriately. We address this challenge by introducing self-test loss functions, which employ test functions that depend on the unknown parameters, specifically for cases where the operator depends linearly on the unknowns. The proposed self-test loss function conserves energy for gradient flows and coincides with the expected log-likelihood ratio for stochastic differential equations. Importantly, it is quadratic, facilitating theoretical analysis of identifiability and well-posedness of the inverse problem, while also leading to efficient parametric or nonparametric regression algorithms. It is computationally simple, requiring only low-order derivatives or even being entirely derivative-free, and numerical experiments demonstrate its robustness against noisy and discrete data.
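Once the unknown is expanded in a basis, the quadratic structure means the self-test loss reduces to minimizing c^T A c - 2 b^T c over the coefficient vector c, which a single linear solve handles. Below is a minimal sketch in Python assuming the weak-form data have already been assembled into such a pair (A, b); the assembly step is problem-specific and not shown.

```python
import numpy as np

# Minimal sketch: minimizing a quadratic self-test loss
#   E(c) = c^T A c - 2 b^T c + const
# once A and b have been assembled from data (assembly omitted here).

def fit_quadratic_loss(A: np.ndarray, b: np.ndarray, reg: float = 1e-10) -> np.ndarray:
    """Minimizer of c^T A c - 2 b^T c; a tiny ridge term guards against
    an ill-conditioned normal matrix, common in such inverse problems."""
    return np.linalg.solve(A + reg * np.eye(A.shape[0]), b)

# Toy check with a synthetic positive-definite A.
rng = np.random.default_rng(0)
G = rng.standard_normal((5, 5))
A = G @ G.T
c_true = rng.standard_normal(5)
print(np.allclose(fit_quadratic_loss(A, A @ c_true), c_true, atol=1e-6))
```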


MathOdyssey: Benchmarking Mathematical Problem-Solving Skills in Large Language Models Using Odyssey Math Data

arXiv.org Artificial Intelligence

Large language models (LLMs) have significantly advanced natural language understanding and demonstrated strong problem-solving abilities. Despite these successes, most LLMs still struggle with solving mathematical problems due to the intricate reasoning required. This paper investigates the mathematical problem-solving capabilities of LLMs using the newly developed "MathOdyssey" dataset. The dataset includes diverse mathematical problems at high school and university levels, created by experts from notable institutions to rigorously test LLMs in advanced problem-solving scenarios and to cover a wide range of subject areas. By providing the MathOdyssey dataset as a resource to the AI community, we aim to contribute to the understanding and improvement of AI capabilities in complex mathematical problem-solving. We conduct benchmarking on open-source models, such as Llama-3 and DBRX-Instruct, and on closed-source models from the GPT series and Gemini models. Our results indicate that while LLMs perform well on routine and moderately difficult tasks, they face significant challenges with Olympiad-level problems and complex university-level questions. Our analysis shows a narrowing performance gap between open-source and closed-source models, yet substantial challenges remain, particularly with the most demanding problems. This study highlights the ongoing need for research to enhance the mathematical reasoning of LLMs. The dataset, results, and code are publicly available.


Interacting Particle Systems on Networks: joint inference of the network and the interaction kernel

arXiv.org Machine Learning

Modeling multi-agent systems on networks is a fundamental challenge in a wide variety of disciplines. We jointly infer the weight matrix of the network and the interaction kernel, which determine, respectively, which agents interact with which others and the rules of those interactions, from data consisting of multiple trajectories. The estimator we propose leads naturally to a non-convex optimization problem, and we investigate two approaches for its solution: one is based on the alternating least squares (ALS) algorithm; the other is based on a new algorithm named operator regression with alternating least squares (ORALS). Both algorithms are scalable to large ensembles of data trajectories. We establish coercivity conditions guaranteeing identifiability and well-posedness. The ALS algorithm appears statistically efficient and robust even in the small data regime but lacks performance and convergence guarantees. The ORALS estimator is consistent and asymptotically normal under a coercivity condition. We conduct several numerical experiments ranging from Kuramoto particle systems on networks to opinion dynamics in leader-follower models.
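To make the alternation concrete, here is a minimal sketch for a bilinear toy model dX_i/dt = sum_j a[i,j] phi(X_j - X_i) with phi expanded in hypothetical basis functions psi_k; the ORALS variant, the coercivity checks, and the normalization that removes the scaling ambiguity between a and c are all omitted.

```python
import numpy as np

def als(F, V, n_iter=50):
    """Alternating least squares for the bilinear model above.

    F: (T, N, N, K) array with F[t, i, j, k] = psi_k(X_j(t) - X_i(t)).
    V: (T, N) array of velocities dX_i/dt (scalar states for brevity).
    Returns weight matrix a (N, N) and kernel coefficients c (K,).
    Note: (a, c) is identifiable only up to a scaling s -> (s*a, c/s),
    so in practice one normalizes c after each sweep.
    """
    T, N, _, K = F.shape
    c, a = np.ones(K), np.zeros((N, N))
    for _ in range(n_iter):
        # Step 1: with c fixed, each row a[i, :] solves a least squares.
        G = F @ c                               # (T, N, N): sum_k c_k psi_k
        for i in range(N):
            a[i] = np.linalg.lstsq(G[:, i, :], V[:, i], rcond=None)[0]
        # Step 2: with a fixed, c solves a single least squares.
        H = np.einsum('ij,tijk->tik', a, F)     # (T, N, K)
        c = np.linalg.lstsq(H.reshape(-1, K), V.reshape(-1), rcond=None)[0]
    return a, c
```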


Optimal minimax rate of learning interaction kernels

arXiv.org Machine Learning

Nonparametric estimation of nonlocal interaction kernels is crucial in various applications involving interacting particle systems. The inference challenge, situated at the nexus of statistical learning and inverse problems, comes from the nonlocal dependency. A central question is whether the optimal minimax rate of convergence for this problem aligns with the rate of $M^{-\frac{2\beta}{2\beta+1}}$ in classical nonparametric regression, where $M$ is the sample size and $\beta$ represents the smoothness exponent of the radial kernel. Our study confirms this alignment for systems with a finite number of particles. We introduce a tamed least squares estimator (tLSE) that attains the optimal convergence rate for a broad class of exchangeable distributions. The tLSE bridges the smallest eigenvalue of random matrices and Sobolev embedding. This estimator relies on nonasymptotic estimates for the left-tail probability of the smallest eigenvalue of the normal matrix. The minimax lower bound is derived using the Fano-Tsybakov hypothesis-testing method. Our findings reveal that, provided the inverse problem in the large-sample limit satisfies a coercivity condition, the left-tail probability does not alter the bias-variance tradeoff, and the optimal minimax rate remains intact. Our tLSE method offers a straightforward approach for establishing the optimal minimax rate for models with either local or nonlocal dependency.
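The taming step itself is simple to state: keep the least squares estimator only when the smallest eigenvalue of the normal matrix clears a threshold, and fall back to the zero estimator otherwise. A sketch under illustrative assumptions (the threshold tau and the regression matrix are placeholders, not the paper's constructions):

```python
import numpy as np

def tamed_lse(Phi, y, tau):
    """Tamed least squares: Phi is the (M, n) regression matrix built
    from a basis for the kernel, y the (M,) responses. Returning zero on
    the rare ill-conditioned event controls the variance contributed by
    the left tail of the smallest eigenvalue."""
    A = Phi.T @ Phi / len(y)
    if np.linalg.eigvalsh(A)[0] <= tau:     # eigvalsh sorts ascending
        return np.zeros(Phi.shape[1])       # tamed: zero estimator
    return np.linalg.solve(A, Phi.T @ y / len(y))
```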


Energy-Aware Routing Algorithm for Mobile Ground-to-Air Charging

arXiv.org Artificial Intelligence

We investigate the problem of energy-constrained planning for a cooperative system of an Unmanned Ground Vehicle (UGV) and an Unmanned Aerial Vehicle (UAV). In scenarios where the UGV serves as a mobile base to ferry the UAV and as a charging station to recharge the UAV, we formulate a novel energy-constrained routing problem. To tackle this problem, we design an energy-aware routing algorithm, aiming to minimize the overall mission duration under the energy limitations of both vehicles. The algorithm first solves a Traveling Salesman Problem (TSP) to generate a guided tour. Then, it employs the Monte-Carlo Tree Search (MCTS) algorithm to refine the tour and generate paths for the two vehicles. We evaluate the performance of our algorithm through extensive simulations and a proof-of-concept experiment. The results show that our algorithm consistently achieves near-optimal mission time and maintains fast running time across a wide range of problem instances.
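The sketch below covers only the first stage, with a nearest-neighbor heuristic standing in for the TSP solver that produces the guided tour; the MCTS refinement and the energy bookkeeping for the two vehicles are too involved for a short example.

```python
import numpy as np

def guided_tour(points: np.ndarray, start: int = 0) -> list:
    """Nearest-neighbor stand-in for the TSP stage: points is an (n, 2)
    array of task locations, and the return value is a visiting order
    that the refinement stage would then improve."""
    unvisited = set(range(len(points))) - {start}
    tour, cur = [start], start
    while unvisited:
        nxt = min(unvisited, key=lambda j: np.linalg.norm(points[j] - points[cur]))
        unvisited.remove(nxt)
        tour.append(nxt)
        cur = nxt
    return tour
```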


Small noise analysis for Tikhonov and RKHS regularizations

arXiv.org Artificial Intelligence

Regularization plays a pivotal role in ill-posed machine learning and inverse problems. However, a fundamental comparative analysis of the various regularization norms remains open. We establish a small noise analysis framework to assess the effects of norms in Tikhonov and RKHS regularizations, in the context of ill-posed linear inverse problems with Gaussian noise. This framework studies the convergence rates of regularized estimators in the small noise limit and reveals the potential instability of the conventional L2-regularizer. We resolve this instability by proposing an innovative class of adaptive fractional RKHS regularizers, which covers the L2 Tikhonov and RKHS regularizations by adjusting the fractional smoothness parameter. A surprising insight is that over-smoothing via these fractional RKHSs consistently yields optimal convergence rates, but the optimal hyper-parameter may decay too fast to be selected in practice.
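To make the role of the fractional smoothness parameter concrete, here is a sketch of a Tikhonov estimator whose penalty norm is induced by a fractional power of the normal operator G = A^T A; lam and s are taken as given inputs here (the paper selects them adaptively), and s = 0 recovers the plain L2 ridge.

```python
import numpy as np

def fractional_rkhs_estimator(A, y, lam, s, tol=1e-12):
    """Minimize ||A x - y||^2 + lam * x^T pinv(G^s) x with G = A^T A,
    solved in the eigenbasis of G. Directions with (numerically) zero
    eigenvalue carry no signal and are set to zero."""
    G = A.T @ A
    d, V = np.linalg.eigh(G)                 # eigenvalues in ascending order
    z = V.T @ (A.T @ y)
    ds = np.where(d > tol, d, 1.0)           # safe base for the fractional power
    coef = np.where(d > tol, z / (d + lam * ds ** (-s)), 0.0)
    return V @ coef
```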


Benchmarking optimality of time series classification methods in distinguishing diffusions

arXiv.org Artificial Intelligence

Statistical optimality benchmarking is crucial for analyzing and designing time series classification (TSC) algorithms. This study proposes to benchmark the optimality of TSC algorithms in distinguishing diffusion processes by the likelihood ratio test (LRT), which is an optimal classifier by the Neyman-Pearson lemma. The LRT benchmarks are computationally efficient because the LRT does not need training, and the diffusion processes can be efficiently simulated and are flexible enough to reflect the specific features of real-world applications. We demonstrate the benchmarking with three widely used TSC algorithms: random forest, ResNet, and ROCKET. These algorithms can achieve LRT optimality for univariate time series and multivariate Gaussian processes. However, these model-agnostic algorithms are suboptimal in classifying high-dimensional nonlinear multivariate time series. Additionally, the LRT benchmark provides tools to analyze the dependence of classification accuracy on the time length, dimension, temporal sampling frequency, and randomness of the time series.
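For discretely observed diffusions, the LRT reduces to comparing log-likelihoods of the increments under the two candidate models; with an Euler-Maruyama discretization these are Gaussian. A sketch with illustrative Ornstein-Uhlenbeck drifts (the drifts, step size, and noise level are stand-ins, not the paper's test cases):

```python
import numpy as np

def log_lik(x, dt, drift, sigma):
    """Euler-Maruyama log-likelihood of a scalar path x observed at step dt."""
    mean = x[:-1] + drift(x[:-1]) * dt
    var = sigma ** 2 * dt
    return -0.5 * np.sum((x[1:] - mean) ** 2 / var + np.log(2 * np.pi * var))

def lrt_classify(x, dt, drift0, drift1, sigma):
    """Label the path by the model with the larger likelihood (Neyman-Pearson)."""
    return int(log_lik(x, dt, drift1, sigma) > log_lik(x, dt, drift0, sigma))

# Illustrative pair: Ornstein-Uhlenbeck drifts with different reversion rates.
drift0 = lambda x: -1.0 * x
drift1 = lambda x: -2.0 * x
```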


A Data-Adaptive Prior for Bayesian Learning of Kernels in Operators

arXiv.org Artificial Intelligence

Kernels are efficient in representing nonlocal dependence, and they are widely used to design operators between function spaces. Thus, learning kernels in operators from data is an inverse problem of general interest. Due to the nonlocal dependence, the inverse problem can be severely ill-posed with a data-dependent singular inversion operator. The Bayesian approach overcomes the ill-posedness through a non-degenerate prior. However, a fixed non-degenerate prior leads to a divergent posterior mean when the observation noise becomes small, if the data induce a perturbation in the eigenspace of zero eigenvalues of the inversion operator. We introduce a data-adaptive prior to achieve a stable posterior whose mean always has a small noise limit. The data-adaptive prior's covariance is the inversion operator, with a hyper-parameter selected adaptively from the data by the L-curve method. Furthermore, we provide a detailed analysis of the computational practice of the data-adaptive prior and demonstrate it on Toeplitz matrices and integral operators. Numerical tests show that a fixed prior can lead to a divergent posterior mean in the presence of any of four types of errors: discretization error, model error, partial observation, and a wrong noise assumption. In contrast, the data-adaptive prior always attains posterior means with small noise limits.
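A sketch of the resulting posterior mean, assuming (as an illustration) a Gaussian prior whose covariance is proportional to the normal operator G = A^T A, with the hyper-parameter lam supplied rather than selected by the L-curve method: directions in the zero-eigenvalue eigenspace of G are annihilated, which is what keeps the posterior mean bounded in the small-noise limit.

```python
import numpy as np

def adaptive_posterior_mean(A, y, lam, sigma2, tol=1e-12):
    """Posterior mean for y = A x + noise with noise variance sigma2 and
    prior covariance G / lam, G = A^T A, computed in the eigenbasis of G."""
    G = A.T @ A
    d, V = np.linalg.eigh(G)
    z = V.T @ (A.T @ y) / sigma2
    ds = np.where(d > tol, d, 1.0)
    # Posterior precision per eigendirection: d / sigma2 + lam / d (d > 0);
    # the lam / d term blows up as d -> 0, so those directions are zeroed,
    # unlike with a fixed prior, which lets them diverge as the noise shrinks.
    coef = np.where(d > tol, z / (d / sigma2 + lam / ds), 0.0)
    return V @ coef
```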


Unsupervised learning of observation functions in state-space models by nonparametric moment methods

arXiv.org Machine Learning

We investigate the unsupervised learning of non-invertible observation functions in nonlinear state-space models. Assuming abundant data of the observation process along with the distribution of the state process, we introduce a nonparametric generalized moment method to estimate the observation function via constrained regression. The major challenge comes from the non-invertibility of the observation function and the lack of data pairs between the state and the observation. We address the fundamental issue of identifiability from quadratic loss functionals and show that the function space of identifiability is the closure of an RKHS that is intrinsic to the state process. Numerical results show that the first two moments and temporal correlations, along with upper and lower bounds, can identify functions ranging from piecewise polynomials to smooth functions, leading to convergent estimators. The limitations of this method, such as non-identifiability due to symmetry and stationarity, are also discussed.
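A toy version of the moment-matching idea, assuming a hypothetical basis expansion f = sum_k c_k phi_k and matching only the first two moments (the paper additionally exploits temporal correlations and upper/lower bounds via constrained regression):

```python
import numpy as np
from scipy.optimize import minimize

def fit_observation_function(y_obs, x_samples, basis):
    """Fit c so the first two moments of f(X) = sum_k c_k phi_k(X) match
    those of the observed process. basis is a list of callables phi_k;
    x_samples are draws from the (known) state distribution."""
    Phi = np.column_stack([phi(x_samples) for phi in basis])   # (n, K)
    m1, m2 = np.mean(y_obs), np.mean(y_obs ** 2)
    mu = Phi.mean(axis=0)                  # E[phi_k(X)]
    M = Phi.T @ Phi / len(x_samples)       # E[phi_j(X) phi_k(X)]

    def loss(c):
        # Squared mismatch of the first two moments of f(X).
        return (mu @ c - m1) ** 2 + (c @ M @ c - m2) ** 2

    return minimize(loss, np.zeros(len(basis))).x
```

As the abstract notes, such a fit is identifiable only up to the invariances of the chosen moments: with two moment equations alone, distinct coefficient vectors can produce identical moment pairs, which is why the full method adds temporal correlations and bounds.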


Identifiability of interaction kernels in mean-field equations of interacting particles

arXiv.org Machine Learning

Systems of interacting particles or agents are widely used in many areas of science and engineering (see [2, 29, 24, 1] and the references therein). Motivated by these applications, there has been increasing interest in inferring the interaction kernel (or the interaction potential) in a nonparametric fashion for generality. When the system has finitely many particles, the recent efforts [4, 20, 18, 19, 17, 16] provide systematic tools for the inference of the kernel from multiple trajectories of all particles. When the number of particles is large, it becomes impractical to collect trajectories of all particles, but one can often observe the population density, i.e., the solution of the mean-field equations. This leads to the inverse problem of inferring the interaction kernel of the mean-field equation from data.