Ge, Jiawei
Towards Fluorescence-Guided Autonomous Robotic Partial Nephrectomy on Novel Tissue-Mimicking Hydrogel Phantoms
Kilmer, Ethan, Chen, Joseph, Ge, Jiawei, Sarda, Preksha, Cha, Richard, Cleary, Kevin, Shepard, Lauren, Ghazi, Ahmed Ezzat, Scheikl, Paul Maria, Krieger, Axel
Autonomous robotic systems hold potential for improving renal tumor resection accuracy and patient outcomes. We present a fluorescence-guided robotic system capable of planning and executing incision paths around exophytic renal tumors with a clinically relevant resection margin. Leveraging point cloud observations, the system handles irregular tumor shapes and distinguishes healthy from tumorous tissue based on near-infrared imaging, akin to indocyanine green staining in partial nephrectomy. Tissue-mimicking phantoms are crucial for the development of autonomous robotic surgical systems for interventions where acquiring ex-vivo animal tissue is infeasible, such as cancer of the kidney and renal pelvis. To this end, we propose novel hydrogel-based kidney phantoms with exophytic tumors that mimic the physical and visual behavior of tissue, and are compatible with electrosurgical instruments, addressing a common limitation of silicone-based phantoms. In contrast to previous hydrogel phantoms, we mix the material with near-infrared dye to enable fluorescence-guided tumor segmentation. Autonomous real-world robotic experiments validate our system and phantoms, achieving an average margin accuracy of 1.44 mm with a completion time of 69 seconds.
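To make the sensing-and-planning step above concrete, here is a minimal 2D sketch, assuming a calibrated single-channel uint8 NIR image: Otsu thresholding stands in for the fluorescence-based tumor segmentation, and a morphological dilation expands the tumor mask by the desired resection margin. All function names and the 2D simplification are illustrative assumptions, not the paper's point-cloud-based implementation.

```python
import cv2
import numpy as np

def plan_incision_contour(nir_image: np.ndarray, margin_mm: float,
                          mm_per_px: float) -> np.ndarray:
    """Segment the dye-stained (tumorous) region in a NIR image and return
    the boundary of that region expanded by a resection margin."""
    # Otsu's threshold separates bright (stained) from dark (healthy) tissue.
    _, tumor_mask = cv2.threshold(nir_image, 0, 255,
                                  cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    # Grow the mask by the margin, converted from millimeters to pixels.
    margin_px = max(1, round(margin_mm / mm_per_px))
    kernel = cv2.getStructuringElement(
        cv2.MORPH_ELLIPSE, (2 * margin_px + 1, 2 * margin_px + 1))
    expanded = cv2.dilate(tumor_mask, kernel)
    # The incision path is the outer contour of the expanded mask.
    contours, _ = cv2.findContours(expanded, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    return max(contours, key=cv2.contourArea)
```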
MATH-Perturb: Benchmarking LLMs' Math Reasoning Abilities against Hard Perturbations
Huang, Kaixuan, Guo, Jiacheng, Li, Zihao, Ji, Xiang, Ge, Jiawei, Li, Wenzhe, Guo, Yingqing, Cai, Tianle, Yuan, Hui, Wang, Runzhe, Wu, Yue, Yin, Ming, Tang, Shange, Huang, Yangsibo, Jin, Chi, Chen, Xinyun, Zhang, Chiyuan, Wang, Mengdi
Large language models have demonstrated impressive performance on challenging mathematical reasoning tasks, which has triggered discussion of whether this performance reflects true reasoning capability or memorization. To investigate this question, prior work has constructed mathematical benchmarks in which questions undergo simple perturbations -- modifications that still preserve the underlying reasoning patterns of the solutions. However, no work has explored hard perturbations, which fundamentally change the nature of the problem so that the original solution steps do not apply. To bridge the gap, we construct MATH-P-Simple and MATH-P-Hard via simple perturbation and hard perturbation, respectively. Each consists of 279 perturbed math problems derived from level-5 (hardest) problems in the MATH dataset (Hendrycks et al., 2021). We observe significant performance drops on MATH-P-Hard across various models, including o1-mini (-16.49%) and gemini-2.0-flash-thinking (-12.9%). We also raise concerns about a novel form of memorization, where models blindly apply learned problem-solving skills without assessing their applicability to modified contexts. This issue is amplified when using original problems for in-context learning. We call for research efforts to address this challenge, which is critical for developing more robust and reliable reasoning models.
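The benchmark's central measurement is the accuracy gap between original and perturbed problems. A toy sketch of that bookkeeping, with made-up answers rather than real benchmark data:

```python
def accuracy(predictions, gold):
    """Exact-match accuracy on final answers."""
    return sum(p == g for p, g in zip(predictions, gold)) / len(gold)

# Hypothetical model outputs: it solves the originals, but on the hard
# perturbations it reuses the old solution steps and misses two of three.
orig_gold, orig_pred = ["12", "7", "3"], ["12", "7", "3"]
hard_gold, hard_pred = ["15", "9", "4"], ["15", "7", "3"]

drop = accuracy(orig_pred, orig_gold) - accuracy(hard_pred, hard_gold)
print(f"performance drop under hard perturbation: {drop:.1%}")  # 66.7%
```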
Covariates-Adjusted Mixed-Membership Estimation: A Novel Network Model with Optimal Guarantees
Fan, Jianqing, Ge, Jiawei, Hou, Jikai
This paper addresses the problem of mixed-membership estimation in networks, where the goal is to efficiently estimate the latent mixed-membership structure from the observed network. Recognizing the widespread availability and valuable information carried by node covariates, we propose a novel network model that incorporates both community information, as represented by the Degree-Corrected Mixed Membership (DCMM) model, and node covariate similarities to determine connections. We investigate the regularized maximum likelihood estimation (MLE) for this model and demonstrate that our approach achieves optimal estimation accuracy for both the similarity matrix and the mixed memberships, in terms of both the Frobenius norm and the entrywise loss. Since directly analyzing the original convex optimization problem is intractable, we employ nonconvex optimization to facilitate the analysis. A key contribution of our work is identifying a crucial assumption that bridges the gap between convex and nonconvex solutions, enabling the transfer of statistical guarantees from the nonconvex approach to its convex counterpart. Importantly, our analysis extends beyond the MLE loss and the mean squared error (MSE) used in matrix completion problems, generalizing to all convex loss functions. Consequently, our analysis techniques extend to a broader set of applications, including ranking problems based on pairwise comparisons. Finally, simulation experiments validate our theoretical findings, and real-world data analyses confirm the practical relevance of our model.
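For intuition, one plausible way to write down such a model (our illustrative notation; the paper's exact parameterization may differ) combines a DCMM-style community term with a covariate-similarity term inside a link function $\sigma$:
$$\mathbb{P}(A_{ij}=1) \;=\; \sigma\!\big(\theta_i \theta_j\, \pi_i^\top B\, \pi_j \;+\; S_{ij}\big),$$
where $\pi_i$ is node $i$'s mixed-membership vector, $B$ the community affinity matrix, $\theta_i$ a degree-correction parameter, and $S_{ij}$ a similarity score computed from the covariates of nodes $i$ and $j$; the regularized MLE then jointly estimates the memberships and the similarity matrix.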
Tracking Tumors under Deformation from Partial Point Clouds using Occupancy Networks
Henrich, Pit, Liu, Jiawei, Ge, Jiawei, Schmidgall, Samuel, Shepard, Lauren, Ghazi, Ahmed Ezzat, Mathis-Ullrich, Franziska, Krieger, Axel
To track tumors during surgery, information from preoperative CT scans is used to determine their position. However, as the surgeon operates, the tumor may deform, which presents a major hurdle for accurately resecting the tumor and can lead to surgical inaccuracy, increased operation time, and excessive margins. This issue is particularly pronounced in robot-assisted partial nephrectomy (RAPN), where the kidney undergoes significant deformations during the operation. Toward addressing this, we introduce an occupancy network-based method for the localization of tumors within kidney phantoms undergoing deformations at interactive speeds. We validate our method by introducing a 3D hydrogel kidney phantom embedded with exophytic and endophytic renal tumors. It closely mimics real tissue mechanics to simulate kidney deformation during in vivo surgery, providing excellent contrast and clear delineation of tumor margins to enable automatic threshold-based segmentation. Our findings indicate that the proposed method can localize tumors in moderately deforming kidneys with a margin of 6 mm to 10 mm, while providing essential volumetric 3D information at over 60 Hz. This capability directly enables downstream tasks such as robotic resection. Kidney cancer is one of the most common forms of cancer in the US, with over 65,000 new patients diagnosed every year, leading to over 15,000 deaths [1]. The standard treatment for localized small renal masses has shifted from radical nephrectomy (complete kidney removal) toward the more minimally invasive approach of partial nephrectomy (removal of the tumor, retaining partial kidney function). One of the main challenges during tumor removal is ensuring the resection of adequate tumor margins.
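As a sketch of the core primitive, an occupancy network maps a 3D query point (conditioned on a latent code summarizing the current partial point-cloud observation) to a probability that the point lies inside the tumor; querying a dense grid recovers the deformed tumor volume. The architecture and dimensions below are illustrative assumptions, not the paper's trained model.

```python
import torch
import torch.nn as nn

class OccupancyNet(nn.Module):
    """MLP mapping (3D point, latent scene code) -> tumor-occupancy probability."""
    def __init__(self, latent_dim: int = 128, hidden: int = 256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3 + latent_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, points: torch.Tensor, latent: torch.Tensor) -> torch.Tensor:
        # points: (N, 3) query locations; latent: (latent_dim,) scene code.
        z = latent.expand(points.shape[0], -1)
        return torch.sigmoid(self.mlp(torch.cat([points, z], dim=-1)))

# Query a 32^3 grid to extract the current tumor volume as occupied voxels.
net = OccupancyNet()
axes = [torch.linspace(-1.0, 1.0, 32)] * 3
grid = torch.stack(torch.meshgrid(*axes, indexing="ij"), dim=-1).reshape(-1, 3)
occupancy = net(grid, torch.zeros(128))           # (32**3, 1) probabilities
tumor_voxels = grid[occupancy.squeeze(-1) > 0.5]  # points inside the tumor
```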
Towards Principled Superhuman AI for Multiplayer Symmetric Games
Ge, Jiawei, Wang, Yuanhao, Li, Wenzhe, Jin, Chi
Multiplayer games, when the number of players exceeds two, present unique challenges that fundamentally distinguish them from the extensively studied two-player zero-sum games. These challenges arise from the non-uniqueness of equilibria and the risk of agents performing highly suboptimally when adopting equilibrium strategies. While a line of recent works has developed learning systems that successfully achieve human-level or even superhuman performance in popular multiplayer games such as Mahjong, Poker, and Diplomacy, two critical questions remain unaddressed: (1) What is the correct solution concept that AI agents should find? and (2) What is the general algorithmic framework that provably solves all games within this class? This paper takes the first step towards solving these unique challenges of multiplayer games by provably addressing both questions in multiplayer symmetric normal-form games. We also demonstrate that many meta-algorithms developed in prior practical systems for multiplayer games can fail to achieve even the basic goal of obtaining an agent's equal share of the total reward.
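To pin down the "equal share" baseline (our formalization, stated under the simplifying assumption that the total reward is a constant $R$): in an $m$-player symmetric game, if all players independently follow the same strategy $\pi$, then by symmetry
$$\mathbb{E}_{a_1,\dots,a_m \sim \pi^{\otimes m}}\big[u_i(a_1,\dots,a_m)\big] \;=\; \frac{R}{m} \quad \text{for every player } i,$$
so $R/m$ is the natural per-agent benchmark; the failure mode mentioned above is a meta-algorithm whose trained agent falls below even this value.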
Optimal Aggregation of Prediction Intervals under Unsupervised Domain Shift
Ge, Jiawei, Mukherjee, Debarghya, Fan, Jianqing
As machine learning models are increasingly deployed in dynamic environments, it becomes paramount to assess and quantify uncertainties associated with distribution shifts. A distribution shift occurs when the underlying data-generating process changes, leading to a deviation in the model's performance. The prediction interval, which captures the range of likely outcomes for a given prediction, serves as a crucial tool for characterizing uncertainties induced by the underlying distribution. In this paper, we propose methodologies for aggregating prediction intervals to obtain one with minimal width and adequate coverage on the target domain under unsupervised domain shift, where we have labeled samples from a related source domain and unlabeled covariates from the target domain. Our analysis encompasses scenarios where the source and the target domain are related via i) a bounded density ratio, and ii) a measure-preserving transformation. Our proposed methodologies are computationally efficient and easy to implement. Beyond illustrating the performance of our method through a real-world dataset, we also delve into the theoretical details. This includes establishing rigorous theoretical guarantees, coupled with finite sample bounds, regarding the coverage and width of our prediction intervals. Our approach excels in practical applications and is underpinned by a solid theoretical framework, ensuring its reliability and effectiveness across diverse contexts.
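Under the bounded-density-ratio setting i), the target-domain coverage of a candidate band can be estimated from labeled source data alone by importance weighting. A minimal sketch, assuming a density-ratio estimate w(x) ≈ p_target(x)/p_source(x) obtained elsewhere (e.g., from the unlabeled target covariates); all names are illustrative:

```python
import numpy as np

def target_coverage(x_src, y_src, lower, upper, density_ratio):
    """Self-normalized importance-weighted estimate of the probability that
    y falls in [lower(x), upper(x)] under the *target* distribution, using
    only labeled source samples."""
    w = density_ratio(x_src)                       # w(x) = p_tgt(x) / p_src(x)
    inside = (y_src >= lower(x_src)) & (y_src <= upper(x_src))
    return np.sum(w * inside) / np.sum(w)
```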
Maximum Likelihood Estimation is All You Need for Well-Specified Covariate Shift
Ge, Jiawei, Tang, Shange, Fan, Jianqing, Ma, Cong, Jin, Chi
A key challenge of modern machine learning systems is to achieve Out-of-Distribution (OOD) generalization -- generalizing to target data whose distribution differs from that of source data. Despite its significant importance, the fundamental question of "what are the most effective algorithms for OOD generalization" remains open even under the standard setting of covariate shift. This paper addresses this fundamental question by proving that, surprisingly, classical Maximum Likelihood Estimation (MLE) purely using source data (without any modification) achieves the minimax optimality for covariate shift under the well-specified setting. That is, no algorithm performs better than MLE in this setting (up to a constant factor), justifying that MLE is all you need. Our result holds for a very rich class of parametric models and does not require any boundedness condition on the density ratio. We illustrate the wide applicability of our framework by instantiating it to three concrete examples -- linear regression, logistic regression, and phase retrieval. This paper further complements the study by proving that, under the misspecified setting, MLE is no longer the optimal choice, whereas the Maximum Weighted Likelihood Estimator (MWLE) emerges as minimax optimal in certain scenarios.
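A toy instantiation of the well-specified setting, assuming scikit-learn (version >= 1.2 for penalty=None): the conditional model P(y|x) is a logistic regression shared across domains, the covariate distributions differ, and the unmodified MLE is fit on source data only.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
beta = np.array([1.0, -2.0])                 # ground-truth parameter

# Covariate shift: different x-distributions, identical P(y | x).
x_src = rng.normal(0.0, 1.0, size=(2000, 2))
x_tgt = rng.normal(1.5, 0.5, size=(2000, 2))
prob = lambda x: 1.0 / (1.0 + np.exp(-x @ beta))
y_src = rng.binomial(1, prob(x_src))
y_tgt = rng.binomial(1, prob(x_tgt))

# Plain MLE on source data -- no reweighting, no modification.
mle = LogisticRegression(penalty=None).fit(x_src, y_src)
print("target accuracy of the unmodified source MLE:", mle.score(x_tgt, y_tgt))
```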
UTOPIA: Universally Trainable Optimal Prediction Intervals Aggregation
Fan, Jianqing, Ge, Jiawei, Mukherjee, Debarghya
Uncertainty quantification for prediction is an intriguing problem with significant applications in various fields, such as biomedical science, economic studies, and weather forecasts. Numerous methods are available for constructing prediction intervals, such as quantile regression and conformal predictions, among others. Nevertheless, model misspecification (especially in high dimension) or sub-optimal constructions can frequently result in biased or unnecessarily wide prediction intervals. In this paper, we propose a novel and widely applicable technique for aggregating multiple prediction intervals to minimize the average width of the prediction band along with a coverage guarantee, called Universally Trainable Optimal Predictive Intervals Aggregation (UTOPIA). The method also allows us to directly construct predictive bands based on elementary basis functions. Our approach is based on linear or convex programming, which is easy to implement. All of our proposed methodologies are supported by theoretical guarantees on the coverage probability and optimal average length, which are detailed in this paper. The effectiveness of our approach is convincingly demonstrated by applying it to synthetic data and two real datasets on finance and macroeconomics.
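A stripped-down sketch of the aggregation idea as a linear program, assuming SciPy. For simplicity it requires the aggregated band to cover every training point (the paper instead targets a prescribed coverage level); the candidate endpoints, weights, and names are illustrative:

```python
import numpy as np
from scipy.optimize import linprog

def aggregate_intervals(lowers, uppers, y):
    """Find nonnegative weights w minimizing the average width of the
    aggregated band [lowers @ w, uppers @ w] subject to covering each y_i.

    lowers, uppers : (n, K) endpoints of K candidate intervals at n points.
    y              : (n,) observed responses.
    """
    n, K = lowers.shape
    c = (uppers - lowers).mean(axis=0)     # average width of each candidate
    A_ub = np.vstack([lowers, -uppers])    # lowers @ w <= y and uppers @ w >= y
    b_ub = np.concatenate([y, -y])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * K)
    return res.x
```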
On the Provable Advantage of Unsupervised Pretraining
Ge, Jiawei, Tang, Shange, Fan, Jianqing, Jin, Chi
Unsupervised pretraining, which learns a useful representation using a large amount of unlabeled data to facilitate the learning of downstream tasks, is a critical component of modern large-scale machine learning systems. Despite its tremendous empirical success, the rigorous theoretical understanding of why unsupervised pretraining generally helps remains rather limited -- most existing results are restricted to particular methods or approaches for unsupervised pretraining with specialized structural assumptions. This paper studies a generic framework, where the unsupervised representation learning task is specified by an abstract class of latent variable models $\Phi$ and the downstream task is specified by a class of prediction functions $\Psi$. We consider a natural approach of using Maximum Likelihood Estimation (MLE) for unsupervised pretraining and Empirical Risk Minimization (ERM) for learning downstream tasks. We prove that, under a mild "informative" condition, our algorithm achieves an excess risk of $\tilde{\mathcal{O}}(\sqrt{\mathcal{C}_\Phi/m} + \sqrt{\mathcal{C}_\Psi/n})$ for downstream tasks, where $\mathcal{C}_\Phi, \mathcal{C}_\Psi$ are complexity measures of function classes $\Phi, \Psi$, and $m, n$ are the number of unlabeled and labeled data respectively. Compared to the baseline of $\tilde{\mathcal{O}}(\sqrt{\mathcal{C}_{\Phi \circ \Psi}/n})$ achieved by performing supervised learning using only the labeled data, our result rigorously shows the benefit of unsupervised pretraining when $m \gg n$ and $\mathcal{C}_{\Phi\circ \Psi} > \mathcal{C}_\Psi$. This paper further shows that our generic framework covers a wide range of approaches for unsupervised pretraining, including factor models, Gaussian mixture models, and contrastive learning.
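One of the concrete instances mentioned above, sketched with scikit-learn: EM on unlabeled data approximates the MLE for a Gaussian mixture (the pretraining stage), and the learned posterior responsibilities serve as the representation for a downstream ERM fit on far fewer labeled points. The data-generating choices here are illustrative.

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Pretraining: fit a latent-variable model on m = 10,000 unlabeled points.
x_unlabeled = np.vstack([rng.normal(-2.0, 1.0, (5000, 2)),
                         rng.normal(+2.0, 1.0, (5000, 2))])
gmm = GaussianMixture(n_components=2, random_state=0).fit(x_unlabeled)

# Downstream ERM: n = 100 labeled points, represented by the mixture
# posteriors learned during pretraining.
x_labeled = rng.normal(0.0, 2.0, (100, 2))
y_labeled = (x_labeled[:, 0] > 0).astype(float) + rng.normal(0.0, 0.1, 100)
features = gmm.predict_proba(x_labeled)          # (n, 2) representation
downstream = LinearRegression().fit(features, y_labeled)
```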