AITopics

2502.11986

Country: Europe (0.46)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.46)

Sengupta, Ayan, Goel, Yash, Chakraborty, Tanmoy

How to Upscale Neural Networks with Scaling Law? A Survey and Practical Guidelines

arXiv.org Artificial IntelligenceFeb-17-2025

Neural scaling laws have revolutionized the design and optimization of large-scale AI models by revealing predictable relationships between model size, dataset volume, and computational resources. Early research established power-law relationships in model performance, leading to compute-optimal scaling strategies. However, recent studies highlighted their limitations across architectures, modalities, and deployment contexts. Sparse models, mixture-of-experts, retrieval-augmented learning, and multimodal models often deviate from traditional scaling patterns. Moreover, scaling behaviors vary across domains such as vision, reinforcement learning, and fine-tuning, underscoring the need for more nuanced approaches. In this survey, we synthesize insights from over 50 studies, examining the theoretical foundations, empirical findings, and practical implications of scaling laws. We also explore key challenges, including data efficiency, inference scaling, and architecture-specific constraints, advocating for adaptive scaling strategies tailored to real-world applications. We suggest that while scaling laws provide a useful guide, they do not always generalize across all architectures and training strategies.

artificial intelligence, machine learning, natural language, (17 more...)

2502.12051

Country: Asia (0.28)

Genre:

Research Report > New Finding (1.00)
Overview (1.00)

Industry:

Law (0.93)
Education (0.67)
Leisure & Entertainment > Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

arXiv.org Artificial IntelligenceFeb-17-2025

Solving the Cold Start Problem on One's Own as an End User via Preference Transfer

Sato, Ryoma

We propose a new approach that enables end users to directly solve the cold start problem by themselves. The cold start problem is a common issue in recommender systems, and many methods have been proposed to address the problem on the service provider's side. However, when the service provider does not take action, users are left with poor recommendations and no means to improve their experience. We propose an algorithm, Pretender, that allows end users to proactively solve the cold start problem on their own. Pretender does not require any special support from the service provider and can be deployed independently by users. We formulate the problem as minimizing the distance between the source and target distributions and optimize item selection from the target service accordingly. Furthermore, we establish theoretical guarantees for Pretender based on a discrete quadrature problem. We conduct experiments on real-world datasets to demonstrate the effectiveness of Pretender.

data mining, machine learning, natural language, (21 more...)

2502.12398

Genre: Research Report (1.00)

Industry:

Leisure & Entertainment (0.93)
Media > Film (0.68)
Information Technology > Services (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(3 more...)

arXiv.org Machine LearningFeb-17-2025

Learning Surrogate Potential Mean Field Games via Gaussian Processes: A Data-Driven Approach to Ill-Posed Inverse Problems

Zhang, Jingguo, Yang, Xianjin, Mou, Chenchen, Zhou, Chao

Mean field games (MFGs) describe the collective behavior of large populations of interacting agents. In this work, we tackle ill-posed inverse problems in potential MFGs, aiming to recover the agents' population, momentum, and environmental setup from limited, noisy measurements and partial observations. These problems are ill-posed because multiple MFG configurations can explain the same data, or different parameters can yield nearly identical observations. Nonetheless, they remain crucial in practice for real-world scenarios where data are inherently sparse or noisy, or where the MFG structure is not fully determined. Our focus is on finding surrogate MFGs that accurately reproduce the observed data despite these challenges. We propose two Gaussian process (GP)-based frameworks: an inf-sup formulation and a bilevel approach. The choice between them depends on whether the unknown parameters introduce concavity in the objective. In the inf-sup framework, we use the linearity of GPs and their parameterization structure to maintain convex-concave properties, allowing us to apply standard convex optimization algorithms. In the bilevel framework, we employ a gradient-descent-based algorithm and introduce two methods for computing the outer gradient. The first method leverages an existing solver for the inner potential MFG and applies automatic differentiation, while the second adopts an adjoint-based strategy that computes the outer gradient independently of the inner solver. Our numerical experiments show that when sufficient prior information is available, the unknown parameters can be accurately recovered. Otherwise, if prior information is limited, the inverse problem is ill-posed, but our frameworks can still produce surrogate MFG models that closely match observed data.

artificial intelligence, inverse problem, machine learning, (17 more...)

arXiv.org Machine Learning

2502.11506

Country:

Asia (0.67)
North America > United States (0.46)

Genre: Research Report (1.00)

Industry: Banking & Finance (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Gouverneur, Amaury, Oechtering, Tobias J., Skoglund, Mikael

Refined PAC-Bayes Bounds for Offline Bandits

arXiv.org Machine LearningFeb-17-2025

In this paper, we present refined probabilistic bounds on empirical reward estimates for off-policy learning in bandit problems. We build on the PAC-Bayesian bounds from Seldin et al. (2010) and improve on their results using a new parameter optimization approach introduced by Rodr\'iguez et al. (2024). This technique is based on a discretization of the space of possible events to optimize the "in probability" parameter. We provide two parameter-free PAC-Bayes bounds, one based on Hoeffding-Azuma's inequality and the other based on Bernstein's inequality. We prove that our bounds are almost optimal as they recover the same rate as would be obtained by setting the "in probability" parameter after the realization of the data.

artificial intelligence, data mining, machine learning, (19 more...)

arXiv.org Machine Learning

2502.11953

Country: Europe > United Kingdom (0.28)

Genre: Research Report (0.83)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.48)
Information Technology > Data Science > Data Mining > Big Data (0.36)

Kim, Sanghee, Basu, Sumanta

A Pathwise Coordinate Descent Algorithm for LASSO Penalized Quantile Regression

arXiv.org Machine LearningFeb-17-2025

$\ell_1$ penalized quantile regression is used in many fields as an alternative to penalized least squares regressions for high-dimensional data analysis. Existing algorithms for penalized quantile regression either use linear programming, which does not scale well in high dimension, or an approximate coordinate descent (CD) which does not solve for exact coordinatewise minimum of the nonsmooth loss function. Further, neither approaches build fast, pathwise algorithms commonly used in high-dimensional statistics to leverage sparsity structure of the problem in large-scale data sets. To avoid the computational challenges associated with the nonsmooth quantile loss, some recent works have even advocated using smooth approximations to the exact problem. In this work, we develop a fast, pathwise coordinate descent algorithm to compute exact $\ell_1$ penalized quantile regression estimates for high-dimensional data. We derive an easy-to-compute exact solution for the coordinatewise nonsmooth loss minimization, which, to the best of our knowledge, has not been reported in the literature. We also employ a random perturbation strategy to help the algorithm avoid getting stuck along the regularization path. In simulated data sets, we show that our algorithm runs substantially faster than existing alternatives based on approximate CD and linear program, while retaining the same level of estimation accuracy.

artificial intelligence, machine learning, survey article, (20 more...)

arXiv.org Machine Learning

2502.12363

Genre:

Overview (0.48)
Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science (0.88)

Beyond Any-Shot Adaptation: Predicting Optimization Outcome for Robustness Gains without Extra Pay

Wang, Qi Cheems, Xiao, Zehao, Mao, Yixiu, Qu, Yun, Shen, Jiayi, Lv, Yiqin, Ji, Xiangyang

The foundation model enables general-purpose problem-solving and enjoys desirable rapid adaptation due to its adopted cross-task generalization paradigms, e.g., pretraining, meta-training, and finetuning. Recent advances in these paradigms show the crucial role of challenging tasks' prioritized sampling in enhancing adaptation robustness. However, ranking task difficulties exhausts massive task queries to evaluate, thus computation and annotation intensive, which is typically unaffordable in practice. This work underscores the criticality of both adaptation robustness and learning efficiency, especially in scenarios where tasks are risky or costly to evaluate, e.g., policy evaluations in Markov decision processes (MDPs) or inference with large models. To this end, we present Model Predictive Task Sampling (MPTS) to establish connections between the task space and adaptation risk landscape to form a theoretical guideline in robust active task sampling. MPTS characterizes the task episodic information with a generative model and directly predicts task-specific adaptation risk values from posterior inference. The developed risk learner can amortize expensive evaluation and provably approximately rank task difficulties in the pursuit of task robust adaptation. MPTS can be seamlessly integrated into zero-shot, few-shot, and many-shot learning paradigms. Extensive experimental results are conducted to exhibit the superiority of the proposed framework, remarkably increasing task adaptation robustness and retaining learning efficiency in contrast to existing state-of-the-art (SOTA) methods. The code is available at the project site https://github.com/thu-rllab/MPTS.

large language model, learner, machine learning, (21 more...)

2501.11039

Country: Asia > China (0.28)

Genre: Research Report > New Finding (0.67)

Industry:

Health & Medicine (0.46)
Education (0.45)
Energy > Oil & Gas (0.45)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
(6 more...)

Practical Topics in Optimization

Lu, Jun

In an era where data-driven decision-making and computational efficiency are paramount, optimization plays a foundational role in advancing fields such as mathematics, computer science, operations research, machine learning, and beyond. From refining machine learning models to improving resource allocation and designing efficient algorithms, optimization techniques serve as essential tools for tackling complex problems. This book aims to provide both an introductory guide and a comprehensive reference, equipping readers with the necessary knowledge to understand and apply optimization methods within their respective fields. Our primary goal is to demystify the inner workings of optimization algorithms, including black-box and stochastic optimizers, by offering both formal and intuitive explanations. Starting from fundamental mathematical principles, we derive key results to ensure that readers not only learn how these techniques work but also understand when and why to apply them effectively. By striking a careful balance between theoretical depth and practical application, this book serves a broad audience, from students and researchers to practitioners seeking robust optimization strategies.

generalized conditional gradient method, machine learning, natural language, (23 more...)

2503.05882

Country:

North America > United States (0.45)
Europe (0.13)
Asia (0.13)
North America > Canada > Ontario > Toronto (0.13)

Genre:

Workflow (1.00)
Research Report > New Finding (0.92)
Summary/Review (0.87)

Industry:

Health & Medicine (1.00)
Energy > Oil & Gas (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(2 more...)

JExplore: Design Space Exploration Tool for Nvidia Jetson Boards

Kutukcu, Basar, Xie, Sinan, Baidya, Sabur, Dey, Sujit

Nvidia Jetson boards are powerful systems for executing artificial intelligence workloads in edge and mobile environments due to their effective GPU hardware and widely supported software stack. In addition to these benefits, Nvidia Jetson boards provide large configurability by giving the user the choice to modify many hardware parameters. This large space of configurability creates the need of searching the optimal configurations based on the user's requirements. In this work, we propose JExplore, a multi-board software and hardware design space exploration tool. JExplore can be integrated with any search tool, hence creating a common benchmarking ground for the search algorithms. Moreover, it accelerates the exploration of user application and Nvidia Jetson configurations for researchers and engineers by encapsulating host-client communication, configuration management, and metric measurement.

jexplore, search algorithm, workload, (13 more...)

2502.15773

Country:

North America > United States > Kentucky > Jefferson County > Louisville (0.04)
North America > United States > Hawaii > Honolulu County > Honolulu (0.04)
North America > United States > Georgia > Fulton County > Atlanta (0.04)
(4 more...)

Genre: Research Report (0.40)

Industry: Information Technology > Hardware (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.74)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (0.47)

OptMATH: A Scalable Bidirectional Data Synthesis Framework for Optimization Modeling

Lu, Hongliang, Xie, Zhonglin, Wu, Yaoyu, Ren, Can, Chen, Yuxuan, Wen, Zaiwen

Despite the rapid development of large language models (LLMs), a fundamental challenge persists: the lack of high-quality optimization modeling datasets hampers LLMs' robust modeling of practical optimization problems from natural language descriptions (NL). This data scarcity also contributes to the generalization difficulties experienced by learning-based methods. To address these challenges, we propose a scalable framework for synthesizing a high-quality dataset, named OptMATH. Starting from curated seed data with mathematical formulations (MF), this framework automatically generates problem data (PD) with controllable complexity. Then, a back-translation step is employed to obtain NL. To verify the correspondence between the NL and the PD, a forward modeling step followed by rejection sampling is used. The accepted pairs constitute the training part of OptMATH. Then a collection of rejected pairs is identified and further filtered. This collection serves as a new benchmark for optimization modeling, containing difficult instances whose lengths are much longer than these of NL4OPT and MAMO. Through extensive experiments, we demonstrate that models of various sizes (0.5B-32B parameters) trained on OptMATH achieve superior results on multiple modeling benchmarks, thereby validating the effectiveness and scalability of our approach.

large language model, machine learning, natural language, (19 more...)

2502.11102

Country:

North America > United States (0.28)
Asia (0.28)

Genre: Research Report > New Finding (0.46)

Industry: Transportation (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)