AITopics

2502.01048

Country:

Europe > Italy > Marche > Ancona Province > Ancona (0.04)
Europe > Finland > Uusimaa > Helsinki (0.04)
South America > Argentina > Patagonia > Tierra del Fuego Province > Ushuaia (0.04)
(5 more...)

Genre:

Workflow (1.00)
Research Report > Promising Solution (1.00)
Research Report > New Finding (1.00)
(3 more...)

Industry:

Information Technology > Security & Privacy (1.00)
Government > Regional Government (1.00)
Health & Medicine > Therapeutic Area > Neurology (0.92)
(3 more...)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
(8 more...)

Taitler, Boaz, Ben-Porat, Omer

Selective Response Strategies for GenAI

arXiv.org Artificial Intelligence, Feb-2-2025

The rise of Generative AI (GenAI) has significantly impacted human-based forums like Stack Overflow, which are essential for generating high-quality data. This creates a negative feedback loop, hindering the development of GenAI systems, which rely on such data to provide accurate responses. In this paper, we provide a possible remedy: a novel strategy we call selective response. Selective response implies that GenAI could strategically provide inaccurate (or conservative) responses to queries involving emerging topics and novel technologies, thereby driving users to use human-based forums like Stack Overflow. We show that selective response can potentially have a compounding effect on the data generation process, increasing both GenAI's revenue and user welfare in the long term. From an algorithmic perspective, we propose an approximately optimal approach to maximize GenAI's revenue under social welfare constraints. From a regulatory perspective, we derive sufficient and necessary conditions for selective response to improve welfare.

artificial intelligence, machine learning, natural language, (19 more...)

2502.00729

Country:

Asia > Middle East > Israel (0.04)
North America > United States > Texas (0.04)
North America > United States > California > San Francisco County > San Francisco (0.04)
Asia > Middle East > Jordan (0.04)

Genre:

Research Report (1.00)
Overview > Innovation (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)
Information Technology > Artificial Intelligence > Natural Language > Generation (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.35)

arXiv.org Machine Learning, Feb-2-2025

HASSLE-free: A unified Framework for Sparse plus Low-Rank Matrix Decomposition for LLMs

Makni, Mehdi, Behdin, Kayhan, Xu, Zheng, Ponomareva, Natalia, Mazumder, Rahul

The impressive capabilities of large foundation models come at a cost of substantial computing resources to serve them. Compressing these pre-trained models is of practical interest as it can democratize deploying them to the machine learning community at large by lowering the costs associated with inference. A promising compression scheme is to decompose foundation models' dense weights into a sum of sparse plus low-rank matrices. In this paper, we design a unified framework coined HASSLE-free for (semi-structured) sparse plus low-rank matrix decomposition of foundation models. Our framework introduces the local layer-wise reconstruction error objective for this decomposition; we demonstrate that prior work solves a relaxation of this optimization problem, and we provide efficient and scalable methods to minimize the exact introduced optimization problem. HASSLE-free substantially outperforms state-of-the-art methods in terms of the introduced objective and a wide range of LLM evaluation benchmarks. For the Llama3-8B model with a 2:4 sparsity component plus a 64-rank component decomposition, a compression scheme for which recent work shows important inference acceleration on GPUs, HASSLE-free reduces the test perplexity by 12% for the WikiText-2 dataset and reduces the gap (compared to the dense model) of the average of eight popular zero-shot tasks by 15% compared to existing methods.
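To make the "2:4 sparse plus low-rank" structure in this abstract concrete, here is a minimal sketch in plain Python. It is not the HASSLE-free algorithm and it does not use the layer-wise reconstruction objective; it is a generic alternating heuristic on the plain Frobenius error, with a rank-1 component found by power iteration. All function names (`project_2_4`, `rank1_approx`, `sparse_plus_low_rank`) are hypothetical and chosen for illustration.

```python
import math

def project_2_4(row):
    """Keep the 2 largest-magnitude entries in each group of 4; zero the rest."""
    out = [0.0] * len(row)
    for start in range(0, len(row), 4):
        group = range(start, min(start + 4, len(row)))
        keep = sorted(group, key=lambda j: abs(row[j]), reverse=True)[:2]
        for j in keep:
            out[j] = row[j]
    return out

def sub(A, B):
    """Elementwise difference A - B of two equally sized matrices."""
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def frob_sq(A):
    """Squared Frobenius norm."""
    return sum(x * x for r in A for x in r)

def rank1_approx(M, iters=50):
    """Approximate rank-1 fit u v^T of M via power iteration on M^T M."""
    n_rows, n_cols = len(M), len(M[0])
    v = [1.0 / math.sqrt(n_cols)] * n_cols
    for _ in range(iters):
        u = [sum(M[i][j] * v[j] for j in range(n_cols)) for i in range(n_rows)]
        w = [sum(M[i][j] * u[i] for i in range(n_rows)) for j in range(n_cols)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # nothing left to fit along this direction
            return [[0.0] * n_cols for _ in range(n_rows)]
        v = [x / norm for x in w]
    u = [sum(M[i][j] * v[j] for j in range(n_cols)) for i in range(n_rows)]
    return [[u[i] * v[j] for j in range(n_cols)] for i in range(n_rows)]

def sparse_plus_low_rank(W, rounds=10):
    """Alternate: S <- 2:4 projection of (W - L), L <- rank-1 fit of (W - S)."""
    L = [[0.0] * len(W[0]) for _ in W]
    S = [project_2_4(r) for r in W]
    for _ in range(rounds):
        S = [project_2_4(r) for r in sub(W, L)]
        L = rank1_approx(sub(W, S))
    return S, L
```

On a toy matrix, adding even a rank-1 term to the pruned weights cannot increase the Frobenius error relative to 2:4 pruning alone, which is the intuition behind pairing the two components.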

large language model, machine learning, natural language, (19 more...)

2502.00899

Country: North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Genre: Research Report (0.84)

Industry: Government (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.91)

Ohashi, Atsumoto, Higashinaka, Ryuichiro

Universal Post-Processing Networks for Joint Optimization of Modules in Task-Oriented Dialogue Systems

arXiv.org Artificial Intelligence, Feb-2-2025

Post-processing networks (PPNs) are components that modify the outputs of arbitrary modules in task-oriented dialogue systems and are optimized using reinforcement learning (RL) to improve the overall task completion capability of the system. However, previous PPN-based approaches have been limited to handling only a subset of modules within a system, which poses a significant limitation in improving the system performance. In this study, we propose a joint optimization method for post-processing the outputs of all modules using universal post-processing networks (UniPPNs), which are language-model-based networks that can modify the outputs of arbitrary modules in a system as a sequence-transformation task. Moreover, our RL algorithm, which employs a module-level Markov decision process, enables fine-grained value and advantage estimation for each module, thereby stabilizing joint learning for post-processing the outputs of all modules. Through both simulation-based and human evaluation experiments using the MultiWOZ dataset, we demonstrated that UniPPN outperforms conventional PPNs in the task completion capability of task-oriented dialogue systems.

large language model, machine learning, reinforcement learning, (19 more...)

2502.00747

Country:

North America > United States (0.04)
Asia > Middle East > Jordan (0.04)
Asia > Japan (0.04)
Asia > China (0.04)

Genre: Research Report > New Finding (1.00)

Industry: Consumer Products & Services > Restaurants (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.88)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.69)
(2 more...)

arXiv.org Machine Learning, Feb-2-2025

Orlicz-Sobolev Transport for Unbalanced Measures on a Graph

Le, Tam, Nguyen, Truyen, Hino, Hideitsu, Fukumizu, Kenji

Moving beyond $L^p$ geometric structure, Orlicz-Wasserstein (OW) leverages a specific class of convex functions for Orlicz geometric structure. While OW remarkably helps to advance certain machine learning approaches, it has a high computational complexity due to its two-level optimization formula. Recently, Le et al. (2024) exploited graph structure to propose generalized Sobolev transport (GST), i.e., a scalable variant of OW. However, GST assumes that input measures have the same mass. Unlike optimal transport (OT), it is nontrivial to incorporate a mass constraint to extend GST for measures on a graph, possibly having different total mass. In this work, we propose to take a step back by considering the entropy partial transport (EPT) for nonnegative measures on a graph. By leveraging Caffarelli & McCann (2010)'s observations, EPT can be reformulated as a standard complete OT between two corresponding balanced measures. Consequently, we develop a novel EPT with Orlicz geometric structure, namely Orlicz-EPT, for unbalanced measures on a graph. In particular, by exploiting the dual EPT formulation and geometric structures of the graph-based Orlicz-Sobolev space, we derive a novel regularization to propose Orlicz-Sobolev transport (OST). The resulting distance can be efficiently computed by simply solving a univariate optimization problem, unlike the computationally expensive two-level optimization problem for Orlicz-EPT. Additionally, we derive geometric structures for the OST and draw its relations to other transport distances. We empirically show that OST is several orders of magnitude faster than Orlicz-EPT. We further illustrate preliminary evidence on the advantages of OST for document classification and several tasks in topological data analysis.

artificial intelligence, machine learning, orlicz-ept, (19 more...)

2502.00739

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

Maranjyan, Artavazd, Saad, El Mehdi, Richtárik, Peter, Orabona, Francesco

ATA: Adaptive Task Allocation for Efficient Resource Management in Distributed Machine Learning

arXiv.org Machine Learning, Feb-2-2025

Asynchronous methods are fundamental for parallelizing computations in distributed machine learning. They aim to accelerate training by fully utilizing all available resources. However, their greedy approach can lead to inefficiencies, using more computation than required, especially when computation times vary across devices. If the computation times were known in advance, training could be fast and resource-efficient by assigning more tasks to faster workers. The challenge lies in achieving this optimal allocation without prior knowledge of the computation time distributions. In this paper, we propose ATA (Adaptive Task Allocation), a method that adapts to heterogeneous and random distributions of worker computation times. Through rigorous theoretical analysis, we show that ATA identifies the optimal task allocation and performs comparably to methods with prior knowledge of computation times. Experimental results further demonstrate that ATA is resource-efficient, significantly reducing costs compared to the greedy approach, which can be arbitrarily expensive depending on the number of workers.

artificial intelligence, machine learning, optimization problem, (15 more...)

2502.00775

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)

Kulmer, Dominik, Leitenstern, Maximilian, Weinmann, Marcel, Lienkamp, Markus

OpenLiDARMap: Zero-Drift Point Cloud Mapping using Map Priors

arXiv.org Artificial Intelligence, Feb-1-2025

Accurate localization is a critical component of mobile autonomous systems, especially in Global Navigation Satellite Systems (GNSS)-denied environments where traditional methods fail. In such scenarios, environmental sensing is essential for reliable operation. However, approaches such as LiDAR odometry and Simultaneous Localization and Mapping (SLAM) suffer from drift over long distances, especially in the absence of loop closures. Map-based localization offers a robust alternative, but the challenge lies in creating and georeferencing maps without GNSS support. To address this issue, we propose a method for creating georeferenced maps without GNSS by using publicly available data, such as building footprints and surface models derived from sparse aerial scans. Our approach integrates these data with onboard LiDAR scans to produce dense, accurate, georeferenced 3D point cloud maps. By combining an Iterative Closest Point (ICP) scan-to-scan and scan-to-map matching strategy, we achieve high local consistency without suffering from long-term drift. Thus, we eliminate the reliance on GNSS for the creation of georeferenced maps. The results demonstrate that LiDAR-only mapping can produce accurate georeferenced point cloud maps when augmented with existing map priors.

artificial intelligence, dataset, optimization problem, (15 more...)

2501.11111

Country:

Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
Europe > Germany > Baden-Württemberg > Karlsruhe Region > Karlsruhe (0.04)
North America > United States > Michigan (0.04)
Asia > South Korea (0.04)

Genre: Research Report (1.00)

Industry: Transportation > Ground > Road (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Vision (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)

arXiv.org Artificial Intelligence, Feb-1-2025

Addressing Delayed Feedback in Conversion Rate Prediction via Influence Functions

Ding, Chenlu, Wu, Jiancan, Yuan, Yancheng, Fang, Junfeng, Li, Cunchun, Wang, Xiang, He, Xiangnan

In the realm of online digital advertising, conversion rate (CVR) prediction plays a pivotal role in maximizing revenue under cost-per-conversion (CPA) models, where advertisers are charged only when users complete specific actions, such as making a purchase. A major challenge in CVR prediction lies in the delayed feedback problem: conversions may occur hours or even weeks after initial user interactions. This delay complicates model training, as recent data may be incomplete, leading to biases and diminished performance. Although existing methods attempt to address this issue, they often fall short in adapting to evolving user behaviors and depend on auxiliary models, which introduces computational inefficiencies and the risk of model inconsistency. In this work, we propose an Influence Function-empowered framework for Delayed Feedback Modeling (IF-DFM). IF-DFM leverages influence functions to estimate how newly acquired and delayed conversion data impact model parameters, enabling efficient parameter updates without the need for full retraining. Additionally, we present a scalable algorithm that efficiently computes parameter updates by reframing the inverse Hessian-vector product as an optimization problem, striking a balance between computational efficiency and effectiveness. Extensive experiments on benchmark datasets demonstrate that IF-DFM consistently surpasses state-of-the-art methods, significantly enhancing both prediction accuracy and model adaptability.
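The phrase "reframing the inverse Hessian-vector product as an optimization problem" can be illustrated with a standard identity: for a symmetric positive definite matrix H, the vector H^{-1} v is the unique minimizer of 0.5 x^T H x - v^T x. The sketch below minimizes that quadratic with plain gradient descent, using only Hessian-vector products. Whether IF-DFM uses this exact formulation is an assumption; the abstract does not say. The names `hvp`, `inverse_hvp`, `lr`, and `steps` are hypothetical.

```python
def hvp(H, x):
    """Hessian-vector product for an explicit matrix H (a stand-in for autodiff)."""
    return [sum(h * xj for h, xj in zip(row, x)) for row in H]

def inverse_hvp(hvp_fn, v, lr, steps):
    """Approximate H^{-1} v by gradient descent on 0.5 x^T H x - v^T x.

    Assumes H is symmetric positive definite and lr < 2 / (largest eigenvalue).
    H is never inverted or even materialized; only hvp_fn is called.
    """
    x = [0.0] * len(v)
    for _ in range(steps):
        # Gradient of the quadratic objective is H x - v.
        grad = [hx - vi for hx, vi in zip(hvp_fn(x), v)]
        x = [xi - lr * g for xi, g in zip(x, grad)]
    return x

# Toy usage: H = [[3, 1], [1, 2]], v = [5, 5], so H^{-1} v = [1, 2].
H = [[3.0, 1.0], [1.0, 2.0]]
x_hat = inverse_hvp(lambda x: hvp(H, x), [5.0, 5.0], lr=0.2, steps=200)
```

In practice the explicit matrix would be replaced by an automatic-differentiation Hessian-vector product, and gradient descent by a faster solver such as conjugate gradient; the point here is only that no matrix inverse is formed.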

artificial intelligence, machine learning, optimization problem, (14 more...)

2502.01669

Country:

Asia > China > Jiangsu Province > Yancheng (0.05)
Asia > China > Anhui Province > Hefei (0.05)
Asia > Singapore (0.04)
Asia > China > Hong Kong (0.04)

Genre: Research Report (1.00)

Industry: Marketing (0.48)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.47)

arXiv.org Machine Learning, Feb-1-2025

Distributed Primal-Dual Algorithms: Unification, Connections, and Insights

Wu, Runxiong, Liu, Dong, Wang, Xueqin, Wang, Andi

We study primal-dual algorithms for general empirical risk minimization problems in distributed settings, focusing on two prominent classes of algorithms. The first class is the communication-efficient distributed dual coordinate ascent (CoCoA), derived from the coordinate ascent method for solving the dual problem. The second class is the alternating direction method of multipliers (ADMM), including consensus ADMM, linearized ADMM, and proximal ADMM. We demonstrate that both classes of algorithms can be transformed into a unified update form that involves only primal and dual variables. This discovery reveals key connections between the two classes of algorithms: CoCoA can be interpreted as a special case of proximal ADMM for solving the dual problem, while consensus ADMM is closely related to a proximal ADMM algorithm. This discovery provides the insight that by adjusting the augmented Lagrangian parameter, we can easily enable the ADMM variants to outperform the CoCoA variants. We further explore linearized versions of ADMM and analyze the effects of tuning parameters on these ADMM variants in the distributed setting. Our theoretical findings are supported by extensive simulation studies and real-world data analysis.
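For readers unfamiliar with consensus ADMM, a minimal sketch on a toy problem may help. Each of several workers holds one number a_i and the goal is to minimize the sum of 0.5 (x - a_i)^2, whose solution is the mean of the a_i. The updates below are the standard scaled-form consensus ADMM iterations; `rho` plays the role of the augmented Lagrangian parameter mentioned in the abstract. The toy problem and the function name `consensus_admm` are assumptions for illustration and are not taken from the paper.

```python
def consensus_admm(a, rho=1.0, iters=100):
    """Scaled-form consensus ADMM for min_x sum_i 0.5 * (x - a_i)^2.

    Each worker i keeps a local copy x_i and a scaled dual variable u_i;
    z is the shared consensus variable.
    """
    n = len(a)
    x = [0.0] * n
    u = [0.0] * n
    z = 0.0
    for _ in range(iters):
        # Local step: closed-form minimizer of
        # 0.5 * (x_i - a_i)^2 + (rho / 2) * (x_i - z + u_i)^2.
        x = [(ai + rho * (z - ui)) / (1.0 + rho) for ai, ui in zip(a, u)]
        # Consensus step: average of the local copies plus scaled duals.
        z = sum(xi + ui for xi, ui in zip(x, u)) / n
        # Dual step: accumulate each worker's disagreement with the consensus.
        u = [ui + xi - z for ui, xi in zip(u, x)]
    return z, x

z, x = consensus_admm([1.0, 2.0, 6.0])
```

With `rho=1.0` the consensus value on this toy problem halves its distance to the mean (3.0) at every iteration; changing `rho` changes that rate, which is a small-scale view of why tuning the augmented Lagrangian parameter matters.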

algorithm, artificial intelligence, machine learning, (18 more...)

2502.0047

Country:

North America > United States > Wisconsin > Dane County > Madison (0.14)
Asia > Middle East > Jordan (0.04)
Asia > China > Anhui Province > Hefei (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)

Bayraktar, Erhan, Ekren, Ibrahim, Zhou, Hongyi

Uniform-in-time weak propagation of chaos for consensus-based optimization

arXiv.org Artificial Intelligence, Feb-1-2025

We study the uniform-in-time weak propagation of chaos for the consensus-based optimization (CBO) method on a bounded searching domain. We apply the methodology for studying long-time behaviors of interacting particle systems developed in the work of Delarue and Tse (ArXiv:2104.14973). Our work shows that the weak error has order $O(N^{-1})$ uniformly in time, where $N$ denotes the number of particles. The main strategy behind the proofs is the decomposition of the weak errors using the linearized Fokker-Planck equations and the exponential decay of their Sobolev norms. Consequently, our result leads to the joint convergence of the empirical distribution of the CBO particle system to the Dirac-delta distribution at the global minimizer in population size and running time in Wasserstein-type metrics.
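As background for the CBO particle system analyzed here, the following is a minimal one-dimensional sketch of the commonly used dynamics: particles drift toward a weighted average (the consensus point) and receive noise proportional to their distance from it, discretized with an Euler-Maruyama step. Clipping to an interval is a simple stand-in for the bounded searching domain and is an assumption; the paper's boundary treatment may differ. All parameter names and default values are hypothetical.

```python
import math
import random

def consensus_point(xs, f, alpha):
    """Weighted mean of particles with weights exp(-alpha * f(x)).

    The minimum of f over the particles is subtracted before exponentiating
    for numerical stability; this does not change the weighted mean.
    """
    vals = [f(x) for x in xs]
    fmin = min(vals)
    ws = [math.exp(-alpha * (v - fmin)) for v in vals]
    return sum(w * x for w, x in zip(ws, xs)) / sum(ws)

def cbo_step(xs, f, rng, alpha=30.0, lam=1.0, sigma=0.5, dt=0.1,
             lo=-3.0, hi=3.0):
    """One Euler-Maruyama step of dX = -lam (X - m) dt + sigma |X - m| dW."""
    m = consensus_point(xs, f, alpha)
    new_xs = []
    for x in xs:
        noise = sigma * abs(x - m) * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        x_new = x - lam * (x - m) * dt + noise
        new_xs.append(min(hi, max(lo, x_new)))  # keep particles in [lo, hi]
    return new_xs

def run_cbo(f, n_particles=50, steps=200, seed=0, **kwargs):
    """Run CBO from uniformly sampled particles in [-3, 3]; return final particles."""
    rng = random.Random(seed)
    xs = [rng.uniform(-3.0, 3.0) for _ in range(n_particles)]
    for _ in range(steps):
        xs = cbo_step(xs, f, rng, **kwargs)
    return xs

final = run_cbo(lambda x: (x - 1.0) ** 2)
```

The number of particles `n_particles` corresponds to the $N$ in the abstract's $O(N^{-1})$ weak error bound; the sketch makes no attempt to verify that rate.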

artificial intelligence, evolutionary algorithm, machine learning, (16 more...)

2502.00582

Country:

North America > United States > Michigan > Washtenaw County > Ann Arbor (0.14)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > New Jersey > Mercer County > Princeton (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Genre: Research Report (0.70)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)