AITopics

2411.09517

Genre: Research Report > New Finding (0.67)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.67)

arXiv.org Artificial IntelligenceMay-31-2024

Position Coupling: Leveraging Task Structure for Improved Length Generalization of Transformers

Cho, Hanseul, Cha, Jaeyoung, Awasthi, Pranjal, Bhojanapalli, Srinadh, Gupta, Anupam, Yun, Chulhee

Even for simple arithmetic tasks like integer addition, it is challenging for Transformers to generalize to longer sequences than those encountered during training. To tackle this problem, we propose position coupling, a simple yet effective method that directly embeds the structure of the tasks into the positional encoding of a (decoder-only) Transformer. Taking a departure from the vanilla absolute position mechanism assigning unique position IDs to each of the tokens, we assign the same position IDs to two or more "relevant" tokens; for integer addition tasks, we regard digits of the same significance as in the same position. On the empirical side, we show that with the proposed position coupling, a small (1-layer) Transformer trained on 1 to 30-digit additions can generalize up to 200-digit additions (6.67x of the trained length). On the theoretical side, we prove that a 1-layer Transformer with coupled positions can solve the addition task involving exponentially many digits, whereas any 1-layer Transformer without positional information cannot entirely solve it. We also demonstrate that position coupling can be applied to other algorithmic tasks such as addition with multiple summands, Nx2 multiplication, copy/reverse, and a two-dimensional task.

artificial intelligence, machine learning, natural language, (14 more...)

2405.20671

Genre: Research Report > New Finding (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Artificial IntelligenceMar-18-2024

MAC Advice for Facility Location Mechanism Design

Barak, Zohar, Gupta, Anupam, Talgam-Cohen, Inbal

Algorithms with predictions have attracted much attention in the last years across various domains, including variants of facility location, as a way to surpass traditional worst-case analyses. We study the $k$-facility location mechanism design problem, where the $n$ agents are strategic and might misreport their location. Unlike previous models, where predictions are for the $k$ optimal facility locations, we receive $n$ predictions for the locations of each of the agents. However, these predictions are only "mostly" and "approximately" correct (or MAC for short) -- i.e., some $\delta$-fraction of the predicted locations are allowed to be arbitrarily incorrect, and the remainder of the predictions are allowed to be correct up to an $\varepsilon$-error. We make no assumption on the independence of the errors. Can such predictions allow us to beat the current best bounds for strategyproof facility location? We show that the $1$-median (geometric median) of a set of points is naturally robust under corruptions, which leads to an algorithm for single-facility location with MAC predictions. We extend the robustness result to a "balanced" variant of the $k$ facilities case. Without balancedness, we show that robustness completely breaks down, even for the setting of $k=2$ facilities on a line. For this "unbalanced" setting, we devise a truthful random mechanism that outperforms the best known result of Lu et al. [2010], which does not use predictions. En route, we introduce the problem of "second" facility location (when the first facility's location is already fixed). Our findings on the robustness of the $1$-median and more generally $k$-medians may be of independent interest, as quantitative versions of classic breakdown-point results in robust statistics.

artificial intelligence, machine learning, prediction, (19 more...)

2403.12181

Country:

Asia > Middle East > Israel (0.14)
North America > United States > Massachusetts (0.14)

Genre: Research Report > New Finding (0.66)

Industry: Information Technology (0.45)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Model-Based Reasoning (0.62)

arXiv.org Machine LearningOct-1-2023

Improving Length-Generalization in Transformers via Task Hinting

Awasthi, Pranjal, Gupta, Anupam

It has been observed in recent years that transformers have problems with length generalization for certain types of reasoning and arithmetic tasks. In particular, the performance of a transformer model trained on tasks (say addition) up to a certain length (e.g., 5 digit numbers) drops sharply when applied to longer instances of the same problem. This work proposes an approach based on task hinting towards addressing length generalization. Our key idea is that while training the model on task-specific data, it is helpful to simultaneously train the model to solve a simpler but related auxiliary task as well. We study the classical sorting problem as a canonical example to evaluate our approach. We design a multitask training framework and show that task hinting significantly improve length generalization. For sorting we show that it is possible to train models on data consisting of sequences having length at most $20$, and improve the test accuracy on sequences of length $100$ from less than 1% (for standard training) to more than 92% (via task hinting). Our study uncovers several interesting aspects of length generalization. We observe that while several auxiliary tasks may seem natural a priori, their effectiveness in improving length generalization differs dramatically. We further use probing and visualization-based techniques to understand the internal mechanisms via which the model performs the task, and propose a theoretical construction consistent with the observed learning behaviors of the model. Based on our construction, we show that introducing a small number of length dependent parameters into the training procedure can further boost the performance on unseen lengths. Finally, we also show the efficacy of our task hinting based approach beyond sorting, giving hope that these techniques will be applicable in broader contexts.

artificial intelligence, machine learning, natural language, (16 more...)

2310.00726

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

arXiv.org Artificial IntelligenceNov-9-2022

Optimal shepherding and transport of a flock

Ranganathan, Aditya, Heyde, Alexander, Gupta, Anupam, Mahadevan, L.

We investigate how a shepherd should move in order to effectively herd and guide a flock of agents towards a target. Using a detailed agent-based model (ABM) for the members of the flock, we pose and solve an optimization problem for the shepherd that has to simultaneously work to keep the flock cohesive while coercing it towards a prescribed project. We find that three distinct strategies emerge as potential solutions as a function of just two parameters: the ratio of herd size to shepherd repulsion length and the ratio of herd speed to shepherd speed. We term these as: (i) mustering, in which the shepherd circles the herd to ensure compactness, (ii) droving, in which the shepherd chases the herd in a desired direction, and (iii) driving, a hitherto unreported strategy where the flock surrounds a shepherd that drives it from within. A minimal dynamical model for the size, shape and position of the herd captures the effective behavior of the ABM, and further allows us to characterize the different herding strategies in terms of the behavior of the shepherd that librates (mustering), oscillates (droving) or moves steadily (driving). All together, our study yields a simple and intuitive classification of herding strategies that ought to be of general interest in the context of controlling the collective behavior of active matter.

artificial intelligence, machine learning, shepherd, (19 more...)

2211.04352

Genre: Research Report (0.84)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

arXiv.org Machine LearningMar-28-2019

Better Algorithms for Stochastic Bandits with Adversarial Corruptions

Gupta, Anupam, Koren, Tomer, Talwar, Kunal

We study the stochastic multi-armed bandits problem in the presence of adversarial corruption. We present a new algorithm for this problem whose regret is nearly optimal, substantially improving upon previous work. Our algorithm is agnostic to the level of adversarial contamination and can tolerate a significant amount of corruption with virtually no degradation in performance.

algorithm, artificial intelligence, big data, (20 more...)

1902.08647

Country: North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)

Genre: Research Report (0.50)

Technology:

Information Technology > Data Science > Data Mining > Big Data (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

arXiv.org Machine LearningJun-4-2017

Nearly Optimal Sampling Algorithms for Combinatorial Pure Exploration

Chen, Lijie, Gupta, Anupam, Li, Jian, Qiao, Mingda, Wang, Ruosong

We study the combinatorial pure exploration problem Best-Set in stochastic multi-armed bandits. In a Best-Set instance, we are given $n$ arms with unknown reward distributions, as well as a family $\mathcal{F}$ of feasible subsets over the arms. Our goal is to identify the feasible subset in $\mathcal{F}$ with the maximum total mean using as few samples as possible. The problem generalizes the classical best arm identification problem and the top-$k$ arm identification problem, both of which have attracted significant attention in recent years. We provide a novel instance-wise lower bound for the sample complexity of the problem, as well as a nontrivial sampling algorithm, matching the lower bound up to a factor of $\ln|\mathcal{F}|$. For an important class of combinatorial families, we also provide polynomial time implementation of the sampling algorithm, using the equivalence of separation and optimization for convex program, and approximate Pareto curves in multi-objective optimization. We also show that the $\ln|\mathcal{F}|$ factor is inevitable in general through a nontrivial lower bound construction. Our results significantly improve several previous results for several important combinatorial constraints, and provide a tighter understanding of the general Best-Set problem. We further introduce an even more general problem, formulated in geometric terms. We are given $n$ Gaussian arms with unknown means and unit variance. Consider the $n$-dimensional Euclidean space $\mathbb{R}^n$, and a collection $\mathcal{O}$ of disjoint subsets. Our goal is to determine the subset in $\mathcal{O}$ that contains the $n$-dimensional vector of the means. The problem generalizes most pure exploration bandit problems studied in the literature. We provide the first nearly optimal sample complexity upper and lower bounds for the problem.

algorithm, big data, optimization problem, (17 more...)

1706.01081

Country:

Asia > China (0.14)
North America > United States (0.14)

Genre: Research Report (0.70)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science > Data Mining > Big Data (0.69)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.66)

arXiv.org Machine LearningOct-20-2010

Forest Density Estimation

Liu, Han, Xu, Min, Gu, Haijie, Gupta, Anupam, Lafferty, John, Wasserman, Larry

We study graph estimation and density estimation in high dimensions, using a family of density estimators based on forest structured undirected graphical models. For density estimation, we do not assume the true distribution corresponds to a forest; rather, we form kernel density estimates of the bivariate and univariate marginals, and apply Kruskal's algorithm to estimate the optimal forest on held out data. We prove an oracle inequality on the excess risk of the resulting estimator relative to the risk of the best forest. For graph estimation, we consider the problem of estimating forests with restricted tree sizes. We prove that finding a maximum weight spanning forest with restricted tree size is NP-hard, and develop an approximation algorithm for this problem. Viewing the tree size as a complexity parameter, we then select a forest using data splitting, and prove bounds on excess risk and structure selection consistency of the procedure. Experiments with simulated data and microarray data indicate that the methods are a practical alternative to Gaussian graphical models.

artificial intelligence, health & medicine, machine learning, (17 more...)

1001.1557

Country:

Asia (0.28)
North America > United States (0.14)
Africa > Nigeria (0.14)

Genre: Research Report (0.63)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Neural Information Processing SystemsDec-31-2008

Selecting Observations against Adversarial Objectives

Krause, Andreas, Mcmahan, Brendan, Guestrin, Carlos, Gupta, Anupam

In many applications, one has to actively select among a set of expensive observations beforemaking an informed decision. Often, we want to select observations which perform well when evaluated with an objective function chosen by an adversary.

algorithm, artificial intelligence, optimization problem, (17 more...)

Neural Information Processing Systems

Country: North America > United States (0.28)

Genre: Research Report (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.70)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.49)