AITopics

2502.04998

Country:

Europe (0.28)
North America > United States (0.14)

Genre: Research Report (0.40)

Industry: Leisure & Entertainment > Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (0.88)
Information Technology > Data Science > Data Mining > Big Data (0.88)

arXiv.org Artificial IntelligenceJan-8-2025

A partition cover approach to tokenization

Lim, Jia Peng, Choo, Davin, Lauw, Hady W.

Tokenization is the process of encoding strings into tokens from a fixed vocabulary of size $k$ and is widely utilized in Natural Language Processing applications. The leading tokenization algorithm today is Byte Pair Encoding (BPE), which formulates the tokenization problem as a compression problem and tackles it by performing sequences of merges. In this work, we formulate tokenization as an optimization objective, show that it is NP-hard via a simple reduction from vertex cover, and propose a polynomial-time greedy algorithm GreedTok. Our formulation naturally relaxes to the well-studied weighted maximum coverage problem which has a simple $(1 - 1/e)$-approximation algorithm GreedWMC. Through empirical evaluations on real-world corpora, we show that GreedTok outperforms BPE, while achieving a comparable objective score as GreedWMC (which could have achieved a higher score due to relaxation).

greedtok, large language model, natural language, (19 more...)

2501.06246

Country: Asia (0.14)

Genre: Research Report (0.81)

Industry: Health & Medicine (0.67)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.93)

arXiv.org Machine LearningNov-21-2024

Learning multivariate Gaussians with imperfect advice

Bhattacharyya, Arnab, Choo, Davin, John, Philips George, Gouleakis, Themis

The problem of approximating an underlying distribution from its observed samples is a fundamental scientific problem. The distribution learning problem has been studied for more than a century in statistics, and it is the underlying engine for much of applied machine learning. The emphasis in modern applications is on highdimensional distributions, with the goal being to understand when one can escape the curse of dimensionality. The survey by [Dia16] gives an excellent overview of classical and modern techniques for distribution learning, especially when there is some underlying structure to be exploited. In this work, we investigate how to go beyond worst case sample complexities for learning distributions. We consider the situation where the algorithm is also given the aid of possibly imperfect advice regarding the input distribution. We position our study in the context of algorithms with predictions, where the usual problem input is supplemented by "predictions" or "advice" (potentially drawn from modern machine learning models) and the algorithm's goal is to incorporate the advice in a way that improves performance if the advice is of high quality, but if the advice is inaccurate, there should not be degradation below the performance in the no-advice setting. Most previous work in this setting are in the context of online algorithms, e.g. for the ski-rental problem [GP19, WLW20, ADJ

artificial intelligence, machine learning, probability, (17 more...)

2411.127

Country: Asia (0.28)

Genre: Research Report (0.63)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning in High Dimensional Spaces (0.34)

arXiv.org Machine LearningNov-12-2024

Probably approximately correct high-dimensional causal effect estimation given a valid adjustment set

Choo, Davin, Squires, Chandler, Bhattacharyya, Arnab, Sontag, David

Accurate estimates of causal effects play a key role in decision-making across applications such as healthcare, economics, and operations. In the absence of randomized experiments, a common approach to estimating causal effects uses \textit{covariate adjustment}. In this paper, we study covariate adjustment for discrete distributions from the PAC learning perspective, assuming knowledge of a valid adjustment set $\bZ$, which might be high-dimensional. Our first main result PAC-bounds the estimation error of covariate adjustment by a term that is exponential in the size of the adjustment set; it is known that such a dependency is unavoidable even if one only aims to minimize the mean squared error. Motivated by this result, we introduce the notion of an \emph{$\eps$-Markov blanket}, give bounds on the misspecification error of using such a set for covariate adjustment, and provide an algorithm for $\eps$-Markov blanket discovery; our second main result upper bounds the sample complexity of this algorithm. Furthermore, we provide a misspecification error bound and a constraint-based algorithm that allow us to go beyond $\eps$-Markov blankets to even smaller adjustment sets. Our third main result upper bounds the sample complexity of this algorithm, and our final result combines the first three into an overall PAC bound. Altogether, our results highlight that one does not need to perfectly recover causal structure in order to ensure accurate estimates of causal effects.

artificial intelligence, correct high-dimensional causal effect estimation, valid adjustment

2411.08141

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence (0.53)

arXiv.org Machine LearningMay-23-2024

Online bipartite matching with imperfect advice

Choo, Davin, Gouleakis, Themis, Ling, Chun Kai, Bhattacharyya, Arnab

We study the problem of online unweighted bipartite matching with $n$ offline vertices and $n$ online vertices where one wishes to be competitive against the optimal offline algorithm. While the classic RANKING algorithm of Karp et al. [1990] provably attains competitive ratio of $1-1/e > 1/2$, we show that no learning-augmented method can be both 1-consistent and strictly better than $1/2$-robust under the adversarial arrival model. Meanwhile, under the random arrival model, we show how one can utilize methods from distribution testing to design an algorithm that takes in external advice about the online vertices and provably achieves competitive ratio interpolating between any ratio attainable by advice-free methods and the optimal ratio of 1, depending on the advice quality.

algorithm, artificial intelligence, machine learning, (17 more...)

2405.09784

Genre: Research Report (0.82)

Industry: Marketing (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

arXiv.org Machine LearningFeb-13-2024

Causal Discovery under Off-Target Interventions

Choo, Davin, Shiragur, Kirankumar, Uhler, Caroline

Causal graph discovery is a significant problem with applications across various disciplines. However, with observational data alone, the underlying causal graph can only be recovered up to its Markov equivalence class, and further assumptions or interventions are necessary to narrow down the true graph. This work addresses the causal discovery problem under the setting of stochastic interventions with the natural goal of minimizing the number of interventions performed. We propose the following stochastic intervention model which subsumes existing adaptive noiseless interventions in the literature while capturing scenarios such as fat-hand interventions and CRISPR gene knockouts: any intervention attempt results in an actual intervention on a random subset of vertices, drawn from a distribution dependent on attempted action. Under this model, we study the two fundamental problems in causal discovery of verification and search and provide approximation algorithms with polylogarithmic competitive ratios and provide some preliminary experimental results.

artificial intelligence, intervention, machine learning, (17 more...)

2402.08229

Country: North America > United States > California > San Francisco County > San Francisco (0.14)

Genre: Research Report (1.00)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.68)

arXiv.org Machine LearningOct-10-2023

Learning bounded-degree polytrees with known skeleton

Choo, Davin, Yang, Joy Qiping, Bhattacharyya, Arnab, Canonne, Clément L.

We establish finite-sample guarantees for efficient proper learning of bounded-degree polytrees, a rich class of high-dimensional probability distributions and a subclass of Bayesian networks, a widely-studied type of graphical model. Recently, Bhattacharyya et al. (2021) obtained finite-sample guarantees for recovering tree-structured Bayesian networks, i.e., 1-polytrees. We extend their results by providing an efficient algorithm which learns $d$-polytrees in polynomial time and sample complexity for any bounded $d$ when the underlying undirected graph (skeleton) is known. We complement our algorithm with an information-theoretic sample complexity lower bound, showing that the dependence on the dimension and target accuracy parameters are nearly tight.

artificial intelligence, learning bounded-degree polytree, skeleton

2310.06333

Genre: Research Report (0.69)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning (0.73)

arXiv.org Artificial IntelligenceJul-10-2023

Learning and Testing Latent-Tree Ising Models Efficiently

Choo, Davin, Dagan, Yuval, Daskalakis, Constantinos, Kandiros, Anthimos Vardis

We provide time- and sample-efficient algorithms for learning and testing latent-tree Ising models, i.e. Ising models that may only be observed at their leaf nodes. On the learning side, we obtain efficient algorithms for learning a tree-structured Ising model whose leaf node distribution is close in Total Variation Distance, improving on the results of prior work. On the testing side, we provide an efficient algorithm with fewer samples for testing whether two latent-tree Ising models have leaf-node distributions that are close or far in Total Variation distance. We obtain our algorithms by showing novel localization results for the total variation distance between the leaf-node distributions of tree-structured Ising models, in terms of their marginals on pairs of leaves.

artificial intelligence, machine learning, topology, (18 more...)

2211.13291

Country: Asia (0.14)

Genre:

Workflow (0.92)
Research Report (0.63)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

arXiv.org Artificial IntelligenceJun-9-2023

Adaptivity Complexity for Causal Graph Discovery

Choo, Davin, Shiragur, Kirankumar

Causal discovery from interventional data is an important problem, where the task is to design an interventional strategy that learns the hidden ground truth causal graph $G(V,E)$ on $|V| = n$ nodes while minimizing the number of performed interventions. Most prior interventional strategies broadly fall into two categories: non-adaptive and adaptive. Non-adaptive strategies decide on a single fixed set of interventions to be performed while adaptive strategies can decide on which nodes to intervene on sequentially based on past interventions. While adaptive algorithms may use exponentially fewer interventions than their non-adaptive counterparts, there are practical concerns that constrain the amount of adaptivity allowed. Motivated by this trade-off, we study the problem of $r$-adaptivity, where the algorithm designer recovers the causal graph under a total of $r$ sequential rounds whilst trying to minimize the total number of interventions. For this problem, we provide a $r$-adaptive algorithm that achieves $O(\min\{r,\log n\} \cdot n^{1/\min\{r,\log n\}})$ approximation with respect to the verification number, a well-known lower bound for adaptive algorithms. Furthermore, for every $r$, we show that our approximation is tight. Our definition of $r$-adaptivity interpolates nicely between the non-adaptive ($r=1$) and fully adaptive ($r=n$) settings where our approximation simplifies to $O(n)$ and $O(\log n)$ respectively, matching the best-known approximation guarantees for both extremes. Our results also extend naturally to the bounded size interventions.

artificial intelligence, intervention, machine learning, (18 more...)

2306.05781

Country: North America > United States > California > San Francisco County > San Francisco (0.14)

Genre: Research Report > New Finding (0.48)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.46)

arXiv.org Artificial IntelligenceMay-31-2023

Active causal structure learning with advice

Choo, Davin, Gouleakis, Themis, Bhattacharyya, Arnab

We introduce the problem of active causal structure learning with advice. In the typical well-studied setting, the learning algorithm is given the essential graph for the observational distribution and is asked to recover the underlying causal directed acyclic graph (DAG) $G^*$ while minimizing the number of interventions made. In our setting, we are additionally given side information about $G^*$ as advice, e.g. a DAG $G$ purported to be $G^*$. We ask whether the learning algorithm can benefit from the advice when it is close to being correct, while still having worst-case guarantees even when the advice is arbitrarily bad. Our work is in the same space as the growing body of research on algorithms with predictions. When the advice is a DAG $G$, we design an adaptive search algorithm to recover $G^*$ whose intervention cost is at most $O(\max\{1, \log \psi\})$ times the cost for verifying $G^*$; here, $\psi$ is a distance measure between $G$ and $G^*$ that is upper bounded by the number of variables $n$, and is exactly 0 when $G=G^*$. Our approximation factor matches the state-of-the-art for the advice-less setting.

artificial intelligence, intervention, machine learning, (14 more...)

2305.19588

Country:

Asia (0.46)
North America > United States > California > San Francisco County > San Francisco (0.14)

Genre: Research Report (1.00)

Industry: Health & Medicine (0.92)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.93)