Collaborating Authors: Skoglund, Mikael


Refined PAC-Bayes Bounds for Offline Bandits

arXiv.org Machine Learning

In this paper, we present refined probabilistic bounds on empirical reward estimates for off-policy learning in bandit problems. We build on the PAC-Bayesian bounds of Seldin et al. (2010) and improve on their results using a new parameter optimization approach introduced by Rodríguez et al. (2024). This technique is based on a discretization of the space of possible events to optimize the "in probability" parameter. We provide two parameter-free PAC-Bayes bounds, one based on the Hoeffding-Azuma inequality and the other on Bernstein's inequality. We prove that our bounds are almost optimal, as they recover the same rate as would be obtained by setting the "in probability" parameter after the realization of the data.
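
As a schematic illustration of the discretization idea (a generic sketch in our notation, not the paper's exact statement): for a PAC-Bayes bound of Hoeffding-Azuma type with range-$C$ increments, prior $\pi$, posterior $\rho$, and population and empirical risks $R$ and $\hat{R}_n$, a union bound over a geometric grid $\{\lambda_1, \dots, \lambda_K\}$ makes the bound hold simultaneously for every grid point,

$$
\mathbb{E}_{\rho}[R] \;\le\; \mathbb{E}_{\rho}[\hat{R}_n] \;+\; \frac{\mathrm{KL}(\rho \,\|\, \pi) + \ln(K/\delta)}{\lambda_k} \;+\; \frac{\lambda_k C^2}{8n} \qquad \text{for all } k \in \{1, \dots, K\},
$$

so the parameter can be tuned after seeing the data. Picking the grid point closest to the oracle choice $\lambda^* = \sqrt{8n(\mathrm{KL}(\rho\|\pi) + \ln(K/\delta))}/C$ loses only a constant factor and a $\ln K$ term, which is the sense in which such bounds are almost optimal.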


An Information-Theoretic Analysis of Thompson Sampling with Infinite Action Spaces

arXiv.org Machine Learning

This paper studies the Bayesian regret of the Thompson Sampling algorithm for bandit problems, building on the information-theoretic framework introduced by Russo and Van Roy (2015). Specifically, it extends the rate-distortion analysis of Dong and Van Roy (2018), which provides near-optimal bounds for linear bandits. A limitation of these results is the assumption of a finite action space. We address this by extending the analysis to settings with infinite and continuous action spaces. Additionally, we specialize our results to bandit problems with expected rewards that are Lipschitz continuous with respect to the action space, deriving a regret bound that explicitly accounts for the complexity of the action space.
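
A generic template in the style of Dong and Van Roy (2018) (our sketch, not the paper's exact bound): replace the entropy of the optimal action, which is infinite for continuous action spaces, by the rate $\mathcal{R}(\varepsilon)$ needed to describe a surrogate action whose expected reward is within $\varepsilon$ of optimal, giving

$$
\mathbb{E}[\mathfrak{R}(T)] \;\le\; \inf_{\varepsilon > 0} \Big( \sqrt{\overline{\Gamma}\, \mathcal{R}(\varepsilon)\, T} \;+\; \varepsilon T \Big),
$$

where $\overline{\Gamma}$ bounds the information ratio. For rewards that are $L$-Lipschitz over a $d$-dimensional action set, a covering argument gives $\mathcal{R}(\varepsilon) \lesssim d \log(L/\varepsilon)$, so optimizing over $\varepsilon$ makes the dependence on the action-space complexity explicit.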


Reinforcement Learning Based Goodput Maximization with Quantized Feedback in URLLC

arXiv.org Artificial Intelligence

This paper presents a comprehensive system model for goodput maximization with quantized feedback in Ultra-Reliable Low-Latency Communication (URLLC), focusing on dynamic channel conditions and feedback schemes. The study investigates a communication system in which the receiver provides quantized channel state information to the transmitter. The system adapts its feedback scheme via reinforcement learning, aiming to maximize goodput while accommodating varying channel statistics. We introduce a novel Rician-$K$ factor estimation technique to enable the communication system to optimize the feedback scheme. This dynamic approach improves overall performance, making it well-suited for practical URLLC applications where channel statistics vary over time.
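
For concreteness, here is a minimal Python sketch of one moment-based Rician $K$-factor estimator (the classic second- and fourth-moment method; the estimation technique proposed in the paper may differ):

import numpy as np

def estimate_rician_k(envelope: np.ndarray) -> float:
    """Moment-based Rician K-factor estimate from fading envelope samples.

    Classic method: with m2 = E[R^2] and m4 = E[R^4] for a Rician envelope R,
    K = sqrt(2*m2^2 - m4) / (m2 - sqrt(2*m2^2 - m4)).
    """
    m2 = np.mean(envelope ** 2)
    m4 = np.mean(envelope ** 4)
    radicand = 2.0 * m2 ** 2 - m4
    if radicand <= 0.0:        # sampling noise can make this negative
        return 0.0             # fall back to Rayleigh fading (K = 0)
    root = np.sqrt(radicand)
    denom = m2 - root
    return float(root / denom) if denom > 0.0 else float("inf")

# Sanity check on synthetic Rician samples with true K = 5.
rng = np.random.default_rng(0)
k_true = 5.0
los = np.sqrt(k_true / (k_true + 1.0))          # line-of-sight component
sigma = np.sqrt(1.0 / (2.0 * (k_true + 1.0)))   # scatter std per dimension
h = los + sigma * (rng.standard_normal(100_000) + 1j * rng.standard_normal(100_000))
print(estimate_rician_k(np.abs(h)))             # approximately 5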


An Information-Theoretic Analysis of Thompson Sampling for Logistic Bandits

arXiv.org Machine Learning

We study the performance of the Thompson Sampling algorithm for logistic bandit problems, where the agent receives binary rewards with probabilities determined by a logistic function $\exp(\beta \langle a, \theta \rangle)/(1+\exp(\beta \langle a, \theta \rangle))$. We focus on the setting where the action $a$ and parameter $\theta$ lie within the $d$-dimensional unit ball, with the action space encompassing the parameter space. Adopting the information-theoretic framework introduced by Russo & Van Roy (2015), we analyze the information ratio, defined as the ratio of the expected squared difference between the optimal and actual rewards to the mutual information between the optimal action and the reward. Improving upon previous results, we establish that the information ratio is bounded by $\tfrac{9}{2}d$. Notably, we obtain a regret bound of $O(d\sqrt{T \log(\beta T/d)})$ that depends only logarithmically on the parameter $\beta$.
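
Schematically, the mechanism behind such results (a standard template from this framework, with details deferred to the paper): if the information ratio satisfies $\Gamma_t \le \overline{\Gamma}$ uniformly over time, then Cauchy-Schwarz and the chain rule for mutual information yield the Bayesian regret bound

$$
\mathbb{E}[\mathfrak{R}(T)] \;\le\; \sqrt{\overline{\Gamma}\, H(A^*)\, T},
$$

where $H(A^*)$ is the entropy of the optimal action (replaced by a rate-distortion term when the action space is continuous, as here). Combining $\overline{\Gamma} \le \tfrac{9}{2}d$ with such an entropy term is what produces a bound of the stated form.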


Information-Theoretic Minimax Regret Bounds for Reinforcement Learning based on Duality

arXiv.org Artificial Intelligence

We study agents acting in an unknown environment, where the agent's goal is to find a robust policy: one that achieves high cumulative reward over all possible environments. To this end, we consider agents minimizing the maximum regret over different environment parameters, which leads to the study of minimax regret. This research focuses on deriving information-theoretic bounds for the minimax regret in Markov Decision Processes (MDPs) with a finite time horizon. Building on concepts from supervised learning, such as the minimum excess risk (MER) and the minimax excess risk, we use recent bounds on the Bayesian regret to derive minimax regret bounds. Specifically, we establish minimax theorems and combine them with bounds on the Bayesian regret to carry out the minimax regret analysis. Our contributions include defining a suitable minimax regret in the context of MDPs, deriving information-theoretic bounds for it, and applying these bounds in various scenarios.
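
Schematically, the duality step works as follows (a sketch with regularity conditions omitted): under suitable conditions on the policy class and the parameter set $\Phi$, a minimax theorem gives

$$
\inf_{\pi} \sup_{\phi \in \Phi} \mathbb{E}^{\pi}_{\phi}\big[\mathfrak{R}(T)\big] \;=\; \sup_{\mu \in \mathcal{P}(\Phi)} \inf_{\pi} \mathbb{E}_{\phi \sim \mu}\, \mathbb{E}^{\pi}_{\phi}\big[\mathfrak{R}(T)\big],
$$

where the inner quantity on the right-hand side is exactly the Bayesian regret under the prior $\mu$. Hence any bound on the Bayesian regret that holds uniformly over priors transfers to the minimax regret.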


A Coding-Theoretic Analysis of Hyperspherical Prototypical Learning Geometry

arXiv.org Machine Learning

Hyperspherical Prototypical Learning (HPL) is a supervised approach to representation learning that designs class prototypes on the unit hypersphere. The prototypes bias the representations toward class separation in a scale-invariant and known geometry. Previous approaches to HPL suffer from one of two shortcomings: (i) they follow an unprincipled optimisation procedure; or (ii) they are theoretically sound, but constrained to a single possible latent dimension. In this paper, we address both shortcomings. To address (i), we present a principled optimisation procedure whose solution we show is optimal. To address (ii), we construct well-separated prototypes in a wide range of dimensions using linear block codes. Additionally, we give a full characterisation of the optimal prototype placement in terms of achievable and converse bounds, showing that our proposed methods are near-optimal.
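
A minimal sketch of the code-based construction (our illustration, using a toy $[8, 3, 4]$ linear code rather than any specific code from the paper): mapping binary codewords $c \in \{0,1\}^n$ to $(1 - 2c)/\sqrt{n}$ places prototypes on the unit hypersphere with pairwise inner products $1 - 2 d_H(c_i, c_j)/n$, so a large minimum Hamming distance directly yields well-separated prototypes.

import numpy as np
from itertools import product

def prototypes_from_code(codewords: np.ndarray) -> np.ndarray:
    """Map binary codewords in {0,1}^n to unit vectors on the hypersphere.

    The inner product of prototypes i and j equals 1 - 2*d_H(i, j)/n,
    so the code's minimum distance controls the maximum cosine similarity.
    """
    n = codewords.shape[1]
    return (1.0 - 2.0 * codewords) / np.sqrt(n)

# Toy [8, 3] linear block code: all linear combinations of the rows of G.
# Every nonzero codeword has Hamming weight 4 (minimum distance 4).
G = np.array([[1, 1, 1, 1, 0, 0, 0, 0],
              [1, 1, 0, 0, 1, 1, 0, 0],
              [1, 0, 1, 0, 1, 0, 1, 0]])
messages = np.array(list(product([0, 1], repeat=3)))
code = messages @ G % 2                    # 8 codewords of length 8
P = prototypes_from_code(code)             # 8 prototypes in R^8

gram = P @ P.T
np.fill_diagonal(gram, -1.0)               # ignore self-similarity
print("max pairwise cosine:", gram.max())  # 0.0: mutually orthogonal prototypes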


A note on generalization bounds for losses with finite moments

arXiv.org Machine Learning

This paper studies the truncation method from Alquier [1] to derive high-probability PAC-Bayes bounds for unbounded losses with heavy tails. Assuming that the $p$-th moment is bounded, the resulting bounds interpolate between a slow rate $1 / \sqrt{n}$ when $p=2$, and a fast rate $1 / n$ when $p \to \infty$ and the loss is essentially bounded. Moreover, the paper derives a high-probability PAC-Bayes bound for losses with a bounded variance. This bound has an exponentially better dependence on the confidence parameter and the dependency measure than previous bounds in the literature. Finally, the paper extends all results to guarantees in expectation and to single-draw PAC-Bayes. In order to do so, it obtains analogues of the fast-rate PAC-Bayes bound for bounded losses from [2] in these settings.
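
The shape of the truncation argument, schematically (our sketch, with constants and the precise empirical term omitted): truncating the loss at a level $s > 0$ makes bounded-loss machinery applicable, at the price of a bias controlled by the $p$-th moment,

$$
\mathbb{E}_{\rho}[\ell] \;\lesssim\; \mathbb{E}_{\rho}[\hat{\ell}_s] \;+\; s\,\frac{\mathrm{KL}(\rho \,\|\, \pi) + \ln(1/\delta)}{n} \;+\; \frac{\mathbb{E}[\ell^p]}{s^{\,p-1}}, \qquad \ell_s := \min\{\ell, s\},
$$

where the bias term follows from $\ell \,\mathbb{1}\{\ell > s\} \le \ell^p / s^{p-1}$. Choosing $s \propto n^{1/p}$ balances the last two terms and gives the $n^{-(p-1)/p}$ rate that interpolates between $1/\sqrt{n}$ ($p = 2$) and $1/n$ ($p \to \infty$).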


Adaptive Coded Federated Learning: Privacy Preservation and Straggler Mitigation

arXiv.org Artificial Intelligence

In this article, we address the problem of federated learning in the presence of stragglers. For this problem, a coded federated learning framework has been proposed, where the central server aggregates gradients received from the non-stragglers and the gradient computed from a privacy-preserving global coded dataset to mitigate the negative impact of the stragglers. However, when aggregating these gradients, fixed weights are consistently applied across iterations, neglecting the generation process of the global coded dataset and the dynamic nature of the trained model over iterations. This oversight may result in diminished learning performance. To overcome this drawback, we propose a new method named adaptive coded federated learning (ACFL). In ACFL, before training, each device uploads a coded local dataset with additive noise to the central server, which generates a global coded dataset under privacy-preservation requirements. During each iteration of training, the central server aggregates the gradients received from the non-stragglers and the gradient computed from the global coded dataset, using an adaptive policy for varying the aggregation weights. Under this policy, we optimize the performance in terms of both privacy and learning, where the learning performance is analyzed through convergence analysis and the privacy performance is characterized via mutual information differential privacy. Finally, we perform simulations to demonstrate the superiority of ACFL over non-adaptive methods.
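
A minimal sketch of the aggregation step (our illustration; the heuristic weight below is a stand-in, not the optimized adaptive policy derived in the article):

import numpy as np

def aggregate(nonstraggler_grads, coded_grad, w_coded):
    """Combine non-straggler gradients with the coded-dataset gradient.

    w_coded in [0, 1] is the iteration-dependent weight on the gradient
    computed from the global coded dataset; in ACFL this weight follows
    an adaptive policy, here it is simply passed in.
    """
    g_avg = np.mean(nonstraggler_grads, axis=0)
    return (1.0 - w_coded) * g_avg + w_coded * coded_grad

# Toy round: weight the coded gradient by the straggler fraction
# (a simple heuristic stand-in for the adaptive policy).
rng = np.random.default_rng(1)
n_devices, n_stragglers, dim = 10, 3, 5
grads = [rng.standard_normal(dim) for _ in range(n_devices - n_stragglers)]
coded_grad = rng.standard_normal(dim)   # from the noise-protected coded data
w = n_stragglers / n_devices
print(aggregate(grads, coded_grad, w))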


Distributed Learning based on 1-Bit Gradient Coding in the Presence of Stragglers

arXiv.org Artificial Intelligence

This paper considers the problem of distributed learning (DL) in the presence of stragglers. For this problem, DL methods based on gradient coding have been widely investigated; these redundantly distribute the training data to the workers to guarantee convergence when some workers are stragglers. However, these methods require the workers to transmit real-valued vectors during the learning process, which induces a very high communication burden. To overcome this drawback, we propose a novel DL method based on 1-bit gradient coding (1-bit GC-DL), where the workers transmit 1-bit data encoded from the locally computed gradients to reduce the communication overhead. We provide theoretical convergence guarantees for the proposed method for both convex and nonconvex loss functions. It is shown empirically that 1-bit GC-DL outperforms the baseline methods, attaining better learning performance under the same communication overhead.
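
To make the communication saving concrete, here is a minimal sketch in the spirit of sign-based compression (our illustration; the actual encoding and redundant data assignment in 1-bit GC-DL may differ): each worker transmits only the signs of its gradient, and the server combines the reports by a majority vote scaled by a step size.

import numpy as np

def encode_1bit(grad: np.ndarray) -> np.ndarray:
    """Compress a real-valued gradient to one bit per coordinate."""
    return np.sign(grad).astype(np.int8)   # 32x smaller than float32

def aggregate_1bit(reports, lr: float) -> np.ndarray:
    """Server-side majority vote over the received 1-bit gradients."""
    vote = np.sign(np.sum(reports, axis=0))
    return lr * vote

# Toy round with 5 non-straggling workers reporting noisy sign gradients.
rng = np.random.default_rng(2)
true_grad = rng.standard_normal(8)
reports = [encode_1bit(true_grad + 0.5 * rng.standard_normal(8))
           for _ in range(5)]
print(aggregate_1bit(reports, lr=0.01))    # update applied to the model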


Chained Information-Theoretic bounds and Tight Regret Rate for Linear Bandit Problems

arXiv.org Machine Learning

Bandit problems are a class of sequential decision problems in which an agent interacts with an unknown environment by choosing actions and earning rewards in return. The goal of the agent is to maximize its expected cumulative reward: the expected sum of the rewards it earns throughout its interaction with the environment. This necessitates a delicate balance between exploring different actions to gather information for potential future rewards, and exploiting known actions to receive immediate gains. The theoretical study of an algorithm's performance in a bandit problem is done by analyzing the expected regret, defined as the difference between the cumulative reward of the algorithm and the hypothetical cumulative reward that an oracle would obtain by choosing the optimal action at each time step. An effective method for achieving small regret is the Thompson Sampling (TS) algorithm [3], which, despite its simplicity, has shown remarkable performance [4, 5, 6]. Studying the Thompson Sampling regret, [1] introduced the concept of the information ratio, a statistic that captures the trade-off between the information gained by the algorithm about the environment and the immediate regret.
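
In symbols (one common formulation, stated for concreteness): writing $A^*$ for the optimal action, $A_t$ for the action chosen at time $t$, and $r$ for the mean-reward function, the expected regret over a horizon $T$ and the information ratio of [1] are

$$
\mathbb{E}[\mathfrak{R}(T)] \;=\; \mathbb{E}\Big[\sum_{t=1}^{T} \big(r(A^*) - r(A_t)\big)\Big], \qquad \Gamma_t \;=\; \frac{\big(\mathbb{E}_t[r(A^*) - r(A_t)]\big)^2}{I_t\big(A^*; (A_t, R_t)\big)},
$$

where $\mathbb{E}_t$ and $I_t$ condition on the history up to time $t$. A uniform bound on $\Gamma_t$ converts the total information accumulated about $A^*$ into a regret bound, and it is this conversion that chained information-theoretic bounds refine.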