AITopics | Muehlebach, Michael

Muehlebach, Michael

Partially Observable Reinforcement Learning with Memory Traces

Eberhard, Onno, Muehlebach, Michael, Vernade, Claire

arXiv.org Artificial Intelligence | Mar-19-2025

Partially observable environments present a considerable computational challenge in reinforcement learning due to the need to consider long histories. Learning with a finite window of observations quickly becomes intractable as the window length grows. In this work, we introduce memory traces. Inspired by eligibility traces, these are compact representations of the history of observations in the form of exponential moving averages. We prove sample complexity bounds for the problem of offline on-policy evaluation that quantify the value errors achieved with memory traces for the class of Lipschitz continuous value estimates. We establish a close connection to the window approach, and demonstrate that, in certain environments, learning with memory traces is significantly more sample efficient. Finally, we underline the effectiveness of memory traces empirically in online reinforcement learning experiments for both value prediction and control.
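The idea of a memory trace as an exponential moving average of observations can be illustrated with a minimal sketch. The one-hot encoding of observations, the decay parameter `lam`, the zero initialization, and the function name are assumptions made for this illustration; they are not taken from the paper.

```python
def memory_trace(observations, num_obs, lam=0.8):
    """Exponential moving average of one-hot encoded observations.

    observations: iterable of integer observation indices in [0, num_obs)
    num_obs:      number of distinct observations (assumed finite here)
    lam:          decay factor in [0, 1); larger values remember longer
    """
    trace = [0.0] * num_obs  # assumed initial trace: all zeros
    for obs in observations:
        # Decay the old trace, then mix in the newest observation.
        trace = [lam * z for z in trace]
        trace[obs] += 1.0 - lam
    return trace


# Two observations (index 0, then index 1) with decay 0.5:
print(memory_trace([0, 1], num_obs=2, lam=0.5))  # [0.25, 0.5]
```

The trace has fixed size regardless of how long the history is, which is the contrast with a window of past observations whose size grows with the window length.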

machine learning, memory trace, reinforcement learning, (14 more...)

arXiv.org Artificial Intelligence

2503.152

Country: Europe > Germany (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Quantization-Free Autoregressive Action Transformer

Sheebaelhamd, Ziyad, Tschannen, Michael, Muehlebach, Michael, Vernade, Claire

arXiv.org Artificial Intelligence | Mar-18-2025

Current transformer-based imitation learning approaches introduce discrete action representations and train an autoregressive transformer decoder on the resulting latent code. However, the initial quantization breaks the continuous structure of the action space, thereby limiting the capabilities of the generative model. We propose a quantization-free method instead that leverages Generative Infinite-Vocabulary Transformers (GIVT) as a direct, ...

... Psenka et al., 2023), which will be discussed in the next two paragraphs. Existing autoregressive policies, on the one hand, sidestep the challenge of learning in a continuous domain by discretizing the actions (Lee et al., 2024; Shafiullah et al., 2022). This discretization can introduce several drawbacks: it discards the inherent structure of the continuous space, increases complexity by adding a separate quantization step, and may limit expressiveness or accuracy when fine-grained control is required.

artificial intelligence, machine learning, natural language, (12 more...)

arXiv.org Artificial Intelligence

2503.14259

Country: Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.14)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Decision-Dependent Stochastic Optimization: The Role of Distribution Dynamics

He, Zhiyu, Bolognani, Saverio, Dörfler, Florian, Muehlebach, Michael

arXiv.org Artificial Intelligence | Mar-10-2025

Distribution shifts have long been regarded as troublesome external forces that a decision-maker should either counteract or conform to. An intriguing feedback phenomenon termed decision dependence arises when the deployed decision affects the environment and alters the data-generating distribution. In the realm of performative prediction, this is encoded by distribution maps parameterized by decisions due to strategic behaviors. In contrast, we formalize an endogenous distribution shift as a feedback process featuring nonlinear dynamics that couple the evolving distribution with the decision. Stochastic optimization in this dynamic regime provides a fertile ground to examine the various roles played by dynamics in the composite problem structure. To this end, we develop an online algorithm that achieves optimal decision-making by both adapting to and shaping the dynamic distribution. Throughout the paper, we adopt a distributional perspective and demonstrate how this view facilitates characterizations of distribution dynamics and the optimality and generalization performance of the proposed algorithm. We showcase the theoretical results in an opinion dynamics context, where an opportunistic party maximizes the affinity of a dynamic polarized population, and in a recommender system scenario, featuring performance optimization with discrete distributions in the probability simplex.

algorithm, artificial intelligence, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2503.07324

Country:

North America > United States (0.27)
Europe > Switzerland > Zürich > Zürich (0.14)

Genre: Research Report (0.40)

Industry: Government > Regional Government (0.45)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.67)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (0.66)

Adversarial Training for Defense Against Label Poisoning Attacks

Bal, Melis Ilayda, Cevher, Volkan, Muehlebach, Michael

arXiv.org Artificial Intelligence | Feb-24-2025

As machine learning models grow in complexity and increasingly rely on publicly sourced data, such as the human-annotated labels used in training large language models, they become more vulnerable to label poisoning attacks. These attacks, in which adversaries subtly alter the labels within a training dataset, can severely degrade model performance, posing significant risks in critical applications. In this paper, we propose FLORAL, a novel adversarial training defense strategy based on support vector machines (SVMs) to counter these threats. Utilizing a bilevel optimization framework, we cast the training process as a non-zero-sum Stackelberg game between an attacker, who strategically poisons critical training labels, and the model, which seeks to recover from such attacks. Our approach accommodates various model architectures and employs a projected gradient descent algorithm with kernel SVMs for adversarial training. We provide a theoretical analysis of our algorithm's convergence properties and empirically evaluate FLORAL's effectiveness across diverse classification tasks. Compared to robust baselines and foundation models such as RoBERTa, FLORAL consistently achieves higher robust accuracy under increasing attacker budgets. These results underscore the potential of FLORAL to enhance the resilience of machine learning models against label poisoning threats, thereby ensuring robust classification in adversarial settings.

artificial intelligence, machine learning, optimization problem, (18 more...)

arXiv.org Artificial Intelligence

2502.17121

Country: Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.14)

Genre: Research Report > New Finding (1.00)

Industry: Information Technology > Security & Privacy (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

The Sample Complexity of Online Reinforcement Learning: A Multi-model Perspective

Muehlebach, Michael, He, Zhiyu, Jordan, Michael I.

arXiv.org Machine Learning | Jan-27-2025

We study the sample complexity of online reinforcement learning for nonlinear dynamical systems with continuous state and action spaces. Our analysis accommodates a large class of dynamical systems ranging from a finite set of nonlinear candidate models to models with bounded and Lipschitz continuous dynamics, to systems that are parametrized by a compact and real-valued set of parameters. In the most general setting, our algorithm achieves a policy regret of $\mathcal{O}(N \epsilon^2 + \mathrm{ln}(m(\epsilon))/\epsilon^2)$, where $N$ is the time horizon, $\epsilon$ is a user-specified discretization width, and $m(\epsilon)$ measures the complexity of the function class under consideration via its packing number. In the special case where the dynamics are parametrized by a compact and real-valued set of parameters (such as neural networks, transformers, etc.), we prove a policy regret of $\mathcal{O}(\sqrt{N p})$, where $p$ denotes the number of parameters, recovering earlier sample-complexity results that were derived for linear time-invariant dynamical systems. While this article focuses on characterizing sample complexity, the proposed algorithms are likely to be useful in practice, due to their simplicity, the ability to incorporate prior knowledge, and their benign transient behavior.
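As a rough illustration of how the two terms in the first regret bound trade off, suppose the packing number does not depend on $\epsilon$, so that $m(\epsilon) = m$. This is a simplifying assumption made here, loosely motivated by the case of a finite set of candidate models; constants hidden in the $\mathcal{O}$ notation are ignored. Minimizing over the discretization width then gives:

```latex
\[
f(\epsilon) = N\epsilon^2 + \frac{\ln m}{\epsilon^2},
\qquad
f'(\epsilon) = 2N\epsilon - \frac{2\ln m}{\epsilon^3} = 0
\;\Longrightarrow\;
\epsilon^{\star} = \left(\frac{\ln m}{N}\right)^{1/4},
\]
\[
f(\epsilon^{\star})
= N\sqrt{\frac{\ln m}{N}} + \ln m \sqrt{\frac{N}{\ln m}}
= 2\sqrt{N \ln m}.
\]
```

Under this assumption the bound grows like $\sqrt{N}$ in the horizon, the same dependence on $N$ as the $\mathcal{O}(\sqrt{N p})$ rate quoted for the parametrized case.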

artificial intelligence, machine learning, online reinforcement learning, (2 more...)

arXiv.org Machine Learning

2501.1591

Genre: Research Report (0.69)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.60)

Controlling Participation in Federated Learning with Feedback

Cummins, Michael, Er, Guner Dilsad, Muehlebach, Michael

arXiv.org Artificial Intelligence | Nov-28-2024

We address the problem of client participation in federated learning, where traditional methods typically rely on a random selection of a small subset of clients for each training round. In contrast, we propose FedBack, a deterministic approach that leverages control-theoretic principles to manage client participation in ADMM-based federated learning. FedBack models client participation as a discrete-time dynamical system and employs an integral feedback controller to adjust each client's participation rate individually, based on the client's optimization dynamics. We provide global convergence guarantees for our approach by building on recent federated learning research. Numerical experiments on federated image classification demonstrate that FedBack achieves up to 50% improvement in communication and computational efficiency over algorithms that rely on a random selection of clients.
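The notion of steering a client's participation rate with an integral feedback controller can be sketched with a toy simulation. Everything specific below is an assumption made for illustration and not FedBack itself: the client participates when a random stand-in for its local model change exceeds a threshold, and the threshold is the controller's integrated state. The names `simulate_participation`, `gain`, and `target_rate` are hypothetical.

```python
import random


def simulate_participation(target_rate=0.2, gain=0.05, rounds=5000, seed=0):
    """Toy integral controller that tracks a target participation rate.

    Returns the empirical participation rate and the final threshold.
    """
    rng = random.Random(seed)
    threshold = 0.5  # controller state (assumed initial value)
    participated = 0
    for _ in range(rounds):
        # Stand-in for how much the client's local model changed this round.
        change = rng.random()
        send = 1 if change > threshold else 0
        participated += send
        # Integral action: accumulate the error between the actual
        # participation (0 or 1) and the target rate.
        threshold += gain * (send - target_rate)
    return participated / rounds, threshold


rate, final_threshold = simulate_participation()
print(f"empirical rate: {rate:.3f}, final threshold: {final_threshold:.3f}")
```

Because the threshold accumulates the participation error, it can only stay bounded if the long-run participation rate matches the target, which is the property that makes integral action a natural fit for this kind of rate control.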

algorithm, artificial intelligence, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2411.19242

Country:

North America > United States (0.28)
Europe (0.28)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)

Conformal Generative Modeling with Improved Sample Efficiency through Sequential Greedy Filtering

Kladny, Klaus-Rudolf, Schölkopf, Bernhard, Muehlebach, Michael

arXiv.org Artificial Intelligence | Oct-2-2024

Generative models lack rigorous statistical guarantees for their outputs and are therefore unreliable in safety-critical applications. In this work, we propose Sequential Conformal Prediction for Generative Models (SCOPE-Gen), a sequential conformal prediction method producing prediction sets that satisfy a rigorous statistical guarantee called conformal admissibility control. This guarantee states that with high probability, the prediction sets contain at least one admissible (or valid) example. To this end, our method first samples an initial set of i.i.d. examples from a black box generative model. Then, this set is iteratively pruned via so-called greedy filters. As a consequence of the iterative generation procedure, admissibility of the final prediction set factorizes as a Markov chain. This factorization is crucial, because it allows each factor to be controlled separately, using conformal prediction. In comparison to prior work, our method demonstrates a large reduction in the number of admissibility evaluations during calibration. This reduction is important in safety-critical applications, where these evaluations must be conducted manually by domain experts and are therefore costly and time-consuming. We highlight the advantages of our method in terms of admissibility evaluations and cardinality of the prediction sets through experiments in natural language generation and molecular graph extension tasks.
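The abstract states that each factor is controlled using conformal prediction. As a point of reference only, the generic split-conformal calibration step (not the SCOPE-Gen procedure itself) can be sketched as follows. The function name, the convention that larger scores mean worse conformity, and the example numbers are assumptions made for this illustration.

```python
import math


def conformal_threshold(calibration_scores, alpha):
    """Standard split-conformal quantile.

    Given n exchangeable calibration scores, return the
    ceil((n + 1) * (1 - alpha))-th smallest score. A fresh exchangeable
    score falls at or below this value with probability at least 1 - alpha.
    Returns infinity when n is too small for the requested level.
    """
    n = len(calibration_scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf
    return sorted(calibration_scores)[k - 1]


# 19 calibration scores 1..19 at miscoverage level alpha = 0.5:
print(conformal_threshold(list(range(1, 20)), alpha=0.5))  # 10
```

The number of calibration scores needed is where evaluation cost enters: each score here would correspond to labeled calibration data, which the abstract notes is expensive when admissibility must be judged by domain experts.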

artificial intelligence, natural language, prediction, (15 more...)

arXiv.org Artificial Intelligence

2410.0166

Country:

North America > United States > Wisconsin (0.14)
Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.14)

Genre: Research Report (0.81)

Industry:

Government > Regional Government > North America Government > United States Government (1.00)
Government > Military (1.00)
Health & Medicine > Therapeutic Area (0.95)
Health & Medicine > Diagnostic Medicine (0.68)

Technology: Information Technology > Artificial Intelligence > Natural Language > Generation (1.00)

Bi-Level Motion Imitation for Humanoid Robots

Zhao, Wenshuai, Zhao, Yi, Pajarinen, Joni, Muehlebach, Michael

arXiv.org Artificial Intelligence | Oct-2-2024

Imitation learning from human motion capture (MoCap) data provides a promising way to train humanoid robots. However, due to differences in morphology, such as varying degrees of joint freedom and force limits, exact replication of human behaviors may not be feasible for humanoid robots. Consequently, incorporating physically infeasible MoCap data in training datasets can adversely affect the performance of the robot policy. To address this issue, we propose a bi-level optimization-based imitation learning framework that alternates between optimizing the robot policy and optimizing the target MoCap data. Specifically, we first develop a generative latent dynamics model using a novel self-consistent auto-encoder, which learns sparse and structured motion representations while capturing desired motion patterns in the dataset. The dynamics model is then utilized to generate reference motions while the latent representation regularizes the bi-level motion imitation process. Simulations conducted with a realistic model of a humanoid robot demonstrate that our method enhances the robot policy by modifying reference motions to be physically consistent.

artificial intelligence, machine learning, reference motion, (17 more...)

arXiv.org Artificial Intelligence

2410.01968

Country: Europe > Germany (0.28)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Robots > Humanoid Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.89)

Subgroup-Specific Risk-Controlled Dose Estimation in Radiotherapy

Fischer, Paul, Willms, Hannah, Schneider, Moritz, Thorwarth, Daniela, Muehlebach, Michael, Baumgartner, Christian F.

arXiv.org Artificial Intelligence | Jul-11-2024

Cancer remains a leading cause of death, highlighting the importance of effective radiotherapy (RT). Magnetic resonance-guided linear accelerators (MR-Linacs) enable imaging during RT, allowing for inter-fraction, and perhaps even intra-fraction, adjustments of treatment plans. However, achieving this requires fast and accurate dose calculations. While Monte Carlo simulations offer accuracy, they are computationally intensive. Deep learning frameworks show promise, yet lack uncertainty quantification crucial for high-risk applications like RT. Risk-controlling prediction sets (RCPS) offer model-agnostic uncertainty quantification with mathematical guarantees. However, we show that naive application of RCPS may lead to only certain subgroups such as the image background being risk-controlled. In this work, we extend RCPS to provide prediction intervals with coverage guarantees for multiple subgroups with unknown subgroup membership at test time. We evaluate our algorithm on real clinical planning volumes from five different anatomical regions and show that our novel subgroup RCPS (SG-RCPS) algorithm leads to prediction intervals that jointly control the risk for multiple subgroups. In particular, our method controls the risk of the crucial voxels along the radiation beam significantly better than conventional RCPS.

artificial intelligence, machine learning, subgroup, (15 more...)

arXiv.org Artificial Intelligence

2407.08432

Country: Europe > Germany > Baden-Württemberg (0.14)

Genre: Research Report > New Finding (0.46)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Nuclear Medicine (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.87)

A Pontryagin Perspective on Reinforcement Learning

Eberhard, Onno, Vernade, Claire, Muehlebach, Michael

arXiv.org Artificial Intelligence | May-28-2024

Reinforcement learning has traditionally focused on learning state-dependent policies to solve optimal control problems in a closed-loop fashion. In this work, we introduce the paradigm of open-loop reinforcement learning where a fixed action sequence is learned instead. We present three new algorithms: one robust model-based method and two sample-efficient model-free methods. Rather than basing our algorithms on Bellman's equation from dynamic programming, our work builds on Pontryagin's principle from the theory of open-loop optimal control. We provide convergence guarantees and evaluate all methods empirically on a pendulum swing-up task, as well as on two high-dimensional MuJoCo tasks, demonstrating remarkable performance compared to existing baselines.
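To make the open-loop idea concrete, the sketch below optimizes a fixed action sequence by gradient descent, with the gradient computed through a backward costate recursion in the spirit of Pontryagin's principle. It is not one of the paper's three algorithms. The scalar linear dynamics, the quadratic cost, the known model, and all names and parameter values are assumptions chosen to keep the example small.

```python
def rollout(u, x0=1.0, a=0.9, b=0.5):
    """Simulate x[t+1] = a * x[t] + b * u[t] and return all states."""
    xs = [x0]
    for u_t in u:
        xs.append(a * xs[-1] + b * u_t)
    return xs


def cost(u, x0=1.0, a=0.9, b=0.5, r=0.1):
    """J = sum_t (x[t]^2 + r * u[t]^2) + x[T]^2."""
    xs = rollout(u, x0, a, b)
    return sum(x * x for x in xs[:-1]) + r * sum(v * v for v in u) + xs[-1] ** 2


def costate_gradient(u, x0=1.0, a=0.9, b=0.5, r=0.1):
    """Gradient of J with respect to the action sequence via costates."""
    xs = rollout(u, x0, a, b)
    lam = 2.0 * xs[-1]  # terminal costate: derivative of x[T]^2
    grad = [0.0] * len(u)
    for t in reversed(range(len(u))):
        # Here lam holds the costate at time t + 1.
        grad[t] = 2.0 * r * u[t] + b * lam
        # Backward recursion: stage-cost derivative plus propagated costate.
        lam = 2.0 * xs[t] + a * lam
    return grad


def optimize_open_loop(horizon=10, steps=200, lr=0.02):
    """Plain gradient descent on the action sequence, starting from zeros."""
    u = [0.0] * horizon
    for _ in range(steps):
        g = costate_gradient(u)
        u = [u_i - lr * g_i for u_i, g_i in zip(u, g)]
    return u


u_opt = optimize_open_loop()
print(f"cost with zero actions: {cost([0.0] * 10):.4f}")
print(f"cost after optimization: {cost(u_opt):.4f}")
```

The optimized object is the action sequence itself rather than a state-dependent policy, which is the distinction between open-loop and closed-loop control drawn in the abstract.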

artificial intelligence, machine learning, reinforcement learning, (16 more...)

arXiv.org Artificial Intelligence

2405.181

Country: Europe > Germany (0.14)

Genre: Research Report (0.64)

Industry: Energy > Renewable (0.35)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
