Kaski, Samuel
PABBO: Preferential Amortized Black-Box Optimization
Zhang, Xinyu, Huang, Daolang, Kaski, Samuel, Martinelli, Julien
Preferential Bayesian Optimization (PBO) is a sample-efficient method for learning latent user utilities from preferential feedback over pairs of designs. It relies on a statistical surrogate model for the latent function, usually a Gaussian process, and an acquisition strategy to select the next candidate pair on which to elicit user feedback. Due to the non-conjugacy of the associated likelihood, every PBO step requires a significant amount of computation with various approximate inference techniques. This computational overhead is incompatible with the way humans interact with computers, hindering the use of PBO in real-world cases. Building on recent advances in amortized BO, we propose to circumvent this issue by fully amortizing PBO, meta-learning both the surrogate and the acquisition function. Our method comprises a novel transformer neural process architecture, trained using reinforcement learning and tailored auxiliary losses. On a benchmark composed of synthetic and real-world datasets, our method is several orders of magnitude faster than the usual Gaussian-process-based strategies and often outperforms them in accuracy.
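For readers unfamiliar with why each PBO step is expensive, the sketch below shows the standard probit preferential likelihood used with Gaussian-process utilities; it is non-Gaussian in the latent function, hence non-conjugate with a GP prior. This is a generic illustration of the baseline setting, not PABBO's amortized model, and the function names and probit choice are assumptions.

    # Minimal sketch of the probit preference likelihood that GP-based PBO
    # must approximate at every step; names and the probit choice are
    # illustrative assumptions, not PABBO's code.
    from math import erf, sqrt

    def probit(z):
        # Standard normal CDF via the error function.
        return 0.5 * (1.0 + erf(z / sqrt(2.0)))

    def preference_likelihood(f_winner, f_loser, noise=1.0):
        # P(winner preferred | latent utilities f): non-Gaussian in f,
        # hence non-conjugate with a GP prior, forcing approximate inference.
        return probit((f_winner - f_loser) / (sqrt(2.0) * noise))

    print(preference_likelihood(1.2, 0.3))  # ~0.74 for utilities 1.2 vs 0.3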
Memento No More: Coaching AI Agents to Master Multiple Tasks via Hints Internalization
Alakuijala, Minttu, Gao, Ya, Ananov, Georgy, Kaski, Samuel, Marttinen, Pekka, Ilin, Alexander, Valpola, Harri
As the general capabilities of artificial intelligence (AI) agents continue to evolve, their ability to learn to master multiple complex tasks through experience remains a key challenge. Current LLM agents, particularly those based on proprietary language models, typically rely on prompts to incorporate knowledge about the target tasks. This approach does not allow the agent to internalize this information and instead relies on ever-expanding prompts to sustain its functionality in diverse scenarios. This resembles a system of notes used by a person affected by anterograde amnesia, the inability to form new memories. In this paper, we propose a novel method to train AI agents to incorporate knowledge and skills for multiple tasks without the need for either cumbersome note systems or prior high-quality demonstration data. Our approach employs an iterative process in which the agent collects new experiences, receives corrective feedback from humans in the form of hints, and integrates this feedback into its weights via a context distillation training procedure. We demonstrate the efficacy of our approach by implementing it in a Llama-3-based agent which, after only a few rounds of feedback, outperforms the advanced models GPT-4o and DeepSeek-V3 on a task set requiring correct sequencing of information retrieval, tool use, and question answering.
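The context-distillation step lends itself to a small illustration: predictions made with the hint in context become soft targets for the hint-free model, so the hint ends up in the weights. The numpy sketch below uses dummy logits in place of an LLM; all names are illustrative assumptions rather than the paper's implementation.

    # Minimal numpy sketch of context distillation: match the hint-free
    # predictions to the hint-conditioned ones. Dummy logits stand in for
    # an LLM; names are illustrative assumptions.
    import numpy as np

    def softmax(logits):
        z = logits - logits.max(axis=-1, keepdims=True)
        e = np.exp(z)
        return e / e.sum(axis=-1, keepdims=True)

    def context_distillation_loss(logits_with_hint, logits_without_hint):
        # KL(teacher || student): teacher saw prompt + hint, student saw
        # prompt only, so minimizing this internalizes the hint.
        p = softmax(logits_with_hint)
        q = softmax(logits_without_hint)
        return np.sum(p * (np.log(p) - np.log(q)), axis=-1).mean()

    rng = np.random.default_rng(0)
    teacher = rng.normal(size=(4, 10))  # 4 token positions, vocab of 10
    student = rng.normal(size=(4, 10))
    print(context_distillation_loss(teacher, student))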
Amortized Bayesian Experimental Design for Decision-Making
Huang, Daolang, Guo, Yujia, Acerbi, Luigi, Kaski, Samuel
Many critical decisions, such as personalized medical diagnoses and product pricing, are made based on insights gained from designing, observing, and analyzing a series of experiments. This highlights the crucial role of experimental design, which not only collects information on system parameters, as in traditional Bayesian experimental design (BED), but also plays a key part in facilitating downstream decision-making. Most recent BED methods use an amortized policy network to rapidly design experiments. However, the information gathered through these methods is suboptimal for downstream decision-making, as the experiments are not inherently designed with downstream objectives in mind. In this paper, we present an amortized decision-aware BED framework that prioritizes maximizing downstream decision utility. We introduce a novel architecture, the Transformer Neural Decision Process (TNDP), capable of instantly proposing the next experimental design while inferring the downstream decision, thus effectively amortizing both tasks within a unified workflow. We demonstrate the performance of our method across several tasks, showing that it can deliver informative designs and facilitate accurate decision-making.
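As a toy illustration of the decision-aware objective, the sketch below scores a candidate design by the expected utility of the best downstream decision under the posterior that the design induces, rather than by information gain alone. The utility matrix and posterior samples are invented for illustration.

    # Toy numpy sketch of scoring designs by downstream decision utility;
    # the utility matrix and posterior samples are illustrative assumptions.
    import numpy as np

    def expected_decision_utility(posterior_samples, utility):
        # utility[d, k]: payoff of decision d when the latent parameter
        # falls in outcome bin k; posterior_samples: bin indices drawn from
        # the posterior implied by a candidate design's simulated data.
        mean_utility = utility[:, posterior_samples].mean(axis=1)
        return mean_utility.max()  # utility of the best downstream decision

    rng = np.random.default_rng(1)
    utility = np.array([[1.0, -1.0], [-0.5, 0.5]])  # 2 decisions x 2 outcomes
    design_a = rng.integers(0, 2, size=500)  # nearly uninformative posterior
    design_b = np.zeros(500, dtype=int)      # posterior concentrated on bin 0
    print(expected_decision_utility(design_a, utility),
          expected_decision_utility(design_b, utility))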
Towards modeling evolving longitudinal health trajectories with a transformer-based deep learning model
Moen, Hans, Raj, Vishnu, Vabalas, Andrius, Perola, Markus, Kaski, Samuel, Ganna, Andrea, Marttinen, Pekka
Health registers contain rich information about individuals' health histories. Here our interest lies in understanding how individuals' health trajectories evolve in a nationwide longitudinal dataset with coded features, such as clinical codes, procedures, and drug purchases. We introduce a straightforward approach for training a Transformer-based deep learning model in a way that lets us analyze how individuals' trajectories change over time. This is achieved by modifying the training objective and by applying a causal attention mask. We focus here on the general task of predicting the onset of a range of common diseases in a given future forecast interval. However, instead of providing a single prediction about diagnoses that could occur in this forecast interval, our approach enables the model to provide continuous predictions at every time point up until, and conditioned on, the time of the forecast interval. We find that this model performs comparably to other models, including a bi-directional transformer model, in terms of basic prediction performance, while at the same time offering promising trajectory modeling properties. We explore a couple of ways to use this model for analyzing health trajectories and aiding in the early detection of events that forecast possible later disease onsets. We hypothesize that this method may be helpful for continuously monitoring people's health trajectories and enabling interventions in ongoing trajectories, as well as for retrospective analyses.
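The causal attention mask mentioned above is standard decoder-style masking: position t may attend only to positions up to t, which is what makes a valid prediction available at every time point of the trajectory. A minimal numpy sketch, with illustrative shapes and names:

    # Minimal sketch of a causal attention mask; shapes and names are
    # illustrative, not the paper's implementation.
    import numpy as np

    def causal_mask(seq_len):
        # mask[i, j] is True where attention from position i to j is allowed.
        return np.tril(np.ones((seq_len, seq_len), dtype=bool))

    def masked_attention_scores(scores, mask):
        # Disallowed positions get -inf before the softmax, as in a
        # decoder-style transformer.
        return np.where(mask, scores, -np.inf)

    scores = np.zeros((4, 4))
    print(masked_attention_scores(scores, causal_mask(4)))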
Proxy-informed Bayesian transfer learning with unknown sources
Sloman, Sabina J., Martinelli, Julien, Kaski, Samuel
Generalization outside the scope of one's training data requires leveraging prior knowledge about the effects that transfer, and the effects that don't, between different data sources. Bayesian transfer learning is a principled paradigm for specifying this knowledge, and refining it on the basis of data from the source (training) and target (prediction) tasks. We address the challenging transfer learning setting where the learner (i) cannot fine-tune in the target task, and (ii) does not know which source data points correspond to the same task (i.e., the data sources are unknown). We propose a proxy-informed robust method for probabilistic transfer learning (PROMPT), which provides a posterior predictive estimate tailored to the structure of the target task, without requiring the learner to have access to any outcome information from the target task. Instead, PROMPT relies on the availability of proxy information. PROMPT uses the same proxy information for two purposes: (i) estimation of effects specific to the target task, and (ii) construction of a robust reweighting of the source data for estimation of effects that transfer between tasks. We provide theoretical results on the effect of this reweighting on the risk of negative transfer, and demonstrate the application of PROMPT in two synthetic settings.
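The abstract does not spell out PROMPT's reweighting, so the following is only a heavily hedged, generic sketch of the underlying idea: upweight source data whose proxy information resembles the target's. The Gaussian kernel and all names are assumptions, not the paper's method.

    # Generic sketch of proxy-based reweighting of source data; the kernel
    # choice and all names are illustrative assumptions.
    import numpy as np

    def proxy_weights(source_proxies, target_proxy, bandwidth=1.0):
        # Upweight source points whose proxy features resemble the target's;
        # the normalized weights can then form a weighted posterior predictive.
        sq_dist = np.sum((source_proxies - target_proxy) ** 2, axis=1)
        w = np.exp(-sq_dist / (2.0 * bandwidth ** 2))
        return w / w.sum()

    rng = np.random.default_rng(2)
    source = rng.normal(size=(100, 3))  # proxy features of unknown-source data
    target = np.zeros(3)                # proxy information for the target task
    print(proxy_weights(source, target)[:5])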
Amortized Probabilistic Conditioning for Optimization, Simulation and Inference
Chang, Paul E., Loka, Nasrulloh, Huang, Daolang, Remes, Ulpu, Kaski, Samuel, Acerbi, Luigi
Amortized meta-learning methods based on pre-training have propelled fields like natural language processing and vision. Transformer-based neural processes and their variants are leading models for probabilistic meta-learning with a tractable objective. Often trained on synthetic data, these models implicitly capture essential latent information in the data-generation process. However, existing methods do not allow users to flexibly inject (condition on) and extract (predict) this probabilistic latent information at runtime, which is key to many tasks. We introduce the Amortized Conditioning Engine (ACE), a new transformer-based meta-learning model that explicitly represents latent variables of interest. ACE affords conditioning on both observed data and interpretable latent variables, allows the inclusion of priors at runtime, and outputs predictive distributions for discrete and continuous data and latents. We demonstrate ACE's modeling flexibility and performance on diverse tasks such as image completion and classification, Bayesian optimization, and simulation-based inference.
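As a schematic picture of the interface ACE describes, one can think of latent variables as explicit tokens placed alongside data tokens, so that either can be conditioned on or predicted. The flat token encoding below is an illustrative assumption, not the actual architecture.

    # Schematic sketch of mixing data tokens and latent tokens in one
    # context set; the encoding is an illustrative assumption.
    import numpy as np

    def build_context(x_obs, y_obs, latent_values, latent_mask):
        # Each row is one token: [is_latent, location/id, value, observed?].
        data_tokens = np.column_stack([
            np.zeros(len(x_obs)), x_obs, y_obs, np.ones(len(x_obs))])
        latent_tokens = np.column_stack([
            np.ones(len(latent_values)), np.arange(len(latent_values)),
            latent_values, latent_mask])  # mask=1: condition on; 0: predict
        return np.vstack([data_tokens, latent_tokens])

    ctx = build_context(np.array([0.1, 0.5]), np.array([1.0, 2.0]),
                        latent_values=np.array([0.0, 3.0]),
                        latent_mask=np.array([0.0, 1.0]))
    print(ctx)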
LoKO: Low-Rank Kalman Optimizer for Online Fine-Tuning of Large Models
Abdi, Hossein, Sun, Mingfei, Zhang, Andi, Kaski, Samuel, Pan, Wei
Training large models with millions or even billions of parameters from scratch incurs substantial computational costs. Parameter-Efficient Fine-Tuning (PEFT) methods, particularly Low-Rank Adaptation (LoRA), address this challenge by adapting only a reduced number of parameters to specific tasks with gradient-based optimizers. In this paper, we cast PEFT as an optimal filtering/state estimation problem and present the Low-Rank Kalman Optimizer (LoKO) to estimate the optimal trainable parameters in an online manner. We leverage the low-rank decomposition in LoRA to significantly reduce matrix sizes in the Kalman iterations, and further capitalize on a diagonal approximation of the covariance matrix to effectively decrease computational complexity from quadratic to linear in the number of trainable parameters. Moreover, we discovered that the initialization of the covariance matrix within the Kalman algorithm and the accurate estimation of the observation noise covariance are key to this formulation, and we propose robust approaches that work well across a vast range of well-established computer vision and language models. Our results show that LoKO converges with fewer iterations and yields better-performing models than commonly used optimizers with LoRA on both image classification and language tasks. Our study opens up the possibility of leveraging the Kalman filter as an effective optimizer for the online fine-tuning of large models.
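To make the filtering view concrete, the sketch below shows a diagonal-covariance Kalman update applied to a parameter vector, treating the model output as a noisy observation of the target; this is the standard Kalman-as-optimizer formulation under a diagonal approximation, with illustrative names rather than LoKO's implementation.

    # Diagonal-covariance Kalman update for a parameter vector theta;
    # a generic sketch of the formulation, not LoKO's code.
    import numpy as np

    def diagonal_kalman_update(theta, P_diag, grad_out, residual, r_obs):
        # grad_out: d(model output)/d(theta), the observation row H;
        # P_diag: diagonal approximation of the parameter covariance;
        # r_obs: estimated observation-noise covariance (scalar here).
        s = np.sum(grad_out * P_diag * grad_out) + r_obs  # innovation variance
        k = P_diag * grad_out / s                         # Kalman gain
        theta_new = theta + k * residual                  # correct parameters
        P_new = P_diag - k * grad_out * P_diag            # diag of (I - KH)P
        return theta_new, P_new

    theta = np.zeros(4)
    P = np.ones(4) * 0.1  # covariance initialization matters, per the paper
    H = np.array([1.0, 0.5, 0.0, -0.5])
    theta, P = diagonal_kalman_update(theta, P, H, residual=0.8, r_obs=1.0)
    print(theta, P)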
Identifying latent disease factors differently expressed in patient subgroups using group factor analysis
Ferreira, Fabio S., Ashburner, John, Bouzigues, Arabella, Suksasilp, Chatrin, Russell, Lucy L., Foster, Phoebe H., Ferry-Bolder, Eve, van Swieten, John C., Jiskoot, Lize C., Seelaar, Harro, Sanchez-Valle, Raquel, Laforce, Robert, Graff, Caroline, Galimberti, Daniela, Vandenberghe, Rik, de Mendonca, Alexandre, Tiraboschi, Pietro, Santana, Isabel, Gerhard, Alexander, Levin, Johannes, Sorbi, Sandro, Otto, Markus, Pasquier, Florence, Ducharme, Simon, Butler, Chris R., Ber, Isabelle Le, Finger, Elizabeth, Tartaglia, Maria C., Masellis, Mario, Rowe, James B., Synofzik, Matthis, Moreno, Fermin, Borroni, Barbara, Kaski, Samuel, Rohrer, Jonathan D., Mourao-Miranda, Janaina
The heterogeneity of neurological and mental health disorders has been a key confound to disease understanding, treatment development and outcome prediction, as patient populations are thought to include multiple disease pathways that selectively respond to treatment (Kapur et al., 2012). These challenges are reflected in poor treatment outcomes; for instance, in depression, only approximately 40% of patients remit after first-line antidepressant treatment or psychotherapy (Amick et al., 2015; Cuijpers et al., 2014; Fava and Davidson, 1996; Trivedi et al., 2006). Diagnostic categories in psychiatry have historically been defined based on signs and symptoms, prioritising diagnostic agreement between clinicians rather than underlying biological mechanisms (Freedman et al., 2013; Robins and Guze, 1970). As a result, the usefulness of supervised machine learning methods as diagnostic tools for mental health disorders (i.e., classifying patients vs. healthy controls) is questionable, as they may simply inherit the flaws of current diagnostic categories. Additional challenges in neurological and mental health disorders are comorbidity (i.e., individuals with one disorder often develop another disorder during their lifespan) and the fact that different disorders can share similar symptoms (Kessler et al., 2005). To address the limitations of current diagnostic categories in psychiatry, the National Institute of Mental Health launched the Research Domain Criteria (RDoC) framework in 2009 (https://www.nimh.nih.gov/research/research-funded-by-nimh/rdoc) as an attempt to move beyond diagnostic categories and ground psychiatry within neurobiological constructs that combine multiple levels of measures or sources of information (Insel et al., 2010). Multivariate methods that do not rely on diagnostic categories, such as Canonical Correlation Analysis (CCA) and related methods, have been widely used to uncover latent disease dimensions capturing associations between brain imaging and non-imaging data (e.g., self-report questionnaires, cognitive tests and genetics). The identified latent dimensions provide information on how a set of non-imaging features (e.g.
Cost-aware Simulation-based Inference
Bharti, Ayush, Huang, Daolang, Kaski, Samuel, Briol, François-Xavier
Simulation-based inference (SBI) is the preferred framework for estimating parameters of intractable models in science and engineering. A significant challenge in this context is the large computational cost of simulating data from complex models, and the fact that this cost often depends on the parameter values. We therefore propose cost-aware SBI methods, which can significantly reduce the cost of existing sampling-based SBI methods, such as neural SBI and approximate Bayesian computation. This is achieved through a combination of rejection and self-normalised importance sampling, which significantly reduces the number of expensive simulations needed. Our approach is studied extensively on models from epidemiology to telecommunications engineering, where we obtain significant reductions in the overall cost of inference.
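Self-normalised importance sampling, which the method builds on, admits a compact illustration: draw parameters from a cheaper proposal q instead of the prior p and reweight by p/q. The densities below are illustrative assumptions, not the paper's models.

    # Self-normalised importance sampling (SNIS) sketch; the target,
    # proposal, and integrand are illustrative assumptions.
    import numpy as np

    def snis_estimate(theta, f_values, log_p, log_q):
        # E_p[f] ~ sum(w_i f_i) / sum(w_i), with w_i = p(theta_i)/q(theta_i);
        # sampling from q lets us avoid expensive-to-simulate regions.
        log_w = log_p(theta) - log_q(theta)
        w = np.exp(log_w - log_w.max())  # stabilised weights
        return np.sum(w * f_values) / np.sum(w)

    rng = np.random.default_rng(3)
    theta = rng.normal(loc=1.0, scale=1.0, size=5000)  # draws from proposal q
    log_q = lambda t: -0.5 * (t - 1.0) ** 2            # N(1, 1), up to a constant
    log_p = lambda t: -0.5 * t ** 2                    # N(0, 1), up to a constant
    print(snis_estimate(theta, theta ** 2, log_p, log_q))  # E_p[theta^2] = 1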
Open Ad Hoc Teamwork with Cooperative Game Theory
Wang, Jianhong, Li, Yang, Zhang, Yuan, Pan, Wei, Kaski, Samuel
Ad hoc teamwork poses a challenging problem, requiring the design of an agent that can collaborate with teammates without prior coordination or joint training. Open ad hoc teamwork (OAHT) further complicates this challenge by considering environments with a changing number of teammates, referred to as open teams. One promising practical solution to this problem leverages the generalizability of graph neural networks to handle an unrestricted number of agents with various agent types, an approach known as graph-based policy learning (GPL). However, GPL's joint Q-value representation over a coordination graph lacks a convincing explanation. In this paper, we establish a new theory to understand the representation of the joint Q-value for OAHT and its learning paradigm through the lens of cooperative game theory. Building on our theory, we propose a novel algorithm named CIAO, based on GPL's framework, with additional provable implementation tricks that can facilitate learning. Demos of the experimental results are available at https://sites.google.com/view/ciao2024, and the experiment code is published at https://github.com/hsvgbkhgbv/CIAO.
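The coordination-graph joint Q-value that GPL uses, and that the paper reinterprets through cooperative game theory, is a sum of individual and pairwise utilities over the current team; openness amounts to these arrays growing or shrinking as agents join or leave. A minimal numpy sketch with random stand-ins for the learned networks:

    # Coordination-graph joint Q-value sketch; random utilities stand in
    # for learned networks, and all names are illustrative.
    import numpy as np

    def joint_q_value(q_indiv, q_pair, actions):
        # q_indiv[i, a]: individual utility of agent i taking action a;
        # q_pair[i, j, a, b]: pairwise utility for agents i and j.
        n = len(actions)
        total = sum(q_indiv[i, actions[i]] for i in range(n))
        total += sum(q_pair[i, j, actions[i], actions[j]]
                     for i in range(n) for j in range(i + 1, n))
        return total

    rng = np.random.default_rng(4)
    n_agents, n_actions = 3, 2
    q_i = rng.normal(size=(n_agents, n_actions))
    q_ij = rng.normal(size=(n_agents, n_agents, n_actions, n_actions))
    print(joint_q_value(q_i, q_ij, actions=[0, 1, 1]))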