AITopics

2502.19308

Country:

North America > United States > Oregon > Benton County > Corvallis (0.04)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
(5 more...)

Genre: Research Report (0.83)

Industry:

Food & Agriculture > Agriculture (1.00)
Consumer Products & Services > Food, Beverage, Tobacco & Cannabis > Beverages (1.00)
Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

arXiv.org Machine LearningFeb-26-2025

Enhancing Gradient-based Discrete Sampling via Parallel Tempering

Liang, Luxu, Jia, Yuhang, Zhou, Feng

While gradient-based discrete samplers are effective in sampling from complex distributions, they are susceptible to getting trapped in local minima, particularly in high-dimensional, multimodal discrete distributions, owing to the discontinuities inherent in these landscapes. To circumvent this issue, we combine parallel tempering, also known as replica exchange, with the discrete Langevin proposal and develop the Parallel Tempering enhanced Discrete Langevin Proposal (PTDLP), which are simulated at a series of temperatures. Significant energy differences prompt sample swaps, which are governed by a Metropolis criterion specifically designed for discrete sampling to ensure detailed balance is maintained. Additionally, we introduce an automatic scheme to determine the optimal temperature schedule and the number of chains, ensuring adaptability across diverse tasks with minimal tuning. Theoretically, we establish that our algorithm converges non-asymptotically to the target energy and exhibits faster mixing compared to a single chain. Empirical results further emphasize the superiority of our method in sampling from complex, multimodal discrete distributions, including synthetic problems, restricted Boltzmann machines, and deep energy-based models.

algorithm, enhancing gradient-based discrete sampling, equation, (12 more...)

arXiv.org Machine Learning

2502.1924

Country:

Asia > China (0.04)
North America > Canada > British Columbia (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.49)

Farias, Erick da Silva, Junior, Eduardo Palhares

Application of Attention Mechanism with Bidirectional Long Short-Term Memory (BiLSTM) and CNN for Human Conflict Detection using Computer Vision

The automatic detection of human conflicts through videos is a crucial area in computer vision, with significant applications in monitoring and public safety policies. However, the scarcity of public datasets and the complexity of human interactions make this task challenging. This study investigates the integration of advanced deep learning techniques, including Attention Mechanism, Convolutional Neural Networks (CNNs), and Bidirectional Long ShortTerm Memory (BiLSTM), to improve the detection of violent behaviors in videos. The research explores how the use of the attention mechanism can help focus on the most relevant parts of the video, enhancing the accuracy and robustness of the model. The experiments indicate that the combination of CNNs with BiLSTM and the attention mechanism provides a promising solution for conflict monitoring, offering insights into the effectiveness of different strategies. This work opens new possibilities for the development of automated surveillance systems that can operate more efficiently in real-time detection of violent events.

accuracy, attention mechanism, experiment, (13 more...)

2502.18555

Country:

South America > Brazil > Amazonas > Manaus (0.04)
North America > United States > California > San Diego County > San Diego (0.04)

Genre: Research Report (0.70)

Industry: Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.66)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.47)

Sequeira, Pedro, Sadhu, Vidyasagar, Gervasio, Melinda

ToMCAT: Theory-of-Mind for Cooperative Agents in Teams via Multiagent Diffusion Policies

In this paper we present ToMCAT (Theory-of-Mind for Cooperative Agents in Teams), a new framework for generating ToM-conditioned trajectories. It combines a meta-learning mechanism, that performs ToM reasoning over teammates' underlying goals and future behavior, with a multiagent denoising-diffusion model, that generates plans for an agent and its teammates conditioned on both the agent's goals and its teammates' characteristics, as computed via ToM. We implemented an online planning system that dynamically samples new trajectories (replans) from the diffusion model whenever it detects a divergence between a previously generated plan and the current state of the world. We conducted several experiments using ToMCAT in a simulated cooking domain. Our results highlight the importance of the dynamic replanning mechanism in reducing the usage of resources without sacrificing team performance. We also show that recent observations about the world and teammates' behavior collected by an agent over the course of an episode combined with ToM inferences are crucial to generate team-aware plans for dynamic adaptation to teammates, especially when no prior information is provided about them.

agent, teammate, trajectory, (15 more...)

2502.18438

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Europe > Switzerland (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
(3 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.93)

Industry:

Education (0.46)
Leisure & Entertainment (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Main, James C. A., Randour, Mickael

Mixing Any Cocktail with Limited Ingredients: On the Structure of Payoff Sets in Multi-Objective MDPs and its Impact on Randomised Strategies

We consider multi-dimensional payoff functions in Markov decision processes, and ask whether a given expected payoff vector can be achieved or not. In general, pure strategies (i.e., not resorting to randomisation) do not suffice for this problem. We study the structure of the set of expected payoff vectors of all strategies given a multi-dimensional payoff function and its consequences regarding randomisation requirements for strategies. In particular, we prove that for any payoff for which the expectation is well-defined under all strategies, it is sufficient to mix (i.e., randomly select a pure strategy at the start of a play and committing to it for the rest of the play) finitely many pure strategies to approximate any expected payoff vector up to any precision. Furthermore, for any payoff for which the expected payoff is finite under all strategies, any expected payoff can be obtained exactly by mixing finitely many strategies.

convex combination, payoff, pure strategy, (16 more...)

2502.18296

Country:

Europe > Austria > Vienna (0.14)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
Europe > Netherlands > North Holland > Amsterdam (0.04)
(5 more...)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

Causal AI-based Root Cause Identification: Research to Practice at Scale

Jha, Saurabh, Rahane, Ameet, Shwartz, Laura, Palaci-Olgun, Marc, Bagehorn, Frank, Rios, Jesus, Stingaciu, Dan, Kattinakere, Ragu, Banerjee, Debasish

Modern applications are increasingly built as vast, intricate, distributed systems. These systems comprise various software modules, often developed by different teams using different programming languages and deployed across hundreds to thousands of machines, sometimes spanning multiple data centers. Given the ir scale and complexity, these applications are often designed to tolerate failures and performance issues through inbuilt failure recovery techniques (e.g., hardware or software redundancy) or extern al methods (e.g., health check - based restarts). Computer systems experience frequent failures despite every effort: performance degradations and violations of reliability and K ey Performance Indicators (K PI s) are inevitable. These failures, depending on their nature, can lead to catastrophic incidents impacting critical systems and customers. Swift and accurate root cause identification is thus essential to avert significant incidents impacting both service quality and end users. In this complex landscape, observability platforms that provide deep insights into system behavior and help identify performance bottlenecks are not just helpful -- they are essential for maintaining reliability, ensuring optimal performance, and quickly resolving issues in production. The ability to reason a bout these systems in real - time is critical to ensuring the scalability and stability of modern services. To aid in these investigations, observability platforms that collect various telemetry data t o inform about application behavior and its underlying infrastructure are getting popular .

instana, probability, request type, (15 more...)

2502.1824

Country: Europe > Switzerland > Zürich > Zürich (0.04)

Genre: Research Report (0.81)

Industry: Information Technology > Services (0.34)

Technology:

Information Technology > Cloud Computing (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Diagnosis (0.93)
Information Technology > Artificial Intelligence > Natural Language (0.67)
(3 more...)

Klioui, Sifal, Sellami, Sana, Trardi, Youssef

Patient Trajectory Prediction: Integrating Clinical Notes with Transformers

Keywords: Trajectory prediction, Transformers, Knowledge integration, Deep learning Abstract: Predicting disease trajectories from electronic health records (EHRs) is a complex task due to major challenges such as data non-stationarity, high granularity of medical codes, and integration of multimodal data. EHRs contain both structured data, such as diagnostic codes, and unstructured data, such as clinical notes, which hold essential information often overlooked. Current models, primarily based on structured data, struggle to capture the complete medical context of patients, resulting in a loss of valuable information. To address this issue, we propose an approach that integrates unstructured clinical notes into transformer-based deep learning models for sequential disease prediction. Experiments on MIMIC-IV datasets demonstrate that the proposed approach outperforms traditional models relying solely on structured data. 1 INTRODUCTION In healthcare, the exponential growth of Electronic Health Records (EHRs) has revolutionized patient care while posing new challenges. Healthcare professionals now frequently interact with medical records spanning several decades, having to process and analyze this vast amount of information to make informed decisions about patients' future health status. This evolution has accelerated the development of automated systems to predict future diagnoses from past medical data, thus becoming a key element of personalized and proactive medicine (Figure 1). Machine learning techniques, particularly deep learning, have seen increasing growth in medicine (Egger et al., 2022), thanks to their adaptability and good results.

clinical note, information, representation, (15 more...)

2502.18009

Country:

Europe > France > Provence-Alpes-Côte d'Azur > Bouches-du-Rhône > Marseille (0.05)
Europe > Belgium > Brussels-Capital Region > Brussels (0.04)
Asia > Singapore (0.04)
Asia > Indonesia > Bali (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Health & Medicine > Health Care Technology > Medical Record (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Jeong, So Won, Rockova, Veronika

From Small to Large Language Models: Revisiting the Federalist Papers

arXiv.org Machine LearningFeb-25-2025

For a long time, the authorship of the Federalist Papers had been a subject of inquiry and debate, not only by linguists and historians but also by statisticians. In what was arguably the first Bayesian case study, Mosteller and Wallace (1963) provided the first statistical evidence for attributing all disputed papers to Madison. Our paper revisits this historical dataset but from a lens of modern language models, both small and large. We review some of the more popular Large Language Model (LLM) tools and examine them from a statistical point of view in the context of text classification. We investigate whether, without any attempt to fine-tune, the general embedding constructs can be useful for stylometry and attribution. We explain differences between various word/phrase embeddings and discuss how to aggregate them in a document. Contrary to our expectations, we exemplify that dimension expansion with word embeddings may not always be beneficial for attribution relative to dimension reduction with topic embeddings. Our experiments demonstrate that default LLM embeddings (even after manual fine-tuning) may not consistently improve authorship attribution accuracy. Instead, Bayesian analysis with topic embeddings trained on ``function words" yields superior out-of-sample classification performance. This suggests that traditional (small) statistical language models, with their interpretability and solid theoretical foundation, can offer significant advantages in authorship attribution tasks. The code used in this analysis is available at github.com/sowonjeong/slm-to-llm

language model, probability, representation, (14 more...)

arXiv.org Machine Learning

2503.01869

Country:

North America > United States > Illinois > Cook County > Chicago (0.04)
North America > United States > New Mexico > Santa Fe County > Santa Fe (0.04)
Asia > Middle East > Jordan (0.04)
Asia > India > Bihar > Patna (0.04)

Genre:

Research Report > New Finding (1.00)
Overview (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.67)
(3 more...)

arXiv.org Machine LearningFeb-24-2025

On Quantile Regression Forests for Modelling Mixed-Frequency and Longitudinal Data

Andreani, Mila

The aim of this thesis is to extend the applications of the Quantile Regression Forest (QRF) algorithm to handle mixed-frequency and longitudinal data. To this end, standard statistical approaches have been exploited to build two novel algorithms: the Mixed- Frequency Quantile Regression Forest (MIDAS-QRF) and the Finite Mixture Quantile Regression Forest (FM-QRF). The MIDAS-QRF combines the flexibility of QRF with the Mixed Data Sampling (MIDAS) approach, enabling non-parametric quantile estimation with variables observed at different frequencies. FM-QRF, on the other hand, extends random effects machine learning algorithms to a QR framework, allowing for conditional quantile estimation in a longitudinal data setting. The contributions of this dissertation lie both methodologically and empirically. Methodologically, the MIDAS-QRF and the FM-QRF represent two novel approaches for handling mixed-frequency and longitudinal data in QR machine learning framework. Empirically, the application of the proposed models in financial risk management and climate-change impact evaluation demonstrates their validity as accurate and flexible models to be applied in complex empirical settings.

artificial intelligence, covariate, machine learning, (14 more...)

arXiv.org Machine Learning

2502.17137

Country:

Europe > United Kingdom (0.46)
Asia (0.45)
North America > United States > Texas (0.13)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Energy > Oil & Gas > Upstream (1.00)
Energy > Oil & Gas > Trading (1.00)
Banking & Finance > Trading (1.00)
(4 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.45)

arXiv.org Artificial IntelligenceFeb-24-2025

Genetics-Driven Personalized Disease Progression Model

Yang, Haoyu, Dey, Sanjoy, Meyer, Pablo

Modeling disease progression through multiple stages is critical for clinical decision-making for chronic diseases, e.g., cancer, diabetes, chronic kidney diseases, and so on. Existing approaches often model the disease progression as a uniform trajectory pattern at the population level. However, chronic diseases are highly heterogeneous and often have multiple progression patterns depending on a patient's individual genetics and environmental effects due to lifestyles. We propose a personalized disease progression model to jointly learn the heterogeneous progression patterns and groups of genetic profiles. In particular, an end-to-end pipeline is designed to simultaneously infer the characteristics of patients from genetic markers using a variational autoencoder and how it drives the disease progressions using an RNN-based state-space model based on clinical observations. Our proposed model shows improvement on real-world and synthetic clinical data.

clinical observation, progression, progression model, (14 more...)

2503.00028

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > Minnesota (0.04)
Europe > Denmark (0.04)
(2 more...)

Genre: Research Report > Experimental Study (0.66)

Industry:

Health & Medicine > Therapeutic Area > Nephrology (1.00)
Health & Medicine > Epidemiology (1.00)
Government > Regional Government > North America Government > United States Government (0.93)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Biomedical Informatics (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(3 more...)