AITopics

Activity schedules, which represent the activities and associated travel behaviours of individuals, are a core component of many applied models in the transport, energy and epidemiology domains. Our data driven approach learns human preferences and scheduling logic without the need for complex interacting combinations of sub-models and custom-rules, this makes our approach significantly faster and simpler to operate that existing approaches. We find activity schedule data combines aspects of both continuous image data and also discrete text data, requiring novel approaches. We additionally contribute a novel schedule representation and comprehensive evaluation framework for generated schedules. Evaluation shows our approach is able to rapidly generate large, diverse and realistic synthetic samples of activity schedules.

activity schedule, artificial intelligence, machine learning, (18 more...)

2501.10221

Country:

Europe > United Kingdom (0.14)
Europe > France > Hauts-de-France > Nord > Lille (0.04)
North America > United States > Texas (0.04)

Genre: Research Report (0.84)

Industry: Health & Medicine (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

A Survey on LLM Test-Time Compute via Search: Tasks, LLM Profiling, Search Algorithms, and Relevant Frameworks

Li, Xinzhe

LLM test-time compute (or LLM inference) via search has emerged as a promising research area with rapid developments. However, current frameworks often adopt distinct perspectives on three key aspects (task definition, LLM profiling, and search procedures), making direct comparisons challenging. Moreover, the search algorithms employed often diverge from standard implementations, and their specific characteristics are not thoroughly specified. In this survey, we provide a comprehensive technical review that unifies task definitions and provides modular definitions of LLM profiling and search procedures. The definitions enable precise comparisons of various LLM inference frameworks while highlighting their departures from conventional search algorithms. We also discuss the applicability, performance, and efficiency of these methods. For further details and ongoing updates, please refer to our GitHub repository: https://github.com/xinzhel/LLM-Agent-Survey/blob/main/search.md

large language model, machine learning, natural language, (19 more...)

2501.10069

Country:

North America > Canada > Ontario > Toronto (0.04)
Europe > Italy > Tuscany > Florence (0.04)
Europe > Germany > Berlin (0.04)
(5 more...)

Genre:

Workflow (1.00)
Research Report (1.00)
Overview (1.00)

Industry:

Leisure & Entertainment > Games (1.00)
Education (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Li, Chenhao, Krause, Andreas, Hutter, Marco

Robotic World Model: A Neural Network Simulator for Robust Policy Optimization in Robotics

Learning robust and generalizable world models is crucial for enabling efficient and scalable robotic control in real-world environments. In this work, we introduce a novel framework for learning world models that accurately capture complex, partially observable, and stochastic dynamics. The proposed method employs a dual-autoregressive mechanism and self-supervised training to achieve reliable long-horizon predictions without relying on domain-specific inductive biases, ensuring adaptability across diverse robotic tasks. We further propose a policy optimization framework that leverages world models for efficient training in imagined environments and seamless deployment in real-world systems. Through extensive experiments, our approach consistently outperforms state-of-the-art methods, demonstrating superior autoregressive prediction accuracy, robustness to noise, and generalization across manipulation and locomotion tasks. Notably, policies trained with our method are successfully deployed on ANYmal D hardware in a zero-shot transfer, achieving robust performance with minimal sim-to-real performance loss. This work advances model-based reinforcement learning by addressing the challenges of long-horizon prediction, error accumulation, and sim-to-real transfer. By providing a scalable and robust framework, the introduced methods pave the way for adaptive and efficient robotic systems in real-world applications.

machine learning, prediction, reinforcement learning, (16 more...)

2501.101

Country: Europe > Switzerland > Zürich > Zürich (0.14)

Genre: Research Report > Promising Solution (0.87)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
(3 more...)

ColorGrid: A Multi-Agent Non-Stationary Environment for Goal Inference and Assistance

Risukhin, Andrey, Rao, Kavel, Caffee, Ben, Fan, Alan

Autonomous agents' interactions with humans are increasingly focused on adapting to their changing preferences in order to improve assistance in real-world tasks. Effective agents must learn to accurately infer human goals, which are often hidden, to collaborate well. However, existing Multi-Agent Reinforcement Learning (MARL) environments lack the necessary attributes required to rigorously evaluate these agents' learning capabilities. To this end, we introduce ColorGrid, a novel MARL environment with customizable non-stationarity, asymmetry, and reward structure. We investigate the performance of Independent Proximal Policy Optimization (IPPO), a state-of-the-art (SOTA) MARL algorithm, in ColorGrid and find through extensive ablations that, particularly with simultaneous non-stationary and asymmetric goals between a ``leader'' agent representing a human and a ``follower'' assistant agent, ColorGrid is unsolved by IPPO. To support benchmarking future MARL algorithms, we release our environment code, model checkpoints, and trajectory visualizations at https://github.com/andreyrisukhin/ColorGrid.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

2501.10593

Country: North America > United States > Georgia > Fulton County > Atlanta (0.04)

Genre: Research Report (0.83)

Industry:

Education (0.47)
Leisure & Entertainment (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.68)

Rozada, Sergio, Orejuela, Jose Luis, Marques, Antonio G.

Solving Finite-Horizon MDPs via Low-Rank Tensors

We study the problem of learning optimal policies in finite-horizon Markov Decision Processes (MDPs) using low-rank reinforcement learning (RL) methods. In finite-horizon MDPs, the policies, and therefore the value functions (VFs) are not stationary. This aggravates the challenges of high-dimensional MDPs, as they suffer from the curse of dimensionality and high sample complexity. To address these issues, we propose modeling the VFs of finite-horizon MDPs as low-rank tensors, enabling a scalable representation that renders the problem of learning optimal policies tractable. We introduce an optimization-based framework for solving the Bellman equations with low-rank constraints, along with block-coordinate descent (BCD) and block-coordinate gradient descent (BCGD) algorithms, both with theoretical convergence guarantees. For scenarios where the system dynamics are unknown, we adapt the proposed BCGD method to estimate the VFs using sampled trajectories. Numerical experiments further demonstrate that the proposed framework reduces computational demands in controlled synthetic scenarios and more realistic resource allocation problems.

machine learning, reinforcement learning, tensor, (19 more...)

2501.10598

Country:

Europe > Spain > Galicia > Madrid (0.04)
Asia > Middle East > Republic of Türkiye > Karaman Province > Karaman (0.04)
North America > United States (0.04)
Africa > Senegal > Kolda Region > Kolda (0.04)

Genre: Research Report (0.64)

Industry: Energy (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Neural Information Processing SystemsJan-16-2025, 22:58:11 GMT

Masked Prediction: A Parameter Identifiability View

The vast majority of work in self-supervised learning have focused on assessing recovered features by a chosen set of downstream tasks. While there are several commonly used benchmark datasets, this lens of feature learning requires assumptions on the downstream tasks which are not inherent to the data distribution itself. In this paper, we present an alternative lens, one of parameter identifiability: assuming data comes from a parametric probabilistic model, we train a self-supervised learning predictor with a suitable parametric form, and ask whether the parameters of the optimal predictor can be used to extract the parameters of the ground truth generative model.Specifically, we focus on latent-variable models capturing sequential structures, namely Hidden Markov Models with both discrete and conditionally Gaussian observations. We focus on masked prediction as the self-supervised learning task and study the optimal masked predictor. We show that parameter identifiability is governed by the task difficulty, which is determined by the choice of data model and the amount of tokens to predict.

masked prediction, parameter identifiability view, predictor, (1 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Donato, Enrico, Preti, Matteo Lo, Beccai, Lucia, Falotico, Egidio

Sensorimotor Control Strategies for Tactile Robotics

arXiv.org Artificial IntelligenceJan-16-2025

Physical contacts are at the base of each embodied interaction. As for living beings, also robots continuously establish diverse contacts to fulfill their tasks. Over the last decades, one of the bold goals of robotics research has been to provide artificial agents with dexterity and adaptability - typical of biological systems - while interacting with their surroundings. Despite the huge work and the excellent outputs in this field, such capabilities still require hard refinements and studies to be fully delivered on our robots. The scientific contribution to this objective builds upon three pillars: the design of an appropriate embodiment - concerning its morphology, actuation strategy, and sensing technology; feature extraction algorithms from tactile signals to build a perception model of the experience; closed-loop robot control strategies that drive robot decisions according to either raw tactile feedback or perceptual representations.

artificial intelligence, machine learning, sensor, (19 more...)

2501.09468

Country:

Europe (0.28)
Asia > Singapore (0.14)

Genre: Research Report (0.50)

Industry:

Health & Medicine (0.93)
Energy > Oil & Gas > Upstream (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots > Manipulation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.93)

arXiv.org Artificial IntelligenceJan-16-2025

NS-Gym: Open-Source Simulation Environments and Benchmarks for Non-Stationary Markov Decision Processes

Keplinger, Nathaniel S., Luo, Baiting, Bektas, Iliyas, Zhang, Yunuo, Wray, Kyle Hollins, Laszka, Aron, Dubey, Abhishek, Mukhopadhyay, Ayan

In many real-world applications, agents must make sequential decisions in environments where conditions are subject to change due to various exogenous factors. These non-stationary environments pose significant challenges to traditional decision-making models, which typically assume stationary dynamics. Non-stationary Markov decision processes (NS-MDPs) offer a framework to model and solve decision problems under such changing conditions. However, the lack of standardized benchmarks and simulation tools has hindered systematic evaluation and advance in this field. We present NS-Gym, the first simulation toolkit designed explicitly for NS-MDPs, integrated within the popular Gymnasium framework. In NS-Gym, we segregate the evolution of the environmental parameters that characterize non-stationarity from the agent's decision-making module, allowing for modular and flexible adaptations to dynamic environments. We review prior work in this domain and present a toolkit encapsulating key problem characteristics and types in NS-MDPs. This toolkit is the first effort to develop a set of standardized interfaces and benchmark problems to enable consistent and reproducible evaluation of algorithms under non-stationary conditions. We also benchmark six algorithmic approaches from prior work on NS-MDPs using NS-Gym. Our vision is that NS-Gym will enable researchers to assess the adaptability and robustness of their decision-making algorithms to non-stationary conditions.

agent, experiment, pamct, (15 more...)

2501.09646

Country:

North America > United States > Massachusetts > Hampshire County > Amherst (0.14)
North America > United States > Tennessee > Davidson County > Nashville (0.04)
North America > United States > Pennsylvania > Centre County > University Park (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Workflow (0.93)

Industry: Leisure & Entertainment > Games > Computer Games (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.94)

arXiv.org Machine LearningJan-16-2025

Reward-Guided Controlled Generation for Inference-Time Alignment in Diffusion Models: Tutorial and Review

Uehara, Masatoshi, Zhao, Yulai, Wang, Chenyu, Li, Xiner, Regev, Aviv, Levine, Sergey, Biancalani, Tommaso

This tutorial provides an in-depth guide on inference-time guidance and alignment methods for optimizing downstream reward functions in diffusion models. While diffusion models are renowned for their generative modeling capabilities, practical applications in fields such as biology often require sample generation that maximizes specific metrics (e.g., stability, affinity in proteins, closeness to target structures). In these scenarios, diffusion models can be adapted not only to generate realistic samples but also to explicitly maximize desired measures at inference time without fine-tuning. This tutorial explores the foundational aspects of such inference-time algorithms. We review these methods from a unified perspective, demonstrating that current techniques -- such as Sequential Monte Carlo (SMC)-based guidance, value-based sampling, and classifier guidance -- aim to approximate soft optimal denoising processes (a.k.a. policies in RL) that combine pre-trained denoising processes with value functions serving as look-ahead functions that predict from intermediate states to terminal rewards. Within this framework, we present several novel algorithms not yet covered in the literature. Furthermore, we discuss (1) fine-tuning methods combined with inference-time techniques, (2) inference-time algorithms based on search algorithms such as Monte Carlo tree search, which have received limited attention in current research, and (3) connections between inference-time algorithms in language models and diffusion models. The code of this tutorial on protein design is available at https://github.com/masa-ue/AlignInversePro

arxiv preprint arxiv, diffusion model, value function, (13 more...)

arXiv.org Machine Learning

2501.09685

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > Texas (0.04)
Asia > Middle East > Jordan (0.04)
Asia > Japan (0.04)

Genre: Instructional Material > Course Syllabus & Notes (1.00)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Education (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.92)
(3 more...)

Neural Information Processing SystemsJan-15-2025, 16:35:49 GMT

Spectral Learning of Large Structured HMMs for Comparative Epigenomics

We develop a latent variable model and an efficient spectral algorithm motivated by the recent emergence of very large data sets of chromatin marks from multiple human cell types. A natural model for chromatin data in one cell type is a Hidden Markov Model (HMM); we model the relationship between multiple cell types by connecting their hidden states by a fixed tree of known structure. The main challenge with learning parameters of such models is that iterative methods such as EM are very slow, while naive spectral methods result in time and space complexity exponential in the number of cell types. We exploit properties of the tree structure of the hidden states to provide spectral algorithms that are more computationally efficient for current biological datasets. We provide sample complexity bounds for our algorithm and evaluate it experimentally on biological data from nine human cell types.

cell type, comparative epigenomic, spectral learning, (3 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)