Markov Models
"Generative Models for Financial Time Series Data: Enhancing Signal-to-Noise Ratio and Addressing Data Scarcity in A-Share Market
The financial industry is increasingly seeking robust methods to address the challenges posed by data scarcity and low signal-to-noise ratios, which limit the application of deep learning techniques in stock market analysis. This paper presents two innovative generative model-based approaches to synthesize stock data, specifically tailored for different scenarios within the A-share market in China. The first method, a sector-based synthesis approach, enhances the signal-to-noise ratio of stock data by classifying the characteristics of stocks from various sectors in China's A-share market. This method employs an Approximate Non-Local Total Variation algorithm to smooth the generated data, a bandpass filtering method based on Fourier Transform to eliminate noise, and Denoising Diffusion Implicit Models to accelerate sampling speed. The second method, a recursive stock data synthesis approach based on pattern recognition, is designed to synthesize data for stocks with short listing periods and limited comparable companies. It leverages pattern recognition techniques and Markov models to learn and generate variable-length stock sequences, while introducing a sub-time-level data augmentation method to alleviate data scarcity issues.We validate the effectiveness of these methods through extensive experiments on various datasets, including those from the main board, STAR Market, Growth Enterprise Market Board, Beijing Stock Exchange, NASDAQ, NYSE, and AMEX. The results demonstrate that our synthesized data not only improve the performance of predictive models but also enhance the signal-to-noise ratio of individual stock signals in price trading strategies. Furthermore, the introduction of sub-time-level data significantly improves the quality of synthesized data.
Bridging the Gap: A Decade Review of Time-Series Clustering Methods
Paparrizos, John, Yang, Fan, Li, Haojun
Time series, as one of the most fundamental representations of sequential data, has been extensively studied across diverse disciplines, including computer science, biology, geology, astronomy, and environmental sciences. The advent of advanced sensing, storage, and networking technologies has resulted in high-dimensional time-series data, however, posing significant challenges for analyzing latent structures over extended temporal scales. Time-series clustering, an established unsupervised learning strategy that groups similar time series together, helps unveil hidden patterns in these complex datasets. In this survey, we trace the evolution of time-series clustering methods from classical approaches to recent advances in neural networks. While previous surveys have focused on specific methodological categories, we bridge the gap between traditional clustering methods and emerging deep learning-based algorithms, presenting a comprehensive, unified taxonomy for this research area. This survey highlights key developments and provides insights to guide future research in time-series clustering.
A Survey on Time-Series Distance Measures
Paparrizos, John, Li, Haojun, Yang, Fan, Wu, Kaize, d'Hondt, Jens E., Papapetrou, Odysseas
Distance measures have been recognized as one of the fundamental building blocks in time-series analysis tasks, e.g., querying, indexing, classification, clustering, anomaly detection, and similarity search. The vast proliferation of time-series data across a wide range of fields has increased the relevance of evaluating the effectiveness and efficiency of these distance measures. To provide a comprehensive view of this field, this work considers over 100 state-of-the-art distance measures, classified into 7 categories: lock-step measures, sliding measures, elastic measures, kernel measures, feature-based measures, model-based measures, and embedding measures. Beyond providing comprehensive mathematical frameworks, this work also delves into the distinctions and applications across these categories for both univariate and multivariate cases. By providing comprehensive collections and insights, this study paves the way for the future development of innovative time-series distance measures.
The intrinsic motivation of reinforcement and imitation learning for sequential tasks
This work in the field of developmental cognitive robotics aims to devise a new domain bridging between reinforcement learning and imitation learning, with a model of the intrinsic motivation for learning agents to learn with guidance from tutors multiple tasks, including sequential tasks. The main contribution has been to propose a common formulation of intrinsic motivation based on empirical progress for a learning agent to choose automatically its learning curriculum by actively choosing its learning strategy for simple or sequential tasks: which task to learn, between autonomous exploration or imitation learning, between low-level actions or task decomposition, between several tutors. The originality is to design a learner that benefits not only passively from data provided by tutors, but to actively choose when to request tutoring and what and whom to ask. The learner is thus more robust to the quality of the tutoring and learns faster with fewer demonstrations. We developed the framework of socially guided intrinsic motivation with machine learning algorithms to learn multiple tasks by taking advantage of the generalisability properties of human demonstrations in a passive manner or in an active manner through requests of demonstrations from the best tutor for simple and composing subtasks. The latter relies on a representation of subtask composition proposed for a construction process, which should be refined by representations used for observational processes of analysing human movements and activities of daily living. With the outlook of a language-like communication with the tutor, we investigated the emergence of a symbolic representation of the continuous sensorimotor space and of tasks using intrinsic motivation. We proposed within the reinforcement learning framework, a reward function for interacting with tutors for automatic curriculum learning in multi-task learning.
Competition Dynamics Shape Algorithmic Phases of In-Context Learning
Park, Core Francisco, Lubana, Ekdeep Singh, Pres, Itamar, Tanaka, Hidenori
In-Context Learning (ICL) has significantly expanded the general-purpose nature of large language models, allowing them to adapt to novel tasks using merely the inputted context. This has motivated a series of papers that analyze tractable synthetic domains and postulate precise mechanisms that may underlie ICL. However, the use of relatively distinct setups that often lack a sequence modeling nature to them makes it unclear how general the reported insights from such studies are. Motivated by this, we propose a synthetic sequence modeling task that involves learning to simulate a finite mixture of Markov chains. As we show, models trained on this task reproduce most well-known results on ICL, hence offering a unified setting for studying the concept. Building on this setup, we demonstrate we can explain a model's behavior by decomposing it into four broad algorithms that combine a fuzzy retrieval vs. inference approach with either unigram or bigram statistics of the context. These algorithms engage in a competition dynamics to dominate model behavior, with the precise experimental conditions dictating which algorithm ends up superseding others: e.g., we find merely varying context size or amount of training yields (at times sharp) transitions between which algorithm dictates the model behavior, revealing a mechanism that explains the transient nature of ICL. In this sense, we argue ICL is best thought of as a mixture of different algorithms, each with its own peculiarities, instead of a monolithic capability. This also implies that making general claims about ICL that hold universally across all settings may be infeasible.
Towards General Purpose Robots at Scale: Lifelong Learning and Learning to Use Memory
The widespread success of artificial intelligence in fields like natural language processing and computer vision has not yet fully transferred to robotics, where progress is hindered by the lack of large-scale training data and the complexity of real-world tasks. To address this, many robot learning researchers are pushing to get robots deployed at scale in everyday unstructured environments like our homes to initiate a data flywheel. While current robot learning systems are effective for certain short-horizon tasks, they are not designed to autonomously operate over long time horizons in unstructured environments. This thesis focuses on addressing two key challenges for robots operating over long time horizons: memory and lifelong learning. We propose two novel methods to advance these capabilities. First, we introduce t-DGR, a trajectory-based deep generative replay method that achieves state-of-the-art performance on Continual World benchmarks, advancing lifelong learning. Second, we develop a framework that leverages human demonstrations to teach agents effective memory utilization, improving learning efficiency and success rates on Memory Gym tasks. Finally, we discuss future directions for achieving the lifelong learning and memory capabilities necessary for robots to function at scale in real-world settings.
Reinforcement Learning Driven Multi-Robot Exploration via Explicit Communication and Density-Based Frontier Search
Calzolari, Gabriele, Sumathy, Vidya, Kanellakis, Christoforos, Nikolakopoulos, George
Collaborative multi-agent exploration of unknown environments is crucial for search and rescue operations. Effective real-world deployment must address challenges such as limited inter-agent communication and static and dynamic obstacles. This paper introduces a novel decentralized collaborative framework based on Reinforcement Learning to enhance multi-agent exploration in unknown environments. Our approach enables agents to decide their next action using an agent-centered field-of-view occupancy grid, and features extracted from $\text{A}^*$ algorithm-based trajectories to frontiers in the reconstructed global map. Furthermore, we propose a constrained communication scheme that enables agents to share their environmental knowledge efficiently, minimizing exploration redundancy. The decentralized nature of our framework ensures that each agent operates autonomously, while contributing to a collective exploration mission. Extensive simulations in Gymnasium and real-world experiments demonstrate the robustness and effectiveness of our system, while all the results highlight the benefits of combining autonomous exploration with inter-agent map sharing, advancing the development of scalable and resilient robotic exploration systems.
Bidding Games on Markov Decision Processes with Quantitative Reachability Objectives
Avni, Guy, Kureฤka, Martin, Mallik, Kaushik, Novotnรฝ, Petr, Sadhukhan, Suman
Graph games are fundamental in strategic reasoning of multi-agent systems and their environments. We study a new family of graph games which combine stochastic environmental uncertainties and auction-based interactions among the agents, formalized as bidding games on (finite) Markov decision processes (MDP). Normally, on MDPs, a single decision-maker chooses a sequence of actions, producing a probability distribution over infinite paths. In bidding games on MDPs, two players -- called the reachability and safety players -- bid for the privilege of choosing the next action at each step. The reachability player's goal is to maximize the probability of reaching a target vertex, whereas the safety player's goal is to minimize it. These games generalize traditional bidding games on graphs, and the existing analysis techniques do not extend. For instance, the central property of traditional bidding games is the existence of a threshold budget, which is a necessary and sufficient budget to guarantee winning for the reachability player. For MDPs, the threshold becomes a relation between the budgets and probabilities of reaching the target. We devise value-iteration algorithms that approximate thresholds and optimal policies for general MDPs, and compute the exact solutions for acyclic MDPs, and show that finding thresholds is at least as hard as solving simple-stochastic games.
Learning Human-Aware Robot Policies for Adaptive Assistance
Qin, Jason, Ban, Shikun, Zhu, Wentao, Wang, Yizhou, Samaras, Dimitris
Learning Human-A ware Robot Policies for Adaptive Assistance Jason Qin 1, Shikun Ban 2, Wentao Zhu 2, Yizhou Wang 2, and Dimitris Samaras 1 Abstract -- Developing robots that can assist humans efficiently, safely, and adaptively is crucial for real-world applications such as healthcare. While previous work often assumes a centralized system for co-optimizing human-robot interactions, we argue that real-world scenarios are much more complicated, as humans have individual preferences regarding how tasks are performed. However, to provide effective assistance, robots must still be able to recognize and adapt to the individual needs and preferences of different users. T o address these challenges, we propose a novel framework in which robots infer human intentions and reason about human utilities through interaction. Our approach features two critical modules: the anticipation module is a motion predictor that captures the spatial-temporal relationship between the robot agent and user agent, which contributes to predicting human behavior; the utility module infers the underlying human utility functions through progressive task demonstration sampling. Extensive experiments across various robot types and assistive tasks demonstrate that the proposed framework not only enhances task success and efficiency but also significantly improves user satisfaction, paving the way for more personalized and adaptive assistive robotic systems. Code and demos are available at https: //asonin.github.io/Human-Aware-Assistance/ . I. I NTRODUCTION Developing robots that understand and assist humans is a critical long-term goal in Artificial Intelligence (AI) research.
Efficient and Scalable Deep Reinforcement Learning for Mean Field Control Games
Mean Field Control Games (MFCGs) provide a powerful theoretical framework for analyzing systems of infinitely many interacting agents, blending elements from Mean Field Games (MFGs) and Mean Field Control (MFC). However, solving the coupled Hamilton-Jacobi-Bellman and Fokker-Planck equations that characterize MFCG equilibria remains a significant computational challenge, particularly in high-dimensional or complex environments. This paper presents a scalable deep Reinforcement Learning (RL) approach to approximate equilibrium solutions of MFCGs. Building on previous works, We reformulate the infinite-agent stochastic control problem as a Markov Decision Process, where each representative agent interacts with the evolving mean field distribution. We use the actor-critic based algorithm from a previous paper (Angiuli et.al., 2024) as the baseline and propose several versions of more scalable and efficient algorithms, utilizing techniques including parallel sample collection (batching); mini-batching; target network; proximal policy optimization (PPO); generalized advantage estimation (GAE); and entropy regularization. By leveraging these techniques, we effectively improved the efficiency, scalability, and training stability of the baseline algorithm. We evaluate our method on a linear-quadratic benchmark problem, where an analytical solution to the MFCG equilibrium is available. Our results show that some versions of our proposed approach achieve faster convergence and closely approximate the theoretical optimum, outperforming the baseline algorithm by an order of magnitude in sample efficiency. Our work lays the foundation for adapting deep RL to solve more complicated MFCGs closely related to real life, such as large-scale autonomous transportation systems, multi-firm economic competition, and inter-bank borrowing problems.