Markov Models
Probabilistic Formal Modelling to Uncover and Interpret Interaction Styles
Andrei, Oana, Calder, Muffy, Chalmers, Matthew, Morrison, Alistair
We present a study using new computational methods, based on a novel combination of machine learning for inferring admixture hidden Markov models and probabilistic model checking, to uncover interaction styles in a mobile app. These styles are then used to inform a redesign, which is implemented, deployed, and then analysed using the same methods. The data sets are logged user traces, collected over two six-month deployments of each version, involving thousands of users and segmented into different time intervals. The methods do not assume tasks or absolute metrics such as measures of engagement, but uncover the styles through unsupervised inference of clusters and analysis with probabilistic temporal logic. For both versions there was a clear distinction between the styles adopted by users during the first day/week/month of usage, and during the second and third months, a result we had not anticipated.
Exposure-Based Multi-Agent Inspection of a Tumbling Target Using Deep Reinforcement Learning
Aurand, Joshua, Cutlip, Steven, Lei, Henry, Lang, Kendra, Phillips, Sean
As space becomes more congested, on orbit inspection is an increasingly relevant activity whether to observe a defunct satellite for planning repairs or to de-orbit it. However, the task of on orbit inspection itself is challenging, typically requiring the careful coordination of multiple observer satellites. This is complicated by a highly nonlinear environment where the target may be unknown or moving unpredictably without time for continuous command and control from the ground. There is a need for autonomous, robust, decentralized solutions to the inspection task. To achieve this, we consider a hierarchical, learned approach for the decentralized planning of multi-agent inspection of a tumbling target. Our solution consists of two components: a viewpoint or high-level planner trained using deep reinforcement learning and a navigation planner handling point-to-point navigation between pre-specified viewpoints. We present a novel problem formulation and methodology that is suitable not only to reinforcement learning-derived robust policies, but extendable to unknown target geometries and higher fidelity information theoretic objectives received directly from sensor inputs. Operating under limited information, our trained multi-agent high-level policies successfully contextualize information within the global hierarchical environment and are correspondingly able to inspect over 90% of non-convex tumbling targets, even in the absence of additional agent attitude control.
Efficient Sensitivity Analysis for Parametric Robust Markov Chains
Badings, Thom, Junges, Sebastian, Marandi, Ahmadreza, Topcu, Ufuk, Jansen, Nils
We provide a novel method for sensitivity analysis of parametric robust Markov chains. These models incorporate parameters and sets of probability distributions to alleviate the often unrealistic assumption that precise probabilities are available. We measure sensitivity in terms of partial derivatives with respect to the uncertain transition probabilities regarding measures such as the expected reward. As our main contribution, we present an efficient method to compute these partial derivatives. To scale our approach to models with thousands of parameters, we present an extension of this method that selects the subset of $k$ parameters with the highest partial derivative. Our methods are based on linear programming and differentiating these programs around a given value for the parameters. The experiments show the applicability of our approach on models with over a million states and thousands of parameters. Moreover, we embed the results within an iterative learning scheme that profits from having access to a dedicated sensitivity analysis.
Offline RL for Natural Language Generation with Implicit Language Q Learning
Snell, Charlie, Kostrikov, Ilya, Su, Yi, Yang, Mengjiao, Levine, Sergey
Left: an abstract depiction of an MDP where single-step RL fails to discover the optimal policy. Right: A notional illustrative example where we might expect full "multi-step" RL methods (such as ILQL) to perform significantly better than "single-step" methods. In this example, good utterances tend to start with "The movie was...", while bad utterances start with "The movie wasn't..." However, the very best examples also start with "The movie wasn't...", requiring multi-step planning or multiple steps of policy improvement to derive effective strategies. Methods that implement just a single step of policy improvement will fail to produce maximally positive sentiment outputs. While this example may appear somewhat contrived, we see in our experiments that multi-step RL methods do lead to improvements in a number of more real settings. Since ILQL performs multiple steps of policy improvement, it can significantly improve over Monte Carlo estimators or single-step RL when the underlying data is sub-optimal. One example corresponds to the notional task in Figure 4, in which the optimal sequence of actions requires traversing a state that's also frequented by sub-optimal examples. In this case, single-step RL will learn to take actions that appear safer according to the dataset -- such as the transition "The movie" "was" in Figure 4 -- whereas full ("multi-step") RL methods would recover the optimal policy. We demonstrate this empirically on the Wordle game below.
Explanation through Reward Model Reconciliation using POMDP Tree Search
Kraske, Benjamin D., Saksena, Anshu, Buczak, Anna L., Sunberg, Zachary N.
As artificial intelligence (AI) algorithms are increasingly used in mission-critical applications, promoting user-trust of these systems will be essential to their success. Ensuring users understand the models over which algorithms reason promotes user trust. This work seeks to reconcile differences between the reward model that an algorithm uses for online partially observable Markov decision (POMDP) planning and the implicit reward model assumed by a human user. Action discrepancies, differences in decisions made by an algorithm and user, are leveraged to estimate a user's objectives as expressed in weightings of a reward function.
Exploring Segmentation Approaches for Neural Machine Translation of Code-Switched Egyptian Arabic-English Text
Gaser, Marwa, Mager, Manuel, Hamed, Injy, Habash, Nizar, Abdennadher, Slim, Vu, Ngoc Thang
Data sparsity is one of the main challenges posed by code-switching (CS), which is further exacerbated in the case of morphologically rich languages. For the task of machine translation (MT), morphological segmentation has proven successful in alleviating data sparsity in monolingual contexts; however, it has not been investigated for CS settings. In this paper, we study the effectiveness of different segmentation approaches on MT performance, covering morphology-based and frequency-based segmentation techniques. We experiment on MT from code-switched Arabic-English to English. We provide detailed analysis, examining a variety of conditions, such as data size and sentences with different degrees of CS. Empirical results show that morphology-aware segmenters perform the best in segmentation tasks but under-perform in MT. Nevertheless, we find that the choice of the segmentation setup to use for MT is highly dependent on the data size. For extreme low-resource scenarios, a combination of frequency and morphology-based segmentations is shown to perform the best. For more resourced settings, such a combination does not bring significant improvements over the use of frequency-based segmentation.
SRL-Assisted AFM: Generating Planar Unstructured Quadrilateral Meshes with Supervised and Reinforcement Learning-Assisted Advancing Front Method
Tong, Hua, Qian, Kuanren, Halilaj, Eni, Zhang, Yongjie Jessica
High-quality mesh generation is the foundation of accurate finite element analysis. Due to the vast interior vertices search space and complex initial boundaries, mesh generation for complicated domains requires substantial manual processing and has long been considered the most challenging and time-consuming bottleneck of the entire modeling and analysis process. In this paper, we present a novel computational framework named ``SRL-assisted AFM" for meshing planar geometries by combining the advancing front method with neural networks that select reference vertices and update the front boundary using ``policy networks." These deep neural networks are trained using a unique pipeline that combines supervised learning with reinforcement learning to iteratively improve mesh quality. First, we generate different initial boundaries by randomly sampling points in a square domain and connecting them sequentially. These boundaries are used for obtaining input meshes and extracting training datasets in the supervised learning module. We then iteratively improve the reinforcement learning model performance with reward functions designed for special requirements, such as improving the mesh quality and controlling the number and distribution of extraordinary points. Our proposed supervised learning neural networks achieve an accuracy higher than 98% on predicting commercial software. The final reinforcement learning neural networks automatically generate high-quality quadrilateral meshes for complex planar domains with sharp features and boundary layers.
Indexability of Finite State Restless Multi-Armed Bandit and Rollout Policy
Mittal, Vishesh, Meshram, Rahul, Dev, Deepak, Prakash, Surya
We consider finite state restless multi-armed bandit problem. The decision maker can act on M bandits out of N bandits in each time step. The play of arm (active arm) yields state dependent rewards based on action and when the arm is not played, it also provides rewards based on the state and action. The objective of the decision maker is to maximize the infinite horizon discounted reward. The classical approach to restless bandits is Whittle index policy. In such policy, the M arms with highest indices are played at each time step. Here, one decouples the restless bandits problem by analyzing relaxed constrained restless bandits problem. Then by Lagrangian relaxation problem, one decouples restless bandits problem into N single-armed restless bandit problems. We analyze the single-armed restless bandit. In order to study the Whittle index policy, we show structural results on the single armed bandit model. We define indexability and show indexability in special cases. We propose an alternative approach to verify the indexable criteria for a single armed bandit model using value iteration algorithm. We demonstrate the performance of our algorithm with different examples. We provide insight on condition of indexability of restless bandits using different structural assumptions on transition probability and reward matrices. We also study online rollout policy and discuss the computation complexity of algorithm and compare that with complexity of index computation. Numerical examples illustrate that index policy and rollout policy performs better than myopic policy.
Time series clustering based on prediction accuracy of global forecasting models
Oriona, Ángel López, Manso, Pablo Montero, Fernández, José Antonio Vilar
In this paper, a novel method to perform model-based clustering of time series is proposed. The procedure relies on two iterative steps: (i) K global forecasting models are fitted via pooling by considering the series pertaining to each cluster and (ii) each series is assigned to the group associated with the model producing the best forecasts according to a particular criterion. Unlike most techniques proposed in the literature, the method considers the predictive accuracy as the main element for constructing the clustering partition, which contains groups jointly minimizing the overall forecasting error. Thus, the approach leads to a new clustering paradigm where the quality of the clustering solution is measured in terms of its predictive capability. In addition, the procedure gives rise to an effective mechanism for selecting the number of clusters in a time series database and can be used in combination with any class of regression model. An extensive simulation study shows that our method outperforms several alternative techniques concerning both clustering effectiveness and predictive accuracy. The approach is also applied to perform clustering in several datasets used as standard benchmarks in the time series literature, obtaining great results.
New bootstrap tests for categorical time series. A comparative study
López-Oriona, Ángel, Fernández, José Antonio Vilar, D'Urso, Pierpaolo
The problem of testing the equality of the generating processes of two categorical time series is addressed in this work. To this aim, we propose three tests relying on a dissimilarity measure between categorical processes. Particular versions of these tests are constructed by considering three specific distances evaluating discrepancy between the marginal distributions and the serial dependence patterns of both processes. Proper estimates of these dissimilarities are an essential element of the constructed tests, which are based on the bootstrap. Specifically, a parametric bootstrap method assuming the true generating models and extensions of the moving blocks bootstrap and the stationary bootstrap are considered. The approaches are assessed in a broad simulation study including several types of categorical models with different degrees of complexity. Advantages and disadvantages of each one of the methods are properly discussed according to their behavior under the null and the alternative hypothesis. The impact that some important input parameters have on the results of the tests is also analyzed. An application involving biological sequences highlights the usefulness of the proposed techniques.