AITopics

2202.00792

Country:

North America > Canada > Ontario > Toronto (0.14)
Asia > Middle East > Jordan (0.04)
North America > United States > New York (0.04)
North America > United States > Indiana > St. Joseph County > Notre Dame (0.04)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.71)
Health & Medicine > Therapeutic Area > Immunology (0.71)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

Park, Hongju, Faradonbeh, Mohamad Kazem Shirani

Efficient Algorithms for Learning to Control Bandits with Unobserved Contexts

arXiv.org Machine Learning, Feb-1-2022

Contextual bandits are commonly used for sequential decision-making with finitely many control actions. In this setting, available context observations can be utilized in a tractable way, thanks to the linearity of the relationship between the reward and the context vectors. The arms provide rewards depending on the contexts that represent their individual characteristics. The range of real-world applications is extensive, including personalized recommendations for mobile context-aware recommender systems and mobile-health interventions [1, 2, 3]. To achieve satisfactory performance in bandits, the exploration-exploitation trade-off must be addressed. The theoretical analysis of efficient policies for multi-armed bandits goes back to algorithms that decide based on Upper Confidence Bounds (UCB) [4]. UCB employs an optimistic approximation of the unknown reward, based on the history of observations, to allow an appropriate degree of exploration. Further theoretical results for UCB in contextual bandits, as well as in other settings, are available in the literature [5, 6, 7, 8, 9]. Posterior sampling is another ubiquitous reinforcement learning algorithm that effectively balances exploitation and exploration.
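
The optimism principle described in this abstract can be illustrated with a minimal sketch of the classic UCB1 rule for a non-contextual multi-armed bandit. This is not the algorithm of the paper, which concerns contextual bandits with unobserved contexts; the function names, the Bernoulli arms, and the exploration bonus sqrt(2 ln t / n) are assumptions chosen for illustration.

```python
import math
import random


def ucb1(reward_fns, horizon, seed=0):
    """Run UCB1 for `horizon` rounds; return pull counts and empirical means."""
    rng = random.Random(seed)
    n_arms = len(reward_fns)
    counts = [0] * n_arms      # number of times each arm was pulled
    means = [0.0] * n_arms     # running average reward of each arm

    for t in range(1, horizon + 1):
        if t <= n_arms:
            # Initialization: pull every arm once so each count is positive.
            arm = t - 1
        else:
            # Optimistic index: empirical mean plus an exploration bonus
            # that shrinks as the arm is pulled more often.
            arm = max(
                range(n_arms),
                key=lambda a: means[a] + math.sqrt(2.0 * math.log(t) / counts[a]),
            )
        reward = reward_fns[arm](rng)
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # incremental mean
    return counts, means


def bernoulli(p):
    """Hypothetical arm: pays 1 with probability p, else 0."""
    return lambda rng: 1.0 if rng.random() < p else 0.0


counts, means = ucb1([bernoulli(0.2), bernoulli(0.5), bernoulli(0.8)], horizon=2000)
print(counts)
```

With well-separated arms as above, most pulls should concentrate on the arm with the highest mean, while the bonus term keeps the other arms from being abandoned too early.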

algorithm, bandit, posterior, (11 more...)

2202.00867

Country: Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.40)

Industry: Health & Medicine (0.89)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.92)
Information Technology > Data Science > Data Mining > Big Data (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.47)

Stankevičius, Lukas, Lukoševičius, Mantas, Kapočiūtė-Dzikienė, Jurgita, Briedienė, Monika, Krilavičius, Tomas

Correcting diacritics and typos with ByT5 transformer model

arXiv.org Machine Learning, Jan-31-2022

Due to the fast pace of life and online communications, and the prevalence of English and the QWERTY keyboard, people tend to forgo using diacritics and make typographical errors (typos) when typing. Restoring diacritics and correcting spelling is important for proper language use and disambiguation of texts for both humans and downstream algorithms. However, these two problems are typically addressed separately, i.e., state-of-the-art diacritics restoration methods do not tolerate other typos. In this work, we tackle both problems at once by employing newly developed ByT5 byte-level transformer models. Our simultaneous diacritics restoration and typo correction approach demonstrates near state-of-the-art performance in 13 languages, reaching >96% alpha-word accuracy. We also perform diacritics restoration alone on 12 benchmark datasets, with an additional one for the Lithuanian language. The experimental investigation shows that our approach is able to achieve results (>98%) comparable to those previously reported despite being trained on less data. Our approach is also able to restore diacritics in words not seen during training with >76% accuracy. We also show that accuracy improves further with longer training. All this shows the great real-world application potential of our suggested methods for more data, languages, and error classes.
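
To make the task concrete, the following is a minimal sketch of how a (source, target) pair for diacritics restoration could be produced and viewed at the byte level. The helper names `strip_diacritics` and `to_byte_ids` are hypothetical, the approach of stripping combining marks after Unicode decomposition is an assumption rather than the preprocessing used in the paper, and the actual ByT5 tokenizer's ID mapping and special tokens are not reproduced here.

```python
import unicodedata


def strip_diacritics(text):
    """Remove combining marks after canonical decomposition (NFD).

    Letters without a canonical decomposition (for example the Polish
    letter "ł") are left unchanged by this simple approach.
    """
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return unicodedata.normalize("NFC", stripped)


def to_byte_ids(text):
    """View text as a sequence of raw UTF-8 byte values (0..255)."""
    return list(text.encode("utf-8"))


target = "Kapočiūtė"                 # correctly written text
source = strip_diacritics(target)    # what a hurried typist might enter

print(source)
print(len(to_byte_ids(source)), len(to_byte_ids(target)))
```

Working on bytes means the model input needs no language-specific vocabulary, at the cost of longer sequences: each of the three accented letters in the target occupies two bytes in UTF-8, while its stripped counterpart occupies one.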

computational linguistic, diacritic restoration, proceedings, (14 more...)

2201.13242

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > New York > New York County > New York City (0.14)
North America > Dominican Republic (0.04)
(26 more...)

Genre: Research Report > New Finding (1.00)

Industry: Education (0.46)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
(4 more...)

Majumdar, Anirudha, Pacelli, Vincent

Fundamental Performance Limits for Sensor-Based Robot Control and Policy Learning

arXiv.org Artificial Intelligence, Jan-31-2022

Our goal is to develop theory and algorithms for establishing fundamental limits on performance for a given task imposed by a robot's sensors. In order to achieve this, we define a quantity that captures the amount of task-relevant information provided by a sensor. Using a novel version of the generalized Fano inequality from information theory, we demonstrate that this quantity provides an upper bound on the highest achievable expected reward for one-step decision making tasks. We then extend this bound to multi-step problems via a dynamic programming approach. We present algorithms for numerically computing the resulting bounds, and demonstrate our approach on three examples: (i) the lava problem from the literature on partially observable Markov decision processes, (ii) an example with continuous state and observation spaces corresponding to a robot catching a freely-falling object, and (iii) obstacle avoidance using a depth sensor with non-Gaussian noise. We demonstrate the ability of our approach to establish strong limits on achievable performance for these problems by comparing our upper bounds with achievable lower bounds (computed by synthesizing or learning concrete control policies).

inequality, robot, sensor, (16 more...)

2202.00129

Country:

North America > United States > New Jersey > Mercer County > Princeton (0.04)
North America > United States > Massachusetts (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
(2 more...)

Genre: Research Report (0.50)

Industry: Information Technology (0.67)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Matthews, Alexander G. D. G., Arbel, Michael, Rezende, Danilo J., Doucet, Arnaud

Continual Repeated Annealed Flow Transport Monte Carlo

arXiv.org Machine Learning, Jan-31-2022

We propose Continual Repeated Annealed Flow Transport Monte Carlo (CRAFT), a method that combines a sequential Monte Carlo (SMC) sampler (itself a generalization of Annealed Importance Sampling) with variational inference using normalizing flows. The normalizing flows are directly trained to transport between annealing temperatures using a KL divergence for each transition. This optimization objective is itself estimated using the normalizing flow/SMC approximation. We show conceptually and using multiple empirical examples that CRAFT improves on Annealed Flow Transport Monte Carlo (Arbel et al., 2021), on which it builds and also on Markov chain Monte Carlo (MCMC) based Stochastic Normalizing Flows (Wu et al., 2020). By incorporating CRAFT within particle MCMC, we show that such learnt samplers can achieve impressively accurate results on a challenging lattice field theory example.

algorithm, monte carlo, sampler, (14 more...)

2201.13117

Country: Europe > France > Auvergne-Rhône-Alpes > Isère > Grenoble (0.04)

Genre: Research Report (0.81)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

arXiv.org Artificial Intelligence, Jan-30-2022

Potential Destination Prediction Based on Knowledge Graph Under Low Predictability Data Condition

Li, Guilong, Chen, Yixian, Liao, Qionghua, He, Zhaocheng

Destination prediction has been a critical topic in transportation research, and there are a large number of studies. However, almost all existing studies are based on high-predictability data conditions and pay less attention to low-predictability data conditions, where the regularity of single individuals is not exposed. Over a given period of observation, individuals may choose destinations beyond those observed, which we call "potential destinations". The number of potential destinations is very large and cannot be ignored under the low-predictability data conditions formed by short-term observation. To reveal the choice pattern of individuals' potential destinations under low-predictability data conditions, we propose a global optimization method based on knowledge graph embedding. First, we join the trip data of all individuals by constructing a Trip Knowledge Graph (TKG). Next, we optimize the general knowledge graph embedding algorithm for our data and task, in both training strategy and objective function, and then apply it to the TKG. It can achieve global optimization for association paths that exist between almost any two entities in the TKG. On this basis, a method for potential destination prediction is proposed, giving a ranking of unobserved destinations for each individual. In addition, we improve the performance by fusing static statistical information that is not passed to the TKG. Finally, we validate our method on a real-world dataset, and the prediction results are highly consistent with individuals' potential destination choice behaviour.
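
As a rough illustration of ranking unobserved destinations with knowledge graph embeddings, here is a minimal sketch that uses a TransE-style score, one common embedding model. The abstract does not say which embedding model, training strategy, or objective the authors use, so the scoring function, the toy two-dimensional vectors, and the names `transe_score` and `rank_potential_destinations` are all assumptions.

```python
import math


def transe_score(head, relation, tail):
    """TransE-style plausibility: higher (closer to 0) means head + relation is near tail."""
    return -math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail)))


def rank_potential_destinations(person, relation, destinations, observed):
    """Rank destination indices by score, skipping those already observed."""
    scored = [
        (transe_score(person, relation, dest), idx)
        for idx, dest in enumerate(destinations)
        if idx not in observed
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [idx for _, idx in scored]


# Toy embeddings (hypothetical values, not learned from any data).
person = [0.0, 0.0]
visits = [1.0, 0.0]   # embedding of a hypothetical "visits" relation
destinations = [[1.0, 0.0], [0.9, 0.1], [5.0, 5.0], [1.2, 0.0]]

ranking = rank_potential_destinations(person, visits, destinations, observed={0})
print(ranking)
```

In a real system the embeddings would be learned from the trip graph, and the ranking would be computed per individual over all destinations that individual has not yet been observed to visit.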

artificial intelligence, machine learning, relation, (18 more...)

2201.12845

Country:

Asia > China > Guangdong Province > Guangzhou (0.04)
Asia > Japan > Honshū > Chūbu > Toyama Prefecture > Toyama (0.04)

Genre: Research Report (1.00)

Industry: Transportation > Infrastructure & Services (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Semantic Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

arXiv.org Artificial Intelligence, Jan-29-2022

Learning to Coordinate with Humans using Action Features

Ma, Mingwei, Liu, Jizhou, Sokota, Samuel, Kleiman-Weiner, Max, Foerster, Jakob

An unaddressed challenge in human-AI coordination is to enable AI agents to exploit the semantic relationships between the features of actions and the features of observations. Humans take advantage of these relationships in highly intuitive ways. For instance, in the absence of a shared language, we might point to the object we desire or hold up our fingers to indicate how many objects we want. To address this challenge, we investigate the effect of network architecture on the propensity of learning algorithms to exploit these semantic relationships. Across a procedurally generated coordination task, we find that attention-based architectures that jointly process a featurized representation of observations and actions have a better inductive bias for zero-shot coordination. Through fine-grained evaluation and scenario analysis, we show that the resulting policies are human-interpretable. Moreover, such agents coordinate with people without training on any human data.

agent, architecture, guesser, (15 more...)

2201.12658

Country:

North America > United States > Illinois > Cook County > Chicago (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Genre: Research Report > New Finding (0.67)

Industry: Leisure & Entertainment > Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.96)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.47)

arXiv.org Artificial Intelligence, Jan-28-2022

Do You Need the Entropy Reward (in Practice)?

Yu, Haonan, Zhang, Haichao, Xu, Wei

Maximum entropy (MaxEnt) RL maximizes a combination of the original task reward and an entropy reward. It is believed that the regularization imposed by entropy, on both policy improvement and policy evaluation, together contributes to good exploration, training convergence, and robustness of learned policies. This paper takes a closer look at entropy as an intrinsic reward by conducting various ablation studies on soft actor-critic (SAC), a popular representative of MaxEnt RL. Our findings reveal that, in general, entropy rewards should be applied with caution to policy evaluation. On the one hand, the entropy reward, like any other intrinsic reward, could obscure the main task reward if it is not properly managed. We identify some failure cases of the entropy reward, especially in episodic Markov decision processes (MDPs), where it could cause the policy to be overly optimistic or pessimistic. On the other hand, our large-scale empirical study shows that using entropy regularization in policy improvement alone leads to comparable or even better performance and robustness than using it in both policy improvement and policy evaluation. Based on these observations, we recommend either normalizing the entropy reward to a zero mean (SACZero), or simply removing it from policy evaluation (SACLite) for better practical results.
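
The difference between the variants named in this abstract can be sketched through their one-step critic targets. The first function is the standard soft (MaxEnt) target; the second drops the entropy reward from policy evaluation, as the abstract describes for SACLite. The third shows one possible way to centre entropy rewards, here by subtracting a batch mean, which is an assumption: the abstract does not state how SACZero computes its normalization. Function names and default values are hypothetical.

```python
def soft_td_target(reward, next_q, next_log_prob, alpha=0.2, gamma=0.99, done=False):
    """Standard MaxEnt target: entropy reward -alpha * log pi enters policy evaluation."""
    entropy_reward = -alpha * next_log_prob
    return reward + (0.0 if done else gamma * (next_q + entropy_reward))


def lite_td_target(reward, next_q, gamma=0.99, done=False):
    """Target without the entropy reward (entropy is kept only in policy improvement)."""
    return reward + (0.0 if done else gamma * next_q)


def zero_mean_entropy_rewards(log_probs, alpha=0.2):
    """Centre a batch of entropy rewards at zero (assumed batch-mean normalization)."""
    rewards = [-alpha * lp for lp in log_probs]
    mean = sum(rewards) / len(rewards)
    return [r - mean for r in rewards]


soft = soft_td_target(1.0, 2.0, -1.0, alpha=0.2, gamma=0.5)
lite = lite_td_target(1.0, 2.0, gamma=0.5)
centred = zero_mean_entropy_rewards([-1.0, -2.0, -3.0])
print(soft, lite, centred)
```

In the soft target, every non-terminal step adds an entropy term to the bootstrapped value, so in episodic tasks the choice between ending and continuing an episode can be skewed by that term; the other two variants are meant to remove or neutralize this effect.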

entropy reward, environment step, saclite, (13 more...)

2201.12434

Country: North America > United States > California > Santa Clara County > Cupertino (0.04)

Genre: Research Report > New Finding (0.48)

Industry: Leisure & Entertainment (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

Scholl, Philipp, Dietrich, Felix, Otte, Clemens, Udluft, Steffen

Safe Policy Improvement Approaches on Discrete Markov Decision Processes

arXiv.org Artificial Intelligence, Jan-28-2022

Safe Policy Improvement (SPI) aims at provable guarantees that a learned policy is at least approximately as good as a given baseline policy. Building on SPI with Soft Baseline Bootstrapping (Soft-SPIBB) by Nadjahi et al., we identify theoretical issues in their approach, provide a corrected theory, and derive a new algorithm that is provably safe on finite Markov Decision Processes (MDP). Additionally, we provide a heuristic algorithm that exhibits the best performance among many state of the art SPI algorithms on two different benchmarks. Furthermore, we introduce a taxonomy of SPI algorithms and empirically show an interesting property of two classes of SPI algorithms: while the mean performance of algorithms that incorporate the uncertainty as a penalty on the action-value is higher, actively restricting the set of policies more consistently produces good policies and is, thus, safer.

algorithm, safe policy improvement approach, state-action pair, (12 more...)

2201.12175

Country:

Europe > Germany > Bavaria > Upper Bavaria > Munich (0.05)
North America > United States > Massachusetts (0.04)
North America > United States > California > Los Angeles County > Santa Monica (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.86)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.69)

Charotia, Himanshi, Garg, Abhishek, Dhama, Gaurav, Maheshwari, Naman

Dynamic Temporal Reconciliation by Reinforcement learning

arXiv.org Artificial Intelligence, Jan-28-2022

Planning based on long- and short-term time series forecasts is a common practice across many industries. In this context, temporal aggregation and reconciliation techniques have been useful in improving forecasts, reducing model uncertainty, and providing a coherent forecast across different time horizons. However, an underlying assumption spanning all these techniques is the complete availability of data across all levels of the temporal hierarchy. While this offers mathematical convenience, most of the time low-frequency data is only partially complete and is not available while forecasting. On the other hand, high-frequency data can change significantly in a scenario like the COVID pandemic, and this change can be used to improve forecasts that would otherwise diverge significantly from long-term actuals. We propose a dynamic reconciliation method whereby we formulate the problem of informing low-frequency forecasts based on high-frequency actuals as a Markov Decision Process (MDP), allowing for the fact that we do not have complete information about the dynamics of the process. This allows us to have the best long-term estimates based on the most recent data available, even if the low-frequency cycles have only been partially completed. The MDP has been solved using a Time Differenced Reinforcement learning (TDRL) approach with customizable actions, and it improves the long-term forecasts dramatically compared to relying solely on historical low-frequency data. The result also underscores the fact that while low-frequency forecasts can improve the high-frequency forecasts, as mentioned in the temporal reconciliation literature (based on the assumption that low-frequency forecasts have a lower noise-to-signal ratio), the high-frequency forecasts can also be used to inform the low-frequency forecasts.

forecast, frequency forecast, reinforcement, (12 more...)

2201.11964

Country: Asia > India (0.04)

Genre: Research Report (0.40)

Industry:

Energy (0.69)
Banking & Finance > Trading (0.47)
Leisure & Entertainment > Games (0.46)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)