AITopics

2111.07395

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Italy > Emilia-Romagna > Metropolitan City of Bologna > Bologna (0.04)

Genre: Research Report (0.84)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

arXiv.org Machine Learning, Nov-13-2021

Nyström Regularization for Time Series Forecasting

Sun, Zirui, Dai, Mingwei, Wang, Yao, Lin, Shao-Bo

This paper focuses on learning rate analysis of Nyström regularization with sequential sub-sampling for $\tau$-mixing time series. Using a recently developed Banach-valued Bernstein inequality for $\tau$-mixing sequences and an integral operator approach based on second-order decomposition, we succeed in deriving almost optimal learning rates of Nyström regularization with sequential sub-sampling for $\tau$-mixing time series. A series of numerical experiments are carried out to verify our theoretical results, showing the excellent learning performance of Nyström regularization with sequential sub-sampling in learning massive time series data. All these results extend the applicable range of Nyström regularization from i.i.d. samples to non-i.i.d. sequences.

artificial intelligence, machine learning, nyström regularization

2111.07109

Country:

North America > United States (0.67)
Asia > China (0.28)

Genre: Research Report > New Finding (0.88)

Industry: Energy > Oil & Gas (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Chennakesavalu, Shriram, Rotskoff, Grant M.

Cooperative multi-agent reinforcement learning for high-dimensional nonequilibrium control

arXiv.org Machine Learning, Nov-12-2021

Experimental advances enabling high-resolution external control create new opportunities to produce materials with exotic properties. In this work, we investigate how a multi-agent reinforcement learning approach can be used to design external control protocols for self-assembly. We find that a fully decentralized approach performs remarkably well even with a "coarse" level of external control. More importantly, we see that a partially decentralized approach, where we include information about the local environment, allows us to better control our system towards some target distribution. We explain this by analyzing our approach as a partially-observed Markov decision process. With a partially decentralized approach, the agent is able to act more presciently than with a fully decentralized approach, both by preventing the formation of undesirable structures and by better stabilizing target structures.

information, protocol, reinforcement

2111.06875

Country:

North America > United States > Massachusetts > Hampshire County > Amherst (0.14)
North America > United States > California > Santa Clara County > Palo Alto (0.05)
Europe > Sweden > Stockholm > Stockholm (0.05)
North America > United States > California > Los Angeles County > Long Beach (0.04)

Genre: Research Report (0.50)

Industry: Energy (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.84)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.35)

Shi, Chengchun, Uehara, Masatoshi, Jiang, Nan

A Minimax Learning Approach to Off-Policy Evaluation in Partially Observable Markov Decision Processes

arXiv.org Machine Learning, Nov-12-2021

We consider off-policy evaluation (OPE) in Partially Observable Markov Decision Processes (POMDPs), where the evaluation policy depends only on observable variables and the behavior policy depends on unobservable latent variables. Existing works either assume no unmeasured confounders, or focus on settings where both the observation and the state spaces are tabular. As such, these methods suffer from either a large bias in the presence of unmeasured confounders, or a large variance in settings with continuous or large observation/state spaces. In this work, we first propose novel identification methods for OPE in POMDPs with latent confounders, by introducing bridge functions that link the target policy's value and the observed data distribution. In fully-observable MDPs, these bridge functions reduce to the familiar value functions and marginal density ratios between the evaluation and the behavior policies. We next propose minimax estimation methods for learning these bridge functions. Our proposal permits general function approximation and is thus applicable to settings with continuous or large observation/state spaces. Finally, we construct three estimators based on these estimated bridge functions, corresponding to a value function-based estimator, a marginalized importance sampling estimator, and a doubly-robust estimator. Their nonasymptotic and asymptotic properties are investigated in detail.
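The three estimator families named in the abstract can be illustrated in the simplest possible setting. The sketch below is not the paper's method: it assumes a single decision step, full observability, no unmeasured confounders, and a known behavior policy, which is the regime where the abstract says the bridge functions reduce to value functions and density ratios. The function name `ope_estimates` and the dictionary-based policy format are hypothetical choices made for illustration.

```python
def ope_estimates(data, target, behavior, q_hat, actions):
    """Return (value-based, importance-sampling, doubly-robust) estimates.

    Illustrative one-step, fully observed setting (an assumption, not the
    POMDP setting of the paper).

    data:     list of (observation, action, reward) tuples logged by `behavior`
    target:   target[(x, a)]   = probability the evaluation policy picks a at x
    behavior: behavior[(x, a)] = probability the behavior policy picks a at x
    q_hat:    q_hat[(x, a)]    = estimated expected reward from any fitted model
    actions:  iterable of all actions
    """
    n = len(data)
    value_based = importance = doubly_robust = 0.0
    for x, a, r in data:
        # Model-based value of the evaluation policy at observation x.
        v_hat = sum(target[(x, b)] * q_hat[(x, b)] for b in actions)
        # Density ratio between evaluation and behavior policies.
        w = target[(x, a)] / behavior[(x, a)]
        value_based += v_hat
        importance += w * r
        # Model prediction plus an importance-weighted residual correction.
        doubly_robust += v_hat + w * (r - q_hat[(x, a)])
    return value_based / n, importance / n, doubly_robust / n
```

In this toy setting the doubly-robust estimate stays correct if either the reward model or the density ratio is correct, which is why it is usually reported alongside the other two.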

bridge function, estimator, weight bridge function

2111.06784

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Illinois > Champaign County > Urbana (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

arXiv.org Artificial Intelligence, Nov-12-2021

Obstacle Avoidance for UAS in Continuous Action Space Using Deep Reinforcement Learning

Hu, Jueming, Yang, Xuxi, Wang, Weichang, Wei, Peng, Ying, Lei, Liu, Yongming

Obstacle avoidance for small unmanned aircraft is vital for the safety of future urban air mobility (UAM) and Unmanned Aircraft System (UAS) Traffic Management (UTM). There are many techniques for real-time robust drone guidance, but many of them operate in discretized airspace and control spaces, which requires an additional path-smoothing step to provide flexible commands for UAS. To provide safe and efficient computational guidance of operations for unmanned aircraft, we explore the use of a deep reinforcement learning algorithm based on Proximal Policy Optimization (PPO) to guide autonomous UAS to their destinations while avoiding obstacles through continuous control. The proposed scenario state representation and reward function can map the continuous state space to continuous control for both heading angle and speed. To verify the performance of the proposed learning framework, we conducted numerical experiments with static and moving obstacles. Uncertainties associated with the environments and safety operation bounds are investigated in detail. Results show that the proposed model can provide accurate and robust guidance and resolve conflicts with a success rate of over 99%.

agent, avoidance, obstacle avoidance

2111.07037

Country:

North America > United States > Michigan > Washtenaw County > Ann Arbor (0.14)
North America > United States > Arizona > Maricopa County > Tempe (0.04)
North America > United States > New York (0.04)

Genre: Research Report > New Finding (0.48)

Industry:

Transportation > Infrastructure & Services (1.00)
Transportation > Air (1.00)
Aerospace & Defense > Aircraft (1.00)
Leisure & Entertainment > Games > Computer Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles > Drones (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.69)

arXiv.org Artificial Intelligence, Nov-12-2021

One model Packs Thousands of Items with Recurrent Conditional Query Learning

Li, Dongda, Gu, Zhaoquan, Wang, Yuexuan, Ren, Changwei, Lau, Francis C. M.

Recent studies have revealed that neural combinatorial optimization (NCO) has advantages over conventional algorithms in many combinatorial optimization problems such as routing, but it is less efficient for more complicated optimization tasks such as packing, which involves mutually conditioned action spaces. In this paper, we propose a Recurrent Conditional Query Learning (RCQL) method to solve both 2D and 3D packing problems. We first embed states by a recurrent encoder, and then adopt attention with conditional queries from previous actions. The conditional query mechanism fills the information gap between learning steps, which shapes the problem as a Markov decision process. Benefiting from the recurrence, a single RCQL model is capable of handling different sizes of packing problems. Experimental results show that RCQL can effectively learn strong heuristics for offline and online strip packing problems (SPPs), outperforming a wide range of baselines in space utilization ratio. RCQL reduces the average bin gap ratio by 1.83% in offline 2D 40-box cases and 7.84% in 3D cases compared with state-of-the-art methods. Meanwhile, our method also achieves a 5.64% higher space utilization ratio for SPPs with 1000 items than the state of the art.

algorithm, model pack thousand, reinforcement

doi: 10.1016/j.knosys.2021.107683

2111.06726

Country:

Europe > United Kingdom > England > Nottinghamshire > Nottingham (0.04)
Asia > China > Hong Kong (0.04)
Asia > China > Guangdong Province > Guangzhou (0.04)

Genre: Research Report > Promising Solution (0.34)

Industry: Transportation (0.47)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.97)

arXiv.org Artificial Intelligence, Nov-12-2021

On-the-Fly Rectification for Robust Large-Vocabulary Topic Inference

Lee, Moontae, Cho, Sungjun, Dong, Kun, Mimno, David, Bindel, David

Across many data domains, co-occurrence statistics about the joint appearance of objects are powerfully informative. By transforming unsupervised learning problems into decompositions of co-occurrence statistics, spectral algorithms provide transparent and efficient algorithms for posterior inference such as latent topic analysis and community detection. As object vocabularies grow, however, it becomes rapidly more expensive to store and run inference algorithms on co-occurrence statistics. Rectifying co-occurrence, the key process to uphold model assumptions, becomes increasingly vital in the presence of rare terms, but current techniques cannot scale to large vocabularies. We propose novel methods that simultaneously compress and rectify co-occurrence statistics, scaling gracefully with the size of the vocabulary and the dimension of the latent space. We also present new algorithms for learning latent variables from the compressed statistics, and verify that our methods perform comparably to previous approaches on both textual and non-textual data.

algorithm, inference, on-the-fly rectification

2111.0658

Country:

Asia > Middle East > Jordan (0.05)
North America > United States > New York > Tompkins County > Ithaca (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)

Genre: Research Report (0.84)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.47)

Boyarskaya, Margarita, Ipeirotis, Panos

Full Characterization of Adaptively Strong Majority Voting in Crowdsourcing

arXiv.org Artificial Intelligence, Nov-11-2021

A commonly used technique for quality control in crowdsourcing is to task the workers with examining an item and voting on whether the item is labeled correctly. To counteract possible noise in worker responses, one solution is to keep soliciting votes from more workers until the difference between the numbers of votes for the two possible outcomes exceeds a pre-specified threshold $\delta$. We show a way to model such a $\delta$-margin voting consensus aggregation process using absorbing Markov chains. We provide closed-form equations for the key properties of this voting process -- namely, for the quality of the results, the expected number of votes to completion, the variance of the required number of votes, and other moments of the distribution. Using these results, we show further that one can adapt the value of the threshold $\delta$ to achieve quality-equivalence across voting processes that employ workers of different accuracy levels. We then use this result to provide efficiency-equalizing payment rates for groups of workers characterized by different levels of response accuracy. Finally, we perform a set of simulated experiments using both fully synthetic data and real-life crowdsourced votes. We show that our theoretical model characterizes the outcomes of the consensus aggregation process well.
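The absorbing-chain view of margin voting can be sketched as follows. This is an illustration under stated assumptions rather than the paper's own derivation: every vote is taken to be independent and correct with the same probability `p`, voting is taken to stop as soon as the vote margin reaches `delta`, and the closed forms used are the textbook gambler's-ruin results. The function names `closed_form`, `solve`, and `markov_chain` are hypothetical.

```python
from fractions import Fraction


def closed_form(p, delta):
    """Gambler's-ruin closed forms for a margin-delta vote.

    p: probability that a single vote is correct (assumed i.i.d. workers).
    Returns (probability the consensus is correct, expected number of votes).
    """
    q = 1 - p
    quality = p**delta / (p**delta + q**delta)
    if p == q:
        expected_votes = delta**2  # symmetric random walk
    else:
        expected_votes = (delta * (p**delta - q**delta)
                          / ((p - q) * (p**delta + q**delta)))
    return quality, expected_votes


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination (small dense systems only)."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        m[col] = [v / m[col][col] for v in m[col]]
        for r in range(n):
            if r != col and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [v - factor * w for v, w in zip(m[r], m[col])]
    return [row[n] for row in m]


def markov_chain(p, delta):
    """Same quantities from the absorbing chain on margins -delta..delta."""
    q = 1 - p
    states = list(range(-delta + 1, delta))  # transient margins
    n = len(states)
    # Build I - Q, where Q holds transient-to-transient probabilities.
    i_minus_q = [[Fraction(int(r == c)) for c in range(n)] for r in range(n)]
    for r in range(n):
        if r + 1 < n:
            i_minus_q[r][r + 1] -= p  # a correct vote raises the margin
        if r - 1 >= 0:
            i_minus_q[r][r - 1] -= q  # an incorrect vote lowers it
    # One-step probability of being absorbed at the correct outcome (+delta).
    to_correct = [p if s == delta - 1 else Fraction(0) for s in states]
    start = states.index(0)
    quality = solve(i_minus_q, to_correct)[start]
    expected_votes = solve(i_minus_q, [Fraction(1)] * n)[start]
    return quality, expected_votes
```

Passing `p` as a `Fraction` keeps the arithmetic exact, so the two routes can be compared with `==`. For example, `closed_form(Fraction(7, 10), 2)` returns `(Fraction(49, 58), Fraction(100, 29))`: with 70%-accurate workers and a margin of 2, the consensus is correct about 84% of the time after roughly 3.4 votes on average.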

adaptively strong majority voting, consensus, vote

2111.0639

Country:

North America > United States > New York (0.04)
North America > United States > New Jersey > Middlesex County > New Brunswick (0.04)
Europe > Switzerland > Zürich > Zürich (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Communications > Social Media > Crowdsourcing (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.49)

Furui, Akira, Akiyama, Tomoyuki, Tsuji, Toshio

A Time-Series Scale Mixture Model of EEG with a Hidden Markov Structure for Epileptic Seizure Detection

arXiv.org Machine Learning, Nov-11-2021

In this paper, we propose a time-series stochastic model based on a scale mixture distribution with Markov transitions to detect epileptic seizures in electroencephalography (EEG). In the proposed model, an EEG signal at each time point is assumed to be a random variable following a Gaussian distribution. The covariance matrix of the Gaussian distribution is weighted with a latent scale parameter, which is also a random variable, resulting in the stochastic fluctuations of covariances. By introducing a latent state variable with a Markov chain in the background of this stochastic relationship, time-series changes in the distribution of latent scale parameters can be represented according to the state of epileptic seizures. In an experiment, we evaluated the performance of the proposed model for seizure detection using EEGs with multiple frequency bands decomposed from a clinical dataset. The results demonstrated that the proposed model can detect seizures with high sensitivity and outperformed several baselines.
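The generative structure described above (a hidden Markov state that selects the distribution of a latent scale, which in turn weights the Gaussian covariance) can be sketched as a toy sampler. Several choices here are assumptions made only for illustration and are not stated in the abstract: the signal is univariate, the latent scale is drawn from an inverse-gamma distribution, and the chain starts in state 0. The function name and parameter layout are hypothetical.

```python
import random


def sample_scale_mixture_hmm(n_steps, trans, scale_params, sigma=1.0, seed=0):
    """Draw a toy univariate signal from a Markov-switching scale mixture.

    trans:        trans[i][j] = probability of moving from state i to state j
    scale_params: scale_params[i] = (shape, rate) of the inverse-gamma
                  distribution of the latent scale in state i (assumed form)
    sigma:        base standard deviation shared by all states
    Returns (states, scales, signal), three lists of length n_steps.
    """
    rng = random.Random(seed)
    state = 0  # assumed initial state
    states, scales, signal = [], [], []
    for _ in range(n_steps):
        shape, rate = scale_params[state]
        # Inverse-gamma draw: reciprocal of a gamma(shape, scale=1/rate) draw.
        u = 1.0 / rng.gammavariate(shape, 1.0 / rate)
        # Gaussian sample whose variance is the base variance times the scale.
        x = rng.gauss(0.0, sigma * u ** 0.5)
        states.append(state)
        scales.append(u)
        signal.append(x)
        # Markov transition of the hidden state.
        state = rng.choices(range(len(trans)), weights=trans[state])[0]
    return states, scales, signal
```

A call such as `sample_scale_mixture_hmm(500, [[0.98, 0.02], [0.05, 0.95]], [(5.0, 4.0), (2.0, 8.0)])` would produce a signal whose variance fluctuates more strongly while the chain sits in the second state, loosely mimicking a non-seizure versus seizure regime.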

epileptic seizure detection, hidden markov structure, time-series scale mixture model

2111.06526

Genre: Research Report (0.40)

Industry:

Health & Medicine > Therapeutic Area > Neurology > Epilepsy (0.80)
Health & Medicine > Therapeutic Area > Genetic Disease (0.80)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.69)

Raisbeck, John C., Allen, Matthew W., Lee, Hakho

Agent Spaces

arXiv.org Artificial Intelligence, Nov-10-2021

Exploration is one of the most important tasks in Reinforcement Learning, but it is not well-defined beyond finite problems in the Dynamic Programming paradigm (see Subsection 2.4). We provide a reinterpretation of exploration which can be applied to any online learning method. We come to this definition by approaching exploration from a new direction. After finding that concepts of exploration created to solve simple Markov decision processes with Dynamic Programming are no longer broadly applicable, we reexamine exploration. Instead of extending the ends of dynamic exploration procedures, we extend their means. That is, rather than repeatedly sampling every state-action pair possible in a process, we define the act of modifying an agent to itself be explorative. The resulting definition of exploration can be applied in infinite problems and non-dynamic learning methods, which the dynamic notion of exploration cannot tolerate. To understand the way that modifications of an agent affect learning, we describe a novel structure on the set of agents: a collection of distances (see footnote 7) $\{d_{a}\}_{a \in A}$, which represent the perspectives of each agent possible in the process. Using these distances, we define a topology and show that many important structures in Reinforcement Learning are well behaved under the topology induced by convergence in the agent space.

agent, exploration, reinforcement learning

2111.06005

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
North America > United States > Wyoming > Albany County > Laramie (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)

Genre:

Research Report (0.50)
Overview (0.46)

Industry:

Education (0.48)
Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.87)