AITopics

2004.10019

Country:

North America > United States > Illinois (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.48)

Kim, Seokhwan, Eric, Mihail, Gopalakrishnan, Karthik, Hedayatnia, Behnam, Liu, Yang, Hakkani-Tur, Dilek

Beyond Domain APIs: Task-oriented Conversational Modeling with Unstructured Knowledge Access

arXiv.org Artificial Intelligence, Jun-5-2020

Most prior work on task-oriented dialogue systems is restricted to a limited coverage of domain APIs, while users often have domain-related requests that are not covered by the APIs. In this paper, we propose to expand coverage of task-oriented dialogue systems by incorporating external unstructured knowledge sources. We define three sub-tasks: knowledge-seeking turn detection, knowledge selection, and knowledge-grounded response generation, which can be modeled individually or jointly. We introduce an augmented version of MultiWOZ 2.1, which includes new out-of-API-coverage turns and responses grounded on external knowledge sources. We present baselines for each sub-task using both conventional and neural approaches. Our experimental results demonstrate the need for further research in this direction to enable more informative conversational systems.

artificial intelligence, machine learning, natural language, (22 more...)

2006.03533

Country:

North America > United States > California > Santa Clara County > Sunnyvale (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Myanmar > Tanintharyi Region > Dawei (0.04)

Genre: Research Report > New Finding (0.34)

Industry:

Consumer Products & Services (0.47)
Energy (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Zhang, Ziyao, Ma, Liang, Leung, Kin K., Poularakis, Konstantinos, Srivatsa, Mudhakar

State Action Separable Reinforcement Learning

arXiv.org Artificial Intelligence, Jun-5-2020

Reinforcement Learning (RL) based methods have achieved notable success in solving sequential decision-making and control problems in recent years. For conventional RL formulations, the Markov Decision Process (MDP) and the state-action-value function are the basis for problem modeling and policy evaluation. However, several challenging issues remain. Among the most cited issues, the enormity of the state/action space is an important factor that causes inefficiency in accurately approximating the state-action-value function. We observe that although actions directly define the agents' behaviors, for many problems the next state after a state transition matters more than the action taken in determining the return of that transition. In this regard, we propose a new learning paradigm, State Action Separable Reinforcement Learning (sasRL), wherein the action space is decoupled from the value function learning process for higher efficiency. Then, a light-weight transition model is learned to help the agent determine the action that triggers the associated state transition. In addition, our convergence analysis reveals that under certain conditions, the convergence time of sasRL is $O(T^{1/k})$, where $T$ is the convergence time for updating the value function in the MDP-based formulation and $k$ is a weighting factor. Experiments on several gaming scenarios show that sasRL outperforms state-of-the-art MDP-based RL algorithms by up to $75\%$.

machine learning, reinforcement learning, state transition, (16 more...)

2006.03713

Country:

Europe > United Kingdom > England > Greater London > London (0.04)
North America > United States > Connecticut > New Haven County > New Haven (0.04)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)
(2 more...)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

Shlezinger, Nir, Farsad, Nariman, Eldar, Yonina C., Goldsmith, Andrea J.

Inference from Stationary Time Sequences via Learned Factor Graphs

arXiv.org Machine Learning, Jun-5-2020

The design of methods for inference from time sequences has traditionally relied on statistical models that describe the relation between a latent desired sequence and the observed one. A broad family of model-based algorithms has been derived to carry out inference at controllable complexity using recursive computations over the factor graph representing the underlying distribution. An alternative model-agnostic approach utilizes machine learning (ML) methods. Here we propose a framework that combines model-based inference algorithms and data-driven ML tools for stationary time sequences. In the proposed approach, neural networks are developed to separately learn specific components of a factor graph describing the distribution of the time sequence, rather than the complete inference task. By exploiting stationary properties of this distribution, the resulting approach can be applied to sequences of varying temporal duration. Additionally, this approach facilitates the use of compact neural networks which can be trained with small training sets, or alternatively, can be used to improve upon existing deep inference systems. We present an inference algorithm based on learned stationary factor graphs, referred to as StaSPNet, which learns to implement the sum-product scheme from labeled data, and can be applied to sequences of different lengths. Our experimental results demonstrate the ability of the proposed StaSPNet to learn to carry out accurate inference from small training sets for sleep stage detection using the Sleep-EDF dataset, as well as for symbol detection in digital communications with unknown channels.

artificial intelligence, factor graph, machine learning, (16 more...)

2006.03258

Country:

North America > United States > California > Santa Clara County > Palo Alto (0.14)
Asia > Middle East > Israel (0.05)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > India (0.04)

Genre: Research Report (0.70)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

#artificialintelligence, Jun-4-2020, 09:31:06 GMT

Advanced AI: Deep Reinforcement Learning in Python

Udemy online course. The Complete Guide to Mastering Artificial Intelligence using Deep Learning and Neural Networks. Created by Lazy Programmer Team, Lazy Programmer Inc. Description: This course is all about the application of deep learning and neural networks to reinforcement learning. If you've taken my first reinforcement learning class, then you know that reinforcement learning is on the bleeding edge of what we can do with AI. Specifically, the combination of deep learning with reinforcement learning has led to AlphaGo beating a world champion in the strategy game Go, it has led to self-driving cars, and it has led to machines that can play video games at a superhuman level. Reinforcement learning has been around since the 70s but none of this has been possible until now. The world is changing at a very fast pace.

artificial intelligence, machine learning, reinforcement learning, (14 more...)

#artificialintelligence

Country: North America > United States > California (0.05)

Genre: Instructional Material > Course Syllabus & Notes (0.51)

Industry:

Leisure & Entertainment > Games (1.00)
Education > Educational Setting > Online (1.00)
Education > Educational Technology > Educational Software > Computer Based Training (0.73)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.97)

arXiv.org Machine Learning, Jun-4-2020

DASC: Towards A Road Damage-Aware Social-Media-Driven Car Sensing Framework for Disaster Response Applications

Rashid, Md Tahmid, Zhang, Daniel, Wang, Dong

While vehicular sensor networks (VSNs) have earned the stature of a mobile sensing paradigm utilizing sensors built into cars, they have limited sensing scopes since car drivers only opportunistically discover new events. Conversely, social sensing is emerging as a new sensing paradigm where measurements about the physical world are collected from humans. In contrast to VSNs, social sensing is more pervasive, but one of its key limitations lies in its inconsistent reliability stemming from the data contributed by unreliable human sensors. In this paper, we present DASC, a road Damage-Aware Social-media-driven Car sensing framework that exploits the collective power of social sensing and VSNs for reliable disaster response applications. However, integrating VSNs with social sensing introduces a new set of challenges: i) How to leverage noisy and unreliable social signals to route the vehicles to accurate regions of interest? ii) How to tackle the inconsistent availability (e.g., churns) caused by car drivers being rational actors? iii) How to efficiently guide the cars to the event locations with little prior knowledge of the road damage caused by the disaster, while also handling the dynamics of the physical world and social media? The DASC framework addresses the above challenges by establishing a novel hybrid social-car sensing system that employs techniques from game theory, feedback control, and Markov Decision Process (MDP). In particular, DASC distills signals emitted from social media and discovers the road damages to effectively drive cars to target areas for verifying emergency events. We implement and evaluate DASC in a reputed vehicle simulator that can emulate real-world disaster response scenarios. The results of a real-world application demonstrate the superiority of DASC over current VSNs-based solutions in detection accuracy and efficiency.

ground transportation, road damage, upstream oil & gas, (23 more...)

2006.02681

Country:

North America > United States > Texas > Harris County > Houston (0.14)
North America > United States > Texas > Harris County > Pasadena (0.14)
Europe (0.14)

Genre:

Overview (0.67)
Research Report > New Finding (0.46)

Industry:

Transportation > Ground > Road (1.00)
Health & Medicine (1.00)
Government (1.00)
(3 more...)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.66)

Baucum, Matt, Khojandi, Anahita, Papamarkou, Theodore

Hidden Markov models are recurrent neural networks: A disease progression modeling application

arXiv.org Machine Learning, Jun-4-2020

Hidden Markov models (HMMs) are commonly used for sequential data modeling when the true state of the system is not fully known. We formulate a special case of recurrent neural networks (RNNs), which we name hidden Markov recurrent neural networks (HMRNNs), and prove that each HMRNN has the same likelihood function as a corresponding discrete-observation HMM. We experimentally validate this theoretical result on synthetic datasets by showing that parameter estimates from HMRNNs are numerically close to those obtained from HMMs via the Baum-Welch algorithm. We demonstrate our method's utility in a case study on Alzheimer's disease progression, in which we augment HMRNNs with other predictive neural networks. The augmented HMRNN yields parameter estimates that offer a novel clinical interpretation and fit the patient data better than HMM parameter estimates from the Baum-Welch algorithm.
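
The recurrent structure behind this equivalence can be seen in the standard HMM forward algorithm, where a hidden vector is updated once per observation. The sketch below is not the paper's HMRNN construction; it is a minimal illustration of the textbook forward recursion, and the names `hmm_log_likelihood`, `brute_force_likelihood`, `pi`, `A`, and `B` are assumptions chosen for this example. The brute-force function is included only to check the recursion on a tiny case.

```python
import itertools

import numpy as np


def hmm_log_likelihood(pi, A, B, obs):
    """Log-likelihood of a discrete-observation HMM via the forward recursion.

    pi:  (n,) initial state probabilities
    A:   (n, n) transitions, A[i, j] = P(z_{t+1} = j | z_t = i)
    B:   (n, m) emissions,   B[j, o] = P(x_t = o | z_t = j)
    obs: sequence of observation indices
    """
    # "Hidden state" of the recurrence: the forward vector alpha_t.
    alpha = pi * B[:, obs[0]]
    log_lik = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Recurrent update: linear map followed by an input-dependent gate.
            alpha = (alpha @ A) * B[:, obs[t]]
        # Rescale at every step for numerical stability and
        # accumulate the log of the scaling factors.
        norm = alpha.sum()
        log_lik += np.log(norm)
        alpha = alpha / norm
    return log_lik


def brute_force_likelihood(pi, A, B, obs):
    """Sum the joint probability over every hidden path (tiny cases only)."""
    total = 0.0
    for path in itertools.product(range(len(pi)), repeat=len(obs)):
        p = pi[path[0]] * B[path[0], obs[0]]
        for t in range(1, len(obs)):
            p *= A[path[t - 1], path[t]] * B[path[t], obs[t]]
        total += p
    return total


if __name__ == "__main__":
    # Hypothetical two-state, two-symbol model.
    pi = np.array([0.6, 0.4])
    A = np.array([[0.7, 0.3], [0.2, 0.8]])
    B = np.array([[0.9, 0.1], [0.3, 0.7]])
    obs = [0, 1, 1]
    print(np.exp(hmm_log_likelihood(pi, A, B, obs)))
    print(brute_force_likelihood(pi, A, B, obs))
```

The two printed values should agree up to floating-point error, since the recursion computes the same sum over hidden paths without enumerating them.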

artificial intelligence, machine learning, neural network, (17 more...)

2006.03151

Country:

North America > United States > California (0.28)
North America > United States > Tennessee > Knox County > Knoxville (0.14)
South America > Colombia (0.04)
(2 more...)

Genre: Research Report > Experimental Study (0.47)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Therapeutic Area > Neurology > Alzheimer's Disease (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Roy, Josh, Konidaris, George

Visual Transfer for Reinforcement Learning via Wasserstein Domain Confusion

arXiv.org Machine Learning, Jun-4-2020

We introduce Wasserstein Adversarial Proximal Policy Optimization (WAPPO), a novel algorithm for visual transfer in Reinforcement Learning that explicitly learns to align the distributions of extracted features between a source and target task. WAPPO approximates and minimizes the Wasserstein-1 distance between the distributions of features from source and target domains via a novel Wasserstein Confusion objective. WAPPO outperforms the prior state-of-the-art in visual transfer and successfully transfers policies across Visual Cartpole and two instantiations of 16 OpenAI Procgen environments.
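
For intuition about the quantity being minimized, the Wasserstein-1 distance has a closed form for one-dimensional empirical distributions with equally many samples: sort both samples and average the absolute differences. The sketch below covers only that special case. It is not WAPPO's Wasserstein Confusion objective, which the abstract describes as an approximation over learned feature distributions; the function name `wasserstein1_1d` is an assumption for this example.

```python
import numpy as np


def wasserstein1_1d(x, y):
    """Wasserstein-1 distance between two equal-size 1-D empirical samples."""
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    if x.shape != y.shape:
        raise ValueError("this sketch assumes equally many samples in x and y")
    # In one dimension the optimal transport plan matches sorted samples.
    return float(np.mean(np.abs(x - y)))


if __name__ == "__main__":
    # Shifting every sample by 1 moves the distribution a distance of 1.
    print(wasserstein1_1d([0.0, 1.0, 2.0], [1.0, 2.0, 3.0]))
```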

artificial intelligence, machine learning, reinforcement learning, (15 more...)

2006.03465

Country: North America > United States > Rhode Island > Providence County > Providence (0.04)

Genre: Research Report (0.50)

Industry: Leisure & Entertainment > Games > Computer Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

arXiv.org Artificial Intelligence, Jun-2-2020

Kernel Taylor-Based Value Function Approximation for Continuous-State Markov Decision Processes

Xu, Junhong, Yin, Kai, Liu, Lantao

We propose a principled kernel-based policy iteration algorithm to solve continuous-state Markov Decision Processes (MDPs). In contrast to most decision-theoretic planning frameworks, which assume fully known state transition models, we design a method that eliminates this strong assumption, since such models are often extremely difficult to engineer in practice. To achieve this, we first apply the second-order Taylor expansion of the value function. The Bellman optimality equation is then approximated by a partial differential equation, which relies only on the first and second moments of the transition model. By combining this with a kernel representation of the value function, we then design an efficient policy iteration algorithm whose policy evaluation step can be represented as a linear system of equations characterized by a finite set of supporting states. We have validated the proposed method through extensive simulations in both simplified and realistic planning scenarios, and the experiments show that our proposed approach substantially outperforms several baseline methods.
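
The role of the first and second moments can be illustrated with a generic second-order expansion. The notation below is an assumption made for this sketch and may differ from the paper's: the next state is written as $s' = s + \Delta$, with $\mu(s,a) = \mathbb{E}[\Delta \mid s,a]$ and $M(s,a) = \mathbb{E}[\Delta\Delta^\top \mid s,a]$, reward $R(s,a)$, and discount factor $\gamma$.

```latex
% Second-order Taylor expansion of V around s, then expectation over \Delta:
\mathbb{E}\left[V(s') \mid s, a\right]
  \approx V(s)
  + \mu(s,a)^\top \nabla V(s)
  + \tfrac{1}{2}\,\operatorname{tr}\!\left(M(s,a)\,\nabla^2 V(s)\right)

% Substituting into the Bellman optimality equation
% V(s) = \max_a \{ R(s,a) + \gamma\, \mathbb{E}[V(s') \mid s,a] \} gives
V(s) \approx \max_a \left\{ R(s,a)
  + \gamma \left[ V(s)
  + \mu(s,a)^\top \nabla V(s)
  + \tfrac{1}{2}\,\operatorname{tr}\!\left(M(s,a)\,\nabla^2 V(s)\right) \right] \right\}
```

Under this approximation the full transition density is no longer needed; only $\mu$ and $M$ enter the equation.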

artificial intelligence, machine learning, planning & scheduling, (16 more...)

2006.02008

Country:

North America > United States > Massachusetts > Middlesex County > Belmont (0.04)
North America > United States > Virginia (0.04)
North America > United States > Indiana > Monroe County > Bloomington (0.04)
(3 more...)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.71)
Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.64)

arXiv.org Artificial Intelligence, Jun-2-2020

Maximizing Cumulative User Engagement in Sequential Recommendation: An Online Optimization Perspective

Zhao, Yifei, Zhou, Yu-Hang, Ou, Mingdong, Xu, Huan, Li, Nan

To maximize cumulative user engagement (e.g., cumulative clicks) in sequential recommendation, it is often necessary to trade off two potentially conflicting objectives: pursuing higher immediate user engagement (e.g., click-through rate) and encouraging user browsing (i.e., more items exposed). Existing works often study these two tasks separately, which tends to yield sub-optimal results. In this paper, we study this problem from an online optimization perspective, and propose a flexible and practical framework to explicitly trade off longer user browsing length and high immediate user engagement. Specifically, by considering items as actions, user's requests as states and user leaving as an absorbing state, we formulate each user's behavior as a personalized Markov decision process (MDP), and the problem of maximizing cumulative user engagement is reduced to a stochastic shortest path (SSP) problem. Meanwhile, with immediate user engagement and quit probability estimation, it is shown that the SSP problem can be efficiently solved via dynamic programming. Experiments on real-world datasets demonstrate the effectiveness of the proposed approach. Moreover, this approach has been deployed at a large e-commerce platform, achieving over 7% improvement in cumulative clicks.
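
The trade-off between immediate clicks and continued browsing can be sketched with a small backward dynamic program. This is a simplified illustration and not the paper's algorithm: it assumes a fixed number of steps, a given candidate list per step, and known click and quit probabilities for each candidate. The function name `plan_recommendations` and the data layout are hypothetical.

```python
def plan_recommendations(candidates):
    """Backward induction over a finite recommendation sequence.

    candidates[t] is a non-empty list of (click_prob, quit_prob) pairs
    available at step t. After seeing an item the user clicks with
    probability click_prob and leaves (absorbing state) with probability
    quit_prob. Returns (expected cumulative clicks, chosen index per step).
    """
    value = 0.0  # expected future clicks after the final step
    plan = []
    for step in reversed(candidates):
        # Score each candidate: immediate click plus continuation value
        # weighted by the probability that the user keeps browsing.
        scored = [(c + (1.0 - q) * value, i) for i, (c, q) in enumerate(step)]
        best_value, best_index = max(scored, key=lambda pair: pair[0])
        plan.append(best_index)
        value = best_value
    plan.reverse()
    return value, plan


if __name__ == "__main__":
    # Hypothetical numbers: at step 0 the first item has the higher click
    # probability but also a much higher chance of ending the session.
    candidates = [
        [(0.5, 0.5), (0.3, 0.1)],
        [(0.4, 0.2), (0.6, 0.9)],
    ]
    value, plan = plan_recommendations(candidates)
    print(value, plan)
```

In this toy case, picking the highest immediate click probability at step 0 gives 0.5 + 0.5 * 0.6 = 0.8 expected clicks, while the plan that accounts for the quit probability gives 0.3 + 0.9 * 0.6 = 0.84.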

artificial intelligence, machine learning, user engagement, (18 more...)

2006.0452

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
Oceania > Australia > Victoria > Melbourne (0.04)
Oceania > Australia > New South Wales > Sydney (0.04)
(12 more...)

Genre: Research Report (0.50)

Industry: Information Technology > Services (0.34)

Technology:

Information Technology > Human Computer Interaction (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)