Conversational Agents: Theory and Applications

arXiv.org Artificial Intelligence

In this chapter, we provide a review of conversational agents (CAs), discussing chatbots, intended for casual conversation with a user, as well as task-oriented agents that generally engage in discussions intended to reach one or several specific goals, often (but not always) within a specific domain. We also consider the concept of embodied conversational agents, briefly reviewing aspects such as character animation and speech processing. The many different approaches for representing dialogue in CAs are discussed in some detail, along with methods for evaluating such agents, emphasizing the important topics of accountability and interpretability. A brief historical overview is given, followed by an extensive overview of various applications, especially in the fields of health and education. We end the chapter by discussing benefits and potential risks regarding the societal impact of current and future CA technology.


Challenges of Artificial Intelligence -- From Machine Learning and Computer Vision to Emotional Intelligence

arXiv.org Artificial Intelligence

Artificial intelligence (AI) has become a part of everyday conversation and of our lives. It is considered the new electricity that is revolutionizing the world, and it attracts heavy investment in both industry and academia. However, there is also a lot of hype in the current AI debate. AI based on so-called deep learning has achieved impressive results in many problems, but its limits are already visible. AI has been under research since the 1940s, and the industry has seen many ups and downs due to over-expectations and the disappointments that have followed. The purpose of this book is to give a realistic picture of AI, its history, its potential, and its limitations. We believe that AI is a helper, not a ruler of humans. We begin by describing what AI is and how it has evolved over the decades. After the fundamentals, we explain the importance of massive data for the current mainstream of artificial intelligence. The most common AI representations, methods, and machine learning approaches are covered. In addition, the main application areas are introduced. Computer vision has been central to the development of AI. The book provides a general introduction to computer vision and includes an exposure to the results and applications of our own research. Emotions are central to human intelligence, but little use has been made of them in AI. We present the basics of emotional intelligence and our own research on the topic. We discuss super-intelligence that transcends human understanding, explaining why such an achievement seems impossible on the basis of present knowledge, and how AI could be improved. Finally, we summarize the current state of AI and what to do in the future. In the appendix, we look at the development of AI education, especially from the perspective of the contents at our own university.


Empowering Things with Intelligence: A Survey of the Progress, Challenges, and Opportunities in Artificial Intelligence of Things

arXiv.org Artificial Intelligence

In the Internet of Things (IoT) era, billions of sensors and devices collect and process data from the environment, transmit them to cloud centers, and receive feedback via the internet for connectivity and perception. However, transmitting massive amounts of heterogeneous data, perceiving complex environments from these data, and then making smart decisions in a timely manner are difficult. Artificial intelligence (AI), especially deep learning, is now a proven success in various areas including computer vision, speech recognition, and natural language processing. AI introduced into the IoT heralds the era of artificial intelligence of things (AIoT). This paper presents a comprehensive survey on AIoT to show how AI can empower the IoT to make it faster, smarter, greener, and safer. Specifically, we briefly present the AIoT architecture in the context of cloud computing, fog computing, and edge computing. Then, we present progress in AI research for IoT from four perspectives: perceiving, learning, reasoning, and behaving. Next, we summarize some promising applications of AIoT that are likely to profoundly reshape our world. Finally, we highlight the challenges facing AIoT and some potential research opportunities.


Online learning in MDPs with linear function approximation and bandit feedback

arXiv.org Machine Learning

We consider an online learning problem where the learner interacts with a Markov decision process in a sequence of episodes, where the reward function is allowed to change between episodes in an adversarial manner and the learner only gets to observe the rewards associated with its actions. We allow the state space to be arbitrarily large, but we assume that all action-value functions can be represented as linear functions in terms of a known low-dimensional feature map, and that the learner has access to a simulator of the environment that allows generating trajectories from the true MDP dynamics. Our main contribution is developing a computationally efficient algorithm that we call MDP-LinExp3, and proving that its regret is bounded by $\widetilde{\mathcal{O}}\big(H^2 T^{2/3} (dK)^{1/3}\big)$, where $T$ is the number of episodes, $H$ is the number of steps in each episode, $K$ is the number of actions, and $d$ is the dimension of the feature map. We also show that the regret can be improved to $\widetilde{\mathcal{O}}\big(H^2 \sqrt{TdK}\big)$ under much stronger assumptions on the MDP dynamics. To our knowledge, MDP-LinExp3 is the first provably efficient algorithm for this problem setting.
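The specifics of MDP-LinExp3 are in the paper; as a rough illustration of the exponential-weights machinery that Exp3-style bandit algorithms build on, the sketch below runs plain Exp3 on an adversarial $K$-armed bandit. It deliberately omits the episodic structure and the linear feature map; the reward function, step size, and horizon are illustrative choices, not the paper's.

```python
import numpy as np

def exp3(reward_fn, K, T, eta):
    """Minimal Exp3 sketch for an adversarial K-armed bandit.

    reward_fn(t, a) returns the reward (in [0, 1]) of arm `a` at round `t`;
    only the chosen arm's reward is observed, so the update uses an
    importance-weighted estimate. This is the simplified variant without
    explicit exploration mixing.
    """
    log_w = np.zeros(K)                      # log-weights, for numerical stability
    total_reward = 0.0
    for t in range(T):
        p = np.exp(log_w - log_w.max())
        p /= p.sum()                         # sampling distribution over arms
        a = np.random.choice(K, p=p)
        r = reward_fn(t, a)
        total_reward += r
        log_w[a] += eta * r / p[a]           # update only the played arm
    return total_reward

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    means = rng.uniform(0.2, 0.8, size=5)
    total = exp3(lambda t, a: float(rng.random() < means[a]), K=5, T=10_000, eta=0.01)
    print("average reward:", total / 10_000)
```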


On Online Learning in Kernelized Markov Decision Processes

arXiv.org Machine Learning

Abstract-- We develop algorithms with low regret for learning episodic Markov decision processes based on kernel approximation techniques. The algorithms are based on both the Upper Confidence Bound (UCB) and the Posterior or Thompson Sampling (PSRL) philosophies, and work in the general setting of continuous state and action spaces when the true unknown transition dynamics are assumed to have smoothness induced by an appropriate Reproducing Kernel Hilbert Space (RKHS). I. INTRODUCTION The goal of reinforcement learning (RL) is to learn optimal behavior by repeated interaction with an unknown environment, usually modeled as a Markov Decision Process (MDP). Performance is typically measured by the amount of interaction, in terms of episodes or rounds, needed to arrive at an optimal (or near-optimal) policy; this is also known as the sample complexity of RL [1]. The sample complexity objective encourages efficient exploration across states and actions, but, at the same time, is indifferent to the reward earned during the learning phase.
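As a loose illustration of the optimism-under-RKHS-smoothness idea (not the paper's algorithms), the sketch below runs a GP-UCB style loop on a one-dimensional toy reward: fit a Gaussian process with an RBF kernel to the observations so far and play the action whose upper confidence bound is largest. The kernel, confidence width, and reward function are assumed for illustration.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# Toy GP-UCB loop: pick the action with the largest upper confidence bound,
# observe a noisy reward, refit the GP. The RKHS smoothness assumption of the
# paper loosely corresponds to the target being well modelled by the kernel.
rng = np.random.default_rng(1)
actions = np.linspace(0.0, 1.0, 200).reshape(-1, 1)
true_reward = lambda a: np.sin(6 * a) * np.exp(-a)        # unknown smooth target
beta = 2.0                                                 # confidence width (illustrative)

X_obs, y_obs = [], []
gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.2), alpha=1e-2)

for t in range(30):
    if X_obs:
        gp.fit(np.array(X_obs), np.array(y_obs))
        mean, std = gp.predict(actions, return_std=True)
    else:
        mean, std = np.zeros(len(actions)), np.ones(len(actions))
    a = actions[np.argmax(mean + beta * std)]              # optimism in the face of uncertainty
    y = float(true_reward(a[0]) + 0.05 * rng.standard_normal())
    X_obs.append(a)
    y_obs.append(y)

print("best action found:", X_obs[int(np.argmax(y_obs))][0])
```

A posterior-sampling (PSRL-flavoured) variant would instead draw one function from the GP posterior per round and act greedily on that sample.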


Overview of Udacity Artificial Intelligence Engineer Nanodegree, Term 1

#artificialintelligence

After finishing the Udacity Deep Learning Foundation I felt that I had gotten a good introduction to deep learning, but to really understand things, I must dig deeper. Besides, I had guaranteed admission to the Self-Driving Car Engineer, Artificial Intelligence, or Robotics Nanodegree programs.


Fast rates for online learning in Linearly Solvable Markov Decision Processes

arXiv.org Machine Learning

We study the problem of online learning in a class of Markov decision processes known as linearly solvable MDPs. In the stationary version of this problem, a learner interacts with its environment by directly controlling the state transitions, attempting to balance a fixed state-dependent cost and a certain smooth cost penalizing extreme control inputs. In the current paper, we consider an online setting where the state costs may change arbitrarily between consecutive rounds, and the learner only observes the costs at the end of each respective round. We are interested in constructing algorithms for the learner that guarantee small regret against the best stationary control policy chosen in full knowledge of the cost sequence. Our main result is showing that the smoothness of the control cost enables the simple algorithm of following the leader to achieve a regret of order $\log^2 T$ after $T$ rounds, vastly improving on the best known regret bound of order $T^{3/4}$ for this setting.
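To make the "follow the leader" strategy concrete, the sketch below applies it to a generic finite decision set: in each round, play whatever decision has the smallest cumulative cost over the rounds seen so far. This is only the abstract online-learning template; the paper applies it to control policies in linearly solvable MDPs, where the smoothness of the control cost is what makes the strategy work so well.

```python
import numpy as np

def follow_the_leader(cost_matrix):
    """Follow-the-leader on a finite decision set (illustrative sketch).

    cost_matrix[t, a] is the cost of decision `a` in round `t`, revealed only
    after the round is played. FTL plays the decision with the lowest
    cumulative past cost and is compared against the best fixed decision.
    """
    T, n = cost_matrix.shape
    cumulative = np.zeros(n)
    incurred = 0.0
    for t in range(T):
        a = int(np.argmin(cumulative))       # the "leader" with respect to past rounds
        incurred += cost_matrix[t, a]
        cumulative += cost_matrix[t]
    best_fixed = cost_matrix.sum(axis=0).min()
    return incurred - best_fixed             # regret against the best fixed decision

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    costs = rng.random((1000, 4))
    print("FTL regret on i.i.d. random costs:", follow_the_leader(costs))
```

On adversarially chosen cost sequences plain FTL can be forced into linear regret; the abstract's point is that the smooth control cost of linearly solvable MDPs allows it to achieve regret of order $\log^2 T$ instead.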


A Stackelberg Game Approach for Incentivizing Participation in Online Educational Forums with Heterogeneous Student Population

AAAI Conferences

Increased interest in web-based education has spurred the proliferation of online learning environments. However, these platforms suffer from high dropout rates due to lack of sustained motivation among the students taking the course. In an effort to address this problem, we propose an incentive-based, instructor-driven approach to orchestrate the interactions in online educational forums (OEFs). Our approach takes into account the heterogeneity in skills among the students as well as the limited budget available to the instructor. We first analytically model OEFs in a non-strategic setting using ideas from lumpable continuous time Markov chains and compute expected aggregate transient net-rewards for the instructor and the students. We next consider a strategic setting where we use the rewards computed above to set up a mixed-integer linear program which views an OEF as a single-leader-multiple-followers Stackelberg game and recommends an optimal plan to the instructor for maximizing student participation. Our experimental results reveal several interesting phenomena including a striking non-monotonicity in the level of participation of students vis-a-vis the instructor's arrival rate.
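The paper formulates the instructor's problem as a mixed-integer linear program; as a much smaller, hypothetical illustration of the single-leader-multiple-followers structure, the sketch below has the instructor (leader) split an integer bonus budget across two activity types by brute force, with each student type (follower) best-responding by participating only when its payoff is positive. The student types, costs, and budget are made up for the example.

```python
from itertools import product

# Toy Stackelberg sketch (not the paper's MILP): the leader commits to a bonus
# split, each follower type best-responds, and the leader keeps the split that
# maximizes total participation.
BUDGET = 5
student_types = [          # (effort cost for easy posts, effort cost for hard posts, #students)
    (1.0, 3.0, 10),        # lower-skill students
    (0.5, 1.5, 5),         # higher-skill students
]

def participation(bonus_easy, bonus_hard):
    total = 0
    for cost_easy, cost_hard, count in student_types:
        # Follower best response: pick the better activity, or opt out entirely.
        payoff = max(bonus_easy - cost_easy, bonus_hard - cost_hard, 0.0)
        if payoff > 0:
            total += count
    return total

feasible = [(be, bh) for be, bh in product(range(BUDGET + 1), repeat=2) if be + bh <= BUDGET]
best = max(feasible, key=lambda split: participation(*split))
print("bonus split (easy, hard):", best, "-> participants:", participation(*best))
```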


Online learning in MDPs with side information

arXiv.org Machine Learning

We study online learning of finite Markov decision process (MDP) problems when a side information vector is available. The problem is motivated by applications such as clinical trials and recommendation systems. Such applications have an episodic structure, where each episode corresponds to a patient or customer. Our objective is to compete with the optimal dynamic policy that can take the side information into account. We propose a computationally efficient algorithm and show that its regret is at most $O(\sqrt{T})$, where $T$ is the number of rounds. To the best of our knowledge, this is the first regret bound for this setting.
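The paper's algorithm handles full MDP dynamics; purely to illustrate how a side-information (context) vector can drive action choice, the sketch below runs a LinUCB-style contextual bandit in which each episode's expected reward is assumed linear in the observed context. Dimensions, noise level, and the confidence parameter are illustrative.

```python
import numpy as np

# LinUCB-style sketch: per-action ridge-regression estimates plus a confidence
# width, using the side-information vector observed at the start of each round.
rng = np.random.default_rng(2)
d, K, T, alpha = 4, 3, 5000, 1.0
theta_true = rng.standard_normal((K, d))      # unknown per-action parameters

A = [np.eye(d) for _ in range(K)]             # per-action regularized design matrices
b = [np.zeros(d) for _ in range(K)]

for t in range(T):
    x = rng.standard_normal(d)                # side information for this episode
    scores = []
    for a in range(K):
        A_inv = np.linalg.inv(A[a])
        theta_hat = A_inv @ b[a]
        width = alpha * np.sqrt(x @ A_inv @ x)
        scores.append(theta_hat @ x + width)  # optimistic estimate of the reward
    a = int(np.argmax(scores))
    reward = theta_true[a] @ x + 0.1 * rng.standard_normal()
    A[a] += np.outer(x, x)
    b[a] += reward * x

print("estimate vs. true parameters for action 0:")
print(np.round(np.linalg.solve(A[0], b[0]), 2))
print(np.round(theta_true[0], 2))
```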


Online learning in episodic Markovian decision processes by relative entropy policy search

Neural Information Processing Systems

We study the problem of online learning in finite episodic Markov decision processes (MDPs) where the loss function is allowed to change between episodes. The natural performance measure in this learning problem is the regret, defined as the difference between the total loss suffered by the learner and the total loss of the best stationary policy. We assume that the learner is given access to a finite action space $A$ and that the state space $X$ has a layered structure with $L$ layers, so that state transitions are only possible between consecutive layers. We describe a variant of the recently proposed Relative Entropy Policy Search algorithm and show that its regret after $T$ episodes is $2\sqrt{L|X||A|T\log(|X||A|/L)}$ in the bandit setting and $2L\sqrt{T\log(|X||A|/L)}$ in the full information setting, given that the learner has perfect knowledge of the transition probabilities of the underlying MDP. These guarantees largely improve on previously known results under much milder assumptions and cannot be significantly improved under general assumptions.
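For intuition about the relative-entropy regularization behind such algorithms, the generic entropy-regularized update has a simple closed form (a standard derivation, written here in general terms rather than as the paper's exact analysis): trading off the current loss against the KL divergence to the previous distribution over state-action pairs yields an exponentiated-loss multiplicative update, which in the MDP case is then projected back onto the set of occupancy measures consistent with the known transition probabilities.

```latex
q_{t+1} \;=\; \operatorname*{arg\,min}_{q \in \Delta}\;
  \langle q, \ell_t \rangle \;+\; \tfrac{1}{\eta}\, D_{\mathrm{KL}}\!\left(q \,\middle\|\, q_t\right)
\qquad\Longrightarrow\qquad
q_{t+1}(x,a) \;\propto\; q_t(x,a)\,\exp\!\bigl(-\eta\,\ell_t(x,a)\bigr)
```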