Markov Models
Deep Reinforcement Learning in Computer Vision: A Comprehensive Survey
Le, Ngan, Rathour, Vidhiwar Singh, Yamazaki, Kashu, Luu, Khoa, Savvides, Marios
Recent works have demonstrated the remarkable successes of deep reinforcement learning in various domains including finance, medicine, healthcare, video games, robotics, and computer vision. In this work, we provide a detailed review of recent and state-of-the-art research advances of deep reinforcement learning in computer vision. We start with comprehending the theories of deep learning, reinforcement learning, and deep reinforcement learning. We then propose a categorization of deep reinforcement learning methodologies and discuss their advantages and limitations. In particular, we divide deep reinforcement learning into seven main categories according to their applications in computer vision, i.e. (i) landmark localization (ii) object detection; (iii) object tracking; (iv) registration on both 2D image and 3D image volumetric data (v) image segmentation; (vi) videos analysis; and (vii) other applications. Each of these categories is further analyzed with reinforcement learning techniques, network design, and performance. Moreover, we provide a comprehensive analysis of the existing publicly available datasets and examine source code availability. Finally, we present some open issues and discuss future research directions on deep reinforcement learning in computer vision.
From Statistical Relational to Neural Symbolic Artificial Intelligence: a Survey
Marra, Giuseppe, Dumančić, Sebastijan, Manhaeve, Robin, De Raedt, Luc
The integration of learning and reasoning is one of the key challenges in artificial intelligence and machine learning today, and various communities have been addressing it. That is especially true for the field of neural-symbolic computation (NeSy) [10, 21], where the goal is to integrate symbolic reasoning and neural networks. NeSy already has a long tradition, and it has recently attracted a lot of attention from various communities (cf. the keynotes of Y. Bengio and H. Kautz on this topic at AAAI 2020, the AI Debate [9] between Y. Bengio and G. Marcus). Another domain that has a rich tradition in integrating learning and reasoning is that of statistical relational learning and artificial intelligence (StarAI) [39, 85]. But rather than focusing on integrating logic and neural networks, it is centred around the question of integrating logic with probabilistic reasoning, more specifically probabilistic graphical models. Despite the common interest in combining symbolic reasoning with a basic paradigm for learning, i.e., probabilistic graphical models or neural networks, it is surprising that there are not more interactions between these two fields.
Congratulations to the #IJCAI2021 best paper award winners
The IJCAI-2021 awards were announced during the opening ceremony of the International Joint Conference on Artificial Intelligence (IJCAI-21). The honours included the 2021 AIJ classic paper award, the AIJ prominent paper award, and the IJCAI-JAIR best paper prize. This award recognizes outstanding papers, exceptional in their significance and impact, that were published at least 15 years ago, in the journal Artificial Intelligence (AIJ). This paper brought partially observable Markov decision processes (POMDPs) from the field of operational research to the field of AI. It provides an excellent account of the theory behind POMDPs, which demystified the field for a generation of researchers, and popularised their use in both AI and robotics.
Quantum adaptive agents with efficient long-term memories
Elliott, Thomas J., Gu, Mile, Garner, Andrew J. P., Thompson, Jayne
Central to the success of adaptive systems is their ability to interpret signals from their environment and respond accordingly -- they act as agents interacting with their surroundings. Such agents typically perform better when able to execute increasingly complex strategies. This comes with a cost: the more information the agent must recall from its past experiences, the more memory it will need. Here we investigate the power of agents capable of quantum information processing. We uncover the most general form a quantum agent need adopt to maximise memory compression advantages, and provide a systematic means of encoding their memory states. We show these encodings can exhibit extremely favourable scaling advantages relative to memory-minimal classical agents when information must be retained about events increasingly far into the past.
Automatic Speech Recognition using limited vocabulary: A survey
Fendji, Jean Louis K. E., Tala, Diane M., Yenke, Blaise O., Atemkeng, Marcellin
Automatic Speech Recognition (ASR) is an active field of research due to its huge number of applications and the proliferation of interfaces or computing devices that can support speech processing. But the bulk of applications is based on well-resourced languages that overshadow under-resourced ones. Yet ASR represents an undeniable mean to promote such languages, especially when design human-to-human or human-to-machine systems involving illiterate people. An approach to design an ASR system targeting under-resourced languages is to start with a limited vocabulary. ASR using a limited vocabulary is a subset of the speech recognition problem that focuses on the recognition of a small number of words or sentences. This paper aims to provide a comprehensive view of mechanisms behind ASR systems as well as techniques, tools, projects, recent contributions, and possibly future directions in ASR using a limited vocabulary. This work consequently provides a way to go when designing ASR system using limited vocabulary. Although an emphasis is put on limited vocabulary, most of the tools and techniques reported in this survey applied to ASR systems in general.
Sequential Stochastic Optimization in Separable Learning Environments
Bishop, R. Reid, White, Chelsea C. III
We consider a class of sequential decision-making problems under uncertainty that can encompass various types of supervised learning concepts. These problems have a completely observed state process and a partially observed modulation process, where the state process is affected by the modulation process only through an observation process, the observation process only observes the modulation process, and the modulation process is exogenous to control. We model this broad class of problems as a partially observed Markov decision process (POMDP). The belief function for the modulation process is control invariant, thus separating the estimation of the modulation process from the control of the state process. We call this specially structured POMDP the separable POMDP, or SEP-POMDP, and show it (i) can serve as a model for a broad class of application areas, e.g., inventory control, finance, healthcare systems, (ii) inherits value function and optimal policy structure from a set of completely observed MDPs, (iii) can serve as a bridge between classical models of sequential decision making under uncertainty having fully specified model artifacts and such models that are not fully specified and require the use of predictive methods from statistics and machine learning, and (iv) allows for specialized approximate solution procedures.
Towards Personalized and Human-in-the-Loop Document Summarization
The ubiquitous availability of computing devices and the widespread use of the internet have generated a large amount of data continuously. Therefore, the amount of available information on any given topic is far beyond humans' processing capacity to properly process, causing what is known as information overload. To efficiently cope with large amounts of information and generate content with significant value to users, we require identifying, merging and summarising information. Data summaries can help gather related information and collect it into a shorter format that enables answering complicated questions, gaining new insight and discovering conceptual boundaries. This thesis focuses on three main challenges to alleviate information overload using novel summarisation techniques. It further intends to facilitate the analysis of documents to support personalised information extraction. This thesis separates the research issues into four areas, covering (i) feature engineering in document summarisation, (ii) traditional static and inflexible summaries, (iii) traditional generic summarisation approaches, and (iv) the need for reference summaries. We propose novel approaches to tackle these challenges, by: i)enabling automatic intelligent feature engineering, ii) enabling flexible and interactive summarisation, iii) utilising intelligent and personalised summarisation approaches. The experimental results prove the efficiency of the proposed approaches compared to other state-of-the-art models. We further propose solutions to the information overload problem in different domains through summarisation, covering network traffic data, health data and business process data.
Reinforcement Learning to Optimize Lifetime Value in Cold-Start Recommendation
Ji, Luo, Qi, Qin, Han, Bingqing, Yang, Hongxia
Recommender system plays a crucial role in modern E-commerce platform. Due to the lack of historical interactions between users and items, cold-start recommendation is a challenging problem. In order to alleviate the cold-start issue, most existing methods introduce content and contextual information as the auxiliary information. Nevertheless, these methods assume the recommended items behave steadily over time, while in a typical E-commerce scenario, items generally have very different performances throughout their life period. In such a situation, it would be beneficial to consider the long-term return from the item perspective, which is usually ignored in conventional methods. Reinforcement learning (RL) naturally fits such a long-term optimization problem, in which the recommender could identify high potential items, proactively allocate more user impressions to boost their growth, therefore improve the multi-period cumulative gains. Inspired by this idea, we model the process as a Partially Observable and Controllable Markov Decision Process (POC-MDP), and propose an actor-critic RL framework (RL-LTV) to incorporate the item lifetime values (LTV) into the recommendation. In RL-LTV, the critic studies historical trajectories of items and predict the future LTV of fresh item, while the actor suggests a score-based policy which maximizes the future LTV expectation. Scores suggested by the actor are then combined with classical ranking scores in a dual-rank framework, therefore the recommendation is balanced with the LTV consideration. Our method outperforms the strong live baseline with a relative improvement of 8.67% and 18.03% on IPV and GMV of cold-start items, on one of the largest E-commerce platform.
Explainable Reinforcement Learning for Broad-XAI: A Conceptual Framework and Survey
Dazeley, Richard, Vamplew, Peter, Cruz, Francisco
Broad Explainable Artificial Intelligence moves away from interpreting individual decisions based on a single datum and aims to provide integrated explanations from multiple machine learning algorithms into a coherent explanation of an agent's behaviour that is aligned to the communication needs of the explainee. Reinforcement Learning (RL) methods, we propose, provide a potential backbone for the cognitive model required for the development of Broad-XAI. RL represents a suite of approaches that have had increasing success in solving a range of sequential decision-making problems. However, these algorithms all operate as black-box problem solvers, where they obfuscate their decision-making policy through a complex array of values and functions. EXplainable RL (XRL) is relatively recent field of research that aims to develop techniques to extract concepts from the agent's: perception of the environment; intrinsic/extrinsic motivations/beliefs; Q-values, goals and objectives. This paper aims to introduce a conceptual framework, called the Causal XRL Framework (CXF), that unifies the current XRL research and uses RL as a backbone to the development of Broad-XAI. Additionally, we recognise that RL methods have the ability to incorporate a range of technologies to allow agents to adapt to their environment. CXF is designed for the incorporation of many standard RL extensions and integrated with external ontologies and communication facilities so that the agent can answer questions that explain outcomes and justify its decisions.
Development of a Conversation State Prediction System
With the evolution of the concept of Speaker diarization using LSTM, it's relatively easier to understand the speaker identities for specific segments of input audio stream data than manually tagging the data. With such a concept, it's highly desirable to consider the possibility of using the identified speaker identities to aid in predicting the future Speaker States in a conversation. In this study, the Markov Chains are used to identify and update the Speaker States for the next conversations between the same set of speakers, to enable identification of their states in the most natural and long conversations. The model is based on several audio samples from natural conversations of three or greater than three speakers in two datasets, with overall total error percentages for recognized states being lesser than or equal to 12%. The findings imply that the proposed extension to the Speaker diarization is effective to predict the states for a conversation.