AITopics

Monte-Carlo Tree Search (MCTS) is one of the most widely used methods for planning, and has powered many recent advances in artificial intelligence. In MCTS, one typically performs computations (i.e., simulations) to collect statistics about the possible future consequences of actions, and then chooses accordingly. Many popular MCTS methods such as UCT and its variants decide which computations to perform by trading off exploration and exploitation. In this work, we take a more direct approach, and explicitly quantify the value of a computation based on its expected impact on the quality of the action eventually chosen. Our approach goes beyond the "myopic" limitations of existing computation-value-based methods in two senses: we are able to account for the impact of (I) non-immediate (i.e., future) computations (II) on non-immediate actions. We show that policies that greedily optimize computation values are optimal under certain assumptions and obtain results that are competitive with the state-of-the-art.
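
For readers unfamiliar with the exploration/exploitation trade-off mentioned in this abstract, the following is a minimal sketch of the standard UCB1 selection rule that UCT applies at each tree node. It is not the value-of-computation method the abstract proposes; the function names, the `(mean_value, visits)` representation, and the default constant `c` are assumptions made for illustration.

```python
import math

def ucb1_score(mean_value, visits, parent_visits, c=math.sqrt(2)):
    """UCB1 score: empirical mean (exploitation) plus a visit-count bonus (exploration)."""
    if visits == 0:
        return math.inf  # unvisited children are always tried first
    return mean_value + c * math.sqrt(math.log(parent_visits) / visits)

def select_child(children):
    """children: list of (mean_value, visits) pairs for one node (hypothetical layout).

    Returns the index of the child with the highest UCB1 score.
    """
    parent_visits = max(1, sum(visits for _, visits in children))
    scores = [ucb1_score(m, n, parent_visits) for m, n in children]
    return max(range(len(children)), key=scores.__getitem__)
```

With equal visit counts the rule picks the child with the higher mean; a rarely visited child can still be selected over a better-looking one because its exploration bonus is larger.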

computation, voc, (16 more...)

2002.04335

Country:

North America > United States > Virginia > Arlington County > Arlington (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
(4 more...)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

ReClor: A Reading Comprehension Dataset Requiring Logical Reasoning

Yu, Weihao, Jiang, Zihang, Dong, Yanfei, Feng, Jiashi

Recent powerful pre-trained language models have achieved remarkable performance on most of the popular datasets for reading comprehension. It is time to introduce more challenging datasets to push the development of this field towards more comprehensive reasoning over text. In this paper, we introduce a new Reading Comprehension dataset requiring logical reasoning (ReClor) extracted from standardized graduate admission examinations. As earlier studies suggest, human-annotated datasets usually contain biases, which are often exploited by models to achieve high accuracy without truly understanding the text. In order to comprehensively evaluate the logical reasoning ability of models on ReClor, we propose to identify biased data points and separate them into an EASY set, with the rest forming a HARD set. Empirical results show that state-of-the-art models have an outstanding ability to capture biases contained in the dataset, achieving high accuracy on the EASY set. However, they struggle on the HARD set, with performance near that of random guessing, indicating that more research is needed to fundamentally enhance the logical reasoning ability of current models.

conference paper, dataset, reasoning, (11 more...)

2002.04326

Country:

Atlantic Ocean > North Atlantic Ocean > Bay of Fundy (0.04)
Asia > Singapore (0.04)
Asia > Japan (0.04)

Genre: Research Report > New Finding (0.34)

Industry:

Education > Assessment & Standards > Student Performance (1.00)
Education > Educational Setting > Higher Education (0.94)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.67)

Learning Coupled Policies for Simultaneous Machine Translation

Arthur, Philip, Cohn, Trevor, Haffari, Gholamreza

In simultaneous machine translation, the system needs to incrementally generate the output translation before the input sentence ends. This is a coupled decision process consisting of a programmer and an interpreter. The programmer's policy decides when to WRITE the next output or READ the next input, and the interpreter's policy decides what word to write. We present an imitation learning (IL) approach to efficiently learn effective coupled programmer-interpreter policies. To enable IL, we present an algorithmic oracle that produces oracle READ/WRITE actions for training bilingual sentence pairs using the notion of word alignments. We attribute the effectiveness of the learned coupled policies to (i) scheduled sampling addressing the coupled exposure bias, and (ii) the quality of oracle actions capturing enough information from the partial input before writing the output. Experiments show our method outperforms strong baselines in terms of translation quality and delay when translating from German/Arabic/Czech/Bulgarian/Romanian to English.
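
One plausible way to derive READ/WRITE actions from word alignments, as this abstract describes, is sketched below. This is an illustrative assumption, not the authors' oracle: the function name, the 0-based `(src_idx, tgt_idx)` alignment format, and the choice to WRITE unaligned target words immediately are all hypothetical.

```python
def oracle_actions(alignments, tgt_len):
    """Derive a READ/WRITE action sequence from word alignments (illustrative only).

    alignments: iterable of (src_idx, tgt_idx) pairs, 0-based.
    tgt_len: number of target words.
    Assumption: a target word may be written once every source word aligned
    to it has been read; unaligned target words are written right away.
    """
    # For each target position, the furthest source position it depends on.
    needed = [-1] * tgt_len
    for src_idx, tgt_idx in alignments:
        needed[tgt_idx] = max(needed[tgt_idx], src_idx)

    actions = []
    read = 0  # number of source words consumed so far
    for tgt_idx in range(tgt_len):
        while read <= needed[tgt_idx]:
            actions.append("READ")
            read += 1
        actions.append("WRITE")
    return actions
```

For example, with alignments `{(0, 0), (2, 1), (1, 2)}` and three target words, the second target word forces two extra READs before it can be written, after which the third can be written without further reading.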

interpreter, programmer, translation, (14 more...)

2002.04306

Country:

Europe > Italy > Tuscany > Florence (0.04)
Europe > Spain > Valencian Community > Valencia Province > Valencia (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Human-to-Robot Attention Transfer for Robot Execution Failure Avoidance Using Stacked Neural Networks

Song, Boyi, Peng, Yuntao, Luo, Ruijiao, Liu, Rui

Due to world dynamics and hardware uncertainty, robots inevitably fail in task executions, leading to undesired or even dangerous outcomes. To avoid failures and improve robot performance, it is critical to identify and correct abnormal robot executions at an early stage. However, limited by its reasoning capability and knowledge level, it is challenging for a robot to self-diagnose and correct its abnormal behaviors. To solve this problem, a novel method, human-to-robot attention transfer (H2R-AT), is proposed to seek help from a human. H2R-AT is built on a novel stacked neural networks model, transferring human attention embedded in verbal reminders to robot attention embedded in the robot's visual perception. With the attention transferred from a human, a robot understands what and where the human's concerns are, and can identify and correct its abnormal executions. To validate the effectiveness of H2R-AT, two representative task scenarios with abnormal robot executions, "serve water for a human in a kitchen" and "pick up a defective gear in a factory", were designed in the open-access simulation platform V-REP; 252 volunteers were recruited to provide about 12000 verbal reminders to train and test the attention transfer model H2R-AT. With an accuracy of 73.68% in transferring attention and an accuracy of 66.86% in avoiding robot execution failures, the effectiveness of H2R-AT was validated.

execution, international conference, robot, (14 more...)

2002.04242

Country:

North America > United States > Washington > King County > Seattle (0.04)
South America > Uruguay > Maldonado > Maldonado (0.04)
Oceania > Australia > Queensland > Brisbane (0.04)
(6 more...)

Genre: Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Hyper-Meta Reinforcement Learning with Sparse Reward

Hua, Yun, Wang, Xiangfeng, Jin, Bo, Li, Wenhao, Yan, Junchi, He, Xiaofeng, Zha, Hongyuan

Despite their success, existing meta reinforcement learning methods still have difficulty learning a meta policy effectively for RL problems with sparse reward. To this end, we develop a novel meta reinforcement learning framework, Hyper-Meta RL (HMRL), for sparse reward RL problems. It consists of meta state embedding, meta reward shaping and meta policy learning modules: the cross-environment meta state embedding module constructs a common meta state space to adapt to different environments; the meta-state-based, environment-specific meta reward shaping effectively extends the original sparse reward trajectory through cross-environmental knowledge complementarity; as a consequence, the meta policy achieves better generalization and efficiency with the shaped meta reward. Experiments with sparse reward show the superiority of HMRL in both transferability and policy learning efficiency.

meta reward, meta state, sparse reward, (14 more...)

2002.04238

Country:

Asia > China > Shanghai > Shanghai (0.05)
North America > United States (0.04)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Efficiently Learning and Sampling Interventional Distributions from Observations

Bhattacharyya, Arnab, Gayen, Sutanu, Kandasamy, Saravanan, Maran, Ashwin, Vinodchandran, N. V.

We study the problem of efficiently estimating the effect of an intervention on a single variable using observational samples in a causal Bayesian network. Our goal is to give algorithms that are efficient in both time and sample complexity in a non-parametric setting. Tian and Pearl (AAAI '02) have exactly characterized the class of causal graphs for which causal effects of atomic interventions can be identified from observational data. We make their result quantitative. Suppose P is a causal model on a set V of n observable variables with respect to a given causal graph G with observable distribution $P$. Let $P_x$ denote the interventional distribution over the observables with respect to an intervention of a designated variable X with x. We show that, assuming that G has bounded in-degree and bounded c-components, and that the observational distribution is identifiable and satisfies a certain strong positivity condition: 1. [Evaluation] There is an algorithm that outputs with probability $2/3$ an evaluator for a distribution $P'$ that satisfies $d_{tv}(P_x, P') \leq \epsilon$ using $m=\tilde{O}(n\epsilon^{-2})$ samples from $P$ and $O(mn)$ time. The evaluator can return in $O(n)$ time the probability $P'(v)$ for any assignment $v$ to $V$. 2. [Generation] There is an algorithm that outputs with probability $2/3$ a sampler for a distribution $\hat{P}$ that satisfies $d_{tv}(P_x, \hat{P}) \leq \epsilon$ using $m=\tilde{O}(n\epsilon^{-2})$ samples from $P$ and $O(mn)$ time. The sampler returns an iid sample from $\hat{P}$ with probability $1-\delta$ in $O(n\epsilon^{-1} \log\delta^{-1})$ time. We extend our techniques to estimate marginals $P_x|_Y$ over a given $Y \subset V$ of interest. We also show lower bounds for the sample complexity, showing that our sample complexity has optimal dependence on the parameters n and $\epsilon$ as well as the strong positivity parameter.
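
The accuracy guarantees in this abstract are stated in total variation distance, $d_{tv}$. As a small illustration of that quantity only (not of the paper's algorithms), the sketch below computes it for two discrete distributions using the standard definition, half the sum of absolute probability differences. The function name and the dictionary representation are assumptions.

```python
def total_variation(p, q):
    """Total variation distance between two discrete distributions.

    p, q: dicts mapping outcomes to probabilities (hypothetical representation).
    Outcomes missing from one dict are treated as having probability 0.
    """
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(v, 0.0) - q.get(v, 0.0)) for v in support)
```

Identical distributions are at distance 0 and distributions with disjoint support are at distance 1, so a bound of $\epsilon$ says the learned distribution and $P_x$ assign probabilities to any event that differ by at most $\epsilon$.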

algorithm, assumption 2, intervention, (17 more...)

2002.04232

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > California > Los Angeles County > Los Angeles (0.14)
Asia > Singapore (0.04)
(9 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.91)

New Scientist, Feb-10-2020, 19:22:07 GMT

Brain scans can help predict who'll benefit from an antidepressant

An AI can predict from people's brainwaves whether an antidepressant is likely to help them. The technique may offer a new approach to prescribing medicines for mental illnesses. "We have a central problem in psychiatry because we characterise diseases by their end point, such as what behaviours they cause," says Amit Etkin at Stanford University in California. "You tell me you're depressed, and I don't know any more than that. I don't really know what's going on in the brain and we prescribe medication on very little information."

algorithm, antidepressant, participant, (6 more...)

New Scientist

Country:

North America > United States > California (0.26)
Europe > Denmark > Capital Region > Copenhagen (0.06)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area > Neurology (0.91)
Health & Medicine > Therapeutic Area > Psychiatry/Psychology (0.57)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.41)

Daily Mail - Science & tech, Feb-10-2020, 18:27:12 GMT

Elephants mourn their dead even if they did not have a close bond

Elephants mourn their dead even if they did not have a close bond and continue to take an interest long after their bodies start to decay, a new study finds. Experts from the San Diego Zoo Institute for Conservation Research looked at 32 wild elephant carcasses from 12 different sources across Africa. They monitored the way in which the animals interacted with the carcasses and found that, in all cases, they would touch and examine the remains. They were also seen vocalising and attempting to lift or pull fallen elephants that had just died, according to researchers. The idea that elephants have a 'unique relationship' with the dead has been touted for a number of years, but this new study is the first to examine it in detail.

close bond, elephant, elephant mourn, (14 more...)

Daily Mail - Science & tech

Country:

North America > United States > California > San Diego County > San Diego (0.25)
Africa (0.25)
Asia > Myanmar (0.05)

Genre: Research Report (0.70)

Technology:

Information Technology > Communications > Social Media (0.47)
Information Technology > Artificial Intelligence (0.31)