to

### Selecting the State-Representation in Reinforcement Learning

The problem of selecting the right state representation in a reinforcement learning problem is considered. Several models (functions mapping past observations to a finite set) of the observations are given, and it is known that for at least one of these models the resulting state dynamics are indeed Markovian. Knowing neither which of the models is the correct one nor the probabilistic characteristics of the resulting MDP, the learner is required to obtain as much reward as the optimal policy for the correct model (or for the best of the correct models, if there are several). We propose an algorithm that achieves this with a regret of order $T^{2/3}$, where $T$ is the horizon time. Papers published at the Neural Information Processing Systems Conference.

### Quiz asks users to select correct spelling to complete the sentence

Many of us passed English exams at school, but how many of us have carried that knowledge into adult life? While some of us will be reluctant to admit that our spelling has slipped over the years, many will be caught out from time to time. A new quiz from Playbuzz aims to test users' basic knowledge of the English language by asking them to select the correct spelling in a sentence. Scroll down to take the test (answers are in the captions... so no cheating!). Users will be given a scenario with the sentence missing a word, and they must select the correct spelling in the context.

### Retired Teacher Corrects Error-Filled Letter From Trump, Sends it Back to White House

When Yvonne Mason, a retired English teacher, received a letter from the White House earlier this month, she was appalled. The letter, which was signed by President Donald Trump, was filled with "many silly mistakes" she recognized from her 17 years as a high school English teacher in South Carolina. So she took out her trusty purple pen and started making corrections to the letter. "Have ya'll tried grammar and style check?" Mason wrote at the top of the letter. "OMG this is WRONG!" she wrote in one part near the bottom of the letter.

### Debating artificial intelligence

In daily life, people are sometimes confronted with issues on which they are at a loss to decide one way or the other. We are working on debating artificial intelligence (AI), which instantaneously analyzes large volumes of documents related to an issue, then constructs and presents a rational argument either for or against. Conventional artificial intelligence provides "the correct answer" based on objective facts and knowledge. In contrast, debating AI addresses issues where there is no single correct answer, clarifying the pros and cons by taking societal values into consideration. Using debating AI allows us to become aware of the rationale supporting opinions or decisions opposite to our own.

### Why doesn't extra supervision increase the performance of the SOTA language model? • /r/MachineLearning

I took the TensorFlow implementation of the language model from Zaremba et al., 2014, and changed the loss function from what it was (cross-entropy with a 1-hot vector representing the correct word) to a loss made up of two terms: the first is the loss from before, and the second is a cross-entropy loss with a 1-hot vector representing the closest synonym to the target word. I tried playing around with the weighting of these two terms, but no matter what I did the results did not improve over the original model. Doesn't this new loss function basically tell the network "the next correct word is 'dog', but if you say it's 'puppy' that's also OK"?
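For concreteness, the two-term loss described above can be sketched in plain Python. This is only an illustration of the idea, not the code from the post: the names `combined_loss` and `alpha`, the three-word vocabulary, and the logit values are all hypothetical.

```python
import math

def softmax(logits):
    """Turn raw scores into a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, index):
    """Cross-entropy against a 1-hot vector whose single 1 is at `index`."""
    return -math.log(probs[index])

def combined_loss(logits, target, synonym, alpha=0.5):
    """Original loss plus an alpha-weighted synonym term.

    `alpha` is a hypothetical weighting knob; alpha=0 recovers the
    original single-term loss.
    """
    probs = softmax(logits)
    return cross_entropy(probs, target) + alpha * cross_entropy(probs, synonym)

# Toy example (made-up numbers): the target is "dog", its synonym is "puppy".
vocab = ["cat", "dog", "puppy"]
logits = [0.1, 2.0, 1.0]
loss_original = combined_loss(logits, target=1, synonym=2, alpha=0.0)
loss_combined = combined_loss(logits, target=1, synonym=2, alpha=0.5)
```

Written this way, the combined loss equals `(1 + alpha)` times a single cross-entropy against a soft target that puts `1 / (1 + alpha)` of its mass on the target word and `alpha / (1 + alpha)` on the synonym. So the change amounts to replacing the 1-hot target with a two-word soft target and rescaling the loss.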