 milo



Mitigating Covariate Shift in Imitation Learning via Offline Data With Partial Coverage

Neural Information Processing Systems

This paper studies offline Imitation Learning (IL) where an agent learns to imitate an expert demonstrator without additional online environment interactions. Instead, the learner is presented with a static offline dataset of state-action-next state triples from a potentially less proficient behavior policy. We introduce Model-based IL from Offline data (MILO): an algorithmic framework that utilizes the static dataset to solve the offline IL problem efficiently both in theory and in practice. In theory, even if the behavior policy is highly sub-optimal compared to the expert, we show that as long as the data from the behavior policy provides sufficient coverage on the expert state-action traces (and with no necessity for a global coverage over the entire state-action space), MILO can provably combat the covariate shift issue in IL. Complementing our theory results, we also demonstrate that a practical implementation of our approach mitigates covariate shift on benchmark MuJoCo continuous control tasks. We demonstrate that with behavior policies whose performances are less than half of that of the expert, MILO still successfully imitates with an extremely low number of expert state-action pairs while traditional offline IL methods such as behavior cloning (BC) fail completely. Source code is provided at https://github.com/jdchang1/milo.




Your next parcel could be delivered by a robot DOG: Major UK courier service starts using four-legged bots for deliveries

Daily Mail - Science & tech

It might not be able to fetch the paper for you, but a robot dog might soon bring you your parcels. Milo, the four-legged delivery bot, has started taking to the streets of Yorkshire as part of a new trial for delivery firm Evri. The robot dog has been trained to jump in and out of the van, navigate to customers' doors, and drop off packages without any assistance. Milo will be joining Evri's regular drivers over the next fortnight as they make their rounds in Morley, Leeds. Evri hopes that these robot co-pilots will take the strain off their human counterparts, freeing up more time for complex jobs like parking or navigating.



Mitigating Covariate Shift in Imitation Learning via Offline Data Without Great Coverage

Chang, Jonathan D., Uehara, Masatoshi, Sreenivas, Dhruv, Kidambi, Rahul, Sun, Wen

arXiv.org Machine Learning

This paper studies offline Imitation Learning (IL) where an agent learns to imitate an expert demonstrator without additional online environment interactions. Instead, the learner is presented with a static offline dataset of state-action-next state transition triples from a potentially less proficient behavior policy. We introduce Model-based IL from Offline data (MILO): an algorithmic framework that utilizes the static dataset to solve the offline IL problem efficiently both in theory and in practice. In theory, even if the behavior policy is highly sub-optimal compared to the expert, we show that as long as the data from the behavior policy provides sufficient coverage on the expert state-action traces (and with no necessity for a global coverage over the entire state-action space), MILO can provably combat the covariate shift issue in IL. Complementing our theory results, we also demonstrate that a practical implementation of our approach mitigates covariate shift on benchmark MuJoCo continuous control tasks. We demonstrate that with behavior policies whose performances are less than half of that of the expert, MILO still successfully imitates with an extremely low number of expert state-action pairs while traditional offline IL methods such as behavior cloning (BC) fail completely. Source code is provided at https://github.com/jdchang1/milo.


Kids Are Especially Tough to Interview About Abuse. Are Robots the Solution?

Mother Jones

Cindy Bethel was 6 when her babysitter's neighbor started molesting her. Worried what else would happen if she told her parents, she confided in her stuffed panda instead. Sometimes she acted out the abuse with Barbie and Ken dolls. A few years later, the same teen neighbor raped her on a woodpile outside his house. She didn't tell anyone about the assault until long after she moved away from her Ohio hometown.


How AI could make A/B testing a thing of the past - ClickZ

#artificialintelligence

Even though one of the main goals of digital marketing is to serve customers the right message at the right time, we all know what it's like to be chased around the internet by an ad that's completely irrelevant or just plain annoying. And that's because deciding when and where to deliver that message, not to mention the labor that goes into creating the message in the first place, has long involved human guesswork. Granted, those guesses often come after rigorous testing, but even the best tests are limited by the often slow-moving process of human analysis. Artificial intelligence has already transformed everything from the IT department to the customer service experience, and now, machine learning is on track to completely change the ways we think about ad creative. A recent study by Adlucent found that 46% of consumers say that their ideal online experience would involve free access to websites that served only relevant ads, with 58% reporting that personalized content improves their perception of a brand.


CES 2018: The best products we saw at the show

PCWorld

CES 2018 is winding down, and we finally have a chance to pause and reflect on what we saw that was actually great. Products that advanced their category, or broke new ground. Things that leaped ahead of the competition. Or maybe they just looked cool. It's easy to hit saturation at CES, but these are the products we're still talking about when everything else has blurred together. We start with the product that was so innovative, two of us raved about it.