Accuracy
Bootstrapping Automation with Teleoperation and Data-Driven Reinforcement Learning
A new job profile is quietly emerging: teleoperators, that is, humans piloting robots. While remotely completing a task is useful by itself, there is much more to it. Every successful trial can be logged, building a dataset of experiences. If done properly, we can build an infinitely reusable learning resource to train any number of autonomous robots to perform the same tasks. Here I will go over the potential and unsolved problems in teleoperation, review selected projects in data-driven learning and speculate on the evolution and opportunities in the nascent teleoperation industry, with a focus on manipulation.
Introduction
Building useful AI agents is hard. Since we haven’t figured out a priori how to build general intelligence, the best we can do as of today is to take a statistical regression approach: build a huge dataset which incorporates the behaviour we would like to see, take a predictive function with a lot of free parameters and finally write an algorithm to tune these parameters so that the function is often correct when faced with a data point similar to the ones in the dataset. You may not like it, but the bitter lesson is that it works reasonably well, since we have powerful computers. Now, building useful physical AI agents, a.k.a. robots, is harder. Traditional approaches require online interactions: the robot learns while performing the actions. While this works well in virtual environments such as video games, deploying a baby robot to learn in the real world is costly and dangerous; moreover, such online algorithms are not suited to reusing past experiences. To overcome these problems, a promising direction is data-driven reinforcement learning (also called offline RL or batch RL), in which we train agents offline on a dataset of already collected experiences. The training can be done virtually and then the learned skills, usually in the form of a neural network, can be deployed to a real robot.
If we assume that offline RL works well, we are now able to reuse past experiences indefinitely, but we still haven’t solved the most crucial problem: access to a large dataset. As of today, we simply lack the large datasets of robot experiences needed to power the learning, in the same way that large amounts of labelled pictures and text powered advances in computer vision and natural language processing. Teleoperation has entered the chat.
Teleoperation
Teleoperation, or telerobotics, is about separating the brain and the body: the human operators control the movements and take decisions, the robot executes. In fact there are different degrees of teleoperation:
Direct Control: The operator controls the motion of the robot directly and without any automated help.
Shared Control: Some degree of autonomy or automated help is available to assist the user. Such autonomy is set in advance and is fixed.
Shared Autonomy: Same as Shared Control, but the level of autonomy is adjusted dynamically (and autonomously!) according to the situation.
Supervisory Control: The control happens at a very high level, and the robot executes nearly all the functions autonomously.
(To go deeper, check Autonomy in Physical Human-Robot Interaction: a Brief Survey and the classic reference (chapter 43).) The robot can really be anything: a drone, a manipulator, a vehicle or a humanoid. The robot hardware often dictates how we command the robot; a non-exhaustive list of controllers includes joysticks, steering wheels, virtual reality kits, twin robotic arms, haptic controllers and suits, electromyography sensors, and tracking the operator’s body with a camera, using fiducial markers or AI body tracking! Despite being old, with roots going back to the 1940s and 1950s in the remote manipulation of radioactive waste, teleoperation is still not a mature technology.
My favourite overview of teleoperation shortcomings remains this 2007 paper, which mentions 8 limiting factors: the cameras’ narrow field of view, figuring out the robot’s orientation and attitude, the logistics of multi-camera setups, low frame rates, degradation due to motion, egocentric-vs-exocentric camera view tradeoffs, depth perception and stream latency. While all these areas still need to be perfected, in my experience latency and its unpredictability, that is, time delays in the sensor stream, remain the largest bottleneck to a fluid teleoperation experience. For this reason, teleoperating fixed robots such as manipulators is going to be much easier than teleoperating mobile robots in outdoor environments, since the former can use wired connections, while the latter are forced to rely on cellular connections. Currently there is a significant amount of research into improving the operator’s user interface. For instance, to alleviate latency it’s possible to overlay a predictive model on the real-time feed, such as a “ghost” of the future robot state. In this way the operator has the illusion of controlling a zero-latency robot. Virtual fixtures and augmented reality markers can also help. It is also possible to train an AI to map low-dimensional inputs into complex actions, so that the operator can perform complex tasks with a simple joystick. For instance, if a manipulator needs to grasp a cup on a table, the operator can simply instruct the manipulator to get closer to the cup and the AI will infer from the camera scene that the operator is looking to grasp it, and will then take over the fine motor control required for the grasp. As a side effect, easier controls allow low-skilled teleoperators to handle complex scenarios, which is important since today there are few expert operators.
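To make the “ghost” idea concrete, here is a minimal sketch of a predictive overlay. It assumes the simplest possible predictive model, namely that each joint keeps moving at the currently commanded velocity for the duration of the latency; the function name `predict_ghost` and its parameters are hypothetical, and a real system would likely use a proper dynamics model instead.

```python
def predict_ghost(joint_positions, commanded_velocities, latency_s):
    """Extrapolate the last observed joint positions over the stream latency.

    Assumption (not from any specific system): constant commanded velocity
    during the latency window. Positions are in rad, velocities in rad/s.
    """
    if len(joint_positions) != len(commanded_velocities):
        raise ValueError("positions and velocities must have the same length")
    return [q + v * latency_s
            for q, v in zip(joint_positions, commanded_velocities)]


# The video feed is 200 ms old: draw the ghost where the arm should be by now.
ghost = predict_ghost([0.0, 1.0], [0.5, -0.2], latency_s=0.2)
print(ghost)
```

The ghost is only as good as the latency estimate, which is one reason why unpredictable latency hurts more than a constant delay.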
It’s worth stressing that this whole area around shared control and UI needs careful engineering. Paradoxically, shared control can be harder than full automation, since we need to take into account how the operator reacts to the partial automation. It must be done well, keeping automated and human tasks separated yet smoothly joined, so that the automation does not surprise the operator and cause the typical wait-and-see behaviour (input a command, wait for the robot to finish the movement, input another command, repeat). To close this intro, we can very crudely divide teleoperation tasks into two macro categories: driving and manipulation. These two classes present opposite challenges. In driving (which includes piloting drones, cars and walking robot dogs), the difficulty comes from the unpredictability of the environment and the requirement for fast reaction times, while the control itself is easy (brake, accelerate, turn the steering wheel and not much more). Manipulation instead usually operates in slowly varying or fixed environments and in non-time-critical scenarios, but the controls are very nuanced and high-dimensional. So for the rest of the article I will focus on robot manipulators, since that’s where the biggest learning challenges lie.
The Nascent Teleoperation Industry and Bootstrapping Automation
Today teleoperation is mainly used in high-touch use cases, such as surgery, nuclear decommissioning, space and undersea robotics. The operators are expensive professional domain experts, but their cost is justified since the alternative is too expensive or dangerous. With the increasing quality and dropping cost of collaborative robots and the advancements in artificial intelligence, in the next few years teleoperation will expand to service use cases, such as in warehouses, light manufacturing, commercial kitchens and labs. How is this possible?
Having a teleoperator, even a low-skilled one, continuously teleoperating the robot will rarely make sense. What will happen instead is that teleoperation will be used to bootstrap automation and then for remote assistance. To understand why, it’s important to recall why automation is hard in the first place. Robots already have superhuman precision, so automating a fixed scenario with superhuman performance is always possible. The problem is that in real life no two environments are the same: different lighting, different objects to interact with, different arrangements, different success criteria. Considering these needs for autonomy and flexibility, here is an example of what the development and deployment of a kitchen robot manipulator tasked with assembling rice bowls may look like: The robot manipulator is trained on a mixture of teleoperated, simulated and unsupervised experiences, from a standardised kitchen, using popular ingredients, tools and appliances. Every experience is divided into a set of basic actions, such as pick&place, mix, pour and sprinkle, and it is logged as camera streams, positions and velocities of the robot joints, force sensor readings and any other sensor measurement available. A score is also assigned to every experience, so that the robot learns what an acceptable end state is. After months of development the robot is able to reliably prepare bowls in the training scenario, but its performance in a different kitchen would be poor. The system is deployed in a real kitchen, but for the first weeks a teleoperator has direct control of the robot, getting feedback from the kitchen owner. Every single experience with the new setup is logged and the AI is trained to imitate the operator. The AI’s performance is continuously tested by asking in real time what action it would take and then comparing it with the action that the operator actually takes. After a few weeks the AI is accurate enough to be left in control.
The teleoperator is called a few times a day to iron out the edge cases in which the AI keeps making mistakes. After some time the error rate is so low that a single teleoperator can assist more than 50 deployed robots at the same time. The aggregated experiences are used for offline training and the robot firmware is routinely updated, so that the robot reaches superhuman performance. Future deployments proceed as in step 2, but the coaching time of the teleoperator keeps decreasing as the global dataset of experience increases in size. Eventually a few hours of demonstrations are enough to onboard a new kitchen. Deploying the robot for different tasks also becomes gradually easier, as basic actions such as pick&place can be reused. At first, robotics companies will vertically integrate and have their own fleet of teleoperators. Eventually specialised infrastructure providers will leverage economies of scale to offer teleoperation-as-a-service, providing flexible fleets of operators when needed. This is similar to how companies providing dataset labelling services operate today, but teleoperation is destined to be a much larger industry, as the market being automated is bigger and the need for teleoperation assistance persists after the initial training. Where the teleoperators are actually located will depend on how critical latency is and on the regulations and liabilities around remote work for physical tasks, something which at the moment is pretty niche. Besides professional services with strict accuracy standards and trained operators, it will be possible to crowdsource demonstrations from the public, perhaps even inside gamified environments. In the long run, as teleoperation tooling and humanoid robots get cheaper, teleoperation will be rolled out to consumers for tele-existence. Hopefully by then we will not worry about work, and the main use cases will be around entertainment, social connections and exploration.
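The continuous test described in the kitchen example, where the AI is asked what action it would take and its answer is checked against the operator's actual action, can be sketched as a simple agreement rate. This is only an illustration: the function names, the Euclidean tolerance and the 95% hand-over threshold are all assumptions, not values from the article.

```python
import math


def agreement_rate(policy_actions, operator_actions, tolerance=0.05):
    """Fraction of time steps where the policy's proposed action lies within
    `tolerance` (Euclidean distance) of the action the operator took."""
    if len(policy_actions) != len(operator_actions):
        raise ValueError("both logs must cover the same time steps")
    if not policy_actions:
        return 0.0
    matches = sum(
        1
        for proposed, actual in zip(policy_actions, operator_actions)
        if math.dist(proposed, actual) <= tolerance
    )
    return matches / len(policy_actions)


def ready_for_autonomy(rate, threshold=0.95):
    """Hypothetical hand-over rule: leave the AI in control above a threshold."""
    return rate >= threshold


# Logged operator commands and what the policy would have done (toy 2-D actions).
operator = [(0.10, 0.00), (0.20, 0.05), (0.30, 0.10), (0.40, 0.10)]
policy = [(0.11, 0.00), (0.20, 0.06), (0.45, 0.10), (0.40, 0.12)]

rate = agreement_rate(policy, operator)
print(rate, ready_for_autonomy(rate))
```

Because the policy never actually drives the robot during this check, disagreements cost nothing, and the steps where it disagrees are natural candidates for further coaching.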
Data-Driven Reinforcement Learning Today
As mentioned, offline reinforcement learning will be critical for scaling, since having a real robot learn in a real environment is too slow and dangerous. Here I will only show the tip of the research iceberg and mention a few approaches to tackle the offline reinforcement learning pipeline. A crucial element in building a dataset for offline RL is establishing the reward of each experience. This is in my view the strongest explanation as to why a teleoperator is needed to bootstrap automation: the operator needs to understand what “good” means for every single deployment and act accordingly, iterating over the feedback of the new robot owner. In this light, I find the ideas in Scaling data-driven robotics with reward sketching and batch reinforcement learning pretty interesting. Firstly, the authors provide an intuitive mechanism to sketch the reward of a given experience, so that every single camera frame is rated according to how close we are to the desired goal. This gives a more granular reward distribution than just rating trajectories as good or bad, even though it introduces some degree of subjectivity, since different operators will have different definitions of being close to the goal. More importantly, based on the human-labelled rewards, they propose a mechanism to automatically relabel the whole dataset accumulated over previous experiences, so that a large amount of data can be leveraged to learn a new task out of a few initial demonstrations. Basically, the reward annotations produced by the sketching procedure are used to train a reward model, which is then used to predict the reward of all the past data, according to the new definition of success. Ideally a dataset of 10,000 demonstrations for task A can be relabelled to be a dataset of 10,000 demonstrations for task B, assuming that the tasks are not too different.
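The relabelling mechanism can be illustrated with a toy sketch. Note the heavy simplification: the paper trains a learned reward model on camera frames, whereas here a 1-nearest-neighbour lookup over small feature vectors stands in for it, purely to show the data flow. All names (`fit_reward_model`, `relabel`) and the feature encoding are hypothetical.

```python
import math


def fit_reward_model(annotated_frames):
    """Build a reward predictor from a few human-sketched (features, reward) pairs.

    Stand-in for a learned reward model: returns the reward of the closest
    annotated frame (1-nearest neighbour in feature space).
    """
    if not annotated_frames:
        raise ValueError("need at least one annotated frame")

    def predict(features):
        nearest = min(annotated_frames, key=lambda fr: math.dist(fr[0], features))
        return nearest[1]

    return predict


def relabel(dataset, reward_model):
    """Replace the stored reward of every (obs, action, reward) transition
    with the reward predicted for the new task."""
    return [(obs, action, reward_model(obs)) for obs, action, _ in dataset]


# A few frames sketched by a human for the new task B.
sketches_task_b = [([0.0], 0.0), ([1.0], 1.0)]
# Old transitions collected for task A, with task-A rewards.
dataset_task_a = [([0.1], "reach", 5.0), ([0.9], "grasp", 5.0)]

dataset_task_b = relabel(dataset_task_a, fit_reward_model(sketches_task_b))
print(dataset_task_b)
```

The observations and actions are untouched; only the reward column changes, which is why the same logged experience can serve several tasks.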
This approach is somewhat opposite to the usual deep learning paradigm of pretraining an AI agent on a large heterogeneous dataset and then fine-tuning it with a small amount of data coming from the use case of interest. A more traditional approach is followed in Bridge Data: Boosting Generalization of Robotic Skills with Cross-Domain Datasets, where a medium-size dataset (7200 demonstrations) collected from 71 kitchen tasks is used to bootstrap training on 10 unseen tasks, resulting in a 2x performance improvement with respect to just training the new tasks from scratch. It remains an open research question how far we can push the performance improvements if we leverage a truly large dataset, containing millions of demonstrations. To my knowledge the easiest resource to get started with offline RL is Robomimic, an open source framework to learn from demonstrations. It provides a set of standardised datasets containing action-state-reward trajectories, with emphasis on human-provided demonstrations and support for multiple observation spaces, including visuomotor policies. The available datasets also come from different sources, such as single expert teleoperators, multiple teleoperators and machine-generated trajectories, across several simulated and real-world tasks. It contains implementations of several offline learning and imitation learning algorithms, including Behaviour Cloning, Behaviour Cloning-RNN, HBC, IRIS, BCQ, CQL, and TD3-BC (by the way, when to use offline reinforcement learning vs imitation learning?). In the same ecosystem we find RoboTurk, a project to lower the barrier to creating large-scale crowdsourced datasets, and RoboSuite, a MuJoCo-based simulation framework with benchmark environments. Robomimic is well structured, with clear documentation explaining how to create a dataset and train an agent.
It is still a bit rough, but hopefully the project will keep being maintained so that other projects can avoid duplicating efforts when training robots. Finally I want to mention d3rlpy, a library for offline and online reinforcement learning. The best thing about it is that it’s intuitive and well documented, with clear examples already available in the README.
Outro
Bringing artificial intelligence to the real world is going to be hard, as more and more practitioners agree. But the takeaway message I want to convey is the following: there is a way to bootstrap automation in the short term, relying heavily on human-in-the-loop teleoperation. Such reliance on humans should be seen as a feature, not as a bug. Commercialising a novel product is always a massive undertaking, but in robotics this is exacerbated by the slow hardware development cycle. A teleop-first approach shortens the iteration cycles, incorporates feedback from the end user and creates a pool of experiences on which to build scalable solutions.
Entropic Associative Memory for Manuscript Symbols
Morales, Rafael, Hernández, Noé, Cruz, Ricardo, Cruz, Victor D., Pineda, Luis A.
Manuscript symbols can be stored, recognized and retrieved from an entropic digital memory that is associative and distributed yet declarative; memory retrieval is a constructive operation, memory cues to objects not contained in the memory are rejected directly without search, and memory operations can be performed through parallel computations. Manuscript symbols, both letters and numerals, are represented in Associative Memory Registers that have an associated entropy. The memory recognition operation obeys an entropy trade-off between precision and recall, and the entropy level affects the quality of the objects recovered through the memory retrieval operation. The present proposal is contrasted in several dimensions with neural network models of associative memory. We discuss the operational characteristics of the entropic associative memory for retrieving objects with both complete and incomplete information, such as severe occlusions. The experiments reported in this paper add evidence on the potential of this framework for developing practical applications and computational models of natural memory.
Why Machine Learning Models Die In Silence?
The meaning of life differs from man to man, from day to day, and from hour to hour -- Viktor E. Frankl, Man's Search for Meaning. Frankl was not only right about the meaning of life; his saying holds for machine learning models in production too. ML models perform well when you first deploy them in production. But the quality of their predictions decays, and they soon become less valuable. This is the primary difference between a software deployment and a machine learning one.
Confusion Matrix without Confused
As we know, the output of a binary classification problem takes one of two target values: 0 or 1, Yes or No, Positive or Negative, etc., and our model tries to classify whether a specific data point is 0 or 1, Yes or No, etc. The columns represent the True Class, which means the true or real label of the specific data point. The rows represent the Predicted Class, which means the prediction produced by our model for that data point. True Positive (TP): TP is simply the count of data points where the Predicted value is Positive and the True value is Positive too. True Negative (TN): TN is simply the count of data points where the Predicted value is Negative and the True value is Negative too.
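The definitions above translate directly into a counting loop. The sketch below is a plain-Python illustration with a hypothetical helper name (`confusion_counts`); it also counts the two remaining cells, False Positive (predicted Positive, truly Negative) and False Negative (predicted Negative, truly Positive), which follow the standard definitions rather than anything stated in the excerpt.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count the four cells of a binary confusion matrix."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            counts["TP"] += 1      # predicted Positive, truly Positive
        elif pred != positive and truth != positive:
            counts["TN"] += 1      # predicted Negative, truly Negative
        elif pred == positive:
            counts["FP"] += 1      # predicted Positive, truly Negative
        else:
            counts["FN"] += 1      # predicted Negative, truly Positive
    return counts


y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]
counts = confusion_counts(y_true, y_pred)
accuracy = (counts["TP"] + counts["TN"]) / len(y_true)
print(counts, accuracy)
```

Laid out as in the text, with predicted classes as rows and true classes as columns, TP and FP share the "predicted Positive" row while FN and TN share the "predicted Negative" row.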
Fairness Through Counterfactual Utilities
Group fairness definitions such as Demographic Parity and Equal Opportunity make assumptions about the underlying decision-problem that restrict them to classification problems. Prior work has translated these definitions to other machine learning environments, such as unsupervised learning and reinforcement learning, by implementing their closest mathematical equivalent. As a result, there are numerous bespoke interpretations of these definitions. Instead, we provide a generalized set of group fairness definitions that unambiguously extend to all machine learning environments while still retaining their original fairness notions. We derive two fairness principles that enable such a generalized framework. First, our framework measures outcomes in terms of utilities, rather than predictions, and does so for both the decision-algorithm and the individual. Second, our framework considers counterfactual outcomes, rather than just observed outcomes, thus preventing loopholes where fairness criteria are satisfied through self-fulfilling prophecies. We provide concrete examples of how our counterfactual utility fairness framework resolves known fairness issues in classification, clustering, and reinforcement learning problems. We also show that many of the bespoke interpretations of Demographic Parity and Equal Opportunity fit nicely as special cases of our framework.
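For reference, the two classical definitions that the abstract starts from can be sketched for a binary classifier as follows. This shows only the standard prediction-based forms (equal positive-prediction rates across groups for Demographic Parity, equal true-positive rates for Equal Opportunity), not the counterfactual-utility generalization the paper proposes; the function names and toy data are hypothetical.

```python
def positive_rate(y_pred, mask):
    """Share of positive predictions among the examples selected by `mask`."""
    selected = [p for p, keep in zip(y_pred, mask) if keep]
    if not selected:
        raise ValueError("no examples selected")
    return sum(selected) / len(selected)


def demographic_parity_gap(y_pred, group, a, b):
    """|P(pred=1 | group=a) - P(pred=1 | group=b)|."""
    rate_a = positive_rate(y_pred, [g == a for g in group])
    rate_b = positive_rate(y_pred, [g == b for g in group])
    return abs(rate_a - rate_b)


def equal_opportunity_gap(y_true, y_pred, group, a, b):
    """Difference in true-positive rates: same as above, restricted to y_true=1."""
    rate_a = positive_rate(y_pred, [g == a and t == 1 for g, t in zip(group, y_true)])
    rate_b = positive_rate(y_pred, [g == b and t == 1 for g, t in zip(group, y_true)])
    return abs(rate_a - rate_b)


group = ["a"] * 4 + ["b"] * 4
y_true = [1, 0, 1, 0, 1, 1, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0]

dp_gap = demographic_parity_gap(y_pred, group, "a", "b")
eo_gap = equal_opportunity_gap(y_true, y_pred, group, "a", "b")
print(dp_gap, eo_gap)
```

Both quantities are defined in terms of predictions and labels, which is exactly the assumption that ties them to classification.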
Grasp-and-Lift Detection from EEG Signal Using Convolutional Neural Network
Hasan, Md. Kamrul, Wahid, Sifat Redwan, Rahman, Faria, Maliha, Shanjida Khan, Rahman, Sauda Binte
People with neuromuscular dysfunctions or amputated limbs require automatic prosthetic appliances. In developing such prostheses, the precise detection of brain motor actions is imperative for the Grasp-and-Lift (GAL) tasks. Because Electroencephalography (EEG) is low-cost and non-invasive, it is widely preferred for detecting motor actions during the control of prosthetic tools. This article automates the detection of hand movement activity, viz. GAL events, from 32-channel EEG signals. The proposed pipeline essentially combines preprocessing and end-to-end detection steps, eliminating the requirement of hand-crafted feature engineering. The preprocessing step consists of raw signal denoising, using either Discrete Wavelet Transform (DWT) or highpass or bandpass filtering, and data standardization. The detection step consists of a Convolutional Neural Network (CNN)- or Long Short Term Memory (LSTM)-based model. All the investigations utilize the publicly available WAY-EEG-GAL dataset, which has six different GAL events. The best experiment reveals that the proposed framework achieves an average area under the ROC curve of 0.944, employing the DWT-based denoising filter, data standardization, and the CNN-based detection model. The obtained outcome indicates excellent performance of the introduced method in detecting GAL events from EEG signals, making it applicable to prosthetic appliances, brain-computer interfaces, robotic arms, etc.
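As a reminder of what the reported metric measures, the area under the ROC curve equals the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one (ties count as half). The sketch below computes it by brute force and averages it over events; that the paper's average is taken over the six GAL events is an assumption, and the function names and toy numbers are hypothetical, not the paper's data.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def mean_auc(per_event):
    """Average the AUC over a list of (labels, scores) pairs, one per event."""
    return sum(roc_auc(labels, scores) for labels, scores in per_event) / len(per_event)


# Toy detector scores for two events.
event_1 = ([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
event_2 = ([0, 1, 0, 1], [0.2, 0.9, 0.3, 0.7])
auc_1 = roc_auc(*event_1)
average = mean_auc([event_1, event_2])
print(auc_1, average)
```

The pairwise loop is quadratic in the number of samples, so it is only meant for illustration; rank-based implementations compute the same value far more efficiently.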
Scikit Learn Confusion Matrix - Python Guides
In this Python tutorial, we will learn how the scikit-learn confusion matrix works in Python, and we will also cover different examples related to the scikit-learn confusion matrix. And, we will cover these topics. In this section, we will learn how the scikit-learn confusion matrix works in Python. After running the above code, we get the following output, in which we can see that the confusion matrix values are printed on the screen. In this section, we will learn how a scikit-learn confusion matrix example works in Python.
Uncalibrated Models Can Improve Human-AI Collaboration
Vodrahalli, Kailas, Gerstenberg, Tobias, Zou, James
In many practical applications of AI, an AI model is used as a decision aid for human users. The AI provides advice that a human (sometimes) incorporates into their decision-making process. The AI advice is often presented with some measure of "confidence" that the human can use to calibrate how much they depend on or trust the advice. In this paper, we demonstrate that presenting AI models as more confident than they actually are, even when the original AI is well-calibrated, can improve human-AI performance (measured as the accuracy and confidence of the human's final prediction after seeing the AI advice). We first learn a model for how humans incorporate AI advice using data from thousands of human interactions. This enables us to explicitly estimate how to transform the AI's prediction confidence, making the AI uncalibrated, in order to improve the final human prediction. We empirically validate our results across four different tasks -- dealing with images, text and tabular data -- involving hundreds of human participants. We further support our findings with simulation analysis. Our findings suggest the importance of and a framework for jointly optimizing the human-AI system as opposed to the standard paradigm of optimizing the AI model alone.
Inference of Multiscale Gaussian Graphical Model
Sanou, Do Edmond, Ambroise, Christophe, Robin, Geneviève
Gaussian Graphical Models (GGMs) are widely used for exploratory data analysis in various fields such as genomics, ecology and psychometry. In a high-dimensional setting, when the number of variables exceeds the number of observations by several orders of magnitude, the estimation of a GGM is a difficult and unstable optimization problem. Clustering of variables or variable selection is often performed prior to GGM estimation. We propose a new method that simultaneously infers a hierarchical clustering structure and the graphs describing the structure of independence at each level of the hierarchy. This method is based on solving a convex optimization problem combining a graphical lasso penalty with a fused-type lasso penalty. Results on real and synthetic data are presented.
Inference and FDR Control for Simulated Ising Models in High-dimension
Wei, Haoyu, Lei, Xiaoyu, Zhang, Huiming
The (probabilistic) graphical model consists of a collection of probability distributions that factorize according to the structure of an underlying graph [52]. The graphical model captures the complex dependencies among random variables and builds large-scale multivariate statistical models, and it has been used in many research areas such as hierarchical Bayesian models [27], contingency table analysis [20, 53] in categorical data analysis [1, 23, 37], constraint satisfaction [16, 15], language and speech processing [11, 31], image processing [17, 24, 28] and spatial statistics more generally [8]. In our work, we focus on undirected graphical models, where the probability distribution factorizes according to functions defined on the cliques of the graph. Undirected graphical models have a variety of applications, including statistical physics [32], natural language processing [38], image analysis [54] and spatial statistics [43]. Specifically, we pay attention to the undirected graphical models which can be described as exponential families, a broad class of probability distributions studied in detail in the statistical literature [4, 21, 13]. The properties of exponential families provide some connections between inference methods and convex analysis [12, 29]. There are many well-known examples of undirected graphical models viewed as exponential families, such as the Ising model [32, 5], the Gaussian MRF [46] and latent Dirichlet allocation [11].
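As an illustration of the last point, the Ising model on a graph $G = (V, E)$ can be written in the standard exponential-family form below. This is the common textbook parametrization, given here as an assumption; the notation used later in the paper may differ.

```latex
% Ising model as an exponential family on G = (V, E), with binary x_s.
% Sufficient statistics: x_s for each node s, x_s x_t for each edge (s, t).
p_\theta(x) = \exp\Big\{ \sum_{s \in V} \theta_s x_s
              + \sum_{(s,t) \in E} \theta_{st}\, x_s x_t - A(\theta) \Big\},
\qquad
A(\theta) = \log \sum_{x} \exp\Big\{ \sum_{s \in V} \theta_s x_s
              + \sum_{(s,t) \in E} \theta_{st}\, x_s x_t \Big\}.
```

The log-partition function $A(\theta)$ is convex in $\theta$, which is the link to convex analysis mentioned above. An absent edge corresponds to $\theta_{st} = 0$, so recovering the graph amounts to identifying the nonzero edge parameters.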