 Memory-Based Learning


BabyAI++: Towards Grounded-Language Learning beyond Memorization

arXiv.org Artificial Intelligence

Despite success in many real-world tasks (e.g., robotics), reinforcement learning (RL) agents still learn from tabula rasa when facing new and dynamic scenarios. By contrast, humans can offload this burden through textual descriptions. Although recent works have shown the benefits of instructive texts in goal-conditioned RL, few have studied whether descriptive texts help agents to generalize across dynamic environments. To promote research in this direction, we introduce a new platform, BabyAI++, to generate various dynamic environments along with corresponding descriptive texts. Moreover, we benchmark several baselines inherited from the instruction following setting and develop a novel approach towards visually-grounded language learning on our platform. Extensive experiments show strong evidence that using descriptive texts improves the generalization of RL agents across environments with varied dynamics.


IBM Watson can answer all your coronavirus questions

#artificialintelligence

In order to help government agencies, academic institutions and healthcare organizations handle the influx of calls and messages regarding the coronavirus, IBM has announced that it will provide a bundle of Watson services for free. The company will combine Watson Assistant, which uses IBM Research's natural language processing technology, with Watson Discovery to create IBM Watson Assistant for Citizens. The new Watson suite will be available online and on smartphones and will be free for at least 90 days. According to IBM, wait times for coronavirus-related questions are exceeding two hours, so the company believes that using AI via Watson may be able to help speed up response times. "While helping government agencies and healthcare institutions use AI to get critical information out to their citizens remains a high priority right now, the current environment has made it clear that every business in every industry should find ways to digitally engage with their clients and employees. With today's news, IBM is taking years of experience in helping thousands of global businesses and institutions use Natural Language Processing and other advanced AI technologies to better meet the demands of their constituents, and now applying it to the COVID-19 crisis. AI has the power to be your assistant during this uncertain time."


Former IBM Watson Team Leader David Ferrucci on AI and Elemental Cognition

#artificialintelligence

Dr. David Ferrucci is one of the few people who have set a benchmark in the history of AI: when IBM Watson won Jeopardy, we reached a milestone many thought impossible. I was very privileged to have Ferrucci on my podcast in early 2012, when we spent an hour on Watson's intricacies and importance. It's been almost 8 years since our original conversation, and it was time to catch up with David to talk about the things that have happened in the world of AI, the things that didn't happen but were supposed to, and our present and future in relation to Artificial Intelligence. All in all, I was super excited to have Ferrucci back on my podcast and hope you enjoy our conversation as much as I did. During this 90-minute interview with David Ferrucci, we cover a variety of interesting topics such as: his perspective on IBM Watson; AI, hype and human cognition; benchmarks on the singularity timeline; his move away from IBM to the biggest hedge fund in the world; Elemental Cognition and its goals, mission and architecture; Noam Chomsky and Marvin Minsky's skepticism of Watson; deductive, inductive and abductive learning; leading and managing from the architecture down; Black Box vs Open Box AI; CLARA (Collaborative Learning and Reading Agent) and the best and worst applications thereof; the importance of meaning and whether AI can be the source of it; whether AI is the greatest danger humanity is facing today; why technology is a magnifying mirror; why the world is transformed by asking questions.


Learn to Forget: User-Level Memorization Elimination in Federated Learning

arXiv.org Machine Learning

Federated learning is a decentralized machine learning technique that has attracted widespread attention in both the research field and the real-world market. However, the current privacy-preserving federated learning scheme only provides a secure way for users to contribute their private data and offers no way to withdraw that contribution from the model update. Such an irreversible setting potentially violates data protection regulations and increases the risk of data extraction. To resolve the problem, this paper describes a novel concept for federated learning, called memorization elimination. Based on this concept, we propose a federated learning framework that allows the user to eliminate the memorization of its private data in the trained model. Specifically, each user in the framework is deployed with a trainable dummy gradient generator. After steps of training, the generator can produce dummy gradients to stimulate the neurons of a machine learning model to eliminate the memorization of the specific data. Also, we prove that the additional memorization elimination service of the framework does not break the common procedure of federated learning or lower its security.


Small ReLU networks are powerful memorizers: a tight analysis of memorization capacity

Neural Information Processing Systems

We study finite sample expressivity, i.e., memorization power of ReLU networks. Recent results require $N$ hidden nodes to memorize/interpolate arbitrary $N$ data points. In contrast, by exploiting depth, we show that 3-layer ReLU networks with $\Omega(\sqrt{N})$ hidden nodes can perfectly memorize most datasets with $N$ points. We also prove that width $\Theta(\sqrt{N})$ is necessary and sufficient for memorizing $N$ data points, proving tight bounds on memorization capacity. The sufficiency result can be extended to deeper networks; we show that an $L$-layer network with $W$ parameters in the hidden layers can memorize $N$ data points if $W = \Omega(N)$.
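To make the scaling in this abstract concrete, here is a rough arithmetic sketch comparing the earlier requirement of $N$ hidden nodes with the $\sqrt{N}$ scale. It assumes two hidden layers of equal width and sets every hidden constant factor to 1; the abstract gives no constants, and the function names are hypothetical.

```python
import math


def hidden_nodes_linear(n_points):
    """Hidden nodes under the earlier requirement of N nodes for N points."""
    return n_points


def hidden_nodes_sqrt(n_points):
    """ceil(sqrt(N)) hidden nodes per layer (constant factor assumed to be 1)."""
    if n_points <= 0:
        return 0
    return math.isqrt(n_points - 1) + 1


def weights_between(width_a, width_b):
    """Number of weights connecting two fully connected hidden layers."""
    return width_a * width_b


n = 10_000
width = hidden_nodes_sqrt(n)
print(hidden_nodes_linear(n))         # 10000 hidden nodes at the N scale
print(width)                          # 100 hidden nodes per layer at the sqrt(N) scale
print(weights_between(width, width))  # 10000 weights between the two hidden layers
```

Under these assumptions, two hidden layers of width about $\sqrt{N}$ already carry about $N$ weights between them, which is the order of parameter count the condition $W = \Omega(N)$ asks for.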


Google Leverages Machine Learning to Improve Document Detection Capabilities

#artificialintelligence

With the rise in technology and enhanced connectivity, we are unintentionally moving toward a more insecure world of malicious activities. Businesses today, while deploying technology, fear the loss they would face if security gets compromised. Because most of them operate through email, it has become a major channel for malware attacks. Moreover, many emails are sent with malicious intent, putting a heavy burden on Gmail to protect users. As it turns out, a lot of malicious attachments are documents, but through innovation brought in by Google, Gmail is getting better at detecting them.


Financial institutions can gain new AI model risk management

#artificialintelligence

Many financial institutions are rapidly developing and adopting AI models. They're using the models to achieve new competitive advantages such as being able to make faster and more successful underwriting decisions. However, AI models introduce new risks. In a previous post, I describe why AI models increase risk exposure compared to the more traditional, rule-based models that have been in use for decades. In short, if AI models have been trained on biased data, lack explainability, or perform inadequately, they can expose organizations to as much as seven-figure losses or fines.


Explaining Memorization and Generalization: A Large-Scale Study with Coherent Gradients

arXiv.org Machine Learning

Coherent Gradients is a recently proposed hypothesis to explain why over-parameterized neural networks trained with gradient descent generalize well even though they have sufficient capacity to memorize the training set. Inspired by random forests, Coherent Gradients proposes that (Stochastic) Gradient Descent (SGD) finds common patterns amongst examples (if such common patterns exist) since descent directions that are common to many examples add up in the overall gradient, and thus the biggest changes to the network parameters are those that simultaneously help many examples. The original Coherent Gradients paper validated the theory through causal intervention experiments on shallow, fully connected networks on MNIST. In this work, we perform similar intervention experiments on more complex architectures (such as VGG, Inception and ResNet) on more complex datasets (such as CIFAR-10 and ImageNet). Our results are in good agreement with the small scale study in the original paper, thus providing the first validation of coherent gradients in more practically relevant settings. We also confirm in these settings that suppressing incoherent updates by natural modifications to SGD can significantly reduce overfitting--lending credence to the hypothesis that memorization occurs when few examples are responsible for most of the gradient used in the update. Furthermore, we use the coherent gradients theory to explore a new characterization of why some examples are learned earlier than other examples, i.e., "easy" and "hard" examples.
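The core intuition, that directions shared by many examples add up while a direction driven by a single example does not, can be illustrated with a toy sketch. The per-example gradients below are made-up numbers, and the coordinate-wise median is only one assumed way of suppressing an incoherent update; it is not claimed to be the modification used in the paper.

```python
from statistics import median


def mean_gradient(per_example_grads):
    """Plain average of per-example gradients, as in a standard SGD step."""
    n = len(per_example_grads)
    return [sum(coord) / n for coord in zip(*per_example_grads)]


def median_gradient(per_example_grads):
    """Coordinate-wise median: an assumed robust aggregate for illustration."""
    return [median(coord) for coord in zip(*per_example_grads)]


# Five toy examples. All agree on coordinate 0 (a "common pattern");
# only the last one pushes hard on coordinate 1 (an idiosyncratic direction).
grads = [
    [1.0, 0.0],
    [1.0, 0.0],
    [1.0, 0.0],
    [1.0, 0.0],
    [1.0, 8.0],
]

mean_g = mean_gradient(grads)
median_g = median_gradient(grads)
print(mean_g)    # coordinate 1 is driven entirely by the single outlier example
print(median_g)  # the outlier's contribution to coordinate 1 is suppressed
```

In the averaged gradient, the largest component comes from a single example, which is the situation the abstract associates with memorization. The robust aggregate keeps only the direction that all examples share.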


IBM Watson Gains The Ability To Understand Complex Topics

#artificialintelligence

IBM recently announced several new Watson technologies designed to help organizations identify, understand, and analyze some of the most challenging aspects of the English language with greater clarity and insight. These new features are considered the first commercialization of key Natural Language Processing (NLP) capabilities to come from IBM Research's Project Debater. A new advanced sentiment analysis feature is designed to identify and analyze idioms and colloquialisms for the first time, so it can recognize phrases such as "hardly helpful" or "hot under the collar." Phrases like those have been challenging for artificial intelligence systems since they are difficult for algorithms to spot.


Analyzing and Improving a Watson Assistant Solution Part 3: Recipes for common analytic patterns

#artificialintelligence

In previous posts we explored what analysts want to discover about their virtual assistant and some building blocks for building analytics. In this post I will demonstrate some common recipes tailored to Watson Assistant logs. First we extract the raw log events and store them on the file system. This requires the apikey and URL for your skill. For a single-skill assistant you will also need the workspace ID (extractable from the "Legacy v1 Workspace URL"); for a multi-skill assistant there are other IDs you can filter on (described in the Watson Assistant list log events API).
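The extract-and-store step might look roughly like the sketch below. It does not call the real Watson Assistant API: `fetch_page` is a hypothetical callable standing in for whatever client code retrieves one page of log events using your apikey, URL and workspace ID, and the records and cursor values are fabricated for illustration.

```python
import json
import tempfile
from pathlib import Path


def extract_log_events(fetch_page, out_path):
    """Collect every page of log events and write them to out_path as JSON.

    fetch_page is a hypothetical callable: given a cursor (None for the first
    page) it returns (events, next_cursor); next_cursor is None on the last page.
    """
    events = []
    cursor = None
    while True:
        page, cursor = fetch_page(cursor)
        events.extend(page)
        if cursor is None:
            break
    Path(out_path).write_text(json.dumps(events, indent=2), encoding="utf-8")
    return events


# Fabricated stand-in for the remote log store: two pages of made-up records.
FAKE_PAGES = {
    None: ([{"log_id": "a"}, {"log_id": "b"}], "page-2"),
    "page-2": ([{"log_id": "c"}], None),
}


def fake_fetch_page(cursor):
    """Return the fabricated page for this cursor (assumption, not a real API)."""
    return FAKE_PAGES[cursor]


with tempfile.TemporaryDirectory() as tmp_dir:
    log_file = Path(tmp_dir) / "logs.json"
    saved = extract_log_events(fake_fetch_page, log_file)
    reloaded = json.loads(log_file.read_text(encoding="utf-8"))

print(len(saved))  # 3 events collected across both pages
```

Keeping the raw events on disk means later analysis can be rerun without querying the service again; swapping `fake_fetch_page` for a real client call is left to the reader.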