Richard Socher
aeb7b30ef1d024a76f21a1d40e30c302-Paper.pdf
Ideally, we want networks to be accurate, calibrated and confident. We show that, as opposed to the standard cross-entropy loss, focal loss [19] allows us to learn models that are already very well calibrated. When combined with temperature scaling, whilst preserving accuracy, it yields state-of-the-art calibrated models. We provide a thorough analysis of the factors causing miscalibration, and use the insights we glean from this to justify the empirically excellent performance of focal loss.
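As a rough illustration of the two ingredients named in this abstract, the sketch below computes focal loss for a single example and applies temperature scaling to its logits. It assumes the usual focal loss form, FL(p_t) = -(1 - p_t)^gamma * log(p_t), where p_t is the probability assigned to the true class. The function names, the value gamma = 2.0, the temperature 1.5, and the logits are illustrative assumptions, not values taken from the paper.

```python
import math

def softmax(logits, temperature=1.0):
    """Softmax over logits divided by a temperature (T > 1 softens the output)."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def focal_loss(probs, target, gamma=2.0):
    """Focal loss for one example; gamma = 0 recovers standard cross-entropy."""
    p_t = probs[target]
    # The (1 - p_t)^gamma factor down-weights examples the model already gets right.
    return -((1.0 - p_t) ** gamma) * math.log(p_t)

# Hypothetical logits for a 3-class problem whose true class is index 0.
logits = [2.0, 0.5, -1.0]
probs = softmax(logits)

cross_entropy = focal_loss(probs, target=0, gamma=0.0)
focal = focal_loss(probs, target=0, gamma=2.0)

# Temperature scaling is applied after training: dividing logits by T > 1
# lowers the top probability without changing which class is predicted.
scaled_probs = softmax(logits, temperature=1.5)
```

In practice the temperature is a single scalar fitted on a held-out validation set; here it is fixed by hand only to show the effect on confidence.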
Future of Artificial Intelligence (AI) for Business
Artificial intelligence (AI) is continuing its migration out of the research lab and into the world of business. Leading companies across hundreds of industries are harnessing its power, from banks analyzing countless data points in seconds to detect fraud, to call centers deploying chatbots to improve customer interactions. These early uses are still fairly limited, but huge advances in deep learning (a subset of machine learning) are starting to impact AI in ways that will soon help society and business tackle a wider set of more general problems. Such advances will also make it possible to automate more complex physical tasks that require adaptability and agility. At Salesforce, we believe AI has tremendous potential for improving the way organizations operate (AI is built into our entire Salesforce Customer 360).
Attention, please! A survey of Neural Attention Models in Deep Learning
Correia, Alana de Santana, Colombini, Esther Luna
In humans, attention is a core property of all perceptual and cognitive operations. Given our limited ability to process competing sources, attention mechanisms select, modulate, and focus on the information most relevant to behavior. For decades, concepts and functions of attention have been studied in philosophy, psychology, neuroscience, and computing. For the last six years, this property has been widely explored in deep neural networks. Currently, the state of the art in deep learning is represented by neural attention models in several application domains. This survey provides a comprehensive overview and analysis of developments in neural attention models. We systematically reviewed hundreds of architectures in the area, identifying and discussing those in which attention has shown a significant impact. We also developed and made public an automated methodology to facilitate the development of reviews in the area. By critically analyzing 650 works, we describe the primary uses of attention in convolutional networks, recurrent networks, and generative models, identifying common subgroups of uses and applications. Furthermore, we describe the impact of attention in different application domains and its impact on the interpretability of neural networks. Finally, we list possible trends and opportunities for further research, hoping that this review will provide a succinct overview of the main attentional models in the area and guide researchers in developing future approaches that will drive further improvements.
The Batch: Happy New Year! Hopes for AI in 2020: Yann LeCun, Kai-Fu Lee, Anima Anandkumar, Richard Socher
Datasets are critical to AI and machine learning, and they are becoming a key driver of the economy. Collection of sensitive data is increasing rapidly, covering almost every aspect of people's lives. In its current form, this data collection puts both individuals and businesses at risk. I hope that 2020 will be the year when we build the foundation for a responsible data economy. Today, users have almost no control over how data they generate are used.
Computational Linguistics: "Artificial Intelligence Doesn't Make After Work Plans"
It could be that Richard Socher's operating system just runs with more energy than other people's. He has just flown in from California and his body clock is telling him it's still 4 a.m. Already, though, he has delivered a keynote address, participated in a panel and held a question-and-answer session at the START Summit in St. Gallen, Switzerland, an important innovation conference. Despite all that, he's in a good mood as he poses for the ZEIT ONLINE photographer and later helps carry her flash equipment. He then sits down in a drafty corner of the congress hall for the following three-hour interview. After an hour, he remembers that he hasn't yet eaten today.
Simple and Effective Curriculum Pointer-Generator Networks for Reading Comprehension over Long Narratives
Tay, Yi, Wang, Shuohang, Tuan, Luu Anh, Fu, Jie, Phan, Minh C., Yuan, Xingdi, Rao, Jinfeng, Hui, Siu Cheung, Zhang, Aston
This paper tackles the problem of reading comprehension over long narratives, where documents easily span thousands of tokens. We propose a curriculum learning (CL) based Pointer-Generator framework for reading/sampling over large documents, enabling diverse training of the neural model based on the notion of alternating contextual difficulty. This can be interpreted as a form of domain randomization and/or generative pretraining during training. To this end, the use of the Pointer-Generator softens the requirement of having the answer within the context, enabling us to construct diverse training samples for learning. Additionally, we propose a new Introspective Alignment Layer (IAL), which reasons over decomposed alignments using block-based self-attention. We evaluate our proposed method on the NarrativeQA reading comprehension benchmark, achieving state-of-the-art performance and improving on existing baselines by a relative $51\%$ on BLEU-4 and a relative $17\%$ on Rouge-L. Extensive ablations confirm the effectiveness of our proposed IAL and CL components.
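To make concrete why a Pointer-Generator softens the requirement of having the answer within the context, the sketch below mixes a generation distribution over a fixed vocabulary with a copy distribution over source tokens. It assumes the commonly used formulation, final = p_gen * P_vocab + (1 - p_gen) * attention, which may differ in detail from the model in this paper. The function name, the toy vocabulary, the attention weights, and p_gen = 0.6 are all hypothetical.

```python
def pointer_generator_distribution(p_gen, vocab_probs, attention, source_tokens):
    """Mix generation and copy distributions into one distribution over words."""
    # Generation path: scale every vocabulary probability by p_gen.
    final = {word: p_gen * p for word, p in vocab_probs.items()}
    # Copy path: add (1 - p_gen) * attention weight to each source token,
    # including tokens that are missing from the fixed vocabulary.
    for token, weight in zip(source_tokens, attention):
        final[token] = final.get(token, 0.0) + (1.0 - p_gen) * weight
    return final

# Hypothetical toy inputs: "Gandalf" appears in the source but not in the vocabulary.
vocab_probs = {"the": 0.5, "cat": 0.3, "sat": 0.2}
source_tokens = ["the", "Gandalf", "sat"]
attention = [0.1, 0.7, 0.2]

final = pointer_generator_distribution(0.6, vocab_probs, attention, source_tokens)
```

Because both paths contribute probability mass, the model can emit an answer word that is absent from the sampled context (through generation) as well as a rare word that is absent from the vocabulary (through copying).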