CO-Search: COVID-19 Information Retrieval with Semantic Search, Question Answering, and Abstractive Summarization
Esteva, Andre, Kale, Anuprit, Paulus, Romain, Hashimoto, Kazuma, Yin, Wenpeng, Radev, Dragomir, Socher, Richard
The COVID-19 global pandemic has resulted in international efforts to understand, track, and mitigate the disease, yielding a significant corpus of COVID-19 and SARS-CoV-2-related publications across scientific disciplines. As of May 2020, 128,000 coronavirus-related publications have been collected through the COVID-19 Open Research Dataset Challenge [23]. Here we present CO-Search, a retriever-ranker semantic search engine designed to handle complex queries over the COVID-19 literature, potentially aiding overburdened health workers in finding scientific answers during a time of crisis. The retriever is built from a Siamese-BERT [18] encoder that is linearly composed with a TF-IDF vectorizer [19], and reciprocal-rank fused [5] with a BM25 vectorizer. The ranker is composed of a multi-hop question-answering module [1] that, together with a multi-paragraph abstractive summarizer, adjusts retriever scores. To account for the domain-specific and relatively limited dataset, we generate a bipartite graph of document paragraphs and citations, creating 1.3 million (citation title, paragraph) tuples for training the encoder. We evaluate our system on the data of the TREC-COVID [22] information retrieval challenge. CO-Search obtains top performance on the datasets of the first and second rounds, across several key metrics: normalized discounted cumulative gain, precision, mean average precision, and binary preference.
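The fusion step uses reciprocal rank fusion [5], which combines ranked lists using only ranks, not raw scores. A minimal sketch in Python (the document ids and the k=60 constant are illustrative defaults, not values from the paper):

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids via reciprocal rank fusion.

    Each document's fused score is the sum over input rankings of
    1 / (k + rank), with 1-based ranks; k=60 is the commonly used constant.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Example: fuse a (Siamese-BERT + TF-IDF) ranking with a BM25 ranking.
semantic = ["d3", "d1", "d2"]
bm25 = ["d1", "d3", "d4"]
print(reciprocal_rank_fusion([semantic, bm25]))  # ['d3', 'd1', 'd2', 'd4']
```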
It's Morphin' Time! Combating Linguistic Discrimination with Inflectional Perturbations
Tan, Samson, Joty, Shafiq, Kan, Min-Yen, Socher, Richard
Training on only perfect Standard English corpora predisposes pre-trained neural networks to discriminate against minorities from non-standard linguistic backgrounds (e.g., African American Vernacular English, Colloquial Singapore English, etc.). We perturb the inflectional morphology of words to craft plausible and semantically similar adversarial examples that expose these biases in popular NLP models, e.g., BERT and Transformer, and show that adversarially fine-tuning them for a single epoch significantly improves robustness without sacrificing performance on clean data.
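A toy sketch of the kind of perturbation described: swapping in alternative inflections of the same lemma so the sentence stays plausible while surface morphology changes. The inflection table and random sampling here are illustrative; the paper's adversarial procedure searches inflection candidates to maximize model loss rather than sampling at random.

```python
import random

# Hand-written toy inflection table; a real system would use a
# morphological analyzer/inflector to enumerate candidates.
INFLECTIONS = {
    "plays": ["play", "playing", "played"],
    "children": ["child"],
    "is": ["are", "be", "being", "been"],
}

def perturb_inflections(tokens, rng=random):
    """Replace words with alternative inflections of the same lemma,
    keeping meaning close while changing surface forms."""
    return [rng.choice(INFLECTIONS[t]) if t in INFLECTIONS else t
            for t in tokens]

random.seed(0)
print(" ".join(perturb_inflections("she plays with the children".split())))
# e.g. "she playing with the child"
```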
Keeping Your Distance: Solving Sparse Reward Tasks Using Self-Balancing Shaped Rewards
Trott, Alexander, Zheng, Stephan, Xiong, Caiming, Socher, Richard
While using shaped rewards can be beneficial when solving sparse reward tasks, their successful application often requires careful engineering and is problem specific. For instance, in tasks where the agent must achieve some goal state, simple distance-to-goal reward shaping often fails, as it renders learning vulnerable to local optima. We introduce a simple and effective model-free method to learn from shaped distance-to-goal rewards on tasks where success depends on reaching a goal state. Our method introduces an auxiliary distance-based reward based on pairs of rollouts to encourage diverse exploration. This approach effectively prevents learning dynamics from stabilizing around local optima induced by the naive distance-to-goal reward shaping and enables policies to efficiently solve sparse reward tasks.
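A schematic of the self-balancing idea, assuming Euclidean distances and paired rollouts; the paper's exact reward rule and any thresholds may differ from this sketch:

```python
import numpy as np

def self_balancing_reward(state, goal, sibling_terminal):
    """Schematic shaped reward for one of a pair of rollouts: the usual
    negative distance-to-goal term, plus a repulsive term rewarding
    distance from the sibling rollout's terminal state. The repulsive
    term discourages both rollouts from collapsing into the same local
    optimum induced by naive distance-to-goal shaping."""
    d_goal = np.linalg.norm(state - goal)
    d_sibling = np.linalg.norm(state - sibling_terminal)
    return -d_goal + d_sibling

# Two rollouts from the same start are scored against each other.
goal = np.array([10.0, 10.0])
a_end, b_end = np.array([2.0, 2.0]), np.array([2.5, 1.5])
print(self_balancing_reward(a_end, goal, b_end))
```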
Taylorized Training: Towards Better Approximation of Neural Network Training at Finite Width
Bai, Yu, Krause, Ben, Wang, Huan, Xiong, Caiming, Socher, Richard
We propose \emph{Taylorized training} as an initiative towards better understanding neural network training at finite width. Taylorized training involves training the $k$-th order Taylor expansion of the neural network at initialization, and is a principled extension of linearized training---a recently proposed theory for understanding the success of deep learning. We experiment with Taylorized training on modern neural network architectures, and show that Taylorized training (1) agrees with full neural network training increasingly better as we increase $k$, and (2) can significantly close the performance gap between linearized and full training. Compared with linearized training, higher-order training works in more realistic settings such as standard parameterization and large (initial) learning rate. We complement our experiments with theoretical results showing that the approximation error of $k$-th order Taylorized models decays exponentially in $k$ for wide neural networks.
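A minimal JAX sketch of evaluating a $k$-th order Taylorized model via nested jvp calls, around initialization and in the direction of the parameter update. This is an illustration of the expansion being trained, not the authors' implementation; the tiny one-layer model at the end is invented for the demo.

```python
import jax
import jax.numpy as jnp

def taylorized_apply(f, params0, params, x, k=2):
    """Evaluate the k-th order Taylor expansion of params -> f(params, x)
    around params0, in the direction dp = params - params0:
    f(p0) + D f(p0)[dp] + (1/2) D^2 f(p0)[dp, dp] + ..."""
    dp = jax.tree_util.tree_map(lambda p, p0: p - p0, params, params0)
    g = lambda p: f(p, x)
    out = g(params0)
    coeff, term_fn = 1.0, g
    for i in range(1, k + 1):
        coeff /= i
        # Directional derivative of the previous term function along dp.
        prev = term_fn
        term_fn = lambda p, prev=prev: jax.jvp(prev, (p,), (dp,))[1]
        out = out + coeff * term_fn(params0)
    return out

# Tiny example: a one-layer model, expanded to second order (k=2).
f = lambda w, x: jnp.tanh(w @ x)
w0, w = jnp.eye(2) * 0.1, jnp.eye(2) * 0.3
print(taylorized_apply(f, w0, w, jnp.ones(2), k=2))
```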
Zero-Shot Learning Through Cross-Modal Transfer
Socher, Richard, Ganjoo, Milind, Manning, Christopher D., Ng, Andrew
This work introduces a model that can recognize objects in images even if no training data is available for the object class. The only necessary knowledge about unseen categories comes from unsupervised text corpora. Unlike previous zero-shot learning models, which can only differentiate between unseen classes, our model can operate on a mixture of objects, simultaneously obtaining state-of-the-art performance on classes with thousands of training images and reasonable performance on unseen classes. This is achieved by seeing the distributions of words in texts as a semantic space for understanding what objects look like. Our deep learning model does not require any manually defined semantic or visual features for either words or images.
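A sketch of the cross-modal decision rule: map an image into the word-vector space, use a novelty test against seen-class vectors to decide whether the input is from a seen or unseen class, then classify by proximity to word vectors. The vectors, the simple distance-based novelty test, and the threshold below are illustrative assumptions, not the paper's exact detector.

```python
import numpy as np

def classify(image_vec, word_vecs, seen, threshold):
    """If the projected image is an outlier w.r.t. seen-class word vectors,
    classify among unseen classes; otherwise classify among seen classes."""
    dists = {c: np.linalg.norm(image_vec - v) for c, v in word_vecs.items()}
    if min(dists[c] for c in seen) > threshold:   # novelty detection
        candidates = [c for c in word_vecs if c not in seen]
    else:
        candidates = list(seen)
    return min(candidates, key=dists.get)

word_vecs = {"cat": np.array([1.0, 0.0]), "dog": np.array([0.0, 1.0]),
             "truck": np.array([-1.0, -1.0])}  # "truck" is unseen
print(classify(np.array([-0.8, -0.9]), word_vecs,
               seen={"cat", "dog"}, threshold=1.0))  # -> "truck"
```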
Global Capacity Measures for Deep ReLU Networks via Path Sampling
Theisen, Ryan, Klusowski, Jason M., Wang, Huan, Keskar, Nitish Shirish, Xiong, Caiming, Socher, Richard
Classical results on the statistical complexity of linear models have commonly identified the norm of the weights $\|w\|$ as a fundamental capacity measure. Generalizations of this measure to the setting of deep networks have been varied, though a frequently identified quantity is the product of weight norms of each layer. In this work, we show that for a large class of networks possessing a positive homogeneity property, similar bounds may be obtained instead in terms of the norm of the product of weights. Our proof technique generalizes a recently proposed sampling argument, which allows us to demonstrate the existence of sparse approximants of positive homogeneous networks. This yields covering number bounds, which can be converted to generalization bounds for multi-class classification that are comparable to, and in certain cases improve upon, existing results in the literature. Finally, we investigate our sampling procedure empirically, which yields results consistent with our theory.
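Schematically, for a depth-$L$ network with weight matrices $W_1, \dots, W_L$, the two capacity measures contrasted above relate by submultiplicativity of operator norms; the display below is illustrative, and the paper's actual bounds involve additional factors:

```latex
% Per-layer product of norms vs. norm of the product of weights.
% By submultiplicativity, the second quantity is never larger:
\[
  \underbrace{\prod_{l=1}^{L} \|W_l\|}_{\text{product of norms}}
  \;\ge\;
  \underbrace{\Big\| \prod_{l=1}^{L} W_l \Big\|}_{\text{norm of the product}}
\]
```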
Find or Classify? Dual Strategy for Slot-Value Predictions on Multi-Domain Dialog State Tracking
Zhang, Jian-Guo, Hashimoto, Kazuma, Wu, Chien-Sheng, Wan, Yao, Yu, Philip S., Socher, Richard, Xiong, Caiming
Dialog State Tracking (DST) is a core component in task-oriented dialog systems. Existing approaches for DST usually fall into two categories: picklist-based and span-based. On the one hand, picklist-based methods perform classification for each slot over a candidate-value list, under the condition that a pre-defined ontology is accessible. However, this is often impractical in industry, since full access to the ontology is hard to obtain. On the other hand, span-based methods track values for each slot by finding text spans in the dialog context. However, due to the diversity of value descriptions, it is hard to find a particular string in the dialog context. To mitigate these issues, this paper proposes a Dual Strategy for DST (DS-DST) that borrows advantages from both picklist-based and span-based methods, by classifying over a picklist or finding values from a slot span. Empirical results show that DS-DST achieves state-of-the-art joint accuracy: 51.2% on the MultiWOZ 2.1 dataset, and 53.3% when the full ontology is accessible.
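A sketch of the dual routing idea: slots with a known candidate-value list are filled by classification over the picklist, and the remaining slots by span extraction from the dialog context. All function and variable names here are illustrative stand-ins, not the DS-DST implementation.

```python
def predict_slot_value(slot, dialog_context, picklists, classifier, span_finder):
    """Route each slot to a classification head (picklist available) or a
    span-extraction head (picklist unavailable)."""
    if slot in picklists:
        # Classification head scores each candidate value for this slot.
        scores = classifier(dialog_context, slot, picklists[slot])
        return max(zip(picklists[slot], scores), key=lambda p: p[1])[0]
    # Span head predicts start/end character positions in the context.
    start, end = span_finder(dialog_context, slot)
    return dialog_context[start:end]

# Toy stand-ins for the two heads, just to make the routing runnable.
picklists = {"hotel-parking": ["yes", "no", "dontcare"]}
toy_classifier = lambda ctx, slot, cands: [float("yes" in ctx),
                                           float("no" in ctx), 0.1]
toy_span_finder = lambda ctx, slot: (ctx.find("cambridge"),
                                     ctx.find("cambridge") + len("cambridge"))

ctx = "i need a hotel with free parking , yes , in cambridge"
print(predict_slot_value("hotel-parking", ctx, picklists,
                         toy_classifier, toy_span_finder))  # -> "yes"
print(predict_slot_value("hotel-area", ctx, picklists,
                         toy_classifier, toy_span_finder))  # -> "cambridge"
```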
Entropy Penalty: Towards Generalization Beyond the IID Assumption
Arpit, Devansh, Xiong, Caiming, Socher, Richard
It has been shown that instead of learning actual object features, deep networks tend to exploit non-robust (spurious) discriminative features that are shared between training and test sets. Therefore, while they achieve state-of-the-art performance on such test sets, they generalize poorly to out-of-distribution (OOD) samples, where the IID (independent, identically distributed) assumption breaks and the distribution of non-robust features shifts. Through theoretical and empirical analysis, we show that this happens because maximum likelihood training (without appropriate regularization) leads the model to depend on all the correlations (including spurious ones) present between inputs and targets in the dataset. We then show evidence that the information bottleneck (IB) principle can address this problem. To do so, we propose a regularization approach based on IB, called Entropy Penalty, that reduces the model's dependence on spurious features, that is, features corresponding to such spurious correlations. This allows deep networks trained with Entropy Penalty to generalize well even under distribution shift of spurious features. As a controlled test-bed for evaluating our claim, we train deep networks with Entropy Penalty on a colored MNIST (C-MNIST) dataset and show that they generalize well on vanilla MNIST, MNIST-M, and SVHN, in addition to an OOD version of C-MNIST itself. The baseline regularization methods we compare against fail to generalize on this test-bed.
An example of a non-robust feature is the presence of desert in camel images, which may correlate well with this object class. More realistically, models can learn to exploit the abundance of input-target correlations present in datasets, not all of which may be invariant across environments. Interestingly, such classifiers can achieve good performance on test sets that share the same non-robust features. However, due to this exploitation, these classifiers perform poorly under distribution shift (Geirhos et al., 2018a; Hendrycks & Dietterich, 2019), which violates the IID assumption underpinning existing generalization theory (Bartlett & Mendelson, 2002; McAllester, 1999a;b).
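A sketch of the kind of controlled test-bed described: digits are colored so that color spuriously predicts the label at train time but not at test time. The exact C-MNIST construction in the paper may differ; the label-to-channel rule below is an assumption for illustration.

```python
import numpy as np

def colorize(images, labels, correlate, rng):
    """Turn grayscale digits of shape (N, 28, 28) into RGB (N, 28, 28, 3).
    With correlate=True the color channel is a function of the label,
    planting a spurious color-label correlation; with correlate=False the
    channel is random, so the correlation breaks (the OOD setting)."""
    n = len(images)
    colored = np.zeros((*images.shape, 3), dtype=images.dtype)
    channels = labels % 3 if correlate else rng.integers(0, 3, size=n)
    colored[np.arange(n), :, :, channels] = images
    return colored

rng = np.random.default_rng(0)
x = rng.random((4, 28, 28)).astype(np.float32)  # stand-in for MNIST digits
y = np.arange(4)
train_x = colorize(x, y, correlate=True, rng=rng)   # color predicts label
ood_x = colorize(x, y, correlate=False, rng=rng)    # correlation broken
print(train_x.shape, ood_x.shape)
```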
CoSQL: A Conversational Text-to-SQL Challenge Towards Cross-Domain Natural Language Interfaces to Databases
Yu, Tao, Zhang, Rui, Er, He Yang, Li, Suyi, Xue, Eric, Pang, Bo, Lin, Xi Victoria, Tan, Yi Chern, Shi, Tianze, Li, Zihan, Jiang, Youxuan, Yasunaga, Michihiro, Shim, Sungrok, Chen, Tao, Fabbri, Alexander, Li, Zifan, Chen, Luyao, Zhang, Yuwen, Dixit, Shreya, Zhang, Vincent, Xiong, Caiming, Socher, Richard, Lasecki, Walter S, Radev, Dragomir
CoSQL consists of 30k turns plus 10k annotated SQL queries, obtained from a Wizard-of-Oz (WOZ) collection of 3k dialogues querying 200 complex DBs spanning 138 domains. Each dialogue simulates a real-world DB query scenario with a crowd worker as a user exploring the DB and a SQL expert retrieving answers with SQL, clarifying ambiguous questions, or otherwise informing the user that a question is unanswerable. When user questions are answerable by SQL, the expert describes the SQL and execution results to the user, hence maintaining a natural interaction flow. CoSQL introduces new challenges compared to existing task-oriented dialogue datasets: (1) the dialogue states are grounded in SQL, a domain-independent executable representation, instead of domain-specific slot-value pairs, and (2) because testing is done on unseen databases, success requires generalizing to new domains. CoSQL includes three tasks: SQL-grounded dialogue state tracking, response generation from query results, and user dialogue act prediction. We evaluate a set of strong baselines for each task and show that CoSQL presents significant challenges for future research. The dataset, baselines, and leaderboard will be released at https://yale-lily.github.io/cosql.
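An illustrative, hypothetical SQL-grounded turn of the kind the corpus contains; the schema, question, act label, and query below are invented for illustration and are not CoSQL data.

```python
# Hypothetical SQL-grounded dialogue turn (not from the released corpus).
turn = {
    "user_utterance": "Which of those colleges have more than 10000 students?",
    "user_dialogue_act": "INFORM_SQL",  # illustrative act label
    "sql_query": (
        "SELECT name FROM college "
        "WHERE state = 'CA' AND enrollment > 10000"
    ),
    "system_response": "These are the California colleges with enrollment "
                       "above 10,000.",
}
print(turn["sql_query"])
```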
Pretrained AI Models: Performativity, Mobility, and Change
Varshney, Lav R., Keskar, Nitish Shirish, Socher, Richard
The paradigm of pretrained deep learning models has recently emerged in artificial intelligence practice, allowing deployment in numerous societal settings with limited computational resources, but also embedding biases and enabling unintended negative uses. In this paper, we treat pretrained models as objects of study and discuss the ethical impacts of their sociological position. We discuss how pretrained models are developed and compared under the common task framework, but note that this may make self-regulation inadequate. We further discuss how pretrained models may have a performative effect on society that exacerbates biases. We then discuss how pretrained models move through actor networks as a kind of computationally immutable mobile, and how users nevertheless act as agents of technological change by reinterpreting them via fine-tuning and transfer. We further discuss how users may use pretrained models in malicious ways, drawing a novel connection between the responsible innovation and user-centered innovation literatures. We close by discussing how this sociological understanding of pretrained models can inform AI governance frameworks for fairness, accountability, and transparency.