Kozareva, Zornitsa
ToKen: Task Decomposition and Knowledge Infusion for Few-Shot Hate Speech Detection
AlKhamissi, Badr, Ladhak, Faisal, Iyer, Srini, Stoyanov, Ves, Kozareva, Zornitsa, Li, Xian, Fung, Pascale, Mathias, Lambert, Celikyilmaz, Asli, Diab, Mona
Hate speech detection is complex; it relies on commonsense reasoning, knowledge of stereotypes, and an understanding of social nuance that differs from one culture to the next. Large-scale annotated hate speech datasets are also difficult to collect. In this work, we frame the problem as a few-shot learning task and show significant gains from decomposing the task into its "constituent" parts. In addition, we find that infusing knowledge from reasoning datasets (e.g., Atomic2020) improves performance even further. Moreover, we observe that the trained models generalize to out-of-distribution datasets, demonstrating the superiority of task decomposition and knowledge infusion over previously used methods. Concretely, our method achieves a 17.83% absolute gain over the baseline in the 16-shot case.
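The subtasks and aggregation rule below are hypothetical placeholders, meant only to illustrate the decomposition pattern the abstract describes; the paper's actual constituent parts and combination strategy may differ.

```python
# Illustrative sketch (not the authors' code): decomposing a single
# hate-speech label into hypothetical constituent subtasks, each handled
# by a few-shot classifier, with the final label aggregated from the parts.
from typing import Callable, List

# Hypothetical subtasks; the paper's actual decomposition may differ.
SUBTASKS = ["targets_a_protected_group", "expresses_derogation", "implies_stereotype"]

def decompose_and_classify(text: str,
                           subtask_clf: Callable[[str, str], bool]) -> bool:
    """Run each few-shot subtask classifier and aggregate its verdicts."""
    verdicts: List[bool] = [subtask_clf(text, task) for task in SUBTASKS]
    # Simple aggregation rule for illustration: flag as hate speech when a
    # target is identified and at least one harmful signal fires.
    return verdicts[0] and any(verdicts[1:])
```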
Approximating 1-Wasserstein Distance with Trees
Yamada, Makoto, Takezawa, Yuki, Sato, Ryoma, Bao, Han, Kozareva, Zornitsa, Ravi, Sujith
The Wasserstein distance, which measures the discrepancy between distributions, has proven effective in a range of natural language processing (NLP) and computer vision (CV) applications. One challenge in estimating the Wasserstein distance is that it is computationally expensive and does not scale well across many distribution-comparison tasks. In this paper, we aim to approximate the 1-Wasserstein distance by the tree-Wasserstein distance (TWD), where the TWD is a 1-Wasserstein distance with a tree-based embedding that can be computed in linear time with respect to the number of nodes in the tree. More specifically, we propose a simple yet efficient L1-regularized approach to learning the edge weights of a tree. To this end, we first show that the 1-Wasserstein approximation problem can be formulated as a distance approximation problem using the shortest path distance on a tree. We then show that the shortest path distance can be represented by a linear model and formulated as a Lasso-based regression problem. Owing to the convex formulation, we can obtain a globally optimal solution efficiently. Moreover, we propose a tree-sliced variant of these methods. Through experiments, we demonstrate that the weighted TWD accurately approximates the original 1-Wasserstein distance.
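As a rough illustration of the Lasso formulation described above: the shortest-path distance between two leaves of a tree is linear in the edge weights, d_T(i, j) = sum of w_e over edges e on the path between i and j, so the weights can be fit by regression. The `path_indicators` encoding and hyperparameters below are assumptions, not the authors' implementation.

```python
# Minimal sketch, assuming a fixed tree whose leaf-pair paths have been
# precomputed; fits non-negative edge weights so that tree shortest-path
# distances approximate target pairwise (ground metric) distances.
import numpy as np
from sklearn.linear_model import Lasso

def fit_tree_weights(path_indicators: np.ndarray,
                     target_distances: np.ndarray,
                     alpha: float = 1e-3) -> np.ndarray:
    """path_indicators: (num_pairs, num_edges) binary matrix; the row for
    pair (i, j) marks the edges on the tree path between leaves i and j.
    target_distances: (num_pairs,) ground-metric distances to approximate."""
    # positive=True keeps edge weights non-negative so d_T remains a metric;
    # the L1 penalty sparsifies the tree by zeroing out unneeded edges.
    model = Lasso(alpha=alpha, positive=True, fit_intercept=False)
    model.fit(path_indicators, target_distances)
    return model.coef_
```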
Few-shot Learning with Multilingual Language Models
Lin, Xi Victoria, Mihaylov, Todor, Artetxe, Mikel, Wang, Tianlu, Chen, Shuohui, Simig, Daniel, Ott, Myle, Goyal, Naman, Bhosale, Shruti, Du, Jingfei, Pasunuru, Ramakanth, Shleifer, Sam, Koura, Punit Singh, Chaudhary, Vishrav, O'Horo, Brian, Wang, Jeff, Zettlemoyer, Luke, Kozareva, Zornitsa, Diab, Mona, Stoyanov, Veselin, Li, Xian
Large-scale autoregressive language models such as GPT-3 are few-shot learners that can perform a wide range of language tasks without fine-tuning. While these models are known to be able to jointly represent many different languages, their training data is dominated by English, potentially limiting their cross-lingual generalization. In this work, we train multilingual autoregressive language models on a balanced corpus covering a diverse set of languages, and study their few- and zero-shot learning capabilities in a wide range of tasks. Our largest model with 7.5 billion parameters sets a new state of the art in few-shot learning in more than 20 representative languages, outperforming GPT-3 of comparable size in multilingual commonsense reasoning (with +7.4% absolute accuracy improvement in 0-shot settings and +9.4% in 4-shot settings) and natural language inference (+5.4% in each of 0-shot and 4-shot settings). On the FLORES-101 machine translation benchmark, our model outperforms GPT-3 on 171 out of 182 translation directions with 32 training examples, while surpassing the official supervised baseline in 45 directions. We present a detailed analysis of where the model succeeds and fails, showing in particular that it enables cross-lingual in-context learning on some tasks, while there is still room for improvement on surface form robustness and adaptation to tasks that do not have a natural cloze form. Finally, we evaluate our models in social value tasks such as hate speech detection in five languages and find that they have limitations similar to those of comparably sized GPT-3 models.
Efficient Large Scale Language Modeling with Mixtures of Experts
Artetxe, Mikel, Bhosale, Shruti, Goyal, Naman, Mihaylov, Todor, Ott, Myle, Shleifer, Sam, Lin, Xi Victoria, Du, Jingfei, Iyer, Srinivasan, Pasunuru, Ramakanth, Anantharaman, Giri, Li, Xian, Chen, Shuohui, Akin, Halil, Baines, Mandeep, Martin, Louis, Zhou, Xing, Koura, Punit Singh, O'Horo, Brian, Wang, Jeff, Zettlemoyer, Luke, Diab, Mona, Kozareva, Zornitsa, Stoyanov, Ves
Mixture of Experts layers (MoEs) enable efficient scaling of language models through conditional computation. This paper presents a detailed empirical study of how autoregressive MoE language models scale in comparison with dense models in a wide range of settings: in- and out-of-domain language modeling, zero- and few-shot priming, and full fine-tuning. With the exception of fine-tuning, we find MoEs to be substantially more compute-efficient. At more modest training budgets, MoEs can match the performance of dense models using roughly 4 times less compute. This gap narrows at scale, but our largest MoE model (1.1T parameters) consistently outperforms a compute-equivalent dense model (6.7B parameters). Overall, this performance gap varies greatly across tasks and domains, suggesting that MoE and dense models generalize differently in ways that are worthy of future study. We make our code and models publicly available for research use.
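The toy sketch below illustrates the conditional computation behind an MoE layer using simple top-1 routing; real MoE models, including those in the paper, use more sophisticated routing, load balancing, and parallelism.

```python
# Toy MoE layer with top-1 routing: each token is dispatched to exactly one
# expert, so per-token compute stays constant as the number of experts grows.
import numpy as np

def moe_layer(x: np.ndarray, gate_w: np.ndarray, experts: list) -> np.ndarray:
    """x: (batch, d) token representations; gate_w: (d, num_experts);
    experts: per-expert feed-forward functions mapping (n, d) -> (n, d)."""
    logits = x @ gate_w                      # router scores per expert
    chosen = np.argmax(logits, axis=-1)      # top-1 expert per token
    out = np.empty_like(x)
    for e, expert in enumerate(experts):
        mask = chosen == e                   # tokens routed to expert e
        if mask.any():
            out[mask] = expert(x[mask])      # compute only for routed tokens
    return out
```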
Do Language Models Have Beliefs? Methods for Detecting, Updating, and Visualizing Model Beliefs
Hase, Peter, Diab, Mona, Celikyilmaz, Asli, Li, Xian, Kozareva, Zornitsa, Stoyanov, Veselin, Bansal, Mohit, Iyer, Srinivasan
Do language models have beliefs about the world? Dennett (1995) famously argues that even thermostats have beliefs, on the view that a belief is simply an informational state decoupled from any motivational state. In this paper, we discuss approaches to detecting when models have beliefs about the world, and we improve on methods for updating model beliefs to be more truthful, with a focus on methods based on learned optimizers or hypernetworks. Our main contributions include: (1) new metrics for evaluating belief-updating methods that focus on the logical consistency of beliefs, (2) a training objective for Sequential, Local, and Generalizing model updates (SLAG) that improves the performance of learned optimizers, and (3) the introduction of the belief graph, which is a new form of interface with language models that shows the interdependencies between model beliefs. Our experiments suggest that models possess belief-like qualities to only a limited extent, but update methods can both fix incorrect model beliefs and greatly improve their consistency. Although off-the-shelf optimizers are surprisingly strong belief-updating baselines, our learned optimizers can outperform them in more difficult settings than have been considered in past work. Code is available at https://github.com/peterbhase/SLAG-Belief-Updating
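As a hedged illustration of the logical-consistency metrics mentioned above, the sketch below scores how often a model's beliefs respect the entailment edges of a belief graph; the `believes` interface and the metric itself are simplifications, not the paper's exact definitions.

```python
# Illustrative consistency check over a belief graph's entailment edges:
# whenever the model believes a premise p, it should also believe every
# statement q that p entails.
from typing import Callable, List, Tuple

def entailment_consistency(believes: Callable[[str], bool],
                           edges: List[Tuple[str, str]]) -> float:
    """Fraction of entailment edges (p, q) the model's beliefs respect."""
    relevant = [(p, q) for p, q in edges if believes(p)]
    if not relevant:
        return 1.0  # vacuously consistent: no believed premises
    return sum(believes(q) for _, q in relevant) / len(relevant)
```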
Fixed Support Tree-Sliced Wasserstein Barycenter
Takezawa, Yuki, Sato, Ryoma, Kozareva, Zornitsa, Ravi, Sujith, Yamada, Makoto
The Wasserstein barycenter has been widely studied in various fields, including natural language processing and computer vision. However, solving the Wasserstein barycenter problem is computationally expensive because computing the Wasserstein distance requires quadratic time with respect to the number of supports. By contrast, the Wasserstein distance on a tree, called the tree-Wasserstein distance, can be computed in linear time and allows for the fast comparison of a large number of distributions. In this study, we propose a barycenter under the tree-Wasserstein distance, called the fixed support tree-Wasserstein barycenter (FS-TWB), and its extension, called the fixed support tree-sliced Wasserstein barycenter (FS-TSWB). More specifically, we first show that the FS-TWB and FS-TSWB problems are convex optimization problems that can be solved by projected subgradient descent. Moreover, we propose a more efficient algorithm to compute the subgradient and objective function value by using the properties of tree-Wasserstein barycenter problems. Through real-world experiments, we show that, with the proposed algorithm, the FS-TWB and FS-TSWB can be solved two orders of magnitude faster than the original Wasserstein barycenter.
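The sketch below shows generic projected subgradient descent on the probability simplex, the optimization scheme the abstract mentions; the objective's subgradient is left as a caller-supplied placeholder rather than the paper's FS-TWB subgradient.

```python
# Projected subgradient descent for a convex objective over barycenter
# weights constrained to the probability simplex.
import numpy as np

def project_to_simplex(v: np.ndarray) -> np.ndarray:
    """Euclidean projection onto {x : x >= 0, sum(x) = 1} (sorting method)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    rho = np.nonzero(u - css / (np.arange(len(v)) + 1) > 0)[0][-1]
    theta = css[rho] / (rho + 1.0)
    return np.maximum(v - theta, 0.0)

def projected_subgradient(subgrad, x0, steps=500, lr=0.1):
    """subgrad(x) returns a subgradient of the convex objective at x."""
    x = project_to_simplex(np.asarray(x0, dtype=float))
    for t in range(1, steps + 1):
        # Diminishing step size, then project back onto the simplex.
        x = project_to_simplex(x - (lr / np.sqrt(t)) * subgrad(x))
    return x
```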
Transferable Neural Projection Representations
Sankar, Chinnadhurai, Ravi, Sujith, Kozareva, Zornitsa
Neural word representations are at the core of many state-of-the-art natural language processing models. A widely used approach is to pre-train, store, and look up word or character embedding matrices. While useful, such representations occupy a large amount of memory, making them hard to deploy on-device, and often fail to generalize to unknown words due to vocabulary pruning. In this paper, we propose a skip-gram based architecture coupled with Locality-Sensitive Hashing (LSH) projections to learn efficient, dynamically computable representations. Our model does not need to store lookup tables, as representations are computed on the fly with a low memory footprint. The representations can be trained in an unsupervised fashion and can be easily transferred to other NLP tasks. For qualitative evaluation, we analyze the nearest neighbors of the word representations and discover semantically similar words even with misspellings. For quantitative evaluation, we plug our transferable projections into a simple LSTM, run it on multiple NLP tasks, and show that they outperform prior work.
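To make the on-the-fly idea concrete, here is a simplified stand-in for the paper's LSH projections (the n-gram featurization and hashing scheme are assumptions): a word's code is computed by hashing its character n-grams through fixed random hyperplanes, so no embedding table is stored and misspellings land on nearby codes.

```python
# Compute a word representation on the fly via random signed projections
# of character n-gram features; no lookup table is needed.
import numpy as np
import zlib

def char_ngrams(word: str, n: int = 3):
    padded = f"#{word}#"
    return [padded[i:i + n] for i in range(max(len(padded) - n + 1, 1))]

def lsh_projection(word: str, num_bits: int = 64) -> np.ndarray:
    """Sum a fixed random hyperplane per n-gram, then take signs."""
    acc = np.zeros(num_bits)
    for gram in char_ngrams(word):
        seed = zlib.crc32(gram.encode("utf-8"))  # stable per-n-gram seed
        acc += np.random.default_rng(seed).standard_normal(num_bits)
    return np.sign(acc)  # ±1 code; similar spellings share most bits
```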
Variational Reasoning for Question Answering With Knowledge Graph
Zhang, Yuyu (Georgia Institute of Technology) | Dai, Hanjun (Georgia Institute of Technology) | Kozareva, Zornitsa (Amazon Web Services) | Smola, Alexander J. (Amazon Web Services) | Song, Le (Georgia Institute of Technology)
Knowledge graphs (KGs) are known to be helpful for the task of question answering (QA), since they provide well-structured relational information between entities and allow one to further infer indirect facts. However, it is challenging to build QA systems that can learn to reason over knowledge graphs from question-answer pairs alone. First, when people ask questions, their expressions are noisy (for example, typos in text or variations in pronunciation), which makes it non-trivial for the QA system to match the mentioned entities to the knowledge graph. Second, many questions require multi-hop logical reasoning over the knowledge graph to retrieve the answers. To address these challenges, we propose a novel and unified deep learning architecture and an end-to-end variational learning algorithm that can handle noise in questions and learn multi-hop reasoning simultaneously. Our method achieves state-of-the-art performance on a recent benchmark dataset. We also derive a series of new benchmark datasets, including questions for multi-hop reasoning, questions paraphrased by a neural translation model, and questions in human voice. Our method yields very promising results on all of these challenging datasets.
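A toy example of the multi-hop reasoning the abstract refers to (not the paper's variational model): following a chain of relations from a topic entity through a dictionary-encoded KG.

```python
# Multi-hop KG traversal: start from a topic entity and follow a chain of
# relations, expanding the frontier of candidate answers at each hop.
def multi_hop(kg: dict, entity: str, relations: list) -> set:
    """kg maps (entity, relation) -> set of target entities."""
    frontier = {entity}
    for rel in relations:
        frontier = set().union(*(kg.get((e, rel), set()) for e in frontier))
    return frontier

# e.g. "Who directed the movie that won Best Picture in 1998?"
# ~ multi_hop(kg, "Best Picture 1998", ["winner", "directed_by"])
```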
Sentiment Prediction Using Collaborative Filtering
Kim, Jihie (USC Information Sciences Institute) | Yoo, Jaebong (USC Information Sciences Institute) | Lim, Ho (USC Information Sciences Institute) | Qiu, Huida (USC Information Sciences Institute) | Kozareva, Zornitsa (USC Information Sciences Institute) | Galstyan, Aram (USC Information Sciences Institute)
Learning sentiment models from short texts such as tweets is a notoriously challenging problem due to very strong noise and data sparsity. This paper presents a novel, collaborative filtering-based approach for sentiment prediction in Twitter conversation threads. Given a set of sentiment holders and sentiment targets, we assume we know the true sentiments for a small fraction of holder-target pairs. This information is then used to predict the sentiment of a previously unknown user towards another user or an entity using collaborative filtering algorithms. We validate our model on two Twitter datasets using different collaborative filtering techniques. Our preliminary results demonstrate that the proposed approach can be effectively used in Twitter sentiment prediction, thus mitigating the data sparsity problem.
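As a minimal sketch of the collaborative-filtering setup (one possible technique; the paper evaluates several, and the details here are assumptions), the code below factorizes a sparse holder-by-target sentiment matrix from the known pairs and predicts the missing entries.

```python
# Matrix factorization over observed holder-target sentiment pairs; the
# learned factors fill in sentiments for previously unseen pairs.
import numpy as np

def factorize(S, mask, rank=5, lr=0.05, reg=0.02, epochs=200, seed=0):
    """S: (holders, targets) sentiment scores; mask: 1 where known, else 0."""
    rng = np.random.default_rng(seed)
    H = rng.normal(scale=0.1, size=(S.shape[0], rank))
    T = rng.normal(scale=0.1, size=(S.shape[1], rank))
    for _ in range(epochs):
        err = mask * (S - H @ T.T)          # error only on observed pairs
        H += lr * (err @ T - reg * H)       # gradient step on holder factors
        T += lr * (err.T @ H - reg * T)     # gradient step on target factors
    return H @ T.T                          # dense prediction matrix
```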