FRAGE: Frequency-Agnostic Word Representation
Gong, Chengyue, He, Di, Tan, Xu, Qin, Tao, Wang, Liwei, Liu, Tie-Yan
Continuous word representation (aka word embedding) is a basic building block in many neural network-based models used in natural language processing tasks. Although it is widely accepted that words with similar semantics should be close to each other in the embedding space, we find that word embeddings learned in several tasks are biased towards word frequency: the embeddings of high-frequency and low-frequency words lie in different subregions of the embedding space, and the embedding of a rare word and a popular word can be far apart even when they are semantically similar. This makes the learned word embeddings ineffective, especially for rare words, and consequently limits the performance of these neural network models. To mitigate this issue, we propose a simple yet effective adversarial training method that blurs the boundary between the embeddings of high-frequency and low-frequency words. We conducted comprehensive studies on ten datasets across four natural language processing tasks: word similarity, language modeling, machine translation, and text classification. Results show that our method outperforms the baselines in all tasks.
Reviews: FRAGE: Frequency-Agnostic Word Representation
The core idea is adding an adversarial loss, a technique that has recently become widespread in NLP, as the submission notes. The authors define two frequency-based domains: major words and rare words. The proposed approach is easy to use and appears to improve accuracy on several NLP tasks.
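To make the reviewed idea concrete, here is a minimal PyTorch-style sketch of the kind of adversarial objective described: a discriminator tries to tell frequent from rare embeddings, and the embeddings are trained to fool it. The discriminator architecture, the `lambda_adv` weight, and the frequency labels are illustrative assumptions, not the authors' exact implementation.

```python
import torch
import torch.nn as nn

class FrequencyDiscriminator(nn.Module):
    """Predicts from an embedding whether the word is high-frequency."""
    def __init__(self, emb_dim: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(emb_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, emb: torch.Tensor) -> torch.Tensor:
        return self.net(emb).squeeze(-1)  # logit for "is high-frequency"

def adversarial_losses(embedding: nn.Embedding,
                       discriminator: FrequencyDiscriminator,
                       word_ids: torch.Tensor,
                       is_frequent: torch.Tensor,
                       task_loss: torch.Tensor,
                       lambda_adv: float = 0.1):
    """Return (model_loss, discriminator_loss) for one training step.
    The discriminator minimizes d_loss to separate the two frequency
    domains; the embedding minimizes task_loss - lambda_adv * d_loss,
    i.e. it maximizes the discriminator's error, blurring the boundary
    between frequent and rare words (assumed min-max formulation)."""
    emb = embedding(word_ids)                      # (batch, emb_dim)
    logits = discriminator(emb)
    d_loss = nn.functional.binary_cross_entropy_with_logits(
        logits, is_frequent.float())
    model_loss = task_loss - lambda_adv * d_loss
    return model_loss, d_loss
```

In practice the two losses would be optimized with separate optimizers, alternating updates as in standard adversarial training; the sketch only shows how the single adversarial term couples them.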
mCoT: Multilingual Instruction Tuning for Reasoning Consistency in Language Models
Chain-of-thought (CoT) prompting has recently emerged as a powerful technique for eliciting reasoning from large language models (LLMs) and improving various downstream tasks. Because most research focuses on English, with few explorations in a multilingual context, it remains an open question how reliable this reasoning capability is across languages. To address this directly, we study multilingual reasoning consistency across multiple languages, using popular open-source LLMs. First, we compile the first large-scale multilingual math reasoning dataset, mCoT-MATH, covering eleven diverse languages. Then, we introduce multilingual CoT instruction tuning to boost reasoning capability across languages, thereby improving model consistency. While existing LLMs show substantial variation across the languages we consider, and especially low performance for lower-resource languages, our 7B-parameter model mCoT achieves impressive consistency across languages, and superior or comparable performance to closed- and open-source models of much larger sizes.
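As a concrete illustration of what multilingual CoT instruction tuning might involve, the sketch below formats supervised (input, target) pairs from per-language problems paired with chains of thought. The dataclass fields and prompt template are assumptions for illustration, not mCoT-MATH's actual schema.

```python
from dataclasses import dataclass

@dataclass
class CoTExample:
    language: str   # e.g. "sw" for Swahili
    question: str   # math word problem in that language
    rationale: str  # step-by-step chain of thought
    answer: str     # final answer string

# Hypothetical template; the paper's actual prompt wording may differ.
PROMPT_TEMPLATE = (
    "Below is a math problem. Reason step by step, then give the answer.\n\n"
    "Problem: {question}\n"
)

def to_training_pair(ex: CoTExample) -> dict:
    """Format one example as an (input, target) pair for supervised tuning.
    Tuning on the same problems rendered in all covered languages is what
    pushes the model toward answer consistency between languages."""
    return {
        "input": PROMPT_TEMPLATE.format(question=ex.question),
        "target": f"{ex.rationale}\nAnswer: {ex.answer}",
    }
```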
Language Models are Multilingual Chain-of-Thought Reasoners
Shi, Freda, Suzgun, Mirac, Freitag, Markus, Wang, Xuezhi, Srivats, Suraj, Vosoughi, Soroush, Chung, Hyung Won, Tay, Yi, Ruder, Sebastian, Zhou, Denny, Das, Dipanjan, Wei, Jason
We evaluate the reasoning abilities of large language models in multilingual settings. We introduce the Multilingual Grade School Math (MGSM) benchmark by manually translating 250 grade-school math problems from the GSM8K dataset (Cobbe et al., 2021) into ten typologically diverse languages. We find that the ability to solve MGSM problems via chain-of-thought prompting emerges with increasing model scale, and that models have strikingly strong multilingual reasoning abilities, even in underrepresented languages such as Bengali and Swahili. Finally, we show that the multilingual reasoning abilities of language models extend to other tasks such as commonsense reasoning and word-in-context semantic judgment. The MGSM benchmark is publicly available at https://github.com/google-research/url-nlp.
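For readers unfamiliar with the setup, here is a hedged sketch of few-shot chain-of-thought evaluation on MGSM-style data: `generate` stands in for whichever model API is used, and the exemplar format and answer-extraction rule are illustrative assumptions, not the paper's exact protocol.

```python
import re

def build_prompt(exemplars, question):
    """Prepend a few worked (question, chain-of-thought, answer) exemplars
    so the model imitates step-by-step reasoning on the test question."""
    shots = "\n\n".join(
        f"Q: {q}\nA: {cot} The answer is {a}." for q, cot, a in exemplars
    )
    return f"{shots}\n\nQ: {question}\nA:"

def extract_answer(completion: str) -> str:
    """Take the last number in the completion as the predicted answer."""
    nums = re.findall(r"-?\d+(?:\.\d+)?", completion.replace(",", ""))
    return nums[-1] if nums else ""

def mgsm_accuracy(generate, exemplars, test_set):
    """Exact-match accuracy over (question, gold_answer) pairs."""
    correct = sum(
        extract_answer(generate(build_prompt(exemplars, q))) == gold
        for q, gold in test_set
    )
    return correct / len(test_set)
```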