AITopics | Wang, Leroy Z.



Minimization of Boolean Complexity in In-Context Concept Learning

Wang, Leroy Z., McCoy, R. Thomas, Steinert-Threlkeld, Shane

arXiv.org Artificial Intelligence, Dec-3-2024

What factors contribute to the relative success and corresponding difficulties of in-context learning for Large Language Models (LLMs)? Drawing on insights from the literature on human concept learning, we test LLMs on carefully designed concept learning tasks, and show that task performance highly correlates with the Boolean complexity of the concept. This suggests that in-context learning exhibits a learning bias for simplicity in a way similar to humans.

large language model, machine learning, natural language


arXiv:2412.02823

Country: Asia > Middle East > UAE (0.14)

Genre: Research Report (0.65)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.92)


Whose Language Counts as High Quality? Measuring Language Ideologies in Text Data Selection

Gururangan, Suchin, Card, Dallas, Dreier, Sarah K., Gade, Emily K., Wang, Leroy Z., Wang, Zeyu, Zettlemoyer, Luke, Smith, Noah A.

arXiv.org Artificial Intelligence, Jan-26-2022

Language models increasingly rely on massive web dumps for diverse text data. However, these sources are rife with undesirable content. As such, resources like Wikipedia, books, and newswire often serve as anchors for automatically selecting web text most suitable for language modeling, a process typically referred to as quality filtering. Using a new dataset of U.S. high school newspaper articles -- written by students from across the country -- we investigate whose language is preferred by the quality filter used for GPT-3. We find that newspapers from larger schools, located in wealthier, educated, and urban ZIP codes are more likely to be classified as high quality. We then demonstrate that the filter's measurement of quality is unaligned with other sensible metrics, such as factuality or literary acclaim. We argue that privileging any corpus as high quality entails a language ideology, and more care is needed to construct training corpora for language models, with better transparency and justification for the inclusion or exclusion of various texts.

high quality, machine learning, natural language


arXiv:2201.10474

Country: North America > United States > Massachusetts (0.28)

Genre:

Research Report > New Finding (1.00)
Personal (1.00)
Research Report > Experimental Study (0.69)

Industry:

Media > News (1.00)
Leisure & Entertainment > Sports > Football (1.00)
Law (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.93)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.70)
