Goto

Collaborating Authors

Aoyama, Tatsuya


Language Models Grow Less Humanlike beyond Phase Transition

arXiv.org Artificial Intelligence

LMs' alignment with human reading behavior (i.e., psychometric predictive power; PPP) is known to improve during pretraining up to a tipping point, beyond which it either plateaus or degrades. Various factors, such as word frequency, recency bias in attention, and context size, have been theorized to affect PPP, yet no current account explains why such a tipping point exists or how it interacts with LMs' pretraining dynamics more generally. We hypothesize that the underlying factor is a pretraining phase transition, characterized by the rapid emergence of specialized attention heads. We conduct a series of correlational and causal experiments showing that such a phase transition is responsible for the tipping point in PPP. We then show that, rather than directly producing attention patterns that degrade PPP, the phase transition alters the model's subsequent learning dynamics, such that further training continues to degrade PPP.
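
PPP is typically quantified by how well an LM's per-word surprisal predicts human reading times. Below is a minimal sketch of that pipeline, assuming GPT-2 via Hugging Face transformers and toy reading-time values; the paper's actual models, reading-time corpora, and regression setup are not reproduced here.

```python
# Sketch: per-token surprisal from a causal LM, correlated with reading times.
# Assumes GPT-2 via Hugging Face transformers; reading times are toy values.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from scipy.stats import pearsonr

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def surprisals(sentence: str) -> list[float]:
    """Surprisal in bits for every token after the first."""
    ids = tok(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        logprobs = torch.log_softmax(model(ids).logits, dim=-1)
    # Token i is predicted from positions < i, so shift by one.
    token_lp = logprobs[0, :-1].gather(1, ids[0, 1:, None]).squeeze(1)
    return (-token_lp / torch.log(torch.tensor(2.0))).tolist()

s = surprisals("The dog sat on the mat")
rts = [310, 342, 515, 390, 280]  # toy per-word reading times (ms)
# Real studies align subwords to words and use regression, not raw correlation.
r, _ = pearsonr(s, rts)
print(f"surprisal-RT correlation: {r:.2f}")  # a crude stand-in for PPP
```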


Anything Goes? A Crosslinguistic Study of (Im)possible Language Learning in LMs

arXiv.org Artificial Intelligence

Do LLMs offer insights into human language learning? A common argument against this idea is that, because their architecture and training paradigm are so vastly different from humans', LLMs can learn arbitrary inputs as easily as natural languages. In this paper, we test this claim by training LMs to model impossible and typologically unattested languages. Unlike previous work, which has focused exclusively on English, we conduct experiments on 12 natural languages from 4 language families. Our results show that while GPT-2 small can largely distinguish attested languages from their impossible counterparts, it does not achieve perfect separation between all the attested languages and all the impossible ones. We further test whether GPT-2 small distinguishes typologically attested from unattested languages with different NP orders by manipulating word order based on Greenberg's Universal 20. We find that the model's perplexity scores do not distinguish attested from unattested word orders, as long as the unattested variants maintain constituency structure. These findings suggest that language models exhibit some human-like inductive biases, though these biases are weaker than those found in human learners.
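
The impossible counterparts in this line of work are built by applying deterministic perturbations to natural sentences, after which an LM is trained on each variant and perplexities are compared. A minimal sketch of two such perturbations follows; these are illustrative choices, not the paper's exact perturbation set or its Universal 20 manipulations.

```python
# Sketch: deterministic perturbations that turn attested sentences into
# "impossible" counterparts. Illustrative only; the paper's perturbation
# set and 12-language corpus construction are not reproduced here.
import random

def reverse_words(tokens: list[str]) -> list[str]:
    """Full reversal: keeps the lexicon but destroys natural word order."""
    return tokens[::-1]

def deterministic_shuffle(tokens: list[str], seed: int = 13) -> list[str]:
    """Fixed-seed shuffle: the mapping is consistent, so the 'language' is
    learnable in principle, but it respects no constituency structure."""
    out = tokens[:]
    random.Random(seed).shuffle(out)
    return out

sent = "the cat chased a small mouse".split()
print(reverse_words(sent))
print(deterministic_shuffle(sent))
```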


Distributionally Robust Coreset Selection under Covariate Shift

arXiv.org Machine Learning

Coreset selection, which selects a small subset of an existing training dataset, is one approach to reducing the amount of training data, and various methods have been proposed to this end. In practical situations where these methods are employed, the data distributions often differ between the development phase and the deployment phase, with the latter being unknown. It is therefore challenging to select an effective subset of training data that performs well across all deployment scenarios. We propose Distributionally Robust Coreset Selection (DRCS), which theoretically derives an estimate of the upper bound for the worst-case test error, assuming that the future covariate distribution may deviate within a defined range from the training distribution. By selecting instances in a way that suppresses this estimate, DRCS achieves distributionally robust training-instance selection. The method is primarily applicable to models trained by convex optimization, but we demonstrate that it can also be applied to deep learning under appropriate approximations. In this paper, we focus on covariate shift, a type of data distribution shift, and demonstrate the effectiveness of DRCS through experiments.
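
Schematically, this kind of selection can be framed as minimizing a worst-case weighted risk, where the weights model covariate shift and may deviate from their nominal values within a ball. The following is a generic formulation under those assumptions, not the paper's exact bound or its derivation:

```latex
% Schematic DR coreset objective: choose a subset S of at most k instances
% whose trained model \theta(S) keeps the worst-case weighted risk small.
% w^0 are nominal (e.g., uniform) weights; \delta bounds the deviation.
\min_{\substack{S \subseteq \{1,\dots,n\} \\ |S| \le k}}\;
\max_{w \in \mathcal{W}}\;
\sum_{i=1}^{n} w_i\, \ell\bigl(f_{\theta(S)}(x_i), y_i\bigr),
\qquad
\mathcal{W} = \bigl\{\, w \ge 0 : \lVert w - w^0 \rVert_2 \le \delta \,\bigr\}
```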


Generalized Kernel Inducing Points by Duality Gap for Dataset Distillation

arXiv.org Machine Learning

Reducing the amount of training data while preserving model performance remains a fundamental challenge in machine learning. Dataset distillation seeks to generate synthetic instances that encapsulate the essential information of the original data [31]. This synthetic approach often proves more flexible and can potentially achieve greater data reduction than simply retaining subsets of actual instances. Such distilled datasets can also serve broader applications, for example by enabling efficient continual learning with reduced storage demands [14, 23, 3], and offering privacy safeguards through data corruption [2, 12]. Existing dataset distillation methods are essentially formulated as a bi-level optimization problem. This is because generating synthetic instances requires retraining the model with those instances as training data. Specifically, synthetic instances are created in the outer loop, and the model is trained in the inner loop, leading to high computational costs. A promising approach to avoid bi-level optimization is a method called Kernel Inducing Point (KIP) [18]. The KIP method avoids bi-level optimization by obtaining an analytical solution in its inner loop, effectively leveraging the fact that its loss function is a variant of squared loss.
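
The inner-loop shortcut is kernel ridge regression: with a synthetic support set (X_s, y_s) and the original target set (X_t, y_t), the inner model has a closed form, so the outer loss can be differentiated directly with respect to the synthetic data. A standard statement of the KIP objective is given below; the kernel choice and the ridge term lambda vary by implementation.

```latex
% KIP outer objective with the inner loop solved in closed form.
% K_{ts}, K_{ss}: kernel matrices (target-support, support-support);
% \lambda I: ridge regularization making the inverse well-defined.
\mathcal{L}(X_s, y_s)
  = \tfrac{1}{2}\,\bigl\lVert\, y_t - K_{ts}\,(K_{ss} + \lambda I)^{-1} y_s \,\bigr\rVert_2^2
```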


GDTB: Genre Diverse Data for English Shallow Discourse Parsing across Modalities, Text Types, and Domains

arXiv.org Artificial Intelligence

Work on shallow discourse parsing in English has focused on the Wall Street Journal corpus, the only large-scale dataset for the language in the PDTB framework. However, the data is not openly available, is restricted to the news domain, and is by now 35 years old. In this paper, we present and evaluate a new open-access, multi-genre benchmark for PDTB-style shallow discourse parsing, based on the existing UD English GUM corpus, for which discourse relation annotations in other frameworks already exist. In a series of experiments on cross-domain relation classification, we show that while our dataset is compatible with PDTB, substantial out-of-domain degradation is observed, which can be alleviated by joint training on both datasets.


Identifying Fairness Issues in Automatically Generated Testing Content

arXiv.org Artificial Intelligence

Natural language generation tools are powerful and effective for generating content. However, language models are known to display bias and fairness issues, making them impractical to deploy for many use cases. Here we focus on how fairness issues affect automatically generated test content, which can have stringent requirements to ensure the test measures only what it was intended to measure. Specifically, we review test content generated for a large-scale standardized English proficiency test with the goal of identifying content that pertains only to a certain subset of the test population, as well as content that has the potential to be upsetting or distracting to some test takers. Issues like these could inadvertently affect a test taker's score and thus should be avoided. This kind of content does not reflect the more commonly acknowledged biases, making it challenging to detect even for modern models that contain safeguards. We build a dataset of 601 generated texts annotated for fairness and explore a variety of classification methods: fine-tuning, topic-based classification, and prompting, including few-shot and self-correcting prompts. We find that combining prompt self-correction and few-shot learning performs best, yielding an F1 score of 0.79 on our held-out test set, while much smaller BERT- and topic-based models achieve competitive performance on out-of-domain data.
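
The best-performing combination, few-shot prompting with a self-correction pass, can be sketched as two chained calls to a text generator. Here `generate` is a hypothetical stand-in for an LLM call, and the few-shot examples and prompt wording are illustrative, not the paper's:

```python
# Sketch: few-shot classification with a self-correction pass.
# `generate(prompt)` is a hypothetical LLM call returning a string;
# the few-shot examples and wording below are illustrative only.

FEW_SHOT = """Label each passage as FAIR or UNFAIR for use on a standardized test.

Passage: A recipe describes how to braise short ribs in red wine.
Label: UNFAIR (assumes familiarity with alcohol; may distract some test takers)

Passage: A paragraph explains how rainbows form after rain.
Label: FAIR
"""

def classify(passage: str, generate) -> str:
    # First pass: few-shot classification.
    first = generate(FEW_SHOT + f"\nPassage: {passage}\nLabel:")
    # Second pass: ask the model to re-examine its own answer.
    review = generate(
        f"Passage: {passage}\n"
        f"Initial label: {first}\n"
        "Re-check this label, considering content that only some test takers "
        "would know or that could be upsetting. Answer FAIR or UNFAIR."
    )
    return "UNFAIR" if "UNFAIR" in review.upper() else "FAIR"
```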


Distributionally Robust Safe Screening

arXiv.org Machine Learning

In this study, we propose a method, Distributionally Robust Safe Screening (DRSS), for identifying unnecessary samples and features within a distributionally robust (DR) covariate shift setting. This method combines DR learning, a paradigm aimed at enhancing model robustness against variations in data distribution, with safe screening (SS), a sparse optimization technique designed to identify irrelevant samples and features prior to model training. The core idea of DRSS is to reformulate the DR covariate-shift problem as a weighted empirical risk minimization problem, where the weights are subject to uncertainty within a predetermined range. By extending the SS technique to accommodate this weight uncertainty, DRSS can reliably identify unnecessary samples and features under any future distribution within a specified range. We provide a theoretical guarantee for the DRSS method and validate its performance through numerical experiments on both synthetic and real-world datasets.
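
The reformulation at the core of DRSS can be written schematically as weighted empirical risk minimization with the weights confined to an uncertainty set; a sample can then be screened out only if it is provably irrelevant for every admissible weight vector. This is a generic statement of the setup, not the paper's specific screening rule:

```latex
% Weighted ERM with uncertain weights (schematic).
% w^0: nominal weights; \delta: radius of the uncertainty set;
% \Omega: regularizer with strength \rho. Screening must hold
% uniformly over all w in \mathcal{W}.
\hat{\theta}(w) \in \arg\min_{\theta}\;
\sum_{i=1}^{n} w_i\, \ell\bigl(f_\theta(x_i), y_i\bigr) + \rho\, \Omega(\theta),
\qquad
w \in \mathcal{W} = \bigl\{\, w \ge 0 : \lVert w - w^0 \rVert \le \delta \,\bigr\}
```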


GENTLE: A Genre-Diverse Multilayer Challenge Set for English NLP and Linguistic Evaluation

arXiv.org Artificial Intelligence

We present GENTLE, a new mixed-genre English challenge corpus totaling 17K tokens and consisting of 8 unusual text types for out-of-domain evaluation: dictionary entries, esports commentaries, legal documents, medical notes, poetry, mathematical proofs, syllabuses, and threat letters. GENTLE is manually annotated for a variety of popular NLP tasks, including syntactic dependency parsing, entity recognition, coreference resolution, and discourse parsing. We evaluate state-of-the-art NLP systems on GENTLE and find severe performance degradation on all tasks for at least some genres, indicating GENTLE's utility as an evaluation dataset for NLP systems.


What's Hard in English RST Parsing? Predictive Models for Error Analysis

arXiv.org Artificial Intelligence

Despite recent advances in Natural Language Processing (NLP), hierarchical discourse parsing in the framework of Rhetorical Structure Theory remains challenging, and our understanding of the reasons for this is as yet limited. In this paper, we examine and model some of the factors associated with parsing difficulty identified in previous work: the existence of implicit discourse relations, challenges in identifying long-distance relations, out-of-vocabulary items, and more. In order to assess the relative importance of these variables, we also release two annotated English test sets with explicit correct and distracting discourse markers associated with gold-standard RST relations. Our results show that, as in shallow discourse parsing, the explicit/implicit distinction plays a role, but that long-distance dependencies are the main challenge, while lack of lexical overlap is less of a problem, at least for in-domain parsing. Our final model is able to predict where errors will occur with an accuracy of 76.3% for the bottom-up parser and 76.6% for the top-down parser.
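
A predictive model of this kind can be sketched as a binary classifier over relation-level features such as span distance, explicit-marker presence, and lexical overlap. The features and data below are illustrative placeholders, not the paper's released test sets:

```python
# Sketch: predicting where an RST parser will err from relation-level features.
# Feature choices and toy data are illustrative; the paper's resources differ.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
n = 500
X = np.column_stack([
    rng.integers(1, 40, n),  # distance between the relation's units (in EDUs)
    rng.integers(0, 2, n),   # explicit discourse marker present?
    rng.random(n),           # lexical overlap between the two spans
])
# Toy rule: long-distance, implicit relations are more error-prone.
y = ((X[:, 0] > 15) & (X[:, 1] == 0)).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = LogisticRegression().fit(X_tr, y_tr)
print(f"error-prediction accuracy: {accuracy_score(y_te, clf.predict(X_te)):.3f}")
```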