AITopics

2410.11767

Country:

Europe > Latvia > Lubāna Municipality > Lubāna (0.04)
Asia > Japan > Honshū > Tōhoku > Iwate Prefecture > Morioka (0.04)
Pacific Ocean > North Pacific Ocean > San Francisco Bay > Golden Gate (0.04)
(2 more...)

Genre: Research Report > New Finding (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Is it the end of (generative) linguistics as we know it?

Chesi, Cristiano

A significant debate has emerged in response to a paper written by Steven Piantadosi (Piantadosi, 2023) and uploaded to the LingBuzz platform, the open archive for generative linguistics. Piantadosi's dismissal of Chomsky's approach is ruthless, but generative linguists deserve it. In this paper, I will adopt three idealized perspectives -- computational, theoretical, and experimental -- to focus on two fundamental issues that lend partial support to Piantadosi's critique: (a) the evidence challenging the Poverty of Stimulus (PoS) hypothesis and (b) the notion of simplicity as conceived within mainstream Minimalism. In conclusion, I argue that, to reclaim a central role in language studies, generative linguistics -- representing a prototypical theoretical perspective on language -- needs a serious update leading to (i) more precise, consistent, and complete formalizations of foundational intuitions and (ii) the establishment and utilization of a standardized dataset of crucial empirical evidence to evaluate the theory's adequacy. On the other hand, ignoring the formal perspective leads to major drawbacks in both computational and experimental approaches. Neither descriptive nor explanatory adequacy can be easily achieved without the precise formulation of general principles that can be challenged empirically.

chomsky, large language model, machine learning, (23 more...)

2412.12797

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
North America > United States > Massachusetts (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(5 more...)

Genre: Research Report (1.00)

Industry: Education (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
(3 more...)

Speak & Improve Challenge 2025: Tasks and Baseline Systems

Qian, Mengjie, Knill, Kate, Banno, Stefano, Tang, Siyuan, Karanasou, Penny, Gales, Mark J. F., Nicholls, Diane

This paper presents the "Speak & Improve Challenge 2025: Spoken Language Assessment and Feedback" -- a challenge associated with the ISCA SLaTE 2025 Workshop. The goal of the challenge is to advance research on spoken language assessment and feedback, with tasks associated with both the underlying technology and language learning feedback. Linked with the challenge, the Speak & Improve (S&I) Corpus 2025 is being pre-released, a dataset of L2 learner English data with holistic scores and language error annotation, collected from open (spontaneous) speaking tests on the Speak & Improve learning platform. The corpus consists of approximately 315 hours of audio data from second language English learners with holistic scores, and a 55-hour subset with manual transcriptions and error labels. The Challenge has four shared tasks: Automatic Speech Recognition (ASR), Spoken Language Assessment (SLA), Spoken Grammatical Error Correction (SGEC), and Spoken Grammatical Error Correction Feedback (SGECF). Each of these tasks has a closed track where a predetermined set of models and data sources are allowed to be used, and an open track where any public resource may be used. Challenge participants may do one or more of the tasks. This paper describes the challenge, the S&I Corpus 2025, and the baseline systems released for the Challenge.

artificial intelligence, natural language, transcription, (11 more...)

2412.11985

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.28)
Europe > Bulgaria > Sofia City Province > Sofia (0.04)
Asia > Singapore (0.04)

Genre: Research Report (0.64)

Industry: Education (0.90)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.73)

Alekseev, Anton, Tillabaeva, Alina, Kabaeva, Gulnara Dzh., Nikolenko, Sergey I.

Syntactic Transfer to Kyrgyz Using the Treebank Translation Method

The Kyrgyz language, as a low-resource language, requires significant effort to create high-quality syntactic corpora. This study proposes an approach to simplify the development process of a syntactic corpus for Kyrgyz. We present a tool for transferring syntactic annotations from Turkish to Kyrgyz based on a treebank translation method. The effectiveness of the proposed tool was evaluated using the TueCL treebank. The results demonstrate that this approach achieves higher syntactic annotation accuracy compared to a monolingual model trained on the Kyrgyz KTMU treebank. Additionally, the study introduces a method for assessing the complexity of manual annotation for the resulting syntactic trees, contributing to further optimization of the annotation process.

large language model, machine learning, natural language, (17 more...)

2412.13146

Country:

Asia > Russia (0.05)
Europe > Russia > Northwestern Federal District > Leningrad Oblast > Saint Petersburg (0.04)
Asia > Kyrgyzstan > Chüy Region > Bishkek (0.04)
(8 more...)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)

Goto, Takumi, Vasselli, Justin, Watanabe, Taro

Improving Explainability of Sentence-level Metrics via Edit-level Attribution for Grammatical Error Correction

Various evaluation metrics have been proposed for Grammatical Error Correction (GEC), but many, particularly reference-free metrics, lack explainability. This lack of explainability hinders researchers from analyzing the strengths and weaknesses of GEC models and limits the ability to provide detailed feedback for users. To address this issue, we propose attributing sentence-level scores to individual edits, providing insight into how specific corrections contribute to the overall performance. For the attribution method, we use Shapley values, from cooperative game theory, to compute the contribution of each edit. Experiments with existing sentence-level metrics demonstrate high consistency across different edit granularities and show approximately 70\% alignment with human evaluations. In addition, we analyze biases in the metrics based on the attribution results, revealing trends such as the tendency to ignore orthographic edits. Our implementation is available at \url{https://github.com/naist-nlp/gec-attribute}.

computational linguistic, large language model, natural language, (16 more...)

2412.1311

Country:

North America > United States > Maryland > Baltimore (0.04)
North America > Mexico > Mexico City > Mexico City (0.04)
North America > United States > Washington > King County > Seattle (0.04)
(10 more...)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.63)
Information Technology > Data Science > Data Quality > Data Cleaning (0.63)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.47)

Knill, Kate, Nicholls, Diane, Gales, Mark J. F., Qian, Mengjie, Stroinski, Pawel

Speak & Improve Corpus 2025: an L2 English Speech Corpus for Language Assessment and Feedback

We introduce the Speak & Improve Corpus 2025, a dataset of L2 learner English data with holistic scores and language error annotation, collected from open (spontaneous) speaking tests on the Speak & Improve learning platform. The aim of the corpus release is to address a major challenge to developing L2 spoken language processing systems, the lack of publicly available data with high-quality annotations. It is being made available for non-commercial use on the ELiT website. In designing this corpus we have sought to make it cover a wide-range of speaker attributes, from their L1 to their speaking ability, as well as providing manual annotations. This enables a range of language-learning tasks to be examined, such as assessing speaking proficiency or providing feedback on grammatical errors in a learner's speech. Additionally the data supports research into the underlying technology required for these tasks including automatic speech recognition (ASR) of low resource L2 learner English, disfluency detection or spoken grammatical error correction (GEC). The corpus consists of around 315 hours of L2 English learners audio with holistic scores, and a subset of audio annotated with transcriptions and error labels.

artificial intelligence, corpus, natural language, (13 more...)

2412.11986

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.30)
Europe > Middle East > Malta > Eastern Region > Northern Harbour District > St. Julian's (0.04)
Europe > Bulgaria > Sofia City Province > Sofia (0.04)

Genre: Research Report (1.00)

Industry: Education (0.90)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.70)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.58)

NAVCON: A Cognitively Inspired and Linguistically Grounded Corpus for Vision and Language Navigation

Wanchoo, Karan, Zuo, Xiaoye, Gonzalez, Hannah, Dan, Soham, Georgakis, Georgios, Roth, Dan, Daniilidis, Kostas, Miltsakaki, Eleni

We present NAVCON, a large-scale annotated Vision-Language Navigation (VLN) corpus built on top of two popular datasets (R2R and RxR). The paper introduces four core, cognitively motivated and linguistically grounded, navigation concepts and an algorithm for generating large-scale silver annotations of naturally occurring linguistic realizations of these concepts in navigation instructions. We pair the annotated instructions with video clips of an agent acting on these instructions. NAVCON contains 236, 316 concept annotations for approximately 30, 0000 instructions and 2.7 million aligned images (from approximately 19, 000 instructions) showing what the agent sees when executing an instruction. To our knowledge, this is the first comprehensive resource of navigation concepts. We evaluated the quality of the silver annotations by conducting human evaluation studies on NAVCON samples. As further validation of the quality and usefulness of the resource, we trained a model for detecting navigation concepts and their linguistic realizations in unseen instructions. Additionally, we show that few-shot learning with GPT-4o performs well on this task using large-scale silver annotations of NAVCON.

large language model, machine learning, natural language, (20 more...)

2412.13026

Country:

North America > United States > California (0.04)
North America > United States > Pennsylvania (0.04)
North America > United States > Indiana > Madison County > Anderson (0.04)

Genre: Research Report (0.50)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Robots (0.94)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.46)

arXiv.org Artificial IntelligenceDec-16-2024

DocFusion: A Unified Framework for Document Parsing Tasks

Chai, Mingxu, Shen, Ziyu, Zhang, Chong, Zhang, Yue, Wang, Xiao, Dou, Shihan, Kang, Jihua, Zhang, Jiazheng, Zhang, Qi

Document parsing is essential for analyzing complex document structures and extracting fine-grained information, supporting numerous downstream applications. However, existing methods often require integrating multiple independent models to handle various parsing tasks, leading to high complexity and maintenance overhead. To address this, we propose DocFusion, a lightweight generative model with only 0.28B parameters. It unifies task representations and achieves collaborative training through an improved objective function. Experiments reveal and leverage the mutually beneficial interaction among recognition tasks, and integrating recognition data significantly enhances detection performance. The final results demonstrate that DocFusion achieves state-of-the-art (SOTA) performance across four key tasks.

docfusion, machine learning, natural language, (18 more...)

2412.12505

Country:

Europe > Russia (0.04)
Asia > Russia (0.04)
Asia > China > Shanghai > Shanghai (0.04)

Genre: Research Report > New Finding (0.35)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.84)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Shayegh, Behzad, Lee, Hobie H. -B., Zhu, Xiaodan, Cheung, Jackie Chi Kit, Mou, Lili

Error Diversity Matters: An Error-Resistant Ensemble Method for Unsupervised Dependency Parsing

arXiv.org Artificial IntelligenceDec-16-2024

We address unsupervised dependency parsing by building an ensemble of diverse existing models through post hoc aggregation of their output dependency parse structures. We observe that these ensembles often suffer from low robustness against weak ensemble components due to error accumulation. To tackle this problem, we propose an efficient ensemble-selection approach that avoids error accumulation. Results demonstrate that our approach outperforms each individual model as well as previous ensemble techniques. Additionally, our experiments show that the proposed ensemble-selection method significantly enhances the performance and robustness of our ensemble, surpassing previously proposed strategies, which have not accounted for error diversity.

artificial intelligence, machine learning, natural language, (18 more...)

2412.11543

Country:

North America > Canada > Alberta (0.14)
North America > United States > New York > New York County > New York City (0.04)
North America > Canada > Quebec > Montreal (0.04)
(2 more...)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

arXiv.org Artificial IntelligenceDec-15-2024

Attention with Dependency Parsing Augmentation for Fine-Grained Attribution

Ding, Qiang, Luo, Lvzhou, Cao, Yixuan, Luo, Ping

To assist humans in efficiently validating RAG-generated content, developing a fine-grained attribution mechanism that provides supporting evidence from retrieved documents for every answer span is essential. Existing fine-grained attribution methods rely on model-internal similarity metrics between responses and documents, such as saliency scores and hidden state similarity. However, these approaches suffer from either high computational complexity or coarse-grained representations. Additionally, a common problem shared by the previous works is their reliance on decoder-only Transformers, limiting their ability to incorporate contextual information after the target span. To address the above problems, we propose two techniques applicable to all model-internals-based methods. First, we aggregate token-wise evidence through set union operations, preserving the granularity of representations. Second, we enhance the attributor by integrating dependency parsing to enrich the semantic completeness of target spans. For practical implementation, our approach employs attention weights as the similarity metric. Experimental results demonstrate that the proposed method consistently outperforms all prior works.

attribution, large language model, machine learning, (18 more...)

2412.11404

Country:

Asia > China > Beijing > Beijing (0.04)
Asia > China > Guangdong Province > Shenzhen (0.04)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.86)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.73)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.30)