AITopics | punctuation

ArbESC+: Arabic Enhanced Edit Selection System Combination for Grammatical Error Correction Resolving conflict and improving system combination in Arabic GEC

Grammatical Error Correction (GEC) is an important task in natural language processing. Arabic has a complex morphological and syntactic structure, which makes GEC more challenging than in many other languages. Although neural models have improved greatly in recent years, most previous work used individual models without considering the potential benefits of combining different systems. In this paper, we present one of the first multi-system approaches for correcting grammatical errors in Arabic, the Arabic Enhanced Edit Selection System Combination (ArbESC+). Several models are used to collect correction proposals, which the framework represents as numerical features. A classifier then selects and applies the appropriate corrections based on these features. To improve output quality, the framework uses supporting techniques to filter overlapping corrections and estimate decision reliability. A combination of AraT5, ByT5, mT5, AraBART, AraBART+Morph+GEC, and text-editing systems gave better results than any single model alone, with F0.5 scores of 82.63% on QALB-14 test data, 84.64% on QALB-15 L1 data, and 65.55% on QALB-15 L2 data. A central contribution of this work is that it is among the first attempts to integrate multiple grammatical error correction systems for Arabic. Improving on existing models in this way is a practical step towards advanced tools that will benefit users and researchers of Arabic text processing.
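The F0.5 scores quoted above are F-beta scores with beta = 0.5, a setting that weights precision more heavily than recall. The sketch below shows only the standard formula; the function name `f_beta` and the example values are illustrative assumptions and are not taken from the paper.

```python
def f_beta(precision: float, recall: float, beta: float = 0.5) -> float:
    """Return the F-beta score; beta < 1 favours precision over recall."""
    if precision < 0 or recall < 0:
        raise ValueError("precision and recall must be non-negative")
    b2 = beta * beta
    denom = b2 * precision + recall
    if denom == 0:
        # Both precision and recall are zero: define the score as 0.
        return 0.0
    return (1 + b2) * precision * recall / denom


# Illustrative values only (hypothetical, not results from the paper):
# a precise but low-recall system scores higher under F0.5 than the reverse.
high_precision = f_beta(precision=0.8, recall=0.4)
high_recall = f_beta(precision=0.4, recall=0.8)
```

With beta = 0.5, a proposed correction that is wrong costs more than a correction that is missed, which is the usual reason GEC evaluations report F0.5 rather than F1.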

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2511.1423

Country: Asia > Middle East > Saudi Arabia (0.28)

Genre:

Research Report > New Finding (1.00)
Overview (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.67)

Punctuation-aware treebank tree binarization

Klinger, Eitan, Wadhwa, Vivaan, Park, Jungyeul

arXiv.org Artificial Intelligence | Oct-14-2025

This article presents a curated resource and evaluation suite for punctuation-aware treebank binarization. Standard binarization pipelines drop punctuation before head selection, which alters constituent shape and harms head-child identification. We release (1) a reproducible pipeline that preserves punctuation as sibling nodes prior to binarization, (2) derived artifacts and metadata (intermediate @X markers, reversibility signatures, alignment indices), and (3) an accompanying evaluation suite covering head-child prediction, round-trip reversibility, and structural compatibility with derivational resources (CCGbank). On the Penn Treebank, punctuation-aware preprocessing improves head prediction accuracy from 73.66% (Collins rules) and 86.66% (MLP) to 91.85% with the same classifier, and achieves competitive alignment against CCGbank derivations. All code, configuration files, and documentation are released to enable replication and extension to other corpora.
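To make the idea of keeping punctuation as sibling nodes and checking round-trip reversibility concrete, here is a minimal sketch. It is not the authors' released pipeline: it uses plain right-branching binarization with no head rules, represents trees as hypothetical `(label, [children])` tuples with string leaves, and assumes no original label starts with `@`.

```python
def binarize(tree):
    """Right-branching binarization that keeps every child, punctuation included."""
    if isinstance(tree, str):  # leaf token
        return tree
    label, children = tree
    kids = [binarize(child) for child in children]
    marker = "@" + label  # intermediate-node label, in the spirit of @X markers
    while len(kids) > 2:
        # Fold the two rightmost children under one intermediate node.
        kids = kids[:-2] + [(marker, kids[-2:])]
    return (label, kids)


def unbinarize(tree):
    """Invert binarize() by splicing out nodes whose label starts with '@'."""
    if isinstance(tree, str):
        return tree
    label, children = tree
    flat = []
    for child in children:
        child = unbinarize(child)
        if not isinstance(child, str) and child[0].startswith("@"):
            flat.extend(child[1])  # lift the intermediate node's children
        else:
            flat.append(child)
    return (label, flat)


# Hypothetical example: the comma and period stay in the tree as siblings.
example = ("S", [("NP", ["Kim"]), (",", [","]), ("VP", ["left"]), (".", ["."])])
binary = binarize(example)
round_trip_ok = unbinarize(binary) == example
```

Because punctuation is never dropped, the binarized tree carries enough information to restore the original n-ary tree exactly, which is the property a round-trip reversibility check tests.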

artificial intelligence, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2510.10951

Country:

North America > United States > Pennsylvania (0.15)
North America > Canada > British Columbia (0.14)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)
Information Technology > Artificial Intelligence > Machine Learning (0.68)

Appendices 1 All codes, data, and instructions for our C

Neural Information Processing Systems | Oct-9-2025, 22:47:25 GMT

We plan to expand the study to a larger scale in future work. "Please extract as many components as possible from the provided images. Only provide the component names, separated by commas." We treat objects and their attributes (if found) as options for the questions. "These sentences describe the differences between the two images."

annotation interface, annotator, dataset, (15 more...)

Neural Information Processing Systems

Country: North America > United States > California (0.04)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.97)

9a9f4e15ad0d680429a3e0570a96f763-Paper-Conference.pdf

Neural Information Processing Systems | Oct-9-2025, 02:28:47 GMT

artificial intelligence, machine learning, natural language, (19 more...)

Neural Information Processing Systems

Country:

Asia > China > Guangdong Province > Shenzhen (0.04)
Asia > China > Jiangsu Province > Nanjing (0.04)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.70)

From Canonical to Complex: Benchmarking LLM Capabilities in Undergraduate Thermodynamics

Geißler, Anna, Bien, Luca-Sophie, Schöppler, Friedrich, Hertel, Tobias

arXiv.org Artificial Intelligence | Sep-1-2025

Large language models (LLMs) are increasingly considered as tutoring aids in science education. Yet their readiness for unsupervised use in undergraduate instruction remains uncertain, as reliable teaching requires more than fluent recall: it demands consistent, principle-grounded reasoning. Thermodynamics, with its compact laws and subtle distinctions between state and path functions, reversibility, and entropy, provides an ideal testbed for evaluating such capabilities. Here we present UTQA, a 50-item undergraduate thermodynamics question answering benchmark, covering ideal-gas processes, reversibility, and diagram interpretation. No leading 2025-era model exceeded our 95% competence threshold: the best LLMs achieved 82% accuracy, with text-only items performing better than image reasoning tasks, which often fell to chance levels. Prompt phrasing and syntactic complexity showed modest to little correlation with performance. The gap concentrates in finite-rate/irreversible scenarios and in binding visual features to thermodynamic meaning, indicating that current LLMs are not yet suitable for unsupervised tutoring in this domain.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2508.21452

Country:

North America > United States (0.14)
Europe > Germany (0.14)

Genre: Research Report (0.51)

Industry: Education (0.66)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.96)

Punctuation and Predicates in Language Models

Chauhan, Sonakshi, Chaudhary, Maheep, Choy, Koby, Nellessen, Samuel, Schoots, Nandi

arXiv.org Artificial Intelligence | Aug-21-2025

In this paper we explore where information is collected and how it is propagated through the layers of large language models (LLMs). We begin by examining the surprising computational importance of punctuation tokens, which previous work has identified as attention sinks and memory aids. Using intervention-based techniques, we evaluate the necessity and sufficiency (for preserving model performance) of punctuation tokens across layers in GPT-2, DeepSeek, and Gemma. Our results show stark model-specific differences: for GPT-2, punctuation is both necessary and sufficient in multiple layers, while this holds far less in DeepSeek and not at all in Gemma. Extending beyond punctuation, we ask whether LLMs process different components of input (e.g., subjects, adjectives, punctuation, full sentences) by forming early static summaries that are reused across the network, or whether the model remains sensitive to changes in these components across layers. We also investigate whether different reasoning rules are processed differently by LLMs. In particular, through interchange intervention and layer-swapping experiments, we find that conditional statements (if, then) and universal quantification (for all) are processed very differently. Our findings offer new insight into the internal mechanisms of punctuation usage and reasoning in LLMs and have implications for interpretability.

intervention, large language model, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2508.14067

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Whispering Context: Distilling Syntax and Semantics for Long Speech Transcripts

Altinok, Duygu

arXiv.org Artificial Intelligence | Aug-20-2025

ASR systems often struggle to maintain syntactic and semantic accuracy in long audio transcripts, which affects tasks such as Named Entity Recognition (NER), capitalization, and punctuation. We propose a novel approach that enhances ASR by distilling contextual knowledge from LLaMA models into Whisper. Our method uses two strategies: (1) token-level distillation with optimal transport to align dimensions and sequence lengths, and (2) representation loss minimization between sentence embeddings of Whisper and LLaMA, blending syntax and semantics. Evaluations on the Spoken Wikipedia dataset, a benchmark with long audio recordings and rich entities, demonstrate significant improvements in Word Error Rate (WER), NER, capitalization, and punctuation success. By introducing novel NER metrics and exploring semantics-aware ASR, our work highlights the value of integrating linguistic context into transcription, setting a foundation for robust, context-aware ASR in long-form speech.

distillation, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2508.13376

Country: North America > United States (1.00)

Genre: Research Report (1.00)

Industry:

Law (0.68)
Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.48)

emoji-development-face-tears-joy-book-keith-houston.html?via=rss

Slate | Jul-27-2025, 15:00:00 GMT

A couple of years ago, I frequently found myself driving past a roadside ice cream stand under construction. For weeks, the roof of this stand, a gigantic white swirl of fiberglass soft serve, sat on the ground next to the structure, waiting to be lowered onto the finished, cone-shaped building with a crane. I know what it was supposed to represent, but every time I glimpsed it, my instinctive first thought was There's a giant poop emoji. Keith Houston's history of emoji, Face With Tears of Joy, argues that emoji have "become so ubiquitous in our writing, so quotidian, that we should be talking about them in the same breath as grammar or punctuation." I don't know about grammar, which seems as fundamental to language, spoken and written, as words themselves.

artificial intelligence, emoji, houston, (15 more...)

Slate

Country: Asia > Japan (0.15)

Industry: Government (0.48)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence (0.97)
