AITopics | english text

Collaborating Authors

english text

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Can QE-informed (Re)Translation lead to Error Correction?

Padmanabhan, Govardhan

arXiv.org Artificial IntelligenceNov-19-2025

The paper presents two approaches submitted to the WMT 2025 Automated Translation Quality Evaluation Systems Task 3 - Quality Estimation (QE)-informed Segment-level Error Correction. While jointly training QE systems with Automatic Post-Editing (APE) has shown improved performance for both tasks, APE systems are still known to overcorrect the output of Machine Translation (MT), leading to a degradation in performance. We investigate a simple training-free approach - QE-informed Retranslation, and compare it with another within the same training-free paradigm. Our winning approach selects the highest-quality translation from multiple candidates generated by different LLMs. The second approach, more akin to APE, instructs an LLM to replace error substrings as specified in the provided QE explanation(s). A conditional heuristic was employed to minimise the number of edits, with the aim of maximising the Gain-to-Edit ratio. The two proposed approaches achieved a Delta COMET score of 0.0201 and -0.0108, respectively, leading the first approach to achieve the winning position on the subtask leaderboard.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2511.13884

Country:

North America > United States (0.29)
Europe > United Kingdom (0.28)
Asia (0.28)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Add feedback

Be My Donor. Transfer the NLP Datasets Between the Languages Using LLM

Popov, Dmitrii, Terentev, Egor, Buyanov, Igor

arXiv.org Artificial IntelligenceOct-17-2024

In this work, we investigated how one can use the LLM to transfer the dataset and its annotation from one language to another. This is crucial since sharing the knowledge between different languages could boost certain underresourced directions in the target language, saving lots of efforts in data annotation or quick prototyping. We experiment with English and Russian pairs translating the DEFT corpus. This corpus contains three layers of annotation dedicated to term-definition pair mining, which is a rare annotation type for Russian. We provide a pipeline for the annotation transferring using ChatGPT3.5-turbo and Llama-3.1-8b as core LLMs. In the end, we train the BERT-based models on the translated dataset to establish a baseline.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2410.14074

Country:

Europe > Russia > Central Federal District > Moscow Oblast > Moscow (0.04)
Asia > Russia (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

An Open-Source American Sign Language Fingerspell Recognition and Semantic Pose Retrieval Interface

Thomas, Kevin Jose

arXiv.org Artificial IntelligenceAug-17-2024

This paper introduces an open-source interface for American Sign Language fingerspell recognition and semantic pose retrieval, aimed to serve as a stepping stone towards more advanced sign language translation systems. Utilizing a combination of convolutional neural networks and pose estimation models, the interface provides two modular components: a recognition module for translating ASL fingerspelling into spoken English and a production module for converting spoken English into ASL pose sequences. The system is designed to be highly accessible, user-friendly, and capable of functioning in real-time under varying environmental conditions like backgrounds, lighting, skin tones, and hand sizes. We discuss the technical details of the model architecture, application in the wild, as well as potential future enhancements for real-world consumer applications.

application, interface, sign language, (16 more...)

arXiv.org Artificial Intelligence

2408.09311

Country:

South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Burnaby (0.04)
Europe > Finland > Pirkanmaa > Tampere (0.04)

Genre: Research Report (0.50)

Industry: Education > Curriculum > Subject-Specific Education (0.93)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

CUDRT: Benchmarking the Detection of Human vs. Large Language Models Generated Texts

Tao, Zhen, Li, Zhiyu, Xi, Dinghao, Xu, Wei

arXiv.org Artificial IntelligenceJun-13-2024

The proliferation of large language models (LLMs) has significantly enhanced text generation capabilities across various industries. However, these models' ability to generate human-like text poses substantial challenges in discerning between human and AI authorship. Despite the effectiveness of existing AI-generated text detectors, their development is hindered by the lack of comprehensive, publicly available benchmarks. Current benchmarks are limited to specific scenarios, such as question answering and text polishing, and predominantly focus on English texts, failing to capture the diverse applications and linguistic nuances of LLMs. To address these limitations, this paper constructs a comprehensive bilingual benchmark in both Chinese and English to evaluate mainstream AI-generated text detectors. We categorize LLM text generation into five distinct operations: Create, Update, Delete, Rewrite, and Translate (CUDRT), encompassing all current LLMs activities. We also establish a robust benchmark evaluation framework to support scalable and reproducible experiments. For each CUDRT category, we have developed extensive datasets to thoroughly assess detector performance. By employing the latest mainstream LLMs specific to each language, our datasets provide a thorough evaluation environment. Extensive experimental results offer critical insights for optimizing AI-generated text detectors and suggest future research directions to improve detection accuracy and generalizability across various scenarios.

average 0, llm, opération, (17 more...)

arXiv.org Artificial Intelligence

2406.09056

Country:

Asia > China > Shanghai > Shanghai (0.04)
North America > United States (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre:

Overview (1.00)
Research Report > New Finding (0.92)

Industry:

Health & Medicine (0.92)
Information Technology > Security & Privacy (0.45)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Emotion Classification in Short English Texts using Deep Learning Techniques

Bhat, Siddhanth

arXiv.org Artificial IntelligenceMar-10-2024

Detecting emotions in limited text datasets from under-resourced languages presents a formidable obstacle, demanding specialized frameworks and computational strategies. This study conducts a thorough examination of deep learning techniques for discerning emotions in short English texts. Deep learning approaches employ transfer learning and word embedding, notably BERT, to attain superior accuracy. To evaluate these methods, we introduce the "SmallEnglishEmotions" dataset, comprising 6372 varied short English texts annotated with five primary emotion categories. Our experiments reveal that transfer learning and BERT-based text embedding outperform alternative methods in accurately categorizing the text in the dataset.

dataset, english text, smallenglishemotion dataset, (9 more...)

arXiv.org Artificial Intelligence

2402.16034

Country:

Asia > India > Rajasthan > Jaipur (0.05)
North America > United States > New Mexico > Bernalillo County > Albuquerque (0.04)
Europe > Spain > Aragón (0.04)

Genre: Research Report (0.50)

Industry: Health & Medicine (0.47)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Cross-corpus Readability Compatibility Assessment for English Texts

Li, Zhenzhen, Ding, Han, Zhang, Shaohong

arXiv.org Artificial IntelligenceSep-13-2023

Text readability assessment has gained significant attention from researchers in various domains. However, the lack of exploration into corpus compatibility poses a challenge as different research groups utilize different corpora. In this study, we propose a novel evaluation framework, Cross-corpus text Readability Compatibility Assessment (CRCA), to address this issue. The framework encompasses three key components: (1) Corpus: CEFR, CLEC, CLOTH, NES, OSP, and RACE. Linguistic features, GloVe word vector representations, and their fusion features were extracted. (2) Classification models: Machine learning methods (XGBoost, SVM) and deep learning methods (BiLSTM, Attention-BiLSTM) were employed. (3) Compatibility metrics: RJSD, RRNSS, and NDCG metrics. Our findings revealed: (1) Validated corpus compatibility, with OSP standing out as significantly different from other datasets. (2) An adaptation effect among corpora, feature representations, and classification methods. (3) Consistent outcomes across the three metrics, validating the robustness of the compatibility assessment framework. The outcomes of this study offer valuable insights into corpus selection, feature representation, and classification methods, and it can also serve as a beginning effort for cross-corpus transfer learning.

cross-corpus readability compatibility assessment, english text

arXiv.org Artificial Intelligence

2306.09704

Genre: Research Report > New Finding (0.53)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.53)

Add feedback

Discourse Representation Structure Parsing for Chinese

Wang, Chunliu, Zhang, Xiao, Bos, Johan

arXiv.org Artificial IntelligenceJun-16-2023

Previous work has predominantly focused on monolingual English semantic parsing. We, instead, explore the feasibility of Chinese semantic parsing in the absence of labeled data for Chinese meaning representations. We describe the pipeline of automatically collecting the linearized Chinese meaning representation data for sequential-to sequential neural networks. We further propose a test suite designed explicitly for Chinese semantic parsing, which provides fine-grained evaluation for parsing performance, where we aim to study Chinese parsing difficulties. Our experimental results show that the difficulty of Chinese semantic parsing is mainly caused by adverbs. Realizing Chinese parsing through machine translation and an English parser yields slightly lower performance than training a model directly on Chinese data.

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2306.09725

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Africa > Middle East > Egypt > Giza Governorate > Giza (0.05)
Europe > Sweden > Vaestra Goetaland > Gothenburg (0.05)
(17 more...)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Add feedback

Crossword: A Semantic Approach to Data Compression via Masking

Li, Mingxiao, Jin, Rui, Xiang, Liyao, Shen, Kaiming, Cui, Shuguang

arXiv.org Artificial IntelligenceApr-3-2023

The traditional methods for data compression are typically based on the symbol-level statistics, with the information source modeled as a long sequence of i.i.d. random variables or a stochastic process, thus establishing the fundamental limit as entropy for lossless compression and as mutual information for lossy compression. However, the source (including text, music, and speech) in the real world is often statistically ill-defined because of its close connection to human perception, and thus the model-driven approach can be quite suboptimal. This study places careful emphasis on English text and exploits its semantic aspect to enhance the compression efficiency further. The main idea stems from the puzzle crossword, observing that the hidden words can still be precisely reconstructed so long as some key letters are provided. The proposed masking-based strategy resembles the above game. In a nutshell, the encoder evaluates the semantic importance of each word according to the semantic loss and then masks the minor ones, while the decoder aims to recover the masked words from the semantic context by means of the Transformer. Our experiments show that the proposed semantic approach can achieve much higher compression efficiency than the traditional methods such as Huffman code and UTF-8 code, while preserving the meaning in the target text to a great extent.

artificial intelligence, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2304.01106

Country:

Asia > China > Shanghai > Shanghai (0.04)
Asia > China > Hong Kong (0.04)
Asia > China > Guangdong Province > Shenzhen (0.04)

Genre: Research Report (0.51)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.94)
Information Technology > Information Management (0.88)

Add feedback

ChatGPT maker OpenAI releases 'not fully reliable' tool to detect AI generated content

The GuardianFeb-1-2023, 03:58:08 GMT

OpenAI, the research laboratory behind AI program ChatGPT, has released a tool designed to detect whether text has been written by artificial intelligence, but warns it's not completely reliable – yet. In a blog post on Tuesday, OpenAI linked to a new classifier tool that has been trained to distinguish between text written by a human and that written by a variety of AI, not just ChatGPT. Open AI researchers said that while it was "impossible to reliably detect all AI-written text", good classifiers could pick up signs that text was written by AI. The tool could be useful in cases where AI was used for "academic dishonesty" and when AI chatbots were positioned as humans, they said. But they admited the classifier "is not fully reliable" and only correctly identified 26% of AI-written English texts.

large language model, machine learning, natural language, (11 more...)

The Guardian

Country:

Oceania > Australia (0.20)
Europe > United Kingdom (0.18)

Industry: Education (0.40)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.91)

Add feedback

Predicting Lexical Complexity in English Texts: The Complex 2.0 Dataset

Shardlow, Matthew, Evans, Richard, Zampieri, Marcos

arXiv.org Artificial IntelligenceNov-3-2022

Identifying words which may cause difficulty for a reader is an essential step in most lexical text simplification systems prior to lexical substitution and can also be used for assessing the readability of a text. This task is commonly referred to as Complex Word Identification (CWI) and is often modelled as a supervised classification problem. For training such systems, annotated datasets in which words and sometimes multi-word expressions are labelled regarding complexity are required. In this paper we analyze previous work carried out in this task and investigate the properties of CWI datasets for English. We develop a protocol for the annotation of lexical complexity and use this to annotate a new dataset, CompLex 2.0. We present experiments using both new and old datasets to investigate the nature of lexical complexity. We found that a Likert-scale annotation protocol provides an objective setting that is superior for identifying the complexity of words compared to a binary annotation protocol. We release a new dataset using our new protocol to promote the task of Lexical Complexity Prediction.

annotation, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

doi: 10.1007/s10579-022-09588-2

2102.08773

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > Maryland > Baltimore (0.14)
North America > United States > California > San Diego County > San Diego (0.05)
(22 more...)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine > Therapeutic Area (0.46)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
(2 more...)

Add feedback