Dictionary


Is the Dictionary Done For?

The New Yorker

The print edition of Merriam-Webster was once a touchstone of authority and stability. Then the internet brought about a revolution. Wars over words are inevitably culture wars, and debates over the dictionary have raged for as long as it has existed. Once, every middle-class home had a piano and a dictionary. The purpose of the piano was to let the family hear music at home before phonographs were available and affordable. Later on, it was to torture young persons by insisting that they learn to do something few people do well. The purpose of the dictionary was to settle intra-family disputes over the spelling of words like "camaraderie" and "sesquipedalian," or over the correct pronunciation of "puttee." This was the state of the world not that long ago. In the late nineteen-eighties, Merriam-Webster's Collegiate Dictionary was on the best-seller list for a hundred and fifty-five consecutive weeks. Fifty-seven million copies were sold, a number believed to be second only, in this country, to sales of the Bible. There was good money in the word business.


Shona spaCy: A Morphological Analyzer for an Under-Resourced Bantu Language

Masoka, Happymore

arXiv.org Artificial Intelligence

Despite rapid advances in multilingual natural language processing (NLP), the Bantu language Shona remains under-served in terms of morphological analysis and language-aware tools. This paper presents Shona spaCy, an open-source, rule-based morphological pipeline for Shona built on the spaCy framework. The system combines a curated JSON lexicon with linguistically grounded rules to model noun-class prefixes (Mupanda 1-18), verbal subject concords, tense-aspect markers, ideophones, and clitics, integrating these into token-level annotations for lemma, part-of-speech, and morphological features. The toolkit is available via pip install shona-spacy, with source code at https://github.com/HappymoreMasoka/shona-spacy and a PyPI release at https://pypi.org/project/shona-spacy/0.1.4/. Evaluation on formal and informal Shona corpora yields 90% POS-tagging accuracy and 88% morphological-feature accuracy, while maintaining transparency in its linguistic decisions. By bridging descriptive grammar and computational implementation, Shona spaCy advances NLP accessibility and digital inclusion for Shona speakers and provides a template for morphological analysis tools for other under-resourced Bantu languages.
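For readers who want to try the toolkit, a minimal usage sketch follows. The component name shona_morph and the example sentence are assumptions, since the abstract documents only the pip package; consult the linked README for the actual API.

```python
# Minimal sketch, assuming shona-spacy registers a spaCy pipeline component.
import spacy
import shona_spacy  # assumed to register the custom component on import

# blank multi-language pipeline; the package may also register Shona ("sn")
nlp = spacy.blank("xx")
nlp.add_pipe("shona_morph")  # component name assumed; see the project README

doc = nlp("Vana vanodzidza chiShona")
for token in doc:
    # token-level annotations the paper describes: lemma, POS, morph features
    print(token.text, token.lemma_, token.pos_, token.morph)
```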


Do you know your 2025 lingo? As 'parasocial' is named word of the year, take the test to see if you can keep up with this year's trending language

Daily Mail - Science & tech



InnovatorBench: Evaluating Agents' Ability to Conduct Innovative LLM Research

Wu, Yunze, Fu, Dayuan, Si, Weiye, Huang, Zhen, Jiang, Mohan, Li, Keyu, Xia, Shijie, Sun, Jie, Xu, Tianze, Hu, Xiangkun, Lu, Pengrui, Cai, Xiaojie, Ye, Lyumanshan, Zhu, Wenhong, Xiao, Yang, Liu, Pengfei

arXiv.org Artificial Intelligence

AI agents could accelerate scientific discovery by automating hypothesis formation, experiment design, coding, execution, and analysis, yet existing benchmarks probe narrow skills in simplified settings. To address this gap, we introduce InnovatorBench, a benchmark-platform pair for realistic, end-to-end assessment of agents performing Large Language Model (LLM) research. It comprises 20 tasks spanning Data Construction, Filtering, Augmentation, Loss Design, Reward Design, and Scaffold Construction, each requiring runnable artifacts that are assessed for correctness, performance, output quality, and uncertainty. To support agent operation, we develop ResearchGym, a research environment offering rich action spaces, distributed and long-horizon execution, asynchronous monitoring, and snapshot saving. We also implement a lightweight ReAct agent that couples explicit reasoning with executable planning, using frontier models such as Claude-4, GPT-5, GLM-4.5, and Kimi-K2. Our experiments demonstrate that while frontier models show promise in code-driven research tasks, they struggle with fragile algorithm-related tasks and long-horizon decision making, exhibiting impatience, poor resource management, and overreliance on template-based reasoning. Furthermore, agents require over 11 hours to achieve their best performance on InnovatorBench, underscoring the benchmark's difficulty and its potential to serve as the next generation of code-based research benchmarks.
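As a rough illustration of the agent design (not the paper's ResearchGym implementation; the llm and env interfaces below are assumptions), a ReAct-style loop alternates model-written thoughts with executed actions:

```python
# Illustrative ReAct loop: explicit reasoning interleaved with executable steps.
def react_loop(llm, env, task, max_steps=50):
    """llm: callable prompt -> text; env: object with execute(action) -> str."""
    history = [f"Task: {task}"]
    for _ in range(max_steps):
        # the model emits free text of the form "Thought: ... Action: ..."
        reply = llm("\n".join(history))
        _thought, _, action = reply.partition("Action:")
        history.append(reply)
        if action.strip().startswith("finish"):
            break
        # run the proposed command or code in the research environment
        observation = env.execute(action.strip())
        history.append(f"Observation: {observation}")
    return history
```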


Vision-Enabled LLMs in Historical Lexicography: Digitising and Enriching Estonian-German Dictionaries from the 17th and 18th Centuries

Jürviste, Madis, Jakobson, Joonatan

arXiv.org Artificial Intelligence

This article presents research conducted at the Institute of the Estonian Language between 2022 and 2025 on the application of large language models (LLMs) to the study of 17th and 18th century Estonian dictionaries. The authors address three main areas: enriching historical dictionaries with modern word forms and meanings; using vision-enabled LLMs to perform text recognition on sources printed in Gothic script (Fraktur); and preparing for the creation of a unified, cross-source dataset. Initial experiments with J. Gutslaff's 1648 dictionary indicate that LLMs have significant potential for semi-automatic enrichment of dictionary information. When provided with sufficient context, Claude 3.7 Sonnet accurately provided meanings and modern equivalents for 81% of headword entries. In a text recognition experiment with A. T. Helle's 1732 dictionary, a zero-shot method successfully identified and structured 41% of headword entries into error-free JSON-formatted output. For digitising the Estonian-German dictionary section of A. W. Hupel's 1780 grammar, overlapping tiling of scanned image files is employed, with one LLM performing text recognition and a second merging the structured output. These findings demonstrate that even for minor languages, LLMs have significant potential for saving time and financial resources.
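A schematic of the tiling pipeline described above may help. Tile height, overlap, and the two LLM callables are assumptions, as the abstract does not specify them:

```python
# Sketch of an overlapping-tiling digitisation pipeline, under assumed parameters.
from PIL import Image

def tile_page(path, tile_h=1200, overlap=200):
    """Split a scanned page into vertically overlapping tiles."""
    page = Image.open(path)
    w, h = page.size
    tiles, top = [], 0
    while top < h:
        tiles.append(page.crop((0, top, w, min(top + tile_h, h))))
        top += tile_h - overlap  # overlap keeps headword entries from being cut in half
    return tiles

def digitise(path, recognise, merge):
    # `recognise`: vision-LLM call returning structured JSON for one tile
    # `merge`: second LLM call reconciling the overlapping partial outputs
    partial = [recognise(tile) for tile in tile_page(path)]
    return merge(partial)
```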


SenWiCh: Sense-Annotation of Low-Resource Languages for WiC using Hybrid Methods

Goworek, Roksana, Karlcut, Harpal, Shezad, Muhammad, Darshana, Nijaguna, Mane, Abhishek, Bondada, Syam, Sikka, Raghav, Mammadov, Ulvi, Allahverdiyev, Rauf, Purighella, Sriram, Gupta, Paridhi, Ndegwa, Muhinyia, Dubossarsky, Haim

arXiv.org Artificial Intelligence

This paper addresses the critical need for high-quality evaluation datasets in low-resource languages to advance cross-lingual transfer. While cross-lingual transfer offers a key strategy for leveraging multilingual pretraining to expand language technologies to understudied and typologically diverse languages, its effectiveness depends on high-quality, suitable benchmarks. We release new sense-annotated datasets of sentences containing polysemous words, spanning ten low-resource languages across diverse language families and scripts. To facilitate dataset creation, the paper presents a demonstrably beneficial semi-automatic annotation method. The utility of the datasets is demonstrated through Word-in-Context (WiC) formatted experiments that evaluate transfer on these low-resource languages. Results highlight the importance of targeted dataset creation and evaluation for effective polysemy disambiguation in low-resource settings and transfer studies. The released datasets and code aim to support further research into fair, robust, and truly multilingual NLP.
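For context, a Word-in-Context instance pairs two sentences containing the same polysemous lemma with a binary same-sense label. The record below is invented for illustration and does not come from the released datasets:

```python
# Illustrative WiC-format instance (field names and example are assumptions).
wic_example = {
    "lemma": "bank",
    "sentence1": "She sat on the bank of the river.",
    "sentence2": "He deposited the cheque at the bank.",
    # binary label: do the two occurrences share the same sense?
    "label": False,
}
```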


Radiological and Biological Dictionary of Radiomics Features: Addressing Understandable AI Issues in Personalized Breast Cancer; Dictionary Version BM1.0

Gorji, Arman, Sanati, Nima, Pouria, Amir Hossein, Mehrnia, Somayeh Sadat, Hacihaliloglu, Ilker, Rahmim, Arman, Salmanpour, Mohammad R.

arXiv.org Artificial Intelligence

Radiomics-based AI models show promise for breast cancer diagnosis but often lack interpretability, limiting clinical adoption. This study addresses the gap between radiomic features (RF) and the standardized BI-RADS lexicon by proposing a dual-dictionary framework. First, a Clinically-Informed Feature Interpretation Dictionary (CIFID) was created by mapping 56 RFs to BI-RADS descriptors (shape, margin, internal enhancement) through literature and expert review. The framework was applied to classify triple-negative breast cancer (TNBC) versus non-TNBC using dynamic contrast-enhanced MRI from a multi-institutional cohort of 1,549 patients. We trained 27 machine learning classifiers with 27 feature selection methods. SHapley Additive exPlanations (SHAP) were used to interpret predictions and generate a complementary Data-Driven Feature Interpretation Dictionary (DDFID) for 52 additional RFs. The best model, combining Variance Inflation Factor (VIF) selection with Extra Trees Classifier, achieved an average cross-validation accuracy of 0.83. Key predictive RFs aligned with clinical knowledge: higher Sphericity (round/oval shape) and lower Busyness (more homogeneous enhancement) were associated with TNBC. The framework confirmed known imaging biomarkers and uncovered novel, interpretable associations. This dual-dictionary approach (BM1.0) enhances AI model transparency and supports the integration of RFs into routine breast cancer diagnosis and personalized care.
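A minimal sketch of the best-performing pipeline named above (VIF-based feature selection followed by an Extra Trees classifier) is given below. The VIF threshold, cross-validation settings, and data loading are assumptions, not the study's protocol:

```python
# Hedged sketch: VIF feature selection + Extra Trees, as named in the abstract.
import numpy as np
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import cross_val_score

def vif_select(X: pd.DataFrame, threshold: float = 10.0) -> pd.DataFrame:
    """Iteratively drop the feature with the highest VIF until all fall below threshold."""
    X = X.copy()
    while X.shape[1] > 1:
        vifs = [variance_inflation_factor(X.values, i) for i in range(X.shape[1])]
        worst = int(np.argmax(vifs))
        if vifs[worst] < threshold:
            break
        X = X.drop(columns=[X.columns[worst]])
    return X

# X: one row of radiomic features per lesion; y: TNBC (1) vs. non-TNBC (0)
# X_sel = vif_select(X)
# clf = ExtraTreesClassifier(n_estimators=500, random_state=0)
# print(cross_val_score(clf, X_sel, y, cv=5).mean())  # study reports ~0.83
```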


From Dictionary to Tensor: A Scalable Multi-View Subspace Clustering Framework with Triple Information Enhancement

Neural Information Processing Systems

While Tensor-based Multi-view Subspace Clustering (TMSC) has garnered significant attention for its capacity to effectively capture high-order correlations among multiple views, three notable limitations in current TMSC methods necessitate consideration: 1) high computational complexity and reliance on dictionary completeness resulting from using observed data as the dictionary, 2) inaccurate subspace representation stemming from the oversight of local geometric information, and 3) under-penalization of noise-related singular values within tensor data caused by treating all singular values equally. To address these limitations, the proposed framework introduces three corresponding enhancements. First, an enhanced anchor dictionary learning mechanism is utilized to recover the low-rank anchor structure, resulting in reduced computational complexity and increased resilience, especially in scenarios with inadequate dictionaries. Additionally, we introduce an anchor hypergraph Laplacian regularizer to preserve the inherent geometry of the data within the subspace representation. Finally, an improved hyperbolic tangent function is employed as a precise approximation for tensor rank, effectively capturing the significant variations in singular values. Extensive experimentation on a variety of datasets demonstrates that our approach surpasses SOTA methods in both effectiveness and efficiency.
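The abstract does not give the exact form of the improved hyperbolic tangent surrogate; a standard tanh-type rank approximation of the kind it gestures at, written for singular values sigma_i of a tensor X with a scale parameter gamma, is:

```latex
\operatorname{rank}(\mathcal{X}) \;\approx\; \sum_{i} \tanh\!\left(\frac{\sigma_i(\mathcal{X})}{\gamma}\right), \qquad \gamma > 0.
```

Because tanh saturates at 1 for sigma much larger than gamma and is nearly linear near zero, large (signal) singular values are counted fully while small, noise-related ones contribute proportionally less, rather than all singular values being penalized equally as under the nuclear norm.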


Inaccuracy of an E-Dictionary and Its Influence on Chinese Language Users

Wang, Xi, Meng, Fanfei, Zhang, Shiyang, Li, Lan

arXiv.org Artificial Intelligence

Electronic dictionaries have largely replaced paper dictionaries and become central tools for L2 learners seeking to expand their vocabulary. Users often assume these resources are reliable and rarely question the validity of the definitions provided. The accuracy of major E-dictionaries is seldom scrutinized, and little attention has been paid to how their corpora are constructed. Research on dictionary use, particularly the limitations of electronic dictionaries, remains scarce. This study adopts a combined method of experimentation, user survey, and dictionary critique to examine Youdao, one of the most widely used E-dictionaries in China. The experiment involved a translation task paired with retrospective reflection. Participants were asked to translate sentences containing words that are insufficiently or inaccurately defined in Youdao. Their consultation behavior was recorded to analyze how faulty definitions influenced comprehension. Results show that incomplete or misleading definitions can cause serious misunderstandings. Additionally, students exhibited problematic consultation habits. The study further explores how such flawed definitions originate, highlighting issues in data processing and the integration of AI and machine learning technologies in dictionary construction. The findings suggest a need for better training in dictionary literacy for users, as well as improvements in the underlying AI models used to build E-dictionaries.


GEAR: A Simple GENERATE, EMBED, AVERAGE AND RANK Approach for Unsupervised Reverse Dictionary

Almeman, Fatemah, Espinosa-Anke, Luis

arXiv.org Artificial Intelligence

Reverse Dictionary (RD) is the task of obtaining the most relevant word or set of words given a textual description or dictionary definition. Effective RD methods have applications in accessibility, translation, and writing-support systems. Moreover, in NLP research, RD is used to benchmark text encoders at various granularities, as it often requires word, definition, and sentence embeddings. In this paper, we propose a simple approach to RD that leverages LLMs in combination with embedding models. Despite its simplicity, this approach outperforms supervised baselines on well-studied RD datasets while also showing less over-fitting. We also conduct a number of experiments on different dictionaries and analyze how different styles, registers, and target audiences impact the quality of RD systems. We conclude that, on average, untuned embeddings alone fare well below an LLM-only baseline (although they are competitive in highly technical dictionaries) but are crucial for boosting performance in combined methods.
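A rough sketch of the recipe the title names (Generate, Embed, Average, Rank) follows; the llm and embed interfaces are assumptions for illustration, not the paper's code:

```python
# Hedged sketch of a GENERATE-EMBED-AVERAGE-RANK reverse-dictionary pipeline.
import numpy as np

def gear(definition, llm, embed, vocabulary, k=10):
    """llm: prompt -> list[str]; embed: word -> np.ndarray; vocabulary: list[str]."""
    # GENERATE: ask an LLM for candidate words matching the definition
    candidates = llm(f"List words meaning: {definition}")
    # EMBED: embed each generated candidate with an embedding model
    vectors = np.stack([embed(w) for w in candidates])
    # AVERAGE: a single query vector summarising the candidates
    query = vectors.mean(axis=0)
    # RANK: score every vocabulary word by cosine similarity to the query
    def cos(u, v):
        return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))
    return sorted(vocabulary, key=lambda w: cos(embed(w), query), reverse=True)[:k]
```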