AITopics

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.59)
Information Technology > Artificial Intelligence > Machine Learning (0.39)

Neural Information Processing SystemsFeb-17-2026, 18:02:47 GMT

Supplementary Materials

During the inference, we set the background score as 0.95.

artificial intelligence, group token, machine learning, (16 more...)

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Shahriar, Sadat, Ayoobi, Navid, Mukherjee, Arjun, Musharrat, Mostafa, Vamsi, Sai Vishnu

Exposing Pink Slime Journalism: Linguistic Signatures and Robust Detection Against LLM-Generated Threats

arXiv.org Artificial IntelligenceDec-8-2025

The local news landscape, a vital source of reliable information for 28 million Americans, faces a growing threat from Pink Slime Journalism, a low-quality, auto-generated articles that mimic legitimate local reporting. Detecting these deceptive articles requires a fine-grained analysis of their linguistic, stylistic, and lexical characteristics. In this work, we conduct a comprehensive study to uncover the distinguishing patterns of Pink Slime content and propose detection strategies based on these insights. Beyond traditional generation methods, we highlight a new adversarial vector: modifications through large language models (LLMs). Our findings reveal that even consumer-accessible LLMs can significantly undermine existing detection systems, reducing their performance by up to 40% in F1-score. To counter this threat, we introduce a robust learning framework specifically designed to resist LLM-based adversarial attacks and adapt to the evolving landscape of automated pink slime journalism, and showed and improvement by up to 27%.

large language model, machine learning, natural language, (16 more...)

2512.05331

Country: North America > United States > Texas > Harris County > Houston (0.14)

Genre: Research Report > New Finding (1.00)

Industry: Media > News (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Neural Information Processing SystemsOct-9-2025, 10:38:57 GMT

Supplementary Materials

During the inference, we set the background score as 0.95.

artificial intelligence, group token, machine learning, (16 more...)

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Alkhuzaey, Samah, Grasso, Floriana, Payne, Terry R., Tamma, Valentina

Evaluating the Fitness of Ontologies for the Task of Question Generation

arXiv.org Artificial IntelligenceAug-28-2025

Ontology-based question generation is an important application of semantic-aware systems that enables the creation of large question banks for diverse learning environments. The effectiveness of these systems, both in terms of the calibre and cognitive difficulty of the resulting questions, depends heavily on the quality and modelling approach of the underlying ontologies, making it crucial to assess their fitness for this task. To date, there has been no comprehensive investigation into the specific ontology aspects or characteristics that affect the question generation process. Therefore, this paper proposes a set of requirements and task-specific metrics for evaluating the fitness of ontologies for question generation tasks in pedagogical settings. Using the ROMEO methodology (a structured framework used for identifying task-specific metrics), a set of evaluation metrics have been derived from an expert assessment of questions generated by a question generation model. To validate the proposed metrics, we apply them to a set of ontologies previously used in question generation to illustrate how the metric scores align with and complement findings reported in earlier studies. The analysis confirms that ontology characteristics significantly impact the effectiveness of question generation, with different ontologies exhibiting varying performance levels. This highlights the importance of assessing ontology quality with respect to Automatic Question Generation (AQG) tasks.

artificial intelligence, natural language, ontology, (13 more...)

2504.07994

Country: Europe (0.28)

Genre:

Overview (0.88)
Research Report > New Finding (0.46)

Industry: Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Ontologies (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)

Abuhani, Diaa Addeen, Seccaroni, Marco, Mazzarello, Martina, Zualkernan, Imran, Duarte, Fabio, Ratti, Carlo

Unsupervised Urban Tree Biodiversity Mapping from Street-Level Imagery Using Spatially-Aware Visual Clustering

arXiv.org Artificial IntelligenceAug-26-2025

Urban tree biodiversity is critical for climate resilience, ecological stability, and livability in cities, yet most municipalities lack detailed knowledge of their canopies. Field-based inventories provide reliable estimates of Shannon and Simpson diversity but are costly and time-consuming, while supervised AI methods require labeled data that often fail to generalize across regions. We introduce an unsupervised clustering framework that integrates visual embeddings from street-level imagery with spatial planting patterns to estimate biodiversity without labels. Applied to eight North American cities, the method recovers genus-level diversity patterns with high fidelity, achieving low Wasserstein distances to ground truth for Shannon and Simpson indices and preserving spatial autocorrelation. This scalable, fine-grained approach enables biodiversity mapping in cities lacking detailed inventories and offers a pathway for continuous, low-cost monitoring to support equitable access to greenery and adaptive management of urban ecosystems.

artificial intelligence, data mining, machine learning, (19 more...)

2508.13814

Country:

North America > United States > California > San Francisco County > San Francisco (0.15)
North America > United States > California > Los Angeles County > Los Angeles (0.15)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.89)
Information Technology > Data Science > Data Mining (0.89)

arXiv.org Artificial IntelligenceAug-21-2025

Multi-scale species richness estimation with deep learning

Boussange, Victor, Wuyts, Bert, Brun, Philipp, Malle, Johanna T., Midolo, Gabriele, Portier, Jeanne, Sanchez, Théophile, Zimmermann, Niklaus E., Axmanová, Irena, Bruelheide, Helge, Chytrý, Milan, Kambach, Stephan, Lososová, Zdeňka, Večeřa, Martin, Biurrun, Idoia, Ecker, Klaus T., Lenoir, Jonathan, Svenning, Jens-Christian, Karger, Dirk Nikolaus

Biodiversity assessments are critically affected by the spatial scale at which species richness is measured. How species richness accumulates with sampling area depends on natural and anthropogenic processes whose effects can change depending on the spatial scale considered. These accumulation dynamics, described by the species-area relationship (SAR), are challenging to assess because most biodiversity surveys are restricted to sampling areas much smaller than the scales at which these processes operate. Here, we combine sampling theory and deep learning to predict local species richness within arbitrarily large sampling areas, enabling for the first time to estimate spatial differences in SARs. We demonstrate our approach by predicting vascular plant species richness across Europe and evaluate predictions against an independent dataset of plant community inventories. The resulting model, named deep SAR, delivers multi-scale species richness maps, improving coarse grain richness estimates by 32% compared to conventional methods, while delivering finer grain estimates. Additional to its predictive capabilities, we show how our deep SAR model can provide fundamental insights on the multi-scale effects of key biodiversity processes. The capacity of our approach to deliver comprehensive species richness estimates across the full spectrum of ecologically relevant scales is essential for robust biodiversity assessments and forecasts under global change.

artificial intelligence, machine learning, species richness, (19 more...)

2507.06358

Country:

Europe > Switzerland (0.28)
North America > United States (0.28)
Europe > Czechia (0.28)
Europe > Spain (0.28)

Genre: Research Report > New Finding (0.67)

Industry: Health & Medicine (0.87)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Neural Information Processing SystemsMay-27-2025, 11:42:41 GMT

emg2qwerty: A Large Dataset with Baselines for Touch Typing using Surface Electromyography

baseline, surface electromyography, touch typing, (2 more...)

Industry:

Health & Medicine > Therapeutic Area > Neurology (0.64)
Health & Medicine > Diagnostic Medicine > Imaging (0.64)

Technology: Information Technology > Artificial Intelligence (0.62)

Tong, William L., Pehlevan, Cengiz

Learning richness modulates equality reasoning in neural networks

arXiv.org Artificial IntelligenceMar-12-2025

Equality reasoning is ubiquitous and purely abstract: sameness or difference may be evaluated no matter the nature of the underlying objects. As a result, same-different tasks (SD) have been extensively studied as a starting point for understanding abstract reasoning in humans and across animal species. With the rise of neural networks (NN) that exhibit striking apparent proficiency for abstractions, equality reasoning in NNs has also gained interest. Yet despite extensive study, conclusions about equality reasoning vary widely and with little consensus. To clarify the underlying principles in learning SD, we develop a theory of equality reasoning in multi-layer perceptrons (MLP). Following observations in comparative psychology, we propose a spectrum of behavior that ranges from conceptual to perceptual outcomes. Conceptual behavior is characterized by task-specific representations, efficient learning, and insensitivity to spurious perceptual details. Perceptual behavior is characterized by strong sensitivity to spurious perceptual details, accompanied by the need for exhaustive training to learn the task. We develop a mathematical theory to show that an MLP's behavior is driven by learning richness. Rich-regime MLPs exhibit conceptual behavior, whereas lazy-regime MLPs exhibit perceptual behavior. We validate our theoretical findings in vision SD experiments, showing that rich feature learning promotes success by encouraging hallmarks of conceptual behavior. Overall, our work identifies feature learning richness as a key parameter modulating equality reasoning, and suggests that equality reasoning in humans and animals may similarly depend on learning richness in neural circuits.

accuracy, representation, training symbol, (17 more...)

2503.09781

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > New York (0.04)

Genre: Research Report (0.82)

Industry: Health & Medicine (0.45)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Dang, Vu Minh Hoang, Verma, Rakesh M.

Autoencoder-Based Framework to Capture Vocabulary Quality in NLP

arXiv.org Artificial IntelligenceFeb-28-2025

Linguistic richness is essential for advancing natural language processing (NLP), as dataset characteristics often directly influence model performance. However, traditional metrics such as Type-Token Ratio (TTR), Vocabulary Diversity (VOCD), and Measure of Lexical Text Diversity (MTLD) do not adequately capture contextual relationships, semantic richness, and structural complexity. In this paper, we introduce an autoencoder-based framework that uses neural network capacity as a proxy for vocabulary richness, diversity, and complexity, enabling a dynamic assessment of the interplay between vocabulary size, sentence structure, and contextual depth. We validate our approach on two distinct datasets: the DIFrauD dataset, which spans multiple domains of deceptive and fraudulent text, and the Project Gutenberg dataset, representing diverse languages, genres, and historical periods. Experimental results highlight the robustness and adaptability of our method, offering practical guidance for dataset curation and NLP model design. By enhancing traditional vocabulary evaluation, our work fosters the development of more context-aware, linguistically adaptive NLP systems.

complexity, dataset, diversity, (16 more...)

2503.00209

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
North America > United States > Texas > Harris County > Houston (0.04)
Europe > France (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)