zenodo
- Asia > Middle East > Syria (0.04)
- North America > United States > Oregon (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Law (1.00)
- Government (1.00)
Policy Cards: Machine-Readable Runtime Governance for Autonomous AI Agents
Policy Cards are introduced as a machine-readable, deployment-layer standard for expressing operational, regulatory, and ethical constraints for AI agents. A Policy Card accompanies the deployed agent, specifying what it must and must not do and enabling it to follow the required constraints at runtime; it thereby becomes an integral part of the deployed agent. Policy Cards extend existing transparency artifacts such as Model, Data, and System Cards by defining a normative layer that encodes allow/deny rules, obligations, evidentiary requirements, and crosswalk mappings to assurance frameworks including the NIST AI RMF, ISO/IEC 42001, and the EU AI Act. Each Policy Card can be validated automatically, version-controlled, and linked to runtime enforcement or continuous-audit pipelines. The framework enables verifiable compliance for autonomous agents, forming a foundation for distributed assurance in multi-agent ecosystems. Policy Cards thus provide a practical mechanism for integrating high-level governance with hands-on engineering practice and for enabling accountable autonomy at scale.
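A minimal sketch of what a Policy Card might contain, assuming a JSON-style structure with a deny-by-default runtime check; all field names and values below are illustrative assumptions, not the published schema.

```python
# Hypothetical sketch of a Policy Card as structured data; field names are
# illustrative assumptions, not the standard itself.
from typing import Any

policy_card: dict[str, Any] = {
    "card_version": "0.1.0",
    "agent_id": "example-support-agent",
    "allow": ["read:customer_tickets", "write:draft_replies"],
    "deny": ["write:payments", "access:raw_pii"],
    "obligations": [
        {"action": "write:draft_replies", "requires": "human_review"},
    ],
    "evidence": ["runtime_audit_log", "signed_decision_trace"],
    "crosswalk": {"NIST_AI_RMF": ["GOVERN-1.1"], "EU_AI_Act": ["Art. 9"]},
}

REQUIRED_KEYS = {"card_version", "agent_id", "allow", "deny", "obligations"}

def validate(card: dict[str, Any]) -> list[str]:
    """Return a list of validation errors (empty means the card passes)."""
    errors = [f"missing key: {k}" for k in REQUIRED_KEYS - card.keys()]
    if set(card.get("allow", [])) & set(card.get("deny", [])):
        errors.append("an action appears in both allow and deny")
    return errors

def is_permitted(card: dict[str, Any], action: str) -> bool:
    """Deny-by-default runtime check against the card's allow/deny rules."""
    return action not in card["deny"] and action in card["allow"]

if __name__ == "__main__":
    assert validate(policy_card) == []
    assert is_permitted(policy_card, "read:customer_tickets")
    assert not is_permitted(policy_card, "write:payments")
```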
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > Maryland > Montgomery County > Gaithersburg (0.04)
- Law (1.00)
- Information Technology > Security & Privacy (1.00)
- Health & Medicine (1.00)
- Government (1.00)
AutoSciDACT: Automated Scientific Discovery through Contrastive Embedding and Hypothesis Testing
Bright-Thonney, Samuel, Reissel, Christina, Grosso, Gaia, Woodward, Nathaniel, Govorkova, Katya, Novak, Andrzej, Park, Sang Eon, Moreno, Eric, Harris, Philip
Novelty detection in large scientific datasets faces two key challenges: the noisy, high-dimensional nature of experimental data, and the necessity of making statistically robust statements about any observed outliers. While there is a wealth of literature on anomaly detection via dimensionality reduction, most methods do not produce outputs compatible with quantifiable claims of scientific discovery. In this work we directly address these challenges, presenting the first step towards a unified pipeline for novelty detection adapted to the rigorous statistical demands of science. We introduce AutoSciDACT (Automated Scientific Discovery with Anomalous Contrastive Testing), a general-purpose pipeline for detecting novelty in scientific data. AutoSciDACT begins by creating expressive low-dimensional data representations through contrastive pre-training, leveraging the abundance of high-quality simulated data in many scientific domains alongside domain expertise that can guide principled data augmentation strategies. These compact embeddings then enable an extremely sensitive machine learning-based two-sample test using the New Physics Learning Machine (NPLM) framework, which identifies and statistically quantifies deviations in observed data relative to a reference distribution (null hypothesis). We perform experiments across a range of astronomical, physical, biological, image, and synthetic datasets, demonstrating strong sensitivity to small injections of anomalous data across all domains.
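A hedged sketch of the testing stage described above, using a generic classifier-based two-sample test with a permutation p-value as a stand-in for the NPLM framework; the embeddings, sample sizes, and injected anomaly fraction are synthetic placeholders.

```python
# Illustrative sketch (not the authors' code): a classifier-based two-sample
# test on low-dimensional embeddings, comparing observed data against a
# reference (null) sample. NPLM fits a learned likelihood ratio; a simple
# permutation test stands in for it here.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

def two_sample_pvalue(reference: np.ndarray, observed: np.ndarray,
                      n_permutations: int = 100) -> float:
    """Permutation p-value for 'observed differs from reference'."""
    X = np.vstack([reference, observed])
    y = np.r_[np.zeros(len(reference)), np.ones(len(observed))]

    def score(labels: np.ndarray) -> float:
        clf = LogisticRegression(max_iter=1000)
        return cross_val_score(clf, X, labels, cv=3, scoring="roc_auc").mean()

    observed_score = score(y)
    null_scores = [score(rng.permutation(y)) for _ in range(n_permutations)]
    return (1.0 + sum(s >= observed_score for s in null_scores)) / (1.0 + n_permutations)

# Toy example: reference embeddings vs. observed embeddings with a small
# injected shift, standing in for a contrastively learned embedding space.
reference = rng.normal(0.0, 1.0, size=(500, 8))
observed = np.vstack([rng.normal(0.0, 1.0, size=(450, 8)),
                      rng.normal(2.0, 1.0, size=(50, 8))])  # 10% anomalies
print(f"p-value ~ {two_sample_pvalue(reference, observed):.3f}")
```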
- Oceania > New Zealand (0.04)
- Oceania > Cook Islands (0.04)
- North America > United States > Wisconsin > Dane County > Madison (0.04)
- (5 more...)
- Health & Medicine (1.00)
- Education > Curriculum > Subject-Specific Education (0.34)
- Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Scientific Discovery (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.67)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)
Multilingual Clinical NER for Diseases and Medications Recognition in Cardiology Texts using BERT Embeddings
Danu, Manuela Daniela, Marica, George, Suciu, Constantin, Itu, Lucian Mihai, Farri, Oladimeji
The rapidly increasing volume of electronic health record (EHR) data underscores a pressing need to unlock biomedical knowledge from unstructured clinical texts to support advancements in data-driven clinical systems, including patient diagnosis, disease progression monitoring, treatment effect assessment, and prediction of future clinical events. While contextualized language models have demonstrated impressive performance improvements for named entity recognition (NER) systems in English corpora, there remains a scarcity of research focused on clinical texts in low-resource languages. To bridge this gap, our study aims to develop multiple deep contextual embedding models to enhance clinical NER in the cardiology domain, as part of the BioASQ MultiCardioNER shared task. We explore the effectiveness of different monolingual and multilingual BERT-based models, trained on general domain text, for extracting disease and medication mentions from clinical case reports written in English, Spanish, and Italian. We achieved an F1-score of 77.88% on Spanish Diseases Recognition (SDR), 92.09% on Spanish Medications Recognition (SMR), 91.74% on English Medications Recognition (EMR), and 88.9% on Italian Medications Recognition (IMR). These results outperform the mean and median F1 scores in the test leaderboard across all subtasks, with the mean/median values being: 69.61%/75.66% for SDR, 81.22%/90.18% for SMR, 89.2%/88.96% for EMR, and 82.8%/87.76% for IMR.
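An illustrative sketch of BERT-based token classification via the Hugging Face transformers pipeline; the checkpoint named below is assumed to be a generic public multilingual NER model and is only a stand-in, not one of the fine-tuned shared-task systems described above.

```python
# Illustrative sketch (not the authors' fine-tuned models): running a
# BERT-based token-classification model over a clinical sentence with the
# Hugging Face `transformers` pipeline. The checkpoint below is a generic
# multilingual NER model used as a stand-in; the shared-task systems were
# trained specifically for disease and medication mentions.
from transformers import pipeline

ner = pipeline(
    "token-classification",
    model="Davlan/bert-base-multilingual-cased-ner-hrl",  # assumed public checkpoint
    aggregation_strategy="simple",  # merge word pieces into whole entities
)

text = "El paciente con insuficiencia cardíaca recibió bisoprolol y furosemida."
for entity in ner(text):
    print(entity["word"], entity["entity_group"], round(entity["score"], 3))
```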
- North America > United States > California > San Francisco County > San Francisco (0.14)
- Europe > Romania > Centru Development Region > Brașov County > Brașov (0.04)
- North America > United States > New Jersey > Mercer County > Princeton (0.04)
- (3 more...)
- Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
The German Commons - 154 Billion Tokens of Openly Licensed Text for German Language Models
Gienapp, Lukas, Schröder, Christopher, Schweter, Stefan, Akiki, Christopher, Schlatt, Ferdinand, Zimmermann, Arden, Genêt, Phillipe, Potthast, Martin
Large language model development relies on large-scale training corpora, yet most contain data of unclear licensing status, limiting the development of truly open models. This problem is exacerbated for non-English languages, where openly licensed text remains critically scarce. We introduce the German Commons, the largest collection of openly licensed German text to date. It compiles data from 41 sources across seven domains, encompassing legal, scientific, cultural, political, news, economic, and web text. Through systematic sourcing from established data providers with verifiable licensing, it yields 154.56 billion tokens of high-quality text for language model training. Our processing pipeline implements comprehensive quality filtering, deduplication, and text formatting fixes, ensuring consistent quality across heterogeneous text sources. All domain subsets feature licenses of at least CC-BY-SA 4.0 or equivalent, ensuring legal compliance for model training and redistribution. The German Commons therefore addresses the critical gap in openly licensed German pretraining data, and enables the development of truly open German language models. We also release code for corpus construction and data filtering tailored to German language text, rendering the German Commons fully reproducible and extensible.
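A minimal sketch of the kind of deduplication and quality filtering such a pipeline might apply; the hashing scheme and heuristics below are assumptions for illustration, not the released corpus-construction code.

```python
# Minimal sketch (assumptions, not the released pipeline): exact-duplicate
# removal via content hashing plus two simple quality heuristics of the kind
# a corpus-cleaning pipeline might apply before language-model training.
import hashlib
import re

def normalize(text: str) -> str:
    """Collapse whitespace so trivially reformatted copies hash identically."""
    return re.sub(r"\s+", " ", text).strip().lower()

def passes_quality(text: str, min_words: int = 20, max_symbol_ratio: float = 0.3) -> bool:
    """Reject very short documents and documents dominated by non-letter symbols."""
    if len(text.split()) < min_words:
        return False
    letters = sum(ch.isalpha() or ch.isspace() for ch in text)
    return 1.0 - letters / max(len(text), 1) <= max_symbol_ratio

def deduplicate_and_filter(documents: list[str]) -> list[str]:
    """Keep the first occurrence of each document that passes the quality checks."""
    seen: set[str] = set()
    kept: list[str] = []
    for doc in documents:
        digest = hashlib.sha256(normalize(doc).encode("utf-8")).hexdigest()
        if digest not in seen and passes_quality(doc):
            seen.add(digest)
            kept.append(doc)
    return kept
```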
- Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.28)
- Europe > Switzerland > Zürich > Zürich (0.14)
- Europe > Austria > Vienna (0.14)
- (10 more...)
- Government > Regional Government > Europe Government (0.68)
- Law > Statutes (0.46)
CHURRO: Making History Readable with an Open-Weight Large Vision-Language Model for High-Accuracy, Low-Cost Historical Text Recognition
Semnani, Sina J., Zhang, Han, He, Xinyan, Tekgürler, Merve, Lam, Monica S.
Accurate text recognition for historical documents can greatly advance the study and preservation of cultural heritage. Existing vision-language models (VLMs), however, are designed for modern, standardized texts and are not equipped to read the diverse languages and scripts, irregular layouts, and frequent degradation found in historical materials. This paper presents CHURRO, a 3B-parameter open-weight VLM specialized for historical text recognition. The model is trained on CHURRO-DS, the largest historical text recognition dataset to date. CHURRO-DS unifies 155 historical corpora comprising 99,491 pages, spanning 22 centuries of textual heritage across 46 language clusters, including historical variants and dead languages. We evaluate several open-weight and closed VLMs and optical character recognition (OCR) systems on CHURRO-DS and find that CHURRO outperforms all other VLMs. On the CHURRO-DS test set, CHURRO achieves 82.3% (printed) and 70.1% (handwritten) normalized Levenshtein similarity, surpassing the second-best model, Gemini 2.5 Pro, by 1.4% and 6.5%, respectively, while being 15.5 times more cost-effective. By releasing the model and dataset, we aim to enable community-driven research to improve the readability of historical texts and accelerate scholarship.
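A short sketch of the evaluation metric named above, normalized Levenshtein similarity between a model transcription and the ground truth; the exact normalization used by the authors is assumed here to divide by the longer of the two strings.

```python
# Normalized Levenshtein similarity, the metric reported above. The
# normalization (dividing by the longer string) is an assumption.
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert/delete/substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def normalized_similarity(reference: str, hypothesis: str) -> float:
    if not reference and not hypothesis:
        return 1.0
    return 1.0 - levenshtein(reference, hypothesis) / max(len(reference), len(hypothesis))

print(normalized_similarity("historia", "hisotria"))  # 0.75
```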
- Europe > Austria > Vienna (0.14)
- North America > Haiti (0.14)
- Europe > France > Île-de-France > Paris > Paris (0.14)
- (31 more...)
- Research Report (1.00)
- Overview (0.92)
- Health & Medicine (1.00)
- Media (0.69)
- Law (0.67)
- Government > Military (0.45)
GateTS: Versatile and Efficient Forecasting via Attention-Inspired routed Mixture-of-Experts
Yemets, Kyrylo, Lukashchuk, Mykola, Izonin, Ivan
Accurate univariate forecasting remains a pressing need in real-world systems, such as energy markets, hydrology, retail demand, and IoT monitoring, where signals are often intermittent and horizons span both short- and long-term. While transformers and Mixture-of-Experts (MoE) architectures are increasingly favored for time-series forecasting, a key gap persists: MoE models typically require complicated training with both the main forecasting loss and auxiliary load-balancing losses, along with careful routing/temperature tuning, which hinders practical adoption. In this paper, we propose a model architecture that simplifies the training process for univariate time series forecasting and effectively addresses both long- and short-term horizons, including intermittent patterns. Our approach combines sparse MoE computation with a novel attention-inspired gating mechanism that replaces the traditional one-layer softmax router. Through extensive empirical evaluation, we demonstrate that our gating design naturally promotes balanced expert utilization and achieves superior predictive accuracy without requiring the auxiliary load-balancing losses typically used in classical MoE implementations. The model achieves better performance while utilizing only a fraction of the parameters required by state-of-the-art transformer models, such as PatchTST. Furthermore, experiments across diverse datasets confirm that our MoE architecture with the proposed gating mechanism is more computationally efficient than LSTM for both long- and short-term forecasting, enabling cost-effective inference. These results highlight the potential of our approach for practical time-series forecasting applications where both accuracy and computational efficiency are critical.
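A toy NumPy forward pass illustrating one possible reading of the attention-inspired gate: the input is treated as a query scored against learned per-expert key vectors, followed by sparse top-k routing. This is an assumption made for illustration, not the GateTS architecture itself.

```python
# Hedged illustration of an "attention-inspired routed MoE": the input acts
# as a query scored against learned per-expert keys (scaled dot product),
# with sparse top-k routing. Toy forward pass only, not the GateTS model.
import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts, top_k = 16, 4, 2

expert_keys = rng.normal(size=(n_experts, d_model))              # learned gate keys
expert_weights = rng.normal(size=(n_experts, d_model, d_model))  # toy expert layers

def softmax(x: np.ndarray) -> np.ndarray:
    e = np.exp(x - x.max())
    return e / e.sum()

def moe_forward(x: np.ndarray) -> np.ndarray:
    """Route one token embedding x of shape (d_model,) through the top-k experts."""
    scores = expert_keys @ x / np.sqrt(d_model)   # attention-style gate scores
    top = np.argsort(scores)[-top_k:]             # sparse routing: best k experts
    gate = softmax(scores[top])                   # renormalize over selected experts
    return sum(g * (expert_weights[e] @ x) for g, e in zip(gate, top))

y = moe_forward(rng.normal(size=d_model))
print(y.shape)  # (16,)
```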
- Europe > United Kingdom (0.14)
- Europe > Ukraine > Lviv Oblast > Lviv (0.05)
- Europe > Netherlands > North Brabant > Eindhoven (0.04)
- (2 more...)
- Energy > Renewable > Solar (1.00)
- Energy > Power Industry (0.87)
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
- Information Technology > Software (0.68)