AITopics

In many engineering and applied science domains, high-dimensional nonlinear filtering is still a challenging problem. Recent advances in score-based diffusion models offer a promising alternative for posterior sampling but require repeated retraining to track evolving priors, which is impractical in high dimensions. In this work, we propose the Conditional Score-based Filter (CSF), a novel algorithm that leverages a set-transformer encoder and a conditional diffusion model to achieve efficient and accurate posterior sampling without retraining. By decoupling prior modeling and posterior sampling into offline and online stages, CSF enables scalable score-based filtering across diverse nonlinear systems. Extensive experiments on benchmark problems show that CSF achieves superior accuracy, robustness, and efficiency across diverse nonlinear filtering scenarios.

artificial intelligence, diffusion model, machine learning, (19 more...)

2509.19816

Country: Asia > China (0.15)

Genre:

Research Report (0.50)
Workflow (0.47)
Overview (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Artificial Intelligence > Vision (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

CHURRO: Making History Readable with an Open-Weight Large Vision-Language Model for High-Accuracy, Low-Cost Historical Text Recognition

Semnani, Sina J., Zhang, Han, He, Xinyan, Tekgürler, Merve, Lam, Monica S.

Accurate text recognition for historical documents can greatly advance the study and preservation of cultural heritage. Existing vision-language models (VLMs), however, are designed for modern, standardized texts and are not equipped to read the diverse languages and scripts, irregular layouts, and frequent degradation found in historical materials. This paper presents CHURRO, a 3B-parameter open-weight VLM specialized for historical text recognition. The model is trained on CHURRO-DS, the largest historical text recognition dataset to date. CHURRO-DS unifies 155 historical corpora comprising 99,491 pages, spanning 22 centuries of textual heritage across 46 language clusters, including historical variants and dead languages. We evaluate several open-weight and closed VLMs and optical character recognition (OCR) systems on CHURRO-DS and find that CHURRO outperforms all other VLMs. On the CHURRO-DS test set, CHURRO achieves 82.3% (printed) and 70.1% (handwritten) normalized Levenshtein similarity, surpassing the second-best model, Gemini 2.5 Pro, by 1.4% and 6.5%, respectively, while being 15.5 times more cost-effective. By releasing the model and dataset, we aim to enable community-driven research to improve the readability of historical texts and accelerate scholarship.

large language model, machine learning, pattern recognition, (22 more...)

2509.19768

Country:

North America > United States (1.00)
Europe (1.00)
Asia > Middle East (0.67)

Genre:

Research Report (1.00)
Overview (0.92)

Industry:

Health & Medicine (1.00)
Media (0.69)
Law (0.67)
Government > Military (0.45)

Technology:

Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
(2 more...)

Ouyang, Xueqiang, Wei, Jia

Multi-Modal Artificial Intelligence of Embryo Grading and Pregnancy Prediction in Assisted Reproductive Technology: A Review

Infertility, a pressing global health concern, affects a substantial proportion of individuals worldwide. While advancements in assisted reproductive technology (ART) have offered effective interventions, conventional in vitro fertilization-embryo transfer (IVF-ET) procedures still encounter significant hurdles in enhancing pregnancy success rates. Key challenges include the inherent subjectivity in embryo grading and the inefficiency of multi-modal data integration. Against this backdrop, the adoption of AI-driven technologies has emerged as a pivotal strategy to address these issues. This article presents a comprehensive review of the progress in AI applications for embryo grading and pregnancy prediction from a novel perspective, with a specific focus on the utilization of different modal data, such as static images, time-lapse videos, and structured tabular data. The reason for this perspective is that reorganizing tasks based on data sources can not only more accurately depict the essence of the problem but also help clarify the rationality and limitations of model design. Furthermore, this review critically examines the core challenges in contemporary research, encompassing the intricacies of multi-modal feature fusion, constraints imposed by data scarcity, limitations in model generalization capabilities, and the dynamically evolving legal and regulatory frameworks. On this basis, it explicitly identifies potential avenues for future research, aiming to provide actionable guidance for advancing the application of multi-modal AI in the field of ART.

data mining, machine learning, natural language, (20 more...)

2505.20306

Country:

North America > United States (0.67)
Europe > United Kingdom > England (0.46)
Asia > China > Guangdong Province (0.28)

Genre:

Research Report > Experimental Study (1.00)
Overview (1.00)
Research Report > New Finding (0.67)

Industry:

Law (1.00)
Information Technology (1.00)
Health & Medicine > Therapeutic Area > Obstetrics/Gynecology (1.00)
(5 more...)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
(5 more...)

Batista, João Eduardo, Vatai, Emil, Wahib, Mohamed

SAFE: Improving LLM Systems using Sentence-Level In-generation Attribution

Large Language Models (LLMs) are increasingly applied in various science domains, yet their broader adoption remains constrained by a critical challenge: the lack of trustworthy, verifiable outputs. Current LLMs often generate answers without reliable source attribution, or worse, with incorrect attributions, posing a barrier to their use in scientific and high-stakes settings, where traceability and accountability are paramount. To be reliable, attribution systems require high accuracy for short-length attribution on retrieved data, i.e., attribution to a sentence within a document rather than the entire document. We propose SAFE, a Sentence-level A ttribution FramEwork for Retrieve-Augmented Generation (RAG) systems that attributes generated sentences during generation. This allows users to verify sentences as they read them and correct the model when the attribution indicates the generated text is not grounded in the documents, increasing the safety of LLM systems. This framework consists of two steps: predicting the required number of references for a sentence, and attributing the sentence. Our approach achieved 95% accuracy in the first step, which translated to 2.1\~6.0% improvements in the accuracy (normalized for maximum possible accuracy) of all attribution algorithms in our clean dataset, when compared to their top-1 accuracy. We also applied SAFE in real-world scenarios with documents containing hundreds to thousands of sentences. In these settings, SAFE reliably attributed sentences to their source documents, demonstrating that the method generalizes beyond controlled benchmarks. The SAFE framework and the training dataset are publicly available on GitHub.

large language model, machine learning, natural language, (19 more...)

2505.12621

Country:

North America > United States (1.00)
Asia (1.00)
Europe (0.67)

Genre:

Research Report (1.00)
Overview (0.67)

Industry:

Health & Medicine (1.00)
Government > Regional Government (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Cuffless Blood Pressure Prediction from Speech Sentences using Deep Learning Methods

Kainat, null

This research presents a novel method for non-invasive arterial blood pressure (ABP) prediction using speech signals, employing a BERT -based regression model. Arterial blood pressure is a vital indicator of cardiovascular health, and accurate monitoring is essential in preventing hypertension-related complications. Traditional cuff-based methods often yield inconsistent results due to factors like white-coat and masked hypertension. Our approach leverages the acoustic characteristics of speech, capturing voice features to establish correlations with blood pressure levels. Utilizing advanced deep learning techniques, we analyze speech signals to extract relevant patterns, enabling real-time monitoring without the discomfort of conventional methods.

artificial intelligence, blood pressure, machine learning, (16 more...)

2509.1975

Country: Asia (0.46)

Genre:

Research Report > Promising Solution (1.00)
Research Report > New Finding (1.00)
Overview (1.00)

Industry:

Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)
Health & Medicine > Diagnostic Medicine > Vital Signs (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

SMILES-Inspired Transfer Learning for Quantum Operators in Generative Quantum Eigensolver

Yin, Zhi, Li, Xiaoran, Zhang, Shengyu, Li, Xin, Zhang, Xiaojin

Given the inherent limitations of traditional Variational Quantum Eigensolver(VQE) algorithms, the integration of deep generative models into hybrid quantum-classical frameworks, specifically the Generative Quantum Eigensolver(GQE), represents a promising innovative approach. However, taking the Unitary Coupled Cluster with Singles and Doubles(UCCSD) ansatz which is widely used in quantum chemistry as an example, different molecular systems require constructions of distinct quantum operators. Considering the similarity of different molecules, the construction of quantum operators utilizing the similarity can reduce the computational cost significantly. Inspired by the SMILES representation method in computational chemistry, we developed a text-based representation approach for UCCSD quantum operators by leveraging the inherent representational similarities between different molecular systems. This framework explores text pattern similarities in quantum operators and employs text similarity metrics to establish a transfer learning framework. Our approach with a naive baseline setting demonstrates knowledge transfer between different molecular systems for ground-state energy calculations within the GQE paradigm. This discovery offers significant benefits for hybrid quantum-classical computation of molecular ground-state energies, substantially reducing computational resource requirements.

artificial intelligence, machine learning, natural language, (19 more...)

2509.19715

Country: Asia > China (0.47)

Genre:

Research Report > Promising Solution (0.34)
Overview > Innovation (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Transfer Learning (0.64)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.34)

Large Language Models for Pedestrian Safety: An Application to Predicting Driver Yielding Behavior at Unsignalized Intersections

Yang, Yicheng, Li, Zixian, Bizimana, Jean Paul, Zafri, Niaz, Dong, Yongfeng, Li, Tianyi

Pedestrian safety is a critical component of urban mobility and is strongly influenced by the interactions between pedestrian decision-making and driver yielding behavior at crosswalks. Modeling driver--pedestrian interactions at intersections requires accurately capturing the complexity of these behaviors. Traditional machine learning models often struggle to capture the nuanced and context-dependent reasoning required for these multifactorial interactions, due to their reliance on fixed feature representations and limited interpretability. In contrast, large language models (LLMs) are suited for extracting patterns from heterogeneous traffic data, enabling accurate modeling of driver-pedestrian interactions. Therefore, this paper leverages multimodal LLMs through a novel prompt design that incorporates domain-specific knowledge, structured reasoning, and few-shot prompting, enabling interpretable and context-aware inference of driver yielding behavior, as an example application of modeling pedestrian--driver interaction. We benchmarked state-of-the-art LLMs against traditional classifiers, finding that GPT-4o consistently achieves the highest accuracy and recall, while Deepseek-V3 excels in precision. These findings highlight the critical trade-offs between model performance and computational efficiency, offering practical guidance for deploying LLMs in real-world pedestrian safety systems.

large language model, machine learning, natural language, (20 more...)

2509.19657

Country: North America > United States (1.00)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)
Overview (0.92)

Industry:

Transportation > Ground > Road (1.00)
Government > Regional Government > North America Government > United States Government (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Tika, Salma, Haqiq, Abdelkrim, Sabir, Essaid, Driouch, Elmahdi

Where 6G Stands Today: Evolution, Enablers, and Research Gaps

Abstract--As the fifth-generation (5G) mobile communication system continues its global deployment, both industry and academia have started conceptualizing the 6th generation (6G) to address the growing need for a progressively advanced and digital society. Even while 5G offers considerable advancements over L TE, it could struggle to be sufficient to meet all of the requirements, including ultra-high reliability, seamless automation, and ubiquitous coverage. In response, 6G is supposed to bring out a highly intelligent, automated, and ultra-reliable communication system that can handle a vast number of connected devices. This paper offers a comprehensive overview of 6G, beginning with its main stringent requirements while focusing on key enabling technologies such as terahertz (THz) communications, intelligent reflecting surfaces, massive MIMO and AI-driven networking that will shape the 6G networks. Furthermore, the paper lists various 6G applications and usage scenarios that will benefit from these advancements. At the end, we outline the potential challenges that must be addressed to achieve the 6G promises. Keywords-- 6 G, Usage Scenarios, Capabilities, Enabling technologies, Challenges. I. INTRODUCTION The wireless industry has continuously evolved and it is is one of the few industry sectors that have kept a fast-growing trend, with each generation introducing higher frequencies, larger bandwidths, and faster data rates [1]. Since Marconi's wireless telegraphy in the 19th century, mobile networks have advanced from 1G's basic voice services to 5G's ultra-high-definition 3D data transmission. Researchers are currently focusing on 6G as 5G deployment expands throughout the world and is anticipated to be realized by 2030.

artificial intelligence, communication, machine learning, (18 more...)

2509.19646

Country: North America > Canada > Quebec (0.14)

Genre: Overview > Innovation (0.46)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine (1.00)
Energy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Internet of Things (1.00)
Information Technology > Data Science (1.00)
(4 more...)

Meow: End-to-End Outline Writing for Automatic Academic Survey

Ma, Zhaoyu, Shan, Yuan, Zhao, Jiahao, Xu, Nan, Wang, Lei

As academic paper publication numbers grow exponentially, conducting in-depth surveys with LLMs automatically has become an inevitable trend. Outline writing, which aims to systematically organize related works, is critical for automated survey generation. Yet existing automatic survey methods treat outline writing as mere workflow steps in the overall pipeline. Such template-based workflows produce outlines that lack in-depth understanding of the survey topic and fine-grained styles. To address these limitations, we propose Meow, the first metadata-driven outline writing framework that produces organized and faithful outlines efficiently. Specifically, we first formulate outline writing as an end-to-end task that generates hierarchical structured outlines from paper metadata. We then curate a high-quality dataset of surveys from arXiv, bioRxiv, and medRxiv, and establish systematic evaluation metrics for outline quality assessment. Finally, we employ a two-stage training approach combining supervised fine-tuning and reinforcement learning. Our 8B reasoning model demonstrates strong performance with high structural fidelity and stylistic coherence.

large language model, machine learning, natural language, (16 more...)

2509.1937

Country:

Europe > Ukraine (0.14)
Europe > Denmark (0.14)
Asia > China (0.14)

Genre:

Overview (0.71)
Workflow (0.68)
Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.98)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

Charting a Decade of Computational Linguistics in Italy: The CLiC-it Corpus

Alzetta, Chiara, Auriemma, Serena, Bondielli, Alessandro, Dini, Luca, Fazzone, Chiara, Miaschi, Alessio, Miliani, Martina, Sartor, Marta

Over the past decade, Computational Linguistics (CL) and Natural Language Processing (NLP) have evolved rapidly, especially with the advent of Transformer-based Large Language Models (LLMs). This shift has transformed research goals and priorities, from Lexical and Semantic Resources to Language Modelling and Multimodality. In this study, we track the research trends of the Italian CL and NLP community through an analysis of the contributions to CLiC-it, arguably the leading Italian conference in the field. We compile the proceedings from the first 10 editions of the CLiC-it conference (from 2014 to 2024) into the CLiC-it Corpus, providing a comprehensive analysis of both its metadata, including author provenance, gender, affiliations, and more, as well as the content of the papers themselves, which address various topics. Our goal is to provide the Italian and international research communities with valuable insights into emerging trends and key developments over time, supporting informed decisions and future directions in the field.

large language model, machine learning, natural language, (19 more...)

2509.19033

Country:

Asia (1.00)
Europe > Italy (0.51)

Genre:

Overview (1.00)
Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.93)