AITopics | validation and test

validation and test

Supplementary Material RE

Neural Information Processing Systems | Feb-10-2026, 20:58:31 GMT

A.1 Version 2

We have fixed some bugs in the evaluation code, resulting in slight differences compared to the previous release. In the previous version, 149 samples were not evaluated; these are now included in the new update.

A.2 Version 3

We have clarified certain statements and added experimental results to address the reviewer's questions.

B.1 Limitations

Despite these advancements, our dataset exhibits certain limitations, largely stemming from biases inherited from the source datasets:

Currently, we only address scenarios where both the question and the answer span a single time duration. Given a question, the annotated time span must be a single, continuous duration, which may not suit all scenes.

The presence of noisy or inaccurate annotations in the source datasets, including captions and timestamps, poses a challenge. Despite our efforts, some of these errors could not be automatically filtered out. The extent of this issue is detailed in the qualitative visualization conducted by our human reviewers, as presented in the supplementary material.

The average duration of ground truth events in our dataset is relatively long. This has the unintended consequence of hindering the models' ability to detect and analyze fine-grained actions within shorter video segments.

These drawbacks highlight areas for potential improvement and indicate the need for ongoing refinement to ensure the creation of more accurate and unbiased video language models.

B.2 Social Impact

Though we provide an assessment of temporal reasoning and moment localization, the types and scene diversity are still limited. We inherit the video classes from the two source video datasets, which may not be sufficient for a comprehensive assessment of all kinds of temporal reasoning. This limitation could introduce a bias.

Neither the curated data nor the video data contain any personally identifiable information. However, some of the video samples in the source datasets might be slightly uncomfortable depending on the viewer. For example, some videos discuss tattoos and piercings, and some present news about social events, including demonstrations or war reports. We release only the curated question-answer pairs and time spans.

large language model, machine learning, natural language, (20 more...)

Country:

North America > United States (0.04)
Asia > Taiwan (0.04)

Genre: Research Report (0.68)

Industry:

Law (1.00)
Information Technology (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

MEGConformer: Conformer-Based MEG Decoder for Robust Speech and Phoneme Classification

de Zuazo, Xabier, Saratxaga, Ibon, Navas, Eva

arXiv.org Artificial Intelligence | Dec-2-2025

For Speech Detection, a MEG-oriented SpecAugment provided a first exploration of MEG-specific augmentation. For Phoneme Classification, we used inverse-square-root class weighting and a dynamic grouping loader to handle 100-sample averaged examples. In addition, a simple instance-level normalization proved critical to mitigate distribution shifts on the holdout split. Using the official Standard track splits and F1-macro for model selection, our best systems achieved 88.9% (Speech) and 65.8% (Phoneme) on the leaderboard, surpassing the competition baselines and ranking within the top-10 in both tasks.
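The abstract names inverse-square-root class weighting and instance-level normalization without giving formulas. A minimal sketch of how these two steps are commonly written follows; the function names, the mean-one scaling of the weights, and the per-channel z-scoring are assumptions for illustration, not the authors' implementation.

```python
import math
from collections import Counter


def inverse_sqrt_class_weights(labels):
    """Weight each class proportionally to 1 / sqrt(class count).

    The rescaling so that the mean weight equals 1 is an assumption;
    the abstract does not say how (or whether) weights are normalized.
    """
    counts = Counter(labels)
    raw = {c: 1.0 / math.sqrt(n) for c, n in counts.items()}
    mean_raw = sum(raw.values()) / len(raw)
    return {c: w / mean_raw for c, w in raw.items()}


def instance_normalize(channels, eps=1e-8):
    """Z-score each channel of a single example using its own statistics.

    `channels` is a list of per-channel sample lists (hypothetical layout).
    """
    normalized = []
    for ch in channels:
        mean = sum(ch) / len(ch)
        std = math.sqrt(sum((x - mean) ** 2 for x in ch) / len(ch))
        normalized.append([(x - mean) / (std + eps) for x in ch])
    return normalized


# Toy, made-up label counts: a frequent, a medium, and a rare class.
labels = ["a"] * 100 + ["b"] * 25 + ["c"] * 4
weights = inverse_sqrt_class_weights(labels)
example = instance_normalize([[1.0, 2.0, 3.0]])
```

With these toy counts the rare class is weighted five times as heavily as the frequent one, a gentler correction than the 25-fold ratio that plain inverse-frequency weighting would give.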

artificial intelligence, deep learning, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2512.01443

Country: Europe > Spain (0.28)

Genre: Research Report > Experimental Study (0.48)

Industry: Health & Medicine > Therapeutic Area > Neurology (1.00)

Technology:

Information Technology > Artificial Intelligence > Speech (0.96)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

32683193e1d0e7a5795b073acecb3549-Supplemental-Datasets_and_Benchmarks_Track.pdf

Neural Information Processing Systems | Oct-9-2025, 22:46:28 GMT

dataset, please provide, video, (16 more...)

Country: Asia > Taiwan (0.04)

Genre: Research Report (0.68)

Industry:

Law (1.00)
Information Technology (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Communications (0.68)

7fd3b80fb1884e2927df46a7139bb8bf-Supplemental.pdf

Neural Information Processing Systems | Oct-3-2025, 09:25:52 GMT

configuration, dataset, hyper-parameter range, (16 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Detection of Chagas Disease from the ECG: The George B. Moody PhysioNet Challenge 2025

Reyna, Matthew A., Koscova, Zuzana, Pavlus, Jan, Saghafi, Soheil, Weigle, James, Elola, Andoni, Seyedi, Salman, Campbell, Kiersten, Li, Qiao, Rad, Ali Bahrami, Ribeiro, Antônio H., Ribeiro, Antonio Luiz P., Sameni, Reza, Clifford, Gari D.

arXiv.org Artificial Intelligence | Oct-3-2025

Objective: Chagas disease is a parasitic infection, primarily transmitted by insects, that is endemic to South America, Central America, and, more recently, the U.S. Chronic Chagas disease can cause cardiovascular diseases and digestive problems. Serological testing capacities for Chagas disease are limited, but Chagas cardiomyopathy often manifests in ECGs, providing an opportunity to prioritize patients for testing and treatment. Approach: The George B. Moody PhysioNet Challenge 2025 invites teams to develop algorithmic approaches for identifying Chagas disease from electrocardiograms (ECGs). Main results: This Challenge provides multiple innovations. First, we leveraged several datasets with labels from patient reports and serological testing, providing a large dataset with weak labels and smaller datasets with strong labels. Second, we augmented the data to support model robustness and generalizability to unseen data sources. Third, we applied an evaluation metric that captured the local serological testing capacity for Chagas disease to frame the machine learning problem as a triage task. Significance: Over 630 participants from 111 teams submitted over 1300 entries during the Challenge, representing diverse approaches from academia and industry worldwide.
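The abstract does not define the capacity-aware metric. One plausible form of a triage metric, sketched below, is the share of all true positive cases that fall within the top-ranked fraction of patients that local testing capacity can cover. The function name, the `capacity` parameter, and the toy data are assumptions for illustration and should not be read as the Challenge's official scoring code.

```python
def capture_rate_at_capacity(labels, scores, capacity):
    """Fraction of positive cases found among the top `capacity` share of
    patients when ranked by predicted score (highest first).

    `labels` are 0/1 ground-truth values, `scores` are model outputs.
    `capacity` is a hypothetical testing-capacity fraction in (0, 1].
    """
    n = len(labels)
    k = max(1, round(n * capacity))  # number of patients that can be tested
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    total_positive = sum(labels)
    if total_positive == 0:
        return 0.0
    found = sum(labels[i] for i in ranked[:k])
    return found / total_positive


# Toy example with made-up data: 10 patients, capacity to test 2 of them.
toy_labels = [1, 0, 1, 0, 0, 0, 1, 0, 0, 1]
toy_scores = [0.9, 0.8, 0.3, 0.2, 0.1, 0.4, 0.7, 0.05, 0.15, 0.6]
toy_rate = capture_rate_at_capacity(toy_labels, toy_scores, capacity=0.2)
```

In the toy data the two highest-scored patients include one of the four positives, so a quarter of the cases would be reached under that capacity. A metric of this shape rewards ranking quality near the top of the list rather than overall accuracy.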

artificial intelligence, chagas disease, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2510.02202

Country:

North America > United States (0.93)
Europe (0.93)
South America > Brazil > Minas Gerais (0.14)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.96)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)

Are Virtual DES Images a Valid Alternative to the Real Ones?

Perre, Ana C., Alexandre, Luís A., Freire, Luís C.

arXiv.org Artificial Intelligence | Aug-22-2025

Contrast-enhanced spectral mammography (CESM) is an imaging modality that provides two types of images, commonly known as low-energy (LE) and dual-energy subtracted (DES) images. In many domains, particularly in medicine, the emergence of image-to-image translation techniques has enabled the artificial generation of images using other images as input. Within CESM, applying such techniques to generate DES images from LE images could be highly beneficial, potentially reducing patient exposure to radiation associated with high-energy image acquisition. In this study, we investigated three models for the artificial generation of DES images (virtual DES): a pre-trained U-Net model, a U-Net model trained end-to-end, and a CycleGAN model. We also performed a series of experiments to assess the impact of using virtual DES images on the classification of CESM examinations into malignant and non-malignant categories. To our knowledge, this is the first study to evaluate the impact of virtual DES images on CESM lesion classification. The results demonstrate that the best performance was achieved with the pre-trained U-Net model, yielding an F1 score of 85.59% when using the virtual DES images, compared to 90.35% with the real DES images. This discrepancy likely results from the additional diagnostic information in real DES images, which contributes to higher classification accuracy. Nevertheless, the potential for virtual DES image generation is considerable, and future advancements may narrow this performance gap to a level where exclusive reliance on virtual DES images becomes clinically viable.

artificial intelligence, machine learning, virtual DES image, (17 more...)

arXiv.org Artificial Intelligence

2508.15594

Country:

Africa (0.46)
North America > United States (0.28)

Genre: Research Report > New Finding (0.68)

Industry:

Health & Medicine > Nuclear Medicine (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (1.00)
Health & Medicine > Therapeutic Area > Oncology > Breast Cancer (0.36)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Evaluation of EAS directions based on TAIGA HiSCORE data using fully connected neural networks

Kryukov, A. P., Polyakov, S. P., Dubenskaya, Yu. Yu., Gres, E. O., Postnikov, E. B., Volchugov, P. A., Zhurov, D. P.

arXiv.org Artificial Intelligence | Feb-19-2025

High-energy cosmic rays and gamma quanta colliding with the upper atmosphere produce cascades of secondary particles known as extensive air showers (EASs). These showers can be detected and recorded using a variety of telescopes such as imaging atmospheric Cherenkov telescopes (IACTs), arrays of wide-angle integrating air detectors or water detectors; some experiments such as TAIGA [1] and LHAASO [2] combine several telescope types. The data from these observations can be used to identify the primary particle type and estimate its parameters such as energy and direction. In this paper, we estimate the EAS direction which is of interest because it can identify the gamma radiation source and is important in estimating the energy of the primary particle. Highly accurate shower direction estimates can be obtained from the timing measurements of multiple detectors spread over a large area such as TAIGA HiSCORE [3], LHAASO, or HAWC [4]. We use simulated data from TAIGA HiSCORE which is a non-imaging array of wide field-of-view integrating air Cherenkov detector stations. We use artificial neural networks (ANNs) to obtain shower direction estimates. Convolutional neural networks seem like a natural choice for the problem since the HiSCORE stations are positioned on a grid. However, the previous work using this approach [5, 6] produced estimates that were significantly less accurate than previously developed methods, e.g.

direction estimate, EAS direction, neural network, (16 more...)

arXiv.org Artificial Intelligence

2502.13851

Country:

Europe > Russia > Central Federal District > Moscow Oblast > Moscow (0.06)
Asia > Russia > Siberian Federal District > Irkutsk Oblast > Irkutsk (0.04)

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.35)

On the Detection of Aircraft Single Engine Taxi using Deep Learning Models

Jarry, Gabriel, Very, Philippe, Dalmau, Ramon, Delahaye, Daniel, Houdant, Arthur

arXiv.org Artificial Intelligence | Oct-10-2024

The aviation industry is vital for global transportation but faces increasing pressure to reduce its environmental footprint, particularly CO2 emissions from ground operations such as taxiing. Single Engine Taxiing (SET) has emerged as a promising technique to enhance fuel efficiency and sustainability. However, evaluating SET's benefits is hindered by the limited availability of SET-specific data, typically accessible only to aircraft operators. In this paper, we present a novel deep learning approach to detect SET operations using ground trajectory data. Our method involves using proprietary Quick Access Recorder (QAR) data of A320 flights to label ground movements as SET or conventional taxiing during taxi-in operations, while using only trajectory features equivalent to those available in open-source surveillance systems such as Automatic Dependent Surveillance-Broadcast (ADS-B) or ground radar. This demonstrates that SET can be inferred from ground movement patterns, paving the way for future work with non-proprietary data sources. Our results highlight the potential of deep learning to improve SET detection and support more comprehensive environmental impact assessments.

aircraft, fuel consumption, operation, (14 more...)

arXiv.org Artificial Intelligence

2410.07727

Country:

Europe > France > Occitanie > Haute-Garonne > Toulouse (0.04)
Asia > China > Shanghai > Shanghai (0.04)
North America > United States > Texas > Tarrant County > Fort Worth (0.04)
(3 more...)

Genre:

Research Report > Promising Solution (0.48)
Research Report > New Finding (0.48)

Industry:

Transportation > Infrastructure & Services > Airport (1.00)
Transportation > Air (1.00)
Aerospace & Defense > Aircraft (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Contrasting Deep Learning Models for Direct Respiratory Insufficiency Detection Versus Blood Oxygen Saturation Estimation

Gauy, Marcelo Matheus, Koza, Natalia Hitomi, Morita, Ricardo Mikio, Stanzione, Gabriel Rocha, Junior, Arnaldo Candido, Berti, Larissa Cristina, Levin, Anna Sara Shafferman, Sabino, Ester Cerdeira, Svartman, Flaviane Romani Fernandes, Finger, Marcelo

arXiv.org Artificial Intelligence | Jul-30-2024

We contrast the effectiveness of state-of-the-art deep learning architectures, designed for general audio classification tasks, when refined for respiratory insufficiency (RI) detection versus blood oxygen saturation (SpO$_2$) estimation and classification through automated audio analysis. Recently, multiple deep learning architectures have been proposed to detect RI in COVID patients through audio analysis, achieving accuracy above 95% and F1-score above 0.93. RI is a condition associated with low SpO$_2$ levels, commonly defined by the threshold SpO$_2$ < 92%. While SpO$_2$ serves as a crucial determinant of RI, a medical doctor's diagnosis typically relies on multiple factors. These include respiratory frequency, heart rate, and SpO$_2$ levels, among others. Here we study pretrained audio neural networks (CNN6, CNN10 and CNN14) and the Masked Autoencoder (Audio-MAE) for RI detection, where these models achieve near-perfect accuracy, surpassing previous results. Yet, for the regression task of estimating SpO$_2$ levels, the models achieve root mean square error values exceeding the accepted clinical range of 3.5% for finger oximeters. Additionally, Pearson correlation coefficients fail to surpass 0.3. As deep learning models perform better in classification than regression, we transform SpO$_2$-regression into a SpO$_2$-threshold binary classification problem, with a threshold of 92%. However, this task still yields an F1-score below 0.65. Thus, audio analysis offers valuable insights into a patient's RI status, but does not provide accurate information about actual SpO$_2$ levels, indicating a separation of domains in which voice and speech biomarkers may and may not be useful in medical diagnostics under current technologies.
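The step from SpO$_2$ regression to a 92% threshold classification can be illustrated with a small sketch. It computes the root mean square error of continuous estimates and then the F1-score obtained when both the reference and the estimated values are binarized at the threshold. The function names and the toy values are made up for illustration; only the 92% threshold and the choice of metrics come from the abstract.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between reference and estimated values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def f1_below_threshold(y_true, y_pred, threshold=92.0):
    """F1-score for the binary task 'value < threshold' (positive = low SpO2)."""
    tp = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        true_low, pred_low = t < threshold, p < threshold
        if true_low and pred_low:
            tp += 1
        elif pred_low:
            fp += 1
        elif true_low:
            fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Made-up SpO2 percentages: reference readings and model estimates.
spo2_true = [95.0, 90.0, 88.0, 97.0, 91.0]
spo2_pred = [93.0, 93.0, 90.0, 96.0, 94.0]
toy_rmse = rmse(spo2_true, spo2_pred)
toy_f1 = f1_below_threshold(spo2_true, spo2_pred)
```

In this toy case the estimates are within a few percentage points of the references, yet two of the three low-saturation cases land on the wrong side of the threshold, so the F1-score is only 0.5. Small regression errors near the cut-off can therefore translate into poor threshold classification.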

dataset, detection, experiment, (12 more...)

arXiv.org Artificial Intelligence

2407.20989

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
South America > Brazil > São Paulo (0.06)
South America > Brazil > Paraná (0.04)
(3 more...)

Genre: Research Report (1.00)

Industry: Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

LADI v2: Multi-label Dataset and Classifiers for Low-Altitude Disaster Imagery

Scheele, Samuel, Picchione, Katherine, Liu, Jeffrey

arXiv.org Artificial Intelligence | Jun-4-2024

ML-based computer vision models are promising tools for supporting emergency management operations following natural disasters. Aerial photographs taken from small manned and unmanned aircraft can be available soon after a disaster and provide valuable information from multiple perspectives for situational awareness and damage assessment applications. However, emergency managers often face challenges finding the most relevant photos among the tens of thousands that may be taken after an incident. While ML-based solutions could enable more effective use of aerial photographs, there is still a lack of training data for imagery of this type from multiple perspectives and for multiple hazard types. To address this, we present the LADI v2 (Low Altitude Disaster Imagery version 2) dataset, a curated set of about 10,000 disaster images captured in the United States by the Civil Air Patrol (CAP) in response to federally declared emergencies (2015-2023) and annotated for multi-label classification by trained CAP volunteers. We also provide two pretrained baseline classifiers and compare their performance to state-of-the-art vision-language models in multi-label classification. The data and code are released publicly to support the development of computer vision models for emergency management research and applications.

classifier, dataset, validation, (16 more...)

arXiv.org Artificial Intelligence

2406.0278

Country:

North America > United States > Hawaii (0.04)
North America > United States > Massachusetts > Middlesex County > Lexington (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
(2 more...)

Genre: Research Report (1.00)

Industry:

Government > Military (0.69)
Government > Regional Government > North America Government > United States Government (0.49)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.96)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.30)
