AITopics

Country: North America > United States (0.92)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Government (0.45)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Neural Information Processing SystemsFeb-11-2026, 08:13:48 GMT

Paper: Generalization of Reinforcement Learners with Working and Episodic Memory

We thank the reviewers for their thoughtful and constructive feedback on our manuscript. This should help both contextualize each task's difficulty and illustrate what it involves. Reviewer 3 noted the Section 2 task descriptions could be better presented. We have reformatted it so that "the order We also changed our description of IMP ALA to match Reviewer 5's suggestion. Regarding the task suite, Reviewer 4 raised a thoughtful consideration on whether "most of the findings translate when Some 3D tasks in the suite already have '2D-like' semi-counterparts that do not require navigation, '2D-like' because everything is fully observable and the agent has a first-person point of view from a fixed point, without Spot the Difference level, was overall harder than Change Detection for our ablation models.

artificial intelligence, generalization, reinforcement learner, (16 more...)

Industry: Health & Medicine > Consumer Health (0.42)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Scripts & Frames (0.42)

Neural Information Processing SystemsFeb-11-2026, 07:51:35 GMT

d384dec9f5f7a64a36b5c8f03b8a6d92-Supplemental.pdf

pathway, representation, similarity, (15 more...)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.90)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)

Neural Information Processing SystemsFeb-7-2026, 06:59:28 GMT

Supplementaryfor NeuralMethodsforPoint-wiseDependencyEstimation

Four approaches are discussed: Variational Bounds of Mutual Information, Density Matching, ProbabilisticClassifier,andDensity-RatioFitting. Proposition3(IJS and its neural estimation, restating Jensen-Shannon bound with f-GAN objective [22]). We adopt the "concatenate critic" design [20, 22, 23] for our neural network parametrized function. NotethatProbabilistic Classifier method applies sigmoid function to the outputs to ensure probabilistic outputs. To proceed, it suffices if we could provide an upper bound forPrS(|lS(θk)| ε/2).

artificial intelligence, epx, machine learning, (16 more...)

Country:

North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Asia > Middle East > Jordan (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.67)

Neural Information Processing SystemsOct-1-2025, 23:09:23 GMT

Paper: Generalization of Reinforcement Learners with Working and Episodic Memory

We thank the reviewers for their thoughtful and constructive feedback on our manuscript. This should help both contextualize each task's difficulty and illustrate what it involves. Reviewer 3 noted the Section 2 task descriptions could be better presented. We have reformatted it so that "the order We also changed our description of IMP ALA to match Reviewer 5's suggestion. Regarding the task suite, Reviewer 4 raised a thoughtful consideration on whether "most of the findings translate when Some 3D tasks in the suite already have '2D-like' semi-counterparts that do not require navigation, '2D-like' because everything is fully observable and the agent has a first-person point of view from a fixed point, without Spot the Difference level, was overall harder than Change Detection for our ablation models.

generalization, reinforcement learner, working and episodic memory, (15 more...)

Industry: Health & Medicine > Consumer Health (0.42)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Scripts & Frames (0.42)

Lyu, Ruiqi, Turcan, Alistair, Zhang, Martin Jinye, Wilder, Bryan

Improving constraint-based discovery with robust propagation and reliable LLM priors

arXiv.org Artificial IntelligenceSep-30-2025

Learning causal structure from observational data is central to scientific modeling and decision-making. Constraint-based methods aim to recover conditional independence (CI) relations in a causal directed acyclic graph (DAG). Classical approaches such as PC and subsequent methods orient v-structures first and then propagate edge directions from these seeds, assuming perfect CI tests and exhaustive search of separating subsets -- assumptions often violated in practice, leading to cascading errors in the final graph. Recent work has explored using large language models (LLMs) as experts, prompting sets of nodes for edge directions, and could augment edge orientation when assumptions are not met. However, such methods implicitly assume perfect experts, which is unrealistic for hallucination-prone LLMs. We propose MosaCD, a causal discovery method that propagates edges from a high-confidence set of seeds derived from both CI tests and LLM annotations. To filter hallucinations, we introduce shuffled queries that exploit LLMs' positional bias, retaining only high-confidence seeds. We then apply a novel confidence-down propagation strategy that orients the most reliable edges first, and can be integrated with any skeleton-based discovery method. Across multiple real-world graphs, MosaCD achieves higher accuracy in final graph construction than existing constraint-based methods, largely due to the improved reliability of initial seeds and robust propagation strategies.

large language model, machine learning, orientation, (18 more...)

2509.2357

Country: Asia (0.14)

Genre: Research Report (0.83)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)

arXiv.org Artificial IntelligenceSep-16-2025

Advancing Medical Artificial Intelligence Using a Century of Cases

Buckley, Thomas A., Conci, Riccardo, Brodeur, Peter G., Gusdorf, Jason, Beltrán, Sourik, Behrouzi, Bita, Crowe, Byron, Dockterman, Jacob, Muhammad, Muzzammil, Ohnigian, Sarah, Sanchez, Andrew, Diao, James A., Shah, Aashna P., Restrepo, Daniel, Rosenberg, Eric S., Lea, Andrew S., Zitnik, Marinka, Podolsky, Scott H., Kanjee, Zahir, Abdulnour, Raja-Elie E., Koshy, Jacob M., Rodman, Adam, Manrai, Arjun K.

BACKGROUND: For over a century, the New England Journal of Medicine Clinicopathological Conferences (CPCs) have tested the reasoning of expert physicians and, recently, artificial intelligence (AI). However, prior AI evaluations have focused on final diagnoses without addressing the multifaceted reasoning and presentation skills required of expert discussants. METHODS: Using 7102 CPCs (1923-2025) and 1021 Image Challenges (2006-2025), we conducted extensive physician annotation and automated processing to create CPC-Bench, a physician-validated benchmark spanning 10 text-based and multimodal tasks, against which we evaluated leading large language models (LLMs). Then, we developed "Dr. CaBot," an AI discussant designed to produce written and slide-based video presentations using only the case presentation, modeling the role of the human expert in these cases. RESULTS: When challenged with 377 contemporary CPCs, o3 (OpenAI) ranked the final diagnosis first in 60% of cases and within the top ten in 84% of cases, outperforming a 20-physician baseline; next-test selection accuracy reached 98%. Event-level physician annotations quantified AI diagnostic accuracy per unit of information. Performance was lower on literature search and image tasks; o3 and Gemini 2.5 Pro (Google) achieved 67% accuracy on image challenges. In blinded comparisons of CaBot vs. human expert-generated text, physicians misclassified the source of the differential in 46 of 62 (74%) of trials, and scored CaBot more favorably across quality dimensions. To promote research, we are releasing CaBot and CPC-Bench. CONCLUSIONS: LLMs exceed physician performance on complex text-based differential diagnosis and convincingly emulate expert medical presentations, but image interpretation and literature retrieval remain weaker. CPC-Bench and CaBot may enable transparent and continued tracking of progress in medical AI.

large language model, machine learning, natural language, (20 more...)

2509.12194

Country: North America > United States > Massachusetts (0.30)

Genre: Research Report (0.94)

Industry:

Health & Medicine > Therapeutic Area (1.00)
Health & Medicine > Health Care Providers & Services (1.00)
Health & Medicine > Diagnostic Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.34)

Neural Information Processing SystemsAug-17-2025, 13:39:51 GMT

d384dec9f5f7a64a36b5c8f03b8a6d92-Supplemental.pdf

artificial intelligence, machine learning, pathway, (17 more...)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.47)

Li, Jianfei, Yuen, Kevin Kam Fung

CPC-CMS: Cognitive Pairwise Comparison Classification Model Selection Framework for Document-level Sentiment Analysis

arXiv.org Artificial IntelligenceJul-21-2025

This study proposes the Cognitive Pairwise Comparison Classification Model Selection (CPC-CMS) framework for document-level sentiment analysis. The CPC, based on expert knowledge judgment, is used to calculate the weights of evaluation criteria, including accuracy, precision, recall, F1-score, specificity, Matthews Correlation Coefficient (MCC), Cohen's Kappa (Kappa), and efficiency. Naive Bayes, Linear Support Vector Classification (LSVC), Random Forest, Logistic Regression, Extreme Gradient Boosting (XGBoost), Long Short-Term Memory (LSTM), and A Lite Bidirectional Encoder Representations from Transformers (ALBERT) are chosen as classification baseline models. A weighted decision matrix consisting of classification evaluation scores with respect to criteria weights, is formed to select the best classification model for a classification problem. Three open datasets of social media are used to demonstrate the feasibility of the proposed CPC-CMS. Based on our simulation, for evaluation results excluding the time factor, ALBERT is the best for the three datasets; if time consumption is included, no single model always performs better than the other models. The CPC-CMS can be applied to the other classification applications in different areas.

artificial intelligence, deep learning, machine learning, (17 more...)

2507.14022

Country: Asia > India (0.28)

Genre: Research Report > New Finding (0.68)

Industry: Health & Medicine (1.00)

Truong, Dung, Delorme, Arnaud

Data Normalization Strategies for EEG Deep Learning

arXiv.org Artificial IntelligenceJul-1-2025

Normalization is a critical yet often overlooked component in the preprocessing pipeline for EEG deep learning applications. The rise of large-scale pretraining paradigms such as self-supervised learning (SSL) introduces a new set of tasks whose nature is substantially different from supervised training common in EEG deep learning applications. This raises new questions about optimal normalization strategies for the applicable task. In this study, we systematically evaluate the impact of normalization granularity (recording vs. window level) and scope (cross-channel vs. within-channel) on both supervised (age and gender prediction) and self-supervised (Contrastive Predictive Coding) tasks. Using high-density resting-state EEG from 2,836 subjects in the Healthy Brain Network dataset, we show that optimal normalization strategies differ significantly between training paradigms. Window-level within-channel normalization yields the best performance in supervised tasks, while minimal or cross-channel normalization at the window level is more effective for SSL. These results underscore the necessity of task-specific normalization choices and challenge the assumption that a universal normalization strategy can generalize across learning settings. Our findings provide practical insights for developing robust EEG deep learning pipelines as the field shifts toward large-scale, foundation model training.

artificial intelligence, machine learning, normalization, (14 more...)