AITopics

Technology: Information Technology > Artificial Intelligence > Vision (0.73)

Neural Information Processing SystemsFeb-14-2026, 17:21:24 GMT

AdversarialExamplesarenotBugs,theyareFeatures

Wedemonstrate that adversarial examples can be directly attributed to the presence of non-robust features: features (derived from patterns in the data distribution) that are highly predictive, yet brittle and (thus) incomprehensible to humans. After capturing these features within a theoretical framework, we establish their widespread existence in standard datasets.

artificial intelligence, classifier, machine learning, (16 more...)

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.05)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Asia (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)

Neural Information Processing SystemsFeb-8-2026, 21:46:00 GMT

DataDiversification: ASimpleStrategyForNeural MachineTranslation

Our method is applicable to all NMT models. It does not require extra monolingual data like back-translation, nor does it add more computations and parameters like ensembles ofmodels.

machine learning, natural language, urlhttp, (18 more...)

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Asia > Singapore (0.05)
North America > United States > Texas > Travis County > Austin (0.04)
(8 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.98)

Neural Information Processing SystemsDec-25-2025, 23:37:39 GMT

Train Once and Explain Everywhere: Pre-training Interpretable Graph Neural Networks

Intrinsic interpretable graph neural networks aim to provide transparent predictions by identifying the influential fraction of the input graph that guides the model prediction, i.e., the explanatory subgraph. However, current interpretable GNNs mostly are dataset-specific and hard to generalize to different graphs. A more generalizable GNN interpretation model which can effectively distill the universal structural patterns of different graphs is until-now unexplored. Motivated by the great success of recent pre-training techniques, we for the first time propose the Pre-training Interpretable Graph Neural Network ($\pi$-GNN) to distill the universal interpretability of GNNs by pre-training over synthetic graphs with ground-truth explanations. Specifically, we introduce a structural pattern learning module to extract diverse universal structure patterns and integrate them together to comprehensively represent the graphs of different types. Next, a hypergraph refining module is proposed to identify the explanatory subgraph by incorporating the universal structure patterns with local edge interactions. Finally, the task-specific predictor is cascaded with the pre-trained $\pi$-GNN model and fine-tuned over downstream tasks. Extensive experiments demonstrate that $\pi$-GNN significantly surpasses the leading interpretable GNN baselines with up to 9.98\% interpretation improvement and 16.06\% classification accuracy improvement. Meanwhile, $\pi$-GNN pre-trained on graph classification task also achieves the top-tier interpretation performance on node classification task, which further verifies its promising generalization performance among different downstream tasks.

name change, pre-training interpretable graph neural network, train once, (9 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.52)

Neural Information Processing SystemsOct-2-2025, 23:22:24 GMT

6f2268bd1d3d3ebaabb04d6b5d099425-AuthorFeedback.pdf

artificial intelligence, regularization, robust classifier, (8 more...)

Technology: Information Technology > Artificial Intelligence > Vision (0.73)

arXiv.org Artificial IntelligenceSep-23-2025

End-to-end RL Improves Dexterous Grasping Policies

Singh, Ritvik, Van Wyk, Karl, Abbeel, Pieter, Malik, Jitendra, Ratliff, Nathan, Handa, Ankur

This work explores techniques to scale up image-based end-to-end learning for dexterous grasping with an arm + hand system. Unlike state-based RL, vision-based RL is much more memory inefficient, resulting in relatively low batch sizes, which is not amenable for algorithms like PPO. Nevertheless, it is still an attractive method as unlike the more commonly used techniques which distill state-based policies into vision networks, end-to-end RL can allow for emergent active vision behaviors. We identify a key bottleneck in training these policies is the way most existing simulators scale to multiple GPUs using traditional data parallelism techniques. We propose a new method where we disaggregate the simulator and RL (both training and experience buffers) onto separate GPUs. On a node with four GPUs, we have the simulator running on three of them, and PPO running on the fourth. We are able to show that with the same number of GPUs, we can double the number of existing environments compared to the previous baseline of standard data parallelism. This allows us to train vision-based environments, end-to-end with depth, which were previously performing far worse with the baseline. We train and distill both depth and state-based policies into stereo RGB networks and show that depth distillation leads to better results, both in simulation and reality. This improvement is likely due to the observability gap between state and vision policies which does not exist when distilling depth policies into stereo RGB. We further show that the increased batch size brought about by disaggregated simulation also improves real world performance. When deploying in the real world, we improve upon the previous state-of-the-art vision-based results using our end-to-end policies.

artificial intelligence, arxiv, machine learning, (17 more...)

2509.16434

Genre: Research Report (0.86)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots > Manipulation (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.68)
Information Technology > Artificial Intelligence > Vision (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)

arXiv.org Artificial IntelligenceSep-22-2025

Global Pre-fixing, Local Adjusting: A Simple yet Effective Contrastive Strategy for Continual Learning

Tang, Jia, Wang, Xinrui, Chen, Songcan

Continual learning (CL) involves acquiring and accumulating knowledge from evolving tasks while alleviating catastrophic forgetting. Recently, leveraging contrastive loss to construct more transferable and less forgetful representations has been a promising direction in CL. Despite advancements, their performance is still limited due to confusion arising from both inter-task and intra-task features. To address the problem, we propose a simple yet effective contrastive strategy named \textbf{G}lobal \textbf{P}re-fixing, \textbf{L}ocal \textbf{A}djusting for \textbf{S}upervised \textbf{C}ontrastive learning (GPLASC). Specifically, to avoid task-level confusion, we divide the entire unit hypersphere of representations into non-overlapping regions, with the centers of the regions forming an inter-task pre-fixed \textbf{E}quiangular \textbf{T}ight \textbf{F}rame (ETF). Meanwhile, for individual tasks, our method helps regulate the feature structure and form intra-task adjustable ETFs within their respective allocated regions. As a result, our method \textit{simultaneously} ensures discriminative feature structures both between tasks and within tasks and can be seamlessly integrated into any existing contrastive continual learning framework. Extensive experiments validate its effectiveness.

artificial intelligence, learning, machine learning, (16 more...)

2509.15347

Genre: Research Report > New Finding (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)

arXiv.org Artificial IntelligenceAug-4-2025

Integrating clinical reasoning into large language model-based diagnosis through etiology-aware attention steering

Li, Peixian, Tian, Yu, Tu, Ruiqi, Wu, Chengkai, Ren, Jingjing, Li, Jingsong

Objective: Large Language Models (LLMs) demonstrate significant capabilities in medical text understanding and generation. However, their diagnostic reliability in complex clinical scenarios remains limited. This study aims to enhance LLMs' diagnostic accuracy and clinical reasoning ability. Method: We propose an Etiology-Aware Attention Steering Framework to integrate structured clinical reasoning into LLM-based diagnosis. Specifically, we first construct Clinical Reasoning Scaffolding (CRS) based on authoritative clinical guidelines for three representative acute abdominal emergencies: acute appendicitis, acute pancreatitis, and acute cholecystitis. Next, we develop the Etiology-Aware Head Identification algorithm to pinpoint attention heads crucial for the model's etiology reasoning. To ensure reliable clinical reasoning alignment, we introduce the Reasoning-Guided Parameter-Efficient Fine-tuning that embeds etiological reasoning cues into input representations and steers the selected Etiology-Aware Heads toward critical information through a Reasoning-Guided Loss function. Result: On the Consistent Diagnosis Cohort, our framework improves average diagnostic accuracy by 15.65% and boosts the average Reasoning Focus Score by 31.6% over baselines. External validation on the Discrepant Diagnosis Cohort further confirms its effectiveness in enhancing diagnostic accuracy. Further assessments via Reasoning Attention Frequency indicate that our models exhibit enhanced reliability when faced with real-world complex scenarios. Conclusion: This study presents a practical and effective approach to enhance clinical reasoning in LLM-based diagnosis. By aligning model attention with structured CRS, the proposed framework offers a promising paradigm for building more interpretable and reliable AI diagnostic systems in complex clinical settings.

large language model, machine learning, natural language, (19 more...)

2508.00285

Country: Asia > China (0.14)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.94)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Gastroenterology (1.00)
Health & Medicine > Diagnostic Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Diagnosis (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.97)

arXiv.org Artificial IntelligenceJul-15-2025

MedGemma Technical Report

Sellergren, Andrew, Kazemzadeh, Sahar, Jaroensri, Tiam, Kiraly, Atilla, Traverse, Madeleine, Kohlberger, Timo, Xu, Shawn, Jamil, Fayaz, Hughes, Cían, Lau, Charles, Chen, Justin, Mahvar, Fereshteh, Yatziv, Liron, Chen, Tiffany, Sterling, Bram, Baby, Stefanie Anna, Baby, Susanna Maria, Lai, Jeremy, Schmidgall, Samuel, Yang, Lu, Chen, Kejia, Bjornsson, Per, Reddy, Shashir, Brush, Ryan, Philbrick, Kenneth, Asiedu, Mercy, Mezerreg, Ines, Hu, Howard, Yang, Howard, Tiwari, Richa, Jansen, Sunny, Singh, Preeti, Liu, Yun, Azizi, Shekoofeh, Kamath, Aishwarya, Ferret, Johan, Pathak, Shreya, Vieillard, Nino, Merhej, Ramona, Perrin, Sarah, Matejovicova, Tatiana, Ramé, Alexandre, Riviere, Morgane, Rouillard, Louis, Mesnard, Thomas, Cideron, Geoffrey, Grill, Jean-bastien, Ramos, Sabela, Yvinec, Edouard, Casbon, Michelle, Buchatskaya, Elena, Alayrac, Jean-Baptiste, Lepikhin, Dmitry, Feinberg, Vlad, Borgeaud, Sebastian, Andreev, Alek, Hardin, Cassidy, Dadashi, Robert, Hussenot, Léonard, Joulin, Armand, Bachem, Olivier, Matias, Yossi, Chou, Katherine, Hassidim, Avinatan, Goel, Kavi, Farabet, Clement, Barral, Joelle, Warkentin, Tris, Shlens, Jonathon, Fleet, David, Cotruta, Victor, Sanseviero, Omar, Martins, Gus, Kirk, Phoebe, Rao, Anand, Shetty, Shravya, Steiner, David F., Kirmizibayrak, Can, Pilgrim, Rory, Golden, Daniel, Yang, Lin

Artificial intelligence (AI) has significant potential in healthcare applications, but its training and deployment faces challenges due to healthcare's diverse data, complex tasks, and the need to preserve privacy. Foundation models that perform well on medical tasks and require less task-specific tuning data are critical to accelerate the development of healthcare AI applications. We introduce MedGemma, a collection of medical vision-language foundation models based on Gemma 3 4B and 27B. MedGemma demonstrates advanced medical understanding and reasoning on images and text, significantly exceeding the performance of similar-sized generative models and approaching the performance of task-specific models, while maintaining the general capabilities of the Gemma 3 base models. For out-of-distribution tasks, MedGemma achieves 2.6-10% improvement on medical multimodal question answering, 15.5-18.1% improvement on chest X-ray finding classification, and 10.8% improvement on agentic evaluations compared to the base models. Fine-tuning MedGemma further improves performance in subdomains, reducing errors in electronic health record information retrieval by 50% and reaching comparable performance to existing specialized state-of-the-art methods for pneumothorax classification and histopathology patch classification. We additionally introduce MedSigLIP, a medically-tuned vision encoder derived from SigLIP. MedSigLIP powers the visual understanding capabilities of MedGemma and as an encoder achieves comparable or better performance than specialized medical image encoders. Taken together, the MedGemma collection provides a strong foundation of medical image and text capabilities, with potential to significantly accelerate medical research and development of downstream applications. The MedGemma collection, including tutorials and model weights, can be found at https://goo.gle/medgemma.

arxiv preprint arxiv, large language model, machine learning, (21 more...)

2507.05201

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Dermatology (1.00)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.70)

Chari, Vivek, Qin, Guanghui, Van Durme, Benjamin

KV-Distill: Nearly Lossless Learnable Context Compression for LLMs

arXiv.org Artificial IntelligenceMar-13-2025

Sequence-to-sequence tasks often benefit from long contexts, but the quadratic complexity of self-attention in standard Transformers renders this non-trivial. During generation, temporary representations -stored in the so-called KV cache-account for a large portion of GPU memory usage and scale linearly with context length. We introduce KV-Distill, a Transformer compression framework that distills long context KV caches into significantly shorter representations in a question-independent fashion. KV-Distill can be trained as a parameter-efficient adaptor for pretrained models, and enables the compression of arbitrary spans of a context while preserving pre-trained model capabilities. We treat a compressed-uncompressed cache as a student-teacher pairing and apply a KL-type divergence to match the generated outputs. KV-Distill outperforms other compression techniques in worst-case extractive tasks and approaches uncompressed performance in long context question answering and summarization, and it can be fine-tuned on domain-specific contexts to reduce lengths by up to 99% while preserving downstream performance. We demonstrate the generalizability of KV-Distill across various model sizes and architectures.

compression, kv cache, proceedings, (16 more...)

2503.10337

Country:

South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
North America > Mexico > Mexico City > Mexico City (0.04)
Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)