Goto

Collaborating Authors

 Industry


Care-PD: A Multi-Site Anonymized Clinical Dataset for Parkinson's Disease Gait Assessment

Neural Information Processing Systems

Objective gait assessment in Parkinson's Disease (PD) is limited by the absence of large, diverse, and clinically annotated motion datasets. We introduce Care-PD, the largest publicly available archive of 3D mesh gait data for PD, and the first multi-site collection spanning 9 cohorts from 8 clinical centers. All recordings (RGB video or motion capture) are converted into anonymized SMPL meshes via a harmonized preprocessing pipeline. Care-PD supports two key benchmarks: supervised clinical score prediction (estimating Unified Parkinson's Disease Rating Scale, UPDRS, gait scores) and unsupervised motion pretext tasks (2D-to-3D keypoint lifting and full-body 3D reconstruction). Clinical prediction is evaluated under four generalization protocols: within-dataset, cross-dataset, leave-one-dataset-out, and multi-dataset in-domain adaptation.To assess clinical relevance, we compare state-of-the-art motion encoders with a traditional gait-feature baseline, finding that encoders consistently outperform handcrafted features. Pretraining on Care-PD reduces MPJPE (from 60.8mm to 7.5mm) and boosts PD severity macro-F1 by 17\%, underscoring the value of clinically curated, diverse training data. Care-PD and all benchmark code are released for non-commercial research (Code, Data).


From Human Attention to Diagnosis: Semantic Patch-Level Integration of Vision-Language Models in Medical Imaging

Neural Information Processing Systems

Predicting human eye movements during goal-directed visual search is critical for enhancing interactive AI systems. In medical imaging, such prediction can support radiologists in interpreting complex data, such as chest X-rays. Many existing methods rely on generic vision--language models and saliency-based features, which can limit their ability to capture fine-grained clinical semantics and integrate domain knowledge effectively. We present \textbf{LogitGaze-Med}, a state-of-the-art multimodal transformer framework that unifies (1) domain-specific visual encoders (e.g., CheXNet), (2) textual embeddings of diagnostic labels, and (3) semantic priors extracted via the logit-lens from an instruction-tuned medical vision--language model (LLaVA-Med). By directly predicting continuous fixation coordinates and dwell durations, our model generates clinically meaningful scanpaths. Experiments on the GazeSearch dataset and synthetic scanpaths generated from MIMIC-CXR and validated by experts demonstrate that LogitGaze-Med improves scanpath similarity metrics by 20--30\% over competitive baselines and yields over 5\% gains in downstream pathology classification when incorporating predicted fixations as additional training data.


Time-Evolving Dynamical System for Learning Latent Representations of Mouse Visual Neural Activity

Neural Information Processing Systems

Seeking high-quality representations with latent variable models (LVMs) to reveal the intrinsic correlation between neural activity and behavior or sensory stimuli has attracted much interest. In the study of the biological visual system, naturalistic visual stimuli are inherently high-dimensional and time-dependent, leading to intricate dynamics within visual neural activity. However, most work on LVMs has not explicitly considered neural temporal relationships. To cope with such conditions, we propose Time-Evolving Visual Dynamical System (TE-ViDS), a sequential LVM that decomposes neural activity into low-dimensional latent representations that evolve over time. To better align the model with the characteristics of visual neural activity, we split latent representations into two parts and apply contrastive learning to shape them. Extensive experiments on synthetic datasets and real neural datasets from the mouse visual cortex demonstrate that TE-ViDS achieves the best decoding performance on naturalistic scenes/movies, extracts interpretable latent trajectories that uncover clear underlying neural dynamics, and provides new insights into differences in visual information processing between subjects and between cortical regions. In summary, TE-ViDS is markedly competent in extracting stimulus-relevant embeddings from visual neural activity and contributes to the understanding of visual processing mechanisms.


Evolutionary Reasoning Does Not Arise in Standard Usage of Protein Language Models

Neural Information Processing Systems

Protein language models (PLMs) are often assumed to capture evolutionary information by training on large protein sequence datasets. Yet it remains unclear whether PLMs can reason about evolution--that is, infer evolutionary relationships between sequences. We test this capability by evaluating whether standard PLM usage, frozen or fine-tuned embeddings with distance-based comparison, supports evolutionary reasoning. Existing PLMs consistently fail to recover phylogenetic structure, despite strong performance on sequence-level tasks such as masked-token and contact prediction. We present Phyla, a hybrid state-space and transformer model that jointly processes multiple sequences and is trained using a tree-based objective across 3,000 phylogenies spanning diverse protein families.


MONITRS: Multimodal Observations of Natural Incidents Through Remote Sensing

Neural Information Processing Systems

Natural disasters cause devastating damage to communities and infrastructure every year. Effective disaster response is hampered by the difficulty of accessing affected areas during and after events. Remote sensing has allowed us to monitor natural disasters in a remote way. More recently there have been advances in computer vision and deep learning that help automate satellite imagery analysis, However, they remain limited by their narrow focus on specific disaster types, reliance on manual expert interpretation, and lack of datasets with sufficient temporal granularity or natural language annotations for tracking disaster progression. We present MONITRS, a novel multimodal dataset of $\sim$10,000 FEMA disaster events with temporal satellite imagery with natural language annotations from news articles, accompanied by geotagged locations, and question-answer pairs. We demonstrate that fine-tuning existing MLLMs on our dataset yields significant performance improvements for disaster monitoring tasks, establishing a new benchmark for machine learning-assisted disaster response systems.


Multimodal 3D Genome Pre-training

Neural Information Processing Systems

Deep learning techniques have driven significant progress in various analytical tasks within 3D genomics in computational biology. However, a holistic understanding of 3D genomics knowledge remains underexplored.


A Standardized Benchmark for Multilabel Antimicrobial Peptide Classification

Neural Information Processing Systems

Antimicrobial peptides have emerged as promising molecules to combat antimicrobial resistance. However, fragmented datasets, inconsistent annotations, and the lack of standardized benchmarks hinder computational approaches and slow down the discovery of new candidates. To address these challenges, we present the Expanded Standardized Collection for Antimicrobial Peptide Evaluation (ESCAPE), an experimental framework integrating over 80.000 peptides from 27 validated repositories. Our dataset separates antimicrobial peptides from negative sequences and incorporates their functional annotations into a biologically coherent multilabel hierarchy, capturing activities across antibacterial, antifungal, antiviral, and antiparasitic classes. Building on ESCAPE, we propose a transformer-based model that leverages sequence and structural information to predict multiple functional activities of peptides. Our method achieves up to a 2.56% relative average improvement in mean Average Precision over the second-best method adapted for this task, establishing a new state-of-the-art multilabel peptide classification. ESCAPE provides a comprehensive and reproducible evaluation framework to advance AI-driven antimicrobial peptide research.


DyG-Mamba: Continuous State Space Modeling on Dynamic Graphs

Neural Information Processing Systems

Dynamic graph modeling aims to uncover evolutionary patterns in real-world systems, enabling accurate social recommendation and early detection of cancer cells. Inspired by the success of recent state space models in efficiently capturing long-term dependencies, we propose DyG-Mamba by translating dynamic graph modeling into a long-term sequence modeling problem. Specifically, inspired by Ebbinghaus' forgetting curve, we treat the irregular timespans between events as control signals, allowing DyG-Mamba to dynamically adjust the forgetting of historical information. This mechanism ensures effective usage of irregular timespans, thereby improving both model effectiveness and inductive capability. In addition, inspired by Ebbinghaus' review cycle, we redefine core parameters to ensure that DyG-Mamba selectively reviews historical information and filters out noisy inputs, further enhancing the model's robustness. Through exhaustive experiments on 12 datasets covering dynamic link prediction and node classification tasks, we show that DyG-Mamba achieves state-of-the-art performance on most datasets, while demonstrating significantly improved computational and memory efficiency.


From Synapses to Dynamics: Obtaining Function from Structure in a Connectome Constrained Model of the Head Direction Circuit

Neural Information Processing Systems

How precisely does circuit wiring specify function? This fundamental question is particularly relevant for modern neuroscience, as large-scale electron microscopy now enables the reconstruction of neural circuits at single-synapse resolution across many organisms. To interpret circuit function from such datasets, we must understand the extent to which [measured] structure constrains dynamics. We investigate this question in the drosophila head direction (HD) circuit, which maintains an internal heading estimate through attractor dynamics that integrate self-motion velocity cues. This circuit serves as a sensitive assay for functional specification: continuous attractor networks are theoretically known to require finely tuned wiring, whereas connectomes reveal that biological wiring can be variable and omit key cellular parameters such as synaptic gains, neuronal thresholds, and time constants. We introduce a method that combines self-supervised and unsupervised learning objectives to estimate unknown parameters at the level of cell types, rather than individual neurons and synapses. Given the raw connectivity matrix, our approach recovers a network that robustly exhibits continuous attractor dynamics and accurately integrates a range of velocity inputs, despite minimal parameter tuning on a connectome which notably departs from the symmetric regularity of an idealized ring attractor. We characterize how deviations from the original connectome shape the space of viable solutions. We also perform in-silico ablation experiments to probe the distinct functional roles of specific cell types in the circuit, demonstrating how connectome-derived structure, when augmented with minimal, biologically grounded tuning, can replicate known physiology and elucidate circuit function.


Simulating Viva Voce Examinations to Evaluate Clinical Reasoning in Large Language Models

Neural Information Processing Systems

Clinical reasoning in medicine is a hypothesis-driven process where physicians refine diagnoses from limited information through targeted history, physical examination, and diagnostic investigations. In contrast, current medical benchmarks for large language models (LLMs) primarily assess knowledge recall through single-turn questions, where complete clinical information is provided upfront. To address this gap, we introduce VivaBench, a multi-turn benchmark that evaluates sequential clinical reasoning in LLM agents. Our dataset consists of 1762 physician-curated clinical vignettes structured as interactive scenarios that simulate a $ \textit{viva voce}$ (oral) examination in medical training, requiring agents to actively probe for relevant findings, select appropriate investigations, and synthesize information across multiple steps to reach a diagnosis. While current LLMs demonstrate competence in diagnosing conditions from well-described clinical presentations, their performance degrades significantly when required to navigate iterative diagnostic reasoning under uncertainty in our evaluation. Our analysis identified several failure modes that mirror common cognitive errors in clinical practice, including: (1) fixation on initial hypotheses, (2) inappropriate investigation ordering, (3) premature diagnostic closure, and (4) failing to screen for critical conditions. These patterns reveal fundamental limitations in how current LLMs reason and make decisions under uncertainty. Through VivaBench, we provide a standardized benchmark for evaluating conversational medical AI systems for real-world clinical decision support. Beyond medical applications, we contribute to the larger corpus of research on agentic AI by demonstrating how sequential reasoning trajectories can diverge in complex decision-making environments.