Technology
C3Po: Cross-View Cross-Modality Correspondence by Pointmap Prediction
Geometric models like DUSt3R have shown great advances in understanding the geometry of a scene from pairs of photos. However, they fail when the inputs are from vastly different viewpoints (e.g., aerial vs. ground) or modalities (e.g., photos vs. abstract drawings) compared to what was observed during training. This paper addresses a challenging version of this problem: predicting correspondences between ground-level photos and floor plans. Current datasets for joint photo-floor plan reasoning are limited, either lacking in varying modalities (VIGOR) or lacking in correspondences (WAFFLE). To address these limitations, we introduce a new dataset, C3, created by first reconstructing a number of scenes in 3D from Internet photo collections via structure-from-motion, then manually registering the reconstructions to floor plans gathered from the Internet, from which we can derive correspondences between images and floor plans.
Integrating Drug Substructures and Longitudinal Electronic Health Records for Personalized Drug Recommendation
Drug recommendation systems aim to identify optimal drug combinations for patient care, balancing therapeutic efficacy and safety. Advances in large-scale longitudinal EHRs have enabled learning-based approaches that leverage patient histories such as diagnoses, procedures, and previously prescribed drugs, to model complex patient-drug relationships. Yet, many existing solutions overlook standard clinical practices that favor certain drugs for specific conditions and fail to fully integrate the influence of molecular substructures on drug efficacy and safety. In response, we propose SubRec, a unified framework that integrates representation learning across both patient and drug spaces. Specifically, SubRec introduces a conditional information bottleneck to extract core drug substructures most relevant to patient conditions, thereby enhancing interpretability and clinical alignment. Meanwhile, an adaptive vector quantization mechanism is designed to generate patient-drug interaction patterns into a condition-aware codebook which reuses clinically meaningful patterns, reduces training overhead, and provides a controllable latent space for recommendation. Crucially, the synergy between condition-specific substructure learning and discrete patient prototypes allows SubRec to make accurate and personalized drug recommendations. Experimental results on the real-world MIMICIII and IV demonstrate our model's advantages. The source code is available at https://DrugRecommendation/.
Multi-SWE-bench: AMultilingual Benchmark for Issue Resolving
The task of issue resolving aims to modify a codebase to generate a patch that addresses a given issue. However, most existing benchmarks focus almost exclusively on Python, making them insufficient for evaluating Large Language Models (LLMs) across different programming languages. To bridge this gap, we introduce a multilingual issue-resolving benchmark, called Multi-SWE-bench, covering 8 widely used programming languages: Python, Java, TypeScript, JavaScript, Go, Rust, C, and C++. In particular, this benchmark includes a total of 2,132 highquality instances, carefully curated by 68 expert annotators, ensuring a reliable and accurate evaluation of LLMs on the issue-resolving task. Based on humanannotated results, the issues are further classified into three difficulty levels. We evaluate a series of state-of-the-art models on Multi-SWE-bench, utilizing both procedural and agent-based frameworks for issue resolving. Experimental results based on Multi-SWE-bench reveal three key findings: (1) Limited generalization across languages: While existing LLMs perform well on Python issues, their ability to generalize across other languages remains limited; (2) Performance aligned with human-annotated difficulty: LLM-based agents' performance closely aligns with human-assigned difficulty, with resolved rates notably decreasing as issue complexity rises; and (3) Performance drop on cross-file issues: The performance of current methods significantly deteriorates when handling cross-file issues. These findings highlight the limitations of current LLMs and underscore the need for more robust models capable of handling a broader range of programming languages and complex issue scenarios.
Stackelberg Self-Annotation: ARobust Approach to Data-Efficient LLMAlignment
Aligning large language models (LLMs) with human preferences typically demands vast amounts of meticulously curated data, which is both expensive and prone to labeling noise. We propose Stackelberg Game Preference Optimization (SGPO), a robust alignment framework that models alignment as a two-player Stackelberg game between a policy (leader) and a worst-case preference distribution (follower). The proposed SGPO guarantees O(ฯต)-bounded regret within an ฯต-Wasserstein ball, offering formal robustness to (self-)annotation noise. We instantiate SGPO with Stackelberg Self-Annotated Preference Optimization (SSAPO), which uses minimal humanlabeled "seed" preferences and iteratively self-annotates new prompts. In each iteration, SSAPO applies a distributionally robust reweighting of synthetic annotations, ensuring that noisy or biased self-labels do not derail training. Remarkably, using only 2K seed preferences--about 1/30 of standard human labels--SSAPO achieves strong win rates against GPT-4 across multiple benchmarks within three iterations.
BMMR: ALarge-Scale Bilingual Multimodal Multi-Discipline Reasoning Dataset
In this paper, we introduce BMMR, a large-scale bilingual, multimodal, multidisciplinary reasoning dataset for the community to develop and evaluate large multimodal models (LMMs). BMMR comprises 110k college-level questions spanning 300 UNESCO-defined subjects, spanning diverse formats--multiplechoice, fill-in-the-blank, and open-ended QA--and sourced from both print and digital media such as books, exams, and quizzes. All data are curated and filtered via a human-in-the-loop and scalable framework, and each instance is paired with a high-quality reasoning path. The dataset is organized into two parts: BMMR-Eval that comprises 20,458high-quality instances to comprehensively assess LMMs' knowledge and reasoning across multiple disciplines in both Chinese and English; and BMMR-Train that contains 88,991 instances to support further research and development, extending the current focus on mathematical reasoning to diverse disciplines and domains. In addition, we propose the process-based multi-discipline verifier (i.e., BMMR-Verifier) for accurate and fine-grained evaluation of reasoning paths. Extensive experiments on 24 models reveal that (i) even SOTA models (e.g., o3and Gemini-2.5-Pro)
Towards Understanding the Mechanisms of Classifier-Free Guidance
Classifier-free guidance (CFG) is a core technique powering state-of-the-art image generation systems, yet its underlying mechanisms remain poorly understood. In this work, we begin by analyzing CFG in a simplified linear diffusion model, where we show its behavior closely resembles that observed in the nonlinear case. Our analysis reveals that linear CFG improves generation quality via three distinct components: (i) a mean-shift term that approximately steers samples in the direction of class means, (ii) a positive Contrastive Principal Components (CPC) term that amplifies class-specific features, and (iii) a negative CPC term that suppresses generic features prevalent in unconditional data. We then verify these insights in real-world, nonlinear diffusion models: over a broad range of noise levels, linear CFG resembles the behavior of its nonlinear counterpart. Although the two eventually diverge at low noise levels, we discuss how the insights from the linear analysis still shed light on the CFG's mechanism in the nonlinear regime.
Why road trips are good for you, according to science
Driving into the sunset can actually form new neural pathways. More information Adding us as a Preferred Source in Google by using this link indicates that you would like to see more of our content in Google News results. Seeing a landscape or place that takes your breath away is actually good for your brain. Breakthroughs, discoveries, and DIY tips sent six days a week. By signing up, you confirm you are 16+, will receive newsletters and promotional content and agree to our Terms of Use and acknowledge the data practices in our Privacy Policy .
Set-LLM: A Permutation-Invariant LLM
While large language models (LLMs) demonstrate impressive capabilities across numerous applications, their robustness remains a critical concern. This paper is motivated by a specific vulnerability: the order sensitivity of LLMs. This vulnerability manifests itself as the order bias observed when LLMs decide between possible options (for example, a preference for the first option) and the tendency of LLMs to provide different answers when options are reordered. The use cases for this scenario extend beyond the classical case of multiple-choice question answering to the use of LLMs for multidocument tasks and as automated evaluators in AI pipelines. We introduce Set-LLM, a novel architectural adaptation for pretrained LLMs that enables the processing of mixed set-text inputs with permutation invariance guarantees. The adaptations involve a new attention mask and new positional encodings specifically designed for sets. We provide a theoretical proof of invariance and demonstrate through experiments that Set-LLM can be trained effectively, achieving comparable or improved performance and maintaining the runtime of the original model, while altogether eliminating order sensitivity.
ChatGPT can check your PC's health without touching your files. Here's how
PCWorld explains how Windows 11 users can leverage AI chatbots like ChatGPT, Claude, or Gemini to analyze their PC's health using MSINFO32 system reports. This method allows comprehensive system analysis without granting AI direct file access, maintaining privacy while getting actionable insights about hardware and software issues.