Industry
OrthoLoC: UAV 6-DoF Localization and Calibration Using Orthographic Geodata
Accurate visual localization from aerial views is a fundamental problem with applications in mapping, large-area inspection, and search-and-rescue operations. In many scenarios, these systems require high-precision localization while operating with limited resources (e.g., no internet connection or GNSS/GPS support), making large image databases or heavy 3D models impractical. Surprisingly, little attention has been given to leveraging orthographic geodata as an alternative paradigm, which is lightweight and increasingly available through free releases by governmental authorities (e.g., the European Union). To fill this gap, we propose OrthoLoC, the first large-scale dataset comprising 16,425 UAV images from Germany and the United States with multiple modalities.
WeatherPrompt: Multi-modality Representation Learning for All-Weather Drone Visual Geo-Localization
Visual geo-localization for drones faces critical degradation under weather perturbations, e.g., rain and fog, where existing methods struggle with two inherent limitations: 1) heavy reliance on limited weather categories that constrain generalization, and 2) suboptimal disentanglement of entangled scene-weather features through pseudo weather categories. We present WeatherPrompt, a multi-modality learning paradigm that establishes weather-invariant representations by fusing the image embedding with the text context. Our framework introduces two key contributions. First, a Training-free Weather Reasoning mechanism employs off-the-shelf large multi-modality models to synthesize multi-weather textual descriptions through human-like reasoning. It improves scalability to unseen or complex weather and can reflect different weather intensities. Second, to better disentangle the scene and weather features, we propose a multi-modality framework with a dynamic gating mechanism driven by the text embedding to adaptively reweight and fuse visual features across modalities. The framework is further optimized by cross-modal objectives, including image-text contrastive learning and image-text matching, which map the same scene under different weather conditions closer together in the representation space. Extensive experiments validate that, under diverse weather conditions, our method achieves competitive recall rates compared to state-of-the-art drone geo-localization methods. Notably, it improves Recall@1 by 13.37% under night conditions and by 18.69% under fog and snow conditions.
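As a rough illustration of what a text-driven gate can look like, the sketch below computes one sigmoid gate per feature channel from a text embedding and uses it to blend two visual feature vectors. This is a generic gating pattern, not the WeatherPrompt architecture: the function name `text_gated_fusion`, the linear gate, and all toy values are assumptions made for the example.

```python
import math


def sigmoid(x):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def text_gated_fusion(text_emb, feat_a, feat_b, gate_weights):
    """Blend two feature vectors with per-channel gates derived from text.

    gate_weights holds one row per feature channel (hypothetical learned
    parameters); each row is dotted with the text embedding to get a logit.
    """
    gates = [
        sigmoid(sum(w * t for w, t in zip(row, text_emb)))
        for row in gate_weights
    ]
    # A gate near 1 keeps feat_a, a gate near 0 keeps feat_b.
    return [g * a + (1.0 - g) * b for g, a, b in zip(gates, feat_a, feat_b)]


# Toy inputs: a 2-d text embedding and two 2-channel visual features.
text_emb = [1.0, 0.0]
feat_a = [2.0, 4.0]
feat_b = [0.0, 0.0]

# Zero weights give a gate of 0.5 everywhere, i.e. a plain average.
neutral = text_gated_fusion(text_emb, feat_a, feat_b, [[0.0, 0.0], [0.0, 0.0]])

# Strong weights push channel 0 toward feat_a and channel 1 toward feat_b.
decisive = text_gated_fusion(text_emb, feat_a, feat_b, [[50.0, 0.0], [-50.0, 0.0]])
```

In a trained model the gate weights would be learned jointly with the encoders; here they are fixed by hand only to show how the text embedding changes the mix.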
seq-JEPA: Autoregressive Predictive Learning of Invariant-Equivariant World Models
Joint-embedding self-supervised learning (SSL) commonly relies on transformations such as data augmentation and masking to learn visual representations, a task achieved by enforcing invariance or equivariance with respect to these transformations applied to two views of an image. This dominant two-view paradigm in SSL often limits the flexibility of learned representations for downstream adaptation by creating performance trade-offs between high-level invariance-demanding tasks such as image classification and more fine-grained equivariance-related tasks. In this work, we propose seq-JEPA, a world modeling framework that introduces architectural inductive biases into joint-embedding predictive architectures to resolve this trade-off. Without relying on dual equivariance predictors or loss terms, seq-JEPA simultaneously learns two architecturally separate representations for equivariance- and invariance-demanding tasks. To do so, our model processes short sequences of different views (observations) of inputs.
Q♯: Provably Optimal Distributional RL for LLM Post-Training
Reinforcement learning (RL) post-training is crucial for LLM alignment and reasoning, but existing policy-based methods, such as PPO and DPO, can fall short of fixing shortcuts inherited from pre-training. In this work, we introduce Q♯, a value-based algorithm for KL-regularized RL that guides the reference policy using the optimal regularized Q function. We propose to learn the optimal Q function using distributional RL on an aggregated online dataset. Unlike prior value-based baselines that guide the model using unregularized Q-values, our method is theoretically principled and provably learns the optimal policy for the KL-regularized RL problem. Empirically, Q♯ outperforms prior baselines in math reasoning benchmarks while maintaining a smaller KL divergence to the reference policy. Theoretically, we establish a reduction from KL-regularized RL to no-regret online learning, providing the first bounds for deterministic MDPs under only realizability. Thanks to distributional RL, our bounds are also variance-dependent and converge faster when the reference policy has small variance. In sum, our results highlight Q♯ as an effective approach for post-training LLMs, offering both improved performance and theoretical guarantees. The code can be found at https://github.com/jinpz/q_sharp.
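To make "guiding the reference policy with a regularized Q function" concrete, the sketch below applies the standard closed form for KL-regularized objectives, in which each action's reference probability is reweighted by exp(Q / eta) and the result is renormalized. It is a toy single-state illustration, not the authors' implementation: the Q-values are made up rather than learned, and the names `guided_policy`, `kl_divergence`, and `eta` are assumptions for the example.

```python
import math


def guided_policy(ref_probs, q_values, eta):
    """Reweight a reference distribution by exp(Q / eta) and renormalize.

    Assumes every reference probability is strictly positive and eta > 0.
    """
    if eta <= 0:
        raise ValueError("eta must be positive")
    # Work in log space and subtract the max for numerical stability.
    logits = [math.log(p) + q / eta for p, q in zip(ref_probs, q_values)]
    top = max(logits)
    weights = [math.exp(l - top) for l in logits]
    total = sum(weights)
    return [w / total for w in weights]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions on the same support."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Toy example: three candidate actions in one state.
ref = [0.5, 0.3, 0.2]
q = [0.0, 1.0, 0.2]  # made-up Q-values, for illustration only

sharp = guided_policy(ref, q, eta=0.1)    # weak regularization
smooth = guided_policy(ref, q, eta=10.0)  # strong regularization
```

A small `eta` lets the Q-values dominate and concentrates mass on the highest-valued action, while a large `eta` keeps the guided policy close to the reference, which shows up as a smaller KL divergence.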
Anthropic Is Still at Odds With the White House Over Claude Fable 5
Anthropic leaders flew to Washington, DC, to meet with White House officials on Monday. Trump administration officials concluded talks with Anthropic on Monday without lifting export controls that were imposed last week on the company's most advanced AI models in response to jailbreaking concerns, according to three people briefed on the matter. The administration continues to believe that there are ways to disable some of the guardrails on Anthropic's Claude Fable 5, effectively allowing users to access the more powerful cybersecurity capabilities of the company's Mythos model, the people said. Anthropic has said for days that the administration's concerns are overblown, a position it reiterated in working group meetings held at the Commerce Department with government researchers from the Center for AI Standards and Innovation (CAISI) and the Office of the National Cyber Director Sean Cairncross, one of the people said. The meetings were also attended by Commerce Secretary Howard Lutnick, who dialed in by conference call from the G7 summit in Evian, France.
RoMa: A Robust Model Watermarking Scheme for Protecting IP in Diffusion Models
Model watermarking is a common practice for IP protection that embeds traceable information within models and allows for further verification. Nevertheless, existing watermarking schemes often face challenges due to their vulnerability to fine-tuning, limiting their practical application in general pretraining and fine-tuning paradigms. Inspired by using mode connectivity to analyze model performance between a pair of connected models, we investigate watermark vulnerability by leveraging Linear Mode Connectivity (LMC) as a proxy to analyze the fine-tuning dynamics of watermark performance. Our results show that existing watermarked models tend to converge to sharp minima in the loss landscape, thus making them vulnerable to fine-tuning. To tackle this challenge, we propose RoMa, a Robust Model watermarking scheme that improves the robustness of watermarks against fine-tuning. Specifically, RoMa decomposes watermarking into two components, including Embedding Functionality, which preserves reliable watermark detection capability, and Path-specific Smoothness, which enhances the smoothness along the watermark-connected path to improve robustness. Extensive experiments on benchmark datasets MS-COCO-2017 and CUB-200-2011 demonstrate that RoMa significantly improves watermark robustness against fine-tuning while maintaining generation quality, outperforming baselines. The code is available at https://github.com/xiekks/RoMa.
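Linear mode connectivity is usually probed by evaluating a loss along the straight line between two parameter vectors and checking for a bump (a "barrier") above the endpoints. The sketch below shows that generic procedure on toy one- and two-parameter losses. It is an illustration under stated assumptions, not the RoMa code: `loss_barrier`, the toy loss functions, and the 11-point grid are all hypothetical, and a real study would plug in a watermark-detection loss and actual model weights.

```python
def interpolate(theta_a, theta_b, alpha):
    """Point on the straight line from theta_a (alpha=0) to theta_b (alpha=1)."""
    return [(1 - alpha) * a + alpha * b for a, b in zip(theta_a, theta_b)]


def loss_barrier(loss_fn, theta_a, theta_b, steps=11):
    """Largest excess of the path loss over the linear blend of endpoint losses.

    A barrier near zero suggests the two parameter vectors are linearly
    connected; a large barrier indicates a ridge between them.
    """
    loss_a, loss_b = loss_fn(theta_a), loss_fn(theta_b)
    barrier = 0.0
    for i in range(steps):
        alpha = i / (steps - 1)
        path_loss = loss_fn(interpolate(theta_a, theta_b, alpha))
        baseline = (1 - alpha) * loss_a + alpha * loss_b
        barrier = max(barrier, path_loss - baseline)
    return barrier


def bowl(theta):
    """Convex toy loss: no barrier between any two points."""
    return sum(x * x for x in theta)


def double_well(theta):
    """Toy loss with minima at x = -1 and x = +1 separated by a ridge at 0."""
    return sum((x * x - 1) ** 2 for x in theta)


flat_path = loss_barrier(bowl, [1.0, 0.0], [0.0, 1.0])
ridge_path = loss_barrier(double_well, [-1.0], [1.0])
```

On the convex bowl the interpolated loss never exceeds the endpoint blend, so the barrier is zero, whereas the double-well loss has two equally good minima separated by a ridge that the straight path must cross.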
RoomEditor: High-Fidelity Furniture Synthesis with Parameter-Sharing U-Net
Virtual furniture synthesis, a critical task in image composition, aims to seamlessly integrate reference objects into indoor scenes while preserving geometric coherence and visual realism. Despite its significant potential in home design applications, this field remains underexplored due to two major challenges: the absence of publicly available and ready-to-use benchmarks hinders reproducible research, and existing image composition methods fail to meet the stringent fidelity requirements for realistic furniture placement. To address these issues, we introduce RoomBench, a ready-to-use benchmark dataset for virtual furniture synthesis, comprising 7,298 training pairs and 895 testing samples across 27 furniture categories. Then, we propose RoomEditor, a simple yet effective image composition method that employs a parameter-sharing dual U-Net architecture, ensuring better feature consistency by sharing weights between dual branches. Technical analysis reveals that conventional dual-branch architectures generally suffer from inconsistent intermediate features due to independent processing of reference and background images.
Statistical Inference for Gradient Boosting Regression
Gradient boosting is widely popular due to its flexibility and predictive accuracy. However, statistical inference and uncertainty quantification for gradient boosting remain challenging and under-explored. We propose a unified framework for statistical inference in gradient boosting regression. Our framework integrates dropout or parallel training with a recently proposed regularization procedure called Boulevard that allows for a central limit theorem (CLT) for boosting. With these enhancements, we surprisingly find that increasing the dropout rate and the number of trees grown in parallel at each iteration substantially enhances signal recovery and overall performance. Our resulting algorithms enjoy similar CLTs, which we use to construct built-in confidence intervals, prediction intervals, and rigorous hypothesis tests for assessing variable importance in only O(nd^2) time with the Nyström method. Numerical experiments verify the asymptotic normality and demonstrate that our algorithms perform well, do not require early stopping, interpolate between regularized boosting and random forests, and confirm the validity of their built-in statistical inference procedures.
ObCLIP: Oblivious CLoud-Device Hybrid Image Generation with Privacy Preservation
Diffusion models have gained significant popularity due to their remarkable capabilities in image generation, albeit at the cost of intensive computational requirements. Meanwhile, despite their widespread deployment in inference services such as Midjourney, concerns about the potential leakage of sensitive information in uploaded user prompts have arisen. Existing solutions either lack rigorous privacy guarantees or fail to strike an effective balance between utility and efficiency. To bridge this gap, we propose ObCLIP, a plug-and-play safeguard that enables oblivious cloud-device hybrid generation. By oblivious, we mean that each input prompt is transformed into a set of semantically similar candidate prompts that differ only in sensitive attributes (e.g., gender, ethnicity).
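The idea of hiding a prompt among candidates that differ only in sensitive attributes can be sketched with a simple template expansion. The abstract does not say how sensitive attributes are detected or how candidates are produced, so everything below is an assumption made for illustration: the hand-written template, the attribute vocabulary (which uses gender and age as stand-ins), and the function name `candidate_prompts`.

```python
from itertools import product

# Hypothetical attribute vocabulary; a real system would need a far richer
# and carefully curated set of values.
SENSITIVE_ATTRIBUTES = {
    "gender": ["man", "woman"],
    "age": ["young", "elderly"],
}


def candidate_prompts(template, attributes):
    """Expand a template into every combination of sensitive attribute values.

    The template uses str.format placeholders named after the attribute keys.
    """
    names = sorted(attributes)
    candidates = []
    for values in product(*(attributes[name] for name in names)):
        candidates.append(template.format(**dict(zip(names, values))))
    return candidates


# The user's real prompt is one member of the candidate set, so an observer
# who sees the whole set cannot tell which attribute values were intended.
template = "a portrait of a {age} {gender} reading in a cafe"
real_prompt = "a portrait of a young woman reading in a cafe"
candidates = candidate_prompts(template, SENSITIVE_ATTRIBUTES)
```

The candidate set grows multiplicatively with the number of attributes and values, which is one reason the trade-off between privacy, utility, and efficiency matters in this setting.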