Goto

Collaborating Authors

 Genre


Text-to-Code Generation for Modular Building Layouts in Building Information Modeling

Neural Information Processing Systems

We present Text2MBL, a text-to-code generation framework that generates executable Building Information Modeling (BIM) code directly from textual descriptions of modular building layout (MBL) design. Unlike conventional layout generation approaches that operate in 2D space, Text2MBL produces fully parametric, semantically rich BIM layouts through on the fly code instantiation. To address MBLs' unique challenges due to their hierarchical three-tier structure: modules (physical building blocks), units (self-contained dwellings), and rooms (functional spaces), we developed an object-oriented code architecture and fine-tuned large language models to output structured action sequences in code format. To train and evaluate the framework, we curated a dataset of paired descriptions and ground truth layouts drawn from real world modular housing projects. Performance were assessed using metrics for executable validity, semantic fidelity, and geometric consistency. By tightly unifying natural language understanding with BIM code generation, Text2MBL establishes a scalable pipeline from high-level conceptual design to automation-ready modular construction workflows.


CellVerse: Do Large Language Models Really Understand Cell Biology?

Neural Information Processing Systems

Recent studies have demonstrated the feasibility of modeling single-cell data as natural languages and the potential of leveraging powerful large language models (LLMs) for understanding cell biology. However, a comprehensive evaluation of LLMs' performance on language-driven single-cell analysis tasks still remains unexplored. Motivated by this challenge, we introduce CellVerse, a unified language-centric question-answering benchmark that integrates four types of single-cell multi-omics data and encompasses three hierarchical levels of single-cell analysis tasks: cell type annotation (cell-level), drug response prediction (drug-level), and perturbation analysis (gene-level). Going beyond this, we systematically evaluate the performance across 14 open-source and closed-source LLMs ranging 160M $\rightarrow$ 671B on CellVerse. Remarkably, the experimental results reveal: (1) Existing specialist models (C2S-Pythia) fail to make reasonable decisions across all sub-tasks within CellVerse, while generalist models such as Qwen, Llama, GPT, and DeepSeek family models exhibit preliminary understanding capabilities within the realm of cell biology.


3DPE-Gaze:Unlocking the Potential of 3D Facial Priors for Generalized Gaze Estimation

Neural Information Processing Systems

In recent years, face-based deep-learning gaze estimation methods have achieved significant advancements. However, while face images provide supplementary information beneficial for gaze inference, the substantial extraneous information they contain also increases the risk of overfitting during model training and compromises generalization capability. To alleviate this problem, we propose the 3DPE-Gaze framework, explicitly modeling 3D facial priors for feature decoupling and generalized gaze estimation. The 3DPE-Gaze framework consists of two core modules: the 3D Geometric Prior Module (3DGP) incorporating the FLAME model to parameterize facial structures and gaze-irrelevant facial appearances while extracting gaze features; the Semantic Concept Alignment Module (SCAM) separates gaze-related and unrelated concepts through CLIP-guided contrastive learning. Finally, the 3DPE-Gaze framework combines 3D facial landmark as prior for generalized gaze estimation. Experimental results show that 3DPE-Gaze outperforms existing state-of-the-art methods on four major cross-domain tasks, with particularly outstanding performance in challenging scenarios such as lighting variations, extreme head poses, and glasses occlusion.


Why Real-Life Disclosure Day Will Look Nothing Like Steven Spielberg's New Movie

WIRED

Why Real-Life Disclosure Day Will Look Nothing Like Steven Spielberg's New Movie Previous landmark scientific discoveries like the Higgs boson provide a better template for what it will take to confirm whether aliens have made contact with Earth. Steven Spielberg's new film imagines the moment 8 billion humans find out that we are not alone in the universe. The movie, which opens in US theaters on June 12, is a fictional account of the government cover-up and subsequent "disclosure" of evidence that aliens have contacted Earth. The UFO community has been chasing that type of cinematic big reveal for 80 years. But it's more likely that monumental scientific discoveries, like the detection of the Higgs boson in 2012 and the confirmation of gravitational waves in 2016, are a better guideline for how real-world disclosure is likely to play out: through long-running research and with verifiable results.


The Quest for Universal Master Key Filters in DS-CNNs

Neural Information Processing Systems

A recent study has proposed the ``Master Key Filters Hypothesis for convolutional neural network filters. This paper extends this hypothesis by radically constraining its scope to a single set of just 8 universal filters that depthwise separable convolutional networks inherently converge to. While conventional DS-CNNs employ thousands of distinct trained filters, our analysis reveals these filters are predominantly linear shifts (ax+b) of our discovered universal set. Through systematic unsupervised search, we extracted these fundamental patterns across different architectures and datasets. Remarkably, networks initialized with these 8 unique frozen filters achieve over 80\% ImageNet accuracy, and even outperform models with thousands of trainable parameters when applied to smaller datasets. The identified master key filters closely match Difference of Gaussians (DoGs), Gaussians, and their derivatives, structures that are not only fundamental to classical image processing but also strikingly similar to receptive fields in mammalian visual systems. Our findings provide compelling evidence that depthwise convolutional layers naturally gravitate toward this fundamental set of spatial operators regardless of task or architecture. This work offers new insights for understanding generalization and transfer learning through the universal language of these master key filters.


Rectifying Shortcut Behaviors in Preference-based Reward Learning

Neural Information Processing Systems

In reinforcement learning from human feedback, preference-based reward models play a central role in aligning large language models to human-aligned behavior. However, recent studies show that these models are prone to reward hacking and often fail to generalize well due to over-optimization. They achieve high reward scores by exploiting shortcuts, that is, exploiting spurious features (e.g., response verbosity, agreeable tone, or sycophancy) that correlate with human preference labels in the training data rather than genuinely reflecting the intended objectives. In this paper, instead of probing these issues one at a time, we take a broader view of the reward hacking problem as shortcut behaviors and introduce a principled yet flexible approach to mitigate shortcut behaviors in preference-based reward learning. Inspired by the invariant theory in the kernel perspective, we propose Preference-based Reward Invariance for Shortcut Mitigation (PRISM), which learns group-invariant kernels with feature maps in a closed-form learning objective. Experimental results in several benchmarks show that our method consistently improves the accuracy of the reward model on diverse out-of-distribution tasks and reduces the dependency on shortcuts in downstream policy models, establishing a robust framework for preference-based alignment.


Short-length Adversarial Training Helps LLMs Defend Long-length Jailbreak Attacks: Theoretical and Empirical Evidence

Neural Information Processing Systems

Jailbreak attacks against large language models (LLMs) aim to induce harmful behaviors in LLMs through carefully crafted adversarial prompts. To mitigate attacks, one way is to perform adversarial training (AT)-based alignment, i.e., training LLMs on some of the most adversarial prompts to help them learn how to behave safely under attacks. During AT, the length of adversarial prompts plays a critical role in the robustness of aligned LLMs. While long-length adversarial prompts during AT might lead to strong LLM robustness, their synthesis however is very resource-consuming, which may limit the application of LLM AT. This paper focuses on adversarial suffix jailbreak attacks and unveils that to defend against a jailbreak attack with an adversarial suffix of length $\Theta(M)$, it is enough to align LLMs on prompts with adversarial suffixes of length $\Theta(\sqrt{M})$. Theoretically, we analyze the adversarial in-context learning of linear transformers on linear regression tasks and prove a robust generalization bound for trained transformers.


How to sparkle in conversation with strangers

New Scientist

In the face of loneliness, many people are turning to AI chatbots for companionship - but research shows it can't replace human connection. Guaranteed compassion, encouragement and validation? A soothing voice available to massage your ego whenever you feel unsure of yourself? If you could find a living being with these qualities, you'd call them your soulmate, and yet it is exactly what many chatbots are offering an increasing number of users. But can those exchanges with AI ever achieve the benefits of real, human connection?


Smooth Quadratic Prediction Markets

Neural Information Processing Systems

When agents trade in a Duality-based Cost Function prediction market, they collectively implement the learning algorithm Follow-The-Regularized-Leader [Abernethy et al., 2013]. We ask whether other learning algorithms could be used to inspire the design of prediction markets. By decomposing and modifying the Duality-based Cost Function Market Maker's (DCFMM) pricing mechanism, we propose a new prediction market, called the Smooth Quadratic Prediction Market, the incentivizes agents to collectively implement general steepest gradient descent. Relative to the DCFMM, the Smooth Quadratic Prediction Market has a better worst-case monetary loss for AD securities while preserving axiom guarantees such as the existence of instantaneous price, information incorporation, expressiveness, no arbitrage, and a form of incentive compatibility. To motivate the application of the Smooth Quadratic Prediction Market, we independently examine agents' trading behavior under two realistic constraints: bounded budgets and buy-only securities. Finally, we provide an introductory analysis of an approach to facilitate adaptive liquidity using the Smooth Quadratic Prediction Market. Our results suggest future designs where the price update rule is separate from the fee structure, yet guarantees are preserved.


RepLDM: Reprogramming Pretrained Latent Diffusion Models for High-Quality, High-Efficiency, High-Resolution Image Generation

Neural Information Processing Systems

While latent diffusion models (LDMs), such as Stable Diffusion, are designed for high-resolution image generation, they often struggle with significant structural distortions when generating images at resolutions higher than their training one. Instead of relying on extensive retraining, a more resource-efficient approach is to reprogram the pretrained model for high-resolution (HR) image generation; however, existing methods often result in poor image quality and long inference time. We introduce RepLDM, a novel reprogramming framework for pretrained LDMs that enables high-quality, high-efficiency, high-resolution image generation; see Figure 1. RepLDM consists of two stages: (i) an attention guidance stage, which generates a latent representation of a higher-quality training-resolution image using a novel parameter-free self-attention mechanism to enhance the structural consistency; and (ii) a progressive upsampling stage, which progressively performs upsampling in pixel space to mitigate the severe artifacts caused by latent space upsampling. The effective initialization from the first stage allows for denoising at higher resolutions with significantly fewer steps, improving the efficiency. Extensive experimental results demonstrate that RepLDM significantly outperforms state-of-the-art methods in both quality and efficiency for HR image generation, underscoring its advantages for real-world applications.