


Boosting Adversarial Transferability with Spatial Adversarial Alignment

Neural Information Processing Systems

Deep neural networks are vulnerable to adversarial examples that transfer across models. Numerous approaches have been proposed to enhance the transferability of adversarial examples, including advanced optimization, data augmentation, and model modification. However, these methods still achieve limited transferability, particularly in cross-architecture scenarios such as transfer from CNNs to ViTs. To achieve high transferability, we propose Spatial Adversarial Alignment (SAA), a technique that uses an alignment loss and a witness model to fine-tune the surrogate model. Specifically, SAA consists of two key components: spatial-aware alignment and adversarial-aware alignment.


Scalable inference of functional neural connectivity at submillisecond timescales

Neural Information Processing Systems

The Poisson Generalized Linear Model (GLM) is a foundational tool for analyzing neural spike train data. However, standard implementations discretize spike times into binned count data, which limits temporal resolution and scalability. Here, we develop Monte Carlo (MC) methods and polynomial approximations (PA) for the continuous-time analog of these models and show that they are advantageous over their discrete-time counterparts. Further, we propose using a set of exponentially scaled Laguerre polynomials as an orthogonal temporal basis, which improves filter identification and yields closed-form integral solutions under the polynomial approximation. Applied to both synthetic and real spike-time data from rodent hippocampus, our methods demonstrate superior accuracy and scalability compared to traditional binned GLMs. They enable functional connectivity inference in large-scale neural recordings with temporal precision on the order of synaptic timescales, and the inferred connectivity agrees with known anatomical properties of hippocampal subregions. We provide open-source implementations of both the MC and PA estimators, optimized for GPU acceleration, to facilitate adoption in the neuroscience community.
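The continuous-time likelihood that the MC estimator targets can be sketched briefly. This is a minimal illustration, not the authors' released implementation. It assumes a single neuron whose log-intensity is a bias plus a spike-history filter expanded in exponentially scaled Laguerre functions exp(-x/2) L_k(x), and it estimates the integral term of log L = sum_i log lambda(t_i) - integral_0^T lambda(t) dt by uniform Monte Carlo sampling. The function names and the `scale` parameter are hypothetical.

```python
import math
import random


def laguerre_basis(t, n_basis, scale):
    """Exponentially scaled Laguerre functions exp(-x/2) * L_k(x) with x = t / scale.

    Hypothetical parameterization: `scale` stretches the basis to a chosen timescale.
    """
    x = t / scale
    values = []
    prev, curr = 0.0, 1.0  # L_{-1} placeholder (unused at k = 0) and L_0
    for k in range(n_basis):
        values.append(math.exp(-x / 2.0) * curr)
        # Three-term recurrence: L_{k+1} = ((2k + 1 - x) L_k - k L_{k-1}) / (k + 1)
        prev, curr = curr, ((2 * k + 1 - x) * curr - k * prev) / (k + 1)
    return values


def intensity(t, spikes, bias, weights, scale):
    """lambda(t) = exp(bias + sum over past spikes s < t of sum_k w_k * phi_k(t - s))."""
    drive = bias
    for s in spikes:
        if s < t:  # only strictly earlier spikes contribute to the history term
            basis = laguerre_basis(t - s, len(weights), scale)
            drive += sum(w * b for w, b in zip(weights, basis))
    return math.exp(drive)


def mc_log_likelihood(spikes, T, bias, weights, scale, n_samples=2000, seed=0):
    """Continuous-time Poisson log-likelihood with a Monte Carlo estimate of the integral."""
    rng = random.Random(seed)
    log_term = sum(math.log(intensity(t, spikes, bias, weights, scale)) for t in spikes)
    samples = [rng.uniform(0.0, T) for _ in range(n_samples)]
    # Uniform MC: integral over [0, T] is approximated by T times the mean of lambda at samples.
    integral = T * sum(intensity(u, spikes, bias, weights, scale) for u in samples) / n_samples
    return log_term - integral


if __name__ == "__main__":
    spike_times = [0.01, 0.013, 0.2, 0.205, 0.6]
    print(mc_log_likelihood(spike_times, 1.0, 1.0, [0.5, -0.2, 0.1], 0.005))
```

With all filter weights at zero the intensity is constant, so the MC integral is exact and the result reduces to n * bias - exp(bias) * T, which is a convenient sanity check. The history loop is quadratic in the number of spikes; a practical estimator would truncate the history window and vectorize on a GPU.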


Seeing What Matters: Generalizable AI-generated Video Detection with Forensic-Oriented Augmentation

Neural Information Processing Systems

Synthetic video generation is progressing very rapidly. The latest models can produce highly realistic high-resolution videos that are virtually indistinguishable from real ones. Although several video forensic detectors have recently been proposed, they often generalize poorly, which limits their applicability in real-world scenarios. Our key insight to overcome this issue is to guide the detector towards seeing what really matters. Indeed, a well-designed forensic classifier should focus on identifying intrinsic low-level artifacts introduced by a generative architecture rather than relying on high-level semantic flaws that characterize a specific model.


Mixture of Inputs: Text Generation Beyond Discrete Token Sampling

Neural Information Processing Systems

In standard autoregressive generation, an LLM predicts the next-token distribution, samples a discrete token, and then discards the distribution, passing only the sampled token as new input. To preserve the rich information in this distribution, we propose Mixture of Inputs (MoI), a training-free method for autoregressive generation. After generating a token following the standard paradigm, we construct a new input that blends the generated discrete token with the previously discarded token distribution. Specifically, we employ a Bayesian estimation method that treats the token distribution as the prior and the sampled token as the observation, and replaces the conventional one-hot vector with the continuous posterior expectation as the new model input. MoI allows the model to maintain a richer internal representation throughout the generation process, resulting in improved text quality and reasoning capabilities. On mathematical reasoning, code generation, and PhD-level QA tasks, MoI consistently improves performance across multiple models, including QwQ-32B, Nemotron-Super-49B, Gemma-3-27B, and DAPO-Qwen32B, with no additional training and negligible computational overhead.
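The Bayesian blending step can be illustrated with a small sketch. It assumes a Dirichlet prior with concentration beta * p over the vocabulary, where p is the next-token distribution and `beta` is a hypothetical hyperparameter; observing the sampled token once gives the posterior mean (beta * p + e_y) / (beta + 1). The abstract does not specify the exact prior, so this is an illustration of the idea, not the paper's estimator. The new input is then the posterior-weighted average of the token embeddings instead of a single embedding row.

```python
import random


def moi_weights(probs, sampled, beta=1.0):
    """Posterior mean of a Dirichlet(beta * probs) prior after observing `sampled` once.

    Assumption: the prior form and `beta` are illustrative, not taken from the paper.
    """
    denom = beta + 1.0
    weights = [beta * p / denom for p in probs]
    weights[sampled] += 1.0 / denom  # the one-hot observation adds one pseudo-count
    return weights


def mixed_input(probs, sampled, embeddings, beta=1.0):
    """Blend token embeddings by posterior weights instead of looking up one row."""
    w = moi_weights(probs, sampled, beta)
    dim = len(embeddings[0])
    return [sum(w[v] * embeddings[v][d] for v in range(len(w))) for d in range(dim)]


if __name__ == "__main__":
    probs = [0.7, 0.2, 0.1]                          # next-token distribution (toy)
    embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy embedding table
    rng = random.Random(0)
    token = rng.choices(range(len(probs)), weights=probs)[0]  # standard sampling step
    print(token, mixed_input(probs, token, embeddings, beta=1.0))
```

As `beta` approaches zero the weights collapse to the one-hot vector of standard decoding; larger values keep more of the discarded distribution in the next input.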


EgoBlind: Towards Egocentric Visual Assistance for the Blind

Neural Information Processing Systems

We present EgoBlind, the first egocentric VideoQA dataset collected from blind individuals to evaluate the assistive capabilities of contemporary multimodal large language models (MLLMs). EgoBlind comprises 1,392 first-person videos from the daily lives of blind and visually impaired individuals. It also features 5,311 questions directly posed or verified by blind individuals to reflect their in-situation needs for visual assistance. Each question has an average of 3 manually annotated reference answers to reduce subjectivity. Using EgoBlind, we comprehensively evaluate 16 advanced MLLMs and find that all models struggle. The best performers achieve an accuracy near 60%, far behind human performance of 87.4%. To guide future advancements, we identify and summarize major limitations of existing MLLMs in egocentric visual assistance for the blind and explore heuristic solutions for improvement. With these efforts, we hope that EgoBlind will serve as a foundation for developing effective AI assistants that enhance the independence of blind and visually impaired people. Data and code are available at https://github.


Facebook's new AI tools offer more of the same, with photo-editing and question-answering capabilities

Engadget

Now you can ask a different chatbot which restaurant to try. Meta just announced a suite of AI tools for Facebook users. Nothing here looks especially new, but availability on Facebook could be of some use to certain power users. AI Mode is a standard chatbot that answers questions, and Meta uses the example everyone uses when rolling out one of these tools: the company highlights a person asking the chatbot for nearby summer vacation spots. Meta does say that AI Mode pulls data from across its apps, such as Groups and Reels, so the information provided may be slightly different from what you get when asking about summer getaways via Gemini, Claude, Grok, ChatGPT and all the rest.


FRN: Fractal-Based Recursive Spectral Reconstruction Network

Neural Information Processing Systems

Generating hyperspectral images (HSIs) from RGB images through spectral reconstruction can significantly reduce the cost of HSI acquisition. In this paper, we propose a Fractal-Based Recursive Spectral Reconstruction Network (FRN). Unlike existing paradigms, which attempt to integrate the full-spectrum information from the R, G, and B channels directly in a one-shot manner, FRN treats spectral reconstruction as a progressive process, predicting from broad to narrow bands, that is, taking a coarse-to-fine approach to predicting the next wavelength. Inspired by fractals in mathematics, FRN establishes a novel spectral reconstruction paradigm by recursively invoking an atomic reconstruction module. In each invocation, only the spectral information from neighboring bands is used to provide clues for generating the image at the next wavelength, which exploits the low-rank property of spectral data. Moreover, we design a band-aware state space model that employs a pixel-differentiated scanning strategy at different stages of the generation process, further suppressing interference from low-correlation regions caused by reflectance differences. In extensive experiments across different datasets, FRN achieves superior reconstruction performance compared to state-of-the-art methods. Code is available at https://github.com/mongko007/frn.
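The recursive, neighbor-only structure described above can be illustrated with a toy sketch. It assumes a bisection schedule: starting from bands at the interval endpoints, an atomic module predicts the midpoint band from its two neighbors and then recurses into each half, so each call sees only adjacent bands. The nominal RGB wavelengths (450, 550, 650 nm), the averaging `atomic_module`, and all function names are hypothetical stand-ins; FRN's learned module, its band-aware state space model, and its actual recursion schedule are not reproduced here.

```python
def atomic_module(left, right):
    """Hypothetical stand-in for the learned atomic reconstruction module.

    Here it just averages the two neighbouring band images pixel by pixel.
    """
    return [(a + b) / 2.0 for a, b in zip(left, right)]


def seed_from_rgb(r, g, b):
    """Place the RGB channels at assumed nominal wavelengths (nm)."""
    return {450: b, 550: g, 650: r}


def reconstruct(bands, lo, hi, depth, module=atomic_module):
    """Recursively fill wavelengths between `lo` and `hi` (coarse to fine).

    `bands` maps wavelength -> image (flat list of pixel values). Each call uses
    only the two neighbouring bands at `lo` and `hi` to generate the midpoint band.
    """
    if depth == 0:
        return
    mid = (lo + hi) / 2.0
    bands[mid] = module(bands[lo], bands[hi])
    reconstruct(bands, lo, mid, depth - 1, module)
    reconstruct(bands, mid, hi, depth - 1, module)


if __name__ == "__main__":
    # A 2-pixel toy image; each channel is a list of pixel intensities.
    bands = seed_from_rgb(r=[0.8, 0.6], g=[0.5, 0.4], b=[0.2, 0.1])
    reconstruct(bands, 450, 550, depth=2)
    reconstruct(bands, 550, 650, depth=2)
    for wl in sorted(bands):
        print(wl, bands[wl])
```

Each level of recursion halves the spectral spacing, which mirrors the broad-to-narrow progression; swapping `module` for a trained network would keep the same call structure.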


Protein Function Prediction with Contrastive Alignment

Neural Information Processing Systems

Predicting protein function from sequence is a central challenge in computational biology. While existing methods rely heavily on structured ontologies or similarity-based techniques, they often lack the flexibility to express structure-free functional descriptions and novel biological functions. In this work, we introduce Prot2TextV2, a novel multimodal sequence-to-text model that generates free-form natural language descriptions of protein function directly from amino acid sequences. Our method combines a protein language model (ESM-3B) as a sequence encoder with a decoder-only language model (LLaMA-3.1-8B-Instruct).
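The title refers to contrastive alignment, but the abstract is cut off before describing it. As a generic illustration only, the sketch below implements a standard symmetric InfoNCE loss (the CLIP-style objective) between protein embeddings and text embeddings, where the i-th protein should match the i-th description. Whether Prot2TextV2 uses this exact objective, and the `temperature` value, are assumptions.

```python
import math


def _normalize(v):
    """Scale a vector to unit L2 norm."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def _log_softmax_at(logits, idx):
    """Numerically stable log-softmax of `logits`, evaluated at position `idx`."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(l - m) for l in logits))
    return logits[idx] - lse


def info_nce(protein_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE: matched (protein_i, text_i) pairs are the positives."""
    p = [_normalize(v) for v in protein_emb]
    t = [_normalize(v) for v in text_emb]
    n = len(p)
    # Cosine similarities scaled by temperature; rows = proteins, columns = texts.
    sim = [[sum(a * b for a, b in zip(p[i], t[j])) / temperature for j in range(n)]
           for i in range(n)]
    loss_p2t = -sum(_log_softmax_at(sim[i], i) for i in range(n)) / n
    loss_t2p = -sum(_log_softmax_at([sim[i][j] for i in range(n)], j) for j in range(n)) / n
    return (loss_p2t + loss_t2p) / 2.0


if __name__ == "__main__":
    proteins = [[1.0, 0.1], [0.1, 1.0]]
    texts = [[0.9, 0.2], [0.2, 0.8]]
    print(info_nce(proteins, texts))
```

Minimizing this loss pulls each sequence embedding toward its own description and pushes it away from the other descriptions in the batch.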


Object-X: Learning to Reconstruct Multi-Modal 3D Object Representations

Neural Information Processing Systems

Learning effective multi-modal 3D representations of objects is essential for numerous applications, such as augmented reality and robotics. Existing methods often rely on task-specific embeddings that are tailored either for semantic understanding or for geometric reconstruction. As a result, these embeddings typically cannot be decoded into explicit geometry and simultaneously reused across tasks. In this paper, we propose Object-X, a versatile multi-modal object representation framework capable of encoding rich object information (e.g., images, point clouds, text) into embeddings and decoding them back into detailed geometric and visual reconstructions. Object-X operates by geometrically grounding the captured modalities in a 3D voxel grid and learning an unstructured embedding that fuses the information from the voxels with the object attributes. The learned embedding enables 3D Gaussian Splatting-based object reconstruction, while also supporting a range of downstream tasks, including scene alignment, single-image 3D object reconstruction, and localization. Evaluations on two challenging real-world datasets demonstrate that Object-X achieves high-fidelity novel-view synthesis comparable to standard 3D Gaussian Splatting, while significantly improving geometric accuracy. Moreover, Object-X achieves performance competitive with specialized methods in scene alignment and localization. Critically, our object-centric descriptors require 3-4 orders of magnitude less storage than traditional image- or point cloud-based approaches, establishing Object-X as a scalable and highly practical solution for multi-modal 3D scene representation.
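The grounding step, placing captured data into a 3D voxel grid before learning the embedding, can be sketched for the point-cloud case. This minimal example assumes an axis-aligned cubic grid with a hypothetical `origin`, `voxel_size`, and `grid_dim`, and averages per-point features (for example RGB colors) into each occupied voxel. It does not reproduce how Object-X lifts image features into voxels or how it learns the fused embedding.

```python
from collections import defaultdict


def voxelize(points, features, origin, voxel_size, grid_dim):
    """Average per-point features into the occupied voxels of a grid_dim^3 grid.

    Returns a sparse dict mapping (i, j, k) voxel indices to mean feature vectors.
    Points that fall outside the grid are dropped.
    """
    sums = {}
    counts = defaultdict(int)
    for point, f in zip(points, features):
        # Integer voxel index along each axis, relative to the grid origin.
        idx = tuple(int((c - o) // voxel_size) for c, o in zip(point, origin))
        if not all(0 <= i < grid_dim for i in idx):
            continue  # outside the grid bounds
        if idx not in sums:
            sums[idx] = [0.0] * len(f)
        sums[idx] = [s + v for s, v in zip(sums[idx], f)]
        counts[idx] += 1
    return {idx: [s / counts[idx] for s in vec] for idx, vec in sums.items()}


if __name__ == "__main__":
    pts = [(0.05, 0.05, 0.05), (0.07, 0.02, 0.01), (0.15, 0.0, 0.0)]
    colors = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
    print(voxelize(pts, colors, origin=(0.0, 0.0, 0.0), voxel_size=0.1, grid_dim=8))
```

Storing only occupied voxels keeps the intermediate representation sparse; a learned encoder would then compress these voxel features into the compact per-object embedding.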


New Baseus Spacemate dock includes the feature I've been waiting for

PCWorld

PCWorld reviews the Baseus Spacemate RD1 Pro USB-C dock, highlighting its standout wireless charging feature alongside multiple connectivity options. The dock offers dual 4K HDMI ports, USB-C, USB-A, SD card slots, and Ethernet, though performance may be limited by its 10Gbps connection. Available on Amazon for $199.99 (down from $299.99 until June 30), it provides solid value despite potential data bottlenecks.

I've reviewed dozens of USB-C, Thunderbolt, and DisplayLink docks. And there's a fundamental problem: They take up a lot of space.