Best early Prime Day deals on Thunderbolt docks & USB-C hubs

PCWorld

Amazon's Prime Day is one of the best opportunities for shoppers to score great deals on Thunderbolt docks and their cousins, USB-C hubs. I should know -- I've been tracking them for years now. Docking stations and hubs let you connect more peripherals to your desk without breaking the bank. Amazon has traditionally been home to the best deals in both categories year round, which makes it an even better place to shop during Prime Day -- and before.


MiCADangelo: Fine-Grained Reconstruction of Constrained CAD Models from 3D Scans

Neural Information Processing Systems

Computer-Aided Design (CAD) plays a foundational role in modern manufacturing and product development, often requiring designers to modify or build upon existing models. Converting 3D scans into parametric CAD representations--a process known as CAD reverse engineering--remains a significant challenge due to the high precision and structural complexity of CAD models. Existing deep learning-based approaches typically fall into two categories: bottom-up, geometry-driven methods, which often fail to produce fully parametric outputs, and top-down strategies, which tend to overlook fine-grained geometric details.


SKETCHMIND: A Multi-Agent Cognitive Framework for Assessing Student-Drawn Scientific Sketches

Neural Information Processing Systems

Scientific sketches (e.g., models) offer a powerful lens into students' conceptual understanding, yet AI-powered automated assessment of such free-form, visually diverse artifacts remains a critical challenge. Existing solutions often treat sketch evaluation either as an image classification task or as a job for monolithic vision-language models, approaches that lack interpretability, pedagogical alignment, and adaptability across cognitive levels. To address these limitations, we present SKETCHMIND, a cognitively grounded, multi-agent framework for evaluating and improving student-drawn scientific sketches. SKETCHMIND introduces Sketch Reasoning Graphs (SRGs), semantic graph representations that embed domain concepts and Bloom's taxonomy-based cognitive labels. The system comprises modular agents responsible for rubric parsing, sketch perception, cognitive alignment, and iterative feedback with sketch modification, enabling personalized and transparent evaluation. We evaluate SKETCHMIND on a curated dataset of 3,575 student-generated sketches across six science assessment items, each with a different highest-order Bloom's level, that require students to draw models to explain phenomena. Baseline GPT-4o without SRG reaches an average accuracy of 55.6%; with SRG integration, it achieves 77.1% average accuracy (+21.4% average absolute gain).


ThinkAct: Vision-Language-Action Reasoning via Reinforced Visual Latent Planning

Neural Information Processing Systems

Vision-language-action (VLA) reasoning tasks require agents to interpret multimodal instructions, perform long-horizon planning, and act adaptively in dynamic environments. Existing approaches typically train VLA models in an end-to-end fashion, directly mapping inputs to actions without explicit reasoning, which hinders their ability to plan over multiple steps or adapt to complex task variations. In this paper, we propose ThinkAct, a dual-system framework that bridges high-level reasoning with low-level action execution via reinforced visual latent planning. ThinkAct trains a multimodal LLM to generate embodied reasoning plans guided by reinforcing action-aligned visual rewards based on goal completion and trajectory consistency. These reasoning plans are compressed into a visual plan latent that conditions a downstream action model for robust action execution on target environments. Extensive experiments on embodied reasoning and robot manipulation benchmarks demonstrate that ThinkAct enables few-shot adaptation, long-horizon planning, and self-correction behaviors in complex embodied AI tasks.


771155abaae744e08576f1f3b4b7ac0d-Paper-Conference.pdf

Neural Information Processing Systems

We introduce FlowMo, a novel training-free guidance method that enhances motion coherence using only the model's own predictions in each diffusion step. FlowMo first derives an appearance-debiased temporal representation by measuring the distance between latents corresponding to consecutive frames. This highlights the implicit temporal structure predicted by the model. It then estimates motion coherence by measuring the patch-wise variance across the temporal dimension and guides the model to reduce this variance dynamically during sampling. Extensive experiments across multiple text-to-video models demonstrate that FlowMo significantly improves motion coherence without sacrificing visual quality or prompt alignment, offering an effective plug-and-play solution for enhancing the temporal fidelity of pre-trained video diffusion models.
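The coherence estimate described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `motion_variance`, the `(frames, channels, height, width)` latent layout, the absolute difference as the distance, and the choice to treat every latent position as its own patch are all assumptions made for illustration. The guidance step itself, which would push this score down during sampling, is left out.

```python
import numpy as np


def motion_variance(latents: np.ndarray) -> float:
    """Score temporal incoherence of a latent video (hypothetical helper).

    latents: array of shape (frames, channels, height, width); this layout
    is an assumption, not something the abstract specifies.
    """
    if latents.ndim != 4 or latents.shape[0] < 3:
        raise ValueError("expected (frames >= 3, channels, height, width)")

    # Distance between latents of consecutive frames. Subtracting neighbours
    # cancels static appearance and leaves a temporal signal.
    deltas = np.abs(latents[1:] - latents[:-1])  # (frames - 1, C, H, W)

    # Variance across the temporal dimension at each position
    # (each position is treated as one "patch" here, an assumption).
    per_position_variance = deltas.var(axis=0)  # (C, H, W)

    # Collapse to one scalar; lower means steadier motion.
    return float(per_position_variance.mean())
```

Under these assumptions, a clip whose latents change by the same amount every frame scores zero, while erratic frame-to-frame changes raise the score, which is the quantity a guidance term would aim to reduce.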


Effortless, Simulation-Efficient Bayesian Inference using Tabular Foundation Models

Neural Information Processing Systems

Simulation-based inference (SBI) offers a flexible and general approach to performing Bayesian inference: In SBI, a neural network is trained on synthetic data simulated from a model and used to rapidly infer posterior distributions for observed data. A key goal for SBI is to achieve accurate inference with as few simulations as possible, especially for expensive simulators. In this work, we address this challenge by repurposing recent probabilistic foundation models for tabular data: We show how tabular foundation models--specifically TabPFN--can be used as pre-trained autoregressive conditional density estimators for SBI. We propose Neural Posterior Estimation with Prior-data Fitted Networks (NPE-PFN) and show that it is competitive with current SBI approaches in terms of accuracy for both benchmark tasks and two complex scientific inverse problems. Crucially, it often substantially outperforms them in terms of simulation efficiency, sometimes requiring orders of magnitude fewer simulations. NPE-PFN eliminates the need for selecting and training an inference network and tuning its hyperparameters. We also show that it exhibits superior robustness to model misspecification and can be scaled to simulation budgets that exceed the context size limit of TabPFN. NPE-PFN provides a new direction for SBI, where training-free, general-purpose inference models offer efficient, easy-to-use, and flexible solutions for a wide range of stochastic inverse problems.
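As a rough illustration of what "autoregressive conditional density estimator" means here, the posterior over a D-dimensional parameter vector can be factorized one dimension at a time with the chain rule. The symbols below (parameters θ, observation x_o, estimator q) are notational assumptions and are not taken from the paper.

```latex
q(\theta \mid x_o) \;=\; \prod_{d=1}^{D} q\!\left(\theta_d \,\middle|\, \theta_{<d},\, x_o\right),
\qquad \theta_{<d} = (\theta_1, \dots, \theta_{d-1})
```

Each factor is a one-dimensional conditional density. One plausible reading, stated here as an assumption, is that a tabular model estimates each factor by treating the simulated data and the preceding parameters as features and θ_d as the target, with the simulations supplied as context rather than used for training.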


A Reinforcement Learning-based Bidding Strategy for Data Consumers in Auction-based Federated Learning

Neural Information Processing Systems

A major challenge in auction-based federated learning (AFL) pertains to how data consumers (DCs) select and bid for data owners (DOs). Existing methods are generally static, making them ill-suited for dynamic AFL markets. To address this issue, we propose the Reinforcement Learning-based Bidding Strategy for DCs in Auction-based Federated Learning (RLB-AFL). We incorporate historical states into a Deep Q-Network to capture sequential information critical for bidding decisions. To mitigate state space sparsity, where specific states rarely reoccur for each DC during auctions, we incorporate a Gaussian Mixture Model into RLB-AFL.



CReFT-CAD: Boosting Orthographic Projection Reasoning for CAD via Reinforcement Fine-Tuning

Neural Information Processing Systems

Computer-Aided Design (CAD) is pivotal in industrial manufacturing, with orthographic projection reasoning foundational to its entire workflow--encompassing design, manufacturing, and simulation. However, prevailing deep-learning approaches employ standard 3D reconstruction pipelines as an alternative, which often introduce imprecise dimensions and limit the parametric editability required for CAD workflows. Recently, some researchers have adopted vision-language models (VLMs), particularly with supervised fine-tuning (SFT), to tackle CAD-related challenges. SFT shows promise but often devolves into pattern memorization, resulting in poor out-of-distribution (OOD) performance on complex reasoning tasks. To tackle these limitations, we introduce CReFT-CAD, a two-stage fine-tuning paradigm: first, a curriculum-driven reinforcement learning stage with difficulty-aware rewards to steadily build reasoning abilities; second, supervised post-tuning to refine instruction following and semantic extraction. Complementing this, we release TriView2CAD, the first large-scale, open-source benchmark for orthographic projection reasoning, comprising 200,000 synthetic and 3,000 real-world orthographic projections with precise dimensional annotations and six interoperable data modalities. Benchmarking leading VLMs on orthographic projection reasoning, we show that CReFT-CAD significantly improves reasoning accuracy and OOD generalizability in real-world scenarios, providing valuable insights to advance CAD reasoning research.


World Central Banks (Supplementary Material)

Neural Information Processing Systems

Kaggle

Publishing our dataset on Kaggle offers distinct advantages over platforms like HuggingFace, particularly in terms of usability and community engagement. Kaggle provides integrated tools for data visualization, version control, and collaborative discussion, streamlining the research workflow. Unlike HuggingFace, which is primarily model-focused, Kaggle is optimized for dataset-driven experimentation, making it a more practical platform for sharing, validating, and improving data-centric work. Hosting the dataset on Kaggle thus ensures greater transparency, accessibility, and impact across both academic and applied research communities.

Website

Our World Central Banks website offers a structured and accessible overview of our research. It features a task-specific model leaderboard with direct links to download and explore the best performing model.