An analytic theory of shallow networks dynamics for hinge loss classification -- Supplementary Material

Neural Information Processing Systems

In physical systems a particle instead interacts only with a finite number of other particles, hence the density field remains highly fluctuating. The effect of the θ(w · x) term is to select one particular half-space over which the integral is done. To estimate the fluctuations due to a finite number of nodes, we will have to estimate the width of the output distribution for a given set of parameters. To estimate the error in Figure 1d of the main text, we ask what are the values of x_k = x cos θ such that the average output plus or minus a standard deviation, divided by M, would be equal to the threshold. Since the standard deviation involves |x|^2, we estimate its average value for points with a given x_k, i.e.


How to Use Physics to Escape an Ice Bowl

WIRED

Here are three smart tricks, based on an understanding of frictional forces, to beat a slippery slope. I don't know who invented this crazy challenge, but the idea is to put someone in a carved-out ice bowl and see if they can get out. The bowl is shaped like the inside of a sphere, so the higher up the sides you go, the steeper it gets. If you think an icy sidewalk is slippery, try going uphill on an icy sidewalk. What do you do when faced with a problem like this?


Rethinking Intermediate Representation for VLM-based Robot Manipulation

arXiv.org Artificial Intelligence

Vision-Language Model (VLM) is an important component to enable robust robot manipulation. Yet, using it to translate human instructions into an action-resolvable intermediate representation often needs a tradeoff between VLM-comprehensibility and generalizability. Inspired by context-free grammar, we design the Semantic Assembly representation named SEAM, by decomposing the intermediate representation into vocabulary and grammar. Doing so leads us to a concise vocabulary of semantically-rich operations and a VLM-friendly grammar for handling diverse unseen tasks. In addition, we design a new open-vocabulary segmentation paradigm with a retrieval-augmented few-shot learning strategy to localize fine-grained object parts for manipulation, effectively with the shortest inference time over all state-of-the-art parallel works. Also, we formulate new metrics for action-generalizability and VLM-comprehensibility, demonstrating the compelling performance of SEAM over mainstream representations on both aspects.


Towards Efficient Multimodal Unified Reasoning Model via Model Merging

arXiv.org Artificial Intelligence

Although Multimodal Large Language Models (MLLMs) have demonstrated remarkable capabilities across diverse tasks, they encounter challenges in terms of reasoning efficiency, large model size and overthinking. However, existing lightweight MLLMs lack the capability to balance high efficiency and performance at a small scale. To this end, we propose Tiny-R1V, a novel lightweight 3B model that achieves faster inference and higher accuracy via a two-stage optimization, while unifying multimodal reasoning across multiple tasks with fewer inference tokens. In the first stage, Tiny-R1V introduces Length-Informed Relative Policy Optimization (LIPO), a new reinforcement learning method, to train each reasoning model, including mathematical reasoning, chart reasoning, and OCR capability. The LIPO dynamically adjusts the advantages of responses within groups by prioritizing concise yet high-quality responses to encourage the generation of shorter and more accurate responses. In the second stage, we propose Adaptive Model Merging (AMM), a training-free model merging method that merges multiple specialist models into a unified architecture. Specifically, AMM adaptively adjusts the weights of task vectors via a novel gradient projection regularization loss function, thus mitigating redundant conflicts between them. Extensive evaluations on ten widely-used reasoning benchmarks covering mathematics, structured data (charts, tables, documents), OCR, and general capabilities showcase the superior performance of Tiny-R1V, enabling lightweight models to excel in diverse multimodal reasoning tasks.
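To make the idea of merging specialist models through task vectors concrete, here is a minimal sketch of plain weighted task-vector merging. It is not the AMM method: the abstract says AMM adapts the weights via a gradient projection regularization loss, whereas this sketch assumes fixed, hand-chosen weights. The names `task_vector`, `merge`, and `alphas`, and the toy one-parameter "models", are hypothetical and chosen only for illustration.

```python
from typing import Dict, List

# A toy "model": parameter name -> flat list of weights (assumption for illustration).
Params = Dict[str, List[float]]


def task_vector(finetuned: Params, base: Params) -> Params:
    """Task vector = fine-tuned weights minus base weights, per parameter."""
    return {k: [f - b for f, b in zip(finetuned[k], base[k])] for k in base}


def merge(base: Params, specialists: List[Params], alphas: List[float]) -> Params:
    """Add a weighted sum of task vectors back onto the base weights.

    `alphas` are fixed here (an assumption); AMM is described as adjusting
    them adaptively, which this sketch does not attempt to reproduce.
    """
    if len(specialists) != len(alphas):
        raise ValueError("need exactly one weight per specialist model")
    merged = {k: list(v) for k, v in base.items()}
    for spec, alpha in zip(specialists, alphas):
        tv = task_vector(spec, base)
        for k in merged:
            merged[k] = [m + alpha * d for m, d in zip(merged[k], tv[k])]
    return merged


# Toy usage: two specialists that each changed a different weight.
base = {"w": [1.0, 2.0]}
math_model = {"w": [2.0, 2.0]}
ocr_model = {"w": [1.0, 4.0]}
merged = merge(base, [math_model, ocr_model], [0.5, 0.5])
```

With equal weights of 0.5, each specialist contributes half of its change relative to the base model; choosing those weights well is the part that a method like AMM aims to automate.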


Spiders 'decorate' their webs to help trap dinner

Popular Science

Stabilimenta may help spiders find a buggy snack. One of nature's most beautiful natural wonders, spider webs sometimes feature little extra bits of flair called stabilimenta. Stabilimenta are highly-reflective UV structures. Basically, think of them like spidey bike reflectors scattered throughout a web.


Enhancing Long Chain-of-Thought Reasoning through Multi-Path Plan Aggregation

arXiv.org Artificial Intelligence

Monte Carlo (TSMC) to provide scalable stepwise supervision using small LMs. This yields more efficient training, improved stability, and higher accuracy. OpenAI's o1 series (OpenAI, 2024) introduces inference-time scaling by increasing the length of the Chain-of-Thought (CoT) (Wei et al., 2022) reasoning process. Despite their empirical success, RL approaches that generate the entire reasoning chain in a single forward pass face notable limitations, including CoT derailment, where the reasoning trajectory drifts off course due to accumulated errors, and the inherent challenges of long-horizon RL with sparse outcome rewards. This sequential scaling strategy, i.e., simply extending the CoT length, can therefore be insufficient (Yang et al., 2025). To improve planning quality, we introduce Multi-Path Plan Aggregation (MPPA). For each planning step, the model generates multiple alternative plans and aggregates them into an improved plan before proceeding to the subsequent execution steps. Beyond enhancing planning, we identify a fundamental challenge in credit assignment for long-horizon policy learning (Kaelbling et al., 1996). Existing RL fine-tuning frameworks struggle to provide effective process-level supervision (Guo et al., 2025). First, evaluating the correctness of intermediate steps is inherently difficult. Automated annotation using LLM judges (Gu et al., 2024) often yields unreliable or noisy signals. Second, introducing a separate process reward model (PRM) adds complexity. We then define the process preference between two candidate continuations at the same step by comparing their incremental log-weights. We repurpose Twisted Sequential Monte Carlo (TSMC) to provide process-level preferences for online Step-DPO training. Results show that our approach consistently outperforms both distillation-based long-CoT methods and RL methods that rely solely on outcome rewards.
The Chain-of-Thought trajectories can be lengthy and the positions of the first error vary considerably, making outcome-based RL fine-tuning inefficient. Training long trajectories with outcome rewards is highly inefficient.
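The process preference mentioned above, comparing the incremental log-weights of two candidate continuations at the same step, can be sketched roughly as follows. This is an illustrative guess at the shape of that comparison, not the authors' implementation: the `Candidate` class, the `preference_pair` function, and the `margin` tie threshold are all hypothetical, and the log-weights are assumed to be supplied by some upstream TSMC procedure that is not shown.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Candidate:
    text: str
    # Assumed to come from an upstream TSMC step; not computed here.
    incremental_log_weight: float


def preference_pair(
    a: Candidate, b: Candidate, margin: float = 0.0
) -> Optional[Tuple[Candidate, Candidate]]:
    """Return (chosen, rejected) for two continuations of the same step.

    The candidate with the larger incremental log-weight is preferred.
    `margin` is a hypothetical threshold: pairs whose log-weights differ
    by no more than it are treated as ties and yield no training pair.
    """
    diff = a.incremental_log_weight - b.incremental_log_weight
    if abs(diff) <= margin:
        return None
    return (a, b) if diff > 0 else (b, a)


# Toy usage with made-up log-weights.
plan_a = Candidate("factor the quadratic first", -0.2)
plan_b = Candidate("guess integer roots", -1.5)
pair = preference_pair(plan_a, plan_b)
```

Each returned pair could then serve as one (chosen, rejected) example for a step-level preference objective such as Step-DPO, under the assumptions stated above.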


ThinKV: Thought-Adaptive KV Cache Compression for Efficient Reasoning Models

arXiv.org Artificial Intelligence

The long-output context generation of large reasoning models enables extended chain of thought (CoT) but also drives rapid growth of the key-value (KV) cache, quickly overwhelming GPU memory. To address this challenge, we propose ThinKV, a thought-adaptive KV cache compression framework. ThinKV is based on the observation that attention sparsity reveals distinct thought types with varying importance within the CoT. It applies a hybrid quantization-eviction strategy, assigning token precision by thought importance and progressively evicting tokens from less critical thoughts as reasoning trajectories evolve. Furthermore, to implement ThinKV, we design a kernel that extends PagedAttention to enable efficient reuse of evicted tokens' memory slots, eliminating compaction overheads. Extensive experiments on DeepSeek-R1-Distill, GPT-OSS, and NVIDIA AceReason across mathematics and coding benchmarks show that ThinKV achieves near-lossless accuracy with less than 5% of the original KV cache, while improving performance with up to 5.8x higher inference throughput over state-of-the-art baselines.
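As a rough illustration of a hybrid quantization-eviction policy of the kind described above, the sketch below assigns a bit width to each cached token according to the importance of the thought it belongs to, then evicts tokens from the least important thoughts first when a token budget is exceeded. Everything specific here is an assumption made for illustration: the three importance levels, the bit widths in `BITS_BY_IMPORTANCE`, the `CachedToken` class, and the oldest-first tie-break are hypothetical, and nothing is claimed about how ThinKV classifies thoughts or manages memory.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical mapping: importance level of a thought -> bits per cached value.
BITS_BY_IMPORTANCE = {2: 8, 1: 4, 0: 2}


@dataclass
class CachedToken:
    position: int      # index of the token in the sequence (assumed unique)
    importance: int    # 0 = least critical thought, 2 = most critical
    bits: int = 0      # precision assigned by assign_precision


def assign_precision(tokens: List[CachedToken]) -> List[CachedToken]:
    """Give each token the bit width associated with its thought importance."""
    for t in tokens:
        t.bits = BITS_BY_IMPORTANCE[t.importance]
    return tokens


def evict_to_budget(tokens: List[CachedToken], budget: int) -> List[CachedToken]:
    """Keep at most `budget` tokens, evicting least important thoughts first.

    Within one importance level the oldest tokens go first (an assumption).
    Survivors are returned in their original order.
    """
    if budget < 0:
        raise ValueError("budget must be non-negative")
    n_evict = max(0, len(tokens) - budget)
    victims = sorted(tokens, key=lambda t: (t.importance, t.position))[:n_evict]
    victim_positions = {t.position for t in victims}
    return [t for t in tokens if t.position not in victim_positions]


# Toy usage: five cached tokens, budget of three.
cache = [CachedToken(i, imp) for i, imp in enumerate([2, 0, 1, 0, 2])]
kept = evict_to_budget(assign_precision(cache), budget=3)
```

In this toy run the two tokens from the least important thought are evicted and the survivors keep the precision assigned to their importance level.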



Reviewer 1: "the statement in line 153 in the neighbourhood of z ⟨J_i(z), f(x)⟩ = 0."

Neural Information Processing Systems

We are grateful to the reviewers for the insightful comments on our submission. All the minor comments will also be addressed in the revised manuscript. We will update line 153 to "The domain of z can be easily adjusted by translation and dilation after the training process." Reviewer 1: "emphasize the need for gradient evaluations when you state the observation." "....... The first and fourth columns show the relationship between the output and NN is very efficient compared to evaluating the FEM model in Case (ii).