Entropic Confinement and Mode Connectivity in Overparameterized Neural Networks

Di Carlo, Luca, Goddard, Chase, Schwab, David J.

arXiv.org Machine Learning

Modern neural networks exhibit a striking property: basins of attraction in the loss landscape are often connected by low-loss paths, yet optimization dynamics generally remain confined to a single convex basin (Baity-Jesi et al., 2019; Juneja et al., 2023) and rarely explore intermediate points. We resolve this paradox by identifying entropic barriers arising from the interplay between curvature variations along these paths and noise in optimization dynamics. Empirically, we find that curvature systematically rises away from minima, producing effective forces that bias noisy dynamics back toward the endpoints -- even when the loss remains nearly flat. These barriers persist longer than energetic barriers, shaping the late-time localization of solutions in parameter space. Our results highlight the role of curvature-induced entropic forces in governing both connectivity and confinement in deep learning landscapes.

Deep neural networks trained in the overparametrized regime exhibit a number of surprising and counterintuitive properties. One of the most striking is the observation that distinct solutions, found with standard optimization algorithms, are often connected by low-loss paths in parameter space (Garipov et al., 2018; Draxler et al., 2018; Frankle et al., 2020). Such mode connectivity results imply that the landscape is far less rugged than once assumed: minima that appear isolated are, in fact, linked by paths of low, nearly constant loss. At the same time, however, optimization dynamics display a seemingly contradictory behavior.
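The curvature measurement described above can be approximated with a Hutchinson-style trace estimator evaluated along a path between two minima. The sketch below is a minimal illustration on a toy quadratic-product loss, not the paper's networks or estimator; the loss, parameter vectors, and step sizes are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy loss with two minima at theta_a and theta_b; purely illustrative.
theta_a = np.zeros(10)
theta_b = np.ones(10)

def loss(theta):
    # Product of two quadratic wells: zero at both endpoints, with
    # curvature that varies along the straight path joining them.
    return np.sum((theta - theta_a) ** 2) * np.sum((theta - theta_b) ** 2)

def hessian_trace(theta, eps=1e-4, probes=64):
    """Hutchinson estimate of tr(H): E_v[v^T H v] with Rademacher probes v,
    where v^T H v is approximated by a central finite difference."""
    total = 0.0
    for _ in range(probes):
        v = rng.choice([-1.0, 1.0], size=theta.shape)
        # v^T H v ~ (L(theta + eps v) - 2 L(theta) + L(theta - eps v)) / eps^2
        total += (loss(theta + eps * v) - 2.0 * loss(theta)
                  + loss(theta - eps * v)) / eps**2
    return total / probes

# Scan loss and curvature along the linear path between the two minima.
for t in np.linspace(0.0, 1.0, 5):
    theta = (1.0 - t) * theta_a + t * theta_b
    print(f"t={t:.2f}  loss={loss(theta):10.4f}  tr(H)~{hessian_trace(theta):10.2f}")
```

On real networks one would replace the finite differences with Hessian-vector products from automatic differentiation; the toy only demonstrates the measurement, not the phenomenon.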




Ensembling Graph Predictions for AMR Parsing

Lam, Hoang Thanh

Neural Information Processing Systems

AMR parsing is an important problem in natural language processing (NLP) research, and it has broad applications in downstream tasks such as question answering [Kapanipathi et al., 2020] and common sense reasoning [Lim et al., 2020].


Solver-Free Decision-Focused Learning for Linear Optimization Problems

Berden, Senne, Mahmutoğulları, Ali İrfan, Tsouros, Dimos, Guns, Tias

arXiv.org Artificial Intelligence

Mathematical optimization is a fundamental tool for decision-making in a wide range of applications. However, in many real-world scenarios, the parameters of the optimization problem are not known a priori and must be predicted from contextual features. This gives rise to predict-then-optimize problems, where a machine learning model predicts problem parameters that are then used to make decisions via optimization. A growing body of work on decision-focused learning (DFL) addresses this setting by training models specifically to produce predictions that maximize downstream decision quality, rather than accuracy. While effective, DFL is computationally expensive, because it requires solving the optimization problem with the predicted parameters at each loss evaluation. In this work, we address this computational bottleneck for linear optimization problems, a common class of problems in both the DFL literature and real-world applications. We propose a solver-free training method that exploits the geometric structure of linear optimization to enable efficient training with minimal degradation in solution quality. Our method is based on the insight that a solution is optimal if and only if it achieves an objective value that is at least as good as that of its adjacent vertices on the feasible polytope. Building on this, our method compares the estimated quality of the ground-truth optimal solution with that of its precomputed adjacent vertices, and uses this comparison as the loss function. Experiments demonstrate that our method significantly reduces computational cost while maintaining high decision quality.
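The vertex-adjacency insight above can be sketched as a hinge-style surrogate loss. This is a minimal illustration assuming a minimization problem min c^T v over a polytope's vertices; the function name and the exact loss form are assumptions, not the authors' implementation.

```python
import numpy as np

def solver_free_loss(c_hat, v_star, adjacent_vertices):
    """Adjacency-based surrogate loss (sketch). For min c^T v, the vertex
    v_star is optimal iff c^T v_star <= c^T v_adj for every adjacent vertex.
    The loss penalizes predicted costs c_hat under which some precomputed
    neighbor of v_star looks strictly better than v_star -- no solver call."""
    return sum(max(0.0, float(c_hat @ v_star - c_hat @ v_adj))
               for v_adj in adjacent_vertices)

# Tiny example: unit-square polytope, ground-truth optimum (0, 0) for c = (1, 1).
v_star = np.array([0.0, 0.0])
neighbors = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]

good = solver_free_loss(np.array([1.0, 1.0]), v_star, neighbors)
bad = solver_free_loss(np.array([-1.0, 0.5]), v_star, neighbors)
print(good, bad)  # good predictions incur zero loss; bad ones a positive penalty
```

Because each loss evaluation only needs dot products against precomputed neighbors, no linear program is solved during training, which is the source of the claimed speedup.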


RL makes MLLMs see better than SFT

Song, Junha, Yun, Sangdoo, Han, Dongyoon, Choo, Jaegul, Heo, Byeongho

arXiv.org Artificial Intelligence

A dominant assumption in Multimodal Language Model (MLLM) research is that its performance is largely inherited from the LLM backbone, given its immense parameter scale and remarkable capabilities. This has created a void in the understanding of the vision encoder, which determines how MLLMs perceive images. The recent shift in MLLM training paradigms, from Supervised Finetuning (SFT) to Reinforcement Learning (RL), magnifies this oversight: namely, the significant lack of analysis on how such training reshapes the vision encoder as well as the MLLM. To address this, we first investigate the impact of training strategies on MLLMs, where RL shows a clear advantage over SFT in strongly vision-related VQA benchmarks. Motivated by this, we conduct a critical yet under-explored analysis of the vision encoder of MLLMs through diverse and in-depth experiments, ranging from ImageNet classification and segmentation to gradient visualization. Our results demonstrate that the MLLM's post-training strategy (i.e., SFT or RL) not only leads to distinct outcomes on MLLM downstream tasks, but also fundamentally reshapes the MLLM's underlying visual representations. Specifically, the key finding of our study is that RL produces stronger and more precisely localized visual representations than SFT, boosting the ability of the vision encoder for the MLLM. We then reframe our findings into a simple recipe for building strong vision encoders for MLLMs, Preference-Instructed Vision OpTimization (PIVOT). When integrated into MLLMs, a PIVOT-trained vision encoder outperforms even larger and more heavily-trained counterparts, despite requiring less than 1% of the computational cost of standard vision pretraining. This result opens an effective and efficient path for advancing the vision backbones of MLLMs. Project page available at https://june-page.github.io/pivot/


See, Point, Fly: A Learning-Free VLM Framework for Universal Unmanned Aerial Navigation

Hu, Chih Yao, Lin, Yang-Sen, Lee, Yuna, Su, Chih-Hai, Lee, Jie-Ying, Tsai, Shr-Ruei, Lin, Chin-Yang, Chen, Kuan-Wen, Ke, Tsung-Wei, Liu, Yu-Lun

arXiv.org Artificial Intelligence

We present See, Point, Fly (SPF), a training-free aerial vision-and-language navigation (AVLN) framework built atop vision-language models (VLMs). SPF is capable of navigating to any goal based on any type of free-form instruction in any kind of environment. In contrast to existing VLM-based approaches that treat action prediction as a text generation task, our key insight is to consider action prediction for AVLN as a 2D spatial grounding task. SPF harnesses VLMs to decompose vague language instructions into iterative annotation of 2D waypoints on the input image. Along with the predicted traveling distance, SPF transforms predicted 2D waypoints into 3D displacement vectors as action commands for UAVs. Moreover, SPF adaptively adjusts the traveling distance to facilitate more efficient navigation. Notably, SPF performs navigation in a closed-loop control manner, enabling UAVs to follow dynamic targets in dynamic environments. SPF sets a new state of the art on the DRL simulation benchmark, outperforming the previous best method by an absolute margin of 63%. In extensive real-world evaluations, SPF outperforms strong baselines by a large margin. We also conduct comprehensive ablation studies to highlight the effectiveness of our design choices. Lastly, SPF shows remarkable generalization to different VLMs. Project page: https://spf-web.pages.dev
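The 2D-waypoint-to-3D-displacement step can be sketched with standard pinhole back-projection: invert the camera intrinsics to get a ray through the waypoint pixel, then scale by the predicted travel distance. This is a generic sketch under assumed intrinsics; the paper's exact transform and conventions may differ.

```python
import numpy as np

def waypoint_to_displacement(u, v, distance, K):
    """Back-project a 2D waypoint (u, v) in pixels into a unit-norm ray in
    the camera frame via K^-1 @ [u, v, 1], then scale by the predicted
    travel distance to get a 3D displacement command (sketch only)."""
    ray = np.linalg.inv(K) @ np.array([u, v, 1.0])
    ray /= np.linalg.norm(ray)
    return distance * ray

# Assumed pinhole intrinsics for a 640x480 image (fx = fy = 500).
K = np.array([[500.0,   0.0, 320.0],
              [  0.0, 500.0, 240.0],
              [  0.0,   0.0,   1.0]])

# A waypoint at the image center maps to a displacement straight ahead (+z).
d = waypoint_to_displacement(320.0, 240.0, 2.0, K)
print(d)  # -> [0. 0. 2.]
```

In a closed-loop setting this displacement would be re-expressed in the UAV's body or world frame via the current camera pose before being issued as an action command.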



SoftBank's Vision Fund mulls 20% job cuts after Son's pivot to AI

The Japan Times

SoftBank Group's Vision Fund is considering cutting as much as 20% of its staff, a person familiar with the matter said, underscoring a shift in CEO Masayoshi Son's focus to ambitious bets on artificial intelligence. The unit, which employed about 282 people as of the end of March, may shed more than 50 roles, the person said, asking not to be identified discussing private deliberations. The reduction extends years of cutbacks as the Vision Fund unit shrank in importance next to Son's growing appetite for big AI bets. Those include a plan to invest about $30 billion in OpenAI and a $6.5 billion deal to acquire chip designer Ampere Computing, which faces regulatory scrutiny.

