 sparta



Mash, Spread, Slice! Learning to Manipulate Object States via Visual Spatial Progress

arXiv.org Artificial Intelligence

Most robot manipulation focuses on changing the kinematic state of objects: picking, placing, opening, or rotating them. However, a wide range of real-world manipulation tasks involve a different class of object state change (such as mashing, spreading, or slicing) where the object's physical and visual state evolves progressively without necessarily changing its position. We present SPARTA, the first unified framework for the family of object state change manipulation tasks. Our key insight is that these tasks share a common structural pattern: they involve spatially-progressing, object-centric changes that can be represented as regions transitioning from an actionable to a transformed state. Building on this insight, SPARTA integrates spatially progressing object change segmentation maps, a visual skill to perceive actionable vs. transformed regions for specific object state change tasks, to generate a) structured policy observations that strip away appearance variability, and b) dense rewards that capture incremental progress over time. These are leveraged in two SPARTA policy variants: reinforcement learning for fine-grained control without demonstrations or simulation; and greedy control for fast, lightweight deployment. We validate SPARTA on a real robot for three challenging tasks across 10 diverse real-world objects, achieving significant improvements in training time and accuracy over sparse rewards and visual goal-conditioned baselines. Our results highlight progress-aware visual representations as a versatile foundation for the broader family of object state manipulation tasks. Project website: https://vision.cs.utexas.edu/projects/sparta-robot


An Optimization Framework for Differentially Private Sparse Fine-Tuning

arXiv.org Machine Learning

Differentially private stochastic gradient descent (DP-SGD) is broadly considered to be the gold standard for training and fine-tuning neural networks under differential privacy (DP). With the increasing availability of high-quality pre-trained model checkpoints (e.g., vision and language models), fine-tuning has become a popular strategy. However, despite recent progress in understanding and applying DP-SGD for private transfer learning tasks, significant challenges remain, most notably the performance gap between models fine-tuned with DP-SGD and their non-private counterparts. Sparse fine-tuning on private data has emerged as an alternative to full-model fine-tuning; recent work has shown that privately fine-tuning only a small subset of model weights and keeping the rest of the weights fixed can lead to better performance. In this work, we propose a new approach for sparse fine-tuning of neural networks under DP. Existing work on private sparse fine-tuning often uses a fixed choice of trainable weights (e.g., updating only the last layer), or relies on the public model's weights to choose the subset of weights to modify. Such choices of weights remain suboptimal. In contrast, we explore an optimization-based approach, where our selection method makes use of the private gradient information, while using off-the-shelf privacy accounting techniques. Our numerical experiments on several computer vision models and datasets show that our selection method leads to better prediction accuracy, compared to full-model private fine-tuning or existing private sparse fine-tuning approaches.
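To make the setting concrete, here is a minimal sketch of one DP-SGD step restricted to a fixed subset of weights: per-example gradients are masked, clipped, summed, and perturbed with Gaussian noise. It is not the selection method proposed in the abstract. The function name `dp_sgd_sparse_step`, its parameters, and the assumption that the mask is chosen beforehand are all illustrative, and privacy accounting is left out entirely.

```python
import math
import random

def dp_sgd_sparse_step(params, per_example_grads, mask, clip_norm,
                       noise_multiplier, lr, rng):
    """One illustrative DP-SGD step that only updates coordinates where mask is True."""
    dim = len(params)
    summed = [0.0] * dim
    for g in per_example_grads:
        # Keep only the trainable coordinates of this example's gradient.
        masked = [g[i] if mask[i] else 0.0 for i in range(dim)]
        # Clip the masked gradient to L2 norm at most clip_norm.
        norm = math.sqrt(sum(x * x for x in masked))
        scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
        for i in range(dim):
            summed[i] += masked[i] * scale
    n = len(per_example_grads)
    new_params = list(params)
    for i in range(dim):
        if mask[i]:
            # Gaussian noise with std noise_multiplier * clip_norm per coordinate.
            noisy = summed[i] + rng.gauss(0.0, noise_multiplier * clip_norm)
            new_params[i] = params[i] - lr * noisy / n
    return new_params

# Hypothetical toy data: three weights, the last one frozen.
toy_params = [0.0, 0.0, 0.0]
toy_grads = [[3.0, 4.0, 100.0], [0.3, 0.4, -5.0]]
toy_mask = [True, True, False]
noiseless = dp_sgd_sparse_step(toy_params, toy_grads, toy_mask, clip_norm=1.0,
                               noise_multiplier=0.0, lr=1.0, rng=random.Random(0))
print(noiseless)
```

Because frozen coordinates are dropped before clipping, they neither consume clipping budget nor receive noise, which is one plausible reason a sparse update can help under DP.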


Sparsity May Be All You Need: Sparse Random Parameter Adaptation

arXiv.org Artificial Intelligence

Full fine-tuning of large language models for alignment and task adaptation has become prohibitively expensive as models have grown in size. Parameter-Efficient Fine-Tuning (PEFT) methods aim to significantly reduce the computational and memory resources needed for fine-tuning these models by training only a small number of parameters instead of all model parameters. Currently, the most popular PEFT method is Low-Rank Adaptation (LoRA), which freezes the parameters of the model to be fine-tuned and introduces a small set of trainable parameters in the form of low-rank matrices. We propose simply reducing the number of trainable parameters by randomly selecting a small proportion of the model parameters to train. In this paper, we compare the efficiency and performance of our proposed approach with PEFT methods, including LoRA, as well as full parameter fine-tuning.
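The idea of training a random subset of parameters can be sketched roughly as follows. This is a toy illustration on a flat list of weights, not the authors' implementation; the helper names `make_random_mask` and `sparse_sgd_step`, the plain SGD update, and the 1% fraction are assumptions made for the example.

```python
import random

def make_random_mask(num_params, fraction, seed=0):
    """Pick a fixed random subset of parameter indices to train."""
    rng = random.Random(seed)
    k = max(1, int(round(fraction * num_params)))
    return set(rng.sample(range(num_params), k))

def sparse_sgd_step(params, grads, trainable, lr=0.1):
    """Apply an SGD update only to the selected indices; all others stay frozen."""
    return [p - lr * g if i in trainable else p
            for i, (p, g) in enumerate(zip(params, grads))]

params = [1.0] * 1000
grads = [0.5] * 1000
trainable = make_random_mask(len(params), fraction=0.01)
updated = sparse_sgd_step(params, grads, trainable)

# Count how many parameters actually moved.
changed = sum(1 for p, q in zip(params, updated) if p != q)
print(changed)  # 10 of the 1000 parameters are updated
```

The subset is drawn once and reused for every step, so optimizer state would only need to be kept for the selected indices.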


Modeling Strong and Human-Like Gameplay with KL-Regularized Search

arXiv.org Artificial Intelligence

We consider the task of building strong but human-like policies in multi-agent decision-making problems, given examples of human behavior. Imitation learning is effective at predicting human actions but may not match the strength of expert humans, while self-play learning and search techniques (e.g. AlphaZero) lead to strong performance but may produce policies that are difficult for humans to understand and coordinate with. We show in chess and Go that regularizing search policies based on the KL divergence from an imitation-learned policy by applying Monte Carlo tree search produces policies that have higher human prediction accuracy and are stronger than the imitation policy. We then introduce a novel regret minimization algorithm that is regularized based on the KL divergence from an imitation-learned policy, and show that applying this algorithm to no-press Diplomacy yields a policy that maintains the same human prediction accuracy as imitation learning while being substantially stronger.
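For intuition, the single-decision version of KL regularization has a well-known closed form: the policy that maximizes expected value minus λ times the KL divergence from an anchor policy τ is π(a) ∝ τ(a)·exp(Q(a)/λ). The sketch below computes that distribution. It is not the MCTS or regret-minimization procedure described in the abstract; the function name `kl_regularized_policy` and the toy values are assumptions.

```python
import math

def kl_regularized_policy(q_values, anchor_policy, lam):
    """Return pi(a) proportional to anchor(a) * exp(Q(a) / lam).

    This maximizes E_pi[Q] - lam * KL(pi || anchor) over distributions pi.
    """
    m = max(q_values)  # subtract the max for numerical stability
    weights = [t * math.exp((q - m) / lam)
               for q, t in zip(q_values, anchor_policy)]
    z = sum(weights)
    return [w / z for w in weights]

# Hypothetical action values and an imitation-learned (anchor) policy.
q = [1.0, 0.0, 0.5]
anchor = [0.2, 0.5, 0.3]

near_anchor = kl_regularized_policy(q, anchor, lam=1e6)   # strong regularization
near_greedy = kl_regularized_policy(q, anchor, lam=0.01)  # weak regularization
print(near_anchor)
print(near_greedy)
```

A large λ keeps the policy close to the human-like anchor, while a small λ concentrates probability on the highest-value action, which is the trade-off between human-likeness and strength that the abstract describes.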


Scalable Online Planning via Reinforcement Learning Fine-Tuning

arXiv.org Artificial Intelligence

Lookahead search has been a critical component of recent AI successes, such as in the games of chess, Go, and poker. However, the search methods used in these games, and in many other settings, are tabular. Tabular search methods do not scale well with the size of the search space, and this problem is exacerbated by stochasticity and partial observability. In this work we replace tabular search with online model-based fine-tuning of a policy neural network via reinforcement learning, and show that this approach outperforms state-of-the-art search algorithms in benchmark settings. In particular, we use our search algorithm to achieve a new state-of-the-art result in self-play Hanabi, and demonstrate its generality by showing that it also outperforms tabular search in the Atari game Ms. Pacman.


Learned Belief Search: Efficiently Improving Policies in Partially Observable Settings

arXiv.org Artificial Intelligence

Search is an important tool for computing effective policies in single- and multi-agent environments, and has been crucial for achieving superhuman performance in several benchmark fully and partially observable games. However, one major limitation of prior search approaches for partially observable environments is that the computational cost scales poorly with the amount of hidden information. In this paper we present Learned Belief Search (LBS), a computationally efficient search procedure for partially observable environments. Rather than maintaining an exact belief distribution, LBS uses an approximate auto-regressive counterfactual belief that is learned as a supervised task. In multi-agent settings, LBS uses a novel public-private model architecture for underlying policies in order to efficiently evaluate these policies during rollouts. In the benchmark domain of Hanabi, LBS can obtain 55% to 91% of the benefit of exact search while reducing compute requirements by 35.8× to 4.6×, allowing it to scale to larger settings that were inaccessible to previous search methods.


Hadamard Wirtinger Flow for Sparse Phase Retrieval

arXiv.org Machine Learning

Phase retrieval, the problem of reconstructing a signal from the (squared) magnitude of its Fourier (or any linear) transform, arises in many fields of science and engineering. Such a task is naturally involved in applications such as crystallography (Millane, 1990) and diffraction imaging (Bunk et al., 2007), where optical sensors are able to measure the intensity, but not the phase of a light wave. Due to the loss of phase information, the one-dimensional Fourier phase retrieval problem is ill-posed in general. Common approaches to overcome this ill-posedness include using prior information such as non-negativity, sparsity and the signal's magnitude (Fienup, 1982; Jaganathan et al., 2016), or introducing redundancy into the measurements by oversampling random Gaussian measurements or coded diffraction patterns (Candès et al., 2015; Chen and Candès, 2015). In many applications, the underlying signal is naturally sparse (Jaganathan et al., 2016). A wide range of algorithms has been devised for phase retrieval with a sparse signal, including alternating minimization (SparseAltMinPhase) (Netrapalli et al., 2015), non-convex optimization based approaches such as thresholded Wirtinger flow (TWF) (Cai et al., 2016), sparse truncated amplitude flow (SPARTA) (Wang et al., 2018), compressive reweighted amplitude flow (CRAF) (Zhang et al., 2018) and sparse Wirtinger flow (SWF) (Yuan et al., 2019), and convex relaxation methods such as compressive phase retrieval via lifting (CPRL) (Ohlsson et al., 2012) and SparsePhaseMax (Hand and Voroninski, 2016). Other approaches to sparse phase retrieval include the greedy algorithm GESPAR (Schechtman et al., 2014), an algorithm based on generalized


Assassin's Creed: Odyssey's Exploration Mode isn't afraid of letting players get lost

PCWorld

Odysseus would be heartbroken to see his home now. The once resplendent palace on the island of Ithaca is now a crumbling ruin, not much more than scattered piles of stone and a couple of faltering walls. It's also host to a company of bandits, patrolling the ancient grounds like the suitors who once tried to steal poor Penelope away. Assassin that I am, I kill first one and then the other bandit leaders. Odysseus may be 800 years dead, but I can at least pay my respects.


Here's Why There Are No Assassins In 'Assassin's Creed Odyssey'

Forbes - Tech

Ubisoft's most-spotlighted (and most-leaked) game of E3 was definitely Assassin's Creed Odyssey, a new entry in the game that is…going back to making the series an annual franchise, even if Ubisoft said they were steering away from that. While Odyssey looks a lot like an Origins reskin, this time set in ancient Greece, it's going pretty hard into full-on RPG territory, complete with individual pieces of armor with different rarities, dialogue trees and even romance options for your character, where you can play as either the male Alexios or female Kassandra, both angling to become Spartan legends. What has been consistently weird about Assassin's Creed Odyssey is that other than looking like an Assassin's Creed game, there are almost no traces of…Assassins at all, at least as we've come to know them. There is an "assassin" skill tree, but that's a lower case "a" along with hunter and warrior skill trees. The only thing that seems remotely connected to the Assassin's Creed universe at all is that it seems pretty clear that your treasured weapon, the spear of Leonidas, is a Piece of Eden, giving you supernatural powers in combat.