Goto

Collaborating Authors

 srt


The Paramilitary ICE and CBP Units at the Center of Minnesota's Killings

WIRED

Two agents involved in the shooting deaths of US citizens in Minneapolis are reportedly part of highly militarized DHS units whose extreme tactics are generally reserved for war zones. An officer with a Department of Homeland Security Special Response Team stands against a protester in Portland, Oregon. As Minneapolis continues to reel from the fatal shooting of 37-year-old intensive care nurse Alex Pretti by federal agents on the morning of January 24, the international spotlight is firmly fixed on the heavily armed and masked operatives who have spearheaded the Trump administration's violent immigration sweeps. At the heart of the deployment in Minnesota, as well as the chaotic clashes with communities in Southern California and Illinois, are hundreds of agents that operate within Immigration and Customs Enforcement (ICE) and Customs and Border Protection: ICE's two Special Response Teams (SRT), CBP's one SRT, and the Border Patrol Tactical Unit (BORTAC). These paramilitary tactical units behave not like local police, but instead like special forces in Iraq, Afghanistan, or other far-flung battlefields from the Forever Wars of the past quarter century.


Can Large Reasoning Models Self-Train?

Shafayat, Sheikh, Tajwar, Fahim, Salakhutdinov, Ruslan, Schneider, Jeff, Zanette, Andrea

arXiv.org Artificial Intelligence

Recent successes of reinforcement learning (RL) in training large reasoning models motivate the question of whether self-training - the process where a model learns from its own judgments - can be sustained within RL. In this work, we study this question using majority voting as a simple self-feedback mechanism. On a comprehensive set of experiments on both synthetic and real reasoning tasks, we find that this basic approach improves not only the model's reasoning performance, but also its capability of generating better quality feedback for the next RL iteration, driving further model improvement. Yet our analysis also reveals a critical limitation of such a self-training paradigm - prolonged RL with self-reward leads to reward hacking where models learn to maximize training (pseudo-)reward, resulting in sudden and complete performance collapse. Together, these results highlight feedback design as the central challenge and call for future research on mechanisms to enable prolonged self-improvement.


Order Optimal Regret Bounds for Sharpe Ratio Optimization in the Bandit Setting

Shah, Mohammad Taha, Khurshid, Sabrina, Ghatak, Gourab

arXiv.org Artificial Intelligence

In this paper, we investigate the problem of sequential decision-making for Sharpe ratio (SR) maximization in a stochastic bandit setting. We focus on the Thompson Sampling (TS) algorithm, a Bayesian approach celebrated for its empirical performance and exploration efficiency, under the assumption of Gaussian rewards with unknown parameters. Unlike conventional bandit objectives focusing on maximizing cumulative reward, Sharpe ratio optimization instead introduces an inherent tradeoff between achieving high returns and controlling risk, demanding careful exploration of both mean and variance. Our theoretical contributions include a novel regret decomposition specifically designed for the Sharpe ratio, highlighting the role of information acquisition about the reward distribution in driving learning efficiency. Then, we establish fundamental performance limits for the proposed algorithm \texttt{SRTS} in terms of an upper bound on regret. We also derive the matching lower bound and show the order-optimality. Our results show that Thompson Sampling achieves logarithmic regret over time, with distribution-dependent factors capturing the difficulty of distinguishing arms based on risk-adjusted performance. Empirical simulations show that our algorithm significantly outperforms existing algorithms.


Soft regression trees: a model variant and a decomposition training algorithm

Consolo, Antonio, Amaldi, Edoardo, Manno, Andrea

arXiv.org Artificial Intelligence

Decision trees are widely used for classification and regression tasks in a variety of application fields due to their interpretability and good accuracy. During the past decade, growing attention has been devoted to globally optimized decision trees with deterministic or soft splitting rules at branch nodes, which are trained by optimizing the error function over all the tree parameters. In this work, we propose a new variant of soft multivariate regression trees (SRTs) where, for every input vector, the prediction is defined as the linear regression associated to a single leaf node, namely, the leaf node obtained by routing the input vector from the root along the branches with higher probability. SRTs exhibit the conditional computational property, i.e., each prediction depends on a small number of nodes (parameters), and our nonlinear optimization formulation for training them is amenable to decomposition. After showing a universal approximation result for SRTs, we present a decomposition training algorithm including a clustering-based initialization procedure and a heuristic for reassigning the input vectors along the tree. Under mild assumptions, we establish asymptotic convergence guarantees. Experiments on 15 wellknown datasets indicate that our SRTs and decomposition algorithm yield higher accuracy and robustness compared with traditional soft regression trees trained using the nonlinear optimization formulation of Blanquero et al., and a significant reduction in training times as well as a slightly better average accuracy compared with the mixed-integer optimization approach of Bertsimas and Dunn. We also report a comparison with the Random Forest ensemble method.


Surgical Robot Transformer (SRT): Imitation Learning for Surgical Tasks

Kim, Ji Woong, Zhao, Tony Z., Schmidgall, Samuel, Deguet, Anton, Kobilarov, Marin, Finn, Chelsea, Krieger, Axel

arXiv.org Artificial Intelligence

We explore whether surgical manipulation tasks can be learned on the da Vinci robot via imitation learning. However, the da Vinci system presents unique challenges which hinder straight-forward implementation of imitation learning. Notably, its forward kinematics is inconsistent due to imprecise joint measurements, and naively training a policy using such approximate kinematics data often leads to task failure. To overcome this limitation, we introduce a relative action formulation which enables successful policy training and deployment using its approximate kinematics data. A promising outcome of this approach is that the large repository of clinical data, which contains approximate kinematics, may be directly utilized for robot learning without further corrections. We demonstrate our findings through successful execution of three fundamental surgical tasks, including tissue manipulation, needle handling, and knot-tying.


Complex Event Recognition with Symbolic Register Transducers: Extended Technical Report

Alevizos, Elias, Artikis, Alexander, Paliouras, Georgios

arXiv.org Artificial Intelligence

We present a system for Complex Event Recognition (CER) based on automata. While multiple such systems have been described in the literature, they typically suffer from a lack of clear and denotational semantics, a limitation which often leads to confusion with respect to their expressive power. In order to address this issue, our system is based on an automaton model which is a combination of symbolic and register automata. We extend previous work on these types of automata, in order to construct a formalism with clear semantics and a corresponding automaton model whose properties can be formally investigated. We call such automata Symbolic Register Transducers (SRT). We show that SRT are closed under various operators, but are not in general closed under complement and they are not determinizable. However, they are closed under these operations when a window operator, quintessential in Complex Event Recognition, is used. We show how SRT can be used in CER in order to detect patterns upon streams of events, using our framework that provides declarative and compositional semantics, and that allows for a systematic treatment of such automata. For SRT to work in pattern detection, we allow them to mark events from the input stream as belonging to a complex event or not, hence the name "transducers". We also present an implementation of SRT which can perform CER. We compare our SRT-based CER engine against other state-of-the-art CER systems and show that it is both more expressive and more efficient.


Teaching Language Models to Self-Improve by Learning from Language Feedback

Hu, Chi, Hu, Yimin, Cao, Hang, Xiao, Tong, Zhu, Jingbo

arXiv.org Artificial Intelligence

Aligning Large Language Models (LLMs) with human intentions and values is crucial yet challenging. Current methods primarily rely on human preferences, which are costly and insufficient in capturing nuanced feedback expressed in natural language. In this paper, we present Self-Refinement Tuning (SRT), a method that leverages model feedback for alignment, thereby reducing reliance on human annotations. SRT uses a base language model (e.g., Tulu2) to generate initial responses, which are critiqued and refined by a more advanced model (e.g., GPT-4-Turbo). This process enables the base model to self-evaluate and improve its outputs, facilitating continuous learning. SRT further optimizes the model by learning from its self-generated feedback and refinements, creating a feedback loop that promotes model improvement. Our empirical evaluations demonstrate that SRT significantly outperforms strong baselines across diverse tasks and model sizes. When applied to a 70B parameter model, SRT increases the win rate from 9.6\% to 25.8\% on the AlpacaEval 2.0 benchmark, surpassing well-established systems such as GPT-4-0314, Claude 2, and Gemini. Our analysis highlights the crucial role of language feedback in the success of SRT, suggesting potential for further exploration in this direction.


Symmetric Replay Training: Enhancing Sample Efficiency in Deep Reinforcement Learning for Combinatorial Optimization

Kim, Hyeonah, Kim, Minsu, Ahn, Sungsoo, Park, Jinkyoo

arXiv.org Artificial Intelligence

Deep reinforcement learning (DRL) has significantly advanced the field of combinatorial optimization (CO). However, its practicality is hindered by the necessity for a large number of reward evaluations, especially in scenarios involving computationally intensive function assessments. To enhance the sample efficiency, we propose a simple but effective method, called symmetric replay training (SRT), which can be easily integrated into various DRL methods. Our method leverages high-reward samples to encourage exploration of the under-explored symmetric regions without additional online interactions - free. Through replay training, the policy is trained to maximize the likelihood of the symmetric trajectories of discovered high-rewarded samples. Experimental results demonstrate the consistent improvement of our method in sample efficiency across diverse DRL methods applied to real-world tasks, such as molecular optimization and hardware design.


Automated speech audiometry: Can it work using open-source pre-trained Kaldi-NL automatic speech recognition?

Araiza-Illan, Gloria, Meyer, Luke, Truong, Khiet P., Baskent, Deniz

arXiv.org Artificial Intelligence

A practical speech audiometry tool is the digits-in-noise (DIN) test for hearing screening of populations of varying ages and hearing status. The test is usually conducted by a human supervisor (e.g., clinician), who scores the responses spoken by the listener, or online, where a software scores the responses entered by the listener. The test has 24 digit-triplets presented in an adaptive staircase procedure, resulting in a speech reception threshold (SRT). We propose an alternative automated DIN test setup that can evaluate spoken responses whilst conducted without a human supervisor, using the open-source automatic speech recognition toolkit, Kaldi-NL. Thirty self-reported normal-hearing Dutch adults (19-64 years) completed one DIN+Kaldi-NL test. Their spoken responses were recorded, and used for evaluating the transcript of decoded responses by Kaldi-NL. Study 1 evaluated the Kaldi-NL performance through its word error rate (WER), percentage of summed decoding errors regarding only digits found in the transcript compared to the total number of digits present in the spoken responses. Average WER across participants was 5.0% (range 0 - 48%, SD = 8.8%), with average decoding errors in three triplets per participant. Study 2 analysed the effect that triplets with decoding errors from Kaldi-NL had on the DIN test output (SRT), using bootstrapping simulations. Previous research indicated 0.70 dB as the typical within-subject SRT variability for normal-hearing adults. Study 2 showed that up to four triplets with decoding errors produce SRT variations within this range, suggesting that our proposed setup could be feasible for clinical applications.


RePAST: Relative Pose Attention Scene Representation Transformer

Safin, Aleksandr, Duckworth, Daniel, Sajjadi, Mehdi S. M.

arXiv.org Artificial Intelligence

The Scene Representation Transformer (SRT) is a recent method to render novel views at interactive rates. Since SRT uses camera poses with respect to an arbitrarily chosen reference camera, it is not invariant to the order of the input views. As a result, SRT is not directly applicable to large-scale scenes where the reference frame would need to be changed regularly. In this work, we propose Relative Pose Attention SRT (RePAST): Instead of fixing a reference frame at the input, we inject pairwise relative camera pose information directly into the attention mechanism of the Transformers. This leads to a model that is by definition invariant to the choice of any global reference frame, while still retaining the full capabilities of the original method. Empirical results show that adding this invariance to the model does not lead to a loss in quality. We believe that this is a step towards applying fully latent transformer-based rendering methods to large-scale scenes.