Goto

Collaborating Authors

 supervisor


Jeffrey Epstein's Ties to CBP Agents Sparked a DOJ Probe

WIRED

Documents say customs officers in the US Virgin Islands had friendly relationships with Epstein years after his 2008 conviction, showing how the infamous sex offender tried to cultivate allies. United States prosecutors and federal law enforcement spent over a year examining ties between Jeffrey Epstein and Customs and Border Protection officers stationed in the US Virgin Islands (USVI), according to documents recently released by the Department of Justice. As The Guardian and New York Times have reported, emails, text messages, and investigative records show that Epstein cultivated friendships with several officers, entertaining them on his island and offering to take them for whale-watching trips in his helicopter. He even brought one cannolis for Christmas Eve. In turn, Epstein would bring certain officers his complaints about his treatment at the hands of other CBP and federal agents.


Comparing AI Agents to Cybersecurity Professionals in Real-World Penetration Testing

Lin, Justin W., Jones, Eliot Krzysztof, Jasper, Donovan Julian, Ho, Ethan Jun-shen, Wu, Anna, Yang, Arnold Tianyi, Perry, Neil, Zou, Andy, Fredrikson, Matt, Kolter, J. Zico, Liang, Percy, Boneh, Dan, Ho, Daniel E.

arXiv.org Artificial Intelligence

We present the first comprehensive evaluation of AI agents against human cybersecurity professionals in a live enterprise environment. We evaluate ten cybersecurity professionals alongside six existing AI agents and ARTEMIS, our new agent scaffold, on a large university network consisting of ~8,000 hosts across 12 subnets. ARTEMIS is a multi-agent framework featuring dynamic prompt generation, arbitrary sub-agents, and automatic vulnerability triaging. In our comparative study, ARTEMIS placed second overall, discovering 9 valid vulnerabilities with an 82% valid submission rate and outperforming 9 of 10 human participants. While existing scaffolds such as Codex and CyAgent underperformed relative to most human participants, ARTEMIS demonstrated technical sophistication and submission quality comparable to the strongest participants. We observe that AI agents offer advantages in systematic enumeration, parallel exploitation, and cost -- certain ARTEMIS variants cost $18/hour versus $60/hour for professional penetration testers. We also identify key capability gaps: AI agents exhibit higher false-positive rates and struggle with GUI-based tasks.


FPC-VLA: A Vision-Language-Action Framework with a Supervisor for Failure Prediction and Correction

Yang, Yifan, Duan, Zhixiang, Xie, Tianshi, Cao, Fuyu, Shen, Pinxi, Song, Peili, Jin, Piaopiao, Sun, Guokang, Xu, Shaoqing, You, Yangwei, Liu, Jingtai

arXiv.org Artificial Intelligence

Robotic manipulation is a fundamental component of automation. However, traditional perception-planning pipelines often fall short in open-ended tasks due to limited flexibility, while the architecture of a single end-to-end Vision-Language-Action (VLA) offers promising capabilities but lacks crucial mechanisms for anticipating and recovering from failure. To address these challenges, we propose FPC-VLA, a dual-model framework that integrates VLA with a supervisor for failure prediction and correction. The supervisor evaluates action viability through vision-language queries and generates corrective strategies when risks arise, trained efficiently without manual labeling. A dual-stream fusion module further refines actions by leveraging past predictions. Evaluation results on multiple simulation platforms (SIMPLER and LIBERO) and robot embodiments (WidowX, Google Robot, Franka) show that FPC-VLA outperforms state-of-the-art models in both zero-shot and fine-tuned settings. Successful real-world deployments on diverse, long-horizon tasks confirm FPC-VLA's strong generalization and practical utility for building more reliable autonomous systems.


CLIPPan: Adapting CLIP as A Supervisor for Unsupervised Pansharpening

Jian, Lihua, Liu, Jiabo, Wu, Shaowu, Chen, Lihui

arXiv.org Artificial Intelligence

Despite remarkable advancements in supervised pansharpening neural networks, these methods face domain adaptation challenges of resolution due to the intrinsic disparity between simulated reduced-resolution training data and real-world full-resolution scenarios.To bridge this gap, we propose an unsupervised pansharpening framework, CLIPPan, that enables model training at full resolution directly by taking CLIP, a visual-language model, as a supervisor. However, directly applying CLIP to supervise pansharpening remains challenging due to its inherent bias toward natural images and limited understanding of pansharpening tasks. Therefore, we first introduce a lightweight fine-tuning pipeline that adapts CLIP to recognize low-resolution multispectral, panchromatic, and high-resolution multispectral images, as well as to understand the pansharpening process. Then, building on the adapted CLIP, we formulate a novel \textit{loss integrating semantic language constraints}, which aligns image-level fusion transitions with protocol-aligned textual prompts (e.g., Wald's or Khan's descriptions), thus enabling CLIPPan to use language as a powerful supervisory signal and guide fusion learning without ground truth. Extensive experiments demonstrate that CLIPPan consistently improves spectral and spatial fidelity across various pansharpening backbones on real-world datasets, setting a new state of the art for unsupervised full-resolution pansharpening.



Teachable Reinforcement Learning via Advice Distillation

Neural Information Processing Systems

"), and what sub-goals to accomplish ("pick up the yellow ball"), offering We then describe an algorithmic framework for learning in CAMDPs via alternating advice grounding and distillation phases. "place the yellow ball in the green box and the blue key in the green box" or "open all doors in In multi-task RL, a learner's objective is produce a policy For instance, the agent in Fig 3 can leverage hints "go left" or "move towards the blue key" to guide In the grounding phase, agents learn how to interpret coaching.


ProST: Progressive Sub-task Training for Pareto-Optimal Multi-agent Systems Using Small Language Models

Bijoy, Biddut Sarker, Hasan, Mohammad Saqib, Alipoormolabashi, Pegah, Sil, Avirup, Balasubramanian, Aruna, Balasubramanian, Niranjan

arXiv.org Artificial Intelligence

Multi-agent systems with smaller language models (SLMs) present a viable alternative to single agent systems powered by large language models (LLMs) for addressing complex problems. In this work, we study how these alternatives compare in terms of both effectiveness and efficiency. To study this trade-off, we instantiate single and multi-agent systems for the complex problems in the AppWorld environment using different sized language models. We find that difficulties with long-trajectory learning in smaller language models (SLMs) limit their performance. Even when trained for specialized roles, SLMs fail to learn all subtasks effectively. To address this issue, we introduce a simple progressive sub-task training strategy, which introduces new sub-tasks progressively in each training epoch. We find that this novel strategy, analogous to instance level curriculum learning, consistently improves the effectiveness of multi-agents at all configurations. Our Pareto analysis shows that fine-tuned multi-agent systems yield better effectiveness-efficiency trade-offs. Additional ablations and analyses shows the importance of our progressive training strategy and its ability to reduce subtask error rates.


Dynamic Model Selection for Trajectory Prediction via Pairwise Ranking and Meta-Features

Bowen, Lu

arXiv.org Artificial Intelligence

Recent deep trajectory predictors (e.g., Jiang et al., 2023; Zhou et al., 2022) have achieved strong average accuracy but remain unreliable in complex long-tail driving scenarios. These limitations reveal the weakness of the prevailing "one-model-fits-all" paradigm, particularly in safety-critical urban contexts where simpler physics-based models can occasionally outperform advanced networks (Kalman, 1960). To bridge this gap, we propose a dynamic multi-expert gating framework that adaptively selects the most reliable trajectory predictor among a physics-informed LSTM, a Transformer, and a fine-tuned GameFormer on a per-sample basis. Our method leverages internal model signals (meta-features) such as stability and uncertainty (Gal and Ghahramani, 2016), which we demonstrate to be substantially more informative than geometric scene descriptors. To the best of our knowledge, this is the first work to formulate trajectory expert selection as a pairwise-ranking problem over internal model signals (Burges et al., 2005), directly optimizing decision quality without requiring post-hoc calibration. Evaluated on the nuPlan-mini dataset (Caesar et al., 2021) with 1,287 samples, our LLM-enhanced tri-expert gate achieves a Final Displacement Error (FDE) of 2.567 m, representing a 9.5 percent reduction over GameFormer (2.835 m), and realizes 57.8 percent of the oracle performance bound. In open-loop simulations, after trajectory horizon alignment, the same configuration reduces FDE on left-turn scenarios by approximately 10 percent, demonstrating consistent improvements across both offline validation and open-loop evaluation. These results indicate that adaptive hybrid systems enhance trajectory reliability in safety-critical autonomous driving, providing a practical pathway beyond static single-model paradigms.


Advancing Machine-Generated Text Detection from an Easy to Hard Supervision Perspective

Wu, Chenwang, Cheung, Yiu-ming, Han, Bo, Lian, Defu

arXiv.org Artificial Intelligence

Existing machine-generated text (MGT) detection methods implicitly assume labels as the "golden standard". However, we reveal boundary ambiguity in MGT detection, implying that traditional training paradigms are inexact. Moreover, limitations of human cognition and the superintelligence of detectors make inexact learning widespread and inevitable. To this end, we propose an easy-to-hard enhancement framework to provide reliable supervision under such inexact conditions. Distinct from knowledge distillation, our framework employs an easy supervisor targeting relatively simple longer-text detection tasks (despite weaker capabilities), to enhance the more challenging target detector. Firstly, longer texts targeted by supervisors theoretically alleviate the impact of inexact labels, laying the foundation for reliable supervision. Secondly, by structurally incorporating the detector into the supervisor, we theoretically model the supervisor as a lower performance bound for the detector. Thus, optimizing the supervisor indirectly optimizes the detector, ultimately approximating the underlying "golden" labels. Extensive experiments across diverse practical scenarios, including cross-LLM, cross-domain, mixed text, and paraphrase attacks, demonstrate the framework's significant detection effectiveness. The code is available at: https://github.com/tmlr-group/Easy2Hard.


Nancy Mace Curses, Berates Confused Cops in Airport Meltdown: Police Report

WIRED

At an airport in South Carolina on Thursday, US representative Nancy Mace called police officers "fucking incompetent" and berated them repeatedly, according to an incident report. Nancy Mace, the South Carolina Republican congresswoman, unleashed a tirade against law enforcement at the Charleston International Airport on Thursday, WIRED has learned. According to an incident report obtained by WIRED under South Carolina's Freedom of Information Act, Mace cursed at police officers, making repeated derogatory comments toward them. The report says that a Transportation Security Administration (TSA) supervisor told officers that Mace had treated their staff similarly and that they would be reporting her to their superiors. According to the report, officers with the Charleston County Aviation Authority Police Department were tasked with meeting Mace at 6:30 am to escort her from the curb to her flight and had been told that she would be arriving in a white BMW at the ticketing curb area.