
ICE Is Using Palantir's AI Tools to Sort Through Tips

WIRED

ICE has been using an AI-powered Palantir system to summarize tips sent to its tip line since last spring, according to a newly released Homeland Security document. United States Immigration and Customs Enforcement is leveraging Palantir's generative artificial intelligence tools to sort and summarize immigration enforcement tips from its public submission form, according to an inventory released Wednesday of all use cases the Department of Homeland Security had for AI in 2025. The AI Enhanced ICE Tip Processing service is intended to help ICE investigators "to more quickly identify and action tips" for urgent cases, as well as translate submissions not made in English, according to the inventory. It also provides a "BLUF," defined as a "high-level summary of the tip," produced using at least one large language model. BLUF, or "bottom line up front," is a military term that's also used internally by some Palantir employees.


Automated Composition of Agents: A Knapsack Approach for Agentic Component Selection

Yuan, Michelle, Pahwa, Khushbu, Chang, Shuaichen, Kaba, Mustafa, Jiang, Jiarong, Ma, Xiaofei, Zhang, Yi, Sunkara, Monica

arXiv.org Artificial Intelligence

Designing effective agentic systems requires the seamless composition and integration of agents, tools, and models within dynamic and uncertain environments. Most existing methods rely on static, semantic retrieval approaches for tool or agent discovery. However, effective reuse and composition of existing components remain challenging due to incomplete capability descriptions and the limitations of retrieval methods. Component selection also suffers when decisions are not grounded jointly in capability, cost, and real-time utility. To address these challenges, we introduce a structured, automated framework for agentic system composition that is inspired by the knapsack problem. Our framework enables a composer agent to systematically identify, select, and assemble an optimal set of agentic components by jointly considering performance, budget constraints, and compatibility. By dynamically testing candidate components and modeling their utility in real-time, our approach streamlines the assembly of agentic systems and facilitates scalable reuse of resources. Empirical evaluation with Claude 3.5 Sonnet across five benchmarking datasets shows that our online-knapsack-based composer consistently lies on the Pareto frontier, achieving higher success rates at significantly lower component costs compared to our baselines. In the single-agent setup, the online knapsack composer shows a success rate improvement of up to 31.6% in comparison to the retrieval baselines. In multi-agent systems, the online knapsack composer increases success rate from 37% to 87% when agents are selected from an agent inventory of 100+ agents. The substantial performance gap confirms the robust adaptability of our method across diverse domains and budget constraints.
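To make the knapsack framing concrete, here is a minimal sketch of budgeted component selection using a utility-per-cost greedy heuristic. This is a simplification, not the paper's method: the actual composer tests candidates dynamically and models compatibility, and all names and numbers below are hypothetical.

```python
def select_components(components, budget):
    """Greedy budgeted selection: pick components by utility-per-cost
    ratio until the budget is exhausted. A toy stand-in for the
    paper's online-knapsack composer."""
    chosen, spent = [], 0.0
    # Rank by marginal utility per unit cost, highest first.
    for c in sorted(components, key=lambda c: c["utility"] / c["cost"], reverse=True):
        if spent + c["cost"] <= budget:
            chosen.append(c["name"])
            spent += c["cost"]
    return chosen, spent

# Hypothetical agent inventory with made-up utility/cost estimates.
agents = [
    {"name": "retriever", "utility": 0.9, "cost": 2.0},
    {"name": "planner",   "utility": 0.8, "cost": 3.0},
    {"name": "critic",    "utility": 0.3, "cost": 2.5},
]
print(select_components(agents, budget=5.0))
```

The greedy ratio rule is the classical approximation for knapsack-style selection; the paper's online variant additionally updates utility estimates as components are probed at runtime.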


Learning in Stackelberg Mean Field Games: A Non-Asymptotic Analysis

Zeng, Sihan, Evans, Benjamin Patrick, Bhatt, Sujay, Ardon, Leo, Ganesh, Sumitra, Koppel, Alec

arXiv.org Artificial Intelligence

We study policy optimization in Stackelberg mean field games (MFGs), a hierarchical framework for modeling the strategic interaction between a single leader and an infinitely large population of homogeneous followers. The objective can be formulated as a structured bi-level optimization problem, in which the leader needs to learn a policy maximizing its reward, anticipating the response of the followers. Existing methods for solving these (and related) problems often rely on restrictive independence assumptions between the leader's and followers' objectives, use samples inefficiently due to nested-loop algorithm structure, and lack finite-time convergence guarantees. To address these limitations, we propose AC-SMFG, a single-loop actor-critic algorithm that operates on continuously generated Markovian samples. The algorithm alternates between (semi-)gradient updates for the leader, a representative follower, and the mean field, and is simple to implement in practice. We establish the finite-time and finite-sample convergence of the algorithm to a stationary point of the Stackelberg objective. To our knowledge, this is the first Stackelberg MFG algorithm with non-asymptotic convergence guarantees. Our key assumption is a "gradient alignment" condition, which requires that the full policy gradient of the leader can be approximated by a partial component of it, relaxing the existing leader-follower independence assumption. Simulation results in a range of well-established economics environments demonstrate that AC-SMFG outperforms existing multi-agent and MFG learning baselines in policy quality and convergence speed.
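The single-loop structure described above can be illustrated with a toy alternating-update scheme: one (semi-)gradient step each for the leader, a representative follower, and the mean field per iteration. The quadratic objectives and all parameter values below are invented stand-ins, not the paper's AC-SMFG algorithm, which uses actor-critic updates on Markovian samples.

```python
def ac_smfg_sketch(steps=500, lr=0.05):
    """Toy single-loop alternation in the spirit of AC-SMFG.
    theta: leader policy parameter, psi: representative follower,
    mu: mean-field summary of the follower population."""
    theta, psi, mu = 2.0, 0.0, 0.0
    for _ in range(steps):
        # Leader ascends an invented reward -(theta - 1)^2 - 0.5*(theta - mu)^2,
        # anticipating the population through the mean field mu.
        grad_leader = -2 * (theta - 1.0) - (theta - mu)
        theta += lr * grad_leader
        # Representative follower best-responds toward the leader's policy.
        psi += lr * (-2 * (psi - theta))
        # Mean field tracks the follower population's policy.
        mu += lr * (-2 * (mu - psi))
    return theta, psi, mu

print(ac_smfg_sketch())
```

At the fixed point the follower and mean field match the leader (psi = mu = theta), so the leader's gradient vanishes at theta = 1; the interleaved updates converge there without any inner loop, which is the structural point of single-loop schemes.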


Black Friday 2025 could be your last chance for cheap PC deals, experts warn

PCWorld

AI is causing a DRAM apocalypse, and it's affecting the whole PC market this holiday season. This year, Black Friday tech shoppers should heed one important message: don't wait, buy now. Certain components are skyrocketing in price, and it's expected to get even worse. DRAM prices, for example, have doubled in little more than a month, as AI hyperscalers have snapped up whatever they can buy.


Reinforcement Learning in Queue-Reactive Models: Application to Optimal Execution

Espana, Tomas, Hafsi, Yadh, Lillo, Fabrizio, Vittori, Edoardo

arXiv.org Artificial Intelligence

We investigate the use of Reinforcement Learning for the optimal execution of meta-orders, where the objective is to execute incrementally large orders while minimizing implementation shortfall and market impact over an extended period of time. Departing from traditional parametric approaches to price dynamics and impact modeling, we adopt a model-free, data-driven framework. Since policy optimization requires counterfactual feedback that historical data cannot provide, we employ the Queue-Reactive Model to generate realistic and tractable limit order book simulations that encompass transient price impact, and nonlinear and dynamic order flow responses. Methodologically, we train a Double Deep Q-Network agent on a state space comprising time, inventory, price, and depth variables, and evaluate its performance against established benchmarks. Numerical simulation results show that the agent learns a policy that is both strategic and tactical, adapting effectively to order book conditions and outperforming standard approaches across multiple training configurations. These findings provide strong evidence that model-free Reinforcement Learning can yield adaptive and robust solutions to the optimal execution problem.
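As a rough illustration of value-based execution learning, here is a tabular double Q-learning agent on a toy liquidation problem with a state of (time, inventory). This is not the paper's setup: the authors use a Double Deep Q-Network with a richer state (time, inventory, price, depth) and a Queue-Reactive limit order book simulator; the environment dynamics and penalties below are invented for illustration.

```python
import random

def toy_execution_env(t, inv, action, horizon=5, impact=0.1):
    """Toy execution step: selling `action` shares incurs a quadratic
    impact cost; leftover inventory at the horizon is penalized.
    Dynamics are invented for illustration only."""
    sell = min(action, inv)
    cost = impact * sell ** 2          # temporary market impact
    inv -= sell
    t += 1
    done = t == horizon
    if done:
        cost += 1.0 * inv              # penalty for unexecuted shares
    return t, inv, -cost, done

def train_double_q(episodes=4000, horizon=5, start_inv=4, actions=(0, 1, 2), seed=0):
    rng = random.Random(seed)
    qa, qb = {}, {}                     # two tables, as in double Q-learning
    def q(tbl, s, a):
        return tbl.get((s, a), 0.0)
    for _ in range(episodes):
        t, inv, done = 0, start_inv, False
        while not done:
            s = (t, inv)
            if rng.random() < 0.1:      # epsilon-greedy exploration
                a = rng.choice(actions)
            else:
                a = max(actions, key=lambda x: q(qa, s, x) + q(qb, s, x))
            t, inv, r, done = toy_execution_env(t, inv, a, horizon)
            s2 = (t, inv)
            # Double Q-learning: select the action with one table,
            # evaluate it with the other, to damp overestimation.
            if rng.random() < 0.5:
                best = max(actions, key=lambda x: q(qa, s2, x))
                target = r + (0.0 if done else q(qb, s2, best))
                qa[(s, a)] = q(qa, s, a) + 0.1 * (target - q(qa, s, a))
            else:
                best = max(actions, key=lambda x: q(qb, s2, x))
                target = r + (0.0 if done else q(qa, s2, best))
                qb[(s, a)] = q(qb, s, a) + 0.1 * (target - q(qb, s, a))
    return qa, qb

def greedy_rollout(qa, qb, horizon=5, start_inv=4, actions=(0, 1, 2)):
    """Execute the learned greedy policy once; return leftover
    inventory and total reward (negative total cost)."""
    t, inv, total, done = 0, start_inv, 0.0, False
    while not done:
        s = (t, inv)
        a = max(actions, key=lambda x: qa.get((s, x), 0.0) + qb.get((s, x), 0.0))
        t, inv, r, done = toy_execution_env(t, inv, a, horizon)
        total += r
    return inv, total
```

Because the leftover-inventory penalty dominates the quadratic impact cost, the learned policy should spread child orders over the horizon and fully liquidate, which mirrors the strategic/tactical trade-off the abstract describes at a much larger scale.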


Unified Vision-Language-Action Tokenization Enables Open-World Instruction Following Agents

Wang, Zihao

Neural Information Processing Systems

These additional behavior tokens augment the vocabulary of pretrained Multimodal Language Models. With this encoder, we then pack long-term multimodal interactions involving task instructions, memories, thoughts, observations, textual responses, behavior trajectories, and more.