 Agents


Adobe's new AI agent can show you how to use Photoshop

Engadget

You open the program after a long break to edit an image, but this being Photoshop we're talking about, there are about five different ways to complete the task before you, and you can't quite remember the way you learned to do it. Adobe is trying to make it easier to use its flagship app with the introduction of a built-in AI agent that can navigate Photoshop and complete tasks for users. Users can access the tool from the redesigned Actions panel. If you've used an AI chatbot before, the interface will be familiar. There's a text box for users to input what they want the agent to do for them, with a list of suggested prompts above.


Cognitive Silicon: An Architectural Blueprint for Post-Industrial Computing Systems

arXiv.org Artificial Intelligence

Autonomous AI systems reveal foundational limitations in deterministic, human-authored computing architectures. This paper presents Cognitive Silicon: a hypothetical full-stack architectural framework projected toward 2035, exploring a possible trajectory for cognitive computing system design. The proposed architecture would integrate symbolic scaffolding, governed memory, runtime moral coherence, and alignment-aware execution across silicon-to-semantics layers. Our design grammar has emerged from dialectical co-design with LLMs under asymmetric epistemic conditions, creating structured friction to expose blind spots and trade-offs. The envisioned framework would establish mortality as a natural consequence of physical constraints, non-copyable tacit knowledge, and non-cloneable identity keys as cognitive-embodiment primitives. Core tensions (trust/agency, scaffolding/emergence, execution/governance) would function as central architectural pressures rather than edge cases. The architecture theoretically converges with the Free Energy Principle, potentially offering a formal account of how cognitive systems could maintain identity through prediction error minimization across physical and computational boundaries. The resulting framework aims to deliver a morally tractable cognitive infrastructure that could maintain human alignment through irreversible hardware constraints and identity-bound epistemic mechanisms resistant to replication or subversion.


The Safety-Privacy Tradeoff in Linear Bandits

arXiv.org Artificial Intelligence

Arghavan Zibaie, Spencer Hutchinson, Ramtin Pedarsani, and Mahnoosh Alizadeh, University of California Santa Barbara, {zibaie,shutchinson,ramtin,alizadeh}@ucsb.edu

Abstract: We consider a collection of linear stochastic bandit problems, each modeling the random response of different agents to proposed interventions, coupled together by a global safety constraint. We assume a central coordinator must choose actions to play on each bandit with the objective of regret minimization, while also ensuring that the expected response of all agents satisfies the global safety constraints at each round, in spite of uncertainty about the bandits' parameters. The agents consider their observed responses to be private, and in order to protect their sensitive information, data sharing with the central coordinator is performed under local differential privacy (LDP). However, providing a higher level of privacy to different agents has consequences for safety and regret. We formalize these tradeoffs by building on the notion of the sharpness of the safety set, a measure of how the geometric properties of the safe set affect the growth of regret, and propose a unilaterally unimprovable vector of privacy levels for different agents given a maximum regret budget.

I. INTRODUCTION

The stochastic linear bandit problem constitutes a sequential decision-making problem wherein, at each round, we observe a response to chosen actions which is a perturbed linear function of the action parameterized by an unknown parameter vector. When applying this tool to safety-critical applications such as health care [1], power systems and transportation [2], [3], the decision-making tasks must operate within certain safety constraints that depend on the unknown bandit parameters, and violations of these constraints can result in adverse events. For example, when sequentially learning how to price electricity to manage the demand of users whose price response is unknown, the price-setting entity must ensure that the resulting demands do not violate electric distribution system constraints from day one, in spite of its uncertainty about user responses. As a result, variants of the linear stochastic bandit problem with constraints have been studied in the literature, e.g., [4]-[6]. In many such safety-critical applications, however, a central challenge lies in the fact that users might consider their responses to the central coordinator's interventions to be private information. This can further complicate the task of learning optimal interventions while keeping the system safe at all rounds of the learning process. To formalize this challenge mathematically, we consider a collection of linear stochastic bandit problems whose responses are tied together through a global safety constraint, coupling what actions can be safely played on each bandit.
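
To make the privacy side of this setup concrete, the following is a minimal sketch of how a single agent might privatize its response before sharing it with the coordinator. It assumes the Laplace mechanism applied to a clipped scalar response, which is one standard way to obtain epsilon-LDP; the excerpt does not say which mechanism the authors use. The names `privatize_response`, `simulate_agent`, `bound`, and `noise_std` are hypothetical.

```python
import numpy as np

def privatize_response(y, epsilon, bound, rng):
    """Clip y to [-bound, bound], then add Laplace noise (assumed mechanism).

    A clipped scalar has sensitivity 2 * bound, so a Laplace scale of
    2 * bound / epsilon yields epsilon-local differential privacy.
    """
    clipped = float(np.clip(y, -bound, bound))
    scale = 2.0 * bound / epsilon
    return clipped + rng.laplace(loc=0.0, scale=scale)

def simulate_agent(theta, action, epsilon, bound=1.0, noise_std=0.1, seed=0):
    """One round for one agent: noisy linear response, privatized locally."""
    rng = np.random.default_rng(seed)
    # Perturbed linear function of the action, with unknown parameter theta.
    y = float(theta @ action) + rng.normal(0.0, noise_std)
    return privatize_response(y, epsilon, bound, rng)

# Example: the coordinator only ever sees the privatized value.
theta = np.array([0.4, -0.2])    # hypothetical unknown parameter
action = np.array([1.0, 0.5])    # hypothetical chosen action
shared = simulate_agent(theta, action, epsilon=1.0)
```

A smaller `epsilon` means stronger privacy but a noisier shared response, which gives some intuition for why privacy levels interact with safety and regret.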


Towards a Distributed Federated Learning Aggregation Placement using Particle Swarm Intelligence

arXiv.org Artificial Intelligence

Federated learning has become a promising distributed learning concept with added assurance of data privacy. Extensive studies on various models of federated learning have been done since the term was coined. One of the important derivatives of federated learning is hierarchical semi-decentralized federated learning (SDFL), which distributes the load of the aggregation task over multiple nodes and parallelizes the aggregation workload at the breadth of each level of the hierarchy. Various methods have also been proposed to perform inter-cluster and intra-cluster aggregation optimally. Most of the solutions nonetheless require monitoring the nodes' performance and resource consumption at each round, which necessitates frequently exchanging systematic data. To optimally perform distributed aggregation in SDFL with minimal reliance on systematic data, we propose Flag-Swap, a Particle Swarm Optimization (PSO) method that optimizes the aggregation placement according only to the processing delay. Our simulation results show that PSO-based placement can find the optimal placement relatively fast, even in scenarios with many clients as candidates for aggregation. Our real-world docker-based implementation of Flag-Swap over the recently emerged FL framework shows superior performance compared to black-box-based deterministic placement strategies: about 43% faster than random placement and 32% faster than uniform placement in terms of total processing time. Index Terms: Distributed Systems, Federated Learning, Aggregation, Task Placement, Swarm Intelligence, Black-box Optimization
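
As an illustration of delay-only placement, here is a minimal PSO sketch that searches for aggregator slots using nothing but a black-box delay value. It is not Flag-Swap itself: the position encoding (one continuous coordinate per slot, floored to a client index), the duplicate penalty, the hyperparameters, and the names `placement_delay` and `pso_placement` are all assumptions made for the example.

```python
import numpy as np

def placement_delay(position, client_delay):
    """Hypothetical black-box cost: total delay of the chosen aggregators.

    Reusing a client for two slots is penalised, since one node would
    then carry two aggregation roles.
    """
    idx = np.clip(np.floor(position).astype(int), 0, len(client_delay) - 1)
    duplicates = len(idx) - len(set(idx.tolist()))
    return float(client_delay[idx].sum() + 10.0 * duplicates * client_delay.max())

def pso_placement(client_delay, n_slots, n_particles=20, n_iters=50,
                  inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Standard global-best PSO over continuous slot coordinates."""
    rng = np.random.default_rng(seed)
    n_clients = len(client_delay)
    pos = rng.uniform(0, n_clients, size=(n_particles, n_slots))
    vel = np.zeros_like(pos)
    pbest = pos.copy()
    pbest_cost = np.array([placement_delay(p, client_delay) for p in pos])
    g = int(pbest_cost.argmin())
    gbest, gbest_cost = pbest[g].copy(), float(pbest_cost[g])
    history = [gbest_cost]
    for _ in range(n_iters):
        r1 = rng.random(pos.shape)
        r2 = rng.random(pos.shape)
        # Pull each particle toward its own best and the swarm's best.
        vel = inertia * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, 0, n_clients - 1e-9)
        cost = np.array([placement_delay(p, client_delay) for p in pos])
        improved = cost < pbest_cost
        pbest[improved] = pos[improved]
        pbest_cost[improved] = cost[improved]
        g = int(pbest_cost.argmin())
        if pbest_cost[g] < gbest_cost:
            gbest, gbest_cost = pbest[g].copy(), float(pbest_cost[g])
        history.append(gbest_cost)
    placement = np.floor(gbest).astype(int).tolist()
    return placement, gbest_cost, history

# Example with made-up per-client processing delays (seconds).
delays = np.array([5.0, 1.0, 4.0, 0.5, 3.0, 2.0])
placement, cost, history = pso_placement(delays, n_slots=2)
```

The optimizer only calls `placement_delay`, so in a real deployment that function could be replaced by a measured round time without exposing per-node resource metrics.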


SOTOPIA-S4: a user-friendly system for flexible, customizable, and large-scale social simulation

arXiv.org Artificial Intelligence

Social simulation through large language model (LLM) agents is a promising approach to explore and validate hypotheses related to social science questions and LLM agent behavior. We present SOTOPIA-S4, a fast, flexible, and scalable social simulation system that addresses the technical barriers of current frameworks while enabling practitioners to generate multi-turn and multi-party LLM-based interactions with customizable evaluation metrics for hypothesis testing. SOTOPIA-S4 comes as a pip package that contains a simulation engine, an API server with flexible RESTful APIs for simulation management, and a web interface that enables both technical and non-technical users to design, run, and analyze simulations without programming. We demonstrate the usefulness of SOTOPIA-S4 with two use cases involving dyadic hiring negotiation and multi-party planning scenarios.


Tinker Tales: Interactive Storytelling Framework for Early Childhood Narrative Development and AI Literacy

arXiv.org Artificial Intelligence

This paper presents Tinker Tales, an interactive storytelling framework in the format of a board game, designed to support both narrative development and AI literacy in early childhood. The framework integrates tangible and speech-based interactions with AI through NFC chip-attached pawns and tokens, along with a speaker and microphone. Children select and define key story elements (such as characters, places, items, and emotions) using the pawns and tokens, providing further details to the AI and receiving appropriate assistance, similar to how adults prompt AI for specific tasks (e.g., writing). For evaluation, several game sessions were simulated with a child AI agent, and the quality and safety of the generated stories were assessed from various perspectives. This work highlights the potential of combining physical and digital elements in AI literacy, offering a safe and engaging way for children to learn how to effectively collaborate with AI.


First autonomous AI agent is here, but is it worth the risks?

FOX News

If you haven't heard the buzz about Manus yet, it's the new AI model unveiled by a Singapore-based company called Butterfly Effect. It's one of the first truly autonomous AI agents, able to do its own research, make decisions and even carry out plans, all with barely any human oversight. But here's the thing: While all this innovation opens up exciting possibilities, it also brings some serious privacy and security questions. Whether you're eager to try out the latest AI or you'd rather steer clear, it's worth understanding what Manus could mean for your personal data and digital safety.


Guiding VLM Agents with Process Rewards at Inference Time for GUI Navigation

arXiv.org Artificial Intelligence

Recent advancements in visual language models (VLMs) have notably enhanced their capabilities in handling complex Graphical User Interface (GUI) interaction tasks. Despite these improvements, current frameworks often struggle to generate correct actions in challenging GUI environments. State-of-the-art commercial VLMs are black boxes, and fine-tuning open-source VLMs for GUI tasks requires significant resources. Additionally, existing trajectory-level evaluation and refinement techniques frequently fall short due to delayed feedback and local optimization issues. To address these challenges, we propose an approach that guides VLM agents with process supervision by a reward model during GUI navigation and control at inference time. This guidance allows the VLM agent to optimize actions at each inference step, thereby improving performance in both static and dynamic environments. In particular, our method demonstrates significant performance gains in three GUI navigation tasks, achieving a 3.4% improvement in single-step action accuracy for static environments, along with an approximately 33% increase in task success rate in one dynamic environment. With further integration of trajectory reflection and retry mechanisms, we also demonstrate even greater enhancement in task success.
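
One simple way to picture step-level guidance is a best-of-n loop: at every step the agent proposes a few candidate actions, a reward model scores each one, and the highest-scoring action is executed. The sketch below assumes that scheme and uses a toy one-dimensional environment in place of a GUI. The abstract does not specify the exact selection rule, and `guided_step`, `run_episode`, and the `toy_*` functions are hypothetical stand-ins for the VLM, the reward model, and the environment.

```python
GOAL = 3
MOVES = {"left": -1, "stay": 0, "right": 1}

def toy_propose(state, n):
    """Stand-in for the VLM: return up to n candidate actions."""
    return list(MOVES)[:n]

def toy_score(state, action):
    """Stand-in for the process reward model: closer to the goal is better."""
    return -abs((state + MOVES[action]) - GOAL)

def toy_transition(state, action):
    """Stand-in for the environment executing the action."""
    return state + MOVES[action]

def guided_step(state, propose, score, n_candidates=3):
    """Score every candidate for this step and keep the best one."""
    candidates = propose(state, n_candidates)
    scored = [(score(state, a), a) for a in candidates]
    best_score, best_action = max(scored, key=lambda t: t[0])
    return best_action, best_score

def run_episode(state, propose, score, transition, is_done,
                n_candidates=3, max_steps=10):
    """Apply reward-guided selection at each step until done or out of steps."""
    trajectory = []
    for _ in range(max_steps):
        if is_done(state):
            break
        action, _ = guided_step(state, propose, score, n_candidates)
        trajectory.append(action)
        state = transition(state, action)
    return state, trajectory

final_state, trajectory = run_episode(
    0, toy_propose, toy_score, toy_transition, lambda s: s == GOAL
)
```

Because feedback arrives at each step rather than after the whole trajectory, a poor action can be rejected before it is executed, which is the contrast with trajectory-level refinement drawn above.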


Dynamic Intent Queries for Motion Transformer-based Trajectory Prediction

arXiv.org Artificial Intelligence

In autonomous driving, accurately predicting the movements of other traffic participants is crucial, as it significantly influences a vehicle's planning processes. Modern trajectory prediction models strive to interpret complex patterns and dependencies from agent and map data. The Motion Transformer (MTR) architecture and subsequent work define the most accurate methods in common benchmarks such as the Waymo Open Motion Benchmark. The MTR model employs pre-generated static intention points as initial goal points for trajectory prediction. However, the static nature of these points frequently leads to misalignment with map data in specific traffic scenarios, resulting in infeasible or unrealistic goal points. This adaptation of the MTR model was trained and evaluated on the Waymo Open Motion Dataset. Our findings demonstrate that incorporating dynamic intention points has a significant positive impact on trajectory prediction accuracy, especially for predictions over long time horizons. Furthermore, we analyze the impact on ground truth trajectories which are not compliant with the map data or are illegal maneuvers. Trajectory prediction is crucial for modern autonomous driving systems. It forms a deeper understanding of how other traffic participants will move in the future, which is the basis for subsequent motion planning of the autonomous vehicle.


DianJin-R1: Evaluating and Enhancing Financial Reasoning in Large Language Models

arXiv.org Artificial Intelligence

Effective reasoning remains a core challenge for large language models (LLMs) in the financial domain, where tasks often require domain-specific knowledge, precise numerical calculations, and strict adherence to compliance rules. We propose DianJin-R1, a reasoning-enhanced framework designed to address these challenges through reasoning-augmented supervision and reinforcement learning. Central to our approach is DianJin-R1-Data, a high-quality dataset constructed from CFLUE, FinQA, and a proprietary compliance corpus (Chinese Compliance Check, CCC), combining diverse financial reasoning scenarios with verified annotations. Our models, DianJin-R1-7B and DianJin-R1-32B, are fine-tuned from Qwen2.5-7B-Instruct and Qwen2.5-32B-Instruct using a structured format that generates both reasoning steps and final answers. To further refine reasoning quality, we apply Group Relative Policy Optimization (GRPO), a reinforcement learning method that incorporates dual reward signals: one encouraging structured outputs and another rewarding answer correctness. We evaluate our models on five benchmarks: three financial datasets (CFLUE, FinQA, and CCC) and two general reasoning benchmarks (MATH-500 and GPQA-Diamond). Experimental results show that DianJin-R1 models consistently outperform their non-reasoning counterparts, especially on complex financial tasks. Moreover, on the real-world CCC dataset, our single-call reasoning models match or even surpass the performance of multi-agent systems that require significantly more computational cost. These findings demonstrate the effectiveness of DianJin-R1 in enhancing financial reasoning through structured supervision and reward-aligned learning, offering a scalable and practical solution for real-world applications.
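
The dual-reward idea can be sketched in a few lines: each sampled output gets a format reward and an accuracy reward, and GRPO then normalizes the summed rewards within the group of samples drawn for the same prompt. The `<think>`/`<answer>` tag layout, the 0/1 reward values, and the function names below are assumptions for illustration; only the group-relative normalization (reward minus group mean, divided by group standard deviation) is the standard GRPO advantage.

```python
import re
import numpy as np

# Assumed output layout: reasoning in <think>, final answer in <answer>.
PATTERN = re.compile(r"^<think>.+?</think>\s*<answer>(.+?)</answer>$", re.DOTALL)

def format_reward(output):
    """1.0 if the output follows the assumed structured layout, else 0.0."""
    return 1.0 if PATTERN.match(output.strip()) else 0.0

def accuracy_reward(output, gold):
    """1.0 if a well-formed output's answer matches the reference, else 0.0."""
    m = PATTERN.match(output.strip())
    return 1.0 if m and m.group(1).strip() == gold else 0.0

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of samples for the same prompt."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + eps)

# Three hypothetical samples for one question whose reference answer is "42".
group = [
    "<think>6 * 7 = 42</think><answer>42</answer>",   # well-formed, correct
    "<think>6 * 7 = 41</think><answer>41</answer>",   # well-formed, wrong
    "42",                                              # missing structure
]
rewards = [format_reward(o) + accuracy_reward(o, "42") for o in group]
advantages = group_relative_advantages(rewards)
```

In this toy group the well-formed correct sample receives a positive advantage, the well-formed wrong one sits at the group mean, and the unstructured one is pushed down, so both signals shape the update.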