fsm
Agent-SAMA: State-Aware Mobile Assistant
Guo, Linqiang, Liu, Wei, Heng, Yi Wen, Tse-Hsun, null, Chen, null, Wang, Yang
Mobile Graphical User Interface (GUI) agents aim to autonomously complete tasks within or across apps based on user instructions. While recent Multimodal Large Language Models (MLLMs) enable these agents to interpret UI screens and perform actions, existing agents remain fundamentally reactive. They reason over the current UI screen but lack a structured representation of the app navigation flow, limiting GUI agents' ability to understand execution context, detect unexpected execution results, and recover from errors. We introduce Agent-SAMA, a state-aware multi-agent framework that models app execution as a Finite State Machine (FSM), treating UI screens as states and user actions as transitions. Agent-SAMA implements four specialized agents that collaboratively construct and use FSMs in real time to guide task planning, execution verification, and recovery. We evaluate Agent-SAMA on two types of benchmarks: cross-app (Mobile-Eval-E, SPA-Bench) and mostly single-app (AndroidWorld). On Mobile-Eval-E, Agent-SAMA achieves an 84.0% success rate and a 71.9% recovery rate. On SPA-Bench, it reaches an 80.0% success rate with a 66.7% recovery rate. Compared to prior methods, Agent-SAMA improves task success by up to 12% and recovery success by 13.8%. On AndroidWorld, Agent-SAMA achieves a 63.7% success rate, outperforming the baselines. Our results demonstrate that structured state modeling enhances robustness and can serve as a lightweight, model-agnostic memory layer for future GUI agents.
- Europe > Italy > Veneto > Venice (0.05)
- North America > United States (0.04)
- North America > Canada > Quebec > Montreal (0.04)
- (2 more...)
- Leisure & Entertainment (1.00)
- Information Technology > Services (0.93)
- Media (0.68)
Hallucination-Resistant, Domain-Specific Research Assistant with Self-Evaluation and Vector-Grounded Retrieval
Bhavsar, Vivek, Ereifej, Joseph, Gurusami, Aravanan
Large language models accelerate literature synthesis but can hallucinate and mis-cite, limiting their usefulness in expert workflows. We present RA-FSM (Research Assistant - Finite State Machine), a modular GPT-based research assistant that wraps generation in a finite-state control loop: Relevance -> Confidence -> Knowledge. The system is grounded in vector retrieval and a deterministic citation pipeline. The controller filters out-of-scope queries, scores answerability, decomposes questions, and triggers retrieval only when needed, and emits answers with confidence labels and in-corpus, de-duplicated references. A ranked-tier ingestion workflow constructs a domain knowledge base from journals, conferences, indices, preprints, and patents, writing both to a dense vector index and to a relational store of normalized metrics. We implement the system for photonics and evaluate it on six task categories: analytical reasoning, numerical analysis, methodological critique, comparative synthesis, factual extraction, and application design. In blinded A/B reviews, domain experts prefer RA-FSM to both a strong Notebook LM (NLM) and a vanilla Default GPT API call single-pass baseline, citing stronger boundary-condition handling and more defensible evidence use. Coverage and novelty analyses indicate that RA-FSM explores beyond the NLM while incurring tunable latency and cost overheads. The design emphasizes transparent, well-cited answers for high-stakes technical work and is generalizable to other scientific domains.
- Europe > Austria > Vienna (0.14)
- North America > United States > California > Santa Clara County > Santa Clara (0.04)
- Workflow (0.86)
- Research Report > New Finding (0.68)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.94)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
RFSeek and Ye Shall Find
Rotman, Noga H., Ferreira, Tiago, Peleg, Hila, Silberstein, Mark, Silva, Alexandra
Requests for Comments (RFCs) are extensive specification documents for network protocols, but their prose-based format and their considerable length often impede precise operational understanding. We present RFSeek, an interactive tool that automatically extracts visual summaries of protocol logic from RFCs. RFSeek leverages large language models (LLMs) to generate provenance-linked, explorable diagrams, surfacing both official state machines and additional logic found only in the RFC text. Compared to existing RFC visualizations, RFSeek's visual summaries are more transparent and easier to audit against their textual source. We showcase the tool's potential through a series of use cases, including guided knowledge extraction and semantic diffing, applied to protocols such as TCP, QUIC, PPTP, and DCCP. In practice, RFSeek not only reconstructs the RFC diagrams included in some specifications, but, more interestingly, also uncovers important logic such as nodes or edges described in the text but missing from those diagrams. RFSeek further derives new visualization diagrams for complex RFCs, with QUIC as a representative case. Our approach, which we term \emph{Summary Visualization}, highlights a promising direction: combining LLMs with formal, user-customized visualizations to enhance protocol comprehension and support robust implementations.
- North America > United States > California > San Francisco County > San Francisco (0.14)
- Asia > Middle East > Israel (0.05)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- (3 more...)
Hybrid Autonomy Framework for a Future Mars Science Helicopter
Di Pierno, Luca, Hewitt, Robert, Weiss, Stephan, Brockers, Roland
-- Autonomous aerial vehicles, such as NASA's Ingenuity, enable rapid planetary surface exploration beyond the reach of ground-based robots. Thus, NASA is studying a Mars Science Helicopter (MSH), an advanced concept capable of performing long-range science missions and autonomously navigating challenging Martian terrain. Given significant Earth-Mars communication delays and mission complexity, an advanced autonomy framework is required to ensure safe and efficient operation by continuously adapting behavior based on mission objectives and real-time conditions, without human intervention. This study presents a deterministic high-level control framework for aerial exploration, integrating a Finite State Machine (FSM) with Behavior Trees (BTs) to achieve a scalable, robust, and computationally efficient autonomy solution for critical scenarios like deep space exploration. In this paper we outline key capabilities of a possible MSH and detail the FSM-BT hybrid autonomy framework which orchestrates them to achieve the desired objectives. These inputs trigger state transitions or dynamically adjust behavior execution, enabling reactive and context-aware responses. The framework is middleware-agnostic, supporting integration with systems like F-Prime and extending beyond aerial robotics. Aerial vehicles have revolutionized planetary exploration by enabling access to scientifically valuable but hazardous terrain beyond the reach of ground-based robots. NASA's Ingenuity Mars helicopter demonstrated the feasibility of controlled flight on Mars, completing 72 successful flights despite the planet's thin atmosphere [1], [2]. However, Ingenuity was a technology demonstrator, designed primarily to validate powered flight rather than autonomously execute complex scientific missions.
- Europe > Switzerland > Zürich > Zürich (0.14)
- North America > United States > California > Los Angeles County > Pasadena (0.05)
- Europe > United Kingdom > England > Greater Manchester > Manchester (0.04)
- Europe > Austria (0.04)
- Transportation > Air (1.00)
- Government > Space Agency (1.00)
- Government > Regional Government > North America Government > United States Government (1.00)
- Aerospace & Defense > Aircraft (1.00)
MetaAgent: Automatically Constructing Multi-Agent Systems Based on Finite State Machines
Zhang, Yaolun, Liu, Xiaogeng, Xiao, Chaowei
Large Language Models (LLMs) have demonstrated the ability to solve a wide range of practical tasks within multi-agent systems. However, existing human-designed multi-agent frameworks are typically limited to a small set of pre-defined scenarios, while current automated design methods suffer from several limitations, such as the lack of tool integration, dependence on external training data, and rigid communication structures. In this paper, we propose MetaAgent, a finite state machine based framework that can automatically generate a multi-agent system. Given a task description, MetaAgent will design a multi-agent system and polish it through an optimization algorithm. When the multi-agent system is deployed, the finite state machine will control the agent's actions and the state transitions. To evaluate our framework, we conduct experiments on both text-based tasks and practical tasks. The results indicate that the generated multi-agent system surpasses other auto-designed methods and can achieve a comparable performance with the human-designed multi-agent system, which is optimized for those specific tasks.
- Europe > Austria > Vienna (0.14)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- Asia > South Korea (0.04)
- (6 more...)
- Research Report (0.83)
- Workflow (0.68)
CrossPL: Evaluating Large Language Models on Cross Programming Language Code Generation
Xiong, Zhanhang, Wang, Dongxia, Li, Yuekang, An, Xinyuan, Wang, Wenhai
--As large language models (LLMs) become increasingly embedded in software engineering workflows, a critical capability remains underexplored: generating correct code that enables cross-programming-language (CPL) interoperability. This skill is essential for building complex systems that integrate components written in multiple languages via mechanisms like inter-process communication (IPC). T o bridge this gap, we present CrossPL, the first benchmark designed to systematically evaluate LLMs' ability to generate CPL-interoperating code. CrossPL comprises 1,982 tasks centered around IPC, covering six widely-used programming languages and seven representative CPL techniques. We construct this benchmark by (i) analyzing 19,169 multi-language GitHub repositories using 156 hand-crafted finite state machines (FSMs), and (ii) developing an LLM-based pipeline that automatically extracts CPL code snippets, generates task instructions, and validates functional correctness. We evaluate 14 state-of-the-art general-purpose LLMs and 6 code-oriented LLMs released in the past three years on CrossPL via FSM-based validation. Results reveal that even the best-performing models struggle with CPL scenarios, underscoring the need for more targeted research in this space. Our benchmark and code are available at: https://anonymous.4open.science/r/crosspl-2814/. In software system development, the use of multiple programming languages (MPL) has become increasingly common, as it allows developers to exploit the unique strengths of different languages to improve performance, modularity, and scalability [1]-[5].
- Asia > China > Zhejiang Province > Hangzhou (0.04)
- Oceania > Australia > New South Wales > Sydney (0.04)
- Research Report (0.82)
- Workflow (0.68)
Autonomous Control Leveraging LLMs: An Agentic Framework for Next-Generation Industrial Automation
Vyas, Javal, Mercangoz, Mehmet
The increasing complexity of modern chemical processes, coupled with workforce shortages and intricate fault scenarios, demands novel automation paradigms that blend symbolic reasoning with adaptive control. In this work, we introduce a unified agentic framework that leverages large language models (LLMs) for both discrete fault-recovery planning and continuous process control within a single architecture. We adopt Finite State Machines (FSMs) as interpretable operating envelopes: an LLM-driven planning agent proposes recovery sequences through the FSM, a Simulation Agent executes and checks each transition, and a Validator-Reprompting loop iteratively refines invalid plans. In Case Study 1, across 180 randomly generated FSMs of varying sizes (4-25 states, 4-300 transitions), GPT-4o and GPT-4o-mini achieve 100% valid-path success within five reprompts-outperforming open-source LLMs in both accuracy and latency. In Case Study 2, the same framework modulates dual-heater inputs on a laboratory TCLab platform (and its digital twin) to maintain a target average temperature under persistent asymmetric disturbances. Compared to classical PID control, our LLM-based controller attains similar performance, while ablation of the prompting loop reveals its critical role in handling nonlinear dynamics. We analyze key failure modes-such as instruction following lapses and coarse ODE approximations. Our results demonstrate that, with structured feedback and modular agents, LLMs can unify high-level symbolic planningand low-level continuous control, paving the way towards resilient, language-driven automation in chemical engineering.
- Energy (0.93)
- Information Technology > Security & Privacy (0.93)
Guiding LLM-based Smart Contract Generation with Finite State Machine
Luo, Hao, Lin, Yuhao, Yan, Xiao, Hu, Xintong, Wang, Yuxiang, Zeng, Qiming, Wang, Hao, Jiang, Jiawei
Smart contract is a kind of self-executing code based on blockchain technology with a wide range of application scenarios, but the traditional generation method relies on manual coding and expert auditing, which has a high threshold and low efficiency. Although Large Language Models (LLMs) show great potential in programming tasks, they still face challenges in smart contract generation w.r.t. effectiveness and security. To solve these problems, we propose FSM-SCG, a smart contract generation framework based on finite state machine (FSM) and LLMs, which significantly improves the quality of the generated code by abstracting user requirements to generate FSM, guiding LLMs to generate smart contracts, and iteratively optimizing the code with the feedback of compilation and security checks. The experimental results show that FSM-SCG significantly improves the quality of smart contract generation. Compared to the best baseline, FSM-SCG improves the compilation success rate of generated smart contract code by at most 48%, and reduces the average vulnerability risk score by approximately 68%.
- Banking & Finance (1.00)
- Information Technology > Security & Privacy (0.46)
Efficiently Vectorized MCMC on Modern Accelerators
Dance, Hugh, Glaser, Pierre, Orbanz, Peter, Adams, Ryan
With the advent of automatic vectorization tools (e.g., JAX's $\texttt{vmap}$), writing multi-chain MCMC algorithms is often now as simple as invoking those tools on single-chain code. Whilst convenient, for various MCMC algorithms this results in a synchronization problem -- loosely speaking, at each iteration all chains running in parallel must wait until the last chain has finished drawing its sample. In this work, we show how to design single-chain MCMC algorithms in a way that avoids synchronization overheads when vectorizing with tools like $\texttt{vmap}$ by using the framework of finite state machines (FSMs). Using a simplified model, we derive an exact theoretical form of the obtainable speed-ups using our approach, and use it to make principled recommendations for optimal algorithm design. We implement several popular MCMC algorithms as FSMs, including Elliptical Slice Sampling, HMC-NUTS, and Delayed Rejection, demonstrating speed-ups of up to an order of magnitude in experiments.
Can Large Language Models Help Developers with Robotic Finite State Machine Modification?
Gan, Xiangyu Robin, Song, Yuxin Ray, Walker, Nick, Cakmak, Maya
Finite state machines (FSMs) are widely used to manage robot behavior logic, particularly in real-world applications that require a high degree of reliability and structure. However, traditional manual FSM design and modification processes can be time-consuming and error-prone. We propose that large language models (LLMs) can assist developers in editing FSM code for real-world robotic use cases. LLMs, with their ability to use context and process natural language, offer a solution for FSM modification with high correctness, allowing developers to update complex control logic through natural language instructions. Our approach leverages few-shot prompting and language-guided code generation to reduce the amount of time it takes to edit an FSM. To validate this approach, we evaluate it on a real-world robotics dataset, demonstrating its effectiveness in practical scenarios.
- Europe > Netherlands > North Brabant > Eindhoven (0.04)
- North America > United States > Washington > King County > Seattle (0.04)