AITopics | execution environment

execution environment

SWE-bench Goes Live!

Neural Information Processing Systems | Jun-22-2026, 22:17:37 GMT

The issue-resolving task, where a model generates patches to fix real-world bugs, has emerged as a key benchmark for evaluating the capabilities of large language models (LLMs). While SWE-bench has become the dominant benchmark in this domain, it suffers from several limitations: it has not been updated since its release, is restricted to only 12 repositories, and relies heavily on manual effort for constructing test instances and setting up executable environments, significantly limiting its scalability. We present SWE-bench-Live, a live-updatable benchmark designed to address these limitations. SWE-bench-Live currently includes 1,890 tasks derived from real GitHub issues created since 2024, spanning 223 repositories. Each task is accompanied by a dedicated Docker image to ensure reproducible execution. Additionally, we introduce an automated curation pipeline that streamlines the entire process from instance creation to environment setup, removing manual bottlenecks and enabling scalability and continuous updates. We evaluate a range of state-of-the-art models and agent frameworks on SWE-bench-Live, offering detailed empirical insights into their real-world bug-fixing capabilities. By providing a fresh, diverse, and executable benchmark grounded in live repository activity, SWE-bench-Live supports reliable, large-scale assessment of code LLMs and code agents in realistic development settings.

large language model, machine learning, natural language, (19 more...)

Neural Information Processing Systems

Genre: Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

AgentBay: A Hybrid Interaction Sandbox for Seamless Human-AI Intervention in Agentic Systems

Piao, Yun, Min, Hongbo, Su, Hang, Zhang, Leilei, Wang, Lei, Yin, Yue, Wu, Xiao, Xu, Zhejing, Qu, Liwei, Li, Hang, Zeng, Xinxin, Tian, Wei, Yu, Fei, Li, Xiaowei, Jiang, Jiayi, Liu, Tongxu, Tian, Hao, Que, Yufei, Tu, Xiaobing, Suo, Bing, Li, Yuebing, Chen, Xiangting, Zhao, Zeen, Tang, Jiaming, Huang, Wei, Li, Xuguang, Zhao, Jing, Li, Jin, Shen, Jie, Ren, Jinkui, Zhang, Xiantao

arXiv.org Artificial Intelligence | Dec-5-2025

The rapid advancement of Large Language Models (LLMs) is catalyzing a shift towards autonomous AI Agents capable of executing complex, multi-step tasks. However, these agents remain brittle when faced with real-world exceptions, making Human-in-the-Loop (HITL) supervision essential for mission-critical applications. In this paper, we present AgentBay, a novel sandbox service designed from the ground up for hybrid interaction. AgentBay provides secure, isolated execution environments spanning Windows, Linux, Android, Web Browsers, and Code interpreters. Its core contribution is a unified session accessible via a hybrid control interface: an AI agent can interact programmatically via mainstream interfaces (MCP, Open Source SDK), while a human operator can, at any moment, seamlessly take over full manual control. This seamless intervention is enabled by the Adaptive Streaming Protocol (ASP). Unlike traditional VNC/RDP, ASP is specifically engineered for this hybrid use case, delivering an ultra-low-latency, smoother user experience that remains resilient even in weak network environments. It achieves this by dynamically blending command-based and video-based streaming, adapting its encoding strategy based on network conditions and the current controller (AI or human). Our evaluation demonstrates strong results in security, performance, and task completion rates. In a benchmark of complex tasks, the AgentBay (Agent + Human) model improved the task success rate by more than 48%. Furthermore, ASP reduces bandwidth consumption by up to 50% compared to standard RDP and reduces end-to-end latency by around 5%, especially under poor network conditions. We posit that AgentBay provides a foundational primitive for building the next generation of reliable, human-supervised autonomous systems.
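
The idea of adapting the streaming strategy to the network and to the current controller can be illustrated with a small selection policy. This is only a sketch under stated assumptions: the abstract does not describe ASP's internals, so the `LinkState` type, the `choose_stream_mode` function, the mode names, and both thresholds are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinkState:
    """Hypothetical snapshot of measured network conditions."""
    bandwidth_kbps: float
    rtt_ms: float


def choose_stream_mode(controller, link,
                       min_video_kbps=2000.0, max_video_rtt_ms=150.0):
    """Pick a streaming mode from the controller type and link quality.

    The thresholds are illustrative assumptions, not values from the paper.
    """
    if controller not in ("ai", "human"):
        raise ValueError(f"unknown controller: {controller!r}")
    # An AI agent drives the session programmatically, so this sketch
    # assumes it only needs command-level updates rather than pixels.
    if controller == "ai":
        return "command"
    # A human needs visual feedback: use video when the link can carry it.
    if link.bandwidth_kbps >= min_video_kbps and link.rtt_ms <= max_video_rtt_ms:
        return "video"
    # On a weak link, fall back to a blend of commands and sparse video.
    return "hybrid"
```

For example, `choose_stream_mode("human", LinkState(500, 300))` selects the blended mode because the link falls below both assumed thresholds.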

artificial intelligence, large language model, natural language, (13 more...)

arXiv.org Artificial Intelligence

2512.04367

Genre:

Research Report (0.82)
Overview (0.68)

Industry:

Information Technology > Security & Privacy (1.00)
Leisure & Entertainment (0.68)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Quantum-like Coherence Derived from the Interaction between Chemical Reaction and Its Environment

Gunji, Yukio-Pegio, Adamatzky, Andrew, Mougkogiannis, Panagiotis, Khrenikov, Andrei

arXiv.org Artificial Intelligence | Sep-3-2025

By uncovering the contrast between Artificial Intelligence and Natural-born Intelligence as a computational process, we define closed computing and open computing, and implement open computing within chemical reactions. This involves forming a mixture and invalidation of the computational process and the execution environment, which are logically distinct, and coalescing both to create a system that adjusts fluctuations. We model chemical reactions by considering the computation as the chemical reaction and the execution environment as the degree of aggregation of molecules that interact with the reactive environment. This results in a chemical reaction that progresses while repeatedly clustering and de-clustering, where concentration no longer holds significant meaning. Open computing is segmented into Token computing, which focuses on the individual behavior of chemical molecules, and Type computing, which focuses on normative behavior. Ultimately, both are constructed as an interplay between the two. In this system, Token computing demonstrates self-organizing critical phenomena, while Type computing exhibits quantum logic. Through their interplay, the recruitment of fluctuations is realized, giving rise to interactions between quantum logical subspaces corresponding to quantum coherence across different Hilbert spaces. As a result, spike waves are formed, enabling signal transmission. This occurrence may be termed quantum-like coherence, implying the source of enzymes responsible for controlling spike waves and biochemical rhythms.

artificial intelligence, computing, molecule, (16 more...)

arXiv.org Artificial Intelligence

2509.01021

Country:

Europe > United Kingdom > England (0.28)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.28)

Genre: Research Report (0.50)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.46)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning (0.95)

SWE-smith: Scaling Data for Software Engineering Agents

Yang, John, Lieret, Kilian, Jimenez, Carlos E., Wettig, Alexander, Khandpur, Kabir, Zhang, Yanzhe, Hui, Binyuan, Press, Ofir, Schmidt, Ludwig, Yang, Diyi

arXiv.org Artificial Intelligence | Aug-13-2025

Despite recent progress in Language Models (LMs) for software engineering, collecting training data remains a significant pain point. Existing datasets are small, with at most 1,000s of training instances from 11 or fewer GitHub repositories. The procedures to curate such datasets are often complex, necessitating hundreds of hours of human labor; companion execution environments also take up several terabytes of storage, severely limiting their scalability and usability. To address this pain point, we introduce SWE-smith, a novel pipeline for generating software engineering training data at scale. Given any Python codebase, SWE-smith constructs a corresponding execution environment, then automatically synthesizes 100s to 1,000s of task instances that break existing test(s) in the codebase. Using SWE-smith, we create a dataset of 50k instances sourced from 128 GitHub repositories, an order of magnitude larger than all previous works. We train SWE-agent-LM-32B, achieving 40.2% Pass@1 resolve rate on the SWE-bench Verified benchmark, state of the art among open source models. We open source SWE-smith (collection procedure, task instances, trajectories, models) to lower the barrier of entry for research in LM systems for automated software engineering. All assets available at https://swesmith.com.
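
The core filtering step, keeping only synthesized changes that break tests which previously passed, can be sketched in a few lines. This is a toy illustration and not the SWE-smith implementation: the codebase is modeled as a dict of functions, tests are plain callables, and the names `failing_tests`, `make_task_instances`, and the `fail_to_pass` field are assumptions made for this example.

```python
def failing_tests(functions, tests):
    """Return the sorted names of tests that fail or raise against `functions`."""
    failed = []
    for name, test in tests.items():
        try:
            ok = bool(test(functions))
        except Exception:
            ok = False
        if not ok:
            failed.append(name)
    return sorted(failed)


def make_task_instances(functions, tests, candidates):
    """Keep candidate changes that break at least one previously passing test."""
    baseline_failures = set(failing_tests(functions, tests))
    instances = []
    for label, patch in candidates.items():
        mutated = {**functions, **patch}  # apply the candidate change
        broken = [t for t in failing_tests(mutated, tests)
                  if t not in baseline_failures]
        if broken:
            # Tests a correct fix must turn from failing back to passing.
            instances.append({"id": label, "fail_to_pass": broken})
    return instances


# Toy "codebase" with two functions and one test each.
functions = {"add": lambda a, b: a + b, "neg": lambda a: -a}
tests = {
    "test_add": lambda f: f["add"](2, 3) == 5,
    "test_neg": lambda f: f["neg"](4) == -4,
}
candidates = {
    "off_by_one": {"add": lambda a, b: a + b + 1},  # breaks test_add
    "no_op": {"neg": lambda a: -a},                 # breaks nothing, discarded
}
instances = make_task_instances(functions, tests, candidates)
```

Here only the `off_by_one` candidate survives, since the `no_op` candidate leaves every test passing and therefore yields no usable task.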

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2504.21798

Genre: Research Report (0.83)

Industry:

Education (0.68)
Information Technology (0.67)

Technology:

Information Technology > Software Engineering (1.00)
Information Technology > Software (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
(2 more...)

Agnostics: Learning to Code in Any Programming Language via Reinforcement with a Universal Learning Environment

Boruch-Gruszecki, Aleksander, Zi, Yangtian, Wu, Zixuan, Oberoi, Tejas, Anderson, Carolyn Jane, Biswas, Joydeep, Guha, Arjun

arXiv.org Artificial Intelligence | Aug-8-2025

Large language models (LLMs) already excel at writing code in high-resource languages such as Python and JavaScript, yet stumble on low-resource languages that remain essential to science and engineering. Besides the obvious shortage of pre-training data, post-training itself is a bottleneck: every new language seems to require new datasets, test harnesses, and reinforcement-learning (RL) infrastructure. We introduce Agnostics, a language-agnostic post-training pipeline that eliminates this per-language engineering. The key idea is to judge code solely by its externally observable behavior, so a single verifier can test solutions written in any language. Concretely, we (i) use an LLM to rewrite existing unit-test datasets into an I/O format, (ii) supply a short configuration that tells the verifier how to compile and run a target language, and (iii) apply reinforcement learning with verifiable rewards (RLVR) in a robust code execution environment. Applied to five low-resource languages (Lua, Julia, R, OCaml, and Fortran), Agnostics (1) improves Qwen-3 4B to performance that rivals other 16B-70B open-weight models; (2) scales cleanly to larger and diverse model families (Qwen-3 8B, DeepSeek Coder 6.7B Instruct, Phi 4 Mini); and (3) for models with ≤16B parameters, sets new state-of-the-art pass@1 results on MultiPL-E and on a new multi-language version of LiveCodeBench that we introduce. We will release the language-agnostic training datasets (Ag-MBPP-X, Ag-Codeforces-X, Ag-LiveCodeBench-X), training code, and ready-to-use configurations, making RL post-training in any programming language as simple as editing a short YAML file.
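
Judging code purely by its input/output behavior means the verifier needs only a run command per language. The following is a minimal sketch of that idea, not the Agnostics verifier itself: `LanguageConfig`, `verify`, and the `{src}` placeholder are hypothetical names, a dataclass stands in for the YAML configuration the abstract mentions, and a compile step is omitted for brevity.

```python
import subprocess
from dataclasses import dataclass


@dataclass
class LanguageConfig:
    """Hypothetical per-language config: how to run a source file."""
    name: str
    run_cmd: list  # command template; "{src}" is replaced by the source path


def verify(config, src_path, test_cases, timeout=5.0):
    """Return the fraction of (stdin, expected_stdout) cases the program passes.

    Only externally observable behavior is checked, so the same function
    works for any language that has a run command.
    """
    cmd = [part.replace("{src}", src_path) for part in config.run_cmd]
    passed = 0
    for stdin_text, expected in test_cases:
        try:
            result = subprocess.run(cmd, input=stdin_text, capture_output=True,
                                    text=True, timeout=timeout)
        except subprocess.TimeoutExpired:
            continue  # a timeout counts as a failed case
        if result.returncode == 0 and result.stdout.strip() == expected.strip():
            passed += 1
    return passed / len(test_cases) if test_cases else 0.0
```

Supporting another language in this sketch would only require a new `LanguageConfig`, for instance one whose `run_cmd` is `["lua", "{src}"]`, assuming that interpreter is installed. The returned fraction could serve as a verifiable reward signal.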

large language model, machine learning, programming language, (20 more...)

arXiv.org Artificial Intelligence

2508.04865

Country: North America > United States (1.00)

Genre: Research Report > New Finding (0.67)

Industry: Government > Regional Government > North America Government > United States Government (0.68)

Technology:

Information Technology > Software > Programming Languages (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

SWE-bench Goes Live!

Zhang, Linghao, He, Shilin, Zhang, Chaoyun, Kang, Yu, Li, Bowen, Xie, Chengxing, Wang, Junhao, Wang, Maoquan, Huang, Yufan, Fu, Shengyu, Nallipogu, Elsie, Lin, Qingwei, Dang, Yingnong, Rajmohan, Saravan, Zhang, Dongmei

arXiv.org Artificial Intelligence | Jun-3-2025

The issue-resolving task, where a model generates patches to fix real-world bugs, has emerged as a critical benchmark for evaluating the capabilities of large language models (LLMs). While SWE-bench and its variants have become standard in this domain, they suffer from key limitations: they have not been updated since their initial releases, cover a narrow set of repositories, and depend heavily on manual effort for instance construction and environment setup. These factors hinder scalability and introduce risks of overfitting and data contamination. In this work, we present SWE-bench-Live, a live-updatable benchmark designed to overcome these challenges. Our initial release consists of 1,319 tasks derived from real GitHub issues created since 2024, spanning 93 repositories. Each task is accompanied by a dedicated Docker image to ensure reproducible execution. Central to our benchmark is an automated curation pipeline that streamlines the entire process from instance creation to environment setup, removing manual bottlenecks and enabling scalability and continuous updates. We evaluate a range of state-of-the-art agent frameworks and LLMs on SWE-bench-Live, revealing a substantial performance gap compared to static benchmarks like SWE-bench, even under controlled evaluation conditions. To better understand this discrepancy, we perform detailed analyses across repository origin, issue recency, and task difficulty. By providing a fresh, diverse, and executable benchmark grounded in live repository activity, SWE-bench-Live facilitates rigorous, contamination-resistant evaluation of LLMs and agents in dynamic, real-world software development settings.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2505.23419

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.68)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

The Next Frontier of LLM Applications: Open Ecosystems and Hardware Synergy

Hou, Xinyi, Zhao, Yanjie, Wang, Haoyu

arXiv.org Artificial Intelligence | Mar-6-2025

The second paradigm involves LLM agents developed using frameworks like LangChain [16], AutoGPT [11], Langroid [18], AutoGen [23], and LlamaIndex [22], which offer greater programmability and modularity, allowing developers to build sophisticated, multi-agent systems that integrate external tools and dynamic workflows [20]. Despite their advantages, both paradigms remain architecturally fragmented and lack standardized interoperability, leading to redundant development efforts and constrained scalability. From a software engineering (SE) perspective, current LLM application paradigms resemble traditional platform-centric software ecosystems, where applications are tightly coupled to proprietary APIs and execution environments. LLM app stores, while lowering the barrier to entry, impose constraints on extensibility and cross-platform interoperability, leading to vendor lock-in and duplicated development efforts across different ecosystems. In contrast, agent-based LLM frameworks provide modularity but lack standardized mechanisms for component reuse and integration, making it challenging to compose LLM applications that seamlessly operate across heterogeneous environments. This fragmentation mirrors historical challenges in SE, where monolithic architectures have given way to service-oriented and microservices-based designs to improve reusability, scalability, and maintainability. Another key limitation of existing LLM applications is inefficient hardware utilization.

application, architecture, llm application, (16 more...)

arXiv.org Artificial Intelligence

2503.04596

Country:

Asia > China > Hubei Province > Wuhan (0.05)
Europe > Slovenia > Drava > Municipality of Benedikt > Benedikt (0.04)
Asia > Middle East > Jordan (0.04)

Genre:

Overview (0.46)
Research Report (0.40)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

NDAI Agreements

Stephenson, Matthew, Miller, Andrew, Sun, Xyn, Annem, Bhargav, Parikh, Rohan

arXiv.org Artificial Intelligence | Feb-11-2025

We study a fundamental challenge in the economics of innovation: an inventor must reveal details of a new idea to secure compensation or funding, yet such disclosure risks expropriation. We present a model in which a seller (inventor) and buyer (investor) bargain over an information good under the threat of hold-up. In the classical setting, the seller withholds disclosure to avoid misappropriation, leading to inefficiency. We show that trusted execution environments (TEEs) combined with AI agents can mitigate and even fully eliminate this hold-up problem. By delegating the disclosure and payment decisions to tamper-proof programs, the seller can safely reveal the invention without risking expropriation, achieving full disclosure and an efficient ex post transfer. Moreover, even if the invention's value exceeds a threshold that TEEs can fully secure, partial disclosure still improves outcomes compared to no disclosure. Recognizing that real AI agents are imperfect, we model "agent errors" in payments or disclosures and demonstrate that budget caps and acceptance thresholds suffice to preserve most of the efficiency gains. Our results imply that cryptographic or hardware-based solutions can function as an "ironclad NDA," substantially mitigating the fundamental disclosure-appropriation paradox first identified by Arrow (1962) and Nelson (1959). This has far-reaching policy implications for fostering R&D, technology transfer, and collaboration.

agent, artificial intelligence, disclosure, (17 more...)

arXiv.org Artificial Intelligence

2502.07924

Country: North America > United States > Illinois > Cook County > Chicago (0.04)

Genre: Research Report > New Finding (0.34)

Industry:

Law (1.00)
Information Technology > Security & Privacy (1.00)
Banking & Finance (0.93)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.70)

A Collaborative Multi-Agent Approach to Retrieval-Augmented Generation Across Diverse Data

Salve, Aniruddha, Attar, Saba, Deshmukh, Mahesh, Shivpuje, Sayali, Utsab, Arnab Mitra

arXiv.org Artificial Intelligence | Dec-8-2024

Retrieval-Augmented Generation (RAG) enhances Large Language Models (LLMs) by incorporating external, domain-specific data into the generative process. While LLMs are highly capable, they often rely on static, pre-trained datasets, limiting their ability to integrate dynamic or private data. Traditional RAG systems typically use a single-agent architecture to handle query generation, data retrieval, and response synthesis. However, this approach becomes inefficient when dealing with diverse data sources, such as relational databases, document stores, and graph databases, often leading to performance bottlenecks and reduced accuracy. This paper proposes a multi-agent RAG system to address these limitations. Specialized agents, each optimized for a specific data source, handle query generation for relational, NoSQL, and document-based systems. These agents collaborate within a modular framework, with query execution delegated to an environment designed for compatibility across various database types. This distributed approach enhances query efficiency, reduces token overhead, and improves response accuracy by ensuring that each agent focuses on its specialized task. The proposed system is scalable and adaptable, making it ideal for generative AI workflows that require integration with diverse, dynamic, or private data sources. By leveraging specialized agents and a modular execution environment, the system provides an efficient and robust solution for handling complex, heterogeneous data environments in generative AI applications.
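
The division of labor described above, where specialized agents generate queries and a shared environment executes them, might be sketched as follows. Everything here is an illustrative assumption rather than the paper's system: the agent functions are fixed templates standing in for LLM calls, the `ExecutionEnvironment` class and `answer` function are hypothetical, and the sample rows are invented placeholder data.

```python
import sqlite3


def relational_agent(request):
    """Stand-in for an LLM agent that writes SQL; here a fixed template."""
    return ("SELECT name FROM users WHERE city = ?", (request["city"],))


def document_agent(request):
    """Stand-in for an agent that writes a document-store filter."""
    return {"city": request["city"]}


class ExecutionEnvironment:
    """Runs queries for each backend; agents never touch the stores directly."""

    def __init__(self):
        self.db = sqlite3.connect(":memory:")
        self.db.execute("CREATE TABLE users (name TEXT, city TEXT)")
        self.db.executemany("INSERT INTO users VALUES (?, ?)",
                            [("Asha", "Pune"), ("Ravi", "Dhaka")])
        self.docs = [{"title": "Pune office guide", "city": "Pune"},
                     {"title": "Dhaka FAQ", "city": "Dhaka"}]

    def run(self, source, query):
        if source == "relational":
            sql, params = query
            return [row[0] for row in self.db.execute(sql, params)]
        if source == "document":
            return [d["title"] for d in self.docs
                    if all(d.get(k) == v for k, v in query.items())]
        raise ValueError(f"unsupported source: {source!r}")


AGENTS = {"relational": relational_agent, "document": document_agent}


def answer(request, env):
    """Let each specialized agent build its query, then execute it centrally."""
    return {source: env.run(source, agent(request))
            for source, agent in AGENTS.items()}
```

Calling `answer({"city": "Pune"}, ExecutionEnvironment())` gathers one result list per data source, which a downstream generator could then combine into a response.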

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2412.05838

Country:

Asia > India > Maharashtra > Pune (0.05)
Asia > Malaysia > Kuala Lumpur > Kuala Lumpur (0.04)
Asia > Bangladesh > Dhaka Division > Dhaka District > Dhaka (0.04)

Genre:

Research Report (0.64)
Overview (0.46)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.56)

Asynchronous Tool Usage for Real-Time Agents

Ginart, Antonio A., Kodali, Naveen, Lee, Jason, Xiong, Caiming, Savarese, Silvio, Emmons, John

arXiv.org Artificial Intelligence | Oct-28-2024

While frontier large language models (LLMs) are capable tool-using agents, current AI systems still operate in a strict turn-based fashion, oblivious to the passage of time. This synchronous design forces user queries and tool use to occur sequentially, preventing the systems from multitasking and reducing interactivity. To address this limitation, we introduce asynchronous AI agents capable of parallel processing and real-time tool use. Our key contribution is an event-driven finite-state machine architecture for agent execution and prompting, integrated with automatic speech recognition and text-to-speech. Drawing inspiration from concepts originally developed for real-time operating systems, this work presents both a conceptual framework and practical tools for creating AI agents capable of fluid, multitasking interactions.
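
An event-driven finite-state machine of this kind can be sketched with an event queue and a transition table. This is a minimal illustration, not the authors' architecture: the state names, event names, the `TRANSITIONS` table, and the `slow_tool` and `run_agent` helpers are all assumptions made for the example.

```python
import asyncio

# Hypothetical transition table: (state, event) -> next state.
TRANSITIONS = {
    ("idle", "user_message"): "thinking",
    ("idle", "tool_result"): "thinking",
    ("thinking", "response_ready"): "speaking",
    ("speaking", "speech_done"): "idle",
    ("speaking", "user_message"): "thinking",  # user interrupts the agent
}


def step(state, event):
    """Advance the machine; events with no listed transition are ignored."""
    return TRANSITIONS.get((state, event), state)


async def slow_tool(queue, delay):
    """Simulated tool call that reports back as an event when it finishes."""
    await asyncio.sleep(delay)
    await queue.put("tool_result")


async def run_agent(scripted_events, tool_delay=0.01):
    """Process scripted events while a tool call runs in the background."""
    queue = asyncio.Queue()
    for event in scripted_events:
        queue.put_nowait(event)
    tool_task = asyncio.create_task(slow_tool(queue, tool_delay))
    state, trace = "idle", []
    # Handle every scripted event plus the one tool result.
    for _ in range(len(scripted_events) + 1):
        event = await queue.get()
        state = step(state, event)
        trace.append((event, state))
    await tool_task
    return trace


trace = asyncio.run(run_agent(["user_message", "response_ready", "speech_done"]))
```

The agent completes a full conversational turn while the tool is still running, and the late `tool_result` event then moves it from `idle` back to `thinking` without any turn-based blocking.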

large language model, machine learning, real time system, (21 more...)

arXiv.org Artificial Intelligence

2410.21620

Country: Europe > Slovenia > Central Slovenia > Municipality of Komenda > Komenda (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
(2 more...)
