AITopics | workbench

workbench

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question "What is Artificial Intelligence?" and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Learning API Functionality from In-Context Demonstrations for Tool-based Agents

Patel, Bhrij, Jagmohan, Ashish, Vempaty, Aditya

arXiv.org Artificial Intelligence, Nov-13-2025

Digital tool-based agents, powered by Large Language Models (LLMs), that invoke external Application Programming Interfaces (APIs) often rely on documentation to understand API functionality. However, such documentation is frequently missing, outdated, privatized, or inconsistent, hindering the development of reliable, general-purpose agents. In this work, we propose a new research direction: learning of API functionality directly from in-context demonstrations. This task is a new paradigm applicable in scenarios without documentation. Using API benchmarks, we collect demonstrations from both expert agents and from self-exploration. To understand what information demonstrations must convey for successful task completion, we extensively study how the number of demonstrations and the use of LLM-generated summaries and evaluations affect the task success rate of the API-based agent. Our experiments across 3 datasets and 6 models show that learning functionality from in-context demonstrations remains a non-trivial challenge, even for state-of-the-art LLMs. We find that providing explicit function calls and natural language critiques significantly improves the agent's task success rate due to more accurate parameter filling. We analyze failure modes, identify sources of error, and highlight key open challenges for future work in documentation-free, self-improving, API-based agents.

demonstration, large language model, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2505.24197

Genre:

Research Report (0.82)
Workflow (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Unified Work Embeddings: Contrastive Learning of a Bidirectional Multi-task Ranker

De Lange, Matthias, Decorte, Jens-Joris, Van Hautte, Jeroen

arXiv.org Artificial Intelligence, Nov-12-2025

Workforce transformation across diverse industries has driven an increased demand for specialized natural language processing capabilities. Nevertheless, tasks derived from work-related contexts inherently reflect real-world complexities, characterized by long-tailed distributions, extreme multi-label target spaces, and scarce data availability. The rise of generalist embedding models prompts the question of their performance in the work domain, especially as progress in the field has focused mainly on individual tasks. To this end, we introduce WorkBench, the first unified evaluation suite spanning six work-related tasks formulated explicitly as ranking problems, establishing a common ground for multi-task progress. Based on this benchmark, we find significant positive cross-task transfer, and use this insight to compose task-specific bipartite graphs from real-world data, synthetically enriched through grounding. This leads to Unified Work Embeddings (UWE), a task-agnostic bi-encoder that exploits our training-data structure with a many-to-many InfoNCE objective, and leverages token-level embeddings with task-agnostic soft late interaction. UWE demonstrates zero-shot ranking performance on unseen target spaces in the work domain, enables low-latency inference by caching the task target space embeddings, and shows significant gains in macro-averaged MAP and RP@10 over generalist embedding models.
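The abstract does not spell out the many-to-many InfoNCE objective, so the following is only an illustrative sketch of one common multi-positive variant, not the authors' formulation. It assumes a precomputed query-by-target similarity matrix `sim` and a 0/1 matrix `mask` marking the edges of the bipartite graph; the function names, the temperature value, and the symmetric averaging over both directions are assumptions made for illustration.

```python
import math


def _directional_loss(sim, mask, temperature):
    """Mean over rows of the average negative log-softmax of each row's positives."""
    total, counted = 0.0, 0
    for row, row_mask in zip(sim, mask):
        positives = [j for j, flag in enumerate(row_mask) if flag]
        if not positives:
            continue  # rows without any positive contribute nothing
        scaled = [s / temperature for s in row]
        peak = max(scaled)  # subtract the max for numerical stability
        log_denom = peak + math.log(sum(math.exp(s - peak) for s in scaled))
        total += -sum(scaled[j] - log_denom for j in positives) / len(positives)
        counted += 1
    return total / counted if counted else 0.0


def many_to_many_info_nce(sim, mask, temperature=0.1):
    """Average the multi-positive loss over query->target and target->query directions."""
    sim_t = [list(col) for col in zip(*sim)]
    mask_t = [list(col) for col in zip(*mask)]
    forward = _directional_loss(sim, mask, temperature)
    backward = _directional_loss(sim_t, mask_t, temperature)
    return 0.5 * (forward + backward)


# Hypothetical toy data: 2 queries, 3 targets; query 0 has two positive targets.
mask = [[1, 1, 0], [0, 0, 1]]
aligned = [[1.0, 0.9, 0.0], [0.0, 0.1, 1.0]]
misaligned = [[0.0, 0.0, 1.0], [1.0, 1.0, 0.0]]
uniform = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]

loss_aligned = many_to_many_info_nce(aligned, mask)
loss_misaligned = many_to_many_info_nce(misaligned, mask)
loss_uniform = many_to_many_info_nce(uniform, mask)
print(loss_aligned < loss_misaligned)
```

In this sketch a row (or column) with several positives averages their log-probabilities, so similarities that agree with the graph yield a lower loss than similarities that contradict it.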

large language model, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2511.07969

Country: North America > United States (0.46)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.89)

Flotta: a Secure and Flexible Spark-inspired Federated Learning Framework

Bonesana, Claudio, Malpetti, Daniele, Mitrović, Sandra, Mangili, Francesca, Azzimonti, Laura

arXiv.org Artificial Intelligence, Sep-20-2024

We present Flotta, a Federated Learning framework designed to train machine learning models on sensitive data distributed across a multi-party consortium conducting research in contexts requiring high levels of security, such as the biomedical field. Flotta is a Python package, inspired in several aspects by Apache Spark, which provides both flexibility and security and allows conducting research using solely machines internal to the consortium. In this paper, we describe the main components of the framework together with a practical use case to illustrate the framework's capabilities and highlight its security, flexibility and user-friendliness.

consortium, flotta, node, (13 more...)

arXiv.org Artificial Intelligence

2409.13473

Country:

Europe > Switzerland > Basel-City > Basel (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.40)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

WorkBench: a Benchmark Dataset for Agents in a Realistic Workplace Setting

Styles, Olly, Miller, Sam, Cerda-Mardini, Patricio, Guha, Tanaya, Sanchez, Victor, Vidgen, Bertie

arXiv.org Artificial Intelligence, May-1-2024

We introduce WorkBench: a benchmark dataset for evaluating agents' ability to execute tasks in a workplace setting. WorkBench contains a sandbox environment with five databases, 26 tools, and 690 tasks. These tasks represent common business activities, such as sending emails and scheduling meetings. The tasks in WorkBench are challenging as they require planning, tool selection, and often multiple actions. If a task has been successfully executed, one (or more) of the database values may change. The correct outcome for each task is unique and unambiguous, which allows for robust, automated evaluation. We call this key contribution outcome-centric evaluation. We evaluate five existing ReAct agents on WorkBench, finding they successfully complete as few as 3% of tasks (Llama2-70B), and just 43% for the best-performing (GPT-4). We further find that agents' errors can result in the wrong action being taken, such as an email being sent to the wrong person. WorkBench reveals weaknesses in agents' ability to undertake common business activities, raising questions about their use in high-stakes workplace settings. WorkBench is publicly available as a free resource at https://github.com/olly-styles/WorkBench.
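The outcome-centric evaluation described above can be sketched roughly as follows. This is a minimal illustration, not the WorkBench implementation: the dictionary-based sandbox, the `send_email` tool, and the `outcome_matches` helper are all hypothetical names. The idea shown is that a task counts as solved only if the database state after the agent's actions equals the state after a reference solution.

```python
import copy


def send_email(db, recipient, subject):
    """Hypothetical tool: record an outgoing email in the sandbox database."""
    db["emails"].append({"to": recipient, "subject": subject})


def outcome_matches(initial_db, agent_actions, reference_actions):
    """Apply both action lists to separate copies of the sandbox and compare end states."""
    agent_db = copy.deepcopy(initial_db)
    reference_db = copy.deepcopy(initial_db)
    for tool, kwargs in agent_actions:
        tool(agent_db, **kwargs)
    for tool, kwargs in reference_actions:
        tool(reference_db, **kwargs)
    return agent_db == reference_db


# Hypothetical task: "email Sam the Q3 report".
initial = {"emails": [], "calendar": []}
reference = [(send_email, {"recipient": "sam@example.com", "subject": "Q3 report"})]
correct_run = [(send_email, {"recipient": "sam@example.com", "subject": "Q3 report"})]
wrong_run = [(send_email, {"recipient": "olly@example.com", "subject": "Q3 report"})]

correct_ok = outcome_matches(initial, correct_run, reference)
wrong_ok = outcome_matches(initial, wrong_run, reference)
print(correct_ok, wrong_ok)
```

Comparing final states rather than action traces means the check ignores how the agent reached the outcome, while an email sent to the wrong person still fails the task.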

agent, conference paper, email, (17 more...)

arXiv.org Artificial Intelligence

2405.00823

Country:

South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
Europe > Monaco (0.04)
Asia > Myanmar > Tanintharyi Region > Dawei (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.50)

Industry: Information Technology > Security & Privacy (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

37 Home Depot Black Friday Deals (2023): Tools, Grills

WIRED, Nov-25-2023, 18:12:55 GMT

Take a gander at these Home Depot Black Friday deals from the comfort of your living room without having to line up at the gates to do battle with the masses in the store. Winter means, for more of us than not, spending a lot more time at home than we'd otherwise choose to spend. Vacations are a fine remedy, sure, but who has money to travel all season long? Make home a pleasant place to be, inside and outside, with these deals on equipment for your garage, backyard, and home, and you won't want to leave. WIRED tests products year-round and handpicked these deals based on the actual discounts, not just the discounts retailers claim to offer. Products that are sold out or no longer discounted as of publishing will be crossed out. We'll update this guide through November.

battery, black friday deal, home depot black friday deal, (13 more...)

WIRED

Country: North America > United States > Wisconsin > Milwaukee County > Milwaukee (0.06)

Industry: Retail > Online (0.61)

Technology: Information Technology > Artificial Intelligence (0.95)

24 Home Depot Black Friday Deals (2023): Tools, Grills

WIRED, Nov-22-2023, 22:42:22 GMT

Take a gander at these early Home Depot Black Friday deals and get a lead on the masses who'll be ready to crowd the digital gates before the Thanksgiving turkey even gets cold. Winter means, for more of us than not, spending a lot more time at home than we'd otherwise choose to spend. Vacations are a fine remedy, sure, but who has money to travel all season long? Make home a pleasant place to be, inside and outside, with these deals on equipment for your garage, backyard, and home, and you won't want to leave. WIRED tests products year-round and handpicked these deals based on the actual discounts, not just the discounts retailers claim to offer.

battery, black friday deal, home depot black friday deal, (11 more...)

WIRED

Country: North America > United States > Wisconsin > Milwaukee County > Milwaukee (0.06)

Industry: Retail > Online (0.61)

Technology: Information Technology > Artificial Intelligence (0.70)

13 Best Home Depot Black Friday Deals (2023): Smart Home, Outdoor Grills, Garage Tools

WIRED, Nov-16-2023, 16:33:28 GMT

Winter means, for more of us than not, spending a lot more time at home than we'd otherwise choose to spend. Vacations are a fine remedy, sure, but who has money to travel all season long? Make home a pleasant place to be, inside and outside, with these deals on equipment for your garage, backyard, and home, and you won't want to leave. WIRED tests products year-round and handpicked these deals based on the actual discounts, not just the discounts retailers claim to offer. Products that are sold out or no longer discounted as of publishing will be crossed out.

home depot black friday deal, outdoor grill, workbench, (11 more...)

WIRED

Country: North America > United States > Wisconsin > Milwaukee County > Milwaukee (0.06)

Industry:

Information Technology > Smart Houses & Appliances (0.50)
Retail > Online (0.40)

Technology: Information Technology > Artificial Intelligence (0.72)

RLang: A Declarative Language for Describing Partial World Knowledge to Reinforcement Learning Agents

Rodriguez-Sanchez, Rafael, Spiegel, Benjamin A., Wang, Jennifer, Patel, Roma, Tellex, Stefanie, Konidaris, George

arXiv.org Artificial Intelligence, May-30-2023

We introduce RLang, a domain-specific language (DSL) for communicating domain knowledge to an RL agent. Unlike existing RL DSLs that ground to single elements of a decision-making formalism (e.g., the reward function or policy), RLang can specify information about every element of a Markov decision process. We define precise syntax and grounding semantics for RLang, and provide a parser that grounds RLang programs to an algorithm-agnostic partial world model and policy that can be exploited by an RL agent. We provide a series of example RLang programs demonstrating how different RL methods can exploit the resulting knowledge, encompassing model-free and model-based tabular algorithms, policy gradient and value-based methods, hierarchical approaches, and deep methods.

artificial intelligence, machine learning, reinforcement learning, (19 more...)

arXiv.org Artificial Intelligence

2208.06448

Country:

North America > United States > Rhode Island > Providence County > Providence (0.04)
North America > United States > Hawaii > Honolulu County > Honolulu (0.04)
Europe > Netherlands > North Holland > Amsterdam (0.04)
(3 more...)

Genre: Research Report (0.50)

Industry:

Transportation > Passenger (0.49)
Leisure & Entertainment (0.46)
Materials (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Joint order assignment and picking station scheduling in KIVA warehouses with multiple stations

Yang, Xiying, Hua, Guowei, Zhang, Li, Cheng, T. C. E., Choi, Tsan Ming

arXiv.org Artificial Intelligence, May-5-2023

The rapid development of e-commerce has brought new challenges to warehouse operations. Order picking plays a crucial role among these operations because it directly affects the overall order fulfillment efficiency (Lamballais et al., 2017; Shen et al., 2020). The Robotic Mobile Fulfillment System (RMFS) was invented to improve order picking efficiency and reduce labour costs by exploiting rack-moving mobile robots (Boysen et al., 2017). The cooperation between the robots and movable racks eliminates the unproductive movement of pickers found in the picker-to-parts system (Battini et al., 2017). The picking performance of RMFS is far superior to that of traditional manual warehouses and is reported to exceed 600 order-lines per hour per workstation (Wulfraat, 2012; Banker, 2016). Nevertheless, order picking in RMFS needs further efficiency improvement due to the growing demand and increasingly tight delivery schedules brought by the prosperity of e-commerce (Batt & Gallino, 2019; Azadeh et al., 2017; Zhuang et al., 2021).

artificial intelligence, warehouse, workstation, (17 more...)

arXiv.org Artificial Intelligence

2108.09056

Country:

Asia > China > Beijing > Beijing (0.04)
Asia > Middle East > Jordan (0.04)
Asia > China > Hong Kong (0.04)
(2 more...)

Genre: Research Report (1.00)

Industry:

Information Technology (0.54)
Transportation > Freight & Logistics Services (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)

TigerGraph launches Workbench for graph neural network ML/AI modeling

#artificialintelligence, May-29-2022, 06:55:19 GMT

TigerGraph, maker of a graph analytics platform for data scientists, during its Graph & AI Summit event today introduced its TigerGraph ML (Machine Learning) Workbench, a new-gen toolkit that ostensibly will enable analysts to improve ML model accuracy significantly and shorten development cycles. Workbench does this while using familiar tools, workflows, and libraries in a single environment that plugs directly into existing data pipelines and ML infrastructure, TigerGraph VP Victor Lee told VentureBeat. The ML Workbench is a Jupyter-based Python development framework that enables data scientists to build deep-learning AI models using connected data directly from the business. Graph-enabled ML has proven to have more accurate predictive power and take far less run time than the conventional ML approach. Conventional machine learning algorithms are based on the learning of systems by training sets to develop a trained model.

data scientist, ml workbench, workbench, (8 more...)

#artificialintelligence

Industry: Information Technology (0.31)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.55)
