AITopics | contact app

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question "What is Artificial Intelligence?" and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

CRAB: Cross-environment Agent Benchmark for Multimodal Language Model Agents

Xu, Tianqi, Chen, Linyao, Wu, Dai-Jie, Chen, Yanjun, Zhang, Zecheng, Yao, Xiang, Xie, Zhiqiang, Chen, Yongchao, Liu, Shilong, Qian, Bochen, Torr, Philip, Ghanem, Bernard, Li, Guohao

arXiv.org Artificial Intelligence, Jul-1-2024

The development of autonomous agents increasingly relies on Multimodal Language Models (MLMs) to perform tasks described in natural language with GUI environments, such as websites, desktop computers, or mobile phones. Existing benchmarks for MLM agents in interactive environments are limited by their focus on a single environment, lack of detailed and generalized evaluation methods, and the complexities of constructing tasks and evaluators. To overcome these limitations, we introduce Crab, the first agent benchmark framework designed to support cross-environment tasks, incorporating a graph-based fine-grained evaluation method and an efficient mechanism for task and evaluator construction. Our framework supports multiple devices and can be easily extended to any environment with a Python interface. Leveraging Crab, we developed a cross-platform Crab Benchmark-v0 comprising 100 tasks in computer desktop and mobile phone environments. We evaluated four advanced MLMs using different single and multi-agent system configurations on this benchmark. The experimental results demonstrate that the single agent with GPT-4o achieves the best completion ratio of 35.26%. All framework code, agent code, and task datasets are publicly available at https://github.com/camel-ai/crab.
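The abstract does not spell out how the graph-based evaluation or the completion ratio is computed. As a minimal sketch, assume a task is a directed acyclic graph of checkpoints, a checkpoint counts as completed only when its own check passes and all of its prerequisites are completed, and the completion ratio is the fraction of completed checkpoints. The function name, the checkpoint names, and the scoring rule are illustrative assumptions, not the Crab framework's API.

```python
from graphlib import TopologicalSorter


def completion_ratio(deps, passed):
    """Fraction of checkpoints completed in a task graph.

    deps   -- maps each checkpoint to the set of checkpoints it depends on
    passed -- set of checkpoints whose own check succeeded
    (Assumed scoring rule for illustration, not taken from the paper.)
    """
    # static_order() yields every node after all of its prerequisites.
    order = list(TopologicalSorter(deps).static_order())
    completed = set()
    for node in order:
        prerequisites = deps.get(node, ())
        if node in passed and all(p in completed for p in prerequisites):
            completed.add(node)
    return len(completed) / len(order) if order else 0.0


# Hypothetical cross-environment task: read a message on the phone,
# then paste it into an editor on the desktop.
deps = {
    "open_phone_messages": set(),
    "read_text": {"open_phone_messages"},
    "open_desktop_editor": set(),
    "paste_text": {"read_text", "open_desktop_editor"},
}
ratio = completion_ratio(
    deps, passed={"open_phone_messages", "read_text", "paste_text"}
)
print(f"{ratio:.2f}")  # 0.50: paste_text is not counted without the editor
```

Under this assumed rule, partial progress is credited per checkpoint instead of as a single pass or fail, which is one way a fine-grained metric can separate agents that fail at different stages.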

agent, application, evaluator, (13 more...)

arXiv:2407.01511

Country:

North America > United States > California > Los Angeles County > Pasadena (0.04)
North America > Canada (0.04)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)

Genre:

Workflow (1.00)
Research Report > New Finding (1.00)

Industry:

Information Technology > Software (0.49)
Leisure & Entertainment > Games > Computer Games (0.46)

Technology:

Information Technology > Communications > Mobile (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
(2 more...)

Autonomous Evaluation and Refinement of Digital Agents

Pan, Jiayi, Zhang, Yichi, Tomlin, Nicholas, Zhou, Yifei, Levine, Sergey, Suhr, Alane

arXiv.org Artificial Intelligence, Apr-10-2024

We show that domain-general automatic evaluators can significantly improve the performance of agents for web navigation and device control. We experiment with multiple evaluation models that trade off between inference cost, modularity of design, and accuracy. We validate the performance of these models in several popular benchmarks for digital agents, finding between 74.4 and 92.9% agreement with oracle evaluation metrics. Finally, we use these evaluators to improve the performance of existing agents via fine-tuning and inference-time guidance. Without any additional supervision, we improve state-of-the-art performance by 29% on the popular benchmark WebArena, and achieve a 75% relative improvement in a challenging domain transfer scenario.
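The two kinds of figures quoted in the abstract, agreement with an oracle and relative improvement, can be illustrated with a short sketch. It assumes agreement is the fraction of episodes where the evaluator's success label matches the oracle's, and that relative improvement is the gain divided by the baseline. Both function names and all numbers below are made up for illustration and are not the paper's data or code.

```python
def agreement_rate(evaluator_labels, oracle_labels):
    """Fraction of episodes where the evaluator matches the oracle."""
    if len(evaluator_labels) != len(oracle_labels):
        raise ValueError("label lists must have the same length")
    if not oracle_labels:
        raise ValueError("at least one episode is required")
    matches = sum(e == o for e, o in zip(evaluator_labels, oracle_labels))
    return matches / len(oracle_labels)


def relative_improvement(baseline, improved):
    """Gain over the baseline, expressed as a fraction of the baseline."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (improved - baseline) / baseline


# Illustrative labels: True means the episode was judged successful.
evaluator = [True, True, False, True]
oracle = [True, False, False, True]
agreement = agreement_rate(evaluator, oracle)

# Illustrative success rates before and after using the evaluator.
gain = relative_improvement(baseline=0.40, improved=0.50)

print(f"agreement: {agreement:.2f}, relative improvement: {gain:.2f}")
```

Whether a given percentage in the abstract is absolute or relative depends on the paper's own definitions, so this sketch should be read only as a reminder of how the two quantities differ.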

agent, evaluator, statictext, (15 more...)

arXiv:2404.06474

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)
North America > Mexico > Mexico City > Mexico City (0.04)
(14 more...)

Genre:

Research Report > New Finding (0.67)

Industry:

Retail (1.00)
Consumer Products & Services > Restaurants (1.00)
Information Technology > Services (0.93)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Communications > Mobile (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
(4 more...)

Your iPhone's Contacts App Is More Powerful Than You Realize. Here Are 5 Ways to Get the Most Out of It

TIME - Tech, Feb-15-2019, 13:25:18 GMT

You're not the only one who silently laments spending time searching through the Contacts app on your iPhone or other iOS device, hunting for that one person you barely remember yet need to get in touch with for whatever reason. It only gets worse when you realize their information is either incorrect, outdated, or not where you thought you saved it. Whether you're looking for a co-worker, a client, an acquaintance, or a long-lost friend you bumped into at a party, it's helpful to keep who's who in order in your Contacts app. And you just might find that the Contacts app is far more powerful when you take the time to get the most out of it. Filling out contact information beyond a person's name, email, and phone number might seem like overkill, but doing so can make Siri a more powerful tool when it comes to connecting with people.

contact app, contact card, iphone, (5 more...)

Technology:

Information Technology > Communications > Mobile (0.70)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (0.40)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.40)
