Anwar, Ammar
In-Context Ensemble Learning from Pseudo Labels Improves Video-Language Models for Low-Level Workflow Understanding
Xu, Moucheng, Chatzaroulas, Evangelos, McCutcheon, Luc, Ahad, Abdul, Azeem, Hamzah, Marecki, Janusz, Anwar, Ammar
A Standard Operating Procedure (SOP) is a low-level, step-by-step written guide for a business software workflow. SOP generation is a crucial step towards automating end-to-end software workflows, and creating SOPs manually is time-consuming. Recent advancements in large video-language models offer the potential to automate SOP generation by analyzing recordings of human demonstrations; however, current large video-language models struggle with zero-shot SOP generation. In this work, we first explore in-context learning with video-language models for SOP generation. We then propose an exploration-focused strategy, In-Context Ensemble Learning, which aggregates pseudo labels from multiple possible SOP paths. The proposed in-context ensemble learning also enables the models to learn beyond their context window limits through an implicit consistency regularisation. We report that in-context learning helps video-language models generate more temporally accurate SOPs, and that the proposed in-context ensemble learning consistently enhances the SOP generation capabilities of the video-language models.
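A minimal sketch of the ensemble idea described in this abstract, assuming a generic chat-style video-language model client. The function name `query_video_lm`, the prompts, and the sampling temperatures are hypothetical; only the overall pattern follows the abstract: sample several candidate SOPs as pseudo labels, then place them back in context and ask the model for a single consensus SOP.

```python
# Sketch of in-context ensemble learning over pseudo-labelled SOPs.
# `query_video_lm` is a hypothetical callable: (video_path, prompt, temperature) -> text.
from typing import Callable, List

def generate_sop_ensemble(
    query_video_lm: Callable[[str, str, float], str],
    video_path: str,
    n_samples: int = 5,
) -> str:
    """Aggregate several pseudo-labelled SOP drafts into one consensus SOP."""
    base_prompt = (
        "Watch the screen recording and write a numbered Standard Operating "
        "Procedure (SOP): one low-level UI step per line."
    )

    # 1) Exploration: sample multiple candidate SOPs (pseudo labels) with
    #    non-zero temperature so different plausible step sequences appear.
    pseudo_labels: List[str] = [
        query_video_lm(video_path, base_prompt, 0.8) for _ in range(n_samples)
    ]

    # 2) Aggregation: put the pseudo labels back in context and ask for one SOP
    #    consistent with the majority of drafts. Steps that recur across drafts
    #    survive; outliers are dropped (an implicit consistency regularisation).
    drafts = "\n\n".join(
        f"Draft {i + 1}:\n{sop}" for i, sop in enumerate(pseudo_labels)
    )
    aggregate_prompt = (
        f"{base_prompt}\n\nHere are {n_samples} candidate SOPs for the same "
        f"recording:\n\n{drafts}\n\n"
        "Write one final SOP that keeps the steps most drafts agree on, in the "
        "correct temporal order, and discards steps that appear in only one draft."
    )
    return query_video_lm(video_path, aggregate_prompt, 0.0)
```

Sampling with non-zero temperature supplies the exploration, and the second, deterministic call plays the role of the aggregator over the pseudo labels.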
Training a Vision Language Model as Smartphone Assistant
Dorka, Nicolai, Marecki, Janusz, Anwar, Ammar
As mobile devices continue to evolve, there is an increasing demand for intuitive and efficient methods of interaction; traditionally, users operate their devices through a series of taps and gestures on the screen. Addressing the challenge of a digital assistant capable of executing a wide array of user tasks, our research focuses on instruction-based mobile device control. We leverage recent advancements in large language models (LLMs) and present a visual language model (VLM) that can fulfill diverse tasks on mobile devices. It uses the visual input from the device screen and mimics human-like interactions, encompassing gestures such as tapping and swiping. This generality in the input and output space allows our agent to interact with any application on the device. Unlike previous methods, our model operates not only on a single screen image but on vision-language sentences created from sequences of past screenshots along with the corresponding actions. Evaluating our method on the challenging Android in the Wild benchmark demonstrates its promising efficacy and potential.
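A minimal sketch of the history-conditioned interaction described in this abstract. The action grammar ("TAP x y", "SWIPE x1 y1 x2 y2"), the `<image N>` placeholder convention, and the helper names are illustrative assumptions, not the paper's exact interface; the point is that the model is prompted with a sequence of past screenshots and actions rather than a single frame.

```python
# Sketch of building a vision-language input from an episode history and
# decoding the model's textual output into a device gesture.
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class EpisodeHistory:
    screenshots: List[bytes] = field(default_factory=list)  # raw screen images, oldest first
    actions: List[str] = field(default_factory=list)        # textual actions taken so far

    def to_prompt(self, instruction: str) -> Tuple[List[bytes], str]:
        """Interleave past screenshots and actions into one vision-language sentence."""
        lines = [f"Task: {instruction}"]
        for step, action in enumerate(self.actions):
            lines.append(f"<image {step}> -> {action}")
        # The final screenshot has no action yet; the model completes it.
        lines.append(f"<image {len(self.actions)}> -> ")
        return self.screenshots, "\n".join(lines)

def parse_action(text: str) -> Dict[str, object]:
    """Parse 'TAP x y' or 'SWIPE x1 y1 x2 y2' into a gesture dictionary."""
    parts = text.strip().split()
    if parts[0] == "TAP":
        return {"gesture": "tap", "point": (int(parts[1]), int(parts[2]))}
    if parts[0] == "SWIPE":
        return {
            "gesture": "swipe",
            "start": (int(parts[1]), int(parts[2])),
            "end": (int(parts[3]), int(parts[4])),
        }
    raise ValueError(f"Unrecognised action: {text}")
```

Carrying the full screenshot-and-action history in the prompt lets each new decision stay consistent with earlier steps, which is what distinguishes this setup from single-screenshot agents.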