WAFFLE: Multi-Modal Model for Automated Front-End Development
Liang, Shanchao, Jiang, Nan, Qian, Shangshu, Tan, Lin
Web development involves turning UI designs into functional webpages, which can be difficult for both beginners and experienced developers due to the complexity of HTML's hierarchical structures and styles. While Large Language Models (LLMs) have shown promise in generating source code, two major challenges persist in UI-to-HTML code generation: (1) effectively representing HTML's hierarchical structure for LLMs, and (2) bridging the gap between the visual nature of UI designs and the text-based format of HTML code. To tackle these challenges, we introduce Waffle, a new fine-tuning strategy that uses a structure-aware attention mechanism to improve LLMs' understanding of HTML's structure and a contrastive fine-tuning approach to align LLMs' understanding of UI images and HTML code. Models fine-tuned with Waffle show up to 9.00 pp (percentage point) higher HTML match, 0.0982 higher CW-SSIM, 32.99 higher CLIP, and 27.12 pp higher LLEM on our new benchmark WebSight-Test and an existing benchmark Design2Code, outperforming current fine-tuning methods.
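The contrastive fine-tuning idea above can be illustrated with a symmetric InfoNCE-style loss that pulls each UI-image embedding toward the embedding of its matching HTML code and pushes it away from the other pairs in the batch. This is a minimal numpy sketch of that general technique, not WAFFLE's actual implementation; the function name and temperature value are illustrative.

```python
import numpy as np

def info_nce_loss(img_emb, html_emb, temperature=0.07):
    """Symmetric InfoNCE loss aligning UI-image and HTML embeddings.

    img_emb, html_emb: (batch, dim) arrays; row i of each is a matched pair.
    """
    # L2-normalize so dot products are cosine similarities
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    html = html_emb / np.linalg.norm(html_emb, axis=1, keepdims=True)
    logits = img @ html.T / temperature      # (batch, batch) similarity matrix
    labels = np.arange(len(logits))          # matched pair sits on the diagonal

    def xent(l):
        # cross-entropy with the diagonal as the target class
        l = l - l.max(axis=1, keepdims=True)             # numerical stability
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -logp[labels, labels].mean()

    # average of image->HTML and HTML->image directions
    return (xent(logits) + xent(logits.T)) / 2
```

Minimizing this loss drives matched image/HTML pairs to have higher similarity than mismatched ones, which is the alignment objective the abstract describes.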
PeriGuru: A Peripheral Robotic Mobile App Operation Assistant based on GUI Image Understanding and Prompting with LLM
Fu, Kelin, Tian, Yang, Bian, Kaigui
Smartphones have significantly enhanced our daily learning, communication, and entertainment, becoming an essential component of modern life. However, certain populations, including the elderly and individuals with disabilities, encounter challenges in using smartphones, necessitating mobile app operation assistants, a.k.a. mobile app agents. With considerations for privacy, permissions, and cross-platform compatibility issues, we devise and develop PeriGuru in this work, a peripheral robotic mobile app operation assistant based on GUI image understanding and prompting with a Large Language Model (LLM). PeriGuru leverages a suite of computer vision techniques to analyze GUI screenshot images and employs an LLM to inform action decisions, which are then executed by robotic arms. PeriGuru achieves a success rate of 81.94% on the test task set, more than double that of the same method without PeriGuru's GUI image interpretation and prompting design. Our code is available at https://github.com/Z2sJ4t/PeriGuru.
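The screenshot-to-action pipeline the abstract describes can be sketched in three steps: detect labeled elements on the screen, serialize them into an LLM prompt, and map the LLM's choice back to tap coordinates for the robotic arm. The class and function names below are hypothetical, and the element detection itself (OCR plus vision) is assumed to have already run.

```python
from dataclasses import dataclass

@dataclass
class UIElement:
    label: str   # text recognized on/near the element (e.g. by OCR)
    x: int       # element center on screen, in pixels
    y: int

def build_prompt(task, elements):
    """Serialize detected GUI elements into a numbered list for the LLM."""
    lines = [f"Task: {task}", "Visible elements:"]
    for i, e in enumerate(elements):
        lines.append(f"  [{i}] {e.label} at ({e.x}, {e.y})")
    lines.append("Reply with the index of the element to tap.")
    return "\n".join(lines)

def decide_tap(llm_reply, elements):
    """Map the LLM's chosen index back to screen coordinates for the arm."""
    idx = int(llm_reply.strip())
    e = elements[idx]
    return (e.x, e.y)
```

A physical tap at the returned coordinates is what distinguishes a peripheral assistant like this from a software-only agent: no root access or accessibility permissions on the phone are required.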
Model Of Information System Towards Harmonized Industry And Computer Science
Faith, Edafetanure-Ibeh, Tamarauefiye, Evah Patrick, Uyi, Mark Uwuoruya
The aim of attending an educational institution is learning, which is sought for independence of thought and ideology as well as physical and material independence. This physical and material independence comes from working in industry, that is, from joining the country's independent working population. There needs to be a way for students, upon graduation, to adapt easily to the real world with the necessary skills and knowledge. This has been a challenge in some computer science departments, whose after-effects become apparent only once the student begins to work in industry. The objectives of this project are to design, develop, and evaluate a web-based chat application connecting industry and the computer science department. The waterfall system development life cycle is used to establish the project plan, because it gives an overall list of the processes and sub-processes required to develop a system. The descriptive research method applied in this project is documentary analysis of previous articles. The result of the project is the design, development, and evaluation of a web-based chat application that aids communication between industry and the computer science department. The application is able to store this information for later use. Future work includes raising awareness of the software among companies and universities, implementing the industry's suggestions in the computer science curriculum, and using this software in universities across Nigeria and in fields of study beyond computer science.
Synapse: Trajectory-as-Exemplar Prompting with Memory for Computer Control
Zheng, Longtao, Wang, Rundong, Wang, Xinrun, An, Bo
Building agents with large language models (LLMs) for computer control is a burgeoning research area, where the agent receives computer states and performs actions to complete complex tasks. Previous computer agents have demonstrated the benefits of in-context learning (ICL); however, their performance is hindered by several issues. First, the limited context length of LLMs and complex computer states restrict the number of exemplars, as a single webpage can consume the entire context. Second, the exemplars in current methods, such as high-level plans and multi-choice questions, cannot represent complete trajectories, leading to suboptimal performance in long-horizon tasks. Third, existing computer agents rely on task-specific exemplars and overlook the similarity among tasks, resulting in poor generalization to novel tasks. To address these challenges, we introduce Synapse, a computer agent featuring three key components: i) state abstraction, which filters out task-irrelevant information from raw states, allowing more exemplars within the limited context, ii) trajectory-as-exemplar prompting, which prompts the LLM with complete trajectories of the abstracted states and actions to improve multi-step decision-making, and iii) exemplar memory, which stores the embeddings of exemplars and retrieves them via similarity search for generalization to novel tasks. We evaluate Synapse on MiniWoB++, a standard task suite, and Mind2Web, a real-world website benchmark. In MiniWoB++, Synapse achieves a 99.2% average success rate (a 10% relative improvement) across 64 tasks using demonstrations from only 48 tasks. Notably, Synapse is the first ICL method to solve the book-flight task in MiniWoB++. Synapse also exhibits a 56% relative improvement in average step success rate over the previous state-of-the-art prompting scheme in Mind2Web.
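The exemplar-memory component above stores embeddings of past trajectories and retrieves the most similar ones for a new task. A minimal sketch of that retrieval step, using cosine similarity over stored vectors (class and method names are illustrative, and real systems would use a learned embedding model and an approximate-nearest-neighbor index):

```python
import numpy as np

class ExemplarMemory:
    """Toy exemplar memory: stores trajectory embeddings, retrieves by cosine similarity."""
    def __init__(self):
        self.embeddings = []     # one vector per stored exemplar
        self.trajectories = []   # the payloads (abstracted state/action sequences)

    def add(self, embedding, trajectory):
        v = np.asarray(embedding, dtype=float)
        self.embeddings.append(v / np.linalg.norm(v))   # store unit vectors
        self.trajectories.append(trajectory)

    def retrieve(self, query_embedding, k=1):
        q = np.asarray(query_embedding, dtype=float)
        q = q / np.linalg.norm(q)
        sims = np.stack(self.embeddings) @ q            # cosine similarities
        top = np.argsort(-sims)[:k]                     # indices of the k best matches
        return [self.trajectories[i] for i in top]
```

Retrieved trajectories are then placed in the LLM's context as complete state/action exemplars, which is what lets the agent generalize to tasks it has no task-specific demonstrations for.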
Multimodal Web Navigation with Instruction-Finetuned Foundation Models
Furuta, Hiroki, Lee, Kuang-Huei, Nachum, Ofir, Matsuo, Yutaka, Faust, Aleksandra, Gu, Shixiang Shane, Gur, Izzeddin
The progress of autonomous web navigation has been hindered by the dependence on billions of exploratory interactions via online reinforcement learning, and domain-specific model designs that make it difficult to leverage generalization from rich out-of-domain data. In this work, we study data-driven offline training for web agents with vision-language foundation models. We propose an instruction-following multimodal agent, WebGUM, that observes both webpage screenshots and HTML pages and outputs web navigation actions, such as click and type. WebGUM is trained by jointly finetuning an instruction-finetuned language model and a vision encoder with temporal and local perception on a large corpus of demonstrations. We empirically demonstrate this recipe improves the agent's ability of grounded multimodal perception, HTML comprehension, and multi-step reasoning, outperforming prior works by a significant margin. On the MiniWoB, we improve over the previous best offline methods by more than 45.8%, even outperforming online-finetuned SoTA, humans, and GPT-4-based agent. On the WebShop benchmark, our 3-billion-parameter model achieves superior performance to the existing SoTA, PaLM-540B. Furthermore, WebGUM exhibits strong positive transfer to the real-world planning tasks on the Mind2Web. We also collect 347K high-quality demonstrations using our trained models, 38 times larger than prior work, and make them available to promote future research in this direction.

Web navigation is a class of sequential decision making problems where agents interact with web interfaces following user instructions (Shi et al., 2017; Liu et al., 2018; Gur et al., 2019). Common web navigation tasks include, for example, form filling (Diaz et al., 2013), information retrieval (Nogueira & Cho, 2016; Adolphs et al., 2022), or sending emails via a sequence of interactions with computer interface such as click or type (Figure 1).
Recently, there has been a growing interest in developing agents to automate these actions and free humans from repetitive interactions (Mazumder & Riva, 2020; Li et al., 2020; Shvo et al., 2021). Most prior works studied web navigation problems as online RL to learn the optimal action distribution with task-specific models from scratch (Liu et al., 2018; Gur et al., 2019; Jia et al., 2019; Humphreys et al., 2022).
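Agents like WebGUM emit actions such as click and type as text, which the environment must parse before executing. A small sketch of such a parser for a hypothetical action grammar (`click [element]`, `type [element] "value"`); the exact format WebGUM uses may differ:

```python
import re

# Hypothetical text-action grammar: 'click [submit-btn]' or 'type [q] "cheap flights"'.
ACTION_RE = re.compile(r'^(click|type)\s+\[([^\]]+)\](?:\s+"([^"]*)")?$')

def parse_action(text):
    """Parse one text action into a structured dict, rejecting malformed output."""
    m = ACTION_RE.match(text.strip())
    if not m:
        raise ValueError(f"unrecognized action: {text!r}")
    verb, element, value = m.groups()
    if verb == "type" and value is None:
        raise ValueError("type action needs a quoted value")
    return {"verb": verb, "element": element, "value": value}
```

Validating model output at this boundary matters in practice: a language model occasionally produces free-form text, and the environment needs to fail loudly rather than execute a garbled command.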
Mobile-Env: An Evaluation Platform and Benchmark for Interactive Agents in LLM Era
Zhang, Danyang, Chen, Lu, Zhao, Zihan, Cao, Ruisheng, Yu, Kai
Diverse evaluation benchmarks play a crucial role in assessing a wide range of capabilities of large language models (LLMs). Although plenty of effort has been dedicated to building valuable benchmarks, there is still little work on evaluating LLMs in multistep interactive environments. Noting that an LLM requires a text representation of the environment's observations to interact with it, we fill this gap by building a novel benchmark based on the information user interface (InfoUI). An InfoUI consists of rich text content and can be represented in several text formats, making it suitable for assessing the interaction ability of LLMs. Additionally, the complex structure of an InfoUI further challenges an LLM to understand structured text rather than plain text. An interaction platform is needed to evaluate an agent, yet there is still no satisfactory interaction platform dedicated to the InfoUI. Consequently, we build a novel, easily extendable, adaptable, and close-to-reality interaction platform, Mobile-Env, as a base for an appropriate benchmark. On top of Mobile-Env, an InfoUI task set, WikiHow, is then built to establish a benchmark for the multistep interaction capability of LLMs in structured text-based environments. Agents based on a series of LLMs are tested on the task set to gain insight into the potential and challenges of LLMs for InfoUI interaction. We sincerely welcome the community to contribute new environments and new task sets for Mobile-Env to provide better test benchmarks and facilitate the development of the corresponding domains.
Enabling Conversational Interaction with Mobile UI using Large Language Models
Wang, Bryan, Li, Gang, Li, Yang
Conversational agents show the promise to allow users to interact with mobile devices using language. However, to perform diverse UI tasks with natural language, developers typically need to create separate datasets and models for each specific task, which is expensive and effort-consuming. Recently, pre-trained large language models (LLMs) have been shown capable of generalizing to various downstream tasks when prompted with a handful of examples from the target task. This paper investigates the feasibility of enabling versatile conversational interactions with mobile UIs using a single LLM. We designed prompting techniques to adapt an LLM to mobile UIs. We experimented with four important modeling tasks that address various scenarios in conversational interaction. Our method achieved competitive performance on these challenging tasks without requiring dedicated datasets and training, offering a lightweight and generalizable approach to enable language-based mobile interaction.
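Adapting an LLM to a mobile UI hinges on turning the screen's view hierarchy into text the model can read. A minimal sketch of that serialization step, flattening a (hypothetical) UI tree into indented lines for a prompt; real view hierarchies carry many more attributes (bounds, clickability, resource IDs) that a production serializer would include:

```python
def serialize_ui(node, depth=0):
    """Flatten a mobile UI tree (nested dicts) into indented text lines."""
    kind = node.get("class", "view")
    text = node.get("text", "")
    line = "  " * depth + f"<{kind}>" + (f" {text}" if text else "")
    lines = [line]
    for child in node.get("children", []):
        lines.extend(serialize_ui(child, depth + 1))  # recurse, one indent deeper
    return lines

# Toy screen: a button and a text input under one root.
screen = {
    "class": "screen",
    "children": [
        {"class": "button", "text": "Send"},
        {"class": "input", "text": "Message"},
    ],
}
```

The resulting lines, plus a handful of few-shot examples, form the prompt; this is what lets a single general-purpose LLM handle multiple UI tasks without task-specific datasets or training.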