AITopics | desktop application

desktop application

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Grounding Computer Use Agents on Human Demonstrations

Feizi, Aarash, Nayak, Shravan, Jian, Xiangru, Lin, Kevin Qinghong, Li, Kaixin, Awal, Rabiul, Lù, Xing Han, Obando-Ceron, Johan, Rodriguez, Juan A., Chapados, Nicolas, Vazquez, David, Romero-Soriano, Adriana, Rabbany, Reihaneh, Taslakian, Perouz, Pal, Christopher, Gella, Spandana, Rajeswar, Sai

arXiv.org Artificial Intelligence, Nov-11-2025

Building reliable computer-use agents requires grounding: accurately connecting natural language instructions to the correct on-screen elements. While large datasets exist for web and mobile interactions, high-quality resources for desktop environments are limited. We introduce CUA, a large-scale desktop grounding dataset built from expert human demonstrations. It covers 87 applications across 12 categories and includes 56K screenshots, with every on-screen element carefully annotated for a total of over 3.56M human-verified annotations. From these demonstrations, we generate diverse instructions that capture a wide range of real-world tasks, providing high-quality data for model training. These results demonstrate the critical role of high-quality, expert-driven datasets in advancing general-purpose computer-use agents.

The vision of computer-use agents (CUA) that operate software on behalf of users has gained significant momentum with recent progress in multimodal large language model-based agents (OpenAI, 2025; Anthropic, 2024a; Qin et al., 2025; Wang et al., 2025a). These agents promise to automate routine work and make complex digital tools more accessible. For such agents to succeed, they must first plan the next step in a task, then ground the plan to the exact on-screen element to click, type, or drag. Accurate grounding is critical: without correctly identifying the right button or menu item, even a flawless plan cannot be executed. In FreeCAD, for instance, when asked to "open the color picker" (Figure 1), the agent must distinguish a small palette icon from look-alike tools, one of which it must precisely click. When grounding fails, the plan quickly veers off course, minor errors compound, and tasks ultimately fail (Nayak et al., 2025). Moreover, grounding in desktop applications is challenging due to their complexity and diversity. These applications often feature high-resolution displays with dense layouts and visually similar elements, making precise localization difficult.
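To make the grounding step concrete, here is a minimal Python sketch of one common way to score it: a predicted click counts as correct when it lands inside the annotated bounding box of the target element. The `Element` schema, the pixel coordinates, and the "point inside box" criterion are all assumptions for illustration; the abstract does not specify the dataset's annotation format or the paper's metric.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    """One annotated on-screen element (hypothetical schema, not the dataset's)."""
    label: str
    x: int       # left edge in pixels
    y: int       # top edge in pixels
    width: int
    height: int

    def contains(self, px: int, py: int) -> bool:
        # A point is inside when it falls within the half-open box.
        return (self.x <= px < self.x + self.width
                and self.y <= py < self.y + self.height)


def is_grounded(target: Element, click: tuple) -> bool:
    """True if the predicted click point lands inside the target element."""
    px, py = click
    return target.contains(px, py)


def grounding_accuracy(examples: list) -> float:
    """Fraction of (target, click) pairs whose click hits the target."""
    if not examples:
        return 0.0
    hits = sum(is_grounded(target, click) for target, click in examples)
    return hits / len(examples)


# Two small, visually adjacent toolbar icons (made-up coordinates).
palette = Element("color picker", x=400, y=12, width=16, height=16)
lookalike = Element("fill tool", x=420, y=12, width=16, height=16)

# The first click hits the palette; the second drifts onto the look-alike.
examples = [(palette, (407, 20)), (palette, (425, 20))]
accuracy = grounding_accuracy(examples)
```

With 16-pixel icons placed 4 pixels apart, a click that is off by less than 20 pixels already selects the wrong tool, which illustrates why dense desktop layouts make precise localization hard.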

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2511.07332

Country:

Asia (0.46)
North America (0.28)

Genre: Research Report > New Finding (0.66)

Industry:

Media (0.74)
Information Technology (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

An Embedded Intelligent System for Attendance Monitoring

Abderraouf, Touzene, Wassim, Abed Abdeljalil, Larabi, Slimane

arXiv.org Artificial Intelligence, Jun-19-2024

In this paper, we propose an intelligent embedded system for monitoring class attendance and sending the attendance list to a remote computer. The proposed system consists of two parts: an embedded device (a Raspberry Pi with a Pi camera) for facial recognition and a web application for attendance management. The proposed solution takes into account several challenges: the limited resources of the Raspberry Pi, the need to adapt the facial recognition model, and the need to achieve acceptable performance using images provided by the Raspberry Pi camera.

application, face detection, student, (14 more...)

arXiv.org Artificial Intelligence

2406.13694

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
Europe > Middle East > Republic of Türkiye > Istanbul Province > Istanbul (0.04)
Asia > Middle East > Republic of Türkiye > Istanbul Province > Istanbul (0.04)
(5 more...)

Genre: Research Report (0.50)

Industry:

Information Technology > Security & Privacy (0.46)
Education > Educational Setting (0.46)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision > Face Recognition (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

OmniACT: A Dataset and Benchmark for Enabling Multimodal Generalist Autonomous Agents for Desktop and Web

Kapoor, Raghav, Butala, Yash Parag, Russak, Melisa, Koh, Jing Yu, Kamble, Kiran, Alshikh, Waseem, Salakhutdinov, Ruslan

arXiv.org Artificial Intelligence, Feb-28-2024

For decades, human-computer interaction has fundamentally been manual. Even today, almost all productive work done on the computer necessitates human input at every step. Autonomous virtual agents represent an exciting step in automating many of these menial tasks. Virtual agents would empower users with limited technical proficiency to harness the full possibilities of computer systems. They could also enable the efficient streamlining of numerous computer tasks, ranging from calendar management to complex travel bookings, with minimal human intervention. In this paper, we introduce OmniACT, a first-of-its-kind dataset and benchmark for assessing an agent's capability to generate executable programs to accomplish computer tasks. Our scope extends beyond traditional web automation, covering a diverse range of desktop applications. The dataset consists of fundamental tasks such as "Play the next song", as well as longer horizon tasks such as "Send an email to John Doe mentioning the time and place to meet". Specifically, given a screen image paired with a visually grounded natural language task, the goal is to generate a script capable of fully executing the task. We run several strong baseline language model agents on our benchmark. The strongest baseline, GPT-4, performs the best on our benchmark. However, its performance level still reaches only 15% of the human proficiency in generating executable scripts capable of completing the task, demonstrating the challenge of our task for conventional web agents. Our benchmark provides a platform to measure and evaluate the progress of language model agents in automating computer tasks and motivates future work towards building multimodal models that bridge large language models and the visual grounding of computer screens.
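As a rough illustration of what "generating a script that executes the task" could look like, the Python sketch below dry-runs a short generated script against a recording stand-in instead of a real GUI. It assumes the scripts consist of PyAutoGUI-style calls (the entry's tags mention pyautogui; `click`, `write`, and `press` are real PyAutoGUI functions), but the script contents, the coordinates, the `RecordingGUI` class, and the `dry_run` helper are hypothetical and not taken from the benchmark.

```python
class RecordingGUI:
    """Stand-in for a GUI automation module: records actions, performs none."""

    def __init__(self):
        self.actions = []

    def click(self, x, y):
        self.actions.append(("click", x, y))

    def write(self, text):
        self.actions.append(("write", text))

    def press(self, key):
        self.actions.append(("press", key))


def dry_run(script: str) -> list:
    """Execute a generated script against the recorder and return its actions.

    Note: exec with empty builtins is NOT a security sandbox; this is only
    meant for inspecting trusted example scripts.
    """
    gui = RecordingGUI()
    exec(script, {"__builtins__": {}, "pyautogui": gui})
    return gui.actions


# Hypothetical model output for a task like "send a message with the meeting time".
generated_script = """
pyautogui.click(640, 360)
pyautogui.write("Meet at 3 pm in the lobby")
pyautogui.press("enter")
"""

actions = dry_run(generated_script)
```

Recording the action sequence rather than executing it makes it possible to compare a generated script with a reference sequence without touching a live desktop; how the benchmark itself scores scripts is not described in this abstract.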

agent, application, pyautogui, (15 more...)

arXiv.org Artificial Intelligence

2402.17553

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > Washington > King County > Seattle (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)

Genre: Research Report (1.00)

Industry:

Information Technology (0.67)
Education (0.46)
Media > Music (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

ChatGPT for Windows - Desktop Application

#artificialintelligence, Feb-15-2023, 10:27:58 GMT

Use ChatGPT directly on the Windows desktop instead of in the browser: "ChatGPT Desktop Application" makes it possible. ChatGPT is currently the hottest text AI tool, and it is usually completely free to use on the web. Instead of running the AI text generator in the browser, the desktop application available here brings ChatGPT directly to the desktop. As a result, you no longer have to keep ChatGPT open in its own tab to interact with the chatbot. Yes, ChatGPT can easily create longer texts.

chatgpt, chatgpt desktop application, desktop application, (2 more...)

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Top 10 Programming Languages Recruiters are Looking For in 2022

#artificialintelligence, Jan-26-2022, 09:33:49 GMT

Post-pandemic, AI has become a top agenda item for businesses because it offers enhanced customer experience, resilience, and reliability. With advances in machine learning, data analytics, and conversational AI, companies are finding it feasible and affordable to deploy AI tools that help them solve problems and increase efficiency. Here are the 10 most popular programming languages among job seekers. Python can be regarded as the future of programming languages. As per the latest statistics, Python is the main coding language for around 80% of developers.

application, programming language, programming language recruiter, (11 more...)

#artificialintelligence

Technology:

Information Technology > Software > Programming Languages (1.00)
Information Technology > Artificial Intelligence (1.00)

How to Detect Rotten Fruits Using Image Processing Python?

#artificialintelligence, Dec-6-2021, 15:20:15 GMT

Freshness is one of the characteristics consumers value most. When it comes to hygiene, consumers prefer fresh fruit to rotten fruit. An efficient fruit detection system is therefore needed to assist people. To that end, a desktop application named "Detection of Rotten Fruits (DRF)" is proposed, built with Artificial Intelligence and Computer Vision. DRF detects rottenness in fruits and can be used to label fruits according to their rottenness.

accuracy, neural network, rottenness, (14 more...)

#artificialintelligence

Country: Asia > Pakistan (0.05)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.33)

Top 10 Open Source and Free RPA Tools of 2020

#artificialintelligence, Dec-13-2020, 12:01:54 GMT

As with much software, there is a build-or-buy choice when getting started with Robotic Process Automation (RPA). Gartner recently called RPA the fastest-growing enterprise software segment of 2018, with 63% growth in worldwide revenue. It is a serious market, and you have alternatives. Commercial RPA vendors have commonly tried to prioritize ease of use, hoping to enable non-developers to build and deploy bots without a large amount of technical overhead. Some commercial vendors offer a "freemium" product to tempt prospective customers to kick the tires on their platforms. There are various RPA tools available on the market, and picking one can be a challenge.

application, platform, source and free rpa tool, (10 more...)

#artificialintelligence

Country: Asia > Singapore (0.05)

Industry: Information Technology (0.30)

Technology: Information Technology > Artificial Intelligence > Robots (0.76)

mysam

#artificialintelligence, Feb-19-2018, 12:40:49 GMT

Sam is an open-source, web-based "intelligent" assistant. It can listen to you, learn new actions, and is extensible with JavaScript plugins. Sam runs as a Node.js server and can be used in any modern browser or as an Electron desktop application. At first startup, Sam will load the basic frontend training data (such as learning your name, providing help, saying hi, or learning something new) and ask for your name. To talk to Sam, press CTRL + SPACE (make sure the window is focused).

application, artificial intelligence, plugin, (7 more...)

#artificialintelligence

Technology:

Information Technology > Software (0.58)
Information Technology > Artificial Intelligence (0.58)

Fake celebrity porn is all over Reddit thanks to a new app

Daily Mail - Science & tech, Jan-25-2018, 22:28:03 GMT

Back in December, it was discovered that Reddit users were creating fake pornography using celebrity faces pasted on to adult film actresses' bodies. The disturbing videos, created by Reddit user deepfakes, look strikingly real as a result of a sophisticated machine learning algorithm, which uses photographs to create human masks that are then overlaid on top of adult film footage. Now, AI-assisted porn is spreading all over Reddit, thanks to an easy-to-use app that can be downloaded directly to your desktop computer, according to Motherboard. Star Wars lead Daisy Ridley has been featured in a fake video on the Reddit thread. One of the site's users, deepfakeapp, created a desktop application called FakeApp that lets users take adult film footage and swap any female celebrity's face onto porn actresses' bodies. The app uses deepfakes' algorithm, but doesn't require any knowledge of coding.

artificial intelligence, machine learning, social media, (14 more...)

Daily Mail - Science & tech

Industry:

Media > News (1.00)
Media > Film (1.00)
Leisure & Entertainment (1.00)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.96)

Finding Image Regions with Human Computation and Games with a Purpose

Lux, Mathias (Klagenfurt University) | Müller, Alexander (Klagenfurt University) | Guggenberger, Mario (Klagenfurt University)

AAAI Conferences, Oct-7-2012

Manual image annotation is a tedious and time-consuming task, while automated methods are error-prone and limited in their results. Human computation, and especially games with a purpose, have shown potential to create high-quality annotations by "hiding the complexity" of the actual annotation task and employing the "wisdom of the crowds". In this demo paper, we present two games with a single purpose: finding regions in images that correspond to given terms. We discuss the approach, implementation, and preliminary results of our work and give an outlook on immediate future work.

artificial intelligence, human computer interaction, rpmobile, (16 more...)

AAAI Conferences

Eighth Artificial Intelligence and Interactive Digital Entertainment Conference

Country:

North America > United States > New York > New York County > New York City (0.05)
Europe > Austria (0.05)

Industry: Leisure & Entertainment > Games (0.95)

Technology:

Information Technology > Artificial Intelligence (1.00)
Information Technology > Human Computer Interaction (0.73)
