AITopics | screenshot

screenshot

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

AWS Billing Glitch Hits Customers With Billion-Dollar Fees

WIRED Jul-17-2026, 19:50:38 GMT

An error with the cloud computing giant's billing operation caused some customers' monthly bills to rise from a few cents to billions of dollars. A glitch with Amazon Web Services' billing operation led some customers to believe they owed the world's fifth most valuable company billions of dollars. Bill Radjewski, who runs CollegeFootballData.com, was one of the affected customers. This morning, he woke up to a jarring email alert from AWS: He had racked up more than $1.5 billion in usage fees, and his August 1 bill was on track to be upwards of $3 billion. "I've had this account for 6+ years and in that time my monthly spend has never exceeded $0.02," Radjewski tells WIRED.

artificial intelligence, main content security politics, wired, (8 more...)

WIRED

Country: North America > United States (0.71)

Industry:

Information Technology > Services (0.71)
Media > News (0.49)

Technology:

Information Technology > Artificial Intelligence (0.71)
Information Technology > Communications > Web (0.61)

C'mon, you don't need an AI to check your spelling

Engadget Jun-25-2026, 09:50:31 GMT

Florida Republican claimed to use Claude as a proofreader, nothing else. If there's one thing we love more than catching a politician doing something silly, it's the excuse they confect to try and get out of it. The latest involves Florida Republican Anna Paulina Luna, who was caught using AI in a draft amendment to a bill because the text included the phrase "Claude responded:", which might hint that someone pasted in a conversation with the Anthropic chatbot of the same name and forgot to hide it. Luna was quick to shut down the accusation, posting on X (as reported by Gizmodo) that her staff used AI to correct a draft text and didn't edit, adding that it's not a shocker as most staff use it.

artificial intelligence, chatbot, natural language, (9 more...)

Engadget

Industry:

Leisure & Entertainment > Games > Computer Games (0.75)
Law > Statutes (0.53)

Technology:

Information Technology > Artificial Intelligence > Applied AI (0.61)
Information Technology > Communications > Mobile (0.55)
Information Technology > Communications > Social Media (0.44)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.39)

Thinking vs. Doing: Improving Agent Reasoning by Scaling Test-Time Interaction

Neural Information Processing Systems Jun-23-2026, 03:27:10 GMT

The current paradigm of test-time scaling relies on generating long reasoning traces ("thinking" more) before producing a response. In agent problems that require interaction, this can be done by generating thinking traces before acting in the world. However, this process does not allow agents to acquire new information from the environment or adapt their behavior over time. In this work, we propose to scale test-time interaction, an untapped dimension of test-time scaling that increases the agent's interaction horizon to enable running rich behaviors such as exploration, backtracking, and dynamic re-planning within a single rollout. To demonstrate the promise of this scaling dimension, we study the domain of web agents.

large language model, machine learning, natural language, (23 more...)

Neural Information Processing Systems

Country: North America > United States (0.28)

Genre:

Research Report > Experimental Study (1.00)
Workflow (0.68)
Research Report > New Finding (0.67)

Industry:

Education > Educational Setting > Online (0.67)
Information Technology (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
(4 more...)

OPENCUA: Open Foundations for Computer-Use Agents

Neural Information Processing Systems Jun-22-2026, 19:13:06 GMT

Vision-language models have demonstrated impressive capabilities as computer-use agents (CUAs) capable of automating diverse computer tasks. As their commercial potential grows, critical details of the most capable CUA systems remain closed. As these agents will increasingly mediate digital interactions and execute consequential decisions on our behalf, the research community needs access to open CUA frameworks to study their capabilities, limitations, and risks. To bridge this gap, we propose OPENCUA, a comprehensive open-source framework for scaling CUA data and foundation models. Our framework consists of: (1) an annotation infrastructure that seamlessly captures human computer-use demonstrations; (2) AGENTNET, the first large-scale computer-use task dataset spanning 3 operating systems and 200+ applications and websites; (3) a scalable pipeline that transforms demonstrations into state-action pairs with reflective long Chain-of-Thought reasoning that sustain robust performance gains as data scales.

large language model, machine learning, natural language, (19 more...)

Neural Information Processing Systems

Genre:

Workflow (1.00)
Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Information Technology > Services (0.68)

Technology:

Information Technology > Software (1.00)
Information Technology > Information Management > Search (1.00)
Information Technology > Communications > Social Media (1.00)
(5 more...)

macOSWorld: A Multilingual Interactive Benchmark for GUI Agents

Neural Information Processing Systems Jun-22-2026, 16:18:45 GMT

Graphical User Interface (GUI) agents show promising capabilities for automating computer-use tasks and facilitating accessibility, but existing interactive benchmarks are mostly English-only, covering web-use or Windows, Linux, and Android environments, but not macOS.

large language model, machine learning, natural language, (20 more...)

Neural Information Processing Systems

Genre:

Workflow (1.00)
Research Report > Experimental Study (1.00)
Research Report > New Finding (0.67)

Industry: Information Technology > Security & Privacy (0.46)

Technology:

Information Technology > Software (1.00)
Information Technology > Graphics (1.00)
Information Technology > Communications (1.00)
(7 more...)

People training new AI models admit they just get chatbots to do it

New Scientist Jun-22-2026, 10:57:59 GMT

The next generation of AI models are meant to be trained by people paid to have conversations with them, but several of these workers have admitted that they simply get chatbots to do it instead. People who are paid to train new AI models by supplying them with high-quality conversation and tests are cheating and using chatbots like ChatGPT to do the job instead, multiple whistleblowers have told New Scientist. The seemingly widespread practice risks undermining the future of AI, as it could lead to the "collapse" of more advanced models. Most AI models operating today were trained on text and data scraped from the internet. But as models have scaled up, requiring yet more training data, AI firms have begun using workers who carry out conversations and tests with AI, in the hope that the resulting high-quality data can improve the power and usefulness of future large language models (LLMs). These workers are normally employed by third parties, rather than AI companies directly, and are often working without full-time contracts and for low pay.

artificial intelligence, large language model, natural language, (18 more...)

New Scientist

Industry:

Marketing (0.43)
Law (0.30)
Health & Medicine > Therapeutic Area (0.30)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.93)

GUI-Reflection: Empowering Multimodal GUI Models with Self-Reflection Behavior

Penghao Wu, Shengnan Ma, Bo Wang, Jiaheng Yu, Lewei Lu, Ziwei Liu

Neural Information Processing Systems Jun-19-2026, 21:27:51 GMT

Multimodal Large Language Models (MLLMs) have shown great potential in revolutionizing Graphical User Interface (GUI) automation. GUI models mostly rely on learning from nearly error …

large language model, machine learning, natural language, (17 more...)

Neural Information Processing Systems

Genre:

Workflow (1.00)
Research Report > Experimental Study (0.46)

Industry: Education (0.94)

Technology:

Information Technology > Graphics (1.00)
Information Technology > Communications > Mobile (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(2 more...)

BTL-UI: Blink-Think-Link Reasoning Model for GUI Agent

Neural Information Processing Systems Jun-17-2026, 06:23:31 GMT

In the field of AI-driven human-GUI interaction automation, while rapid advances in multimodal large language models and reinforcement fine-tuning techniques have yielded remarkable progress, a fundamental challenge persists: their interaction logic significantly deviates from natural human-GUI communication patterns. To address this gap, we propose Blink-Think-Link (BTL), a brain-inspired framework for human-GUI interaction that mimics the human cognitive process between users and graphical interfaces. The system decomposes interactions into three biologically plausible phases: (1) Blink - rapid detection and attention to relevant screen areas, analogous to saccadic eye movements; (2) Think - higher-level reasoning and decision-making, mirroring cognitive planning; and (3) Link - generation of executable commands for precise motor control, emulating human action selection mechanisms. Additionally, we introduce two key technical innovations for BTL framework: (1) Blink Data Generation - an automated annotation pipeline specifically optimized for blink data, and (2) BTLReward - the first rule-based reward mechanism that enables reinforcement learning driven by both process and outcome. Building upon this framework, we develop a GUI agent model named BTL-UI, which demonstrates competitive performance across both static GUI understanding and dynamic interaction tasks in comprehensive benchmarks. These results provide conclusive empirical validation of the framework's efficacy in developing advanced GUI agents.

arxiv preprint arxiv, large language model, machine learning, (19 more...)

Neural Information Processing Systems

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Technology:

Information Technology > Human Computer Interaction > Interfaces (1.00)
Information Technology > Graphics (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(4 more...)

Scaling Computer-Use Grounding via User Interface Decomposition and Synthesis

Neural Information Processing Systems Jun-15-2026, 15:33:24 GMT

Graphical user interface (GUI) grounding, the ability to map natural language instructions to specific actions on graphical user interfaces, remains a critical bottleneck in computer use agent development. Current benchmarks oversimplify grounding tasks as short referring expressions, failing to capture the complexity of real-world interactions that require software commonsense, layout understanding, and fine-grained manipulation capabilities. To address these limitations, we introduce OSWORLD-G, a comprehensive benchmark comprising 564 finely annotated samples across diverse task types including text matching, element recognition, layout understanding, and precise manipulation. Additionally, we synthesize and release the largest computer use grounding dataset JEDI, which contains 4 million examples through multi-perspective decoupling of tasks. Our multi-scale models trained on JEDI demonstrate its effectiveness by outperforming existing approaches on ScreenSpot-v2, ScreenSpot-Pro, and our OSWORLD-G. Furthermore, we demonstrate that improved grounding with JEDI directly enhances agentic capabilities of general foundation models on complex computer tasks with state-of-the-art performance, improving from 23% to 51% on OSWorld. Through detailed ablation studies, we identify key factors contributing to grounding performance and verify that combining specialized data for different interface elements enables compositional generalization to novel interfaces.

large language model, machine learning, natural language, (20 more...)

Neural Information Processing Systems

Genre:

Research Report > Experimental Study (1.00)
Workflow (0.93)
Research Report > New Finding (0.92)

Industry:

Media (1.00)
Leisure & Entertainment (1.00)
Information Technology > Software (1.00)
(7 more...)

Technology:

Information Technology > Human Computer Interaction > Interfaces (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
(3 more...)

MIP against Agent: Malicious Image Patches Hijacking Multimodal OS Agents

Neural Information Processing Systems Jun-15-2026, 08:25:41 GMT

Large language models (LLMs) and vision-language models (VLMs) have demonstrated remarkable capabilities, driving significant advancements across a wide range of applications. These models are typically fine-tuned to align with specific objectives, such as being "helpful and harmless" [39]. However, recent work on adversarial attacks has demonstrated that carefully crafted inputs can bypass these alignment safeguards [65, 10, 4, 26, 52]. While such adversarial attacks can elicit harmful responses, the output is usually constrained to text that is not directly actionable, limiting the scope of possible harm. While malicious text outputs are concerning, it remains unclear whether the associated risks exceed those posed by information already accessible through the internet [18].

large language model, machine learning, natural language, (21 more...)

Neural Information Processing Systems

Country: