AITopics

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.64)

Neural Information Processing Systems Feb-18-2026, 06:00:58 GMT

d0718553fd6b227a353c6432cf893285-Paper-Datasets_and_Benchmarks_Track.pdf

large language model, machine learning, programming language

Country:

North America > United States > California > Santa Clara County > Palo Alto (0.05)
Oceania > Australia (0.04)
North America > Montserrat (0.04)

Genre: Research Report (1.00)

Industry:

Information Technology (0.67)
Government (0.67)
Law > Intellectual Property & Technology Law (0.46)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Information Management (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)

Neural Information Processing Systems Feb-18-2026, 04:21:54 GMT

Web2Code: A Large-scale Webpage-to-Code Dataset and Evaluation Framework for Multimodal LLMs

Yun, Sukmin

We hope our work will contribute to the development of general MLLMs suitable for web-based content generation and task automation.

large language model, machine learning, natural language

Genre: Research Report (0.46)

Industry: Information Technology > Security & Privacy (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.98)

Neural Information Processing Systems Feb-17-2026, 22:20:26 GMT

bff09ce4b210b185a265c9bcd58048bb-Paper-Conference.pdf

large language model, machine learning, natural language

Country:

Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.14)
North America > United States > California > San Diego County > San Diego (0.04)
Europe > Russia (0.04)
Asia > Russia (0.04)

Genre: Research Report > Experimental Study (1.00)

Industry:

Information Technology (0.67)
Health & Medicine > Therapeutic Area (0.46)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)

Neural Information Processing Systems Feb-12-2026, 04:12:26 GMT

5950bf290a1570ea401bf98882128160-Paper-Datasets_and_Benchmarks.pdf

large language model, machine learning, natural language

Country:

North America > United States > Ohio (0.05)
North America > United States > Texas > Travis County > Austin (0.04)
North America > United States > Washington > King County > Seattle (0.04)

Genre:

Research Report (0.68)
Workflow (0.46)

Industry:

Government (0.67)
Transportation > Passenger (0.46)
Transportation > Air (0.46)

Technology:

Information Technology > Communications > Web (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

arXiv.org Artificial Intelligence Dec-9-2025

An Index-based Approach for Efficient and Effective Web Content Extraction

Chen, Yihan, Xu, Benfeng, Wang, Xiaorui, Mao, Zhendong

As web agents (e.g., Deep Research) routinely consume massive volumes of web pages to gather and analyze information, LLM context management under large token budgets and low signal density emerges as a foundational and technically challenging problem for agentic and RAG pipelines. Existing solutions for extracting relevant content are inadequate: generative extraction models suffer from high latency, rule-based heuristics lack adaptability, and chunk-and-rerank methods are blind to webpage structure. To overcome these issues, we introduce Index-based Web Content Extraction, which reframes extraction from slow, token-by-token generation into an efficient, discriminative task of index prediction that is both effective and efficient. We partition HTML into structure-aware, addressable segments and extract only the positional indices of content relevant to a given query. This method decouples extraction latency from content length, enabling rapid, query-relevant extraction. We first evaluate our method as a post-retrieval processing component within a RAG QA system and find that it improves QA accuracy. We then directly measure its match rate with the target content in two scenarios: main content extraction (ME) and query-relevant extraction (QE). Experimental results show that our method outperforms existing methods in both accuracy and speed, effectively bridging the gap between LLMs and the vast body of web content.
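The abstract describes the pipeline only at a high level: split HTML into addressable segments, have a model predict the indices of relevant segments, then map those indices back to text. The sketch below illustrates that idea with Python's standard library. It is not the authors' implementation. The segmentation rule (one segment per block-level tag in the hypothetical `BLOCK_TAGS` set) is an assumption, and the index-prediction model is left out entirely, with a fixed string standing in for its reply. The helper names `segment_html`, `build_indexed_prompt`, `parse_indices`, and `extract_by_indices` are hypothetical.

```python
import re
from html.parser import HTMLParser

# Assumption: these block-level tags delimit segments. The abstract does not say
# how the paper forms its "structure-aware, addressable segments".
BLOCK_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6", "p", "li", "td", "th", "pre", "blockquote"}
SKIP_TAGS = {"script", "style"}  # their contents are never shown to the model


class SegmentParser(HTMLParser):
    """Splits an HTML page into (tag, text) segments, one per block element."""

    def __init__(self):
        super().__init__()
        self.segments = []  # (tag, text) pairs in document order
        self._tag = "body"  # block tag that owns the text currently buffered
        self._buffer = []
        self._skip = 0      # > 0 while inside <script> or <style>

    def flush(self):
        # Collapse whitespace; empty runs between blocks produce no segment.
        text = " ".join("".join(self._buffer).split())
        if text:
            self.segments.append((self._tag, text))
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip += 1
        elif tag in BLOCK_TAGS:
            self.flush()
            self._tag = tag

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self._skip = max(0, self._skip - 1)
        elif tag in BLOCK_TAGS:
            self.flush()
            self._tag = "body"

    def handle_data(self, data):
        if not self._skip:
            self._buffer.append(data)


def segment_html(html):
    parser = SegmentParser()
    parser.feed(html)
    parser.close()
    parser.flush()  # emit any trailing text outside a block element
    return parser.segments


def build_indexed_prompt(query, segments):
    # Each segment is addressed by its position; the model only sees indices + text.
    lines = [f"[{i}] <{tag}> {text}" for i, (tag, text) in enumerate(segments)]
    return f"Query: {query}\nAnswer with the indices of relevant segments.\n" + "\n".join(lines)


def parse_indices(model_output):
    # The model's reply is just a short list of integers such as "1, 4".
    return [int(m) for m in re.findall(r"\d+", model_output)]


def extract_by_indices(segments, indices):
    # Map indices back to text; out-of-range and repeated indices are dropped.
    seen, out = set(), []
    for i in indices:
        if 0 <= i < len(segments) and i not in seen:
            seen.add(i)
            out.append(segments[i][1])
    return out


html = """<html><head><style>p {color: red}</style></head><body>
<h1>Flight status</h1><p>Flight UA 100 departs at 9:05.</p>
<ul><li>Home</li><li>Login</li></ul><p>Gate B12, boarding 8:35.</p>
<script>track()</script></body></html>"""

segments = segment_html(html)
prompt = build_indexed_prompt("When and where does UA 100 board?", segments)
# The call to an index-prediction model is omitted; "1, 4" stands in for its reply.
print(extract_by_indices(segments, parse_indices("1, 4")))
# ['Flight UA 100 departs at 9:05.', 'Gate B12, boarding 8:35.']
```

The point the sketch makes concrete is the one the abstract claims: whatever the page length, the model only has to emit a handful of integers, and the extracted text is copied verbatim from the page rather than regenerated token by token.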

large language model, machine learning, natural language

2512.06641

Country:

North America > United States (0.28)
Asia > China (0.28)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Communications > Web (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

arXiv.org Artificial Intelligence Dec-2-2025

RealWebAssist: A Benchmark for Long-Horizon Web Assistance with Real-World Users

Ye, Suyu, Shi, Haojun, Shih, Darren, Yun, Hyokun, Roosta, Tanya, Shu, Tianmin

To achieve successful assistance with long-horizon web-based tasks, AI agents must be able to sequentially follow real-world user instructions over a long period. Unlike existing web-based agent benchmarks, which evaluate the performance of a single, clearly defined task, sequential instruction following in the real world poses significant additional challenges. For instance, real-world human instructions can be ambiguous, require different levels of AI assistance, and may evolve over time, reflecting changes in the user's mental state. To address this gap, we introduce RealWebAssist, a novel benchmark designed to evaluate sequential instruction following in realistic scenarios involving long-horizon interactions with the web, visual GUI grounding, and understanding of ambiguous real-world user instructions. RealWebAssist includes a dataset of sequential instructions collected from real-world human users. Each user instructs a web-based assistant to perform a series of tasks on multiple websites. A successful agent must reason about the true intent behind each instruction, keep track of the user's mental state, understand user-specific routines, and ground the intended tasks to actions on the correct GUI elements. Our experimental results show that state-of-the-art models struggle to understand and ground user instructions, posing critical challenges in following real-world user instructions for long-horizon web assistance.

large language model, machine learning, natural language

2504.10445

Country: North America > United States (1.00)

Genre:

Workflow (1.00)
Research Report > New Finding (1.00)

Industry:

Transportation (1.00)
Consumer Products & Services > Travel (1.00)
Information Technology (0.93)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

arXiv.org Artificial Intelligence Nov-11-2025

WebVIA: A Web-based Vision-Language Agentic Framework for Interactive and Verifiable UI-to-Code Generation

Xu, Mingde, Yang, Zhen, Hong, Wenyi, Pan, Lihang, Fan, Xinyue, Wang, Yan, Gu, Xiaotao, Xu, Bin, Tang, Jie

User interface (UI) development requires translating design mockups into functional code, a process that remains repetitive and labor-intensive. While recent Vision-Language Models (VLMs) automate UI-to-Code generation, they produce only static HTML/CSS/JavaScript layouts that lack interactivity. To address this, we propose WebVIA, the first agentic framework for interactive UI-to-Code generation and validation. The framework comprises three components: (1) an exploration agent that captures multi-state UI screenshots; (2) a UI2Code model that generates executable interactive code; and (3) a validation module that verifies the interactivity. Experiments demonstrate that WebVIA-Agent achieves more stable and accurate UI exploration than general-purpose agents (e.g., Gemini-2.5-Pro). In addition, our fine-tuned WebVIA-UI2Code models show substantial improvements in generating executable and interactive HTML/CSS/JavaScript code, outperforming their base counterparts on both interactive and static UI2Code benchmarks. Our code and models are available at https://webvia.github.io (link target: https://zheny2751-dotcom.github.io/webvia.github.io/).

large language model, machine learning, natural language

2511.06251

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Automatic Programming (0.84)

arXiv.org Artificial Intelligence Oct-28-2025

Code Aesthetics with Agentic Reward Feedback

Xiao, Bang, Jiang, Lingjie, Huang, Shaohan, Lv, Tengchao, Huang, Yupan, Wu, Xun, Cui, Lei, Wei, Furu

Large Language Models (LLMs) have become valuable assistants for developers in code-related tasks. While LLMs excel at traditional programming tasks such as code generation and bug fixing, they struggle with visually oriented coding tasks, often producing code with suboptimal aesthetics. In this paper, we introduce a new pipeline to enhance the aesthetic quality of LLM-generated code. We first construct AesCode-358K, a large-scale instruction-tuning dataset focused on code aesthetics. Next, we propose agentic reward feedback, a multi-agent system that evaluates executability, static aesthetics, and interactive aesthetics. Building on this, we develop GRPO-AR, which integrates these signals into the GRPO algorithm for joint optimization of functionality and code aesthetics. Finally, we develop OpenDesign, a benchmark for assessing code aesthetics. Experimental results show that combining supervised fine-tuning on AesCode-358K with reinforcement learning using agentic reward feedback significantly improves performance on OpenDesign and also enhances results on existing benchmarks such as PandasPlotBench. Notably, our AesCoder-4B surpasses GPT-4o and GPT-4.1 and achieves performance comparable to large open-source models with 480B-685B parameters, underscoring the effectiveness of our approach.

Figure 1: Performance comparison of different models on the OpenDesign benchmark.
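The abstract names the ingredients of GRPO-AR (three reward signals from agents, fed into GRPO) but not how the signals are combined. The sketch below shows one plausible wiring, under stated assumptions: the `AgentScores` record, the `combined_reward` gate-then-weighted-sum rule, and its weights are all hypothetical. Only the group-relative normalization in `group_relative_advantages` (subtract the group mean, divide by the group standard deviation) follows the standard GRPO advantage; using the population standard deviation is a choice made here.

```python
import statistics
from dataclasses import dataclass


@dataclass
class AgentScores:
    """Hypothetical per-sample outputs of the three reward agents."""
    executable: bool
    static_aesthetics: float       # assumed normalized to [0, 1]
    interactive_aesthetics: float  # assumed normalized to [0, 1]


def combined_reward(s, w_static=0.5, w_interactive=0.5):
    # Assumption: non-executable code earns zero reward, so aesthetic scores cannot
    # compensate for broken functionality. The paper's actual combination is not given.
    if not s.executable:
        return 0.0
    return 1.0 + w_static * s.static_aesthetics + w_interactive * s.interactive_aesthetics


def group_relative_advantages(rewards, eps=1e-6):
    # GRPO-style advantage: each completion is scored against the other completions
    # sampled for the same prompt (population std here; eps avoids division by zero).
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Four hypothetical completions sampled for one prompt.
group = [
    AgentScores(True, 0.8, 0.6),
    AgentScores(True, 0.4, 0.2),
    AgentScores(False, 0.9, 0.9),  # looks polished but does not run
    AgentScores(True, 0.6, 0.4),
]
rewards = [combined_reward(s) for s in group]
advantages = group_relative_advantages(rewards)
print([round(a, 2) for a in advantages])
```

Gating on executability is one design choice among several; a plain weighted sum without the gate would let a visually polished but broken page outrank a working plain one, which runs against the stated goal of jointly optimizing functionality and aesthetics.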

aesthetics, large language model, machine learning

2510.23272

Genre: Research Report > New Finding (0.66)

Industry: Education (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)