CSS selector
WebLists: Extracting Structured Information From Complex Interactive Websites Using Executable LLM Agents
Bohra, Arth, Saroyan, Manvel, Melkozerov, Danil, Karufanyan, Vahe, Maher, Gabriel, Weinberger, Pascal, Harutyunyan, Artem, Campagna, Giovanni
Most recent web agent research has focused on navigation and transaction tasks, with little emphasis on extracting structured data at scale. We present WebLists, a benchmark of 200 data-extraction tasks across four common business and enterprise use cases. Each task requires an agent to navigate to a webpage, configure it appropriately, and extract a complete dataset with a well-defined schema. We show that both LLMs with search capabilities and SOTA web agents struggle with these tasks, with a recall of 3% and 31%, respectively, despite higher performance on question-answering tasks. To address this challenge, we propose BardeenAgent, a novel framework that enables web agents to convert their execution into repeatable programs, and replay them at scale across pages with similar structure. BardeenAgent is also the first LLM agent to take advantage of the regular structure of HTML. In particular, BardeenAgent constructs a generalizable CSS selector to capture all relevant items on the page, then fits the operations to extract the data. On the WebLists benchmark, BardeenAgent achieves 66% recall overall, more than doubling the performance of SOTA web agents, and reducing cost per output row by 3x.
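The core idea — find one selector that captures every item on a regularly structured page, then replay the same extraction over each match — can be sketched in a few lines. This is only an illustrative toy, not BardeenAgent's actual implementation: it uses Python's stdlib `html.parser` instead of a real selector engine, and the `SelectorExtractor` class, the `li.row` selector, and the `html_doc` fragment are all invented for the example.

```python
from html.parser import HTMLParser

class SelectorExtractor(HTMLParser):
    """Collect the text of every element matching a (tag, class) pair,
    mimicking a simple CSS selector such as 'li.row'."""
    def __init__(self, tag, cls):
        super().__init__()
        self.tag, self.cls = tag, cls
        self.depth = 0   # > 0 while inside a matching element
        self.rows = []   # one extracted string per matched element

    def handle_starttag(self, tag, attrs):
        if self.depth:
            # Track nesting of the same tag inside a match.
            self.depth += 1 if tag == self.tag else 0
        elif tag == self.tag and self.cls in (dict(attrs).get("class") or "").split():
            self.depth = 1
            self.rows.append("")

    def handle_endtag(self, tag):
        if self.depth and tag == self.tag:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.rows[-1] += data.strip()

# Invented page fragment: two data rows plus one non-matching element.
html_doc = """
<ul>
  <li class="row">Acme Corp</li>
  <li class="row">Globex Inc</li>
  <li class="ad">Sponsored</li>
</ul>
"""
ex = SelectorExtractor("li", "row")
ex.feed(html_doc)
print(ex.rows)  # → ['Acme Corp', 'Globex Inc']
```

The point of the sketch is the shape of the approach: once the selector (`li.row`) generalizes over all items, the per-item extraction runs mechanically over every match, which is what makes replay across similarly structured pages cheap.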
- North America > United States > California > San Francisco County > San Francisco (0.14)
- North America > United States > California > Alameda County > Berkeley (0.14)
- Europe > Switzerland (0.04)
- Europe > Belgium (0.04)
Optimal Scraping Technique: CSS Selector, XPath, & RegEx - DataScienceCentral.com
In nearly all cases, what is required is a small sample from a very large file. Therefore, an essential part of scraping is searching through an HTML document and finding the correct information. How that should be done is a matter of some debate, shaped by preferences, experience, and the type of data. While all scraping and parsing methods are "correct", some of them have benefits that may be vital when more optimization is required, and some methods may be easier for specific types of data.
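The trade-off between structural and pattern-based extraction can be shown side by side. A minimal sketch, assuming Python's stdlib: since the stdlib has no CSS selector engine, the structural approach is shown with `ElementTree`'s limited XPath subset, and the `snippet` fragment and the `td.price` cells are invented for the example.

```python
import re
import xml.etree.ElementTree as ET

# A small, well-formed fragment standing in for a scraped page.
snippet = (
    '<table>'
    '<tr><td class="price">19.99</td></tr>'
    '<tr><td class="price">4.50</td></tr>'
    '<tr><td class="name">Widget</td></tr>'
    '</table>'
)

# Structural approach: ElementTree's XPath subset walks the parsed tree,
# so it is robust to whitespace and attribute-order changes.
root = ET.fromstring(snippet)
xpath_prices = [td.text for td in root.findall('.//td[@class="price"]')]

# Pattern approach: a regular expression over the raw markup is faster to
# write but brittle against any change in the surrounding tags.
regex_prices = re.findall(r'<td class="price">([\d.]+)</td>', snippet)

print(xpath_prices)  # → ['19.99', '4.50']
print(regex_prices)  # → ['19.99', '4.50']
```

Both calls return the same data here, which is exactly the article's point: the methods agree on well-behaved input, and the choice comes down to how much structure you can rely on and how much optimization you need.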
- Information Technology > Artificial Intelligence (0.37)
- Information Technology > Communications > Web (0.36)
- Information Technology > Software (0.32)
Multi-modal Program Inference: a Marriage of Pre-trained Language Models and Component-based Synthesis
Rahmani, Kia, Raza, Mohammad, Gulwani, Sumit, Le, Vu, Morris, Daniel, Radhakrishna, Arjun, Soares, Gustavo, Tiwari, Ashish
Multi-modal program synthesis refers to the task of synthesizing programs (code) from their specification given in different forms, such as a combination of natural language and examples. Examples provide a precise but incomplete specification, and natural language provides an ambiguous but more "complete" task description. Machine-learned pre-trained models (PTMs) are adept at handling ambiguous natural language, but struggle with generating syntactically and semantically precise code. Program synthesis techniques can generate correct code, often even from incomplete but precise specifications, such as examples, but they are unable to work with the ambiguity of natural language. We present an approach that combines PTMs with component-based synthesis (CBS): PTMs are used to generate candidate programs from the natural language description of the task, which are then used to guide the CBS procedure to find the program that matches the precise example-based specification. We use our combination approach to instantiate multi-modal synthesis systems for two programming domains: the domain of regular expressions and the domain of CSS selectors. Our evaluation demonstrates the effectiveness of our domain-agnostic approach in comparison to a state-of-the-art specialized system, and the generality of our approach in providing multi-modal program synthesis from natural language and examples in different programming domains.
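The PTM-guides-CBS loop the abstract describes can be sketched as a toy for the regex domain. Everything below is invented for illustration — the component pool, the `synthesize` search, and the hard-coded `ptm_candidates` standing in for a language model's ranked guesses — and it is far simpler than the paper's actual system.

```python
import re

# Toy component pool a CBS procedure might enumerate over.
components = [r'\d+', r'[a-z]+', r'\d{3}', r'[A-Z][a-z]+']

def consistent(pattern, positives, negatives):
    """A program satisfies the example-based spec if it fully matches
    every positive example and no negative one."""
    ok = lambda s: re.fullmatch(pattern, s) is not None
    return all(ok(p) for p in positives) and not any(ok(n) for n in negatives)

def synthesize(candidates, positives, negatives):
    """Try PTM-ranked candidates first, then fall back to enumerating
    single components and pairwise concatenations."""
    pool = list(candidates)
    pool += components
    pool += [a + b for a in components for b in components]
    for pattern in pool:
        if consistent(pattern, positives, negatives):
            return pattern
    return None

# Natural language might say "three digits then a lowercase word";
# the examples pin down the precise behavior the prose leaves ambiguous.
positives = ['123abc', '907xyz']
negatives = ['12abc', 'abc123']
ptm_candidates = [r'\d+[a-z]+']   # plausible guess, but it matches '12abc'

result = synthesize(ptm_candidates, positives, negatives)
print(result)  # → \d{3}[a-z]+
```

The PTM candidate `\d+[a-z]+` is rejected because it matches a negative example, and the component search then finds `\d{3}[a-z]+` — illustrating how the precise examples correct the ambiguous natural-language guess.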
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- North America > United States > New York > New York County > New York City (0.04)
- Asia > Taiwan > Taiwan Province > Taipei (0.04)
GoodReads: Webscraping and Text Analysis with R (Part 1)
Inspired by this article about sentiment analysis and this guide to webscraping, I have decided to get my hands dirty by scraping and analyzing a sample of reviews on the website Goodreads. The goal of this project is to demonstrate a complete example, going from data collection to machine learning analysis, and to illustrate a few of the dead ends and mistakes I encountered on my journey. We'll be looking at the reviews for five popular romance books. I have voluntarily chosen books in the same genre in order to make the review text more homogeneous a priori; these five books are popular enough that I can easily pull a few thousand reviews for each, yielding a significant corpus with minimum effort. If you don't like romance books, feel free to replicate the analysis with your genre of choice!