Dr Web: a modern, query-based web data retrieval engine
Prifti, Ylli, Provetti, Alessandro, de Meo, Pasquale
arXiv.org Artificial Intelligence
Counters generally take the form of numbers of users, pages, websites, tweets, etc. In reality, determining the memory size of the internet is a non-trivial task. It becomes more challenging still if we consider the deep web, which is usually estimated to be much larger than the visible web. Although the memory size of the internet cannot be determined exactly, the number is bound to be large and ever-growing. This amount of data presents unprecedented opportunities for data mining and information extraction from the web, as shown by the number of scientific papers and research projects based on web data. However, the web is unstructured. Previous attempts to apply a machine-readable structure [1] to the web have failed to become large-scale standards.
Apr-9-2025
- Country:
- North America > United States (0.68)
- Genre:
- Research Report (0.50)
- Industry:
- Information Technology > Services (0.69)
- Technology:
- Information Technology
- Data Science > Data Mining (1.00)
- Communications
- Web (1.00)
- Social Media (1.00)
- Artificial Intelligence > Natural Language
- Information Extraction (0.48)
- Information Retrieval > Query Processing (0.47)