TakeLab Retriever: AI-Driven Search Engine for Articles from Croatian News Outlets
Dukić, David, Petričević, Marin, Ćurković, Sven, Šnajder, Jan
arXiv.org Artificial Intelligence
TakeLab Retriever is an AI-driven search engine designed to discover, collect, and semantically analyze news articles from Croatian news outlets. It offers a unique perspective on the history and current landscape of Croatian online news media, making it an essential tool for researchers seeking to uncover trends, patterns, and correlations that general-purpose search engines cannot provide. TakeLab Retriever uses cutting-edge natural language processing (NLP) methods, which let users sift through articles by named entities, phrases, and topics in the web application. This technical report is divided into two parts: the first explains how TakeLab Retriever is used, while the second provides a detailed account of its design. In the second part, we also address the software engineering challenges involved and propose solutions for developing a microservice-based semantic search engine capable of handling over ten million news articles published over the past two decades.
Nov-29-2024
- Country:
  - North America > United States (0.04)
  - Europe
    - Ukraine (0.04)
    - Croatia > Zagreb County
      - Zagreb (0.04)
- Genre:
  - Research Report (0.84)
- Technology: