AITopics | Information Retrieval

Information Retrieval

Our accustomed systems of retrieving particular bits of information no longer meet the needs of many people. Searching traditional indexes of print publications has been aided by computerized databases, but still usually requires time-consuming serial searching of one database after the other, and then moving on to other methods of searching for internet sources. And what if the information being sought is a sound bite? A video clip? Yesterday's e-mail exchange between respected scientists? Artificial intelligence may hold the key to information retrieval in an age where widely different formats contain the information being sought, and the universe of knowledge is simply too big and growing too rapidly for successful searching to proceed at a human's slow speed.

Ahana Cloud for Presto review: Fast SQL queries against data lakes

#artificialintelligence, Apr-14-2021, 10:20:12 GMT

Hope springs eternal in the database business. While we're still hearing about data warehouses (fast analysis databases, typically featuring in-memory columnar storage) and tools that improve the ETL step (extract, transform, and load), we're also hearing about improvements in data lakes (which store data in its native format) and data federation (on-demand data integration of heterogeneous data stores). Presto keeps coming up as a fast way to perform SQL queries on big data that resides in data lake files. Presto is an open source distributed SQL query engine for running interactive analytic queries against data sources of all sizes. Presto allows querying data where it lives, including Hive, Cassandra, relational databases, and proprietary data stores.

ahana cloud, presto, query, (13 more...)

Industry: Information Technology > Services (0.48)

Technology:

Information Technology > Databases (1.00)
Information Technology > Data Science > Data Mining > Big Data (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval > Query Processing (0.50)

FRAKE: Fusional Real-time Automatic Keyword Extraction

Zehtab-Salmasi, Aidin, Feizi-Derakhshi, Mohammad-Reza, Balafar, Mohamad-Ali

arXiv.org Artificial Intelligence, Apr-10-2021

Keyword extraction is the task of identifying the words or phrases that best express the main concepts of a text. A huge amount of text is created every day, at all hours, through electronic infrastructure, so it is practically impossible for humans to study and manage this volume of documents. However, the need for efficient and effective access to these documents is evident for various purposes. Weblogs, news, and technical notes are mostly long texts, while the reader seeks to understand the concepts through topics or keywords in order to decide whether to read the full text. To this end, we use a combined approach that consists of two models: graph centrality features and textual features. Graph centralities such as degree, betweenness, eigenvector, and closeness centrality are optimally combined to select the best keywords among the candidate keywords extracted by the proposed method. A further approach is introduced for distinguishing keywords among candidate phrases and treating them as separate keywords. To evaluate the proposed method, seven datasets (SemEval2010, SemEval2017, Inspec, fao30, Thesis100, pak2018, and WikiNews) have been used, and results are reported in terms of Precision, Recall, and F-measure.
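As a rough illustration of the graph-centrality idea mentioned in the abstract, the sketch below ranks words by normalized degree centrality in a word co-occurrence graph. It is not the FRAKE method: the window size, the function names (`cooccurrence_graph`, `degree_centrality`, `top_keywords`), and the use of degree centrality alone are assumptions made for this example.

```python
from collections import defaultdict


def cooccurrence_graph(tokens, window=2):
    """Link every pair of distinct words that occur within `window` positions."""
    graph = defaultdict(set)
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + 1 + window]:
            if other != word:
                graph[word].add(other)
                graph[other].add(word)
    return graph


def degree_centrality(graph):
    """Normalized degree: neighbour count divided by (number of nodes - 1)."""
    n = len(graph)
    if n < 2:
        return {node: 0.0 for node in graph}
    return {node: len(neighbours) / (n - 1) for node, neighbours in graph.items()}


def top_keywords(tokens, k=3, window=2):
    """Return the k highest-scoring words (hypothetical helper for this sketch)."""
    scores = degree_centrality(cooccurrence_graph(tokens, window))
    # Sort by descending score, then alphabetically so ties are deterministic.
    return sorted(scores, key=lambda w: (-scores[w], w))[:k]


tokens = "graph centrality keyword graph extraction keyword graph ranking".split()
print(top_keywords(tokens, k=2))
```

A fuller system would presumably filter stop words and combine several centralities with textual features, as the abstract describes; this sketch only shows how a single centrality score can order candidate words.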

dataset, extraction, keyword, (14 more...)

2104.0483

Country:

Asia > Middle East > Iran > East Azerbaijan Province > Tabriz (0.05)
Europe > France (0.04)
Oceania > New Zealand > North Island > Waikato (0.04)
(2 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

INODE: Building an End-to-End Data Exploration System in Practice [Extended Vision]

Amer-Yahia, Sihem, Koutrika, Georgia, Bastian, Frederic, Belmpas, Theofilos, Braschler, Martin, Brunner, Ursin, Calvanese, Diego, Fabricius, Maximilian, Gkini, Orest, Kosten, Catherine, Lanti, Davide, Litke, Antonis, Lücke-Tieke, Hendrik, Massucci, Francesco Alessandro, de Farias, Tarcisio Mendes, Mosca, Alessandro, Multari, Francesco, Papadakis, Nikolaos, Papadopoulos, Dimitris, Patil, Yogendra, Personnaz, Aurélien, Rull, Guillem, Sima, Ana, Smith, Ellery, Skoutas, Dimitrios, Subramanian, Srividya, Xiao, Guohui, Stockinger, Kurt

arXiv.org Artificial Intelligence, Apr-9-2021

A full-fledged data exploration system must combine different access modalities with a powerful concept of guiding the user in the exploration process, by being reactive and anticipative both for data discovery and for data linking. Such systems are a real opportunity for our community to cater to users with different domain and data science expertise. We introduce INODE -- an end-to-end data exploration system -- that leverages, on the one hand, Machine Learning and, on the other hand, semantics for the purpose of Data Management (DM). Our vision is to develop a classic unified, comprehensive platform that provides extensive access to open datasets, and we demonstrate it in three significant use cases in the fields of Cancer Biomarker Research, Research and Innovation Policy Making, and Astrophysics. INODE offers sustainable services in (a) data modeling and linking, (b) integrated query processing using natural language, (c) guidance, and (d) data exploration through visualization, thus facilitating the user in discovering new insights. We demonstrate that our system is uniquely accessible to a wide range of users from larger scientific communities to the public. Finally, we briefly illustrate how this work paves the way for new research opportunities in DM.

inode, proceedings, query, (12 more...)

2104.04194

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.28)
North America > United States > Oregon > Multnomah County > Portland (0.04)
Europe > Germany (0.04)
(15 more...)

Genre: Research Report (1.00)

Industry: Health & Medicine > Therapeutic Area > Oncology (1.00)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Ontologies (1.00)
(2 more...)

Topological Data Analysis of Database Representations for Information Retrieval

Vlontzos, Athanasios, Cao, Yueqi, Schmidtke, Luca, Kainz, Bernhard, Monod, Anthea

arXiv.org Machine Learning, Apr-4-2021

Appropriately representing elements in a database so that queries may be accurately matched is a central task in information retrieval. This has recently been achieved by embedding the graphical structure of the database into a manifold so that the hierarchy is preserved. Persistent homology provides a rigorous characterization of the database topology in terms of both its hierarchy and connectivity structure. We compute persistent homology on a variety of datasets and show that some commonly used embeddings fail to preserve the connectivity. Moreover, we show that embeddings which successfully retain the database topology coincide in persistent homology. We introduce the dilation-invariant bottleneck distance to capture this effect, which addresses metric distortion on manifolds. We use it to show that distances between topology-preserving embeddings of databases are small.

bottleneck distance, homology, persistence diagram, (15 more...)

2104.01672

Country:

Europe > United Kingdom > England > Greater London > London (0.04)
Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
Europe > Germany (0.04)
(2 more...)

Genre: Research Report (0.82)

Industry: Health & Medicine > Therapeutic Area (0.47)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.70)

Evidence-based Verification for Real World Information Needs

Thorne, James, Glockner, Max, Vallejo, Gisela, Vlachos, Andreas, Gurevych, Iryna

arXiv.org Artificial Intelligence, Apr-1-2021

Claim verification is the task of predicting the veracity of written statements against evidence. Previous large-scale datasets model the task as classification, ignoring the need to retrieve evidence, or are constructed for research purposes, and may not be representative of real-world needs. In this paper, we introduce a novel claim verification dataset with instances derived from search-engine queries, yielding 10,987 claims annotated with evidence that represent real-world information needs. For each claim, we annotate evidence from full Wikipedia articles with both section and sentence-level granularity. Our annotation allows comparison between two complementary approaches to verification: stance classification, and evidence extraction followed by entailment recognition. In our comprehensive evaluation, we find no significant difference in accuracy between these two approaches. This enables systems to use evidence extraction to summarize a rationale for an end-user while maintaining the accuracy when predicting a claim's veracity. With challenging claims and evidence documents containing hundreds of sentences, our dataset presents interesting challenges that are not captured in previous work -- evidenced through transfer learning experiments. We release code and data to support further research on this task.

annotator, computational linguistic, dataset, (14 more...)

2104.0064

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Asia > China > Hong Kong (0.04)
Europe > Germany > Hesse > Darmstadt Region > Darmstadt (0.04)
(6 more...)

Genre: Research Report > Experimental Study > Negative Result (0.34)

Industry:

Education (0.67)
Government (0.46)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.34)

Classifying the Unstructured IT Service Desk Tickets Using Ensemble of Classifiers

C, Ramya, P, Paramesh S., S, Shreedhara K

arXiv.org Artificial Intelligence, Mar-30-2021

Manual classification of IT service desk tickets may result in tickets being routed to the wrong resolution group. Incorrect assignment leads to reassignment of tickets, unnecessary resource utilization, and delayed resolution. Traditional machine learning algorithms can be used to classify IT service desk tickets automatically. Service desk ticket classifier models can be trained by mining historical unstructured ticket descriptions and their corresponding labels. The model can then be used to classify a new service desk ticket based on its description. The performance of traditional classifier systems can be further improved by using ensemble classification techniques. This paper presents the three most popular ensemble methods, i.e., Bagging, Boosting, and Voting, for combining the predictions from different models to further improve the accuracy of the ticket classifier system. The performance of the ensemble classifier system is compared against the individual base classifiers using various performance metrics. The ensembles of classifiers performed well in comparison with the corresponding base classifiers. The advantages of building such an automated ticket classifier system are a simplified user interface, faster resolution time, improved productivity, customer satisfaction, and business growth. Real-world service desk ticket data from a large enterprise IT infrastructure is used for this research.
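To make the Voting ensemble named in the abstract concrete, here is a minimal hard-voting sketch. The base classifiers are toy keyword rules, and the labels ("access", "hardware", "other") and helper names are hypothetical; the paper's actual models, features, and ticket categories are not given in this listing.

```python
from collections import Counter


def keyword_classifier(keyword, label, default="other"):
    """Toy base classifier: predicts `label` when `keyword` occurs in the text."""
    def predict(text):
        return label if keyword in text.lower() else default
    return predict


def majority_vote(classifiers, text):
    """Hard voting: return the label predicted by the most base classifiers."""
    votes = Counter(clf(text) for clf in classifiers)
    # most_common(1) returns the single (label, count) pair with the top count.
    return votes.most_common(1)[0][0]


# Hypothetical base classifiers standing in for trained models.
classifiers = [
    keyword_classifier("password", "access"),
    keyword_classifier("login", "access"),
    keyword_classifier("printer", "hardware"),
]

print(majority_vote(classifiers, "Cannot login after password reset"))
```

With trained models in place of the keyword rules, the same vote-counting step applies unchanged; Bagging and Boosting differ in how the base models are trained rather than in this final aggregation step.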

classifier, classifier model, ticket, (13 more...)

2103.15822

Country:

Asia > India > Karnataka (0.05)
Asia > Singapore (0.04)

Genre: Research Report (0.83)

Industry: Information Technology > Services (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.97)
Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.95)
(2 more...)

Effective Search Engine Optimization Requires a Commitment, Not a Campaign

#artificialintelligence, Mar-27-2021, 08:31:47 GMT

A successful product and brand strategy develops brand awareness and identity that distinguishes a product from countless others based on just the brand name. A well-crafted strategy repeatedly reminds prospective and existing customers why they should buy a particular product over others with similar characteristics. Brand is just a perception, and perception will match reality over time. A brand is any trade mark through which a product is correctly identified and described by consumers. Therefore, the brand includes any action and remedy by which the product is identified.

customer, effective search engine optimization require, perception

Industry: Marketing (0.66)

Technology:

Information Technology > Information Management > Search (0.85)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.40)

A PSO Strategy of Finding Relevant Web Documents using a New Similarity Measure

C, Dr. Ramya, S, Dr. Shreedhara K

arXiv.org Artificial Intelligence, Mar-26-2021

In the world of the Internet and World Wide Web, which offers a tremendous amount of information, increasing emphasis is being given to search services and functionality. Currently, a majority of web portals offer their own search utilities, for better or worse. These can search the content within the sites, mainly the textual content of documents. In this paper, a novel similarity measure called SMDR (Similarity Measure for Documents Retrieval) is proposed to help retrieve more similar documents from the repository, thus contributing considerably to the effectiveness of the Web Information Retrieval (WIR) process. A bio-inspired PSO methodology is used to reduce the response time of the system and optimize the WIR process, thereby contributing to the efficiency of the system. This paper also presents a comparative study of the proposed system and the existing method in terms of accuracy, sensitivity, F-measure, and specificity. Finally, extensive experiments are conducted on the CACM collections. Better precision-recall rates are achieved than with the existing system. Experimental results demonstrate the effectiveness and efficiency of the proposed system.
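For readers unfamiliar with PSO (particle swarm optimization), the sketch below shows the standard velocity and position update on a toy objective. It does not implement SMDR or the paper's retrieval pipeline: the objective (`sphere`), the parameter values (`w`, `c1`, `c2`), and all names are assumptions chosen for illustration.

```python
import random


def pso(objective, dim, bounds, n_particles=20, n_iters=200,
        w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize `objective` with a basic global-best particle swarm."""
    rng = random.Random(seed)
    lo, hi = bounds
    # Positions start uniformly inside the bounds; velocities start at zero.
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                      # best position per particle
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]     # best position of the swarm

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]               # positions are not clamped
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def sphere(x):
    """Toy objective with its minimum of 0 at the origin."""
    return sum(v * v for v in x)


best, value = pso(sphere, dim=2, bounds=(-5.0, 5.0))
print(best, value)
```

In a retrieval setting the objective would instead score candidate documents or term weights against a query, for example with a similarity measure; how the paper maps documents to particles is not stated in the abstract.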

query, relevant document, retrieval, (13 more...)

2103.14371

Country:

Asia > India > Karnataka (0.05)
North America > United States > New Jersey > Middlesex County > Piscataway (0.04)
Asia > Singapore (0.04)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
(4 more...)

Enable feature reuse across accounts and teams using Amazon SageMaker Feature Store

#artificialintelligence, Mar-23-2021, 20:14:54 GMT

Amazon SageMaker Feature Store is a new capability of Amazon SageMaker that helps data scientists and machine learning (ML) engineers securely store, discover, and share curated data used in training and prediction workflows. As organizations build data-driven applications using ML, they're constantly assembling and moving features between more and more functional teams. This constant movement of data can lead to inconsistencies in features and become a bottleneck when designing ML initiatives spanning multiple teams. For example, an ecommerce company might have several data science and engineering teams working on different aspects of their platform. The Core Search team focuses on query understanding and information retrieval tasks. The Product Success team solves problems involving customer reviews and feedback signals. The Personalization team uses clickstream and session data to create ML models for personalized recommendations.

feature group, feature store, pipeline, (15 more...)

Country: North America > United States > New York (0.04)

Industry:

Retail > Online (0.51)
Information Technology > Services (0.34)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.49)

Database Systems Research in the Arab World

Communications of the ACM, Mar-23-2021, 10:43:32 GMT

From Hammurabi's stone tablets to papyrus rolls and leather-bound books, the Arab region has a rich history of recordkeeping and transactional systems that closely matches the evolution of data storage mediums. Even modern-day data management concepts like data provenance and lineage have historic roots in the Arab world; generations of scribes meticulously tracked Islamic prophetic narrations from one narrator to the next, forming lineage chains that originated from central Arabia. Database systems research has been part of the academic culture in the Arab world since the 1970s. High-quality computer science and database education was always available at several universities within the Arab region, such as Alexandria University in Egypt. Many students who went through these programs were drawn to database systems research and became globally prominent, such as Ramez Elmasri (professor at University of Texas, Arlington), Amr El Abbadi (professor at University of California, Santa Barbara), and Walid Aref (professor at Purdue University).

Country:

Africa > Middle East > Egypt (0.25)
North America > United States > Texas > Tarrant County > Arlington (0.24)
North America > United States > California > Santa Barbara County > Santa Barbara (0.24)
(9 more...)

Industry: Information Technology > Security & Privacy (0.47)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Databases (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(4 more...)
