AITopics

doi: 10.18653/v1/2020.coling-main.30

2105.10606

Country:

North America > United States > Texas (0.04)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
(10 more...)

Genre: Research Report (0.64)

Industry: Information Technology (0.46)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.73)

#artificialintelligence May-30-2021, 19:21:02 GMT

Science Fiction: Apple Builds A Search Engine

These efforts may have been energized by the 2018 addition of AI and Machine Learning expert John Giannandrea, a Silicon Valley veteran who was …

apple build, science fiction, search engine

Country: North America > United States > California (0.52)

Industry: Media > News (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.52)
Information Technology > Information Management > Search (0.40)
Information Technology > Artificial Intelligence > Science Fiction (0.40)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.40)

#artificialintelligence May-27-2021, 11:08:13 GMT

Database Workload Characterization with Query Plan Encoders

Smart databases are adopting artificial intelligence (AI) technologies to achieve instance optimality, and in the future, databases will come with prepackaged AI models within their core components. The reason is that every database runs on different workloads and demands specific resources and settings to achieve optimal performance. This makes it necessary to comprehensively understand the workloads running in the system along with their features, which we dub workload characterization. To address this workload characterization problem, we propose query plan encoders that learn essential features and their correlations from query plans. Our pretrained encoders capture the structural and the computational performance of queries independently.

database workload characterization, encoder, query plan encoder, (1 more...)

Technology: Information Technology > Artificial Intelligence > Natural Language > Information Retrieval > Query Processing (1.00)

#artificialintelligence May-26-2021, 02:00:21 GMT

Postdoc Position in Computer Science - Applied Machine Learning and Information Retrieval

A postdoctoral fellowship in Applied Machine Learning and Information Retrieval is available at the Department of Computer Science, University of Copenhagen, Denmark. The application, in English, should be submitted electronically. The postdoctoral fellow will join the Machine Learning Section at DIKU. The Machine Learning Section is among the leading research environments in Artificial Intelligence and Web & Information Retrieval in Europe (in the top 5 for 2020, according to csrankings.org), with a strong presence at top-tier conferences, continuous collaboration in international and national research networks, and solid synergies with big tech, small tech, and industry. The section consists of approximately 65 talented researchers (40 of whom are PhD and postdoctoral fellows) from around the world, with diverse backgrounds and a shared scientific curiosity and openness to innovation.

applied machine learning, computer science, machine learning and information retrieval, (4 more...)

Country: Europe > Denmark > Capital Region > Copenhagen (0.29)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.96)

arXiv.org Artificial Intelligence May-25-2021

Database Workload Characterization with Query Plan Encoders

Paul, Debjyoti, Cao, Jie, Li, Feifei, Srikumar, Vivek

Smart databases are adopting artificial intelligence (AI) technologies to achieve instance optimality, and in the future, databases will come with prepackaged AI models within their core components. The reason is that every database runs on different workloads and demands specific resources and settings to achieve optimal performance. This makes it necessary to comprehensively understand the workloads running in the system along with their features, which we dub workload characterization. To address this workload characterization problem, we propose query plan encoders that learn essential features and their correlations from query plans. Our pretrained encoders capture the structural and the computational performance of queries independently. We show that our pretrained encoders are adaptable to workloads, which expedites the transfer learning process. We performed independent assessments of the structural and performance encoders with multiple downstream tasks. For the overall evaluation of our query plan encoders, we architect two downstream tasks: (i) query latency prediction and (ii) query classification. These tasks show the importance of feature-based workload characterization. We also performed extensive experiments on individual encoders to verify the effectiveness of representation learning and domain adaptability.

encoder, operator, query, (14 more...)

2105.12287

Country:

North America > United States > Utah (0.04)
North America > United States > New York (0.04)
North America > United States > California > Los Angeles County > Los Angeles (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Retrieval > Query Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

arXiv.org Artificial Intelligence May-25-2021

Predicting Links on Wikipedia with Anchor Text Information

Brochier, Robin, Béchet, Frédéric

Wikipedia, the largest open-collaborative online encyclopedia, is a corpus of documents bound together by internal hyperlinks. These links form the building blocks of a large network whose structure contains important information on the concepts covered in this encyclopedia. The presence of a link between two articles, materialised by an anchor text in the source page pointing to the target page, can increase readers' understanding of a topic. However, the process of linking follows specific editorial rules to avoid both under-linking and over-linking. In this paper, we study the transductive and the inductive tasks of link prediction on several subsets of the English Wikipedia and identify some key challenges behind automatic linking based on anchor text information. We propose an appropriate evaluation sampling methodology and compare several algorithms. Moreover, we propose baseline models that provide a good estimation of the overall difficulty of the tasks.

algorithm, prediction, wikipedia, (15 more...)

doi: 10.1145/3404835.3462994

2105.11734

Country:

Europe > France > Provence-Alpes-Côte d'Azur > Bouches-du-Rhône > Marseille (0.05)
North America > Canada (0.04)
Europe > United Kingdom (0.04)
North America > United States > New York > New York County > New York City (0.04)

Genre: Research Report (0.64)

Industry: Government > Regional Government > North America Government > United States Government (0.47)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Information Management > Search (0.90)
Information Technology > Data Science > Data Mining (0.89)
(3 more...)

#artificialintelligence May-21-2021, 10:10:30 GMT

Faiss: A library for efficient similarity search - Facebook Engineering

This month, we released Facebook AI Similarity Search (Faiss), a library that allows us to quickly search for multimedia documents that are similar to each other -- a challenge where traditional query search engines fall short. We've built nearest-neighbor search implementations for billion-scale data sets that are some 8.5x faster than the previously reported state of the art, along with the fastest k-selection algorithm on the GPU known in the literature. This lets us break some records, including the first k-nearest-neighbor graph constructed on 1 billion high-dimensional vectors. Traditional databases are made up of structured tables containing symbolic information. For example, an image collection would be represented as a table with one row per indexed photo.
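To make the idea of nearest-neighbor search with k-selection concrete, here is a minimal sketch of an exact, brute-force search in plain Python. It is not Faiss code and does not reflect how Faiss is implemented; the function name `knn_search` and the toy vectors are assumptions made for illustration only.

```python
import heapq
import math


def knn_search(database, query, k):
    """Return the k database vectors closest to query as (distance, index) pairs.

    Hypothetical helper for illustration: an exact brute-force scan,
    not the approximate or GPU-accelerated methods Faiss provides.
    """
    # Score every vector by its Euclidean distance to the query.
    scored = ((math.dist(vec, query), idx) for idx, vec in enumerate(database))
    # k-selection: keep only the k smallest distances, sorted ascending.
    return heapq.nsmallest(k, scored)


# Toy 2-D "database"; real similarity search works on high-dimensional vectors.
database = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0), (0.9, 1.2)]
neighbors = knn_search(database, (1.0, 1.0), k=2)
print(neighbors)
```

The scan touches every vector once per query, which is why it does not scale to billion-sized collections and why specialized index structures are needed there.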

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.90)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.54)

arXiv.org Artificial Intelligence May-18-2021

Conversations with Search Engines: SERP-based Conversational Response Generation

Ren, Pengjie, Chen, Zhumin, Ren, Zhaochun, Kanoulas, Evangelos, Monz, Christof, de Rijke, Maarten

In this paper, we address the problem of answering complex information needs by conducting conversations with search engines, in the sense that users can express their queries in natural language and directly receive the information they need from a short system response in a conversational manner. Recently, there have been some attempts towards a similar goal, e.g., studies on Conversational Agents (CAs) and Conversational Search (CS). However, they either do not address complex information needs, or they are limited to the development of conceptual frameworks and/or laboratory-based user studies. We pursue two goals in this paper: (1) the creation of a suitable dataset, the Search as a Conversation (SaaC) dataset, for the development of pipelines for conversations with search engines, and (2) the development of a state-of-the-art pipeline for conversations with search engines, the Conversations with Search Engines (CaSE), using this dataset. SaaC is built based on a multi-turn conversational search dataset, where we further employ workers from a crowdsourcing platform to summarize each relevant passage into a short, conversational response. CaSE enhances the state-of-the-art by introducing a supporting token identification module and a prior-aware pointer generator, which enable us to generate more accurate responses. We carry out experiments to show that CaSE is able to outperform strong baselines. We also conduct extensive analyses on the SaaC dataset to show where there is room for further improvement beyond CaSE. Finally, we release the SaaC dataset and the code for CaSE and all models used for comparison to facilitate future research on this topic.

dataset, information, query, (14 more...)

2004.14162

Country:

Asia > India (0.14)
Europe > Netherlands > North Holland > Amsterdam (0.05)
Asia > China > Shandong Province > Qingdao (0.04)
(7 more...)

Genre:

Research Report > Experimental Study (0.46)
Research Report > New Finding (0.46)

Industry:

Energy (0.93)
Health & Medicine (0.93)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)

#artificialintelligence May-17-2021, 23:40:07 GMT

Google AI Researchers Are Dreaming Up a New Species of Search Engine

Imagine a collection of books--maybe millions or even billions of them--haphazardly tossed by publishers into a heaping pile in a field. Every day the pile grows exponentially. Those books are brimming with knowledge and answers. But how would a seeker find them? Lacking organization, the books are useless. This is the raw internet in all its unfiltered glory.

algorithm, information, language model, (11 more...)

Genre: Summary/Review (0.31)

Industry: Health & Medicine (0.49)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.72)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.50)

arXiv.org Artificial Intelligence May-17-2021

TopicsRanksDC: Distance-based Topic Ranking applied on Two-Class Data

Yousef, Malik, Qundus, Jamal Al, Peikert, Silvio, Paschke, Adrian

In this paper, we introduce a novel approach named TopicsRanksDC for topic ranking based on the distance between two clusters that are generated by each topic. We assume that our data consists of text documents that are associated with two classes. Our approach ranks each topic contained in these text documents by its significance for separating the two classes. First, the algorithm detects topics using Latent Dirichlet Allocation (LDA). The words defining each topic are represented as two clusters, where each one is associated with one of the classes. We compute four distance metrics: Single Linkage, Complete Linkage, Average Linkage, and the distance between the centroids. We compare the results of LDA topics and random topics. The results show that the rank for LDA topics is much higher than that for random topics. The results of the TopicsRanksDC tool are promising for future work to enable search engines to suggest related topics.
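The four distance metrics named in the abstract can be sketched as follows. This is an illustrative reading, not the authors' implementation: the abstract does not say how a topic's words are turned into points, so the 2-D coordinates, the use of Euclidean distance, and the names `cluster_distances`, `class_a_words`, and `class_b_words` are all assumptions.

```python
import math
from itertools import product


def centroid(points):
    """Component-wise mean of a list of equal-length points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def cluster_distances(cluster_a, cluster_b):
    """Compute the four inter-cluster distances (hypothetical helper)."""
    # Euclidean distance for every cross-cluster pair of points.
    pairwise = [math.dist(a, b) for a, b in product(cluster_a, cluster_b)]
    return {
        "single": min(pairwise),                    # closest cross-cluster pair
        "complete": max(pairwise),                  # farthest cross-cluster pair
        "average": sum(pairwise) / len(pairwise),   # mean over all pairs
        "centroid": math.dist(centroid(cluster_a), centroid(cluster_b)),
    }


# Assumed toy representation: one topic's words as points, split by class.
class_a_words = [(0.0, 0.0), (0.0, 2.0)]
class_b_words = [(3.0, 0.0), (3.0, 2.0)]
distances = cluster_distances(class_a_words, class_b_words)
print(distances)
```

Under this reading, a topic whose two clusters lie far apart separates the classes well, so sorting topics by any of the four values in descending order gives one possible ranking.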

distance-based topic ranking, random topic, topicsranksdc, (14 more...)

2105.07826

Country:

Europe > Germany > Berlin (0.04)
Asia > Middle East > Jordan (0.04)
Asia > Middle East > Israel (0.04)

Genre: Research Report > New Finding (0.66)

Industry: Health & Medicine > Therapeutic Area > Oncology (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.67)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.67)