AITopics | Information Retrieval

Information Retrieval

Our accustomed systems of retrieving particular bits of information no longer meet the needs of many people. Searching traditional indexes of print publications has been aided by computerized databases, but it still usually requires time-consuming serial searching of one database after another, followed by other methods of searching for internet sources. And what if the information being sought is a sound bite? A video clip? Yesterday's e-mail exchange between respected scientists? Artificial intelligence may hold the key to information retrieval in an age where widely different formats contain the information being sought, and the universe of knowledge is simply too big and growing too rapidly for successful searching to proceed at a human's slow speed.

Beyond Social Media Analytics: Understanding Human Behaviour and Deep Emotion using Self Structuring Incremental Machine Learning

Bandaragoda, Tharindu

arXiv.org Machine Learning | Sep-5-2020

This thesis develops a conceptual framework that treats social data as the surface layer of a hierarchy of human social behaviours, needs and cognition, and uses it to transform social data into representations that preserve social behaviours and their causalities. Based on this framework, two platforms were built to capture insights from fast-paced and slow-paced social data. For fast-paced data, a self-structuring and incremental learning technique was developed to automatically capture salient topics and their dynamics over time. An event detection technique was developed to automatically monitor the identified topic pathways for significant fluctuations in social behaviours using multiple indicators such as volume and sentiment. This platform is demonstrated using two large datasets with over 1 million tweets. The separated topic pathways were representative of the key topics of each entity and coherent against topic coherence measures. Identified events were validated against contemporary events reported in news. Second, for the slow-paced social data, a suite of new machine learning and natural language processing techniques was developed to automatically capture self-disclosed information of individuals, such as demographics, emotions and timeline of personal events. This platform was trialled on a large text corpus of over 4 million posts collected from online support groups. It was further extended to transform prostate cancer related online support group discussions into a multidimensional representation and to investigate the self-disclosed quality of life of patients (and partners) against time, demographics and clinical factors. The capabilities of this extended platform have been demonstrated using a text corpus collected from 10 prostate cancer online support groups, comprising 609,960 prostate cancer discussions and 22,233 patients.
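The abstract does not say how "significant fluctuations" in an indicator such as volume are detected. As a minimal sketch, assuming a trailing-window z-score test (a common baseline, not necessarily the thesis's technique), monitoring one topic's volume series could look like this. The function name `detect_events`, the window size, the threshold and the sample data are all hypothetical.

```python
from statistics import mean, pstdev

def detect_events(series, window=5, threshold=3.0):
    """Return indices whose value deviates from the trailing-window mean
    by more than `threshold` standard deviations (assumed baseline rule)."""
    events = []
    for i in range(window, len(series)):
        history = series[i - window:i]
        mu = mean(history)
        sigma = pstdev(history)
        if sigma == 0:
            # A perfectly flat history gives no scale to compare against; skip it.
            continue
        z = (series[i] - mu) / sigma
        if abs(z) > threshold:
            events.append(i)
    return events

# Hypothetical per-interval post counts for one topic pathway.
volume = [100, 102, 98, 101, 99, 100, 250, 103, 97]
events = detect_events(volume)
print(events)
```

The same function could be applied to a sentiment series; combining several indicators into one decision is left open here because the abstract does not describe how that is done.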

information retrieval, machine learning, patient-reported information multidimensional exploration, (20 more...)

arXiv.org Machine Learning

2009.09078

Country:

Asia > Russia (0.45)
North America > United States > New York > New York County > New York City (0.14)
Asia > Middle East > Iran (0.14)
(38 more...)

Genre:

Research Report > Strength High (1.00)
Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)
(3 more...)

Industry:

Health & Medicine > Therapeutic Area > Urology (1.00)
Health & Medicine > Therapeutic Area > Psychiatry/Psychology > Mental Health (1.00)
Health & Medicine > Therapeutic Area > Neurology (1.00)
(3 more...)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Rule-Based Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
(9 more...)

KILT: a Benchmark for Knowledge Intensive Language Tasks

Petroni, Fabio, Piktus, Aleksandra, Fan, Angela, Lewis, Patrick, Yazdani, Majid, De Cao, Nicola, Thorne, James, Jernite, Yacine, Plachouras, Vassilis, Rocktäschel, Tim, Riedel, Sebastian

arXiv.org Artificial Intelligence | Sep-4-2020

Challenging problems such as open-domain question answering, fact checking, slot filling and entity linking require access to large, external knowledge sources. While some models do well on individual tasks, developing general models is difficult as each task might require computationally expensive indexing of custom knowledge sources, in addition to dedicated infrastructure. To catalyze research on models that condition on specific information in large textual resources, we present a benchmark for knowledge-intensive language tasks (KILT). All tasks in KILT are grounded in the same snapshot of Wikipedia, reducing engineering turnaround through the re-use of components, as well as accelerating research into task-agnostic memory architectures. We test both task-specific and general baselines, evaluating downstream performance in addition to the ability of the models to provide provenance. We find that a shared dense vector index coupled with a seq2seq model is a strong baseline, outperforming more tailor-made approaches for fact checking, open-domain question answering and dialogue, and yielding competitive results on entity linking and slot filling, by generating disambiguated text. KILT data and code are available at https://github.com/facebookresearch/KILT.

information retrieval, machine learning, question answering, (17 more...)

arXiv.org Artificial Intelligence

2009.02252

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
Europe > North Macedonia (0.05)
Europe > Romania > București - Ilfov Development Region > Municipality of Bucharest > Bucharest (0.04)
(5 more...)

Genre: Research Report (1.00)

Industry: Leisure & Entertainment > Sports > Soccer (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.69)
(3 more...)

HyperBench: A Benchmark and Tool for Hypergraphs and Empirical Findings

Fischl, Wolfgang, Gottlob, Georg, Longo, Davide Mario, Pichler, Reinhard

arXiv.org Artificial Intelligence | Sep-2-2020

To cope with the intractability of answering Conjunctive Queries (CQs) and solving Constraint Satisfaction Problems (CSPs), several notions of hypergraph decompositions have been proposed -- giving rise to different notions of width, notably plain, generalized, and fractional hypertree width (hw, ghw, and fhw). Given the increasing interest in using such decomposition methods in practice, a publicly accessible repository of decomposition software, as well as a large set of benchmarks, and a web-accessible workbench for inserting, analyzing, and retrieving hypergraphs are called for. We address this need by providing (i) concrete implementations of hypergraph decompositions (including new practical algorithms), (ii) a new, comprehensive benchmark of hypergraphs stemming from disparate CQ and CSP collections, and (iii) HyperBench, our new web interface for accessing the benchmark and the results of our analyses. In addition, we describe a number of actual experiments we carried out with this new infrastructure.

artificial intelligence, hypergraph, natural language, (17 more...)

arXiv.org Artificial Intelligence

2009.01769

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Oceania > Australia > Victoria > Melbourne (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)
(15 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Software (1.00)
Information Technology > Databases (1.00)
Information Technology > Communications (1.00)
(2 more...)

Emergent Web Intelligence: Advanced Information Retrieval - Programmer Books

#artificialintelligence | Sep-1-2020, 23:30:09 GMT

Book description: This volume reviews cutting-edge technologies and insights related to XML-based and multimedia information access and data retrieval. By applying new techniques to real-world scenarios, it details how organizations can gain competitive advantages.

advanced information retrieval, artificial intelligence, natural language, (2 more...)

#artificialintelligence

Technology:

Information Technology > Information Management (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.60)

The search engine boss who wants to help us all plant trees

BBC News | Aug-30-2020, 23:05:03 GMT

This week we speak to Christian Kroll, the founder and chief executive of internet search engine Ecosia. Christian Kroll wants nothing less than to change the world. "I want to make the world a greener, better place," he says. "I also want to prove that there is a more ethical alternative to the kind of greedy capitalism that is coming close to destroying the planet." The 35-year-old German is the boss of search engine Ecosia, which has an unusual but very environmentally friendly business model - it gives away most of its profits to enable trees to be planted around the world. Founded by Christian in 2009, Ecosia makes its money in the same way as Google - from advertising revenues.

artificial intelligence, information retrieval, natural language, (16 more...)

BBC News

Country:

North America > Haiti (0.16)
South America > Brazil (0.05)
Europe > Germany > Bavaria > Middle Franconia > Nuremberg (0.05)
(4 more...)

Industry:

Media > News (0.40)
Energy (0.32)
Information Technology (0.31)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.87)

SOLAR: Sparse Orthogonal Learned and Random Embeddings

Medini, Tharun, Chen, Beidi, Shrivastava, Anshumali

arXiv.org Artificial Intelligence | Aug-30-2020

Dense embedding models are commonly deployed in commercial search engines, wherein all the document vectors are pre-computed, and near-neighbor search (NNS) is performed with the query vector to find relevant documents. However, the bottleneck of indexing a large number of dense vectors and performing an NNS hurts the query time and accuracy of these models. In this paper, we argue that high-dimensional and ultra-sparse embedding is a significantly superior alternative to dense low-dimensional embedding for both query efficiency and accuracy. Extreme sparsity eliminates the need for NNS by replacing them with simple lookups, while its high dimensionality ensures that the embeddings are informative even when sparse. However, learning extremely high dimensional embeddings leads to blow up in the model size. To make the training feasible, we propose a partitioning algorithm that learns such high dimensional embeddings across multiple GPUs without any communication. This is facilitated by our novel asymmetric mixture of Sparse, Orthogonal, Learned and Random (SOLAR) Embeddings. The label vectors are random, sparse, and near-orthogonal by design, while the query vectors are learned and sparse. We theoretically prove that our way of one-sided learning is equivalent to learning both query and label embeddings. With these unique properties, we can successfully train 500K dimensional SOLAR embeddings for the tasks of searching through 1.6M books and multi-label classification on the three largest public datasets. We achieve superior precision and recall compared to the respective state-of-the-art baselines for each of the tasks with up to 10 times faster speed.
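The claim that extreme sparsity replaces near-neighbor search with simple lookups can be illustrated with an inverted index over active dimensions. The following is a minimal sketch under stated assumptions, not the SOLAR implementation: label vectors are represented as small random sets of active dimensions, the query's active set is given directly rather than produced by a learned model, and the names `random_sparse_label`, `build_inverted_index` and `lookup`, as well as the sizes used, are hypothetical.

```python
import random
from collections import Counter, defaultdict

def random_sparse_label(dim, nnz, rng):
    """A random, ultra-sparse label vector, stored as its set of active dimensions."""
    return frozenset(rng.sample(range(dim), nnz))

def build_inverted_index(label_vectors):
    """Map each active dimension to the labels that use it."""
    index = defaultdict(list)
    for label, dims in label_vectors.items():
        for d in dims:
            index[d].append(label)
    return index

def lookup(index, query_dims, top_k=1):
    """Score labels by how many active dimensions they share with the query.
    Only the query's active dimensions are touched; no vector scan is needed."""
    scores = Counter()
    for d in query_dims:
        for label in index.get(d, ()):
            scores[label] += 1
    return [label for label, _ in scores.most_common(top_k)]

rng = random.Random(0)
labels = {name: random_sparse_label(10_000, 8, rng)
          for name in ["book_a", "book_b", "book_c"]}
index = build_inverted_index(labels)

# Stand-in for a learned query vector: here it simply activates book_b's dimensions.
query = set(labels["book_b"])
result = lookup(index, query)
print(result)
```

Because random sparse sets in a high-dimensional space rarely overlap, the label vectors are near-orthogonal, which is why a simple overlap count separates the matching label from the rest in this toy setting.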

information retrieval, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2008.13225

Country:

Asia > Afghanistan > Parwan Province > Charikar (0.04)
North America > United States > New York (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
(2 more...)

Seasonal-adjustment Based Feature Selection Method for Large-scale Search Engine Logs

Tran, Thien Q., Sakuma, Jun

arXiv.org Machine Learning | Aug-21-2020

Search engine logs have great potential for tracking and predicting outbreaks of infectious disease. More precisely, one can use the search volume of some search terms to predict the infection rate of an infectious disease in nearly real time. However, accurate and stable prediction of outbreaks using search engine logs is challenging because of two kinds of instability in the search logs. First, the search volume of a search term may change irregularly in the short term, for example due to environmental factors such as the amount of media or news coverage. Second, the search volume may also change in the long term due to demographic change among the search engine's users. That is to say, if a model is trained on such search logs while ignoring these characteristics, the resulting predictions would contain serious errors when these changes occur. In this work, we propose a novel feature selection method to overcome this instability problem. In particular, we employ a seasonal-adjustment method that decomposes each time series into three components (seasonal, trend and irregular), and build prediction models for each component individually. We also carefully design a feature selection method to select proper search terms to predict each component. We conducted comprehensive experiments on ten different kinds of infectious diseases. The experimental results show that the proposed method outperforms all comparative methods in prediction accuracy for seven of ten diseases, in both now-casting and forecasting settings. Also, the proposed method is more successful in selecting search terms that are semantically related to target diseases.
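The abstract does not name the seasonal-adjustment procedure it uses. As a minimal sketch, assuming a classical additive decomposition with a centered moving average (one standard choice, not necessarily the authors'), the split into seasonal, trend and irregular components could look like this. The function `decompose`, the weekly period and the synthetic series are hypothetical.

```python
def decompose(series, period):
    """Classical additive decomposition: series = trend + seasonal + irregular.
    Assumes an odd `period` and at least two full periods of data."""
    assert period % 2 == 1, "this sketch only handles odd periods"
    n = len(series)
    half = period // 2

    # Trend: centered moving average over one full period (undefined at the edges).
    trend = [None] * n
    for i in range(half, n - half):
        trend[i] = sum(series[i - half:i + half + 1]) / period

    # Seasonal: mean detrended value for each phase of the period, centered on zero.
    sums, counts = [0.0] * period, [0] * period
    for i, t in enumerate(trend):
        if t is not None:
            sums[i % period] += series[i] - t
            counts[i % period] += 1
    raw = [s / c for s, c in zip(sums, counts)]
    offset = sum(raw) / period
    seasonal = [raw[i % period] - offset for i in range(n)]

    # Irregular: whatever trend and seasonal do not explain.
    irregular = [None if trend[i] is None else series[i] - trend[i] - seasonal[i]
                 for i in range(n)]
    return trend, seasonal, irregular

# Synthetic daily search volume: linear trend plus a zero-sum weekly pattern.
pattern = [3, -1, -2, 0, 1, -3, 2]
series = [10 + 0.5 * i + pattern[i % 7] for i in range(28)]
trend, seasonal, irregular = decompose(series, 7)
print(seasonal[:7])
```

Each component could then be passed to its own prediction model, as the abstract describes; the models themselves and the feature selection step are outside the scope of this sketch.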

information retrieval, machine learning, search term, (20 more...)

arXiv.org Machine Learning

doi: 10.1145/3292500.3330766

2008.09727

Country:

Asia > Japan > Honshū > Kantō > Ibaraki Prefecture > Tsukuba (0.05)
North America > United States > Alaska > Anchorage Municipality > Anchorage (0.04)
Europe > United Kingdom > England (0.04)
(2 more...)

Genre: Research Report > New Finding (0.48)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Deep Search Query Intent Understanding

Liu, Xiaowei, Guo, Weiwei, Gao, Huiji, Long, Bo

arXiv.org Artificial Intelligence | Aug-18-2020

Understanding the user's intent behind a search query is critical for the success of a modern search engine. Accurate query intent prediction allows the search engine to better serve the user's need by rendering results from more relevant categories. This paper aims to provide a comprehensive learning framework for modeling query intent at different stages of a search. We focus on the design for 1) predicting users' intents as they type in queries on-the-fly in typeahead search using character-level models; and 2) accurate word-level intent prediction models for complete queries. Various deep learning components for query text understanding are evaluated experimentally. Offline evaluation and online A/B test experiments show that the proposed methods are effective in understanding query intent and efficient to scale for online search systems.

information retrieval, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2008.06759

Country:

Europe > Ireland > Connaught > County Galway > Galway (0.05)
North America > United States > California > Santa Clara County > Mountain View (0.04)
North America > United States > California > San Diego County > San Diego (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Information Technology > Services (0.93)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

COLD: Towards the Next Generation of Pre-Ranking System

Wang, Zhe, Zhao, Liqin, Jiang, Biye, Zhou, Guorui, Zhu, Xiaoqiang, Gai, Kun

arXiv.org Artificial Intelligence | Aug-17-2020

Multi-stage cascade architectures are widely used in industrial systems such as recommender systems and online advertising, and often consist of sequential modules including matching, pre-ranking, ranking, etc. For a long time, pre-ranking has been regarded as just a simplified version of the ranking module, given the larger size of the candidate set to be ranked. Thus, efforts have mostly gone into simplifying the ranking model to handle the explosion of computing power needed for online inference. In this paper, we rethink the challenge of the pre-ranking system from an algorithm-system co-design view. Instead of saving computing power by restricting the model architecture, which causes loss of model performance, we design a new pre-ranking system by jointly optimizing both the pre-ranking model and the computing power it costs. We name it COLD (Computing power cost-aware Online and Lightweight Deep pre-ranking system). COLD improves on the state of the art in three ways: (i) an arbitrary deep model with cross features can be applied in COLD under a constraint of controllable computing power cost; (ii) computing power cost is explicitly reduced by applying optimization tricks for inference acceleration, which in turn leaves room for COLD to apply more complex deep models to reach better performance; (iii) the COLD model works in an online learning and serving manner, giving it an excellent ability to handle data distribution shift. Meanwhile, the fully online pre-ranking system of COLD provides a flexible infrastructure that supports efficient new model development and online A/B testing. Since 2019, COLD has been deployed in almost all products involving the pre-ranking module in the display advertising system at Alibaba, bringing significant improvements.

information retrieval, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2007.16122

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > Hawaii > Honolulu County > Honolulu (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre: Research Report (0.65)

Industry:

Information Technology > Services (0.88)
Marketing (0.87)
Education > Educational Setting > Online (0.35)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.46)

Mozilla and Google renew Firefox search agreement

#artificialintelligence | Aug-16-2020, 23:56:27 GMT

Mozilla and Google have extended their arrangement to keep Google the default search engine within the Firefox browser until at least 2023, ZDNet reported. The companies have not formally announced the deal, which ZDNet estimates is worth between $400 and $450 million per year, but are expected to announce it later this fall. The current arrangement was due to expire at the end of 2020. "Mozilla's search partnership with Google is ongoing, with Google as the default search provider in the Firefox browser in many places around the world," Mozilla spokesperson Justin O'Kelly said in an email to The Verge. "We've recently extended the partnership, and the relationship isn't changing."

google renew firefox search agreement, information retrieval, natural language, (9 more...)

#artificialintelligence

Country:

Europe > Russia (0.07)
Asia > Russia (0.07)
Asia > China (0.07)

Genre: Press Release (0.40)

Technology:

Information Technology > Information Management > Search (0.70)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.70)
