AITopics | Information Retrieval

Information Retrieval

Our accustomed systems of retrieving particular bits of information no longer fill the needs of many people. Searching traditional indexes of print publications has been aided by computerized databases, but still usually requires time-consuming serial searching of one database after the other, and then moving on to other methods of searching for internet sources. And what if the information being sought is a sound bite? A video clip? Yesterday's e-mail exchange between respected scientists? Artificial intelligence may hold the key to information retrieval in an age where widely different formats contain the information being sought, and the universe of knowledge is simply too big and growing too rapidly for successful searching to proceed at a human's slow speed.

Unsupervised Sentiment Analysis for Code-mixed Data

Yadav, Siddharth, Chakraborty, Tanmoy

arXiv.org Artificial Intelligence | Jan-20-2020

Code-mixing is the practice of alternating between two or more languages. Mostly observed in multilingual societies, its occurrence is increasing, and with it its importance. Most sentiment analysis research has been monolingual, and most monolingual methods perform poorly on code-mixed text. In this work, we introduce methods that use different kinds of multilingual and cross-lingual embeddings to efficiently transfer knowledge from monolingual text to code-mixed text for sentiment analysis. Our methods can handle code-mixed text through zero-shot learning. Our methods beat the state of the art on English-Spanish code-mixed sentiment analysis by an absolute 3% F1-score. We achieve 0.58 F1-score (without parallel corpus) and 0.62 F1-score (with parallel corpus) on the same benchmark in a zero-shot way, compared to 0.68 F1-score in supervised settings. Our code is publicly available.

code-mixed text, computational linguistic, proceedings, (14 more...)

2001.11384

Country:

Europe > Italy > Tuscany > Florence (0.05)
Europe > Belgium > Brussels-Capital Region > Brussels (0.04)
Asia > Indonesia > Bali (0.04)
(14 more...)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
(2 more...)

Google Search Console unparsable structured data report data issue - Search Engine Land

#artificialintelligence | Jan-18-2020, 13:07:50 GMT

Google has informed us that you may see a spike in errors in the unparsable structured data report within Google Search Console. This is a bug in the reporting system and you do not need to worry. The issue happened between January 13, 2020 and January 16, 2020. Google wrote on the data anomalies page "Some users may see a spike in unparsable structured data errors. This was due to an internal misconfiguration that will be fixed soon, and can be ignored."

google search console, google search console unparsable, search engine land, (3 more...)

Industry: Information Technology > Services (0.68)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.40)

On Making A Multilingual Search Engine

#artificialintelligence | Jan-17-2020, 13:09:00 GMT

You can read more about USE in this paper. Let's first read the data. Because the quora dataset is huge and takes a lot of time, we will take only 1% of the data. This will take around 3 minutes for encoding and indexing. It will have 4000 questions.
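The sample-encode-index-search flow described in this summary can be sketched roughly as follows. This is a minimal illustration only: `toy_encode` is a hypothetical hashed bag-of-words stand-in and is not USE, the question list is synthetic rather than the Quora dataset, and the names `build_index` and `search` are assumptions, not taken from the article.

```python
import math
import random
import zlib

def toy_encode(text, dim=64):
    """Hypothetical stand-in encoder (NOT USE): hashed bag-of-words, L2-normalized."""
    vec = [0.0] * dim
    for token in text.lower().split():
        vec[zlib.crc32(token.encode("utf-8")) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def build_index(questions, fraction=0.01, seed=0):
    """Sample a fraction of the questions and encode each sampled question."""
    rng = random.Random(seed)
    k = max(1, int(len(questions) * fraction))
    sample = rng.sample(questions, k)
    return sample, [toy_encode(q) for q in sample]

def search(query, sample, vectors, top_k=3):
    """Return the top_k sampled questions ranked by cosine similarity."""
    q = toy_encode(query)
    # Vectors are unit length, so the dot product equals cosine similarity.
    sims = [sum(a * b for a, b in zip(q, v)) for v in vectors]
    order = sorted(range(len(sample)), key=lambda i: sims[i], reverse=True)
    return [(sample[i], sims[i]) for i in order[:top_k]]

# Synthetic data standing in for the real question set.
questions = [f"how do I learn topic {i}" for i in range(1000)]
sample, vectors = build_index(questions)
results = search(sample[0], sample, vectors, top_k=1)
```

With a real multilingual sentence encoder in place of `toy_encode`, the same brute-force cosine ranking would let a query in one language match questions in another; a dedicated vector index would replace the linear scan at larger scale.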

let, multilingual search engine, query

Technology:

Information Technology > Information Management > Search (0.53)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.53)

University of Warwick Job Search: Research Fellow or Senior Research Fellow (102493-0120)

#artificialintelligence | Jan-16-2020, 18:49:31 GMT

Research Fellow or Senior Research Fellow (Deep Learning for Health Trajectory Prediction). The full-time fixed term post is available until 31st March 2023 (approximately 3 years). You will work with the Principal Investigator (Dr Leandro Pecchia), the project partners and the Warwick GATEKEEPER team for the successful execution of the project. Further information on the project can be read here: https://www.gatekeeper-project.eu/. You will have a PhD in Biomedical Engineering or in a relevant discipline (e.g., Computer Science, Information Engineering, Applied Math or similar disciplines). The level of appointment (Research or Senior Research Fellow) will be determined by the successful candidate's skills and experience, including a proven ability and achievement in research and the ability to generate external funding to support research projects.

research fellow, senior research fellow, university, (8 more...)

Industry:

Health & Medicine (0.58)
Education (0.42)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.41)
Information Technology > Information Management > Search (0.40)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.40)

Privacy concerns over Russia's 'most popular search engine' Yandex as it uses facial recognition

Daily Mail - Science & tech | Jan-16-2020, 16:36:19 GMT

A Russian search engine is being accused of providing an unregulated facial recognition system to members of the public, violating personal privacy. Experts have slammed the feature as 'poor' and 'creepy' while dubbing it a 'definite privacy concern'. Yandex, much like Google, Bing and other search engines, allows users to input an image and see similar results. But only Yandex, which claims to conduct more than 50 per cent of Russian searches on Android, produces images of the exact same person. MailOnline tested the image search facilities of Yandex, Bing, Google and specialist site TinEye by submitting a photo that was not available online.

engine, search engine, yandex, (13 more...)

Country:

Europe > Russia (0.42)
Asia > Russia (0.42)
North America > United States > Nevada > Clark County > Las Vegas (0.05)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Machine Learning (0.98)
(2 more...)

Verizon launches 'privacy-focused' search engine leaving some skeptical because of the firm's past

Daily Mail - Science & tech | Jan-14-2020, 20:37:06 GMT

There is a new internet watchdog in town and it is powered by Verizon. The tech giant released a 'privacy-focused' search engine, called OneSearch, which encrypts searches, leaves results unfiltered and claims to not store or transfer user information. The platform is also Advanced Privacy Mode enabled, meaning all search result links expire within an hour. However, some users are suspicious about the platform, as Verizon has come under fire in the past for tracking customers on the internet without permission.

onesearch, search engine, verizon, (12 more...)

Country: North America > United States > New York (0.06)

Industry: Information Technology > Networks (1.00)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.89)

Modeling Product Search Relevance in e-Commerce

Iyer, Rahul Radhakrishnan, Kohli, Rohan, Prabhumoye, Shrimai

arXiv.org Machine Learning | Jan-14-2020

With the rapid growth of e-Commerce, online product search has emerged as a popular and effective paradigm for customers to find desired products and engage in online shopping. However, there is still a big gap between the products that customers really desire to purchase and relevance of products that are suggested in response to a query from the customer. In this paper, we propose a robust way of predicting relevance scores given a search query and a product, using techniques involving machine learning, natural language processing and information retrieval. We compare conventional information retrieval models such as BM25 and Indri with deep learning models such as word2vec, sentence2vec and paragraph2vec. We share some of our insights and findings from our experiments.
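For readers unfamiliar with BM25, one of the conventional baselines named in this abstract, the following is a minimal sketch of Okapi BM25 scoring over a toy product list. It is not the authors' implementation; the function name `bm25_scores`, the product titles, the whitespace tokenization, and the parameter values `k1=1.5` and `b=0.75` are illustrative assumptions.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenized document against a tokenized query with Okapi BM25."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Smoothed IDF; the "1 +" inside the log keeps it non-negative.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation with document-length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

# Hypothetical product titles, tokenized on whitespace.
products = [
    "red running shoes for men".split(),
    "blue cotton shirt".split(),
    "running shorts".split(),
]
scores = bm25_scores("running shoes".split(), products)
ranked = sorted(range(len(products)), key=lambda i: scores[i], reverse=True)
```

Here `ranked` orders product indices from most to least relevant; the product matching both query terms ranks first, and a product sharing no query terms scores zero. Embedding-based models such as those compared in the paper would instead score by vector similarity, which can match products that share no exact terms with the query.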

arxiv preprint arxiv, relevance score, search term, (13 more...)

2001.0498

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.05)
North America > United States > New Jersey > Middlesex County > Piscataway (0.04)
North America > Canada (0.04)
Asia (0.04)

Genre: Research Report (0.82)

Industry:

Information Technology > Services > e-Commerce Services (0.91)
Retail (0.66)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.94)
(2 more...)

Humpty Dumpty: Controlling Word Meanings via Corpus Poisoning

Schuster, Roei, Schuster, Tal, Meri, Yoav, Shmatikov, Vitaly

arXiv.org Machine Learning | Jan-14-2020

Word embeddings, i.e., low-dimensional vector representations such as GloVe and SGNS, encode word "meaning" in the sense that distances between words' vectors correspond to their semantic proximity. This enables transfer learning of semantics for a variety of natural language processing tasks. Word embeddings are typically trained on large public corpora such as Wikipedia or Twitter. We demonstrate that an attacker who can modify the corpus on which the embedding is trained can control the "meaning" of new and existing words by changing their locations in the embedding space. We develop an explicit expression over corpus features that serves as a proxy for distance between words and establish a causative relationship between its values and embedding distances. We then show how to use this relationship for two adversarial objectives: (1) make a word a top-ranked neighbor of another word, and (2) move a word from one semantic cluster to another. An attack on the embedding can affect diverse downstream tasks, demonstrating for the first time the power of data poisoning in transfer learning scenarios. We use this attack to manipulate query expansion in information retrieval systems such as resume search, make certain names more or less visible to named entity recognition models, and cause new words to be translated to a particular target word regardless of the language. Finally, we show how the attacker can generate linguistically likely corpus modifications, thus fooling defenses that attempt to filter implausible sentences from the corpus using a language model.

attacker, objective, proximity, (16 more...)

2001.04935

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Asia > Middle East > Israel > Tel Aviv District > Tel Aviv (0.04)

Genre:

Research Report (0.64)
Workflow (0.46)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.92)

Search engine for Japanese sex hotels announces security breach | ZDNet

#artificialintelligence | Jan-11-2020, 05:39:24 GMT

HappyHotel, a Japanese search engine for finding and booking rooms in "love hotels," disclosed a security breach at the end of last year. Love hotels are hotels built and operated primarily for allowing guests privacy for sexual activities. Love hotels, also known as sex hotels, are used by both married couples and cheating spouses, alike, and are found all over the world, but they are particularly popular in East Asia, and especially Japan. HappyHotel.jp is a website that operates similarly to Booking.com, but lets registered users search and book rooms in love hotels across Japan. In a message posted on its website, Almex, the company behind the service, said it detected unauthorized access to its servers on December 22, last year.

hotel, hotel announce security breach zdnet, website, (10 more...)

Country:

Asia > Japan (0.51)
Asia > East Asia (0.26)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Information Management > Search (0.65)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.65)

Inductive Document Network Embedding with Topic-Word Attention

Brochier, Robin, Guille, Adrien, Velcin, Julien

arXiv.org Machine Learning | Jan-10-2020

Document network embedding aims at learning representations for a structured text corpus, i.e., when documents are linked to each other. Recent algorithms extend network embedding approaches by incorporating the text content associated with the nodes in their formulations. In most cases, it is hard to interpret the learned representations. Moreover, little importance is given to the generalization to new documents that are not observed within the network. In this paper, we propose an interpretable and inductive document network embedding method. We introduce a novel mechanism, the Topic-Word Attention (TWA), that generates document representations based on the interplay between word and topic representations. We train these word and topic vectors through our general model, Inductive Document Network Embedding (IDNE), by leveraging the connections in the document network. Quantitative evaluations show that our approach achieves state-of-the-art performance on various networks, and we qualitatively show that our model produces meaningful and interpretable representations of the words, topics and documents.

inductive document network embedding, representation, vector, (12 more...)

2001.03369

Country: Europe > Portugal > Lisbon > Lisbon (0.04)

Genre: Research Report (0.65)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.68)