Information Retrieval
Dynamic faceted search: from haystack to highlight
In the digital age, the number of scholarly articles is growing exponentially. In the Open Research Knowledge Graph's question-answering facility ASK, for example, more than 80 million research articles have already been indexed. Finding the most relevant information in such vast collections of scholarly data can be daunting for researchers, students, and academics. To tackle this challenge, search engines and digital libraries often rely on advanced search techniques, one of the most effective being faceted search, which lets users filter and refine search results along multiple predefined attributes, known as facets.
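As a toy illustration of the idea, the sketch below counts facet values and narrows a result set by the selected facets; the paper records and facet names are invented for the example and are not the ORKG ASK schema.

```python
from collections import Counter

# Toy corpus: each record carries the facet attributes it can be filtered on.
# Field names and values are illustrative only.
papers = [
    {"title": "Neural ranking models", "year": 2021, "field": "IR", "access": "open"},
    {"title": "Topic models revisited", "year": 2020, "field": "NLP", "access": "closed"},
    {"title": "Graph embeddings survey", "year": 2021, "field": "ML", "access": "open"},
]

def facet_counts(docs, facet):
    """Count how many documents fall under each value of a facet."""
    return Counter(doc[facet] for doc in docs)

def apply_facets(docs, selections):
    """Keep only documents matching every selected facet value."""
    return [d for d in docs if all(d[f] == v for f, v in selections.items())]

print(facet_counts(papers, "year"))                       # Counter({2021: 2, 2020: 1})
print(apply_facets(papers, {"year": 2021, "access": "open"}))
```

In a real system the counts would come back alongside the result list, so users see how many hits each facet value would leave before clicking it.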
Perplexity now lets you buy stuff directly in the AI search engine
Perplexity searches are now shoppable with a Perplexity Pro subscription. Today, the AI-driven search engine unveiled "Buy with Pro," a shopping assistant within Perplexity searches that gives you a one-click shopping experience. If you're researching products or running a shopping-related query, Perplexity includes products in its responses that can be bought right on the page, as long as your payment info is saved in the app. Search results include relevant product info to inform your purchases by comparing prices and features. To sweeten the deal, Perplexity is also offering free shipping. Buy with Pro is available with a Perplexity Pro subscription, which costs $20 a month.
The Death of Search
For nearly two years, the world's biggest tech companies have said that AI will transform the web, your life, and the world. But first, they are remaking the humble search engine. Chatbots and search, in theory, are a perfect match. A standard Google search interprets a query and pulls up relevant results; tech companies have spent tens or hundreds of millions of dollars engineering chatbots that interpret human inputs, synthesize information, and provide fluent, useful responses. No more keyword refining or scouring Wikipedia--ChatGPT will do it all.
How to replace Google with ChatGPT Search as your default search engine
ChatGPT Search is here, challenging Google's dominance in the search engine realm. Curious whether ChatGPT Search can replace Google, a search engine that has captured nearly 90 percent of the market? Well, there's a way to find out: make the OpenAI tool your default search option. To set ChatGPT Search as your default search engine, you'll need the Google Chrome browser and an extension from the Chrome Web Store.
Watch out, Google: Meta is reportedly working on an AI-powered search engine
Meta is throwing its hat into the AI search engine ring. According to a report from The Information, Meta is working on an AI-powered search engine to power its Meta AI feature on Facebook, Instagram, WhatsApp, and Messenger. Currently, the Meta AI search function uses Google Search and Microsoft's Bing to surface real-time information about news, sports, and stocks. Meta also recently struck a deal with Reuters to provide news reports in Meta AI responses. But Meta's reported development of its own search engine is an effort to rely less on its competitors' search engines, especially if the circumstances of those arrangements change.
Knowledge-Aware Bayesian Deep Topic Model
We propose a Bayesian generative model for incorporating prior domain knowledge into hierarchical topic modeling. Although embedded topic models (ETMs) and their variants have achieved promising performance in text analysis, they mainly focus on mining word co-occurrence patterns, ignoring potentially easy-to-obtain prior topic hierarchies that could help enhance topic coherence. Several knowledge-based topic models have recently been proposed, but they are either only applicable to shallow hierarchies or sensitive to the quality of the provided prior knowledge. To address these limitations, we develop a novel deep ETM that jointly models the documents and the given prior knowledge by embedding the words and topics into the same space. Guided by the provided domain knowledge, the proposed model tends to discover topic hierarchies that are organized into interpretable taxonomies.
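For readers unfamiliar with the ETM family, here is a minimal NumPy sketch of the shared-embedding idea the abstract builds on: a topic's word distribution comes from inner products between its embedding and the word embeddings in the same space. The shapes and random initialization are illustrative only, not the paper's model.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, num_topics, dim = 1000, 10, 50

word_emb = rng.normal(size=(vocab_size, dim))   # one vector per word (rho in ETM notation)
topic_emb = rng.normal(size=(num_topics, dim))  # one vector per topic (alpha in ETM notation)

# Topic-word distributions: softmax over topic-word inner products.
logits = topic_emb @ word_emb.T                 # (num_topics, vocab_size)
beta = np.exp(logits - logits.max(axis=1, keepdims=True))
beta /= beta.sum(axis=1, keepdims=True)         # each row: p(word | topic)

# Prior knowledge (e.g., a topic taxonomy) could then be injected by tying or
# regularizing child-topic embeddings toward their parents in the hierarchy.
```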
Debiased and Denoised Entity Recognition from Distant Supervision
While distant supervision has been extensively explored and exploited in NLP tasks like named entity recognition, a major obstacle stems from the inevitably noisy distant labels, which are assigned without human supervision. A few past works approach this problem by adopting a self-training framework with a sample-selection mechanism. In this work, we identify two types of biases overlooked by prior work, both of which lead to inferior performance in the distantly supervised NER setup. First, we characterize the noise concealed in the distant labels as highly structural rather than fully randomized. Second, the self-training framework introduces an inherent bias that causes erroneous behavior in both sample selection and, ultimately, prediction.
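As a rough picture of where that structural noise comes from, here is a toy sketch of distant labeling by dictionary matching; the gazetteer and sentence are invented. Entities missing from the dictionary silently become "O" labels, which is exactly the kind of non-random noise the abstract describes.

```python
# Tiny illustrative gazetteer; real distant supervision matches against large knowledge bases.
gazetteer = {"Paris": "LOC", "Google": "ORG"}

def distant_label(tokens):
    """Tag tokens found in the gazetteer; everything else falls back to 'O'.
    Entities absent from the gazetteer become false 'O' labels, so the noise
    is structural (systematic false negatives) rather than random."""
    return [(tok, gazetteer.get(tok, "O")) for tok in tokens]

print(distant_label(["Google", "opened", "an", "office", "in", "Lyon"]))
# [('Google', 'ORG'), ..., ('Lyon', 'O')]  <- 'Lyon' is a missed entity, not random noise
```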
Overlapping Spaces for Compact Graph Representations
Various non-trivial spaces are becoming popular for embedding structured data such as graphs, texts, or images. Following spherical and hyperbolic spaces, more general product spaces have been proposed. However, searching for the best configuration of a product space is a resource-intensive procedure, which reduces the practical applicability of the idea. We generalize the concept of product space and introduce an overlapping space that does not have the configuration search problem. The main idea is to allow subsets of coordinates to be shared between spaces of different types (Euclidean, hyperbolic, spherical).
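A minimal NumPy sketch of the product-space distance being generalized, under illustrative assumptions: squared component distances add up, the hyperbolic part uses the Poincaré-ball distance, and in an overlapping space the coordinate index sets of the components are allowed to share entries instead of partitioning the vector.

```python
import numpy as np

def euclidean_dist(u, v):
    return np.linalg.norm(u - v)

def poincare_dist(u, v, eps=1e-9):
    """Distance in the Poincare-ball model of hyperbolic space (points inside the unit ball)."""
    sq = np.sum((u - v) ** 2)
    denom = (1 - np.sum(u ** 2)) * (1 - np.sum(v ** 2)) + eps
    return np.arccosh(1 + 2 * sq / denom)

def product_dist(x, y, euclid_idx, hyper_idx):
    """Product-space distance: combine component distances. In an overlapping space,
    euclid_idx and hyper_idx may share coordinates rather than partitioning them."""
    d_e = euclidean_dist(x[euclid_idx], y[euclid_idx])
    d_h = poincare_dist(x[hyper_idx], y[hyper_idx])
    return np.sqrt(d_e ** 2 + d_h ** 2)

x = np.array([0.10, 0.20, 0.05, 0.00])
y = np.array([0.30, -0.10, 0.10, 0.02])
print(product_dist(x, y, euclid_idx=[0, 1], hyper_idx=[1, 2, 3]))  # index sets overlap on coordinate 1
```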
TweetNERD - End to End Entity Linking Benchmark for Tweets
Named Entity Recognition and Disambiguation (NERD) systems are foundational for information retrieval, question answering, event detection, and other natural language processing (NLP) applications. We introduce TweetNERD, a dataset of 340K Tweets spanning 2010-2021, for benchmarking NERD systems on Tweets. It is the largest and most temporally diverse open-sourced benchmark dataset for NERD on Tweets and can be used to facilitate research in this area. We describe the evaluation setup with TweetNERD for three NERD tasks: Named Entity Recognition (NER), Entity Linking with True Spans (EL), and End-to-End Entity Linking (End2End), and provide the performance of existing publicly available methods on specific TweetNERD splits.
An ensemble diversity approach to supervised binary hashing
Binary hashing is a well-known approach for fast approximate nearest-neighbor search in information retrieval. Much work has focused on affinity-based objective functions involving the hash functions or binary codes. These objective functions encode neighborhood information between data points and are often inspired by manifold learning algorithms. They ensure that the hash functions differ from each other through constraints or penalty terms that encourage codes to be orthogonal or dissimilar across bits, but this couples the binary variables and complicates the already difficult optimization. We propose a much simpler approach: we train each hash function (or bit) independently of the others, but introduce diversity among them using techniques from classifier ensembles.
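To make the contrast concrete, here is a hedged NumPy sketch, not the paper's supervised method: each bit is fit on its own bootstrap sample and random feature subset (bagging-style ensemble diversity), with no coupling constraints between bits. The projection per bit is an unsupervised PCA-like direction chosen purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 32))              # toy data: 500 points, 32 features
num_bits, feats_per_bit = 16, 16

hash_funcs = []
for _ in range(num_bits):
    rows = rng.integers(0, len(X), size=len(X))                       # bootstrap sample
    cols = rng.choice(X.shape[1], size=feats_per_bit, replace=False)  # random feature subset
    sub = X[rows][:, cols]
    w = np.linalg.svd(sub - sub.mean(0), full_matrices=False)[2][0]   # leading direction of the subsample
    b = -np.median(X[:, cols] @ w)                                    # threshold chosen to balance the bit
    hash_funcs.append((cols, w, b))

def encode(Z):
    """Concatenate the independently trained bits into binary codes."""
    return np.stack([(Z[:, cols] @ w + b > 0) for cols, w, b in hash_funcs],
                    axis=1).astype(np.uint8)

codes = encode(X)   # (500, 16) binary codes; Hamming distance between codes approximates neighborhoods
```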