Information Retrieval
The Infinite Index: Information Retrieval on Generative Text-To-Image Models
Deckers, Niklas, Fröbe, Maik, Kiesel, Johannes, Pandolfo, Gianluca, Schröder, Christopher, Stein, Benno, Potthast, Martin
Conditional generative models such as DALL-E and Stable Diffusion generate images based on a user-defined text, the prompt. Finding and refining prompts that produce a desired image has become the art of prompt engineering. Generative models do not provide a built-in retrieval model for a user's information need expressed through prompts. In light of an extensive literature review, we reframe prompt engineering for generative models as interactive text-based retrieval on a novel kind of "infinite index". We apply these insights for the first time in a case study on image generation for game design with an expert. Finally, we envision how active learning may help to guide the retrieval of generated images.
The Recent Advances in Automatic Term Extraction: A survey
Tran, Hanh Thi Hong, Martinc, Matej, Caporusso, Jaya, Doucet, Antoine, Pollak, Senja
Automatic term extraction (ATE) is a Natural Language Processing (NLP) task that eases the effort of manually identifying terms from domain-specific corpora by providing a list of candidate terms. As units of knowledge in a specific field of expertise, extracted terms are not only beneficial for several terminographical tasks, but also support and improve several complex downstream tasks, e.g., information retrieval, machine translation, topic detection, and sentiment analysis. ATE systems, along with annotated datasets, have been studied and developed widely for decades, but recently we observed a surge in novel neural systems for the task at hand. Despite a large amount of new research on ATE, systematic survey studies covering novel neural approaches are lacking. We present a comprehensive survey of deep learning-based approaches to ATE, with a focus on Transformer-based neural models. The study also offers a comparison between these systems and previous ATE approaches, which were based on feature engineering and non-neural supervised learning algorithms.
9 "Best" SEO Tools (January 2023) - Channel969
SEO (Search Engine Optimization) requires a multifaceted strategy that includes researching the competition, analyzing which keywords are capable of driving traffic, creating an external and internal link-building strategy, and optimizing page loading speed. Below we feature the best SEO tools to increase your odds of ranking high in Google. This powerful SEO platform offers a range of tools that replace the functionality of other products, including Google Trends, MOZ, Hootsuite and SimilarWeb. Traffic Analysis – Benchmark your website traffic against competitors to see where you stand. See their estimated total traffic, top traffic sources, bounce rate, time on page, and more to inform your next strategy.
Microsoft's ChatGPT investment could create 'game-changer' AI
Microsoft (MSFT) is going all in on ChatGPT, an artificial intelligence (AI) technology that could power a new search engine and disrupt the dominance of Google (GOOG). News site Semafor reported on Tuesday that Microsoft is investing $10bn (£8.2bn) in OpenAI, the artificial intelligence firm that launched the generative AI tool ChatGPT in November 2022. This will value the San Francisco-based firm at $29bn, and industry analysts say that Google should pay close attention to the deal. Microsoft spends billions of dollars every year trying to compete with Google's search engine dominance, but with comparatively low user interaction on Bing, it has failed for over a decade. Microsoft has so far failed to replicate the algorithm that powers Google search, but if it incorporates the generative power of ChatGPT into Bing, or into a new search engine, this could be "a game changer", an industry commentator has suggested.
Heuristic for Diverse Kemeny Rank Aggregation based on Quantum Annealing
Fiergolla, Sven, Goergen, Kevin, Neises, Patrick, Wolf, Petra
The Kemeny Rank Aggregation (KRA) problem is a well-studied problem in the field of Social Choice with applications in many different areas, such as databases and search engines. Intuitively, given a set of votes over a set of candidates, the problem asks for an aggregated ranking of candidates that minimizes the overall dissatisfaction with respect to the votes. Recently, a diverse version of KRA was considered, which asks for a sufficiently diverse set of sufficiently good solutions. The framework of diversity of solutions is a young and thriving topic in the field of artificial intelligence. The main idea is to provide the user with not just one solution but a set of different solutions, enabling her to pick a sufficiently good solution that satisfies additional subjective criteria that are hard or impossible to model. In this work, we use a quantum annealer to solve the KRA problem and to compute a representative set of solutions. Quantum annealing is a meta search heuristic that not only shows promising runtime behavior on currently existing prototypes but also samples the solution space in an inherently different way, making use of quantum effects. We describe how KRA instances can be solved by a quantum annealer and provide an implementation as well as experimental evaluations. As existing quantum annealers are still restricted in their number of qubits, we further implement two different data reduction rules that can split an instance into a set of smaller instances. In our evaluation, we compare classical heuristics that allow sampling multiple solutions, such as simulated annealing and local search, with quantum annealing performed on a physical quantum annealer. We compare runtime, quality of solution, and diversity of solutions, with and without applying preceding data reduction rules.
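To make the KRA objective concrete, the following is a minimal sketch that assumes "dissatisfaction" is measured by the Kendall tau distance (the number of candidate pairs on which two rankings disagree), which is the standard definition of a Kemeny score. It uses exhaustive search over all permutations, not quantum annealing or the data reduction rules from the abstract, so it is only practical for a handful of candidates. The function names are hypothetical and chosen for illustration.

```python
from itertools import combinations, permutations


def kendall_tau(a, b):
    """Count candidate pairs that rankings a and b order differently."""
    pos_a = {c: i for i, c in enumerate(a)}
    pos_b = {c: i for i, c in enumerate(b)}
    return sum(
        1
        for x, y in combinations(a, 2)
        if (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y]) < 0
    )


def kemeny_score(ranking, votes):
    """Total disagreement of one ranking with all votes (lower is better)."""
    return sum(kendall_tau(ranking, vote) for vote in votes)


def kemeny_rankings(votes, k=1):
    """Return the k lowest-scoring rankings as (score, ranking) pairs.

    Brute force over all permutations: an illustrative baseline only.
    """
    candidates = sorted(votes[0])
    scored = sorted(
        (kemeny_score(perm, votes), perm) for perm in permutations(candidates)
    )
    return scored[:k]


votes = [("a", "b", "c"), ("a", "b", "c"), ("b", "a", "c")]
print(kemeny_rankings(votes, k=2))
```

Returning the `k` best rankings is only a crude stand-in for the diverse variant, which additionally requires the returned solutions to differ sufficiently from one another.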
KAER: A Knowledge Augmented Pre-Trained Language Model for Entity Resolution
Fang, Liri, Li, Lan, Liu, Yiren, Torvik, Vetle I., Ludäscher, Bertram
Entity resolution has been an essential and well-studied task in data cleaning research for decades. Existing work has discussed the feasibility of utilizing pre-trained language models to perform entity resolution and achieved promising results. However, few works have discussed injecting domain knowledge to improve the performance of pre-trained language models on entity resolution tasks. In this study, we propose Knowledge Augmented Entity Resolution (KAER), a novel framework for augmenting pre-trained language models with external knowledge for entity resolution. We discuss the results of utilizing different knowledge augmentation and prompting methods to improve entity resolution performance. Our model improves on Ditto, the existing state-of-the-art entity resolution method. In particular, 1) KAER performs more robustly and achieves better results on "dirty data", 2) with more general knowledge injection, KAER outperforms the existing baseline models on the textual dataset and on the dataset from the online product domain, and 3) KAER achieves competitive results on highly domain-specific datasets, such as citation datasets, which will require the injection of expert knowledge in future work.
An introduction to NLP and its importance in today's technology landscape
Natural Language Processing (NLP) is a branch of artificial intelligence (AI) that deals with the interaction between computers and human language. The goal of NLP is to develop algorithms and models that can understand, interpret, and generate human language. NLP has a wide range of applications, from language translation to sentiment analysis, and is critical in today's technology landscape. One of the most notable areas where NLP has had a significant impact is in the field of search engines. Search engines use NLP algorithms to understand the intent behind a user's query and match it with relevant results. NLP also plays a crucial role in information retrieval, which is the process of finding relevant information from a large collection of documents.
The Dangers Of ChatGPT To Search Engines - AI Summary
LLMs are great, but they won't replace search engines anytime soon. The biggest reason is that chat-based search interfaces lack the context and flexibility that users expect and need from a search engine. Chatbots and LLMs are great. Understanding user intent is fantastic. But search is here to stay.
How Data Scientists Review the Scholarly Literature
Mysore, Sheshera, Jasim, Mahmood, Song, Haoru, Akbar, Sarah, Randall, Andre Kenneth Chase, Mahyar, Narges
Keeping up with the research literature plays an important role in the workflow of scientists, allowing them to understand a field, formulate the problems they focus on, and develop the solutions that they contribute, which in turn shape the nature of the discipline. In this paper, we examine the literature review practices of data scientists. Data science represents a field seeing an exponential rise in papers, and increasingly drawing on and being applied in numerous diverse disciplines. Recent efforts have seen the development of several tools intended to help data scientists cope with a deluge of research and coordinated efforts to develop AI tools intended to uncover the research frontier. Despite these trends indicative of the information overload faced by data scientists, no prior work has examined the specific practices and challenges faced by these scientists in an interdisciplinary field with evolving scholarly norms. In this paper, we close this gap through a set of semi-structured interviews and think-aloud protocols of industry and academic data scientists (N = 20). Our results, while corroborating other knowledge workers' practices, uncover several novel findings: individuals (1) are challenged in seeking and sensemaking of papers beyond their disciplinary bubbles, (2) struggle to understand papers in the face of missing details and mathematical content, (3) grapple with the deluge by leveraging the knowledge context in code, blogs, and talks, and (4) lean on their peers online and in-person. Furthermore, we outline future directions likely to help data scientists cope with the burgeoning research literature.
Cross-Model Comparative Loss for Enhancing Neuronal Utility in Language Understanding
Zhu, Yunchang, Pang, Liang, Wu, Kangxi, Lan, Yanyan, Shen, Huawei, Cheng, Xueqi
Current natural language understanding (NLU) models have been continuously scaling up, both in terms of model size and input context, introducing more hidden and input neurons. While this generally improves performance on average, the extra neurons do not yield a consistent improvement for all instances. This is because some hidden neurons are redundant, and the noise mixed in input neurons tends to distract the model. Previous work mainly focuses on extrinsically reducing low-utility neurons by additional post- or pre-processing, such as network pruning and context selection, to avoid this problem. Beyond that, can we make the model reduce redundant parameters and suppress input noise by intrinsically enhancing the utility of each neuron? If a model can efficiently utilize neurons, no matter which neurons are ablated (disabled), the ablated submodel should perform no better than the original full model. Based on such a comparison principle between models, we propose a cross-model comparative loss for a broad range of tasks. Comparative loss is essentially a ranking loss on top of the task-specific losses of the full and ablated models, with the expectation that the task-specific loss of the full model is minimal. We demonstrate the universal effectiveness of comparative loss through extensive experiments on 14 datasets from 3 distinct NLU tasks based on 4 widely used pretrained language models, and find it particularly superior for models with few parameters or long input.
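The idea of a ranking loss on top of task-specific losses can be illustrated with a small sketch. The abstract does not give the exact formulation, so the hinge form below, the `margin` parameter, and the function name are assumptions made for illustration: the full model's task loss is kept, and a penalty is added whenever an ablated submodel achieves a lower task loss than the full model.

```python
def comparative_loss(losses, margin=0.0):
    """Assumed hinge-style comparative loss (illustrative, not the paper's exact form).

    losses[0]  -- task-specific loss of the full model
    losses[1:] -- task-specific losses of ablated submodels
    """
    full = losses[0]
    # Penalize each ablated submodel that beats the full model (plus margin).
    ranking_penalty = sum(
        max(0.0, full - ablated + margin) for ablated in losses[1:]
    )
    return full + ranking_penalty


# Full model is already best: no penalty is added.
print(comparative_loss([0.5, 0.7, 0.9]))
# One ablated submodel beats the full model: the gap is added as a penalty.
print(comparative_loss([0.5, 0.25, 1.0]))
```

When the full model already has the lowest loss, the penalty vanishes and only the ordinary task loss remains, which matches the stated expectation that the full model's task-specific loss is minimal.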