Goto

Collaborating Authors

 Information Retrieval


Important digital skill-what are the basics of SEO(search engine optimization)?

#artificialintelligence

With the growth of the internet, the world is becoming digital. Millions of websites are being created every day and digital content is being uploaded to them. Every site's purpose is to reach its potential audience at an earlier base. The search Engine Optimization (SEO) technique is used for this purpose. SEO is the method by which it brings organic traffic to your website.


Consistent Polyhedral Surrogates for Top-$k$ Classification and Variants

arXiv.org Artificial Intelligence

Top-$k$ classification is a generalization of multiclass classification used widely in information retrieval, image classification, and other extreme classification settings. Several hinge-like (piecewise-linear) surrogates have been proposed for the problem, yet all are either non-convex or inconsistent. For the proposed hinge-like surrogates that are convex (i.e., polyhedral), we apply the recent embedding framework of Finocchiaro et al. (2019; 2022) to determine the prediction problem for which the surrogate is consistent. These problems can all be interpreted as variants of top-$k$ classification, which may be better aligned with some applications. We leverage this analysis to derive constraints on the conditional label distributions under which these proposed surrogates become consistent for top-$k$. It has been further suggested that every convex hinge-like surrogate must be inconsistent for top-$k$. Yet, we use the same embedding framework to give the first consistent polyhedral surrogate for this problem.


MIA 2022 Shared Task Submission: Leveraging Entity Representations, Dense-Sparse Hybrids, and Fusion-in-Decoder for Cross-Lingual Question Answering

arXiv.org Artificial Intelligence

We describe our two-stage system for the Multilingual Information Access (MIA) 2022 Shared Task on Cross-Lingual Open-Retrieval Question Answering. The first stage consists of multilingual passage retrieval with a hybrid dense and sparse retrieval strategy. The second stage consists of a reader which outputs the answer from the top passages returned by the first stage. We show the efficacy of using a multilingual language model with entity representations in pretraining, sparse retrieval signals to help dense retrieval, and Fusion-in-Decoder. On the development set, we obtain 43.46 F1 on XOR-TyDi QA and 21.99 F1 on MKQA, for an average F1 score of 32.73. On the test set, we obtain 40.93 F1 on XOR-TyDi QA and 22.29 F1 on MKQA, for an average F1 score of 31.61. We improve over the official baseline by over 4 F1 points on both the development and test sets.


Heuristic-free Optimization of Force-Controlled Robot Search Strategies in Stochastic Environments

arXiv.org Artificial Intelligence

In both industrial and service domains, a central benefit of the use of robots is their ability to quickly and reliably execute repetitive tasks. However, even relatively simple peg-in-hole tasks are typically subject to stochastic variations, requiring search motions to find relevant features such as holes. While search improves robustness, it comes at the cost of increased runtime: More exhaustive search will maximize the probability of successfully executing a given task, but will significantly delay any downstream tasks. This trade-off is typically resolved by human experts according to simple heuristics, which are rarely optimal. This paper introduces an automatic, data-driven and heuristic-free approach to optimize robot search strategies. By training a neural model of the search strategy on a large set of simulated stochastic environments, conditioning it on few real-world examples and inverting the model, we can infer search strategies which adapt to the time-variant characteristics of the underlying probability distributions, while requiring very few real-world measurements. We evaluate our approach on two different industrial robots in the context of spiral and probe search for THT electronics assembly.


You.com raises $25M to fuel its AI-powered search engine – TechCrunch

#artificialintelligence

At least, that's the crux of the argument Richard Socher, the former chief scientist at Salesforce, likes to make. In 2020, Socher co-founded You, a search engine that uses AI to understand search queries, rank the results and parse the queries into different languages (including programming languages). You summarizes information from across the web and offers built-in apps, like search tools for Twitter, that allow users to complete tasks without having to leave the results page. It seems there's some truth to his words. Socher claims that You has hundreds of thousands of users, with 70% growth in sign-ups last month and 30% growth in unique searches month over month.


GrabQC: Graph based Query Contextualization for automated ICD coding

arXiv.org Artificial Intelligence

Automated medical coding is a process of codifying clinical notes to appropriate diagnosis and procedure codes automatically from the standard taxonomies such as ICD (International Classification of Diseases) and CPT (Current Procedure Terminology). The manual coding process involves the identification of entities from the clinical notes followed by querying a commercial or non-commercial medical codes Information Retrieval (IR) system that follows the Centre for Medicare and Medicaid Services (CMS) guidelines. We propose to automate this manual process by automatically constructing a query for the IR system using the entities auto-extracted from the clinical notes. We propose \textbf{GrabQC}, a \textbf{Gra}ph \textbf{b}ased \textbf{Q}uery \textbf{C}ontextualization method that automatically extracts queries from the clinical text, contextualizes the queries using a Graph Neural Network (GNN) model and obtains the ICD Codes using an external IR system. We also propose a method for labelling the dataset for training the model. We perform experiments on two datasets of clinical text in three different setups to assert the effectiveness of our approach. The experimental results show that our proposed method is better than the compared baselines in all three settings.


Parameter-Efficient Prompt Tuning Makes Generalized and Calibrated Neural Text Retrievers

arXiv.org Artificial Intelligence

Prompt tuning attempts to update few task-specific parameters in pre-trained models. It has achieved comparable performance to fine-tuning of the full parameter set on both language understanding and generation tasks. In this work, we study the problem of prompt tuning for neural text retrievers. We introduce parameter-efficient prompt tuning for text retrieval across in-domain, cross-domain, and cross-topic settings. Through an extensive analysis, we show that the strategy can mitigate the two issues -- parameter-inefficiency and weak generalizability -- faced by fine-tuning based retrieval methods. Notably, it can significantly improve the out-of-domain zero-shot generalization of the retrieval models. By updating only 0.1% of the model parameters, the prompt tuning strategy can help retrieval models achieve better generalization performance than traditional methods in which all parameters are updated. Finally, to facilitate research on retrievers' cross-topic generalizability, we curate and release an academic retrieval dataset with 18K query-results pairs in 87 topics, making it the largest topic-specific one to date.


Paid search ways you'll want to maximize ROI in a good financial system - Channel969

#artificialintelligence

As we enter financial uncertainty, many leaders are on the lookout for methods to take advantage of out of their advertising and marketing budgets. Paired with buyer conduct turning into more and more turbulent and fewer predictable, you at the moment are confronted with maximizing your ROI. We'll spotlight case research of shoppers who've already reaped the advantage of sharpening their paid search technique together with Sage and the way they achieved a 75% lower in CPCs. Register right now for "Paid Search Ways You Have to Maximize ROI in a Tight Economic system," offered by Adthena. Cynthia Ramsaran is director of customized content material at Third Door Media, publishers of Search Engine Land and MarTech.


ParaNames: A Massively Multilingual Entity Name Corpus

arXiv.org Artificial Intelligence

We introduce ParaNames, a multilingual parallel name resource consisting of 118 million names spanning across 400 languages. Names are provided for 13.6 million entities which are mapped to standardized entity types (PER/LOC/ORG). Using Wikidata as a source, we create the largest resource of this type to-date. We describe our approach to filtering and standardizing the data to provide the best quality possible. ParaNames is useful for multilingual language processing, both in defining tasks for name translation/transliteration and as supplementary data for tasks such as named entity recognition and linking. We demonstrate an application of ParaNames by training a multilingual model for canonical name translation to and from English. Our resource is released under a Creative Commons license (CC BY 4.0) at https://github.com/bltlab/paranames.


Measurement and applications of position bias in a marketplace search engine

arXiv.org Artificial Intelligence

Search engines intentionally influence user behavior by picking and ranking the list of results. Users engage with the highest results both because of their prominent placement and because they are typically the most relevant documents. Search engine ranking algorithms need to identify relevance while incorporating the influence of the search engine itself. This paper describes our efforts at Thumbtack to understand the impact of ranking, including the empirical results of a randomization program. In the context of a consumer marketplace we discuss practical details of model choice, experiment design, bias calculation, and machine learning model adaptation. We include a novel discussion of how ranking bias may not only affect labels, but also model features. The randomization program led to improved models, motivated internal scenario analysis, and enabled user-facing scenario tooling.