Information Retrieval
Learning to Search Efficiently Using Comparisons
Chumbalov, Daniyar, Maystre, Lucas, Grossglauser, Matthias
We consider the problem of searching in a set of items by using pairwise comparisons. We aim to locate a target item $t$ by asking an oracle questions of the form "Which item from the pair $(i,j)$ is more similar to $t$?". We assume a blind setting, where no item features are available to guide the search process; only the oracle sees the features in order to generate an answer. Previous approaches to this problem either assume noiseless answers or scale poorly in the number of items, both of which preclude practical applications. In this paper, we present a new scalable learning framework called learn2search that performs efficient comparison-based search on a set of items despite the presence of noise in the answers. Items live in a space of latent features, and we posit a probabilistic model for the oracle comparing two items $i$ and $j$ with respect to a target $t$. Our algorithm maintains its own representation of the space of items, which it learns incrementally based on past searches. We evaluate the performance of learn2search on both synthetic and real-world data, and show that it learns to search more and more efficiently, over time matching the performance of a scheme with access to the item features.
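The abstract does not spell out the oracle model, so the following is only a minimal sketch under stated assumptions: the oracle answers according to a logistic function of the difference in squared Euclidean distances to the target, and the searcher keeps a Bayesian belief over which item is the target. The function names, the `scale` parameter, and the toy coordinates are all hypothetical and are not taken from the paper.

```python
import math

def prob_i_closer(x_i, x_j, x_t, scale=1.0):
    """Assumed oracle model: probability that the oracle says item i
    is more similar to target t than item j (logistic in the difference
    of squared distances; this form is an assumption, not the paper's)."""
    d_i = sum((a - b) ** 2 for a, b in zip(x_i, x_t))
    d_j = sum((a - b) ** 2 for a, b in zip(x_j, x_t))
    return 1.0 / (1.0 + math.exp(-scale * (d_j - d_i)))

def update_posterior(posterior, items, i, j, answer_is_i, scale=1.0):
    """One Bayes update of the belief over which item is the target,
    given the oracle's (possibly noisy) answer to the query (i, j)."""
    weights = []
    for t, p_t in enumerate(posterior):
        p = prob_i_closer(items[i], items[j], items[t], scale)
        weights.append(p_t * (p if answer_is_i else 1.0 - p))
    total = sum(weights)
    return [w / total for w in weights]

# Toy item coordinates standing in for the searcher's own representation.
items = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (3.0, 3.0)]
posterior = [0.25] * len(items)

# The oracle twice prefers item 3 over a competitor.
posterior = update_posterior(posterior, items, 0, 3, answer_is_i=False)
posterior = update_posterior(posterior, items, 1, 3, answer_is_i=False)
best = max(range(len(items)), key=lambda t: posterior[t])
```

In the blind setting described above, the coordinates in `items` would be the algorithm's learned representation rather than the true features; the sketch only illustrates how noisy answers can shift a belief over candidate targets.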
AI Enabling Technologies: A Survey
Gadepally, Vijay, Goodwin, Justin, Kepner, Jeremy, Reuther, Albert, Reynolds, Hayley, Samsi, Siddharth, Su, Jonathan, Martinez, David
Artificial Intelligence (AI) has the opportunity to revolutionize the way the United States Department of Defense (DoD) and Intelligence Community (IC) address the challenges of evolving threats, data deluge, and rapid courses of action. Developing an end-to-end artificial intelligence system involves parallel development of different pieces that must work together in order to provide capabilities that can be used by decision makers, warfighters and analysts. These pieces include data collection, data conditioning, algorithms, computing, robust artificial intelligence, and human-machine teaming. While much of the popular press today focuses on advances in algorithms and computing, most modern AI systems leverage advances across numerous different fields. Further, while certain components may not be as visible to end-users as others, our experience has shown that each of these interrelated components plays a major role in the success or failure of an AI system. This article highlights many of the technologies that are involved in an end-to-end AI system. Its goal is to provide readers with an overview of terminology, technical details and recent highlights from academia, industry and government. Where possible, we indicate relevant resources that can be used for further reading and understanding.
Cyber-All-Intel: An AI for Security related Threat Intelligence
Mittal, Sudip, Joshi, Anupam, Finin, Tim
Keeping up with threat intelligence is a must for a security analyst today. There is a large volume of information present in 'the wild' that affects an organization. We need to develop an artificial intelligence system that scours the intelligence sources to keep the analyst updated about various threats that pose a risk to her organization. A security analyst who is better 'tapped in' can be more effective. In this paper, we present Cyber-All-Intel, an artificial intelligence system to aid a security analyst. It is a system for knowledge extraction, representation and analytics in an end-to-end pipeline grounded in the cybersecurity informatics domain. It uses multiple knowledge representations, such as vector spaces and knowledge graphs, in a 'VKG structure' to store incoming intelligence. The system also uses neural network models to proactively improve its knowledge. We have also created a query engine and an alert system that can be used by an analyst to find actionable cybersecurity insights.
Digital Marketing in 2018 Multilingual Search Engine Optimization
It appears that digital marketing today will mostly follow the trends and patterns that have been growing over the past few years. This means that SEO will still be evolving as search engines continue to expand and account for social media, video marketing, and the like. What follows is an A.I. prediction, along with digital marketing possibilities, that continues the gradual change in emphasis for small and large online companies. The first prediction is an easy one: a greater emphasis by online businesses on inbound and outbound marketing techniques. The differences, however, will be in the tactics they use to get more customers.
Personalized Query Auto-Completion Through a Lightweight Representation of the User Context
Kannadasan, Manojkumar Rangasamy, Aslanyan, Grigor
Query Auto-Completion (QAC) is a widely used feature in many domains, including web and eCommerce search, suggesting full queries based on a prefix typed by the user. QAC has been extensively studied in the literature in recent years, and it has been consistently shown that adding personalization features can significantly improve the performance of QAC. In this work we propose a novel method for personalized QAC that uses lightweight embeddings learnt through fastText. We construct an embedding for the user context queries, which are the last few queries issued by the user. We also use the same model to get the embedding for the candidate queries to be ranked. We introduce ranking features that compute the distance between the candidate queries and the context queries in the embedding space. These features are then combined with other commonly used QAC ranking features to learn a ranking model. We apply our method to a large eCommerce search engine (eBay) and show that the ranker with our proposed features significantly outperforms the baselines on all of the offline metrics measured, which include Mean Reciprocal Rank (MRR), Success Rate (SR), Mean Average Precision (MAP), and Normalized Discounted Cumulative Gain (NDCG). Our baselines include the Most Popular Completion (MPC) model as well as a ranking model without our proposed features. The ranking model with the proposed features results in a $20-30\%$ improvement over the MPC model on all metrics. We obtain up to a $5\%$ improvement over the baseline ranking model for all the sessions, which goes up to about $10\%$ when we restrict to sessions that contain the user context. Moreover, our proposed features also significantly outperform text-based personalization features previously studied in the literature, and adding text-based features on top of our proposed embedding-based features results in only minor improvements.
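As a rough illustration of the kind of feature described above, the sketch below averages the embeddings of the context queries, scores each candidate by cosine similarity to that average, and evaluates a ranking with MRR. It assumes toy two-dimensional vectors in place of real fastText embeddings, and cosine similarity and simple averaging are assumptions here; the abstract says only that the features measure distance in the embedding space. All names and values are hypothetical.

```python
import math

def mean_vector(vectors):
    """Component-wise average of equally sized vectors."""
    n = len(vectors)
    return [sum(v[k] for v in vectors) / n for k in range(len(vectors[0]))]

def cosine_similarity(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def mean_reciprocal_rank(ranked_lists, clicked):
    """MRR over sessions; a session whose clicked query is absent adds 0."""
    total = 0.0
    for ranking, target in zip(ranked_lists, clicked):
        if target in ranking:
            total += 1.0 / (ranking.index(target) + 1)
    return total / len(ranked_lists)

# Hypothetical toy embeddings standing in for fastText vectors.
emb = {
    "iphone case": [0.9, 0.1],
    "phone charger": [0.8, 0.3],
    "iphone charger": [0.85, 0.2],
    "ironing board": [0.1, 0.9],
}

# User context: the last few queries issued in the session.
context = mean_vector([emb["iphone case"], emb["phone charger"]])

# Candidate completions for the prefix "i", ranked by the context feature.
candidates = ["ironing board", "iphone charger"]
ranked = sorted(candidates,
                key=lambda q: cosine_similarity(context, emb[q]),
                reverse=True)
mrr = mean_reciprocal_rank([ranked], ["iphone charger"])
```

In a full ranker this similarity would be one feature among several rather than the sole sort key, as the abstract notes that it is combined with other commonly used QAC features.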
Efficient Discrete Supervised Hashing for Large-scale Cross-modal Retrieval
Yao, Tao, Kong, Xiangwei, Yan, Lianshan, Tang, Wenjing, Tian, Qi
Supervised cross-modal hashing has gained increasing research interest in large-scale retrieval tasks owing to its satisfactory performance and efficiency. However, existing methods still have some challenging issues to be further studied: 1) most of them fail to preserve the semantic correlations well in hash codes because of the large heterogeneous gap; 2) most of them relax the discrete constraint on hash codes, leading to large quantization error and consequently low performance; 3) most of them suffer from relatively high memory cost and computational complexity during the training procedure, which makes them unscalable. In this paper, to address the above issues, we propose a supervised cross-modal hashing method based on matrix factorization dubbed Efficient Discrete Supervised Hashing (EDSH). Specifically, collective matrix factorization on heterogeneous features and semantic embedding with class labels are seamlessly integrated to learn hash codes. Therefore, both the feature-based similarities and the semantic correlations can be preserved in hash codes, which makes the learned hash codes more discriminative. Then an efficient discrete optimization algorithm is proposed to handle the scalability issue. Instead of learning hash codes bit by bit, the hash code matrix can be obtained directly, which is more efficient. Extensive experimental results on three public real-world datasets demonstrate that EDSH achieves superior performance in both accuracy and scalability over some existing cross-modal hashing methods.
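The abstract focuses on how the hash codes are learned, not on how they are used. For context, retrieval with binary hash codes is commonly done by ranking database items by Hamming distance to the query code. The sketch below shows only that retrieval step, with made-up 8-bit codes; it is not an implementation of EDSH, and the function names and code values are hypothetical.

```python
def hamming_distance(a, b):
    """Number of differing bits between two codes stored as integers."""
    return bin(a ^ b).count("1")

def rank_by_hamming(query_code, database_codes):
    """Database indices sorted by increasing Hamming distance to the query.
    Ties keep their original order because Python's sort is stable."""
    return sorted(range(len(database_codes)),
                  key=lambda idx: hamming_distance(query_code,
                                                   database_codes[idx]))

# Hypothetical 8-bit codes: a text query searching over image codes.
text_query = 0b10110010
image_codes = [0b10110011, 0b01001101, 0b10100010, 0b10110010]

ranking = rank_by_hamming(text_query, image_codes)
```

Because the comparison is an XOR followed by a bit count, the cost per item is tiny, which is one reason hashing is attractive for large-scale cross-modal retrieval.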
9 Emerging Search Engine Optimization Trends For 2019 (infographic)
We all know that the only thing that kept humans alive for ages is evolution. If our ancestors didn't evolve when it was necessary, we might not have progressed as we are today. Just like humans, systems need to change to survive. There is a rat race out there in the digital world and to beat the rat race, you must use evolution to outsmart your competitors. When you understand your target market, there is a good chance that you'll be able to target the right market.
Text segmentation on multilabel documents: A distant-supervised approach
Manchanda, Saurav, Karypis, George
Segmenting text into semantically coherent segments is an important task with applications in information retrieval and text summarization. Developing accurate topical segmentation requires the availability of training data with ground truth information at the segment level. However, generating such labeled datasets, especially for applications in which the meaning of the labels is user-defined, is expensive and time-consuming. In this paper, we develop an approach that, instead of using segment-level ground truth information, uses the set of labels that are associated with a document. These labels are easier to obtain, as the training data essentially corresponds to a multilabel dataset. Our method, which can be thought of as an instance of distant supervision, improves upon the previous approaches by exploiting the fact that consecutive sentences in a document tend to talk about the same topic and hence probably belong to the same class. Experiments on the text segmentation task on a variety of datasets show that the segmentation produced by our method beats the competing approaches on four out of five datasets and performs on par on the fifth dataset. On the multilabel text classification task, our method performs on par with the competing approaches, while requiring significantly less time to estimate.
Few-shot Learning: A Survey
The questions 'can machines think' and 'can machines do what humans do' are quests that drive the development of artificial intelligence. Although recent artificial intelligence succeeds in many data-intensive applications, it still lacks the ability to learn from limited exemplars and to generalize quickly to new tasks. To tackle this problem, one has to turn to machine learning, which supports the scientific study of artificial intelligence. In particular, a machine learning problem called Few-Shot Learning (FSL) targets this case. It can rapidly generalize to new tasks with limited supervised experience by turning to prior knowledge, which mimics humans' ability to acquire knowledge from few examples through generalization and analogy. It has been seen as a test-bed for real artificial intelligence, a way to reduce laborious data gathering and computationally costly training, and an antidote for rare-case learning. With extensive works on FSL emerging, we give a comprehensive survey of it. We first give the formal definition of FSL. Then we point out the core issues of FSL, which turns the problem from "how to solve FSL" to "how to deal with the core issues". Accordingly, existing works from the birth of FSL to the most recently published ones are categorized in a unified taxonomy, with thorough discussion of the pros and cons of different categories. Finally, we envision possible future directions for FSL in terms of problem setup, techniques, applications and theory, hoping to provide insights to both beginners and experienced researchers.
This SNL Sketch About Game of Thrones With Kit Harington, Ice-T, and Mariska Hargitay Should Be Catnip to Search Engines
The news industry is more dependent upon search-engine generated traffic than ever these days, which means when Saturday Night Live invites Game of Thrones star Kit Harington to host Saturday Night Live the week before Game of Thrones returns with the final season of Game of Thrones, then that same Kit Harington appears in a Saturday Night Live Game of Thrones sketch about upcoming Game of Thrones spinoffs that includes cameos from Law & Order: Special Victims Unit stars Mariska Hargitay and Ice T, Slate is going to make sure you Game of Thrones fans searching for Game of Thrones news find out about it, even if the sketch doesn't include plot details and spoilers for Game of Thrones' last season, a list of everyone who dies in the Game of Thrones finale, or confirmation that the Night King wins the Game of Thrones. We don't care if you use abbreviations like GoT or SNL or spell it Gaem of Throns: The important thing is that you typed something into a search bar and landed on this page of the internet. While you try to figure out why Google thought you'd find "Stairway to Heaven easy solo tablature" here, on a website that would not typically publish "Stairway to Heaven easy solo tablature," and indeed, has still not published "Stairway to Heaven easy solo tablature," perhaps you'd enjoy watching a Saturday Night Live sketch about Game of Thrones: Any sketch that lets Kyle Mooney deploy his 1990s sitcom delivery is a winner, but obviously Hodor's House is the spinoff to watch, because how could the second episode live up to the pilot? But there's a lot to love here for fans of Game of Thrones, Saturday Night Live, Pee-wee's Playhouse, Kit Harington, Ice T, Mariska Hargitay, HBO, Daria, Arya, the Game of Thrones finale, Beck Bennett, Heidi Gardner, the final season of Game of Thrones, Cecily Strong, Game of Thrones, Kyle Mooney, Game of Thrones final season spoilers, Pete Davidson, Game of Thrones, Game of Thrones, or even Game of Thrones. 
The final season of Game of Thrones begins on April 14.