Information Retrieval
Google's prototype Chinese search engine links users' activity to their phone numbers, report claims
Google's secretive plans in China are attracting renewed scrutiny from privacy advocates. The tech giant is said to be building a prototype version of a censored Chinese search engine that links users' activity to their personal phone numbers, according to The Intercept. Doing so would allow it to comply with the Chinese government's censorship requirements, increasing the chances that such a product would launch there in the future. A bipartisan group of 16 US lawmakers asked Google whether it would comply with China's internet censorship and surveillance policies should it re-enter the search engine market there. While China is home to the world's largest number of internet users, a 2015 report by US think tank Freedom House found that the country had the most restrictive online use policies of the 65 nations it studied, ranking below Iran and Syria. But China has maintained that its various forms of web censorship are necessary for protecting its national security.
Answering Science Exam Questions Using Query Rewriting with Background Knowledge
Musa, Ryan, Wang, Xiaoyan, Fokoue, Achille, Mattei, Nicholas, Chang, Maria, Kapanipathi, Pavan, Makni, Bassem, Talamadupula, Kartik, Witbrock, Michael
Open-domain question answering (QA) is an important problem in AI and NLP that is emerging as a bellwether for progress on the generalizability of AI methods and techniques. Much of the progress in open-domain QA systems has been realized through advances in information retrieval methods and corpus construction. In this paper, we focus on the recently introduced ARC Challenge dataset, which contains 2,590 multiple-choice questions authored for grade-school science exams. These questions are selected to be the most challenging for current QA systems, and current state-of-the-art performance is only slightly better than random chance. We present a system that rewrites a given question into queries that are used to retrieve supporting text from a large corpus of science-related text. Our rewriter is able to incorporate background knowledge from ConceptNet and, in tandem with a generic textual entailment system trained on SciTail that identifies support in the retrieved results, outperforms several strong baselines on the end-to-end QA task despite only being trained to identify essential terms in the original source question. We use a generalizable decision methodology over the retrieved evidence and answer candidates to select the best answer. By combining query rewriting, background knowledge, and textual entailment, our system is able to outperform several strong baselines on the ARC dataset.
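The idea of rewriting a question into a retrieval query can be illustrated with a minimal Python sketch. It is not the authors' system: the learned essential-term classifier is replaced by a hand-supplied set of essential terms, and ConceptNet is replaced by a small hypothetical `related_terms` dictionary. The function `rewrite_query` and its parameters are illustrative names only.

```python
def rewrite_query(question, answer, essential, related_terms, max_expansions=2):
    """Build a retrieval query from a question and one answer candidate.

    essential: set of lowercase terms judged essential
               (hypothetical stand-in for a learned essential-term classifier)
    related_terms: dict term -> list of related terms
               (hypothetical stand-in for ConceptNet lookups)
    """
    tokens = [t.strip(".,?!").lower() for t in question.split()]

    # Keep only essential terms, preserving order and dropping duplicates.
    kept = []
    for t in tokens:
        if t in essential and t not in kept:
            kept.append(t)

    # Expand each kept term with at most `max_expansions` background-knowledge neighbours.
    expansions = []
    for t in kept:
        for r in related_terms.get(t, [])[:max_expansions]:
            if r not in kept and r not in expansions:
                expansions.append(r)

    # One query per answer candidate: essential terms + expansions + the candidate.
    return " ".join(kept + expansions + [answer.lower()])


q = "Which property of a mineral can be determined just by looking at it?"
essential = {"property", "mineral", "looking"}
related = {"mineral": ["rock", "crystal", "ore"], "looking": ["sight"]}
print(rewrite_query(q, "luster", essential, related))
```

Dropping non-essential words shortens the query to the terms that matter for retrieval, while the expansion step adds background vocabulary that the question itself may not contain.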
Visual search: The natural evolution in how we search for information
Imagine you're on the Tube and the person in front of you is wearing a really nice pair of trainers. To find them, you could search for "black suede trainers with off-white soles" and leaf through hundreds of possible results. Or, in a world of perfectly accurate visual search, you could find and buy the exact pair instantly from a picture. Three-quarters (74%) of consumers agree that text-based keyword searches are inefficient for finding the right product online. This opportunity gap will be explored at Dmexco this week in a number of sessions dedicated to smarter search, and it emphasises that brands need to prepare for visual search.
Exploiting local and global performance of candidate systems for aggregation of summarization techniques
Mehta, Parth, Majumder, Prasenjit
With an ever-growing number of extractive summarization techniques being proposed, there is less clarity than ever about how good each system is compared to the rest. Several studies highlight the variance in performance of these systems with changes in datasets or even across documents within the same corpus. An effective way to counter this variance and to make the systems more robust could be to use inputs from multiple systems when generating a summary. In the present work, we define a novel way of creating such an ensemble by exploiting similarity between the content of candidate summaries to estimate their reliability. We define GlobalRank, which captures the performance of a candidate system on an overall corpus, and LocalRank, which estimates its performance on a given document cluster. We then use these two scores to assign a weight to each individual system, which is then used to generate the new aggregate ranking. Experiments on the DUC 2003 and DUC 2004 datasets show a significant improvement in terms of ROUGE score over existing state-of-the-art techniques.
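The general idea of weighting systems by how much their summaries agree with the others can be sketched in Python. This is an illustration under stated assumptions, not the paper's definitions of GlobalRank and LocalRank: each system's output is a ranked list of sentence IDs, its summary is the top `summary_len` sentences, content similarity is Jaccard overlap of sentence IDs, the local score is mean similarity to the other systems on one cluster, corpus-level scores are supplied by the caller, and aggregation is a weighted Borda count. All function names are hypothetical.

```python
from collections import defaultdict


def jaccard(a, b):
    """Overlap of two sentence-ID collections (assumed similarity measure)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def local_scores(rankings, summary_len):
    """Cluster-level reliability: mean similarity of each system's summary to the others.

    rankings: dict system -> list of sentence IDs, best first.
    """
    summaries = {s: r[:summary_len] for s, r in rankings.items()}
    scores = {}
    for s, summ in summaries.items():
        sims = [jaccard(summ, o) for t, o in summaries.items() if t != s]
        scores[s] = sum(sims) / len(sims) if sims else 1.0
    return scores


def aggregate(rankings, global_scores, summary_len=3):
    """Weighted Borda aggregation with weight = corpus-level score * local score.

    global_scores: dict system -> corpus-level score (assumed precomputed,
    e.g. the mean local score over all clusters).
    """
    local = local_scores(rankings, summary_len)
    totals = defaultdict(float)
    for s, ranking in rankings.items():
        w = global_scores.get(s, 1.0) * local[s]
        n = len(ranking)
        for pos, sent in enumerate(ranking):
            totals[sent] += w * (n - pos)  # earlier positions earn more points
    # Highest total first; ties broken by sentence ID for determinism.
    return sorted(totals, key=lambda x: (-totals[x], x))


rankings = {"A": [1, 2, 3, 4, 5], "B": [2, 1, 3, 5, 4], "C": [5, 4, 6, 1, 2]}
print(aggregate(rankings, {"A": 1.0, "B": 1.0, "C": 1.0}))
```

In this toy input, system C disagrees with both A and B on its top three sentences, so its local score and therefore its weight drop to zero, and the aggregate ranking follows the two systems that agree.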
Perturb and Combine to Identify Influential Spreaders in Real-World Networks
Tixier, Antoine J. -P., Rossi, Maria-Evgenia G., Malliaros, Fragkiskos D., Read, Jesse, Vazirgiannis, Michalis
Recent research has shown that graph degeneracy algorithms, which decompose a network into a hierarchy of nested subgraphs of decreasing size and increasing density, are very effective at detecting the good spreaders in a network. However, it is also known that degeneracy-based decompositions of a graph are unstable to small perturbations of the network structure. In Machine Learning, the performance of unstable classification and regression methods, such as fully-grown decision trees, can be greatly improved by using Perturb and Combine (P&C) strategies such as bagging (bootstrap aggregating). Therefore, we propose a P&C procedure for networks that (1) creates many perturbed versions of a given graph, (2) applies a node scoring function separately to each graph (such as a degeneracy-based one), and (3) combines the results. We conduct real-world experiments on the tasks of identifying influential spreaders in large social networks, and influential words (keywords) in small word co-occurrence networks. We use the k-core, generalized k-core, and PageRank algorithms as our vertex scoring functions. In each case, using the aggregated scores brings significant improvements compared to using the scores computed on the original graphs. Finally, a bias-variance analysis suggests that our P&C procedure works mainly by reducing bias, and that therefore, it should be capable of improving the performance of all vertex scoring functions, not only unstable ones.
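The three-step P&C procedure can be sketched in Python with the k-core number as the scoring function. Two choices here are assumptions rather than the paper's exact setup: each perturbed graph is produced by deleting every edge independently with probability `p_drop`, and the per-graph scores are combined by a plain average. The core numbers are computed with a simple minimum-degree peeling loop so the sketch needs no graph library; the function names are illustrative.

```python
import random
from collections import defaultdict


def core_numbers(nodes, edges):
    """k-core number of every node via repeated minimum-degree peeling."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)
    degree = {n: len(adj[n]) for n in nodes}
    remaining = set(nodes)
    core, k = {}, 0
    while remaining:
        # Peel the node with the smallest degree among the nodes still present.
        n = min(remaining, key=lambda x: (degree[x], str(x)))
        k = max(k, degree[n])
        core[n] = k
        remaining.remove(n)
        for m in adj[n]:
            if m in remaining:
                degree[m] -= 1
    return core


def perturb(edges, p_drop, rng):
    """Step 1 (assumed scheme): drop each edge independently with probability p_drop."""
    return [e for e in edges if rng.random() >= p_drop]


def perturb_and_combine(nodes, edges, n_graphs=50, p_drop=0.1, seed=0):
    """Steps 2 and 3: score each perturbed graph, then average the scores per node."""
    rng = random.Random(seed)
    totals = {n: 0.0 for n in nodes}
    for _ in range(n_graphs):
        scores = core_numbers(nodes, perturb(edges, p_drop, rng))
        for n in nodes:
            totals[n] += scores[n]
    return {n: totals[n] / n_graphs for n in nodes}


# Triangle 0-1-2 with a pendant node 3 attached to node 0.
nodes = [0, 1, 2, 3]
edges = [(0, 1), (1, 2), (0, 2), (0, 3)]
print(core_numbers(nodes, edges))
print(perturb_and_combine(nodes, edges, n_graphs=200, p_drop=0.2))
```

Averaging over perturbed copies turns the integer core numbers into continuous scores, which can separate nodes that share the same core number on the original graph. Any other vertex scoring function, such as PageRank, could be substituted for `core_numbers` in the loop.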
DeFactoNLP: Fact Verification using Entity Recognition, TFIDF Vector Comparison and Decomposable Attention
Reddy, Aniketh Janardhan, Rocha, Gil, Esteves, Diego
In this paper, we describe DeFactoNLP, the system we designed for the FEVER 2018 Shared Task. The aim of this task was to conceive a system that can not only automatically assess the veracity of a claim but also retrieve evidence supporting this assessment from Wikipedia. In our approach, the Wikipedia documents whose Term Frequency-Inverse Document Frequency (TFIDF) vectors are most similar to the vector of the claim and those documents whose names are similar to those of the named entities (NEs) mentioned in the claim are identified as the documents which might contain evidence. The sentences in these documents are then supplied to a textual entailment recognition module. This module calculates the probability of each sentence supporting the claim, contradicting the claim or not providing any relevant information to assess the veracity of the claim. Various features computed using these probabilities are finally used by a Random Forest classifier to determine the overall truthfulness of the claim. The sentences which support this classification are returned as evidence. Our approach achieved a 0.4277 evidence F1-score, a 0.5136 label accuracy and a 0.3833 FEVER score.
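The TF-IDF document retrieval step can be sketched in Python without external libraries. This covers only the TF-IDF half of the candidate-document selection; the named-entity title matching, entailment module, and Random Forest are omitted. The tokenizer (lowercase alphanumeric runs), raw term counts, and the smoothed idf formula `log((1 + N) / (1 + df)) + 1` are assumptions chosen for illustration, not details taken from the paper, and the function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase alphanumeric tokens (assumed tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(texts):
    """One sparse TF-IDF vector (dict term -> weight) per text, plus the idf table."""
    docs = [Counter(tokenize(t)) for t in texts]
    n = len(docs)
    df = Counter(term for d in docs for term in d)
    # Smoothed idf (assumed formula); every term keeps a positive weight.
    idf = {term: math.log((1 + n) / (1 + c)) + 1 for term, c in df.items()}
    vecs = [{term: tf * idf[term] for term, tf in d.items()} for d in docs]
    return vecs, idf


def cosine(a, b):
    """Cosine similarity of two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def top_k_documents(claim, documents, k=2):
    """Rank documents (dict title -> text) by TF-IDF cosine similarity to the claim."""
    titles = list(documents)
    vecs, idf = tfidf_vectors([documents[t] for t in titles])
    # Claim terms that never occur in the collection carry no weight.
    claim_vec = {t: tf * idf[t] for t, tf in Counter(tokenize(claim)).items() if t in idf}
    ranked = sorted(zip(titles, vecs), key=lambda tv: cosine(claim_vec, tv[1]), reverse=True)
    return [title for title, _ in ranked[:k]]


docs = {
    "Paris": "Paris is the capital and largest city of France",
    "Berlin": "Berlin is the capital of Germany",
    "Python": "Python is a programming language",
}
print(top_k_documents("Paris is the capital of France", docs, k=2))
```

Rare terms such as "paris" and "france" receive higher idf weights than words shared across documents, which is why the Paris article outranks the Berlin one even though both share "is the capital of" with the claim.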
What Should An AI-Driven Search Engine Be Able To Do?
Search has always been a key enterprise technology going back to the days of the first enterprise content management systems. This is hardly surprising given how important finding the right data is for any of the applications used by enterprises in their business processes. Since the rise of big data and the use of big data sets, search has become even more important. If enterprise data is the real wealth of a business, then search is the tool that uncovers that wealth. But what do you do with the increasingly large amounts of data that enterprises now have access to?
Leading Rights Groups Call on Google Not to Censor Its Search Engine in China
More than a dozen human rights groups have sent a letter to Google urging the company not to offer censored internet search in China, amid reports it is planning to again begin offering the service in the giant Asian market. The joint letter dated Tuesday calls on CEO Sundar Pichai to explain what Google is doing to safeguard users from the Chinese government's censorship and surveillance. It describes the censored search engine app, codenamed "Dragonfly", as representing "an alarming capitulation by Google on human rights." "The Chinese government extensively violates the rights to freedom of expression and privacy; by accommodating the Chinese authorities' repression of dissent, Google would be actively participating in those violations for millions of internet users in China," said the letter. That follows a letter earlier this month signed by more than a thousand Google employees protesting the company's secretive plan to build a search engine that would comply with Chinese ...
Here's what we know about Google's mysterious search engine
President Trump thinks Google's search engine is "rigged." By featuring more mainstream news outlets and relatively fewer conservative sites in the results he sees, Trump tweeted Tuesday, Google is "suppressing" right-wing views on its platform. Trump escalated his attacks Tuesday afternoon in remarks from the Oval Office, warning that "Google and Twitter and Facebook, they are treading on very, very troubled territory and they have to be careful." It's easy to see how Trump arrived at this conclusion, because in many ways his experience mirrors that of millions of Americans who've awoken to the dominance of Google -- and Facebook, and Twitter -- in their everyday lives without being quite certain how it wound up there. We rely constantly on Google to find out what to buy, which restaurants to eat at and how to get from one place to another.
Advocacy groups criticize Google's 'alarming capitulation' over censored China search engine
More than a dozen human rights groups and other advocacy organizations urged Google to abandon any plans to build a censored version of its search engine in China. The project, said to be referred to internally as Dragonfly, 'would represent an alarming capitulation by Google on human rights,' argued a letter signed by 14 groups including Amnesty International, Human Rights Watch and Reporters Without Borders. The letter is addressed to Google CEO Sundar Pichai and comes after weeks of internal revolt at the company, wherein employees have expressed outrage over the firm's rumored plans to launch a censored search engine in China.