 Information Retrieval


Researchers use machine learning to search science data

#artificialintelligence

As scientific datasets increase in both size and complexity, labeling, filtering and searching this deluge of information has become a laborious, time-consuming and sometimes impossible task without the help of automated tools. With this in mind, a team of researchers from Lawrence Berkeley National Laboratory (Berkeley Lab) and UC Berkeley is developing machine learning tools to pull contextual information from scientific datasets and automatically generate metadata tags for each file. Scientists can then search these files via Science Search, a web-based search engine for scientific data that the Berkeley team is building. As a proof of concept, the team is working with staff at the Department of Energy's (DOE) Molecular Foundry, located at Berkeley Lab, to demonstrate the concepts of Science Search on the images captured by the facility's instruments. A beta version of the platform has been made available to Foundry researchers.


Berkeley Lab researchers use machine learning to search science data

#artificialintelligence

IMAGE: This is a screenshot of the Science Search interface. In this case, the user did an image search of nanoparticles. As scientific datasets increase in both size and complexity, labeling, filtering and searching this deluge of information has become a laborious, time-consuming and sometimes impossible task without the help of automated tools. With this in mind, a team of researchers from Lawrence Berkeley National Laboratory (Berkeley Lab) and UC Berkeley is developing machine learning tools to pull contextual information from scientific datasets and automatically generate metadata tags for each file. Scientists can then search these files via Science Search, a web-based search engine for scientific data that the Berkeley team is building.


Researchers Use Machine Learning to Search Science Data

#artificialintelligence

As scientific datasets increase in both size and complexity, labeling, filtering and searching this deluge of information has become a laborious, time-consuming and sometimes impossible task without the help of automated tools. With this in mind, a team of researchers from the Department of Energy's Lawrence Berkeley National Laboratory (Berkeley Lab) and UC Berkeley is developing machine learning tools to pull contextual information from scientific datasets and automatically generate metadata tags for each file. Scientists can then search these files via Science Search, a web-based search engine for scientific data that the Berkeley team is building. As a proof of concept, the team is working with staff at Berkeley Lab's Molecular Foundry to demonstrate the concepts of Science Search on the images captured by the facility's instruments.


How To Create Natural Language Semantic Search For Arbitrary Objects With Deep Learning

#artificialintelligence

The power of modern search engines is undeniable: you can summon knowledge from the internet at a moment's notice. However, there are many situations where search is limited to strict keyword matching, and when the objects aren't text, search may not be available at all. Furthermore, strict keyword search doesn't allow the user to search semantically, which means information is not as discoverable. Today, we share a reproducible, minimally viable product that illustrates how you can enable semantic search for arbitrary objects! Concretely, we will show you how to create a system that searches Python code semantically -- but this approach can be generalized to other entities (such as pictures or sound clips).
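
One common way to realize semantic search of this kind is to map both queries and objects into a shared vector space and rank objects by similarity to the query. The sketch below illustrates only the ranking step. The three-dimensional vectors, the function names and the snippet names in `index` are hypothetical and hand-made for illustration; in a real system the vectors would come from trained encoders, which the teaser above does not describe.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def search(query_vec, index, top_k=2):
    """Return the names of the top_k indexed objects most similar to the query."""
    scored = [(cosine(query_vec, vec), name) for name, vec in index.items()]
    scored.sort(reverse=True)
    return [name for _, name in scored[:top_k]]


# Hypothetical embeddings of code snippets (assumed, not from any real model).
index = {
    "read_csv_file": [0.9, 0.1, 0.0],
    "plot_histogram": [0.0, 0.2, 0.9],
    "load_json": [0.8, 0.3, 0.1],
}

# Hypothetical embedding of a natural-language query such as "load data from disk".
query = [1.0, 0.2, 0.0]

results = search(query, index)
print(results)
```

Because ranking depends only on vector similarity, the same lookup works whether the indexed objects are code, pictures or sound clips, provided an encoder exists for each.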


Google Has Removed Over 80% of Hacked Sites from Search Results - Search Engine Journal

#artificialintelligence

Google has released new details about its spam fighting efforts, revealing that more than 80% of hacked sites have been detected and removed from search results. The search giant plans to continue its efforts by working directly with popular content management systems to fight back against those who compromise forums and comment sections with spam. "Last year, we focused a great deal of effort on reducing the impact on users from hacked websites, and were able to detect and remove more than 80 percent of compromised sites from search results. We're also working closely with many providers of popular content management systems like WordPress and Joomla to help them fight spammers that abuse forums and comment sections." Here are some other notable stats from Google's recent announcement.


TrQuery: An Embedding-based Framework for Recommanding SPARQL Queries

arXiv.org Artificial Intelligence

In this paper, we present an embedding-based framework (TrQuery) for recommending solutions of a SPARQL query, including approximate solutions when exact querying solutions are not available due to incompleteness or inconsistencies of real-world RDF data. Within this framework, embedding is applied together with edit distance to score solutions, so that we can obtain more fine-grained recommendations than edit distance alone provides. For instance, the graphs of two querying solutions with a similar structure can be distinguished in our proposed framework, whereas edit distance, which depends only on structural difference, cannot tell them apart. To this end, we propose a novel score model, built on the vector space generated by the embedding system, to compute the similarity between an approximate subgraph matching and a whole graph matching. Finally, we evaluate our approach on the large RDF datasets DBpedia and YAGO, and experimental results show that TrQuery performs well in terms of both effectiveness and efficiency.


Supercharging your SEO with AI: Insights, automation and personalization - Search Engine Land

#artificialintelligence

Recently, I had the pleasure of presenting at SMX London on Supercharging your SEO with AI and thought I would share some of the insights with Search Engine Land readers. Google made global headlines with the demonstration of its new Duplex at this year's I/O developers conference. This artificial intelligence (AI) system can "converse" in natural language with people to schedule an appointment at a hair salon or book a table at a restaurant, for example. To pass the Turing Test, AI must behave in a manner indistinguishable from that of a human. To many, Google Duplex has proven that it can pass this test, but in truth, we are only seeing the beginnings of its future potential.


Using Search Queries to Understand Health Information Needs in Africa

arXiv.org Artificial Intelligence

The lack of comprehensive, high-quality health data in developing nations creates a roadblock for combating the impacts of disease. One key challenge is understanding the health information needs of people in these nations. Without understanding people's everyday needs, concerns, and misconceptions, health organizations and policymakers lack the ability to effectively target education and programming efforts. In this paper, we propose a bottom-up approach that uses search data from individuals to uncover and gain insight into health information needs in Africa. We analyze Bing searches related to HIV/AIDS, malaria, and tuberculosis from all 54 African nations. For each disease, we automatically derive a set of common search themes or topics, revealing a wide-spread interest in various types of information, including disease symptoms, drugs, concerns about breastfeeding, as well as stigma, beliefs in natural cures, and other topics that may be hard to uncover through traditional surveys. We expose the different patterns that emerge in health information needs by demographic groups (age and sex) and country. We also uncover discrepancies in the quality of content returned by search engines to users by topic. Combined, our results suggest that search data can help illuminate health information needs in Africa and inform discussions on health policy and targeted education efforts both on- and offline.


Consistent Position Bias Estimation without Online Interventions for Learning-to-Rank

arXiv.org Machine Learning

Presentation bias is one of the key challenges when learning from implicit feedback in search engines, as it confounds the relevance signal with uninformative signals due to position in the ranking, saliency, and other presentation factors. While it was recently shown how counterfactual learning-to-rank (LTR) approaches (Joachims et al., 2017) can provably overcome presentation bias if observation propensities are known, it remains to show how to accurately estimate these propensities. In this paper, we propose the first method for producing consistent propensity estimates without manual relevance judgments, disruptive interventions, or restrictive relevance modeling assumptions. We merely require that we have implicit feedback data from multiple different ranking functions. Furthermore, we argue that our estimation technique applies to an extended class of Contextual Position-Based Propensity Models, where propensities not only depend on position but also on observable features of the query and document. Initial simulation studies confirm that the approach is scalable, accurate, and robust.
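
To make the role of observation propensities concrete, here is a minimal sketch of inverse propensity weighting under a simple position-based model, assuming the propensities are already known. It does not attempt the paper's contribution, which is estimating those propensities. The function name, the log format and the numbers are hypothetical.

```python
def ips_relevance(click_log, propensity):
    """Estimate per-document relevance from clicks, weighting by 1 / propensity.

    click_log:  list of (doc_id, position, clicked) tuples, clicked is 0 or 1.
    propensity: dict mapping position -> assumed probability the result
                at that position was examined by the user.
    """
    totals, counts = {}, {}
    for doc, pos, clicked in click_log:
        # A click at a rarely examined position counts for more.
        totals[doc] = totals.get(doc, 0.0) + clicked / propensity[pos]
        counts[doc] = counts.get(doc, 0) + 1
    return {doc: totals[doc] / counts[doc] for doc in totals}


# Assumed propensities: position 2 is examined half as often as position 1.
propensity = {1: 1.0, 2: 0.5}

# Hypothetical log: "a" always shown at position 1, "b" always at position 2.
click_log = [
    ("a", 1, 1), ("a", 1, 0),
    ("b", 2, 1), ("b", 2, 0), ("b", 2, 0), ("b", 2, 0),
]

estimates = ips_relevance(click_log, propensity)
print(estimates)
```

In this toy log the raw click-through rates are 0.5 for "a" and 0.25 for "b", yet the weighted estimates coincide, showing how position alone can make an equally relevant document look worse.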


This Vietnamese Browser & Search Engine Is Daring Google To Step-Up Its Game

Forbes - Tech

Cốc Cốc's browser has gained significant market share in Vietnam. This was the response I received when I started chatting with the only other customer at a restaurant in Hanoi, Vietnam exactly four days into my journey traveling and meeting with startups around the world. I had been contemplating how best to break into the Vietnamese startup ecosystem, and this chance meeting proved to be the answer. It turned out the tall, friendly Russian I was speaking with was Victor Lavrenko, CEO of Cốc Cốc, Vietnam's leading local browser and search engine! Over the next few days, I got to visit Cốc Cốc's office and learn more from Lavrenko and CMO Kristina Melentieva about the company and why it's worth keeping an eye on. Only once I left the Cốc Cốc office--with its ping pong table, vibrant color scheme, and full-wall ideation whiteboard--and ventured back onto the street did I fully appreciate the obvious: this company is not operating (and flourishing) in any number of internationally recognized technology startup hubs, but in the chaotic and volatile center of Hanoi.