 Information Retrieval


DSC Data Science Search Engine

#artificialintelligence

AI BI: Fueling Deeper Business Insights and Edge - June 30 Today, businesses are being asked to do more with less, while finding new ways to not only stay competitive, but also outpace the competition. In this latest DSC webinar, hear from two organizations that are coupling their data with artificial intelligence (AI) to power their business intelligence (BI) forecasts and dashboards.


China and scientists dismiss study suggesting coronavirus spread in August 2019

The Japan Times

LONDON – Beijing dismissed as "ridiculous" a Harvard Medical School study of hospital traffic and search engine data that suggested the novel coronavirus may already have been spreading in China last August, and scientists said it offered no convincing evidence of when the outbreak began. The research, which has not been peer-reviewed by other scientists, used satellite imagery of hospital parking lots in Wuhan -- where the disease was first identified in late 2019 -- and data for symptom-related queries on search engines for terms such as "cough" and "diarrhea." The study's authors said increased hospital traffic and symptom search data in Wuhan preceded the documented start of the coronavirus pandemic, in December 2019. "While we cannot confirm if the increased volume was directly related to the new virus, our evidence supports other recent work showing that emergence happened before identification at the Huanan Seafood market (in Wuhan)," they said. Paul Digard, an expert in virology at the University of Edinburgh, said that using search engine data and satellite imagery of hospital traffic to detect disease outbreaks "is an interesting idea with some validity."


China pushes back against Harvard coronavirus study

Al Jazeera

Beijing has dismissed as "ridiculous" a Harvard Medical School study of hospital traffic and search engine data that suggested the new coronavirus may already have been spreading in China last August, and scientists said it offered no convincing evidence of when the outbreak began. Chinese Foreign Ministry spokeswoman Hua Chunying, asked about the research at a news briefing on Tuesday, said: "I think it is ridiculous, incredibly ridiculous, to come up with this conclusion based on superficial observations such as traffic volume." The research, which has not been peer-reviewed by other scientists, used satellite imagery of hospital parking lots in Wuhan - where the disease was first identified in late 2019 - and data for symptom-related queries on search engines for things such as "cough" and "diarrhoea". The study's authors said increased hospital traffic and symptom search data in Wuhan preceded the documented start of the coronavirus pandemic in December 2019. "While we cannot confirm if the increased volume was directly related to the new virus, our evidence supports other recent work showing that emergence happened before identification at the Huanan Seafood market (in Wuhan)," they said.


How Machine Learning in Search Works: Everything You Need to Know

#artificialintelligence

In the world of SEO, it's important to understand the system you're optimizing for. Another crucial area to understand is machine learning. Now, the term "machine learning" gets thrown around a lot these days. But how does machine learning actually impact search and SEO? This chapter will explore everything you need to know about how search engines use machine learning.


Balancing Reinforcement Learning Training Experiences in Interactive Information Retrieval

arXiv.org Artificial Intelligence

Interactive Information Retrieval (IIR) and Reinforcement Learning (RL) share many commonalities, including an agent that learns while interacting, a long-term and complex goal, and an algorithm that explores and adapts. To successfully apply RL methods to IIR, one challenge is to obtain sufficient relevance labels to train the RL agents, which are notoriously sample inefficient. However, in a text corpus annotated for a given query, it is not the relevant documents but the irrelevant documents that predominate. This causes very unbalanced training experiences for the agent and prevents it from learning an effective policy. Our paper addresses this issue by using domain randomization to synthesize more relevant documents for training. Our experimental results on the Text REtrieval Conference (TREC) Dynamic Domain (DD) 2017 Track show that the proposed method is able to boost an RL agent's learning effectiveness by 22% in dealing with unseen situations.


Differentiable reasoning over text

AIHub

We all rely on search engines to navigate the massive amount of online information published every day. Modern search engines not only retrieve a list of pages relevant to a query but often try to directly answer our questions by analyzing the content of those pages. One area they currently struggle with, however, is multi-hop Question Answering, which requires reasoning with information taken from multiple documents to arrive at the answer. For example, suppose that we want to find out "What is the size of the COVID-19 virus?" Systems that try to answer this question first need to identify the virus responsible for COVID-19 (SARS-CoV-2) and then look for the size of that virus (50-200 nm).
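The two-step lookup in the COVID-19 example can be sketched in a few lines. This is only a toy illustration of what "multi-hop" means, assuming a hand-written fact table; it is not the differentiable method the article describes. The names `FACTS`, `hop`, and `multi_hop` and the relation labels `"caused by"` and `"size"` are hypothetical.

```python
# Toy fact table (hypothetical): (entity, relation) -> answer.
# The two facts are the ones stated in the example above.
FACTS = {
    ("COVID-19", "caused by"): "SARS-CoV-2",
    ("SARS-CoV-2", "size"): "50-200 nm",
}


def hop(entity, relation):
    """Follow one relation from an entity; return None if the fact is unknown."""
    return FACTS.get((entity, relation))


def multi_hop(start, relations):
    """Chain several hops, feeding each answer into the next lookup."""
    current = start
    for relation in relations:
        current = hop(current, relation)
        if current is None:
            # The chain breaks as soon as one intermediate fact is missing.
            return None
    return current


if __name__ == "__main__":
    # Hop 1 resolves the disease to its virus, hop 2 looks up the virus size.
    print(multi_hop("COVID-19", ["caused by", "size"]))
```

Asking for `"size"` directly on `"COVID-19"` returns `None` here, which mirrors why a single-document lookup is not enough for this kind of question.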


Research finds some AI advances are over-hyped

#artificialintelligence

Is it possible some instances of artificial intelligence are not as intelligent as we thought? Call it artificial artificial intelligence. A team of computer science graduate students reports that, on closer examination, several dozen information retrieval algorithms hailed as milestones in artificial intelligence research were in fact nowhere near as revolutionary as claimed. In fact, the AI used in those algorithms was often merely a minor tweak of previously established routines. According to graduate student researcher Davis Blalock at the Massachusetts Institute of Technology, after his team examined 81 approaches to developing neural networks commonly believed to be superior to earlier efforts, the team could not confirm that any improvement was, in fact, ever achieved.


Query complexity of heavy hitter estimation

arXiv.org Machine Learning

We consider the problem of identifying the subset $\mathcal{S}^{\gamma}_{\mathcal{P}}$ of elements in the support of an underlying distribution $\mathcal{P}$ whose probability value is larger than a given threshold $\gamma$, by actively querying an oracle to gain information about a sequence $X_1, X_2, \ldots$ of i.i.d. samples drawn from $\mathcal{P}$. We consider two query models: (a) each query is an index $i$ and the oracle returns the value $X_i$, and (b) each query is a pair $(i,j)$ and the oracle gives a binary answer confirming whether $X_i = X_j$ or not. For each of these query models, we design sequential estimation algorithms which, at each round, either decide what query to send to the oracle depending on the entire history of responses or decide to stop and output an estimate of $\mathcal{S}^{\gamma}_{\mathcal{P}}$, which is required to be correct with some pre-specified large probability. We provide upper bounds on the query complexity of the algorithms for any distribution $\mathcal{P}$ and also derive lower bounds on the optimal query complexity under the two query models. We also consider noisy versions of the two query models and propose robust estimators which can effectively counter the noise in the oracle responses.
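To make the two query models concrete, here is a minimal sketch of a naive, fixed-budget baseline for each. It is not the sequential algorithm from the paper: it spends a fixed number of samples `n` instead of deciding adaptively when to stop, and it gives no correctness guarantee. All function names are hypothetical, and the oracles are simulated over a fixed in-memory sample list.

```python
from collections import Counter


def make_oracles(samples):
    """Build the two simulated oracles over a fixed sample sequence (assumed setup)."""
    def value_oracle(i):
        # Query model (a): reveal the value X_i.
        return samples[i]

    def pair_oracle(i, j):
        # Query model (b): reveal only whether X_i == X_j.
        return samples[i] == samples[j]

    return value_oracle, pair_oracle


def heavy_hitters_by_value(value_oracle, n, gamma):
    """Model (a): query the first n indices, keep values with empirical frequency > gamma."""
    counts = Counter(value_oracle(i) for i in range(n))
    return {x for x, c in counts.items() if c / n > gamma}


def heavy_hitter_bins_by_pairs(pair_oracle, n, gamma):
    """Model (b): group indices into bins of equal values using pairwise queries only.

    Each new index is compared against one representative per existing bin.
    The values themselves are never observed, so the output is a list of
    index bins whose relative size exceeds gamma.
    """
    bins = []
    for i in range(n):
        for b in bins:
            if pair_oracle(b[0], i):
                b.append(i)
                break
        else:
            # No existing bin matched, so this index starts a new bin.
            bins.append([i])
    return [b for b in bins if len(b) / n > gamma]


if __name__ == "__main__":
    samples = ["a"] * 6 + ["b"] * 3 + ["c"]
    value_oracle, pair_oracle = make_oracles(samples)
    print(heavy_hitters_by_value(value_oracle, len(samples), gamma=0.25))
    print(heavy_hitter_bins_by_pairs(pair_oracle, len(samples), gamma=0.25))
```

Note the difference in cost: the model (a) baseline uses one query per sample, while the model (b) baseline may need up to one query per existing bin for every sample, which hints at why the two models have different query complexities.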


DSC Data Science Search Engine

#artificialintelligence

Embracing Responsible AI from Pilot to Production - May 27 On average, 80% of AI projects fail to make it to production. But it IS possible to successfully launch AI, at scale, that is built responsibly and works for everyone. How you scale from pilot to production is critical to ensuring AI success, while continuing to be a good corporate citizen through responsible productization.


#111 Machine Learning with TensorFlow with Chris Mattmann – Author / Manager, Chief Technology and Innovation Officer -- DATA FUTUROLOGY PODCAST

#artificialintelligence

Chris Mattmann is the Deputy Chief Technology and Innovation Officer at NASA Jet Propulsion Lab, where he has been recognised as JPL's first Principal Scientist in the area of Data Science. Chris has applied TensorFlow to challenges he's faced at NASA, including building an implementation of Google's Show & Tell algorithm for image captioning using TensorFlow. He was involved in the Mars rover landing mission, where he was working in a planetary data system engineering node, helping to build a data management framework called object-oriented data technology to support capturing, processing and sharing of data for NASA's scientific archives. He contributes to open source as a former Director at the Apache Software Foundation, and teaches graduate courses at USC in Content Detection and Analysis, and in Search Engines and Information Retrieval. In this episode, Chris opens the show discussing his interest in data.