Information Retrieval


Dremio launches data lake service running on AWS cloud

#artificialintelligence

Dremio today launched a cloud service that creates a data lake based on an in-memory SQL engine that runs queries against data stored in an object-based storage system. The goal is to make it easier for organizations to take advantage of the data lake, dubbed Dremio Cloud, without having to employ an internal IT team to manage it, said Tomer Shiran, chief product officer for Dremio. An organization can now start using Dremio Cloud in as little as five minutes, he said. Based on Dremio's existing SQL Lakehouse platform, the Dremio Cloud service runs on the Amazon Web Services (AWS) public cloud.


Alibaba Develops Search Engine Simulation AI That Uses Live Data

#artificialintelligence

In collaboration with academic researchers in China, Alibaba has developed a search engine simulation AI that uses real-world data from the ecommerce giant's live infrastructure in order to develop new ranking models that are not hamstrung by 'historic' or out-of-date information. The engine, called AESim, represents the second major announcement in a week to acknowledge the need for AI systems to be able to evaluate and incorporate live and current data, instead of just abstracting the data that was available at the time the model was trained. The earlier announcement was from Facebook, which last week unveiled the BlenderBot 2.0 language model, an NLP interface that features live polling of internet search results in response to queries. The objective of the AESim project is to provide an experimental environment for the development of new Learning-To-Rank (LTR) solutions, algorithms, and models in commercial information retrieval systems. In testing the framework, the researchers found that it accurately reflected online performance within useful and actionable parameters.


TFIDF_From_Scratch

#artificialintelligence

Tf-Idf stands for term frequency-inverse document frequency; the tf-idf weight is a statistical measure, often used in information retrieval and text mining, that evaluates how important a word is to a document in a collection or corpus. The importance increases proportionally to the number of times a word appears in the document but is offset by the frequency of the word in the corpus. Variations of the tf-idf weighting scheme are often used by search engines as a central tool in scoring and ranking a document's relevance given a user query. One of the simplest ranking functions is computed by summing the tf-idf for each query term; many more sophisticated ranking functions are variants of this simple model. Tf-Idf can also be used for stop-word filtering in various subject fields, including text summarization and classification.
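The scheme described above can be sketched in a few lines of plain Python. This is a minimal from-scratch illustration, not the article's actual code: it uses the textbook definitions tf(t, d) = count(t, d) / |d| and idf(t) = log(N / df(t)), plus the simple summed-tf-idf ranking function mentioned in the text. Function names and the toy corpus are invented for the example.

```python
import math

def tf_idf(corpus):
    """Compute tf-idf weights for a list of tokenized documents.

    tf(t, d) = count of t in d / total terms in d
    idf(t)   = log(N / number of documents containing t)
    """
    n_docs = len(corpus)
    # Document frequency: in how many documents each term appears.
    df = {}
    for doc in corpus:
        for term in set(doc):
            df[term] = df.get(term, 0) + 1

    weights = []
    for doc in corpus:
        counts = {}
        for term in doc:
            counts[term] = counts.get(term, 0) + 1
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

def score(query, doc_weights):
    """Simplest ranking function: sum the tf-idf of each query term."""
    return sum(doc_weights.get(t, 0.0) for t in query)

docs = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
    "cats and dogs are pets".split(),
]
weights = tf_idf(docs)
# "the" appears in two of three documents, so its idf is low;
# "cat" appears in only one document, so it scores higher there.
best = max(range(len(docs)), key=lambda i: score(["cat", "mat"], weights[i]))
```

Note how the idf term does the stop-word filtering the article mentions: a word like "the" that occurs in most documents gets an idf near zero regardless of how often it appears.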


Council Post: A Beginner's Guide To SEO Keyword Research In 2021

#artificialintelligence

Amine is the CEO of IronMonk, a digital marketing agency specializing in SEO & CMO at Regal Assets, an IRA company. There used to be a time when you could install a free Chrome browser plug-in, scrape all the competitive keywords you need, throw them into an article a couple of dozen times and then immediately rank for high-volume search terms after hitting "publish" on your WordPress site. Those days are no longer, and that's not such a bad thing. Google has gone to great lengths to improve the internet user experience over the past couple of decades. If you want to create rankable content these days, you need to provide exceptional value for your reader.



Brave's new privacy-focused search engine takes aim at Google

ZDNet

Chromium-based browser maker Brave has launched a beta of its Brave search engine in a bid to create a privacy-focused alternative to Google. The move puts Brave in the small group of companies that offer both a browser and a search engine, alongside Google, Microsoft, Yandex, and Baidu. It's hard to fault Google's record on security and patching, but privacy is another matter for the online ad giant. Brave acquired the search engine Tailcat in March and promised to take on Google by approaching online search with a greater focus on privacy. Brave said its search is built on top of a completely independent index and doesn't track users, their searches, or their clicks. "Brave has its own search index for answering common queries privately without reliance on other providers," it said.


Leveraging Language to Learn Program Abstractions and Search Heuristics

#artificialintelligence

Inductive program synthesis, or inferring programs from examples of desired behavior, offers a general paradigm for building interpretable, robust, and generalizable machine learning systems. Effective program synthesis depends on two key ingredients: a strong library of functions from which to build programs, and an efficient search strategy for finding programs that solve a given task. We introduce LAPS (Language for Abstraction and Program Search), a technique for using natural language annotations to guide joint learning of libraries and neurally-guided search models for synthesis. When integrated into a state-of-the-art library learning system (DreamCoder), LAPS produces higher-quality libraries and improves search efficiency and generalization on three domains – string editing, image composition, and abstract reasoning about scenes – even when no natural language hints are available at test time.


Brave's privacy-focused search engine is available in beta

Engadget

You can now try Brave's search engine for yourself. Brave has launched a beta Search feature both as an option in all its browsers as well as through the web for everyone else. As you'd expect, it's billed as a privacy- and transparency-oriented platform that doesn't track your activity or use "secret" algorithms to curate results. You'll eventually have the option of an ad-free version if you're willing to pay, and Brave will make Search available for other engines. The site index is independent, although Brave noted that image searches and some other features will lean on Microsoft's Bing.


How to Extract Relevant Keywords with KeyBERT

#artificialintelligence

There are many powerful techniques for keyword extraction, but most are based on the statistical properties of the text and don't necessarily take into account the semantic aspects of the full document. KeyBERT is a minimal and easy-to-use keyword extraction technique that aims to solve this issue. It leverages the BERT language model and relies on the transformers library. Check out the author's repo (and clone it) if you're interested in using it.
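KeyBERT's core idea is to embed the whole document and each candidate phrase, then rank candidates by how similar their embedding is to the document's. The sketch below illustrates that ranking loop only; the `embed` function is a deliberately crude bag-of-words stand-in for the BERT encoder KeyBERT actually uses, and the document and candidate list are invented for the example.

```python
import math
from collections import Counter

def embed(text):
    # Stand-in for a transformer sentence encoder: a sparse
    # bag-of-words vector. KeyBERT would use BERT embeddings here.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a if k in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def extract_keywords(doc, candidates, top_n=3):
    """Rank candidate phrases by similarity to the whole document."""
    doc_vec = embed(doc)
    ranked = sorted(candidates,
                    key=lambda c: cosine(embed(c), doc_vec),
                    reverse=True)
    return ranked[:top_n]

doc = "brave launched a privacy focused search engine to rival google search"
candidates = ["search engine", "banana split", "privacy"]
keywords = extract_keywords(doc, candidates, top_n=2)
```

With real transformer embeddings, the same loop would also surface candidates that are semantically related to the document without sharing any surface words, which is exactly the gap in purely statistical extractors that the article describes.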


Query Embedding on Hyper-relational Knowledge Graphs

arXiv.org Artificial Intelligence

Multi-hop logical reasoning is an established problem in the field of representation learning on knowledge graphs (KGs). It subsumes both one-hop link prediction and other, more complex types of logical queries. Existing algorithms operate only on classical, triple-based graphs, whereas modern KGs often employ a hyper-relational modeling paradigm. In this paradigm, typed edges may have several key-value pairs known as qualifiers that provide fine-grained context for facts. In queries, this context modifies the meaning of relations and usually reduces the answer set. Hyper-relational queries are often observed in real-world KG applications, yet existing approaches for approximate query answering cannot make use of qualifier pairs. In this work, we bridge this gap and extend the multi-hop reasoning problem to hyper-relational KGs, allowing us to tackle this new type of complex query. Building upon recent advancements in graph neural networks and query embedding techniques, we study how to embed and answer hyper-relational conjunctive queries. We propose a method to answer such queries and demonstrate in our experiments that qualifiers improve query answering on a diverse set of query patterns.
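To make the abstract's key notion concrete, here is a small sketch of how qualifiers constrain an answer set. The data structure and the one-hop matcher are illustrative inventions (the paper embeds queries with neural models rather than matching symbolically), and the sample facts are a commonly used toy example, not drawn from the paper.

```python
# A hyper-relational statement: a base (head, relation, tail) triple
# plus key-value qualifier pairs giving fine-grained context.
facts = [
    {"head": "AlbertEinstein", "rel": "educated_at",
     "tail": "ETHZurich", "qualifiers": {"degree": "BSc"}},
    {"head": "AlbertEinstein", "rel": "educated_at",
     "tail": "UniversityOfZurich", "qualifiers": {"degree": "PhD"}},
]

def answer(head, rel, qualifiers=None):
    """One-hop query; qualifiers further constrain the answer set."""
    qualifiers = qualifiers or {}
    return {
        f["tail"] for f in facts
        if f["head"] == head and f["rel"] == rel
        and all(f["qualifiers"].get(k) == v for k, v in qualifiers.items())
    }

# Without qualifiers the query returns both universities; adding
# degree=PhD reduces the answer set to one, which is exactly the
# context-dependent behavior triple-only query answerers cannot model.
```

A multi-hop conjunctive query simply chains such constrained one-hop steps, which is the setting the paper extends query embedding methods to handle.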