AITopics

2509.04979

Genre: Research Report > New Finding (0.67)

Industry: Information Technology > Security & Privacy (0.92)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Barrowclough, George, Andrecki, Marian, Shinner, James, Donghi, Daniele

Kamae: Bridging Spark and Keras for Seamless ML Preprocessing

arXiv.org Artificial Intelligence, Jul-9-2025

In production recommender systems, feature preprocessing must be faithfully replicated across training and inference environments. This often requires duplicating logic between offline and online environments, increasing engineering effort and introducing risks of dataset shift. We present Kamae, an open-source Python library that bridges this gap by translating PySpark preprocessing pipelines into equivalent Keras models. Kamae provides a suite of configurable Spark transformers and estimators, each mapped to a corresponding Keras layer, enabling consistent, end-to-end preprocessing across the ML lifecycle. The framework's utility is illustrated on real-world use cases, including the MovieLens dataset and Expedia's Learning-to-Rank pipelines. The code is available at https://github.com/ExpediaGroup/kamae.
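The parity problem Kamae targets can be illustrated without Spark or Keras. The sketch below is a hypothetical, framework-free stand-in and not Kamae's API: the names `FittedScaler`, `fit_scaler`, `transform_batch` and `transform_request` are invented for illustration. A scaler is fitted once offline, and its frozen parameters drive both the batch (training) path and the per-request (inference) path, so the two cannot drift apart.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class FittedScaler:
    """Frozen parameters produced by an offline 'estimator' (hypothetical)."""
    mean: float
    std: float

    def apply(self, x: float) -> float:
        # Single source of truth for the transformation formula.
        return (x - self.mean) / self.std


def fit_scaler(values: list[float]) -> FittedScaler:
    # Offline step: compute statistics over the training batch.
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    # Fall back to 1.0 on zero variance so neither path divides by zero.
    return FittedScaler(mean=mean, std=math.sqrt(var) or 1.0)


def transform_batch(scaler: FittedScaler, values: list[float]) -> list[float]:
    # Offline path, e.g. building a training set.
    return [scaler.apply(v) for v in values]


def transform_request(scaler: FittedScaler, value: float) -> float:
    # Online path, e.g. scoring a single request at inference time.
    return scaler.apply(value)


train = [10.0, 20.0, 30.0, 40.0]
scaler = fit_scaler(train)
offline = transform_batch(scaler, train)
online = [transform_request(scaler, v) for v in train]
```

Per the abstract, Kamae plays this role with a Spark transformer or estimator paired with a matching Keras layer; the sketch only shows why a single fitted definition feeding both paths removes the duplicated logic.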

artificial intelligence, machine learning, transformation

2507.06021

Country:

Europe > United Kingdom > England (0.15)
North America > United States > Illinois (0.14)

Genre: Research Report (0.40)

Technology:

Information Technology > Software (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (0.35)

arXiv.org Artificial Intelligence, Oct-13-2023

EHI: End-to-end Learning of Hierarchical Index for Efficient Dense Retrieval

Kumar, Ramnath, Mittal, Anshul, Gupta, Nilesh, Kusupati, Aditya, Dhillon, Inderjit, Jain, Prateek

Dense embedding-based retrieval is now the industry standard for semantic search and ranking problems, like obtaining relevant web documents for a given query. Such techniques use a two-stage process: (a) contrastive learning to train a dual encoder to embed both the query and documents and (b) approximate nearest neighbor search (ANNS) for finding similar documents for a given query. These two stages are disjoint; the learned embeddings might be ill-suited for the ANNS method and vice versa, leading to suboptimal performance. In this work, we propose End-to-end Hierarchical Indexing (EHI), which jointly learns both the embeddings and the ANNS structure to optimize retrieval performance. EHI uses a standard dual encoder model for embedding queries and documents while learning an inverted file index (IVF) style tree structure for efficient ANNS. To ensure stable and efficient learning of the discrete tree-based ANNS structure, EHI introduces the notion of dense path embedding that captures the position of a query/document in the tree. We demonstrate the effectiveness of EHI on several benchmarks, including the de facto industry-standard MS MARCO (Dev set and TREC DL19) datasets. For example, with the same compute budget, EHI outperforms the state of the art (SOTA) by 0.6% (MRR@10) on the MS MARCO dev set and by 4.2% (nDCG@10) on the TREC DL19 benchmark.
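To make stage (b) concrete, here is a minimal sketch of a plain, fixed IVF-style lookup, which is the kind of structure EHI learns end to end rather than EHI itself. The centroids are hand-picked for illustration (in practice they typically come from k-means over document embeddings), and `build_ivf`, `search_ivf` and `n_probe` are hypothetical names. Documents are bucketed by their best-scoring centroid; a query probes only its top buckets and scores those documents exactly.

```python
Vector = list[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def build_ivf(docs: list[Vector], centroids: list[Vector]) -> dict[int, list[int]]:
    """Assign each document id to the centroid it scores highest against."""
    lists: dict[int, list[int]] = {c: [] for c in range(len(centroids))}
    for doc_id, vec in enumerate(docs):
        best = max(range(len(centroids)), key=lambda c: dot(vec, centroids[c]))
        lists[best].append(doc_id)
    return lists


def search_ivf(query: Vector, docs: list[Vector], centroids: list[Vector],
               lists: dict[int, list[int]], n_probe: int = 1, k: int = 2) -> list[int]:
    """Probe the n_probe best clusters, then score only their documents exactly."""
    probe = sorted(range(len(centroids)),
                   key=lambda c: dot(query, centroids[c]), reverse=True)[:n_probe]
    candidates = [d for c in probe for d in lists[c]]
    return sorted(candidates, key=lambda d: dot(query, docs[d]), reverse=True)[:k]


centroids = [[1.0, 0.0], [0.0, 1.0]]
docs = [[0.9, 0.1], [0.8, 0.3], [0.1, 0.9], [0.2, 0.7]]
lists = build_ivf(docs, centroids)
```

Documents in unprobed clusters are never scored, so if the embeddings and the cluster assignment disagree, relevant documents are silently missed; this is the mismatch between disjoint stages that the abstract says EHI addresses by learning both together.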

dataset, ehi, query

2310.08891

Genre:

Research Report > Experimental Study (0.67)
Research Report > New Finding (0.67)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

arXiv.org Artificial Intelligence, Jan-6-2023

Multi-Agent Dynamic Pricing in a Blockchain Protocol Using Gaussian Bandits

Asseman, Alexis, Kornuta, Tomasz, Patel, Anirudh, Deible, Matt, Green, Sam

The Graph Protocol indexes historical blockchain transaction data and makes it available for querying. Because the protocol is decentralized, there are many independent Indexers that index data and compete with each other to serve queries to Consumers. One dimension along which Indexers compete is pricing. In this paper, we propose a bandit-based algorithm for maximizing Indexers' revenue via Consumer budget discovery. We present the design and the considerations we had to make for a dynamic pricing algorithm used by multiple agents simultaneously. We discuss the results achieved by our dynamic pricing bandits both in simulation and deployed in production on one of the Indexers operating on Ethereum. We have open-sourced both the simulation framework and the tools we created, which other Indexers have since started to adapt into their own workflows.
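The idea of budget discovery with a bandit can be sketched in a few lines. This is not the paper's algorithm: the abstract does not give its details, so the sketch swaps in a plainly named technique, Gaussian Thompson sampling over a handful of discrete price levels. It assumes a single simulated Consumer with a hidden, fixed budget, a N(0, 1) prior on each price's mean revenue and unit-variance noise, and it ignores the multi-agent competition the paper studies. All names are hypothetical.

```python
import math
import random
from typing import Callable


def thompson_pricing(prices: list[float],
                     revenue_fn: Callable[[float], float],
                     rounds: int,
                     seed: int = 0) -> list[int]:
    """Gaussian Thompson sampling over discrete price levels.

    With a N(0, 1) prior and unit-variance noise, the posterior of a price's
    mean revenue after n pulls with reward sum s is N(s / (n + 1), 1 / (n + 1)).
    """
    rng = random.Random(seed)
    counts = [0] * len(prices)
    sums = [0.0] * len(prices)

    def pull(i: int) -> None:
        counts[i] += 1
        sums[i] += revenue_fn(prices[i])

    # Quote every price once so no arm starts from the bare prior.
    for i in range(len(prices)):
        pull(i)

    for _ in range(rounds - len(prices)):
        # Draw one plausible mean revenue per price from its posterior...
        draws = [rng.gauss(sums[i] / (counts[i] + 1), 1.0 / math.sqrt(counts[i] + 1))
                 for i in range(len(prices))]
        # ...and quote the price whose draw is highest.
        pull(max(range(len(prices)), key=lambda i: draws[i]))
    return counts


def hidden_budget_revenue(price: float, budget: float = 3.5) -> float:
    # Hypothetical Consumer: pays the quoted price only if it fits the budget.
    return price if price <= budget else 0.0


counts = thompson_pricing([1.0, 2.0, 3.0, 4.0, 5.0], hidden_budget_revenue, rounds=1000)
```

Prices above the budget earn nothing and cheap prices leave revenue on the table, so the pulls concentrate on the highest price the Consumer will still accept (3.0 here), which is the sense in which the bandit "discovers" the budget.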

bandit, machine learning, reinforcement learning

2212.07942

Country:

North America > United States > California > Santa Clara County > Los Altos (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (1.00)

Industry: Banking & Finance > Trading (0.67)

Technology:

Information Technology > e-Commerce (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.97)

#artificialintelligence, Jan-23-2022, 10:10:20 GMT

Joan Fontanals – Principal Engineer – Jina.AI

I had the pleasure of sitting down with Joan Fontanals, Principal Engineer at Jina.AI, a framework with lots of capabilities to support your neural search journey. Listen to or watch the podcast and get a chance to win awesome swag from Jina.AI. As a special thank-you, I'd like to mention Saurabh Rai, who kindly designed the thumbnail for this episode!

buddypq, jina, paddlepaddle paddle

Technology:

Information Technology > Artificial Intelligence > Natural Language (0.54)
Information Technology > Artificial Intelligence > Machine Learning (0.34)

#artificialintelligence, Sep-5-2021, 07:10:59 GMT

DSC Weekly Digest 31 August 2021

Programmers, when first learning their trade, spend a few weeks or months working on the basics: the syntax of the language, how to work with strings and numbers, how to assign variables, and how to create basic functions. About this time, they also encounter two of their first data structures: lists and dictionaries. Lists can be surprisingly complex structures, but in many cases they consist of sequences of items with pointers (or links) from one item to the next. While navigation can be handled by traversing such a linked list one link at a time, most often this is short-circuited by passing in a numeric index, counted from 0 (or 1 in some languages), that gives the position of whatever item is required; array-backed lists can jump to that position directly. A related structure is the dictionary, which retrieves items by key rather than by position.
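The difference between walking links and jumping by index, and between positional and keyed lookup, fits in a few lines of Python. The `Node` and `LinkedList` classes are illustrative and not from the digest; Python's built-in `list` is array-backed and its `dict` is a hash table keyed by arbitrary hashable values.

```python
class Node:
    def __init__(self, value, next_node=None):
        self.value = value
        self.next = next_node  # pointer (link) to the following item


class LinkedList:
    """Minimal singly linked list (illustrative only)."""

    def __init__(self, items):
        self.head = None
        # Build back to front so the head holds the first item.
        for item in reversed(items):
            self.head = Node(item, self.head)

    def get(self, index):
        """Positional lookup by walking links: cost grows with the index."""
        if index < 0:
            raise IndexError(index)
        node = self.head
        for _ in range(index):
            if node is None:
                raise IndexError(index)
            node = node.next
        if node is None:
            raise IndexError(index)
        return node.value


linked = LinkedList(["red", "green", "blue"])
third_by_walking = linked.get(2)         # follows two links from the head

array_backed = ["red", "green", "blue"]  # Python's list is array-backed
third_by_index = array_backed[2]         # jumps straight to position 2

ages = {"ada": 36, "alan": 41}           # dict: lookup by key, not position
alan = ages["alan"]
```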

database, dsc weekly digest 31, query

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.52)

#artificialintelligence, Jan-16-2020, 23:09:08 GMT

Enhance Your Search Applications with Artificial Intelligence

Users expect to see that friendly search box in their applications. They seem to really like it, because it's so simple to use: you don't need a user manual to figure out search. In fact, if your application doesn't have search, you'll be pelted with negative reviews. No wonder you see search in so many applications, yet it's very difficult to implement. We all know it's more than just simple text matching. Those of us with database backgrounds know that searching for "prefix*" is a lot easier than searching for "*suffix". And users want to do all sorts of unusual searches, like "*run*", which they expect to match shrunken or brunt, and even ran. You get the idea. Fast results and good performance are important, as are accuracy and ranking.
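Why "prefix*" is easier than "*suffix" can be shown with a sorted term list, a minimal sketch that is not how any particular product (Azure Search included) is implemented. The word list and function names are hypothetical. In sorted order, all terms sharing a prefix form one contiguous run that a binary search finds in O(log n) steps; a suffix query needs a second index of reversed terms to get the same benefit; and an infix query like "*run*" gets no help from ordering, so every term is scanned.

```python
import bisect

# Hypothetical term dictionary, kept sorted as a search index would keep it.
terms = sorted(["brunt", "prefix", "ran", "rerun", "run", "runner", "shrunken", "suffix"])
# A second index of reversed terms turns suffix queries into prefix queries.
reversed_terms = sorted(t[::-1] for t in terms)


def prefix_search(sorted_terms: list[str], prefix: str) -> list[str]:
    """'prefix*': matches form one contiguous run in sorted order."""
    i = bisect.bisect_left(sorted_terms, prefix)  # O(log n) jump to the run
    matches = []
    while i < len(sorted_terms) and sorted_terms[i].startswith(prefix):
        matches.append(sorted_terms[i])
        i += 1
    return matches


def suffix_search(suffix: str) -> list[str]:
    """'*suffix': only cheap because of the extra reversed index."""
    return [t[::-1] for t in prefix_search(reversed_terms, suffix[::-1])]


def infix_search(fragment: str) -> list[str]:
    """'*fragment*': no ordering helps, so every term is scanned."""
    return [t for t in terms if fragment in t]
```

Note that no wildcard pattern here matches ran; getting from run to ran takes linguistic processing such as stemming or lemmatization, which is part of why search is more than text matching.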

application, azure search, indexer

Country: South America > Brazil (0.04)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Artificial Intelligence (1.00)

Fiorini, Sandro Rama, Santos, Wallas Sousa dos, Mesquita, Rodrigo Costa, Lima, Guilherme Ferreira, Moreno, Marcio F.

General Fragment Model for Information Artifacts

arXiv.org Artificial Intelligence, Sep-9-2019

The use of semantic descriptions in data-intensive domains requires a systematic model for linking semantic descriptions with their manifestations in fragments of heterogeneous information and data objects. Such information heterogeneity requires a fragment model that is general enough to support the specification of anchors from conceptual models to multiple types of information artifacts. While diverse proposals of anchoring models exist in the literature, they usually focus on audiovisual information. We propose a generalized fragment model that can be instantiated to different kinds of information artifacts. Our objective is to systematize the way in which fragments and anchors can be described in conceptual models, without committing to a specific vocabulary.
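The general shape of such a model, one anchor type that can point into many kinds of artifacts, can be sketched as plain data classes. This is a hypothetical illustration and not the authors' model: the selector kinds (`TimeSpan`, `CharSpan`, `BoxRegion`), field names and example.org identifiers are all assumptions. Concepts are bare identifier strings, so no particular vocabulary is built in, and supporting a new artifact type means adding a selector kind rather than a new anchoring scheme.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class TimeSpan:   # e.g. a segment of audio or video, in seconds
    start: float
    end: float


@dataclass(frozen=True)
class CharSpan:   # e.g. a passage of a text document
    start: int
    end: int


@dataclass(frozen=True)
class BoxRegion:  # e.g. an area of an image, in pixels
    x: int
    y: int
    width: int
    height: int


Selector = Union[TimeSpan, CharSpan, BoxRegion]


@dataclass(frozen=True)
class Fragment:
    artifact: str       # identifier of the whole information artifact
    selector: Selector  # which part of it; the kind varies by artifact type


@dataclass(frozen=True)
class Anchor:
    concept: str        # any concept identifier; no vocabulary is assumed
    fragment: Fragment


def fragments_for(anchors: list[Anchor], concept: str) -> list[Fragment]:
    """Collect every fragment, of any artifact type, anchored to a concept."""
    return [a.fragment for a in anchors if a.concept == concept]


BRIDGE = "http://example.org/concept/Bridge"
anchors = [
    Anchor(BRIDGE, Fragment("http://example.org/report.txt", CharSpan(120, 310))),
    Anchor(BRIDGE, Fragment("http://example.org/survey.mp4", TimeSpan(42.0, 57.5))),
    Anchor("http://example.org/concept/River",
           Fragment("http://example.org/map.png", BoxRegion(10, 20, 64, 64))),
]
```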

artificial intelligence, information retrieval, natural language

1909.04117

Genre: Research Report (0.40)

Technology:

Information Technology > Information Management (0.89)
Information Technology > Artificial Intelligence > Representation & Reasoning > Ontologies (0.48)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.47)

#artificialintelligence, Jan-11-2019, 19:06:43 GMT

Multi-modal topic inferencing from videos

Any organization that has a large media archive struggles with the same challenge: how can we transform our media archives into business value? Media content management is hard, and so is content discovery at scale. Categorizing content by topic is an intuitive approach that makes it easier for people to search for the content they need. However, topics usually have to be inferred, because a video's topic doesn't necessarily appear explicitly in the video. For example, content that is focused on the topic of 'healthcare' may not actually have the word 'healthcare' presented in it, which makes categorization an even harder problem to solve.

machine learning, natural language, video indexer

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.30)

#artificialintelligence, Sep-16-2018, 18:59:45 GMT

Video Indexer – General availability and beyond

Earlier today, we announced the general availability (GA) of Video Indexer. This means that our customers can count on all the metadata goodness of Video Indexer to always be available for them to use when running their business. However, this GA is not the only Video Indexer announcement we have for you. In the time since we released Video Indexer to public preview in May 2018, we never stopped innovating and added a wealth of new capabilities to make Video Indexer more insightful and effective for your video and audio needs. The Video Indexer portal already includes insights and timeline panes that enable our customers to easily review and evaluate media insights.

artificial intelligence, indexer, video indexer

Technology: Information Technology > Artificial Intelligence > Speech (0.30)