indexer
Internet 3.0: Architecture for a Web-of-Agents with it's Algorithm for Ranking Agents
Krishnamachari, Rajesh Tembarai, Rajesh, Srividya
AI agents -- powered by reasoning-capable large language models (LLMs) and integrated with tools, data, and web search -- are poised to transform the internet into a Web of Agents: a machine-native ecosystem where autonomous agents interact, collaborate, and execute tasks at scale. Realizing this vision requires Agent Ranking -- selecting agents not only by declared capabilities but by proven, recent performance. Unlike Web 1.0's PageRank, a global, transparent network of agent interactions does not exist; usage signals are fragmented and private, making ranking infeasible without coordination. We propose DOVIS, a five-layer operational protocol (Discovery, Orchestration, Verification, Incentives, Semantics) that enables the collection of minimal, privacy-preserving aggregates of usage and performance across the ecosystem. On this substrate, we implement AgentRank-UC, a dynamic, trust-aware algorithm that combines usage (selection frequency) and competence (outcome quality, cost, safety, latency) into a unified ranking. We present simulation results and theoretical guarantees on convergence, robustness, and Sybil resistance, demonstrating the viability of coordinated protocols and performance-aware ranking in enabling a scalable, trustworthy Agentic Web.
- Information Technology > Information Management > Search (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
Kamae: Bridging Spark and Keras for Seamless ML Preprocessing
Barrowclough, George, Andrecki, Marian, Shinner, James, Donghi, Daniele
In production recommender systems, feature preprocessing must be faithfully replicated across training and inference environments. This often requires duplicating logic between offline and online environments, increasing engineering effort and introducing risks of dataset shift. We present Kamae, an open-source Python library that bridges this gap by translating PySpark preprocessing pipelines into equivalent Keras models. Kamae provides a suite of configurable Spark transformers and estimators, each mapped to a corresponding Keras layer, enabling consistent, end-to-end preprocessing across the ML lifecycle. The framework's utility is illustrated on real-world use cases, including the MovieLens dataset and Expedia's Learning-to-Rank pipelines. The code is available at https://github.com/ExpediaGroup/kamae.
- Europe > Czechia > Prague (0.06)
- Europe > United Kingdom > England > Greater London > London (0.05)
- North America > United States > New York > New York County > New York City (0.04)
EHI: End-to-end Learning of Hierarchical Index for Efficient Dense Retrieval
Kumar, Ramnath, Mittal, Anshul, Gupta, Nilesh, Kusupati, Aditya, Dhillon, Inderjit, Jain, Prateek
Dense embedding-based retrieval is now the industry standard for semantic search and ranking problems, such as retrieving relevant web documents for a given query. Such techniques use a two-stage process: (a) contrastive learning to train a dual encoder to embed both the query and documents and (b) approximate nearest neighbor search (ANNS) for finding similar documents for a given query. These two stages are disjoint; the learned embeddings might be ill-suited for the ANNS method and vice versa, leading to suboptimal performance. In this work, we propose End-to-end Hierarchical Indexing -- EHI -- that jointly learns both the embeddings and the ANNS structure to optimize retrieval performance. EHI uses a standard dual encoder model for embedding queries and documents while learning an inverted file index (IVF) style tree structure for efficient ANNS. To ensure stable and efficient learning of the discrete tree-based ANNS structure, EHI introduces the notion of dense path embedding that captures the position of a query/document in the tree. We demonstrate the effectiveness of EHI on several benchmarks, including the de facto industry-standard MS MARCO (Dev set and TREC DL19) datasets. For example, with the same compute budget, EHI outperforms the state of the art (SOTA) by 0.6% (MRR@10) on the MS MARCO dev set and by 4.2% (nDCG@10) on the TREC DL19 benchmark.
- Research Report > Experimental Study (0.67)
- Research Report > New Finding (0.67)
- Information Technology > Information Management > Search (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
Multi-Agent Dynamic Pricing in a Blockchain Protocol Using Gaussian Bandits
Asseman, Alexis, Kornuta, Tomasz, Patel, Anirudh, Deible, Matt, Green, Sam
The Graph Protocol indexes historical blockchain transaction data and makes it available for querying. As the protocol is decentralized, there are many independent Indexers that index and compete with each other for serving queries to the Consumers. One dimension along which Indexers compete is pricing. In this paper, we propose a bandit-based algorithm for maximization of Indexers' revenue via Consumer budget discovery. We present the design and the considerations we had to make for a dynamic pricing algorithm being used by multiple agents simultaneously. We discuss the results achieved by our dynamic pricing bandits both in simulation and deployed into production on one of the Indexers operating on Ethereum. We have open-sourced both the simulation framework and tools we created, which other Indexers have since started to adapt into their own workflows.
- North America > United States > California > Santa Clara County > Los Altos (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
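As a rough illustration of price discovery with a bandit, the sketch below uses Thompson sampling with Gaussian posteriors over a handful of candidate prices. This is a swapped-in, textbook technique and not the algorithm from the paper. The consumer model (a single fixed budget, buying only when the price is at or below it), the price grid, and all parameter names are assumptions.

```python
import random

def thompson_pricing(prices, consumer_budget, rounds=2000, seed=0,
                     prior_std=10.0, noise_std=1.0):
    """Return how often each candidate price was chosen.

    Each price is an arm whose mean revenue has a Gaussian posterior
    (prior N(0, prior_std^2), assumed observation noise noise_std).
    """
    rng = random.Random(seed)
    pulls = [0] * len(prices)
    revenue_sum = [0.0] * len(prices)
    for _ in range(rounds):
        samples = []
        for i in range(len(prices)):
            # Conjugate Gaussian update for the mean revenue of arm i.
            precision = 1.0 / prior_std ** 2 + pulls[i] / noise_std ** 2
            mean = (revenue_sum[i] / noise_std ** 2) / precision
            samples.append(rng.gauss(mean, precision ** -0.5))
        arm = max(range(len(prices)), key=samples.__getitem__)
        # Hypothetical consumer: pays the price only if it fits the budget.
        revenue = prices[arm] if prices[arm] <= consumer_budget else 0.0
        pulls[arm] += 1
        revenue_sum[arm] += revenue
    return pulls

prices = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
pulls = thompson_pricing(prices, consumer_budget=4.5)
best_price = prices[max(range(len(prices)), key=pulls.__getitem__)]
print(pulls, best_price)
```

The bandit never observes the budget directly. It infers it from which prices still earn revenue, which is the sense in which pricing doubles as budget discovery.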
Joan Fontanals – Principal Engineer – Jina.AI
I had the pleasure of sitting down with Joan Fontanals, Principal Engineer at Jina.AI, a framework with lots of capabilities to support your neural search journey. Listen to or watch the podcast for a chance to win awesome swag from Jina.AI. As a special thank-you, I'd like to mention Saurabh Rai, who kindly designed the thumbnail for this episode!
DSC Weekly Digest 31 August 2021
Programmers, when first learning their trade, spend a few weeks or months working on the basics - the syntax of the language, how to work with strings and numbers, how to assign variables, and how to create basic functions. About this time, they also encounter two of their first data structures: lists and dictionaries. Lists can be surprisingly complex structures, but in most cases, they consist of sequences of items with pointers (or links) from one item to the next. While navigation can be handled by traversing the list link by link, most often this is short-circuited by passing in a numeric index, as with an array, that runs from 0 (or 1 in some languages) to the position of whatever item is required. A similar structure is known as a dictionary.
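A small sketch of the three access patterns just described: walking a linked list item by item, jumping straight to a position with a numeric index, and looking up a value by key in a dictionary. The `Node` class, the helper name, and the sample data are illustrative assumptions.

```python
class Node:
    """One item in a singly linked list, with a link to the next item."""
    def __init__(self, value, next_node=None):
        self.value = value
        self.next = next_node

def nth_by_traversal(head, n):
    """Follow n links from the head, then return that node's value."""
    node = head
    for _ in range(n):
        node = node.next
    return node.value

head = Node("a", Node("b", Node("c")))   # linked list: a -> b -> c
items = ["a", "b", "c"]                  # Python list, indexed from 0
ages = {"ada": 36, "alan": 41}           # dictionary: key -> value (made-up data)

third_linked = nth_by_traversal(head, 2)  # walks two links
third_indexed = items[2]                  # jumps straight to position 2
alan_age = ages["alan"]                   # looks up by key, not position
print(third_linked, third_indexed, alan_age)
```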
Enhance Your Search Applications with Artificial Intelligence
Users expect to see that friendly search box in their applications. They seem to really like it, because it's so simple to use. You don't need a user manual to figure out search. In fact, if your application doesn't have search, you'll be pelted with negative reviews. No wonder you see search in so many applications. Yet it's very difficult to implement. We all know it's more than just simple text matching. Those of us with database backgrounds know that searching for "prefix*" is a lot easier than searching for "*suffix". And users want to do all sorts of weird searches like "*run*", which should match ran, or shrunken or brunt, or--you get the idea. Quick search results and performance are important, as are accuracy and ranking.
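One way to see why "prefix*" is cheaper than "*suffix": in a sorted index, all terms sharing a prefix sit next to each other, so two binary searches find the range, while a suffix match has to scan every term unless a second index of reversed terms is kept. The sketch below uses Python's standard `bisect` module. The function names and the term list are assumptions for illustration.

```python
import bisect

def prefix_matches(sorted_terms, prefix):
    """Binary-search the contiguous range of terms starting with prefix.

    Assumes no term contains the sentinel character U+FFFF.
    """
    lo = bisect.bisect_left(sorted_terms, prefix)
    hi = bisect.bisect_left(sorted_terms, prefix + "\uffff")
    return sorted_terms[lo:hi]

def suffix_matches_scan(terms, suffix):
    """Without a suitable index, a suffix search checks every term."""
    return [t for t in terms if t.endswith(suffix)]

def suffix_matches_reversed(sorted_reversed_terms, suffix):
    """Turn a suffix search into a prefix search over reversed terms."""
    hits = prefix_matches(sorted_reversed_terms, suffix[::-1])
    return sorted(t[::-1] for t in hits)

terms = sorted(["brunt", "rerun", "run", "runner", "running", "shrunken"])
reversed_terms = sorted(t[::-1] for t in terms)

print(prefix_matches(terms, "run"))
print(suffix_matches_scan(terms, "run"))
print(suffix_matches_reversed(reversed_terms, "run"))
```

An infix pattern such as "*run*" gets no help from either index, which is one reason real search engines rely on structures such as n-gram indexes instead.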
General Fragment Model for Information Artifacts
Fiorini, Sandro Rama, Santos, Wallas Sousa dos, Mesquita, Rodrigo Costa, Lima, Guilherme Ferreira, Moreno, Marcio F.
The use of semantic descriptions in data-intensive domains requires a systematic model for linking semantic descriptions with their manifestations in fragments of heterogeneous information and data objects. Such information heterogeneity requires a fragment model that is general enough to support the specification of anchors from conceptual models to multiple types of information artifacts. While diverse proposals of anchoring models exist in the literature, they are usually focused on audiovisual information. We propose a generalized fragment model that can be instantiated to different kinds of information artifacts. Our objective is to systematize the way in which fragments and anchors can be described in conceptual models, without committing to a specific vocabulary.
- South America > Brazil (0.04)
- North America > United States (0.04)
Multi-modal topic inferencing from videos
Any organization that has a large media archive struggles with the same challenge – how can we transform our media archives into business value? Media content management is hard, and so is content discovery at scale. Content categorization by topics is an intuitive approach that makes it easier for people to search for the content they need. However, content categorization is usually deductive: the topic doesn't necessarily appear explicitly in the video. For example, content that is focused on the topic of 'healthcare' may not actually have the word 'healthcare' presented in it, which makes the categorization an even harder problem to solve.
Video Indexer – General availability and beyond
Earlier today, we announced the general availability (GA) of Video Indexer. This means that our customers can count on all the metadata goodness of Video Indexer to always be available for them to use when running their business. However, this GA is not the only Video Indexer announcement we have for you. In the time since we released Video Indexer to public preview in May 2018, we have never stopped innovating and have added a wealth of new capabilities to make Video Indexer more insightful and effective for your video and audio needs. The Video Indexer portal already includes insights and timeline panes that enable our customers to easily review and evaluate media insights.