Goto

Collaborating Authors

 qdrant


Assessing RAG and HyDE on 1B vs. 4B-Parameter Gemma LLMs for Personal Assistants Integretion

arXiv.org Artificial Intelligence

Resource efficiency is a critical barrier to deploying large language models (LLMs) in edge and privacy-sensitive applications. This study evaluates the efficacy of two augmentation strategies--Retrieval-Augmented Generation (RAG) and Hypothetical Document Embeddings (HyDE)--on compact Gemma LLMs of 1 billion and 4 billion parameters, within the context of a privacy-first personal assistant. We implement short-term memory via MongoDB and long-term semantic storage via Qdrant, orchestrated through FastAPI and LangChain, and expose the system through a React.js frontend. Across both model scales, RAG consistently reduces latency by up to 17\% and eliminates factual hallucinations when responding to user-specific and domain-specific queries. HyDE, by contrast, enhances semantic relevance--particularly for complex physics prompts--but incurs a 25--40\% increase in response time and a non-negligible hallucination rate in personal-data retrieval. Comparing 1 B to 4 B models, we observe that scaling yields marginal throughput gains for baseline and RAG pipelines, but magnifies HyDE's computational overhead and variability. Our findings position RAG as the pragmatic choice for on-device personal assistants powered by small-scale LLMs.


Recapping the Computer Vision Meetup -- December 2022

#artificialintelligence

Last week Voxel51 hosted the December 2022 Computer Vision Meetup. Our amazing speakers shared insightful presentations, the virtual room was packed, and the Q&A was vibrant! In this blog post we provide the recordings, recap presentation highlights and Q&A, as well as share the upcoming Meetup schedule so that you can join us at a future event. Hope to see you soon! In lieu of swag, we gave Meetup attendees the opportunity to help guide our monthly donation to charitable causes. The charity that received the highest number of votes was Children International.


GitHub - qdrant/qdrant: Qdrant - Vector Search Engine for the next generation of AI applications

#artificialintelligence

Qdrant (read: quadrant) is a vector similarity search engine. It provides a production-ready service with a convenient API to store, search, and manage points - vectors with an additional payload. Qdrant is tailored to extended filtering support. It makes it useful for all sorts of neural-network or semantic-based matching, faceted search, and other applications. Qdrant is written in Rust, which makes it fast and reliable even under high load.


Nearest Neighbor Embeddings Search with Qdrant and FiftyOne

#artificialintelligence

Neural network embeddings are a low-dimensional representation of input data that give rise to a variety of applications. Embeddings have some interesting capabilities, as they are able to capture the semantics of the data points. This is especially useful for unstructured data like images and videos, so you can not only encode pixel similarities but also some more complex relationships. Performing searches over these embeddings gives rise to a lot of use cases like classification, building up the recommendation systems, or even anomaly detection. One of the primary benefits of performing a nearest neighbor search on embeddings to accomplish these tasks is that there is no need to create a custom network for every new problem, you can often just use pre-trained models.