 Databricks workflow


Scalable Vector Search for AI Apps with Milvus and Databricks

#artificialintelligence

Multi-modal embeddings are all the rage these days. Everyone wants a piece of them because they convert unstructured data into representations that capture the semantic nature of unstructured assets across image, text, audio, and video. These representations are vectors that serve use cases such as image similarity, deduplication, anomaly detection, text similarity, audio classification, and video understanding. To top that off, you don't have to be a data scientist with deep ML expertise to build these systems, nor do you need large amounts of data to start leveraging them. This is fine until you run into actual "hands on the keyboard" work for production.
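
To make the idea of "vectors used for similarity" concrete, here is a minimal sketch of a brute-force similarity lookup in plain Python. The three-dimensional vectors, the item names, and the helpers `cosine_similarity` and `top_k` are all made up for illustration; real embeddings come from a model and have hundreds of dimensions, and a vector database such as Milvus would use an approximate index rather than scoring every item.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def top_k(query, index, k=2):
    """Score every stored vector against the query and keep the k best."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy "embeddings"; the values are invented for this example.
index = {
    "cat_photo": [0.9, 0.1, 0.0],
    "dog_photo": [0.8, 0.3, 0.1],
    "invoice_scan": [0.0, 0.2, 0.9],
}

results = top_k([1.0, 0.0, 0.0], index, k=2)
for name, score in results:
    print(f"{name}: {score:.3f}")
```

The exhaustive scan costs one comparison per stored vector, which is the part that stops scaling in production and the reason dedicated vector search systems exist.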


Sharing Context Between Tasks in Databricks Workflows - The Databricks Blog

#artificialintelligence

Databricks Workflows is a fully managed service on Databricks that makes it easy to build and manage complex data and ML pipelines in your lakehouse without the need to operate complex infrastructure. Sometimes, a task in an ETL or ML pipeline depends on the output of an upstream task. For example, one task might evaluate the performance of a machine learning model, and a downstream task might then decide whether to retrain the model based on the resulting metrics. Since these are two separate steps, it would be best to have separate tasks perform the work. Previously, accessing information from a previous task required storing this information outside of the job's context, such as in a Delta table.
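
The evaluate-then-decide pattern described above can be sketched as two functions that communicate only through a small key/value store. Everything here is an assumption for illustration: `InMemoryTaskValues` is a hypothetical stand-in, not a Databricks class, and the task name, metric name, accuracy value, and threshold are invented.

```python
class InMemoryTaskValues:
    """Hypothetical stand-in for a job-scoped key/value store (not a Databricks API)."""

    def __init__(self):
        self._values = {}

    def set(self, task_key, key, value):
        self._values[(task_key, key)] = value

    def get(self, task_key, key, default=None):
        return self._values.get((task_key, key), default)


def evaluate_model(store):
    """Upstream task: publish a metric for later tasks to read."""
    accuracy = 0.82  # placeholder; a real task would compute this from a test set
    store.set("evaluate_model", "accuracy", accuracy)


def decide_retrain(store, threshold=0.9):
    """Downstream task: read the upstream metric and decide whether to retrain."""
    accuracy = store.get("evaluate_model", "accuracy", default=0.0)
    return accuracy < threshold


store = InMemoryTaskValues()
evaluate_model(store)
retrain = decide_retrain(store)
print("retrain" if retrain else "keep current model")
```

On Databricks itself, the `dbutils.jobs.taskValues` utility fills the role of the stand-in store, so small values like this metric can stay within the job run rather than going through an external table.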