Safe, Untrusted, "Proof-Carrying" AI Agents: toward the agentic lakehouse

Tagliabue, Jacopo, Greco, Ciro

arXiv.org Artificial Intelligence

Starting from this prototype, we conclude by outlining practical next steps for a full agentic lakehouse. The paper is organized as follows. After reviewing agent-friendly abstractions (Section II), we address key safety objections for high-stakes scenarios (Section III). Once safety is established, we describe a ReAct [12] loop built on these abstractions (Section IV). We put forward our working prototype as a feasibility demonstration of safe-by-design data agents, not as a full-fledged experimental benchmark. We believe that sharing working code is of great value to the community, especially in times of quickly shifting mental models. However, it is important to remember that our fundamental insights - programmability and safety - can be replicated independently of the chosen APIs. For these reasons, we believe our paper to be valuable to a wide range of practitioners: on one hand, those looking for a new mental map of this uncharted territory; on the other, those looking to be inspired by tinkering with existing implementations and inspecting systems working at scale.


Deep Lake: a Lakehouse for Deep Learning

Hambardzumyan, Sasun, Tuli, Abhinav, Ghukasyan, Levon, Rahman, Fariz, Topchyan, Hrant, Isayan, David, McQuade, Mark, Harutyunyan, Mikayel, Hakobyan, Tatevik, Stranic, Ivo, Buniatyan, Davit

arXiv.org Artificial Intelligence

Traditional data lakes provide critical data infrastructure for analytical workloads by enabling time travel, running SQL queries, ingesting data with ACID transactions, and visualizing petabyte-scale datasets on cloud storage. They allow organizations to break down data silos, unlock data-driven decision-making, improve operational efficiency, and reduce costs. However, as deep learning usage increases, traditional data lakes are not well-designed for applications such as natural language processing (NLP), audio processing, computer vision, and applications involving non-tabular datasets. This paper presents Deep Lake, an open-source lakehouse for deep learning applications developed at Activeloop. Deep Lake maintains the benefits of a vanilla data lake with one key difference: it stores complex data, such as images, videos, annotations, as well as tabular data, in the form of tensors and rapidly streams the data over the network to (a) Tensor Query Language, (b) in-browser visualization engine, or (c) deep learning frameworks without sacrificing GPU utilization. Datasets stored in Deep Lake can be accessed from PyTorch, TensorFlow, JAX, and integrate with numerous MLOps tools.


The Emergence of the Composable Customer Data Platform - Channel969

#artificialintelligence

This is a collaborative post between Databricks, Hightouch, and Snowplow. We thank Martin Lepka (Head of Industry Solutions at Snowplow) and Alec Haase (Product Evangelist at Hightouch) for their contributions. There is no denying that one of the greatest assets of the modern digital organization is first-party customer data. The rapid rise of the privacy-centric consumer has led to a monumental shift away from third-party tracking methods. Organizations are now scrambling to implement a data infrastructure that, leveraging first-party data, can enable the personalized experiences customers expect with every interaction.


Databricks targets retail vertical with its first industry-specific lakehouse

#artificialintelligence

Did you miss a session from the Future of Work Summit? San Francisco headquartered Databricks, a company that offers the capabilities of a data warehouse and data lake in a single "lakehouse" architecture, today announced its first industry-specific offering: Lakehouse for Retail. Designed specifically for enterprises dealing in the retail and consumer goods vertical, Databricks says Lakehouse for Retail is a fully integrated platform that aims to solve the most critical challenges retailers and their partners face while trying to leverage surging data volumes for AI and analytics projects. The solution, which is generally available as of today, has already seen early adoption from major retail enterprises including Walgreens, Columbia, H&M Group, Reckitt, Restaurant Brands International, 84.51, Co-Op Food, Gousto, and Acosta. "With hundreds of millions of prescriptions processed by Walgreens each year, Databricks' Lakehouse for Retail allows us to unify all of this data and store it in one place for a full range of analytics and ML workloads," said Luigi Guadagno, the VP of pharmacy and healthcare platform at Walgreens.


Databricks announces a new portal named Databricks Partner Connect

#artificialintelligence

Databricks, the Data and AI company and pioneer of the data lakehouse architecture, today announced Databricks Partner Connect, a one-stop portal for customers to quickly discover a broad set of validated data, analytics, and AI tools and easily integrate them with their Databricks lakehouse across multiple cloud providers. Integrations with Databricks partners Fivetran, Labelbox, Microsoft Power BI, Prophecy, Rivery, and Tableau are initially available to customers, with Airbyte, Blitzz, dbt Labs, and many more to come in the months ahead. Enterprises want to drive complexity out of their data infrastructure and adopt more open technologies to take better advantage of analytics and AI. The data lakehouse enabled by Databricks has put thousands of customers on this path, collectively processing multiple exabytes of data a day on a single platform for analytics and AI workloads. But, the data ecosystem is vast, and no one vendor can accomplish everything.


ThoughtSpot adds support for Databricks 'lakehouse' to analytics platform

#artificialintelligence

ThoughtSpot has expanded the number of backend data sources that can be accessed via its cloud-based analytics platform to include the Databricks cloud service based on the Apache Spark framework. A ThoughtSpot for Databricks offering now makes it possible to directly run queries through the ThoughtSpot search engine against a Databricks Lakehouse, a data architecture that combines the features of data lakes and data warehouses, according to Databricks. For nearly a decade, ThoughtSpot has been making the case for an alternative approach to analytics that eliminates the need to rely on a data analyst or IT professional to construct a dashboard. Instead, it presents end users with a search interface through which they can employ natural language to query multiple backend data repositories. That approach enables end users to interrogate data in a more interactive fashion that is not constrained by the limitations of how a dashboard was constructed, said Seann Gardiner, senior vice president of business development for ThoughtSpot.


What is a Lakehouse? - The Databricks Blog

#artificialintelligence

Over the past few years at Databricks, we've seen a new data management paradigm emerge independently across many customers and use cases: the lakehouse. In this post we describe this new paradigm and its advantages over previous approaches. Data warehouses have a long history in decision support and business intelligence applications. Since their inception in the late 1980s, data warehouse technologies have continued to evolve, and MPP architectures led to systems able to handle larger data sizes. But while warehouses were great for structured data, many modern enterprises have to deal with unstructured data, semi-structured data, and data with high variety, velocity, and volume.


Why KPMG is treating employees who want to learn AI to a $450 million training center that feels like a luxury resort

#artificialintelligence

Promoting the company's culture was a top priority when KPMG was making preliminary plans for its new $450 million training facility. It emerges in different ways throughout the 800,000-square-foot facility, some more subtle than others. The hallway to the main conference space, for example, is lined with artifacts from KPMG's heritage, including a ledger from original founder James Marwick dating to 1898. In another area, a set of lights that hang over the cafeteria change colors -- a nod to the importance of diversity at the firm. "There are things that the physical representation here is designed to really reflect what we see as our core kind of cultural aspects," said chief financial officer David Turner.