Safe, Untrusted, "Proof-Carrying" AI Agents: toward the agentic lakehouse

Tagliabue, Jacopo, Greco, Ciro

arXiv.org Artificial Intelligence

Starting from this prototype, we conclude by outlining practical next steps for a full agentic lakehouse. The paper is organized as follows. After reviewing agent-friendly abstractions (Section II), we address key safety objections for high-stakes scenarios (Section III). Once safety is established, we describe a ReAct [12] loop built on these abstractions (Section IV). We put forward our working prototype as a feasibility demonstration of safe-by-design data agents, not as a full-fledged experimental benchmark. We believe that sharing working code is of great value to the community, especially in times of quickly shifting mental models. However, it is important to remember that our fundamental insights - programmability and safety - can be replicated independently of the chosen APIs. For these reasons, we believe our paper to be valuable to a wide range of practitioners: on one hand, those looking for a new mental map of this uncharted territory; on the other, those looking to be inspired by tinkering with existing implementations and inspecting systems working at scale.


Deep Lake: a Lakehouse for Deep Learning

Hambardzumyan, Sasun, Tuli, Abhinav, Ghukasyan, Levon, Rahman, Fariz, Topchyan, Hrant, Isayan, David, McQuade, Mark, Harutyunyan, Mikayel, Hakobyan, Tatevik, Stranic, Ivo, Buniatyan, Davit

arXiv.org Artificial Intelligence

Traditional data lakes provide critical data infrastructure for analytical workloads by enabling time travel, running SQL queries, ingesting data with ACID transactions, and visualizing petabyte-scale datasets on cloud storage. They allow organizations to break down data silos, unlock data-driven decision-making, improve operational efficiency, and reduce costs. However, as deep learning usage increases, traditional data lakes are not well-designed for applications such as natural language processing (NLP), audio processing, computer vision, and applications involving non-tabular datasets. This paper presents Deep Lake, an open-source lakehouse for deep learning applications developed at Activeloop. Deep Lake maintains the benefits of a vanilla data lake with one key difference: it stores complex data, such as images, videos, annotations, as well as tabular data, in the form of tensors and rapidly streams the data over the network to (a) Tensor Query Language, (b) in-browser visualization engine, or (c) deep learning frameworks without sacrificing GPU utilization. Datasets stored in Deep Lake can be accessed from PyTorch, TensorFlow, JAX, and integrate with numerous MLOps tools.


The Emergence of the Composable Customer Data Platform - Channel969

#artificialintelligence

This is a collaborative post between Databricks, Hightouch, and Snowplow. We thank Martin Lepka (Head of Industry Solutions at Snowplow) and Alec Haase (Product Evangelist at Hightouch) for their contributions. There is no denying that one of the greatest assets of the modern digital organization is first-party customer data. The rapid rise of the privacy-centric consumer has led to a monumental shift away from third-party tracking methods. Organizations are now scrambling to implement a data infrastructure that, leveraging first-party data, can enable the personalized experiences customers expect with every interaction.


Databricks targets retail vertical with its first industry-specific lakehouse

#artificialintelligence

Did you miss a session from the Future of Work Summit? San Francisco headquartered Databricks, a company that offers the capabilities of a data warehouse and data lake in a single "lakehouse" architecture, today announced its first industry-specific offering: Lakehouse for Retail. Designed specifically for enterprises dealing in the retail and consumer goods vertical, Databricks says Lakehouse for Retail is a fully integrated platform that aims to solve the most critical challenges retailers and their partners face while trying to leverage surging data volumes for AI and analytics projects. The solution, which is generally available as of today, has already seen early adoption from major retail enterprises including Walgreens, Columbia, H&M Group, Reckitt, Restaurant Brands International, 84.51, Co-Op Food, Gousto, and Acosta. "With hundreds of millions of prescriptions processed by Walgreens each year, Databricks' Lakehouse for Retail allows us to unify all of this data and store it in one place for a full range of analytics and ML workloads," said Luigi Guadagno, the VP of pharmacy and healthcare platform at Walgreens.


Databricks announces a new portal named Databricks Partner Connect

#artificialintelligence

Databricks, the Data and AI company and pioneer of the data lakehouse architecture, today announced Databricks Partner Connect, a one-stop portal for customers to quickly discover a broad set of validated data, analytics, and AI tools and easily integrate them with their Databricks lakehouse across multiple cloud providers. Integrations with Databricks partners Fivetran, Labelbox, Microsoft Power BI, Prophecy, Rivery, and Tableau are initially available to customers, with Airbyte, Blitzz, dbt Labs, and many more to come in the months ahead. Enterprises want to drive complexity out of their data infrastructure and adopt more open technologies to take better advantage of analytics and AI. The data lakehouse enabled by Databricks has put thousands of customers on this path, collectively processing multiple exabytes of data a day on a single platform for analytics and AI workloads. But, the data ecosystem is vast, and no one vendor can accomplish everything.


ThoughtSpot adds support for Databricks 'lakehouse' to analytics platform

#artificialintelligence

ThoughtSpot has expanded the number of backend data sources that can be accessed via its cloud-based analytics platform to include the Databricks cloud service based on the Apache Spark framework. A ThoughtSpot for Databricks offering now makes it possible to directly run queries through the ThoughtSpot search engine against a Databricks Lakehouse, a data architecture that combines the features of data lakes and data warehouses, according to Databricks. For nearly a decade, ThoughtSpot has been making the case for an alternative approach to analytics that eliminates the need to rely on a data analyst or IT professional to construct a dashboard. Instead, it presents end users with a search interface through which they can employ natural language to query multiple backend data repositories. That approach enables end users to interrogate data in a more interactive fashion that is not constrained by the limitations of how a dashboard was constructed, said Seann Gardiner, senior vice president of business development for ThoughtSpot.


What is a Lakehouse? - The Databricks Blog

#artificialintelligence

Over the past few years at Databricks, we've seen a new data management paradigm emerge independently across many customers and use cases: the lakehouse. In this post we describe this new paradigm and its advantages over previous approaches. Data warehouses have a long history in decision support and business intelligence applications. Since their inception in the late 1980s, data warehouse technologies have continued to evolve, and MPP architectures led to systems able to handle larger data sizes. But while warehouses were great for structured data, many modern enterprises have to deal with unstructured data, semi-structured data, and data with high variety, velocity, and volume.


Why KPMG is treating employees who want to learn AI to a $450 million training center that feels like a luxury resort

#artificialintelligence

Promoting the company's culture was a top priority when KPMG was making preliminary plans for its new $450 million training facility. It emerges in different ways throughout the 800,000-square-foot facility, some more subtle than others. The hallway to the main conference space, for example, is lined with artifacts from KPMG's heritage, including a ledger from original founder James Marwick dating to 1898. In another area, a set of lights that hang over the cafeteria change colors -- a nod to the importance of diversity at the firm. "There are things that the physical representation here is designed to really reflect what we see as our core kind of cultural aspects," said chief financial officer David Turner.