The CEO Who Believes AGI Is Already Here
Welcome back to TIME's new twice-weekly newsletter about AI. If you're reading this in your browser, why not subscribe to have the next one delivered straight to your inbox? The three most valuable private companies in the U.S. have big reputations: OpenAI, SpaceX, and Anthropic. But the fourth, Databricks, flies a little more under the radar. The company, which is reportedly raising funds at a valuation of $134 billion according to reports this week, is a quiet workhorse of the AI revolution.
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.35)
Databricks Has a Trick That Lets AI Models Improve Themselves
Databricks, a company that helps big businesses build custom artificial intelligence models, has developed a machine learning trick that can boost the performance of an AI model without the need for clean labelled data. Jonathan Frankle, chief AI scientist at Databricks, spent the past year talking to customers about the key challenges they face in getting AI to work reliably. The problem, Frankle says, is dirty data. "Everybody has some data, and has an idea of what they want to do," Frankle says. But the lack of clean data makes it challenging to fine-tune a model to perform a specific task. "Nobody shows up with nice, clean fine-tuning data that you can stick into a prompt or an [application programming interface] for a model."
Meta's Open Source Llama 3 Is Already Nipping at OpenAI's Heels
Jerome Pesenti has a few reasons to celebrate Meta's decision last week to release Llama 3, a powerful open source large language model that anyone can download, run, and build on. Pesenti used to be vice president of artificial intelligence at Meta and says he often pushed the company to consider releasing its technology for others to use and build on. But his main reason to rejoice is that his new startup will get access to an AI model that he says is very close in power to OpenAI's industry-leading text generator GPT-4, but considerably cheaper to run and more open to outside scrutiny and modification. "The release last Friday really feels like a game-changer," Pesenti says. His new company, Sizzle, an AI tutor, currently uses GPT-4 and other AI models, both closed and open, to craft problem sets and curricula for students.
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.66)
Inside the Creation of DBRX, the World's Most Powerful Open Source AI Model
This past Monday, about a dozen engineers and executives at data science and AI company Databricks gathered in conference rooms connected via Zoom to learn if they had succeeded in building a top artificial intelligence language model. The team had spent months, and about $10 million, training DBRX, a large language model similar in design to the one behind OpenAI's ChatGPT. But they wouldn't know how powerful their creation was until results came back from the final tests of its abilities. "We've surpassed everything," Jonathan Frankle, chief neural network architect at Databricks and leader of the team that built DBRX, eventually told the team, which responded with whoops, cheers, and applause emojis. Frankle usually steers clear of caffeine but was taking sips of iced latte after pulling an all-nighter to write up the results.
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.40)
What does Nancy know? Congresswoman Pelosi buys $5m in San Fran software company's stocks - adding to her hugely successful portfolio
Former House Speaker Nancy Pelosi has invested up to $5 million in a San Francisco-based company, adding to her successful portfolio of Big Tech. Documents revealed that Pelosi's transaction with privately held Databricks, an AI-focused software company, took place on March 3 and was disclosed on March 21. Databricks is just the latest newcomer to Pelosi's long list of companies, but there are eight major names in which she has invested up to $16.1 million since 2022. While she has not broken any laws by buying and selling stocks, many Americans and other government officials see the investments as conflicts of interest, since she has access to confidential intelligence and the power to impact businesses. Databricks, founded in 2013, raised $500 million last year at a $43 billion valuation.
- Information Technology > Software (1.00)
- Government > Regional Government > North America Government > United States Government (1.00)
- Banking & Finance > Trading (1.00)
High-throughput Cotton Phenotyping Big Data Pipeline: Lambda Architecture, Computer Vision, Deep Neural Networks
Issac, Amanda, Ebrahimi, Alireza, Velni, Javad Mohammadpour, Rains, Glen
In this study, we propose a big data pipeline for cotton bloom detection using a Lambda architecture, which enables real-time and batch processing of data. Our proposed approach leverages Azure resources such as Data Factory, Event Grid, REST APIs, and Databricks. This work is the first to develop and demonstrate the implementation of such a pipeline for plant phenotyping through Azure's cloud computing service. The proposed pipeline consists of data preprocessing, object detection using a YOLOv5 neural network model trained through Azure AutoML, and visualization of object detection bounding boxes on output images. The trained model achieves a mean Average Precision (mAP) score of 0.96, demonstrating its high performance for cotton bloom classification. We evaluate our Lambda architecture pipeline using 9,000 images, yielding an optimized runtime of 34 minutes. The results illustrate the scalability of the proposed pipeline as a solution for deep learning object detection, with the potential for further expansion through additional Azure processing cores. This work advances the scientific research field by providing a new method for cotton bloom detection on a large dataset and demonstrates the potential of utilizing cloud computing resources, specifically Azure, for efficient and accurate big data processing in precision agriculture.
- North America > United States > Georgia > Tift County > Tifton (0.28)
- North America > United States > Georgia > Clarke County > Athens (0.14)
- North America > United States > Texas > Lubbock County > Lubbock (0.04)
- (2 more...)
- Information Technology > Services (1.00)
- Food & Agriculture > Agriculture (1.00)
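The Lambda architecture named in the abstract above splits work into a batch layer (reprocess the full image archive) and a speed layer (handle images as they arrive), merged in a serving layer. A minimal sketch of that split, with a stub standing in for the Azure-hosted YOLOv5 detector (all names here are hypothetical, not the authors' code):

```python
from dataclasses import dataclass


@dataclass
class Detection:
    image_id: str
    boxes: list  # (x, y, w, h) bounding boxes for detected blooms


def detect_blooms(image_id):
    """Stub for the YOLOv5 inference call (in the paper, run on Azure/Databricks)."""
    return Detection(image_id, boxes=[(10, 20, 5, 5)])


def batch_layer(image_ids):
    # Recompute detections over the whole archive: high latency, complete view.
    return {d.image_id: d for d in map(detect_blooms, image_ids)}


def speed_layer(new_image_id, realtime_view):
    # Process one incoming image immediately: low latency, partial view.
    realtime_view[new_image_id] = detect_blooms(new_image_id)


def serving_layer(batch_view, realtime_view):
    # Merge the two views; batch results override realtime ones for the
    # same image, and realtime fills in images not yet batch-processed.
    return {**realtime_view, **batch_view}


batch_view = batch_layer(["field1_img001", "field1_img002"])
realtime_view = {}
speed_layer("field1_img003", realtime_view)
merged = serving_layer(batch_view, realtime_view)
```

In the paper's setup, Event Grid and Data Factory would play the role of the trigger and orchestrator for these layers; the dictionaries here are placeholders for those services.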
Hello Dolly: Democratizing the magic of ChatGPT with open models
Update Apr 12, 2023: We have released Dolly 2.0, licensed for both research and commercial use. See the new blog post here. We show that anyone can take a dated off-the-shelf open source large language model (LLM) and give it magical ChatGPT-like instruction following ability by training it in 30 minutes on one machine, using high-quality training data. Surprisingly, instruction-following does not seem to require the latest or largest models: our model is only 6 billion parameters, compared to 175 billion for GPT-3. We open source the code for our model (Dolly) and show how it can be re-created on Databricks.
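The "instruction following" the post describes starts with a data-formatting step: each (instruction, response) pair is serialized into a single training prompt before fine-tuning. A minimal sketch of that step, using an illustrative template rather than Databricks' exact Dolly format:

```python
# Hypothetical instruction-tuning prompt template; the real Dolly/Alpaca
# templates differ in wording but follow the same instruction/response shape.
PROMPT_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Response:\n{response}"
)


def format_example(record):
    """Turn one instruction-following record into a fine-tuning prompt string."""
    return PROMPT_TEMPLATE.format(
        instruction=record["instruction"], response=record["response"]
    )


examples = [
    {
        "instruction": "Summarize what Databricks does.",
        "response": "Databricks helps companies build data and AI applications.",
    },
]
training_texts = [format_example(r) for r in examples]
```

The resulting strings are what gets tokenized and fed to the base model during the short fine-tuning run the post describes; the heavy lifting (the 30-minute training job itself) would use a standard language-model trainer on top of these texts.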
Towards Better Instruction Following Language Models for Chinese: Investigating the Impact of Training Data and Evaluation
Ji, Yunjie, Gong, Yan, Deng, Yong, Peng, Yiping, Niu, Qiang, Ma, Baochang, Li, Xiangang
Recently, significant public efforts have been directed towards developing low-cost models with capabilities akin to ChatGPT, thereby fostering the growth of open-source conversational models. However, there remains a scarcity of comprehensive and in-depth evaluations of these models' performance. In this study, we examine the influence of training data factors, including quantity, quality, and linguistic distribution, on model performance. Our analysis is grounded in several publicly accessible, high-quality instruction datasets, as well as our own Chinese multi-turn conversations. We assess various models using an evaluation set of 1,000 samples, encompassing nine real-world scenarios. Our goal is to supplement manual evaluations with quantitative analyses, offering valuable insights for the continued advancement of open-source chat models. Furthermore, to enhance the performance, and the training and inference efficiency, of models in the Chinese domain, we extend the vocabulary of LLaMA - the model with the closest open-source performance to proprietary language models like GPT-3 - and conduct secondary pre-training on 3.4B Chinese words. We make our model, data, as well as code publicly available.
Databricks launches Lakehouse Platform to help manufacturers harness data and AI - Jack Of All Techs
Join top executives in San Francisco on July 11-12, to hear how leaders are integrating and optimizing AI investments for success. Databricks, a company specializing in data lakehouse technology, announced on Tuesday a new platform designed for the manufacturing industry. Called the lakehouse for manufacturing, the platform aims to unify data and artificial intelligence (AI) for various analytics use cases such as predictive maintenance, quality control and supply chain optimization. The platform builds on Databricks' core data lakehouse platform, which leverages Delta Lake, Apache Spark and MLflow, open-source projects that enable scalable data processing and machine learning (ML) workflows. The platform also integrates with model serving, a service that Databricks introduced last month to simplify the deployment and management of ML models in production.
Databricks open-sources its Dolly large language AI model
In an attempt to open up its technology to a wider audience, enterprise software company Databricks has released Dolly, a large language model and its associated training code, under an open-source licence. Despite being based on a much smaller underlying model, Dolly offers ChatGPT-like functionality and can be run "in-house", the company says. The move was inspired by the success of OpenAI's natural language platform ChatGPT, which became one of the fastest-growing consumer apps within a couple of months of its release in November last year. It has since caused some of the world's largest companies, including Microsoft and Google, to pivot and release generative and natural language AI tools. "We show that anyone can take a dated off-the-shelf open source LLM and give it magical ChatGPT-like instruction-following ability by training it in 30 minutes on one machine, using high-quality training data," Databricks wrote in a blog post explaining the decision.