AITopics | ksql

Machine Learning Tutorial with Python, Jupyter, KSQL and TensorFlow

#artificialintelligence | Apr-12-2023, 07:15:19 GMT

When Michelangelo started, the most urgent and highest-impact use cases were some very high-scale problems, which led us to build around Apache Spark (for large-scale data processing and model training) and Java (for low-latency, high-throughput online serving). This structure worked well for production training and deployment of many models but left a lot to be desired in terms of overhead, flexibility, and ease of use, especially during early prototyping and experimentation [where Notebooks and Python shine]. Uber expanded Michelangelo "to serve any kind of Python model from any source to support other Machine Learning and Deep Learning frameworks like PyTorch and TensorFlow [instead of just using Spark for everything]." So why did Uber (and many other tech companies) build its own platform and framework-independent machine learning infrastructure? The posts "How to Build and Deploy Scalable Machine Learning in Production with Apache Kafka" and "Using Apache Kafka to Drive Cutting-Edge Machine Learning" describe the benefits of using the Apache Kafka ecosystem as a central, scalable, and mission-critical nervous system. It enables real-time data ingestion, processing, model deployment, and monitoring in a reliable and scalable way. This post focuses on how the Kafka ecosystem can help resolve the impedance mismatch between data scientists, data engineers, and production engineers. By using it to build your own scalable machine learning infrastructure, and to keep your data scientists happy, you can solve the same problems for which Uber built its own ML platform, Michelangelo.

data scientist, engineer, python

#artificialintelligence

Industry: Information Technology (0.88)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

ML in KSQL

#artificialintelligence | May-21-2019, 17:49:55 GMT

At HomeAway, we use Apache Kafka as the backbone of our streaming architecture. We also like to deploy machine learning models to make real-time predictions on our data streams. Confluent KSQL provides an easy-to-use, interactive SQL interface for performing stream processing on Kafka. Below we show how to build a model in Python and use it in KSQL to make predictions on a stream of data in Kafka. We use Predictive Model Markup Language (PMML) so that we can train the model with the Python library scikit-learn but perform model inference in Java-based KSQL.
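
To make the PMML hand-off more concrete, here is a minimal Python sketch of what evaluating a PMML regression model involves. It is not HomeAway's code: in their setup the scoring runs in Java inside KSQL. The PMML document is hand-written and simplified (a single RegressionTable, no DataDictionary or MiningSchema), the field names nights and guests are hypothetical, and the logistic step is a simplified reading of PMML's "logit" normalization. Only the Python standard library is used.

```python
import math
import xml.etree.ElementTree as ET

# Hypothetical, hand-written PMML for a logistic-regression model.
# Real exports contain more elements; only what we read is kept here.
PMML_DOC = """<PMML xmlns="http://www.dmg.org/PMML-4_4" version="4.4">
  <RegressionModel functionName="classification" normalizationMethod="logit">
    <RegressionTable intercept="-1.5" targetCategory="1">
      <NumericPredictor name="nights" coefficient="0.4"/>
      <NumericPredictor name="guests" coefficient="0.25"/>
    </RegressionTable>
  </RegressionModel>
</PMML>"""

NS = {"p": "http://www.dmg.org/PMML-4_4"}


def load_model(pmml_text):
    """Read the intercept and per-field coefficients of the first RegressionTable."""
    root = ET.fromstring(pmml_text)
    table = root.find(".//p:RegressionTable", NS)
    intercept = float(table.get("intercept"))
    coefs = {
        pred.get("name"): float(pred.get("coefficient"))
        for pred in table.findall("p:NumericPredictor", NS)
    }
    return intercept, coefs


def score(model, record):
    """Linear score followed by the logistic function (simplified 'logit' normalization)."""
    intercept, coefs = model
    z = intercept + sum(c * float(record[name]) for name, c in coefs.items())
    return 1.0 / (1.0 + math.exp(-z))


model = load_model(PMML_DOC)
# Simulated Kafka messages, assumed to be already deserialized into dicts.
events = [{"nights": 3, "guests": 2}, {"nights": 10, "guests": 4}]
for event in events:
    print(event, round(score(model, event), 3))
```

The point of the format is that the training side (scikit-learn) and the scoring side (a JVM-based engine such as KSQL) only need to agree on the model file, not on a language runtime.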

artificial intelligence, ksql, machine learning

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Machine Learning With Python, Jupyter, KSQL, and TensorFlow - DZone AI

#artificialintelligence | Mar-2-2019, 23:44:18 GMT

Uber expanded Michelangelo "to serve any kind of Python model from any source to support other Machine Learning and Deep Learning frameworks like PyTorch and TensorFlow [instead of just using Spark for everything]." So why did Uber (and many other tech companies) build its own platform and framework-independent machine learning infrastructure? The posts "How to Build and Deploy Scalable Machine Learning in Production with Apache Kafka" and "Using Apache Kafka to Drive Cutting-Edge Machine Learning" describe the benefits of using the Apache Kafka ecosystem as a central, scalable, and mission-critical nervous system. It enables real-time data ingestion, processing, model deployment, and monitoring in a reliable and scalable way. This post focuses on how the Kafka ecosystem can help resolve the impedance mismatch between data scientists, data engineers, and production engineers. By using it to build your own scalable machine learning infrastructure, and to keep your data scientists happy, you can solve the same problems for which Uber built its own ML platform, Michelangelo. Based on what I've seen in the field, this impedance mismatch between data scientists, data engineers, and production engineers is the main reason companies struggle to bring analytic models into production and add business value.

artificial intelligence, data scientist, machine learning

#artificialintelligence

Industry: Information Technology (0.67)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

KSQL in Action: Real-Time Streaming ETL from Oracle Transactional Data

@machinelearnbot | Mar-21-2018, 12:17:16 GMT

In this post I'm going to show what streaming ETL looks like in practice. My first job out of university was building a data warehouse for a retailer in the UK. Back then, that meant writing COBOL jobs to load tables in DB2. We waited for all the shops to close, run their end-of-day system processing, and send their data back to the central mainframe. From there it was checked and loaded, and then reports were generated on it.

artificial intelligence, information fusion, real time system

@machinelearnbot

Country: Europe > United Kingdom (0.24)

Industry: Retail (0.34)

Technology:

Information Technology > Data Science > Data Integration (0.63)
Information Technology > Artificial Intelligence > Representation & Reasoning > Information Fusion (0.63)
Information Technology > Architecture > Real Time Systems (0.55)

Getting Started Analyzing Twitter Data in Apache Kafka through KSQL

@machinelearnbot | Oct-18-2017, 06:15:09 GMT

KSQL is the open source streaming SQL engine for Apache Kafka. It lets you easily perform sophisticated stream processing on Kafka topics through a simple, interactive SQL interface. In this short article we'll see how easy it is to get up and running with a sandbox for exploring it, using everyone's favourite demo streaming data source: Twitter. We'll go from ingesting the raw stream of tweets, to filtering it with predicates in KSQL, to building aggregates such as the number of tweets per user per hour.
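
As a rough illustration of the "filter, then count per user per hour" idea, the following Python sketch mimics what a KSQL query with a WHERE predicate, a one-hour tumbling window (WINDOW TUMBLING (SIZE 1 HOUR)) and GROUP BY on the user computes. It is a batch simulation over an in-memory list, not KSQL itself; the tweet fields (user, lang, ts, text) and the sample values are hypothetical.

```python
from collections import Counter
from datetime import datetime, timezone

# Hypothetical, already-parsed tweets; a real pipeline would consume them from a Kafka topic.
tweets = [
    {"user": "alice", "lang": "en", "ts": "2017-10-10T09:05:00+00:00", "text": "kafka"},
    {"user": "alice", "lang": "en", "ts": "2017-10-10T09:40:00+00:00", "text": "ksql"},
    {"user": "bob", "lang": "de", "ts": "2017-10-10T09:10:00+00:00", "text": "hallo"},
    {"user": "alice", "lang": "en", "ts": "2017-10-10T10:01:00+00:00", "text": "streams"},
]


def hour_window_start(ts):
    """Truncate an ISO-8601 timestamp to the start of its one-hour tumbling window (UTC)."""
    dt = datetime.fromisoformat(ts).astimezone(timezone.utc)
    return dt.replace(minute=0, second=0, microsecond=0)


def count_per_user_per_hour(events, predicate):
    """Count events per (user, hourly window), keeping only events that pass the predicate."""
    counts = Counter()
    for e in events:
        if predicate(e):  # plays the role of the WHERE clause
            # plays the role of GROUP BY user within a tumbling hourly window
            counts[(e["user"], hour_window_start(e["ts"]))] += 1
    return counts


result = count_per_user_per_hour(tweets, lambda e: e["lang"] == "en")
for (user, window), n in sorted(result.items()):
    print(user, window.isoformat(), n)
```

Tumbling windows are fixed-size and non-overlapping, so each tweet lands in exactly one hourly bucket; that is why truncating the timestamp to the hour is enough to assign the window here.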

apache kafka, artificial intelligence, natural language

@machinelearnbot

Industry: Information Technology > Services (0.50)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (0.50)

Getting Started Analyzing Twitter Data in Apache Kafka through KSQL

@machinelearnbot | Oct-11-2017, 00:43:31 GMT

You'll probably get a screenful of results: KSQL emits the aggregate value for the given hourly window each time that value updates. Since we've set KSQL to read all messages on the topic (SET 'auto.offset.reset' = 'earliest';), it reads all of these messages at once and calculates the aggregation updates as it goes. Our inbound stream of tweets is just that: a stream. But now that we are creating aggregates, we have actually created a table. A table is a snapshot of a given key's values at a given point in time.
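
The stream/table distinction can be sketched in a few lines of Python. This is an illustrative model only, not how KSQL is implemented: a running aggregate emits one output row per update (the changelog, and the "screenful of results"), and the table is simply the latest value per key. The (user, window) pairs are hypothetical sample data standing in for tweets replayed from the earliest offset.

```python
from collections import defaultdict


def aggregate_with_updates(events):
    """Yield (key, new_count) every time a windowed count changes, like a
    continuous query re-emitting the aggregate for its window on each update."""
    counts = defaultdict(int)
    for user, window in events:
        key = (user, window)
        counts[key] += 1
        yield key, counts[key]


# Hypothetical (user, hourly-window) pairs, replayed from the start of the topic.
events = [("alice", "09:00"), ("bob", "09:00"), ("alice", "09:00"), ("alice", "10:00")]

changelog = list(aggregate_with_updates(events))
for key, value in changelog:
    print(key, value)  # one output row per update, including superseded values

# The table view: the latest value per key at this point in time.
table = {}
for key, value in changelog:
    table[key] = value
print(table)
```

Note that ("alice", "09:00") appears twice in the changelog (first with 1, then with 2) but only once in the table, which keeps the most recent value for each key.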

analyzing twitter data, artificial intelligence, natural language

@machinelearnbot

Industry: Information Technology > Services (0.40)

Technology: