Collaborating Authors

Build an Apache Kafka streaming app using IBM Streams


This is part of the Learning path: Get started with IBM Streams. In this developer code pattern, we walk you through the basics of creating a streaming application powered by Apache Kafka, one of the most popular open source distributed event-streaming platforms used for creating real-time data pipelines and streaming apps. The application will be built using IBM Streams on IBM Cloud Pak for Data.

Top 7 Data Streaming Tools For Real-Time Analytics


Data streaming is the next wave in the analytics and machine learning landscape, as it helps organisations make quick decisions through real-time analytics. With the increased adoption of cloud computing, data streaming in the cloud is on the rise because it brings agility to data pipelines for various applications and caters to different business needs. Understanding the importance of data streaming, organisations are embracing hybrid platforms so that they can leverage the advantages of both batch and streaming data analytics. To assist firms in determining the best data streaming tools, Analytics India Magazine has compiled the most feature-rich tools for instant analytics. Through Amazon Kinesis, organisations can build streaming applications using a SQL editor and open-source Java libraries.

Top 18 Open Source and Commercial Stream Analytics Platforms - Predictive Analytics Today


Stream Analytics helps you develop and deploy solutions that gain real-time insights from devices, sensors, and applications through real-time stream processing in the cloud. It enables real-time analytics for Internet of Things solutions: streaming millions of events per second, providing mission-critical reliability and performance, delivering real-time dashboards and alerts over data from devices and applications, correlating across multiple streams of data, and using a SQL-based language for development. Stream Analytics customers deploy and monitor streaming jobs. Applications of stream analytics include personalized, real-time stock-trading analysis and alerts offered by financial services companies; real-time fraud detection; data and identity protection services; analysis of data generated by sensors and actuators; web clickstream analytics; customer relationship management (CRM) alerts; supply chain alerts; and transportation alerts. Apache Flink is an open source platform for distributed stream and batch data processing.

Detecting Irregular Patterns in IoT Streaming Data for Fall Detection Artificial Intelligence

Abstract-- Detecting patterns in real-time streaming data has been an interesting and challenging data analytics problem. With the proliferation of a variety of sensor devices, real-time analytics of data from the Internet of Things (IoT) to learn regular and irregular patterns has become an important machine learning problem to enable predictive analytics for automated notification and decision support. In this work, we address the problem of learning an irregular human activity pattern, a fall, from streaming IoT data from wearable sensors. We present a deep neural network model for detecting falls based on accelerometer data, giving 98.75 percent accuracy on an online physical activity monitoring dataset called "MobiAct", which was published by Vavoulas et al. The initial model was developed using IBM Watson Studio and later transferred and deployed on IBM Cloud with the streaming analytics service supported by IBM Streams for monitoring real-time IoT data. We also present the systems architecture of the real-time fall detection framework that we intend to use with Mbientlab's wearable health monitoring sensors for real-time patient monitoring at retirement homes or rehabilitation clinics.

Machine Learning Tutorial with Python, Jupyter, KSQL and TensorFlow


When Michelangelo started, the most urgent and highest impact use cases were some very high scale problems, which led us to build around Apache Spark (for large-scale data processing and model training) and Java (for low latency, high throughput online serving). This structure worked well for production training and deployment of many models but left a lot to be desired in terms of overhead, flexibility, and ease of use, especially during early prototyping and experimentation [where Notebooks and Python shine]. Uber expanded Michelangelo "to serve any kind of Python model from any source to support other Machine Learning and Deep Learning frameworks like PyTorch and TensorFlow [instead of just using Spark for everything]." So why did Uber (and many other tech companies) build its own platform and framework-independent machine learning infrastructure? The posts How to Build and Deploy Scalable Machine Learning in Production with Apache Kafka and Using Apache Kafka to Drive Cutting-Edge Machine Learning describe the benefits of leveraging the Apache Kafka ecosystem as a central, scalable, and mission-critical nervous system. It allows real-time data ingestion, processing, model deployment, and monitoring in a reliable and scalable way. This post focuses on how the Kafka ecosystem can help solve the impedance mismatch between data scientists, data engineers, and production engineers. By leveraging it to build your own scalable machine learning infrastructure and also make your data scientists happy, you can solve the same problems for which Uber built its own ML platform, Michelangelo.