For more on Apache Kafka, Apache Pulsar, Apache Spark, and other data technologies, attend the "Data Engineering & Architecture" sessions at the Strata Data Conference in New York City, September 23-26, 2019. With companies producing data from an increasing number of systems and devices, messaging and event streaming solutions, particularly Apache Kafka, have gained widespread adoption. Over the past year, we've been tracking the progress of Apache Pulsar, a less well-known but highly capable open source solution originally developed at Yahoo. Pulsar is designed to process, analyze, and deliver data from an expanding array of services and applications, which makes it a natural fit for modern data platforms. Pulsar is also designed to ease the operational burden normally associated with complex distributed systems.
This article is part of a series on bringing continuous integration and deployment (CI/CD) practices to machine learning. Check back at The New Stack for future installments. The machine learning pipeline runs from ingesting and cleaning data, through feature engineering and model selection in an interactive workbench environment, to training and experimentation (usually with the option to share results), and finally to deploying the trained model and serving results such as predictions and classifications. The development and deployment pipelines are often separate, but unless the model is static, it will need to be retrained on new data or updated as the world changes, then redeployed and versioned in production. That means going through several steps of the pipeline again and again. Managing the complexity of these pipelines is getting harder, especially when you're trying to use real-time data and update models frequently.
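To make the retrain-and-redeploy loop concrete, here is a toy Python sketch in which every run passes through the same ingest, train, and deploy steps and produces a new model version. All names (`ingest_and_clean`, `train`, `ModelRegistry`, `run_pipeline`) are hypothetical, and the "model" is simply the mean of the cleaned values, standing in for real feature engineering and training.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Dict, List, Optional

# Hypothetical toy pipeline, not any particular MLOps tool's API.
# The "model" is just the mean of the cleaned values.

def ingest_and_clean(raw: List[Optional[float]]) -> List[float]:
    # Drop missing values; a real pipeline would validate far more.
    return [x for x in raw if x is not None]

def train(data: List[float]) -> float:
    # Stand-in for feature engineering, model selection, and training.
    return mean(data)

@dataclass
class ModelRegistry:
    # Maps version number -> trained model; old versions are kept for rollback.
    versions: Dict[int, float] = field(default_factory=dict)

    def deploy(self, model: float) -> int:
        version = len(self.versions) + 1
        self.versions[version] = model
        return version

    def serve(self) -> float:
        # Serve predictions from the latest deployed version.
        return self.versions[max(self.versions)]

def run_pipeline(raw: List[Optional[float]], registry: ModelRegistry) -> int:
    # Every retraining run repeats the same steps and yields a new version.
    return registry.deploy(train(ingest_and_clean(raw)))
```

For example, running it once on `[1.0, None, 3.0]` deploys version 1, which serves 2.0; rerunning it on new data `[4.0, 6.0]` deploys version 2, which serves 5.0, while version 1 stays in the registry.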
The day when armies of business analysts can query incoming data in real time may be drawing closer. Supporting such continuous interactive queries is a goal of KSQL, software introduced this week by Confluent Inc., the company founded by the original creators of the Kafka data-streaming software. KSQL is a SQL engine that operates directly on Apache Kafka data streams. According to Confluent, KSQL is also intended to broaden the use of Kafka beyond Java and Python, opening up Kafka programming to developers familiar with SQL. The SQL in question, however, is a dialect the company developed to deal with the unique architecture of Kafka streaming. The software is appearing first as a developer preview, and it will be available under an Apache 2.0 license, according to the company. Created at LinkedIn, Kafka began life as a publish-and-subscribe messaging system focused on handling log files as system events.
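A continuous query differs from a one-off SQL query in that it never finishes: it keeps emitting updated results as new events arrive. The Python sketch below illustrates that idea with a tumbling-window count per key over an event stream. It is not KSQL's syntax or Kafka's client API; the `(timestamp, key)` event shape, the function name `continuous_count`, and the assumption that events arrive in timestamp order are hypothetical simplifications.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical event shape: (timestamp_seconds, key). Not Kafka's or KSQL's API.
Event = Tuple[int, str]

def continuous_count(events: Iterable[Event], window_s: int) -> Iterator[Tuple[int, Dict[str, int]]]:
    """Yield (window_start, per-key counts) for each tumbling window.

    Conceptually similar to a windowed GROUP BY count that runs forever.
    Assumes events arrive in timestamp order; late events are not handled.
    """
    current_window = None
    counts: Counter = Counter()
    for ts, key in events:
        window_start = ts - ts % window_s
        if current_window is not None and window_start != current_window:
            # A new window began: emit the result for the window that just closed.
            yield current_window, dict(counts)
            counts = Counter()
        current_window = window_start
        counts[key] += 1
    if current_window is not None:
        # Input ended: flush the last (possibly partial) window.
        yield current_window, dict(counts)

events = [(0, "home"), (5, "cart"), (12, "home"), (61, "home"), (70, "checkout")]
for window_start, counts in continuous_count(events, 60):
    print(window_start, counts)
```

Because `continuous_count` is a generator, it can also consume an unbounded iterator (for example, records polled from a Kafka consumer) and yield each window's result as soon as the next window begins. Real streaming engines additionally handle late and out-of-order events, which this sketch ignores.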
Before you open up the presents under the tree, I've got some geekier gifts. Execs and luminaries from across the world of data and analytics have shared their predictions for the next year, and I've dutifully compiled and stitched them together. Gather round and soak up this year's batch, which focuses on artificial intelligence, data regulation, data governance, the state of the Hadoop market, open source, and "the edge." Predictions about artificial intelligence (AI) are all over the map. They range from optimistic and starry-eyed to a bit more skeptical and jaded.