

How to Build Scalable Real-time Applications on a Databricks Lakehouse with Confluent


For many organizations, real-time data collection and data processing at scale can provide immense advantages for business and operational insights. Meeting that need introduces technical challenges: a successful real-time implementation has typically required skilled experts to build custom integrations. For customers looking to implement streaming real-time applications, our partner Confluent recently announced a new Databricks Connector for Confluent Cloud. This new fully managed connector is designed specifically for the data lakehouse and provides a powerful solution to build and scale real-time applications such as application monitoring, internet of things (IoT), fraud detection, personalization and gaming leaderboards. Organizations can now use an integrated capability that streams legacy and cloud data from Confluent Cloud directly into the Databricks Lakehouse for business intelligence (BI), data analytics and machine learning use cases on a single platform.

Forecasting: theory and practice

Forecasting has always been at the forefront of decision making and planning. The uncertainty that surrounds the future is both exciting and challenging, with individuals and organisations seeking to minimise risks and maximise utilities. The large number of forecasting applications calls for a diverse set of forecasting methods to tackle real-life challenges. This article provides a non-systematic review of the theory and the practice of forecasting. We provide an overview of a wide range of theoretical, state-of-the-art models, methods, principles, and approaches to prepare, produce, organise, and evaluate forecasts. We then demonstrate how such theoretical concepts are applied in a variety of real-life contexts. We do not claim that this review is an exhaustive list of methods and applications. However, we hope that our encyclopedic presentation will offer a point of reference for the rich work that has been undertaken over the last decades, with some key insights for the future of forecasting theory and practice. Given its encyclopedic nature, the intended mode of reading is non-linear. We offer cross-references to allow readers to navigate through the various topics. We complement the theoretical concepts and applications covered with extensive lists of free or open-source software implementations and publicly available databases.

Roadmap on Signal Processing for Next Generation Measurement Systems

Signal processing is a fundamental component of almost any sensor-enabled system, with a wide range of applications across different scientific disciplines. Time series data, images, and video sequences are representative forms of signals that can be enhanced and analysed for information extraction and quantification. Recent advances in artificial intelligence and machine learning are shifting research attention towards intelligent, data-driven signal processing. This roadmap presents a critical overview of state-of-the-art methods and applications, aiming to highlight future challenges and research opportunities towards next generation measurement systems. It covers a broad spectrum of topics ranging from basic to industrial research, organized in concise thematic sections that reflect the trends and the impacts of current and future developments in each research field. Furthermore, it offers guidance to researchers and funding agencies in identifying new prospects.

Appliance-Level Monitoring with Micro-Moment Smart Plugs

Humanity is struggling with energy-related issues that not only affect society and the development of the world but also cause global warming. A variety of broad approaches have been developed by both industry and the research community. However, there is an ever-increasing need for comprehensive, end-to-end solutions aimed at transforming human behavior rather than device metrics and benchmarks. In this paper, a micro-moment-based smart plug system is proposed as part of a larger multi-appliance energy efficiency program. The smart plug includes two sub-units, a power consumption unit and an environmental monitoring unit, which respectively collect the energy consumption of appliances and contextual information such as temperature, humidity, luminosity and room occupancy. The plug also provides home automation capability. With the accompanying mobile application, end-users can visualize energy consumption data along with ambient environmental information. Current implementation results show that the proposed system delivers cost-effective deployment while maintaining adequate computation and wireless performance.

ELT with Amazon Redshift – An Overview


If you've been in Data Engineering, or what we once referred to as Business Intelligence, for more than a few years, you've probably spent time building an ETL process. With the advent of (relatively) cheap storage and processing power in data warehouses, the majority of bulk data processing today is designed as ELT instead. Though this post speaks specifically to Amazon Redshift, most of the content is relevant to other similar data warehouse architectures such as Azure SQL Data Warehouse, Snowflake and Google BigQuery. First, ETL stands for "Extract-Transform-Load", while ELT simply switches the order to "Extract-Load-Transform". Both are approaches to batch data processing used to feed data to a data warehouse and make it useful to analysts and reporting tools.
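To make the difference in ordering concrete, here is a minimal sketch in Python that uses the standard-library `sqlite3` module as a stand-in for a warehouse such as Redshift. The table names (`raw_orders`, `daily_sales_etl`, `daily_sales_elt`) and the sample rows are hypothetical. In the ETL path the aggregation happens in application code before anything is loaded; in the ELT path the raw rows are loaded unchanged and the transformation runs as SQL inside the database.

```python
import sqlite3

# Hypothetical raw extract: (order_id, day, amount), all as untyped strings.
RAW_ORDERS = [
    ("1", "2024-01-05", "20.00"),
    ("2", "2024-01-05", "5.50"),
    ("3", "2024-01-06", "10.25"),
]


def etl(conn: sqlite3.Connection) -> None:
    """ETL: transform in application code, then load only the aggregated result."""
    totals: dict[str, float] = {}
    for _order_id, day, amount in RAW_ORDERS:
        totals[day] = totals.get(day, 0.0) + float(amount)
    conn.execute("CREATE TABLE daily_sales_etl (day TEXT, total REAL)")
    conn.executemany(
        "INSERT INTO daily_sales_etl VALUES (?, ?)", sorted(totals.items())
    )
    conn.commit()


def elt(conn: sqlite3.Connection) -> None:
    """ELT: load the raw rows unchanged, then transform with SQL in the database."""
    conn.execute("CREATE TABLE raw_orders (order_id TEXT, day TEXT, amount TEXT)")
    conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", RAW_ORDERS)
    # The transform step runs inside the "warehouse", on data that is already loaded.
    conn.execute(
        """
        CREATE TABLE daily_sales_elt AS
        SELECT day, SUM(CAST(amount AS REAL)) AS total
        FROM raw_orders
        GROUP BY day
        """
    )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    etl(conn)
    elt(conn)
    for table in ("daily_sales_etl", "daily_sales_elt"):
        rows = conn.execute(f"SELECT day, total FROM {table} ORDER BY day").fetchall()
        print(table, rows)
```

Both paths produce the same daily totals, but only the ELT path leaves the untransformed `raw_orders` table in the database, so the transformation can later be changed and re-run without extracting the data again. In Redshift itself, the load step would typically be a `COPY` from Amazon S3 rather than row-by-row inserts.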

A 20-Year Community Roadmap for Artificial Intelligence Research in the US

Decades of research in artificial intelligence (AI) have produced formidable technologies that are providing immense benefit to industry, government, and society. AI systems can now translate across multiple languages, identify objects in images and video, streamline manufacturing processes, and control cars. The deployment of AI systems has not only created a trillion-dollar industry that is projected to quadruple in three years, but has also exposed the need to make AI systems fair, explainable, trustworthy, and secure. Future AI systems will rightfully be expected to reason effectively about the world in which they (and people) operate, handling complex tasks and responsibilities effectively and ethically, engaging in meaningful communication, and improving their awareness through experience. Achieving the full potential of AI technologies poses research challenges that require a radical transformation of the AI research enterprise, facilitated by significant and sustained investment. These are the major recommendations of a recent community effort coordinated by the Computing Community Consortium and the Association for the Advancement of Artificial Intelligence to formulate a Roadmap for AI research and development over the next two decades.

Putting the Power of Kafka into the Hands of Data Scientists


Over a year ago, my fellow data infrastructure engineers and I broke ground on a total rewrite of our event delivery infrastructure. Our mission was to build a robust, centralized data integration platform tailored to the needs of our Data Scientists. The platform would be fully self-service, so as to maximize the Data Scientists' autonomy and give them complete control over their event data. Ultimately, we delivered a platform that is revolutionizing the way Data Scientists interact with Stitch Fix's data. In two parts, this post peeks into Stitch Fix's Data Science culture and delves into how it drove the fundamental decisions we made in our lowest levels of data infrastructure. Part 1 discusses our design process, explains our guiding philosophy around self-service tooling and explores our data integration platform concept. Part 2 is a technical dive into the decisions we made and a walk-through of the whole architecture.

The devil we know: A new wave of change forces the data revolution to adapt or perish


"Our planning and movements systems are still cumbersome. . . We have been schooled on planning by using big matrices, with every cell to be filled before moving forward. We have learned a passion for detail, but not necessarily how to compromise for the sake of urgency. Surely with all modern capabilities, we can be much more timely in deployment planning, as well as operational analyses and preparations." Rarely, and probably never, in history has the old guard, with its antiquated methods, laid down its arms and surrendered without a vigorous fight. History, if it's kind to us, will eventually record that the greatest technological disruptions were caused by the resistors to change rather than the revolutionaries. This is the story of the setup for such a defense.

[video] Real-Time Data Integration


"The Striim platform is a full end-to-end streaming integration and analytics platform that is middleware that covers a lot of different use cases," explained Steve Wilkes, Founder and CTO at Striim.

Cloud computing is now being embraced by a majority of enterprises of all sizes. Yesterday's debate about public vs. private has transformed into the reality of hybrid cloud: a recent survey shows that 74% of enterprises have a hybrid cloud strategy. Meanwhile, 94% of enterprises are using some form of XaaS: software, platform, and infrastructure as a service. With major technology companies and startups seriously embracing cloud strategies, now is the perfect time to attend 21st Cloud Expo, October 31 - November 2, 2017, at the Santa Clara Convention Center, CA, and June 12-14, 2018, at the Javits Center in New York City, NY, to learn what is going on, contribute to the discussions, and ensure that your enterprise is on the right path to Digital Transformation.