
stream processing


Alibaba Cloud Releases Alink Machine Learning Platform on GitHub CDOTrends

#artificialintelligence

Alibaba Cloud (Alibaba) has released the source code of its Alink machine learning platform on GitHub. Alink offers a broad range of algorithm libraries that support both batch and stream processing, vital for machine learning tasks such as online product recommendation and intelligent customer service. According to Alibaba, Alink was built on Flink, a unified distributed computing engine. With its seamless unification of batch and stream processing, Alibaba says Alink offers a more effective platform for developers to perform data analytics and machine learning tasks. The platform supports open-source data storage systems such as Kafka, HDFS and HBase, as well as Alibaba's proprietary data storage format.
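A minimal, library-free sketch of the batch/stream unification idea the article describes (this is illustrative only, not Alink's or Flink's actual API): the same statistic is expressed once for a complete dataset and once as an incremental update over a stream of records, converging to the same answer.

```python
# Illustrative sketch (not Alink's actual API): the same aggregation
# computed in batch mode and incrementally over a stream.

def batch_mean(values):
    """Batch mode: the full dataset is available at once."""
    return sum(values) / len(values)

class StreamingMean:
    """Stream mode: the same statistic, updated one record at a time."""
    def __init__(self):
        self.total = 0.0
        self.count = 0

    def update(self, value):
        self.total += value
        self.count += 1
        return self.total / self.count  # current running mean

data = [4.0, 8.0, 6.0, 2.0]
stream = StreamingMean()
running = [stream.update(v) for v in data]

print(batch_mean(data))  # batch result
print(running[-1])       # streaming result matches once all records arrive
```

A unified engine like Flink lets developers write the aggregation logic once and run it in either mode, rather than maintaining two implementations.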



Alibaba Cloud Releases Machine Learning Algorithm Platform on GitHub

#artificialintelligence

Alibaba Cloud, the data intelligence backbone of Alibaba Group, today announced that the core code of Alink, its self-developed algorithm platform, has been made available via open source on GitHub, the world's largest developer community. The platform offers a broad range of algorithm libraries that support both batch and stream processing, which is critical for machine learning tasks such as online product recommendation and intelligent customer service. Data analysts and software developers can access the code on GitHub (https://github.com/alibaba/alink) to build their own software, facilitating tasks such as statistical analysis, machine learning, real-time prediction, personalized recommendation and anomaly detection. "As a platform that consists of various algorithms combining learning in various data processing patterns, Alink can be a valuable option for developers looking for robust big data and advanced machine learning tools," said Yangqing Jia, President and Senior Fellow of Data Platform at Alibaba Cloud Intelligence. "As one of the top ten contributors to GitHub, we are committed to connecting with the open source community as early as possible in our software development cycles."


Apache Spark Streaming Tutorial for Beginners

#artificialintelligence

In a world where we generate data at an extremely fast rate, analyzing that data correctly and delivering useful, meaningful results at the right time can provide helpful solutions for many domains dealing with data products. This applies to fields from health care and finance to media, retail and travel services. Some solid examples include Netflix providing personalized recommendations in real time, Amazon tracking your interaction with different products on its platform and immediately suggesting related products, or any business that needs to stream a large amount of data in real time and run different analyses on it. One of the frameworks that can handle big data in real time and perform such analyses is Apache Spark. In this blog, we are going to use Spark Streaming to process high-velocity data at scale. Apache Spark is a lightning-fast cluster computing technology, designed for fast computation.
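Spark Streaming's classic DStream API works on micro-batches: incoming records are grouped into small batches and the same transformation runs on each batch. The toy below simulates that model in plain Python (it is not the PySpark API; the word-count operator is the canonical streaming example).

```python
# Toy simulation of the micro-batch model behind Spark Streaming
# (pure Python, not actual PySpark code).

def micro_batches(records, batch_size):
    """Split a record stream into fixed-size micro-batches."""
    for i in range(0, len(records), batch_size):
        yield records[i:i + batch_size]

def process_batch(batch):
    """Per-batch word count, the canonical streaming example."""
    counts = {}
    for word in batch:
        counts[word] = counts.get(word, 0) + 1
    return counts

stream = ["spark", "flink", "spark", "kafka", "spark", "flink"]
results = [process_batch(b) for b in micro_batches(stream, 3)]
print(results)
```

In real Spark Streaming the batch interval is time-based rather than count-based, and batches are processed in parallel across a cluster, but the per-batch transformation logic looks much the same.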


Python Pandas at Extreme Performance

#artificialintelligence

Today we all choose between the simplicity of Python tools (pandas, scikit-learn), the scalability of Spark and Hadoop, and the operational readiness of Kubernetes, and we end up using them all. We keep separate teams of Python-oriented data scientists, Java and Scala Spark masters, and an army of devops engineers to manage those siloed solutions. Data scientists explore with pandas; then other teams of data engineers re-code the same logic to make it work at scale, or to handle live streams, using Spark.



Towards Foundations of Agents Reasoning on Streams of Percepts

AAAI Conferences

Mapping percepts to actions is at the heart of the agents metaphor. However, little work has investigated this mapping from a stream-based perspective. Inspired by previous work on the foundations of stream processing, we analyze properties of window-based percept processing and identify properties of stream-processing agents with representation theorems. The resulting axiomatizations help us to deepen the understanding of agents that can be safely implemented.
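A hedged illustration of the window-based percept processing the abstract studies: an agent keeps a sliding window of recent percepts and maps the window contents to an action. The class name, threshold, and action labels below are invented for the example; the paper's formal treatment uses representation theorems rather than concrete code.

```python
# Illustrative sliding-window agent: percepts stream in, the agent
# acts on a bounded window of the most recent ones.
from collections import deque

class WindowAgent:
    def __init__(self, window_size, threshold):
        # A count-based window; deque(maxlen=...) drops the oldest percept.
        self.window = deque(maxlen=window_size)
        self.threshold = threshold

    def perceive(self, percept):
        self.window.append(percept)
        return self.act()

    def act(self):
        # Map the current window of percepts to an action
        # (hypothetical rule: brake when the windowed mean is high).
        mean = sum(self.window) / len(self.window)
        return "brake" if mean > self.threshold else "cruise"

agent = WindowAgent(window_size=3, threshold=0.5)
actions = [agent.perceive(p) for p in [0.2, 0.9, 0.9, 0.1, 0.1]]
print(actions)
```

The window bounds how much history can influence an action, which is exactly the kind of property the representation theorems characterize.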


How companies around the world apply machine learning

#artificialintelligence

Check out the full lineup of training courses, tutorials, and sessions at the Strata Data Conference in London, May 21-24, 2018. Companies continue to use data to improve decision-making (business intelligence and analytics) and for automation (machine learning and AI). At the Strata Data Conference in London, we've assembled a program that introduces technologies and techniques, showcases use cases across many industries, and highlights the importance of ethics, privacy, and security. We are bringing back the Strata Business Summit, and this year, we have two days of executive briefings. Data Science and Machine Learning sessions will cover tools, techniques, and case studies.


BigSR: an empirical study of real-time expressive RDF stream reasoning on modern Big Data platforms

arXiv.org Artificial Intelligence

The trade-off between language expressiveness and system scalability (E&S) is a well-known problem in RDF stream reasoning. Higher expressiveness supports more complex reasoning logic; however, it may also hinder system scalability. Current research mainly focuses on logical frameworks suitable for stream reasoning as well as the implementation and evaluation of prototype systems. These systems are normally developed in a centralized setting and suffer from inherently limited scalability, while an in-depth study of applying distributed solutions to cover E&S is still missing. In this paper, we aim to explore the feasibility of applying modern distributed computing frameworks to meet E&S together. To do so, we first propose BigSR, a technical demonstrator that supports a positive fragment of the LARS framework. For the sake of generality and to cover a wide variety of use cases, BigSR relies on the two main execution models adopted by major distributed execution frameworks: Bulk Synchronous Processing (BSP) and Record-at-a-Time (RAT). Accordingly, we implement BigSR on top of Apache Spark Streaming (BSP model) and Apache Flink (RAT model). In order to draw conclusions on the impacts of BSP and RAT on E&S, we analyze the ability of the two models to support distributed stream reasoning and identify several types of use cases characterized by their levels of support. This classification allows for quantifying the E&S trade-off by assessing the scalability of each type of use case with respect to its level of expressiveness. We then conduct a series of experiments with 15 queries from 4 different datasets. Our experiments show that BigSR over both BSP and RAT generally scales up to high throughput beyond a million triples per second (with or without recursion), and that RAT attains sub-millisecond delay for stateless query operators.
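The two execution models the paper compares can be contrasted in a few lines of plain Python (these are conceptual sketches, not Spark or Flink code): BSP produces one result per micro-batch at each synchronization step, while RAT emits a result for every individual record, which is why RAT can reach sub-millisecond per-record latency for stateless operators.

```python
# Conceptual contrast of the two execution models (not Spark/Flink code).

def bsp_process(stream, batch_size, op):
    """Bulk Synchronous Processing: one output per micro-batch."""
    out = []
    for i in range(0, len(stream), batch_size):
        out.append(op(stream[i:i + batch_size]))
    return out

def rat_process(stream, op):
    """Record-at-a-Time: one output per individual record."""
    return [op([record]) for record in stream]

triples = [1, 2, 3, 4, 5, 6]  # stand-ins for incoming RDF triples
count = len                   # a trivial stateless operator

print(bsp_process(triples, 3, count))  # [3, 3] -- fewer, larger results
print(rat_process(triples, count))     # [1, 1, 1, 1, 1, 1] -- per-record outputs
```

BSP amortizes scheduling cost over a batch (favoring throughput), while RAT schedules per record (favoring latency), which mirrors the throughput/latency findings in the paper's experiments.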


Spark for Data Science with Python Big Data Training Simpliv

@machinelearnbot

This team has decades of practical experience in working with Java and with billions of rows of data. Get your data to fly using Spark for analytics, machine learning and data science. Let's parse that. If you are an analyst or a data scientist, you're used to having multiple systems for working with data. With Spark, you have a single engine where you can explore and play with large amounts of data, run machine learning algorithms, and then use the same system to productionize your code. Analytics: using Spark and Python, you can analyze and explore your data in an interactive environment with fast feedback.