 hortonwork


Why Best-of-Breed is a Better Choice than All-in-One Platforms for Data Science

#artificialintelligence

So you need to redesign your company's data infrastructure. Do you buy a solution from a big integration company like IBM, Cloudera, or Amazon? Or do you engage many small startups, each focused on one part of the problem? We see the trend shifting toward focused best-of-breed platforms: products that concentrate on one aspect of the data science and machine learning workflow, in contrast to all-in-one platforms that attempt to cover the entire space of data workflows.


Future of Data: Princeton, New Jersey (Princeton, NJ)

#artificialintelligence

In this talk I will show data engineers and architects how to run real-time TensorFlow Inception image recognition on images captured by remote sensors and images in tweets. In the same flow I will also demonstrate how to apply real-time sentiment analysis and intelligent routing of data to Phoenix, email and Slack. I will elaborate on a number of different sentiment analysis frameworks available for use within Apache NiFi, including Python NLTK, Stanford CoreNLP, Python spaCy and Python TextBlob. This talk will be a deep dive into how to manage complex dataflow pipelines ingesting from multiple streaming sources including social, public open data feeds, logs, drones, RDBMS and IoT with transformations, deep learning, machine learning and business rules. Data engineers will be shown the power of Apache NiFi for loading diverse sources of data, applying transformations in-stream, routing based on attributes, adding sentiment data to workflows, running deep learning algorithms in-stream and storing data in Apache Phoenix on HBase.


Cloudera and Hortonworks combo to push CDP, machine learning

#artificialintelligence

A would-be data management juggernaut got its first public airing as Cloudera -- a combination of formerly separate Hadoop pioneers Cloudera and Hortonworks -- as the newly stand-alone vendor's leaders publicly mapped the road it intends to take forward. "The combination has made sense for many years," said Tom Reilly, CEO of the combined companies, who held a similar role at the former Cloudera. Others agreed these leaders in open-source-oriented big data tooling -- built along lines drawn by big web companies, such as Google and Yahoo -- are better together than apart and can offer users a unified big data platform. Reilly spoke as part of a prerecorded webcast heralding the new company, which came after confirmation that shareholders of Cloudera and Hortonworks had approved a merger of the firms -- a deal first disclosed last October. Cloudera faces distinct challenges, as it moves data applications to the cloud and tries to convey users to the fast-growing new world of machine learning and AI.


Can data analytics inform new business model development?

#artificialintelligence

Information technology has enabled the upheaval of longstanding business models, whether in the globalization of product supply chains, the outsourcing of manufacturing, delivery and support processes or the use of efficient, just-in-time manufacturing. While the changes have many benefits, they have also served to make the product development process more complicated, with one notable exception: the supply chain simplification from disintermediation. Disintermediation, namely the elimination of middlemen, took off with online commerce, but the notion of more directly connecting customers and suppliers has started to seep into almost every business sector. While disintermediation has historically involved streamlining a linear supply chain, the next stage in business evolution entails creating bidirectional connections between various stakeholders in the product development, delivery and purchase process. That's the thesis of Hortonworks CEO Rob Bearden, who characterizes the evolution of linear supply chains that follow a set of procedural processes into a mesh of connected communities composed of customers, suppliers, producers, manufacturers and service providers as Hyperbolic?


Dataworks Summit - Big Data meets multi-cloud

#artificialintelligence

'The network is the computer' was the mantra of the early days of connected systems, but it took the Internet to fully realize the concept. In today's era of smart sensors, cheap storage and sophisticated algorithms, an apt aphorism might be 'the data is the business', in that business decisions, new services and product strategies are fueled by the analysis of massive amounts of mundane data. The ability to collect, store and analyze such routine data as transaction records, system logs, sensor readings and location information with increasing granularity has the potential to turn what was formerly lost or ignored information into valuable business assets. The organizations that are most adept at spinning the digital straw into gold find themselves at a significant competitive advantage. Aside from the advances in core infrastructure, perhaps nothing has been as responsible for the rise of data-inspired business decisions as the Hadoop ecosystem of open source distributed data storage and processing software.


Deep Learning and GPU Acceleration in Hadoop 3.0 - Hortonworks

#artificialintelligence

Other hot machine learning examples that Jim mentioned were fraud detection, customer service (by understanding customer sentiment and recommending the next best action, e.g. the right person or product), deep insights on asset and supply management, smart cities and drone delivery. What he really stressed was that before any model training can take place, data preparation and data organization are critical, pointing to Hortonworks DataFlow (HDF) and Hortonworks Data Platform (HDP), which together manage the entire data lifecycle, from the edge all the way to the data center, on-prem, in the cloud, or in a hybrid architecture of the two. Only then does it make sense to apply Nvidia's powerful GPUs to train deep learning models that drive new insights.


Apache Hadoop 3.1 - a Giant Leap for Big Data - Hortonworks

@machinelearnbot

When we are outdoors, many of us feel the need for a camera that is intelligent enough to follow us, adjust to the terrain heights and visually navigate through obstacles, while capturing panoramic videos. Here, I am talking about autonomous self-flying drones, very similar to cars on autopilot. The difference is that we are starting to see the proliferation of artificial intelligence into affordable, everyday use cases, compared to relatively expensive cars. This helps them distinguish between objects and get better with more data. Recently, Roni Fontaine at Hortonworks published a blog titled "How Apache Hadoop 3 Adds Value Over Apache Hadoop 2", capturing the high-level themes.


The Emergence of Data Marketplaces - Hortonworks

@machinelearnbot

When we talk about the exponential growth of data in today's digital world, the word "exponential" seems to be an understatement. In 2017 alone, more data was generated than in the past 5000 years combined, and this will rise tenfold in less than a decade. A significant contributor to that data growth will be IoT. An IDC 2017 report predicts that by 2025, more than a quarter of data created across the world will be real time in nature, and real-time IoT data will make up more than 95% of this. In essence, access to the correct set of data at the correct time and in the correct context is essential for taking corrective actions within the opportunity window.


Intro to Machine Learning with Apache Spark and Apache Zeppelin - Hortonworks

#artificialintelligence

In this tutorial, we will introduce you to Machine Learning with Apache Spark. The hands-on lab for this tutorial is an Apache Zeppelin notebook that has all the steps necessary to ingest and explore data, train, test, visualize, and save a model. We will cover a basic Linear Regression model that will allow us to perform simple predictions on sample data. This model can be further expanded and modified to fit your needs. Most importantly, by the end of this tutorial, you will understand how to create an end-to-end pipeline for setting up and training simple models in Spark.
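
To make the train, test and predict steps concrete, here is a minimal sketch in plain Python. It is not the tutorial's notebook and does not use Spark's API: the data is synthetic, the least-squares fit is written by hand, and the names `fit_line`, `predict` and `rmse` are hypothetical helpers introduced only for illustration.

```python
import random


def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply a fitted (slope, intercept) model to one input value."""
    slope, intercept = model
    return slope * x + intercept


def rmse(model, xs, ys):
    """Root mean squared error of the model on the given points."""
    squared = [(predict(model, x) - y) ** 2 for x, y in zip(xs, ys)]
    return (sum(squared) / len(squared)) ** 0.5


# Assumption: synthetic sample data following y = 3x + 2 plus Gaussian noise.
rng = random.Random(42)
data = [(float(x), 3.0 * x + 2.0 + rng.gauss(0, 0.5)) for x in range(100)]

# Shuffle, then hold out 20% of the points for testing.
rng.shuffle(data)
train, test = data[:80], data[80:]

# Train on the training split only.
model = fit_line([x for x, _ in train], [y for _, y in train])

# Evaluate on the held-out split.
test_rmse = rmse(model, [x for x, _ in test], [y for _, y in test])

print("slope=%.3f intercept=%.3f test RMSE=%.3f" % (model[0], model[1], test_rmse))
```

The same ingest, split, fit and evaluate sequence is what a Spark pipeline would carry out at scale; the Spark-specific classes and the Zeppelin notebook setup are left to the tutorial itself.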


These Three Companies Are Tops In Artificial Intelligence

#artificialintelligence

Computer software giants Salesforce.com (CRM) and Microsoft (MSFT) are top software picks to prosper from artificial intelligence -- along with a much smaller company, Hortonworks (HDP) -- according to one Wall Street brokerage. "Microsoft is positioned to be the biggest beneficiary to AI in our coverage due to its cloud-computing infrastructure (Azure)," Barclays analyst Raimo Lenschow said in a note to clients. "Salesforce.com is still in the early stages of rolling out Einstein across its entire product portfolio. Hortonworks, along with its partnership with IBM (IBM), gives customers a path to manage and analyze Big Data." At a basic level, artificial intelligence is the use of computer algorithms to attempt to replicate the human ability to learn, reason and make decisions.