Results


Introducing Social Hash Partitioner, a scalable distributed hypergraph partitioner

#artificialintelligence

As a single host has limited storage and compute resources, our storage systems shard data items over multiple hosts, and our batch jobs execute over clusters of thousands of workers, to scale and speed up computation. Our VLDB'17 paper, Social Hash Partitioner: A Scalable Distributed Hypergraph Partitioner, describes a new method for partitioning bipartite graphs while minimizing fan-out. We describe the resulting framework as a Social Hash Partitioner (SHP) because it can be used as the hypergraph partitioning component of the Social Hash framework introduced in our earlier NSDI'16 paper. The fan-out reduction model is applicable to many infrastructure optimization problems at Facebook, such as data sharding, query routing, and index compression.
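The core objective is easy to state: given a bipartite graph of queries and data items, assign items to k buckets so that the average number of distinct buckets each query touches (its fan-out) is minimized. Below is a minimal Python sketch of that objective with a naive greedy refinement pass; this is our illustration, not the paper's actual distributed algorithm, and it deliberately ignores the bucket-balance constraints a real partitioner must enforce.

    from collections import defaultdict

    def average_fanout(queries, assignment):
        # assignment maps item -> bucket; a query fans out to every
        # distinct bucket holding one of its items.
        return sum(len({assignment[i] for i in q}) for q in queries) / len(queries)

    def greedy_refine(queries, assignment, k, rounds=5):
        # Move each item to the bucket minimizing the fan-out of the
        # queries that touch it (balance constraints deliberately ignored).
        touching = defaultdict(list)
        for q in queries:
            for item in q:
                touching[item].append(q)
        for _ in range(rounds):
            for item, qs in touching.items():
                def cost(b):
                    assignment[item] = b
                    return sum(len({assignment[i] for i in q}) for q in qs)
                assignment[item] = min(range(k), key=cost)
        return assignment

    queries = [["a", "b"], ["b", "c"], ["c", "d"]]
    assignment = {"a": 0, "b": 1, "c": 0, "d": 1}
    print(average_fanout(queries, assignment))  # 2.0 before refinement
    greedy_refine(queries, assignment, k=2)
    print(average_fanout(queries, assignment))  # 1.0: unconstrained greedy collapses to one bucket

The toy run also shows why balance constraints matter: without them, the trivial optimum is to put everything in one bucket, which defeats the point of sharding.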


China Pushes Breadth-First Search Across Ten Million Cores

#artificialintelligence

There is increasing interplay between the worlds of machine learning and high performance computing (HPC). This began with a shared hardware and software story, since many supercomputing tricks of the trade play well into deep learning, but as we look to next-generation machines, the bond keeps tightening. Many supercomputing sites are figuring out how to work deep learning into their existing workflows, either as a pre- or post-processing step, while some research areas might eventually do away with traditional supercomputing simulations altogether. While these massive machines were designed with simulations in mind, the strongest supers have architectures that parallel the unique requirements of training and inference workloads. One such system in the U.S. is the future Summit supercomputer coming to Oak Ridge National Lab later this year, but many of the other architectures that are especially sporting for machine learning are in China and Japan, and they feature non-standard processing elements.


Apache Spark: A Unified Engine for Big Data Processing

#artificialintelligence

Analyses of brain activity in a larval zebrafish, performed using Spark: embedding the dynamics of whole-brain activity into lower-dimensional trajectories. This open source computing framework unifies streaming, batch, and interactive big data workloads to unlock new applications. The growth of data volumes in industry and research poses tremendous opportunities, as well as tremendous computational challenges. As data sizes have outpaced the capabilities of single machines, users have needed new systems to scale out computations to multiple nodes. As a result, there has been an explosion of new cluster programming models targeting diverse computing workloads [1, 4, 7, 10].
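To make the "unified engine" claim concrete, here is a minimal PySpark sketch (our example, not from the article) in which the same SparkSession serves both a batch DataFrame aggregation and an interactive SQL query over the same data; streaming jobs use this same DataFrame API.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("unified-demo").getOrCreate()

    # Batch: build a DataFrame and aggregate it with the DataFrame API.
    events = spark.createDataFrame(
        [("click", 3), ("view", 10), ("click", 7)], ["kind", "n"])
    events.groupBy("kind").sum("n").show()

    # Interactive: expose the same data as a table and query it in SQL.
    events.createOrReplaceTempView("events")
    spark.sql("SELECT kind, SUM(n) AS total FROM events GROUP BY kind").show()

    spark.stop()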


Intel's latest Xeon chips based on Skylake due next year

PCWorld

The company will release new Xeon server chips based on Skylake in mid-2017, and they will boast big performance increases, said Barry Davis, general manager for the accelerated workload group at Intel. The move toward machine learning is even driving changes in server configurations, with more customers buying servers with graphics processors. The new chips will include advanced processing features that bring big performance gains to AI tasks, Davis said. Another feature is on-chip support for Intel OmniPath, a proprietary high-speed interconnect that links servers, storage, networking, and other data-center hardware.


Apache Spark: A Unified Engine for Big Data Processing

@machinelearnbot

Analyses of brain activity in a larval zebrafish, performed using Spark: embedding the dynamics of whole-brain activity into lower-dimensional trajectories. The growth of data volumes in industry and research poses tremendous opportunities, as well as tremendous computational challenges. As data sizes have outpaced the capabilities of single machines, users have needed new systems to scale out computations to multiple nodes. As a result, there has been an explosion of new cluster programming models targeting diverse computing workloads [1, 4, 7, 10]. At first, these models were relatively specialized, with new models developed for new workloads; for example, MapReduce [4] supported batch processing, but Google also developed Dremel [13] for interactive SQL queries and Pregel [11] for iterative graph algorithms.
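As an illustration of how a general engine subsumes those specialized ones, here is a short PySpark sketch (ours, not the paper's) of a Pregel-style iterative PageRank written directly against Spark's RDD API; the toy three-node graph and iteration count are arbitrary.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("pregel-style").getOrCreate()
    sc = spark.sparkContext

    # Adjacency lists for a toy directed graph.
    links = sc.parallelize(
        [("a", ["b", "c"]), ("b", ["c"]), ("c", ["a"])]).cache()
    ranks = links.mapValues(lambda _: 1.0)

    for _ in range(10):  # superstep loop, as in Pregel
        # Each node sends rank / out-degree along its out-edges.
        contribs = links.join(ranks).flatMap(
            lambda kv: [(dst, kv[1][1] / len(kv[1][0])) for dst in kv[1][0]])
        ranks = contribs.reduceByKey(lambda a, b: a + b) \
                        .mapValues(lambda r: 0.15 + 0.85 * r)

    print(sorted(ranks.collect()))
    spark.stop()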


Why Intel Is Tweaking Xeon Phi For Deep Learning

#artificialintelligence

The Knights Landing Xeon Phi chips, which have been shipping in volume since June, deliver a peak performance of 3.46 teraflops at double precision and 6.92 teraflops at single precision, but do not support half-precision math the way the Pascal GPUs do. The Pascal chips, which run at 300 watts, would still deliver better performance per watt: specifically, 70.7 gigaflops per watt, compared to 56 gigaflops per watt for the hypothetical Knights Mill chip, based on Knights Landing, that we are talking about above. The "Knights Corner" chip from 2013 was rated at slightly more than 2 teraflops single precision, and the Knights Landing chip from this year is rated at 6.92 teraflops single precision. Thus, we have a strong feeling that the chart above is not to scale, or that Intel showed half precision for the Knights Mill part and single precision for the Knights Corner and Knights Landing parts.
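The per-watt figures can be checked with back-of-envelope arithmetic. In the sketch below, the Pascal number matches a Tesla P100's 21.2-teraflop half-precision peak at its 300-watt rating, while the Knights Mill line follows the article's hypothetical of doubling Knights Landing's 6.92 single-precision teraflops via half precision; the implied wattage is our inference, not an Intel figure.

    # Pascal P100: 21.2 half-precision teraflops at a 300 W rating.
    pascal_gflops_fp16 = 21200
    pascal_watts = 300
    print(pascal_gflops_fp16 / pascal_watts)  # ~70.7 GFLOPS/W, matching the article

    # Hypothetical Knights Mill: 2x Knights Landing's 6.92 SP teraflops.
    knights_mill_gflops_fp16 = 2 * 6920
    # The wattage implied by the article's 56 GFLOPS/W figure:
    print(knights_mill_gflops_fp16 / 56)  # ~247 W, close to Knights Landing's TDP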


Tech Leaders Unite to Enable New Cloud Datacenter Server Designs for Big Data, Machine Learning, Analytics, and Other Emerging Workloads

#artificialintelligence

SAN JOSE, CA--(Marketwired - Oct 14, 2016) - Technology leaders AMD, Dell EMC, Google, Hewlett Packard Enterprise, IBM, Mellanox Technologies, Micron, NVIDIA and Xilinx today announced a new, open specification that can increase datacenter server performance by up to 10x, enabling corporate and cloud data centers to speed up big data, machine learning, analytics, and other emerging workloads. "AMD is supporting OpenCAPI to bring high-performance accelerators from the Radeon Technologies Group into the datacenter, consistent with our work to establish open standards for accelerators that work across multiple processor architectures and suppliers," said Greg Stoner, AMD senior director, Radeon Open Compute. "Open standards and open collaboration between companies and organizations are key to developing the technology needed for next-generation cloud, Web 2.0, high-performance computing, machine learning, big data, and storage infrastructure." "Xilinx is fully committed to bringing high-performance accelerators to market," said Gaurav Singh, vice president of Architecture at Xilinx.


Machine Learning Plays Role in Reinvention of Database Science - DATAVERSITY

#artificialintelligence

Deep Engine from Deep Information Sciences "is an adaptive database kernel and information orchestration system that leverages machine learning to completely transform how scalability and performance are achieved," says chief strategy officer Chad Jones. Welcome to the Deep Engine CASSI (Continuous Adaptive Sequential Summarization of Information) algorithm, Deep Engine's evolved approach to structured and unstructured databases for handling Big Data in all its aspects. That includes separating algorithm behavior from data structure; splitting memory and storage into independent structures; introducing kernel scheduling techniques to utilize hardware; introducing a layer to observe and adapt to workloads/resources; using machine learning to define structure and schedule resources; having dynamic and continuous online calibration; and embedding metadata (cardinality, counts, cost and so on) in data. With CASSI, databases can observe and analyze the host hardware and workloads and then use machine learning algorithms to predict behavior and dynamically adapt to ever-changing scenarios, without going offline.
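Deep Engine's CASSI algorithm is proprietary, so the following is only a toy Python illustration of the general observe-predict-adapt loop the paragraph describes: record workload metadata, forecast the near-term mix with a (here, trivially simple) predictor, and retune a resource split online without a restart. All names and numbers are invented for illustration.

    from collections import deque

    class AdaptiveKernel:
        def __init__(self, window=100):
            self.read_history = deque(maxlen=window)  # observed workload metadata
            self.cache_budget = 0.5  # fraction of memory given to the read cache

        def observe(self, op):
            self.read_history.append(1.0 if op == "read" else 0.0)

        def adapt(self):
            # Stand-in for a learned predictor: a moving average forecasting
            # the read/write mix, used to rebalance resources online.
            if self.read_history:
                predicted_reads = sum(self.read_history) / len(self.read_history)
                self.cache_budget = 0.2 + 0.6 * predicted_reads  # retune, no restart

    kernel = AdaptiveKernel()
    for op in ["read", "read", "write", "read"]:
        kernel.observe(op)
        kernel.adapt()
    print(kernel.cache_budget)  # 0.65: budget shifted toward reads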


Partnerships Will Drive NVIDIA Corporation (NASDAQ:NVDA) Stock Higher

#artificialintelligence

International Business Machines (NYSE:IBM) announced three new servers built for cognitive computing, Artificial Intelligence (AI) and Machine Learning (ML). One of the new servers, the Power System S822LC for High-Performance Computing, leverages the power of NVIDIA's (NASDAQ:NVDA) Tesla P100 Graphical Processing Units (GPUs) and NVLink to deliver high-performance analytics and enable deep learning applications for Big Data. The tight coupling of IBM and NVIDIA technology enables five times faster internal data flow, accelerating critical applications such as advanced analytics, deep learning and AI. In fact, while large tech companies like IBM compete to deliver faster servers and AI applications for Big Data, NVIDIA is pursuing a pick-and-shovel strategy, selling key components to all the participants in the race, in the niche of specialized hardware accelerators for cognitive computing and AI, where it is the leader and has only a few competitors.