Results


Introducing Social Hash Partitioner, a scalable distributed hypergraph partitioner

#artificialintelligence

As a single host has limited storage and compute resources, our storage systems shard data items over multiple hosts, and our batch jobs execute over clusters of thousands of workers to scale and speed up computation. Our VLDB'17 paper, Social Hash Partitioner: A Scalable Distributed Hypergraph Partitioner, describes a new method for partitioning bipartite graphs while minimizing fan-out. We describe the resulting framework as a Social Hash Partitioner (SHP) because it can serve as the hypergraph partitioning component of the Social Hash framework introduced in our earlier NSDI'16 paper. The fan-out reduction model applies to many infrastructure optimization problems at Facebook, such as data sharding, query routing, and index compression.
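To make the objective concrete, here is a minimal plain-Python sketch (with hypothetical example data, not code from the paper) of the quantity SHP minimizes: treating queries as hyperedges over data items, the fan-out of a query under a given sharding is the number of distinct hosts its items touch.

```python
def average_fanout(queries, assignment):
    """queries: dict mapping query -> set of data items (a hyperedge).
    assignment: dict mapping data item -> host (the sharding).
    Returns the average number of distinct hosts a query must contact."""
    total = 0
    for items in queries.values():
        total += len({assignment[item] for item in items})
    return total / len(queries)

# Hypothetical workload: three queries over four data items.
queries = {"q1": {"a", "b"}, "q2": {"b", "c"}, "q3": {"c", "d"}}

# Two shardings over two hosts: one scatters co-accessed items,
# the other keeps them together.
scattered = {"a": 0, "b": 1, "c": 0, "d": 1}
grouped   = {"a": 0, "b": 0, "c": 1, "d": 1}

print(average_fanout(queries, scattered))  # 2.0
print(average_fanout(queries, grouped))    # 1.333...
```

A good partitioning places items that are frequently accessed together on the same host, which is exactly what lowers the average fan-out here.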


Apache Spark: A Unified Engine for Big Data Processing

#artificialintelligence

[Figure: Analyses of brain activity in a larval zebrafish, performed using Spark: embedding dynamics of whole-brain activity into lower-dimensional trajectories.]

This open source computing framework unifies streaming, batch, and interactive big data workloads to unlock new applications. The growth of data volumes in industry and research poses tremendous opportunities, as well as tremendous computational challenges. As data sizes have outpaced the capabilities of single machines, users have needed new systems to scale out computations to multiple nodes. As a result, there has been an explosion of new cluster programming models targeting diverse computing workloads [1,4,7,10].
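As an illustration of the programming model the article alludes to, here is the classic word-count pipeline written as a plain-Python analogue (this is not Spark's actual API): a flatMap over lines, then a map and reduceByKey over words. In Spark, the same chain of transformations is declared lazily and executed distributed across a cluster.

```python
from collections import Counter
from itertools import chain

# Toy input standing in for a distributed dataset of text lines.
lines = ["big data", "big clusters", "data"]

# flatMap: split each line into words and flatten the result.
words = chain.from_iterable(line.split() for line in lines)

# map + reduceByKey: count occurrences of each word.
counts = Counter(words)

print(dict(counts))  # {'big': 2, 'data': 2, 'clusters': 1}
```

The point of a unified engine is that this same logical pipeline can be applied to a static file (batch), a live socket (streaming), or an interactive shell session without rewriting it for each system.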


Apache Spark: A Unified Engine for Big Data Processing

@machinelearnbot

[Figure: Analyses of brain activity in a larval zebrafish, performed using Spark: embedding dynamics of whole-brain activity into lower-dimensional trajectories.]

The growth of data volumes in industry and research poses tremendous opportunities, as well as tremendous computational challenges. As data sizes have outpaced the capabilities of single machines, users have needed new systems to scale out computations to multiple nodes. As a result, there has been an explosion of new cluster programming models targeting diverse computing workloads [1,4,7,10]. At first, these models were relatively specialized, with new models developed for new workloads; for example, MapReduce [4] supported batch processing, but Google also developed Dremel [13] for interactive SQL queries and Pregel [11] for iterative graph algorithms.


Boston Limited Introduces New Deep Learning Platform at ISC 2016

#artificialintelligence

Boston Limited have introduced the latest weapon in their machine learning armoury in the guise of the Boston ANNA Pascal, a new NVIDIA Tesla GPU-based solution, at ISC 2016 in Frankfurt, Germany. By introducing four ground-breaking technologies, the appliance delivers lightning-fast performance for HPC and deep learning workloads with ever-growing computing needs. Deep learning is the fastest-growing field within this sphere, and today's advanced deep neural networks use algorithms, big data, and the computational power of GPUs to reduce time-to-solution and to improve the accuracy of results. "With support from the Tesla platform based on our new innovative Pascal architecture, Boston is delivering innovative, high-powered solutions to tackle the most demanding HPC and artificial intelligence workloads."


Intel Launches 'Knights Landing' Phi Family for HPC, Machine Learning

#artificialintelligence

From ISC 2016 in Frankfurt, Germany, this week, Intel Corp. launched the second-generation Xeon Phi product family, formerly code-named Knights Landing, aimed at HPC and machine learning workloads. "We're not just a specialized programming model," said Intel's General Manager, HPC Compute and Networking, Barry Davis in a hands-on technical demo held at ISC. "Knights Landing" also puts integrated on-package memory in a processor, which benefits memory bandwidth and overall application performance. For comparison, the Pascal P100 GPU for NVLink-optimized servers offers 5.3 teraflops of double-precision floating point performance, and the PCIe version supports 4.7 teraflops of double-precision.


Google Takes Unconventional Route with Homegrown Machine Learning Chips

#artificialintelligence

At the tail end of Google's keynote speech at its developer conference Wednesday, Sundar Pichai, Google's CEO, mentioned that Google had built its own chip for machine learning jobs, which it calls a Tensor Processing Unit, or TPU. The boast was that the TPU offered "an order of magnitude" improvement in performance per watt for machine learning. Any company building a custom chip for a dedicated workload is worth noting, because building a new processor is a multimillion-dollar effort once you consider hiring a design team, the cost of getting a chip to production, and building the hardware and software infrastructure for it. However, Google's achievement with the TPU may not be as earth-shattering or innovative as it might seem given the coverage in the press. To understand what Google has done, it's important to understand a bit about how machine learning works and the demands it makes on a processor.


HPE Chases Deep Learning With GPU Laden Apollo Systems

#artificialintelligence

The SL6500s, which were dense machines designed explicitly to have lots of GPU accelerators hanging off Xeon CPUs, rolled out shortly after that and were last updated in November 2012. With the Apollo 6500, which has more modern "Haswell" Xeon E5 v3 and soon "Broadwell" Xeon E5 v4 processors, HPE is again adding two PCI-Express switches to the XL270d hybrid server node created for the system. Aside from supporting the most recent Xeon processors, the XL270d sleds used in the Apollo 6500 hybrid nodes can support up to 1 TB of memory per node, and those eight GPUs per sled can run as high as 350 watts each. "We are seeing that there is an insatiable appetite for GPU computing for deep learning workloads," explains Ram, and that appetite is why HPE went back to the drawing board and came up with a better design that could provide the power and cooling to support accelerators that run hotter and deliver a lot more performance on floating point work.