## SSRLabs - Deployment Guide

Do you need to improve the performance of your data center, accelerate Big Data applications and cut execution time for your HPC users? You are not alone. SSRLabs has accelerators for Big Data and HPC as well as advanced memories that enable you to improve performance while reducing operating costs. Don't fall for software solutions when hardware is required. Do you remember Syncronys SoftRAM RAM Size Doublers from 1995?

We are your partner in your quest to offer better solutions in Big Data, HPC and Cloud Computing for your customers.

**Big Data:** Big Data is all about keeping data in memory - not on disk. If your supplier tells you that they have SSDs instead of hard disks, you know that they don't understand your requirements. Ask us for advice on how to improve the efficiency and performance of your data center. Most Big Data applications benefit from very large main memory, so naturally we suggest deploying our vlcRAM in conjunction with a UHP-enabled processor. SSRLabs has the right solution for your Big Data applications that are not served well by MapReduce schemes.

**Cloud Computing:** Ultimately, Cloud Computing refers to an interconnected set of data centers storing your data and executing your applications. Behind every data center there are lots of servers - and the more energy-efficient and instruction-efficient they can be made, the better it is for every single user. SSRLabs has the right solutions for numerically intensive applications, for Big Data applications and for any Artificial Intelligence, Machine Learning and Deep Learning applications in your private, public or hybrid cloud.

**Machine Learning, Deep Learning and Artificial Intelligence:** These applications fall into one or both of two categories. The first is training. Training requires a decent amount of more or less traditional HPC such as matrix math and tensor math. This can be solved with SSRLabs' Floating-Point accelerator and the vlcRAM. The second is inference. If all the data center does is infer, then the (neuromorphic) convolutional neural network accelerator and the vlcRAM would be good choices, as they perform better than a SIMD GPU and a traditional CPU at much lower levels of power consumption. We provide OpenCL for the Floating-Point coprocessor so that no code rewrite is needed, and TensorFlow and Caffe2 for the neuromorphic convolutional neural network accelerator.

**HPC:** A quick look at the world's fastest supercomputers reveals a number of issues. Peak theoretical performance and measured performance differ quite substantially. For BLAS, an embarrassingly parallel problem, the efficiency on Tianhe-2 is a mere 62%, and for other computational workloads the efficiency is even lower. Even so, this is one of the better levels of efficiency among supercomputers. Others fare far worse - particularly those that deploy SIMD accelerators such as GPGPUs. Simple meshes inside accelerators don't work well either, as Tilera's lack of success has demonstrated. To see where we are today and where the DOE's ExaFLOPS Challenge wants the HPC industry to be, let's just look at the numbers. Today's highest-performing supercomputer is Tianhe-2, with about 34 PFLOPS of numeric performance at a power consumption of roughly 18 MW. That works out to about 1.889 GFLOPS/W. In other words, Tianhe-2 delivers nearly 2 billion floating-point operations per second per watt of electricity it consumes, running BLAS as a benchmark. The DOE asks for 1 ExaFLOPS (that is, 10^18 floating-point operations per second) at a total allowable power consumption of 20 MW, and presumably for a more normalized mix of benchmarks. That boils down to 50 GFLOPS/W. In other words, the energy efficiency of today's supercomputers must improve by a factor of more than 25 (50 / 1.889 ≈ 26.5) to fulfill the ExaFLOPS Challenge. Not even Moore's Law - if we assume it will continue to hold - will afford us that by the 2020 deadline. It is clear that simply banking on Moore's Law won't get us there. Architectural changes are required, and that is what SSRLabs delivers.
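
The efficiency arithmetic above is easy to check for yourself. The following is a minimal Python sketch that uses only the rounded figures quoted in the text (34 PFLOPS at 18 MW for Tianhe-2, 1 ExaFLOPS at 20 MW for the DOE target). The function and variable names are illustrative and not part of any SSRLabs tool.

```python
def gflops_per_watt(flops: float, watts: float) -> float:
    """Energy efficiency in GFLOPS/W, given sustained FLOPS and power draw in watts."""
    return flops / watts / 1e9


# Rounded figures as quoted in the text above.
tianhe2_eff = gflops_per_watt(34e15, 18e6)    # 34 PFLOPS at 18 MW
exaflops_eff = gflops_per_watt(1e18, 20e6)    # 1 ExaFLOPS at 20 MW

# How much more work per watt the ExaFLOPS target demands.
required_gain = exaflops_eff / tianhe2_eff

print(f"Tianhe-2:        {tianhe2_eff:.3f} GFLOPS/W")
print(f"ExaFLOPS target: {exaflops_eff:.1f} GFLOPS/W")
print(f"Required gain:   {required_gain:.1f}x")
```

Swapping in the measured performance and power draw of any other system gives its distance from the same 50 GFLOPS/W target.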