SSRLabs - Products

SSRLabs has developed dedicated massively parallel coprocessors that accelerate a variety of computational tasks in modern servers. We support floating-point-heavy applications in traditional High Performance Computing (HPC) at better performance per dollar spent - both in capital expenses and in lifetime operating costs - and with better energy efficiency than existing solutions. We provide the first coprocessor chipset that can successfully and easily solve n-body problems. Our parallel processing approach scales linearly, from a single core up to a massive number of them. These coprocessors are intended to analyze huge amounts of data, either in a streaming fashion or offline. They can be used for analysis of single- or multi-dimensional data, including image analysis, for tracking objects and for real-time path prediction.

These comprehensive subsystems include dedicated coprocessors, firmware, software, APIs and SDK plugins, as well as dense, high-performance 3D stacked memory ASICs for even better performance. OpenCL and OpenACC APIs are available for the floating-point coprocessor. We fully support heterogeneous computing.

The areas of deployment are

  • Traditional high performance compute (HPC)
  • "Big Data" applications to find structure in unstructured data
  • Non-traditional HPC such as solving any n-body problem
  • In-Memory Compute applications
  • Weather forecasting and climate modeling
  • Protein folding in drug discovery
  • Research in cancer and Alzheimer's disease as well as autism spectrum disorder
  • Financial modeling and high-frequency trading
  • Graph Search
  • Deep Learning, Machine Learning and Artificial Intelligence
  • Encryption/Decryption/Cryptanalysis
  • and many more.

Our coprocessors accelerate any system based on current single- or multi-core general-purpose host processors. Unlike other accelerators, they have the I/O and memory bandwidth to sustain their performance.

    Neural Net Coprocessor

    The Neural Net Coprocessor is a massively parallel accelerator for graph search, image and video analysis (vision systems), Deep Learning, Machine Learning and other AI applications. Thanks to its internal architecture, its unparalleled bisection bandwidth and its high-bandwidth interfaces to memory and to the host or other processors, it scales out more linearly than any other Convolutional Neural Net accelerator. We have equipped the Neural Net Coprocessor with an easy-to-use API for the most common operations, and it is accessible through open source frameworks such as TensorFlow and Caffe.

    Floating-Point Coprocessor

    The Floating-Point Coprocessor is a massively parallel coprocessor that accelerates all OpenCL and OpenACC applications. It replaces CUDA and handwritten C or assembly code, giving easier portability and drastically simplified maintenance along with better performance and scalability and higher precision and accuracy. If your CPU, GPGPU or DSP does not deliver the matrix multiplication or FFT throughput you need, our coprocessor will. It can be used in general-purpose HPC such as FEA/FEM, n-body problems, mechanical, thermal and electrical simulations (including signal integrity and power integrity problems), modeling and a variety of other applications. Its I/O bandwidth outclasses that of any other accelerator.
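To give a sense of the kind of workload meant by "n-body problems", here is a minimal direct-summation sketch in plain Python. It is only an illustration of the underlying arithmetic, not SSRLabs code or an SSRLabs API; the function name `accelerations`, the gravitational constant `g` and the `softening` parameter are assumptions made for this example. Every body interacts with every other body, so the work grows with the square of the body count, which is why such problems benefit from massively parallel floating-point hardware.

```python
import math


def accelerations(positions, masses, g=1.0, softening=0.0):
    """Direct-sum gravitational acceleration on each of N bodies in 3-D.

    positions: list of (x, y, z) tuples; masses: list of floats.
    g and softening are illustrative parameters (hypothetical defaults).
    """
    n = len(positions)
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        xi, yi, zi = positions[i]
        for j in range(n):
            if i == j:
                continue  # a body exerts no force on itself
            dx = positions[j][0] - xi
            dy = positions[j][1] - yi
            dz = positions[j][2] - zi
            # Squared distance, optionally softened to avoid division by zero
            dist_sq = dx * dx + dy * dy + dz * dz + softening * softening
            scale = g * masses[j] / (dist_sq * math.sqrt(dist_sq))
            acc[i][0] += scale * dx
            acc[i][1] += scale * dy
            acc[i][2] += scale * dz
    return acc


if __name__ == "__main__":
    # Two unit masses one unit apart attract each other equally.
    print(accelerations([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)], [1.0, 1.0]))
    # prints [[1.0, 0.0, 0.0], [-1.0, 0.0, 0.0]]
```

The two inner arithmetic steps (distance and scaled accumulation) are independent for every pair of bodies, so in an OpenCL or OpenACC port each body's outer-loop iteration could run on its own core.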

    Very Large Capacity Memory

    SSRLabs has developed a Very Large Capacity Memory that connects through our Universal Host Port (UHP) and supports all host processors, irrespective of how many cores they contain (multiple, many or massively parallel cores). All connections are point-to-point rather than multi-drop buses, so they maintain performance at a full-duplex bandwidth of 60 GB/s per channel and port.

    Our Very Large Capacity Memory will be available in 128 GB and 512 GB versions, with an internal bandwidth exceeding that of any other memory module and four UHP ports, each providing 60 GB/s of full-duplex I/O, for superior external bandwidth. It connects directly to our coprocessors and accelerators, and it can be connected via a DDR3/4-to-UHP adapter to any processor with four or more of the legacy interfaces.
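The port figures above can be combined into a rough per-module estimate. The sketch below is back-of-the-envelope arithmetic only: it assumes all four ports can run at the quoted 60 GB/s at the same time, takes that figure at face value without interpreting it per direction, uses decimal gigabytes and ignores protocol overhead. The function names are made up for this example.

```python
PORT_BANDWIDTH_GB_S = 60      # per UHP port, as quoted in the text
PORTS_PER_MODULE = 4          # UHP ports per memory module, as quoted
CAPACITIES_GB = (128, 512)    # announced module capacities


def aggregate_bandwidth_gb_s(ports=PORTS_PER_MODULE,
                             per_port=PORT_BANDWIDTH_GB_S):
    """External bandwidth if every port runs at its rated speed (assumption)."""
    return ports * per_port


def full_sweep_seconds(capacity_gb, bandwidth_gb_s):
    """Idealized time to stream the whole module once over its ports."""
    return capacity_gb / bandwidth_gb_s


if __name__ == "__main__":
    total = aggregate_bandwidth_gb_s()
    print(f"aggregate: {total} GB/s")
    for capacity in CAPACITIES_GB:
        seconds = full_sweep_seconds(capacity, total)
        print(f"{capacity} GB module: {seconds:.2f} s per full sweep")
```

Under these assumptions a module offers 4 x 60 = 240 GB/s of external bandwidth, so even the 512 GB version could in principle be read end to end in a little over two seconds.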

    Accelerator Appliance

    For demonstration purposes we have started to develop an Accelerator Appliance for SSRLabs' coprocessors and memory modules. The appliance can be equipped with up to four boards, each carrying four coprocessors and sixteen of our Very Large Capacity Memory modules.
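As a quick worked example of what a fully populated appliance would hold, the sketch below multiplies out the figures given above. It assumes that both counts are per board (four coprocessors and sixteen memory modules on each of up to four boards) and that every module in the appliance has the same capacity; the function name `appliance_totals` is hypothetical.

```python
BOARDS = 4                   # maximum boards per appliance, from the text
COPROCESSORS_PER_BOARD = 4   # from the text
MODULES_PER_BOARD = 16       # from the text (assumed to be per board)


def appliance_totals(module_capacity_gb, boards=BOARDS):
    """Totals for an appliance populated with identical memory modules."""
    modules = boards * MODULES_PER_BOARD
    return {
        "coprocessors": boards * COPROCESSORS_PER_BOARD,
        "memory_modules": modules,
        "memory_gb": modules * module_capacity_gb,
    }


if __name__ == "__main__":
    for capacity in (128, 512):  # announced module capacities in GB
        print(capacity, appliance_totals(capacity))
```

With these assumptions a full appliance would carry 16 coprocessors and 64 memory modules, for 8,192 GB of memory with 128 GB modules or 32,768 GB with 512 GB modules.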

    Licensable IP

    While developing SSRLabs' pScale™ Coprocessors we have created non-core IP and synthesizable building blocks that we are interested in licensing out or selling. These building blocks are related to I/O, our Universal Host Port (UHP) and other components.


    We are working on an automated support infrastructure for downloads, updates, FAQs and general customer support, as well as a bug reporting system.