BLIS


Overview

BLIS is the intellectual successor to GotoBLAS, written as a ground-up rewrite and redesign that enables rapid development on new architectures. Unlike GotoBLAS, BLIS does not require hand-written assembly to achieve a large fraction of peak performance on some architectures, although vector intrinsics in C may still be needed.

Source

See Google Code.

The Blue Gene/Q (BGQ) code is in a private repo for now. Merging it is complicated because both the BGQ and MIC code use threading design features that are not yet part of the main branch.

People

Bold type indicates people actually doing the work.

Papers

Workshop

http://www.cs.utexas.edu/users/flame/BLISRetreat/index.html

Performance

Blue Gene/Q

These results are very preliminary and should be treated as work in progress. The figures were not all produced with the same version of the code, so they may not agree with each other.

The y-axis of the figures is usually normalized to theoretical peak.
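As a rough illustration of what normalizing to theoretical peak means, the sketch below converts a measured DGEMM time into a fraction of peak. The Blue Gene/Q peak values are assumptions that do not appear on this page (1.6 GHz clock, 8 double-precision flops per cycle per core, 16 compute cores per node), and the function names and the timing in the usage line are hypothetical.

```python
def dgemm_flops(m, n, k):
    """Flop count for C := A*B + C with A (m x k) and B (k x n):
    one multiply and one add per inner-product term."""
    return 2.0 * m * n * k


def fraction_of_peak(m, n, k, seconds, peak_gflops):
    """Achieved GFLOP/s divided by the theoretical peak in GFLOP/s."""
    achieved_gflops = dgemm_flops(m, n, k) / seconds / 1e9
    return achieved_gflops / peak_gflops


# Assumed Blue Gene/Q figures (not taken from this page):
# 1.6 GHz x 8 flops/cycle per core, 16 compute cores per node.
BGQ_CORE_PEAK_GFLOPS = 1.6 * 8
BGQ_NODE_PEAK_GFLOPS = BGQ_CORE_PEAK_GFLOPS * 16

if __name__ == "__main__":
    # Hypothetical measurement: a 1000 x 1000 x 1000 DGEMM taking 0.25 s on one core.
    frac = fraction_of_peak(1000, 1000, 1000, 0.25, BGQ_CORE_PEAK_GFLOPS)
    print(f"fraction of single-core peak: {frac:.4f}")
```

For single-node figures the same calculation would use the node peak instead of the core peak.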

Single Core

These results show single-core performance using all four hardware threads of one core (the threads are pinned to that core) in c16 mode. The numbers are lower than in c1 mode, but c16 is the most likely mode when BLAS runs on four threads of one core.

BLIS-perf1.png

These are numbers measured in c1 mode.

BLIS-perf6.png

Single Node

Smaller

The red BLIS line is the general-purpose code, while the green BLIS line is specialized for matrices that fit into L2.
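To give a feel for which problem sizes could fit into L2, the sketch below estimates the memory footprint of the three DGEMM operands and compares it with an L2 capacity. It is only an illustration: the 32 MiB capacity is an assumption about Blue Gene/Q that is not stated on this page, the helper names are hypothetical, and the criterion actually used by the specialized code is not described here.

```python
def gemm_footprint_bytes(m, n, k, elem_bytes=8):
    """Bytes occupied by A (m x k), B (k x n) and C (m x n) together."""
    return elem_bytes * (m * k + k * n + m * n)


# Assumption: 32 MiB of shared L2 per node (not taken from this page).
ASSUMED_L2_BYTES = 32 * 1024 * 1024


def fits_in_l2(m, n, k, l2_bytes=ASSUMED_L2_BYTES, elem_bytes=8):
    """True if all three operands fit in the assumed L2 capacity at once."""
    return gemm_footprint_bytes(m, n, k, elem_bytes) <= l2_bytes


if __name__ == "__main__":
    for size in (512, 1024, 2048):
        print(size, fits_in_l2(size, size, size))
```

Under these assumptions, square double-precision problems fit up to a dimension of roughly 1180.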

BLIS-perf4.png

Bigger

BLIS-perf5.png

New Microkernels

Fortran

See https://github.com/jeffhammond/blis-old/tree/fortran.