BLIS is the intellectual successor to GotoBLAS, but it is a ground-up rewrite and redesign that enables rapid development on new architectures. Unlike GotoBLAS, BLIS does not require writing assembly language to achieve a large fraction of peak performance on some architectures, although vector intrinsics in C may be required.
See Google Code.
The Blue Gene/Q (BGQ) code is in a private repo for now. Merging it is complicated because both the BGQ and MIC code use threading design features that are not yet part of the main branch.
Bold type indicates people actually doing the work.
- Robert van de Geijn - Supreme Leader.
- Field van Zee - Lead developer of BLIS and libflame.
- Tyler Smith - Author of Blue Gene/Q optimized kernels (among other platforms).
- Jeff Hammond - Blue Gene/Q fanboy and BLIS cheerleader.
- John Gunnels - Top500 and Gordon Bell Prize god at IBM.
- BLIS: A Modern Alternative to the BLAS
- BLIS: A Framework for Generating BLAS-like Libraries (FLAME Working Note #66)
- Implementing Level-3 BLAS with BLIS: Early Experience (FLAME Working Note #69)
- Opportunities for Parallelism in Matrix Multiplication (FLAME Working Note #71)
- Anatomy of High-Performance Many-Threaded Matrix Multiplication (IPDPS preprint derived from FLAME Working Note #71)
These results are extremely preliminary and should be treated as work in progress. The figures were not all produced with the same version of the code and thus may not agree with each other.
The y-axis of the figures is usually normalized to theoretical peak.
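As a rough illustration of what normalizing to theoretical peak means, here is a minimal sketch in Python. The hardware figures (1.6 GHz clock, 4-wide double-precision vector unit, fused multiply-add counted as 2 flops) are assumptions about Blue Gene/Q that are not stated on this page, and the function names are hypothetical; the actual benchmark scripts behind the figures may differ.

```python
# Sketch: convert a measured DGEMM time into a fraction of theoretical peak.
# All hardware constants below are assumptions for Blue Gene/Q, not taken
# from this page.
CLOCK_GHZ = 1.6        # assumed core clock in GHz
SIMD_WIDTH = 4         # assumed doubles per vector register
FLOPS_PER_FMA = 2      # one fused multiply-add counts as two flops


def peak_gflops(cores=1):
    """Theoretical double-precision peak in GFLOP/s for `cores` cores."""
    return CLOCK_GHZ * SIMD_WIDTH * FLOPS_PER_FMA * cores


def gemm_flops(m, n, k):
    """Floating-point operations in C += A * B with A (m x k) and B (k x n)."""
    return 2.0 * m * n * k


def fraction_of_peak(m, n, k, seconds, cores=1):
    """Achieved GFLOP/s divided by theoretical peak (the plotted y value)."""
    achieved_gflops = gemm_flops(m, n, k) / seconds / 1e9
    return achieved_gflops / peak_gflops(cores)


if __name__ == "__main__":
    # Hypothetical timing: a 1000 x 1000 x 1000 DGEMM that took 0.2 s on one core.
    print(fraction_of_peak(1000, 1000, 1000, 0.2))
```

Under these assumptions a single core peaks at 12.8 GFLOP/s, so a y value of 0.8 would correspond to roughly 10 GFLOP/s.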
This is single-core performance running on all four hardware threads (which are pinned to the same core) in c16 mode. The numbers here are lower than in c1 mode, but c16 mode is the most likely way BLAS will be used on 4 threads of 1 core.
These are numbers measured in c1 mode.
The BLIS red line is the general-purpose code while the BLIS green line specializes for matrices that fit into L2.