Tuning Transpose is a tutorial I've used in ALCF workshops in the past.
- Fortran information
- How To Write Fast Numerical Code: A Small Introduction
- Is Parallel Programming Hard, And, If So, What Can You Do About It?
- Fast Barrier for x86 Platforms
- Concurrency Kit
- C99 restrict
- Kaz's links for low-level hackers
- Agner Fog's Software optimization resources
- The GotoBLAS/BLIS Approach to Optimizing Matrix-Matrix Multiplication
- LLNL's Introduction to Parallel Computing
- John McCalpin's blog
- Roscoe Bartlett's links on C++ Software Engineering
NWChem is a massively parallel quantum chemistry code that supports a wide variety of methods.
MPQC is a massively parallel quantum chemistry code that uses portable software (MPI and Pthreads) and supports a limited range of methods (DFT and MP2). MPQC also supports explicitly correlated ("R12") methods.
LAMMPS is a widely-used, general purpose molecular dynamics code that runs on supercomputers.
Coupled cluster methods are of great interest to me. I started writing this page for someone who wanted to learn about them.
Elemental is a modern, parallel, dense linear algebra library.
Eigen is a modern, serial, dense linear algebra library.
Performance Tools and Debuggers
- My favorite tool is the Kernighan debugger ("The most effective debugging tool is still careful thought, coupled with judiciously placed print statements." - Brian Kernighan in "Unix for Beginners"), not because I like print statements, but because it is too easy to avoid careful thought when using fancy tools.
- I find there is still nothing better than GDB and Valgrind for debugging tricky errors.
- For performance measurement, I use TAU, gprof and HPM (relevant only on Blue Gene).
Programming Models and Runtimes
The standard models
MPI is far and away the most popular parallel programming model and is used in more than 99% of the parallel applications that run on modern supercomputers and clusters.
Shared memory (between processes) of the POSIX variety is not widely used by application programmers but is widely used in system software.
While Charm++ is not widely used in the way that MPI is, NAMD is arguably the most widely used code in all of open science, hence one can argue that Charm++ is used by the tens of thousands of people running NAMD.
See PGAS for an overview. Note that I conflate PGAS with one-sided despite understanding the difference between the two. If a one-sided model takes remote addresses as an argument, that's close enough to PGAS for my purposes.
Implementing a Symmetric Heap is related to implementing PGAS models efficiently.
MPI3-RMA provides all of the communication features required by most PGAS models.
Pthreads is common for systems programming and for codes like MADNESS and MPQC that require a more dynamic threading model than OpenMP provides. Using Pthreads from C and C++ is straightforward, but less so from other languages (e.g. Fortran).
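To make the explicit create/join style of Pthreads concrete, here is a minimal sketch in C that sums an array by handing each thread its own chunk. The names `threaded_sum`, `sum_chunk`, `chunk_t` and the fixed `NUM_THREADS` are hypothetical and chosen only for illustration; this is not taken from MADNESS or MPQC.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define NUM_THREADS 4 /* hypothetical thread count, chosen for illustration */

/* Work description handed to each thread (hypothetical helper type). */
typedef struct {
    const double *data; /* shared, read-only input */
    size_t begin;       /* first index owned by this thread */
    size_t end;         /* one past the last index owned by this thread */
    double partial;     /* written only by the owning thread */
} chunk_t;

/* Thread entry point: sum the half-open range [begin, end). */
static void *sum_chunk(void *arg)
{
    chunk_t *c = (chunk_t *)arg;
    double s = 0.0;
    for (size_t i = c->begin; i < c->end; i++) {
        s += c->data[i];
    }
    c->partial = s;
    return NULL;
}

/* Sum data[0..n) using up to NUM_THREADS threads.
 * Returns 0 on success, or the error code from pthread_create; on
 * failure *result only covers the chunks whose threads were started. */
int threaded_sum(const double *data, size_t n, double *result)
{
    assert(data != NULL && result != NULL);

    pthread_t threads[NUM_THREADS];
    chunk_t chunks[NUM_THREADS];
    int started = 0;
    int rc = 0;

    for (int t = 0; t < NUM_THREADS; t++) {
        chunks[t].data = data;
        chunks[t].begin = n * (size_t)t / NUM_THREADS;
        chunks[t].end = n * (size_t)(t + 1) / NUM_THREADS;
        chunks[t].partial = 0.0;
        rc = pthread_create(&threads[t], NULL, sum_chunk, &chunks[t]);
        if (rc != 0) {
            break;
        }
        started++;
    }

    /* Join every thread that was started, then combine partial sums. */
    double total = 0.0;
    for (int t = 0; t < started; t++) {
        pthread_join(threads[t], NULL);
        total += chunks[t].partial;
    }
    *result = total;
    return rc;
}
```

With GCC or Clang this would typically be built with the `-pthread` flag. Each thread writes only to its own `partial` field, so no mutex is needed; the join acts as the synchronization point before the partial sums are combined.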
Intel Threading Building Blocks, or TBB, provides dynamic and static parallelism for C++, along with a number of helpful utility features, such as multidimensional iterators. Because TBB is available as OSS from Intel, it is quite portable and runs on a number of non-x86 platforms, including Blue Gene systems.
OpenCL is an industry-standard interface for GPUs. It's probably best described as a compiler target.
OpenMP is probably the most common threading model for scientific applications. It is primarily a fork-join model and well-suited for data parallelism. Implementing more complex parallel motifs can be more challenging.
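As a small illustration of the fork-join, data-parallel style described above, here is a minimal sketch of a dot product in C. The function name `omp_dot` is hypothetical. The single directive forks a team of threads for the loop, gives each thread a private copy of `sum`, and combines the copies when the threads join at the end of the loop.

```c
#include <assert.h>
#include <stddef.h>

/* Dot product of x[0..n) and y[0..n) (hypothetical helper name).
 * The pragma forks threads for the loop and joins them at its end;
 * reduction(+:sum) gives each thread a private partial sum that is
 * added into the shared result at the join. */
double omp_dot(const double *x, const double *y, size_t n)
{
    assert(x != NULL && y != NULL);

    double sum = 0.0;
    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < (long)n; i++) {
        sum += x[i] * y[i];
    }
    return sum;
}
```

With GCC or Clang, OpenMP is enabled by the `-fopenmp` flag; without it the pragma is ignored and the loop simply runs serially, which is one reason this model is easy to adopt incrementally. Note that the order in which the partial sums are combined is not specified, so floating-point results can differ slightly between runs for general inputs.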
OpenACC is a directive-based model that resembles OpenMP, but explicitly targets accelerators. At least some of the features of OpenACC will be part of OpenMP 4.
CUDA is the best way to get performance out of an NVIDIA GPU. Do not let anyone tell you otherwise.
Blue Gene systems are developed by IBM.
The K computer was developed by Fujitsu.
Recent Cray supercomputers include the XT, XE, XK, and XC series.
Intel MIC is not really a supercomputer but rather a powerful coprocessor that can be used to build supercomputers.
Allocations is my page on how to get access to supercomputers (for free).
Mac is not a supercomputer by any means but a lot of people use it for development.
A summary of modern CPU hardware is here.