How to compile HPL (LINPACK)

This guide will show you how to compile HPL (LINPACK) and provide some tips for selecting the best input values for HPL.dat, based on my experiences at the student cluster competitions.

This benchmark stresses the computer's floating-point capabilities.

Although just calculating FLOPs is not reflective of applications typically run on supercomputers, floating point is still important when precise calculations are required.

I assume that a version of MPI, C/C++/Fortran compilers, BLAS, and whatever other libraries you need are already installed.

There are many versions of LINPACK for different architectures, ranging from an Intel version to a CUDA version. The modifications for all versions are very similar.

Compiling HPL

The first step is to copy an existing makefile from the setup/ folder into the root directory of HPL. I suggest Make.Linux_ATHLON_CBLAS, since it is the closest to a generic system. Call the copy Make.[whatever]. For CUDA and Intel, Make.CUDA and Make.intel64 are already created for you.

In this file, you may or may not need to modify TOPdir. Typically you should set it to the full path of your HPL directory.

Next is specifying the location of your MPI files and binaries. MPdir should specify the exact file path to the version of MPI you want to use, up to the root where include, lib, and bin are located.

MPinc should point to the include directory under MPdir.

MPlib should point to the MPI library in the lib directory under MPdir. The library file name (for example, libmpich.a) depends on the MPI version you installed.

** Note: if you are using a shared library (*.so) instead of a static one (*.a), then you need to add the library path to your environment.

vim ~/.bashrc

Add the following to the end of the file:

export LD_LIBRARY_PATH=/path/to/mpi/lib:$LD_LIBRARY_PATH

Then, reload the file. The library path will now be set every time you log on to your system.

source ~/.bashrc
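
If you source ~/.bashrc several times, a plain export keeps prepending the same directory. Below is a minimal sketch of a helper that only adds a directory when it is not already listed. The function name prepend_ld_path is hypothetical, and /path/to/mpi/lib is a placeholder for your own MPI lib directory.

```shell
# Hypothetical helper: prepend a directory to LD_LIBRARY_PATH only once.
prepend_ld_path() {
    case ":${LD_LIBRARY_PATH:-}:" in
        *":$1:"*)
            # Directory is already listed, so leave the variable unchanged.
            ;;
        *)
            # Add the directory in front; skip the ':' when the variable is empty.
            LD_LIBRARY_PATH="$1${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
            ;;
    esac
    export LD_LIBRARY_PATH
}

# Placeholder path; replace it with the lib directory of your MPI install.
prepend_ld_path /path/to/mpi/lib
```

The same helper can be called again for the BLAS directory if that library is also a *.so file.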

Next is linking the BLAS libraries. LAdir should specify the exact location of the directory containing the BLAS library.

LAinc should specify the include directory, if you need to include it.

LAlib should specify the BLAS library file. If the BLAS file is a *.so file, you can follow the steps above to add it to your environment.

For HPL_OPTS, add -DHPL_DETAILED_TIMING, which gives a more detailed timing breakdown that helps when tuning the HPL.dat file.

Lastly, you can specify your compiler (CC) and compiler flags (CCFLAGS).

Now to compile:

make arch=[whatever]

If you linked everything correctly, then there should be an xhpl binary in the bin/[whatever]/ directory. Otherwise, you need to figure out which library was not linked properly.

Now navigate to bin/[whatever]/ to modify the HPL.dat file.

Modifying HPL.dat

The most important lines are:

  • Ns – size of the matrix (N x N)
  • NBs – block size each process operates on at a time
  • Ps and Qs – dimensions of the P x Q process grid the matrix is distributed over
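
To pick Ps and Qs, it helps to list the possible grids first. The sketch below assumes that P x Q must equal the number of MPI processes you launch, and it lists every factor pair with P no larger than Q. Starting from the most square pair is a common rule of thumb, not something measured here, and the function name grid_pairs is hypothetical.

```shell
# Hypothetical helper: list all P x Q grids for a given process count, with P <= Q.
grid_pairs() {
    n=$1
    p=1
    # Only test P up to sqrt(n), so that P never exceeds Q.
    while [ $((p * p)) -le "$n" ]; do
        if [ $((n % p)) -eq 0 ]; then
            echo "$p x $((n / p))"
        fi
        p=$((p + 1))
    done
}

# Example: candidate grids for 16 processes; the last line is the most square one.
grid_pairs 16
```

Each printed pair is a candidate to try in HPL.dat; which one is fastest depends on your interconnect and node layout.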

N should typically be chosen so that the matrix takes up around 80-90% of total memory.

N can be calculated by:
N = sqrt((Memory Size in Gbytes * 1024^3 * Number of Nodes) / Double Precision (8 bytes)) * percentage

Larger Ns generally yield better results, as long as the matrix still fits in memory.


NBs is typically between 32 and 256. A small NBs works well on a single node. An NBs in the low 100s or 200s works well across multiple nodes. A larger NBs, close to or above 1000, works well for GPUs and accelerators.

I normally select Ns to be a multiple of NBs, so there is no performance drop-off towards the end of the calculation.
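
The formula and the rounding can be sketched in shell as follows. This implements the formula exactly as written above, using awk for the square root. The function names hpl_n and round_to_nb are hypothetical, and the values 8 GB, 1 node, 0.8, and NB = 192 are only example inputs. Note that the percentage is applied to N, and memory use grows with N squared, so 0.8 here corresponds to about 64% of memory.

```shell
# Hypothetical helper: N = sqrt(mem_GB * 1024^3 * nodes / 8) * percentage.
# $1 = memory per node in GB, $2 = number of nodes, $3 = percentage as a fraction.
hpl_n() {
    awk -v mem="$1" -v nodes="$2" -v pct="$3" \
        'BEGIN { printf "%d\n", sqrt(mem * 1024^3 * nodes / 8) * pct }'
}

# Hypothetical helper: round N down to the nearest multiple of NB.
# $1 = N, $2 = NB.
round_to_nb() {
    echo $(( $1 / $2 * $2 ))
}

# Example inputs only: one node with 8 GB, 80 percent, NB = 192.
n=$(hpl_n 8 1 0.8)
echo "N = $n, rounded down to a multiple of 192: $(round_to_nb "$n" 192)"
```

The rounded value is what goes on the Ns line of HPL.dat.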

For the rest of the parameters, you can read about them here. You can find the CUDA tuning information in the CUDA HPL version. For Intel, you can find it here.

How to Install CUDA on NVIDIA Jetson TX1 [Deprecated]

Updated 2016 post – detailed version. (The method shown in this guide is outdated) This guide shows you how to install CUDA on the NVIDIA Jetson TX1. Currently, Nvidia’s Jetpack installer does not work properly. This blog post will show a work-around for getting CUDA to work on the TX1.

First, download the following two files into a directory:

Updating your apt-get sources

Navigate to the directory where you downloaded the files and type in:

dpkg -i cuda-repo-l4t-r23.1-7-0-local_7.0-71_armhf.deb

Next, you want to update the sources by typing in:

apt-get update

Now, your apt-get repositories will have all of the CUDA libraries and files you may need for any future modifications.

Installing CUDA dependencies

Next, go to the directory where you downloaded the Jetpack installer and make the file executable by typing in:

chmod +x

Now, run the file:


The .run file should have unpacked its contents into a new directory called “_installer”

Go into the _installer directory and type in:

./ ../cuda-repo-l4t-r23.1-7-0-local_7.0-71_armhf.deb 7.0 7-0

**Note that ../cuda-repo-l4t-r23.1-7-0-local_7.0-71_armhf.deb is the location of the .deb file you downloaded earlier.

Now, you have every CUDA dependency installed. However, there are a few more things you have to do.


NVCC as a global call

NVCC is not linked globally (nvcc -V gives an error), and you need to do a few more things to fix this. First, let’s edit the .bashrc file.

vim .bashrc

Append the CUDA exports to the end of the .bashrc file. For the PATH export, “:$PATH” should come after the CUDA bin directory:

export PATH=/usr/local/cuda-7.0/bin:$PATH

Now, reload the .bashrc file.

source .bashrc

Now to make sure that everything is working, type in:

nvcc -V


Running Some CUDA Samples

Now, let’s run some CUDA samples and scale the GPU to max frequency.

Scaling GPU Frequency

You can find out your GPU rates by typing in:

cat /sys/kernel/debug/clock/gbus/possible_rates

Now, let’s set the GPU frequency to its maximum possible rate for better performance.

echo 998400000 > /sys/kernel/debug/clock/override.gbus/rate
echo 1 > /sys/kernel/debug/clock/override.gbus/state
cat /sys/kernel/debug/clock/gbus/rate

You can lower the GPU frequency with the same steps above.

Running the nbody simulation and smoke particles

Navigate to the simulations (5_Simulations) directory containing both the nbody simulation and smoke particles samples:

cd /usr/local/cuda-7.0/samples/5_Simulations/

Now, navigate to the nbody directory and run make. Then, type in:

./nbody -benchmark -numbodies=65536

The results are approximately two times the performance of the previous generation Jetson, the TK1 @ 157 GFLOPS

Navigate to the smokeparticles directory and run make. Then type in:


Installing any GCC version

This guide shows you how to install GCC, specifically how to downgrade GCC versions. However, the methods shown here can also be used to upgrade GCC. You just have to resolve whatever library dependencies arise when upgrading. Typically, each successive GCC version solves its predecessor’s dependencies.

For all of these methods, you will need to unlink your current GCC version and link the version you want. Towards the end of this guide, I will show you how to link gcc globally.

The first method of installing any gcc version is using the package manager native to your OS flavor. Some of these package managers are apt-get, yum, opkg, and ipkg.

 apt-get install gcc-[version]

Depending on the package manager you are using, the package manager may or may not install missing dependencies. In my example, apt-get doesn’t install dependencies. One alternative that I use sometimes is aptitude.

 aptitude install gcc-[version]


Another way to install a GCC version, without a package manager, is to compile it from source. This method requires more work.
i. Make a new directory for the GCC files
mkdir gcc-4.8-files
cd gcc-4.8-files

ii. Download GCC-4.8 from a mirror

You can find all GCC versions here.

wget _________

iii. Untar the file
tar -xvf

iv. Install some dependencies
sudo apt-get install libgmp-dev libmpfr-dev libmpc-dev libc6-dev

Note: you may need more dependencies than these.

v. Configure the source files
./gcc-4.8.0/configure --prefix=/usr/bin/gcc/4.8.0

vi. Run the Makefile
make

vii. Install the compiled files
sudo make install

1. Now to symbolically link GCC-4.8. But first, let’s get rid of GCC-4.9 if that is still on your device.
apt-get remove gcc-4.9

2. Navigate to /usr/bin/ and GCC-4.8 should be there
cd /usr/bin/

3. Symbolically link GCC-4.8
ln -s gcc-4.8 gcc
ln -s gcc-4.8 cc

4. Check to verify GCC-4.8 is your current active compiler
gcc --version

How to Flash SD cards on Mac

This guide shows you how to flash image files onto SD cards, specifically .img files for Raspberry Pi, Android TV Shield, and any other device. Make sure your SD card is empty. If not, you can use the Mac’s Disk Utility application to erase the contents of your SD card.

1. Plug in your SD card and verify that your SD card is attached by typing in:

diskutil list

Keep track of which path is your SD card because if you enter the wrong path, you can do irreparable damage to your computer.

In my case, “/dev/disk2” is the path to my SD card.


2. Now type the following command to unmount the SD card:

diskutil unmountdisk


3. Navigate to the directory of the file you want to flash and type in:

sudo dd if= of= bs=8m

Note: bs is the block size that dd reads and writes at a time, and 8m means 8 MB. You can put (almost) any number here. I just used 8.

Also, your terminal will look like it’s not doing anything. This is normal, and you may have to wait a while before you get any output on your screen.
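
Since a wrong path can damage your computer, one option is to print the commands and review them before running anything. The sketch below is a dry run only: it checks its arguments and echoes the two commands from steps 2 and 3 without executing them. The function name flash_cmd is hypothetical, and /dev/disk2 is only the path from my case above; yours may differ.

```shell
# Hypothetical dry-run helper: print the unmount and dd commands for review.
# $1 = path to the .img file, $2 = path to the SD card.
flash_cmd() {
    img=$1
    disk=$2
    # The target must start with /dev/disk followed by a digit.
    case "$disk" in
        /dev/disk[0-9]*) ;;
        *)
            echo "refusing: $disk does not look like /dev/diskN" >&2
            return 1
            ;;
    esac
    # The image file must exist before anything is printed.
    if [ ! -f "$img" ]; then
        echo "refusing: image $img not found" >&2
        return 1
    fi
    echo "diskutil unmountdisk $disk"
    echo "sudo dd if=$img of=$disk bs=8m"
}

# Example call (the image name is a placeholder):
#   flash_cmd ./my-image.img /dev/disk2
```

Compare the printed disk path with the output of diskutil list before copying the commands into your terminal.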

Recover Nvidia Android TV Shield Manually


Files needed:

Recovery Images

This tutorial is for restoring your entire TV Shield in the event something unfortunate happens. In order for this to work, your TV Shield has to be able to boot up and display the Nvidia logo on screen. Otherwise, your TV Shield is bricked, and there is nothing you can do about it.


1. Unplug your power cable and connect a USB OTG cable to your computer.

Now, here is the tricky part that requires perfect timing.


2. Connect your power cable and almost immediately hold the power button for roughly 3 seconds.

** This may take multiple attempts to get the timing down. You may want to hold the power button for slightly longer than 3 seconds, ~3.1/3.2 seconds.

If successful, you should see the bootloader screen.

Tap the power button to navigate the menu and hold the power button to select.


3. Now go to the folder where you downloaded the files and type in the following commands:

./fastboot flash recovery recovery.img
./fastboot flash boot boot.img
./fastboot flash system system.img
./fastboot flash userdata userdata.img
./fastboot flash staging blob
./fastboot flash dtb

** Your device may or may not need to restart after each command, which means repeating the steps above. Also, you may not need to flash all of the above depending on what files are damaged.
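
If you only want to flash the images you actually have, a small dry run can list the matching commands. The sketch below assumes the file names used in the commands above (recovery.img, boot.img, system.img, userdata.img, blob) and prints a fastboot line for each file found in a directory, without running fastboot. The function name plan_flash is hypothetical, and the dtb step is left out because it depends on your files.

```shell
# Hypothetical dry-run helper: print a fastboot command for each image present.
# $1 = directory holding the recovery image files.
plan_flash() {
    dir=$1
    # Each entry is partition:file, matching the commands listed above.
    for pair in recovery:recovery.img boot:boot.img system:system.img \
                userdata:userdata.img staging:blob; do
        part=${pair%%:*}
        file=${pair#*:}
        if [ -f "$dir/$file" ]; then
            echo "./fastboot flash $part $file"
        fi
    done
}

# Example: list the commands for the images in the current directory.
plan_flash .
```

Run the printed commands one at a time, so you can redo the bootloader steps if the device restarts in between.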

How to Root Nvidia Android TV Shield

To start, you will need a few tools and files.

Make sure to back up your device first, to preserve your data in the event something goes wrong.


1. Once your TV shield is up and running, head over to the settings.


2. Go to About (First Row) and scroll down to “Build Number.”


3. Here, click “Build Number” 7 times. This will enable the developer options.


4. Now go back to the previous screen (settings). You should now see the developer options in the second row at the end.

If not, you may need to wait a little and reload the settings screen.


6. Go to Developer options to enable USB debugging and plug in your OTG cable to your device and computer.

There will be a prompt to connect to your computer. Accept it.


7. Navigate to your adb and fastboot folder and type in:

./adb devices

You should see your device.

8. Once that is working, type in:

./adb reboot bootloader

A word of caution – the next steps will erase any data you may have.


9. Check if your device is connected.

./fastboot devices


10. Type in:

./fastboot oem unlock

A new prompt will be displayed. Click continue by tapping the power button on your shield TV to navigate to “continue” and holding the power button to select.

** On the pro (500 GB) version, the unlock takes more than an hour. For the 16GB version, it takes a few minutes at most.

You may need to repeat steps 1-9 to re-enable the debugger.


11. Now navigate to the directory where you downloaded SuperSu and type in:

./adb push /sdcard

** Note: if an error comes up for /sdcard, try sdcard/


12. Now type in:

./fastboot boot < path to twrp image>

The command above boots the image temporarily and keeps the stock recovery image. Alternatively, you could flash it with:

./fastboot flash boot


13. Select “Install” and navigate to the file you pushed onto the SD card. Click it.


14. Once that is done, click reboot.

Congratulations, your device is now rooted.

If your device becomes unbootable, but is able to boot up to the Nvidia logo screen, please follow this guide.