Graph500 on CUDA

I recently went to ISC16 for their Student Cluster Competition, and one of the challenges was to create our “own implementation of Graph500 to run on a cluster.”

If you aren’t familiar with the Student Cluster Competition, it is a competition where student teams work with vendors to build a cluster and optimize high-performance scientific applications to run on real datasets under a 3000-watt power limit.

Graph500 is a ranking of supercomputer systems focused on data-intensive workloads. There are two main kernels.

The first kernel constructs an undirected graph. The second kernel performs breadth-first search of the graph. Both kernels are timed.

There are a bunch of other nitty-gritty requirements outlined in their full specifications page.

The Graph500 reference code and implementations only contain sequential, OpenMP, XMT, and MPI versions.

The original developers provide CPU implementations, but where’s the CUDA version? It seems no one wants to open source their optimized versions of Graph500!

 

Existing Open Source CUDA Graph500

While searching for Graph500 on CUDA, we found only one open source version, provided by the Suzumura Laboratory.

The Suzumura Laboratory has made a great contribution to the open source community on Graph500 with their papers, “Parallel Distributed Breadth First Search on GPU” and “Highly Scalable Graph Search for the Graph500 Benchmark,” written by Koji Ueno and Toyotaro Suzumura.

Their version was created in June 2012.

The first thing we wanted our Graph500 to be was open source.

The HPC Advisory Council states that “Other implementations of Graph500 exist and likely to improve performance, however not freely obtainable.”

graph500-implementations

 

Our version of Graph500 on CUDA with MPI

We made a much simpler implementation of Graph500 that you may want to check out to understand the Graph500 specifications more easily. We created ours in June 2016.

https://github.com/buhpc/isc16-graph500

We’ll update this post to explain how we created our version of Graph500 at a later date, but we hope that our source code will help you run Graph500 on your cluster with NVIDIA GPUs or create a version yourself!

 

Testing our version of Graph500

We will test our version of Graph500 on a single NVIDIA Jetson TX1. Below are the NVIDIA Jetson TX1 specifications:

nvidia-jetson-tx1-specifications

The prerequisite for running our version of Graph500 is having CUDA and MPI. Our NVIDIA Jetson TX1 already had CUDA and OpenMPI installed when we set up Ubuntu 14.04. To check if you have CUDA set up, run:

nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2015 NVIDIA Corporation
Built on Thu_May__5_22:52:38_CDT_2016
Cuda compilation tools, release 7.0, V7.0.74

To check if you have MPI setup, run:

mpirun --version
mpirun (Open MPI) 1.6.5

Report bugs to http://www.open-mpi.org/community/help/

If you don’t have CUDA or MPI, you will need to look up how to install them for your operating system. Now, we have to install git to download our source code.

sudo apt-get install git -y

After you install git, you can git clone our repository.

git clone https://github.com/buhpc/isc16-graph500
cd isc16-graph500/

Make any changes to the Makefile to fit your location of CUDA.

vim Makefile

I had to make one small change to the LDFLAGS value: lib should be lib64 for me.

Original:

CC=mpicxx
FLAGS=-std=c++11
INCLUDE= -Iinclude -I/usr/local/cuda/include
LDFLAGS=-L/usr/local/cuda/lib
LIB=-lcudart
EXE=main
NVCC=nvcc
[email protected]

New:

CC=mpicxx
FLAGS=-std=c++11
INCLUDE= -Iinclude -I/usr/local/cuda/include
LDFLAGS=-L/usr/local/cuda/lib64
LIB=-lcudart
EXE=main
NVCC=nvcc
[email protected]
After editing the Makefile, build the project with make:

make
CXX src/constructGraph.cpp
CXX src/graph.cpp
CXX src/edgeList.cpp
CXX src/init.cpp
CXX src/breadthFirstSearch.cpp
CXX src/main.cpp
CXX src/generateKey.cpp
CXX src/validation.cpp
NVCC src/buildAdjMatrix.cu
NVCC src/bfsStep.cu
CXX main

The Graph500 binary created is called main.

[USAGE] ./main <config.ini> <scale> <edgefactor>

N
the total number of vertices, N = 2^SCALE.

M
the number of edges, M = edgefactor * N.

We will use main with mpirun and keep track of runtime with the time command. You may supply a hostfile. Check run.sh for a sample command to run the program.

In this example, we’ll use a SCALE of 6 and edgefactor of 1 for a result of 64 vertices and 64 edges. But first, how many cores can we use?

grep -c ^processor /proc/cpuinfo
4

Now, we will run Graph500 on a single node with a very small graph.

time mpirun -n 4 ./main config.ini 6 1
Constructing graph...
Done.

Running 64 BFSs...
Got 24291.5 TEPS
Got 29580.9 TEPS
Got 60708.3 TEPS
Got 35433.1 TEPS
Got 62827.2 TEPS
Got 36697.2 TEPS
Got 54054.1 TEPS
Got 51428.6 TEPS
Got 53892.2 TEPS
Got 38461.5 TEPS
Got 53491.8 TEPS
Got 54298.6 TEPS
Got 42452.8 TEPS
Got 41142.9 TEPS
Got 54628.2 TEPS
Got 32727.3 TEPS
Got 54711.2 TEPS
Got 35928.1 TEPS
Got 62827.2 TEPS
Got 39955.6 TEPS
Got 63492.1 TEPS
Got 36108.3 TEPS
Got 58919.8 TEPS
Got 70588.2 TEPS
Got 42007 TEPS
Got 34515.8 TEPS
Got 54380.7 TEPS
Got 36659.9 TEPS
Got 58631.9 TEPS
Got 63492.1 TEPS
Got 53973 TEPS
Got 53491.8 TEPS
Got 69632.5 TEPS
Got 41237.1 TEPS
Got 60200.7 TEPS
Got 36363.6 TEPS
Got 62283.7 TEPS
Got 42402.8 TEPS
Got 48913 TEPS
Got 40000 TEPS
Got 69632.5 TEPS
Got 42755.3 TEPS
Got 61120.5 TEPS
Got 40678 TEPS
Got 48192.8 TEPS
Got 37228.5 TEPS
Got 55900.6 TEPS
Got 46272.5 TEPS
Got 3831.42 TEPS
Got 41618.5 TEPS
Got 60301.5 TEPS
Got 40540.5 TEPS
Got 70312.5 TEPS
Got 42553.2 TEPS
Got 48979.6 TEPS
Got 49792.5 TEPS
Got 53175.8 TEPS
Got 53254.4 TEPS
Got 63380.3 TEPS
Got 2178.65 TEPS
Got 59308.1 TEPS
Done.

real 0m0.752s
user 0m1.000s
sys 0m0.680s

We measure TEPS, traversed edges per second. Let m be the number of input edge tuples within the component traversed by the search, counting any multiple edges and self-loops. Let timeK2(n) be the measured execution time for kernel 2.

TEPS(n) = m / timeK2(n)

Let us know if you have any questions. Feel free to fork our repository and improve our Graph500 code! Our output may not be exactly the same as the official Graph500 implementation, but it does fit the specifications as far as we know.

How to Setup CUDA 7.0 on NVIDIA Jetson TX1 with JetPack – Detailed

The most recent version of NVIDIA JetPack is 2.2, which supports the NVIDIA Jetson TX1 and Jetson TK1. The big news is that JetPack 2.2 moves the userspace to 64-bit! In earlier versions of JetPack, the kernel was 64-bit, but the userspace was apparently still 32-bit, from what a source has told me.

Now with the userspace at 64 bit, you’ll have an easier time compiling and running arm64 libraries. Note that we’ll be flashing our NVIDIA Jetson TX1, so everything on it will be formatted. Remember to back up your files!

We made an earlier post last year on how to run CUDA 7.0 on NVIDIA Jetson TX1. In this post, we’ll outline very detailed instructions on setting up CUDA 7.0 for the NVIDIA Jetson TX1s from start to finish.

Requirements

  1. NVIDIA Jetson TX1, AC adapter, and WiFi antennas
  2. HDMI cable and monitor
  3. Computer with Ubuntu 14.04 or Laptop with VirtualBox
  4. Micro-B to USB Cable
  5. Keyboard

Step 1) We create an Ubuntu 14.04 x86 64-bit virtual machine with at least 15 GB of space to be safe.

I’m using VirtualBox to create the Ubuntu 14.04 x86 64-bit virtual machine. If you have an Ubuntu 14.04 x86 64-bit host operating system, you do not have to create the virtual machine. 15 GB of space will give you enough room for the Jetpack downloaded files. I set mine with 30 GB of space because I want to have other stuff on this VM for later.

ubuntu-virtual-machine-virtualbox

Step 2) On the virtual machine, download the latest Jetpack installer here. You will need to log in or create a new member account.

The latest version of the JetPack installer is available below. We are using JetPack Version 2.2.

https://developer.nvidia.com/embedded/jetpack

find-jetpack-download-file

 

log-in-or-create-nvidia-account

After logging in, hit the blue button and download JetPack.

hit-the-blue-button-to-download-jetpack

Step 3) You should have a file called JetPack-L4T-2.2-linux-x64.run. The name may be different, but we want to run it.

Open up a new terminal and go to the directory where JetPack was downloaded.

cd ~/Downloads

We want to change the permissions of JetPack, so that we can run it in the terminal.

chmod 755 JetPack-L4T-2.2-linux-x64.run

Now, we can run the program.

sudo ./JetPack-L4T-2.2-linux-x64.run

Step 4) Downloading JetPack packages.

After running the above terminal command, a JetPack window should pop up.

the-first-next

Hit Next a couple of times.

Select Jetson TX1 Development Kit (64-bit) and hit Next.

Select Custom because we don’t need half of the JetPack stuff.

jetpack-custom-installation

We will set most of these packages to no action by clicking underneath the Action column.

set-most-to-no-action

The packages that we want are: CUDA Toolkit for Ubuntu 14.04, Linux for Tegra (TX1 64-Bit), Flash OS, CUDA Toolkit for L4T, and Compile CUDA Samples.

jetpack-what-you-need-to-download

You just don’t need most of the other stuff if you only want CUDA on your NVIDIA Jetson TX1. Pick and choose any extra packages you want.

Step 5) Hit Next to initiate the download and wait.

Hit Next and Accept All Terms and Conditions.

accept-all-terms-and-conditions-jetpack

A dialog notes that, depending on the component selection, you should pay attention to the prompts in the embedded terminal. Hit OK.

Sit back and relax because these download files are fairly big, so we’ll have to wait a while.

sit-back-jetpack-will-take-a-while

 

jetpack-download-speeds

JetPack Host installation will complete, and you can click Next to proceed.

jetpack-installation-complete

The prompt will ask you about Network Layout. I chose Device accesses Internet via router/switch.

Please select the network interface on host that connects to the same router/switch as:

I put wlan0 because I will be using the antennas to access the Internet through Wi-Fi. Our host computer will be using the Internet to send files to our NVIDIA Jetson TX1. Hit Next.

Step 6) Post Installation. We will have to put our NVIDIA Jetson TX1 in Force USB Recovery Mode.

jetpack-post-installation-steps

After hitting Next on this prompt, you will be brought to the Flash 64 Bit OS to TX1 device step.

jetpack-putting-nvidia-tegra-in-recovery-mode

The black terminal window says that we have to put the Jetson into Force USB Recovery Mode.

  1. Power down the Jetson.
  2. Connect the Micro-B to USB cable from the Jetson to your computer.
  3. Press the POWER button and let go. The Jetson powers up like normal. Press and hold the FORCE RECOVERY button, and while holding the FORCE RECOVERY button, press the RESET button and let go of the RESET button. After two more seconds, let go of the FORCE RECOVERY button.

 

Make sure that your virtual machine detects the NVIDIA Corp USB device. Go to the Devices tab at the top of the virtual machine, go to USB, and select NVIDIA Corp. APX.

nvidia-corp-detected-on-vm

Back at the black terminal window, press Enter, and the OS flashing starts. Now, you just wait.

nvidia-jetson-jetpack-doing-its-business

Flashing completes, and you press Enter in the black terminal window.

post-installation-completed

Step 7) After flashing completes, connect an HDMI cable to your monitor. Your Jetson should have booted into Ubuntu 14.04. Connect to Wi-Fi on your Jetson.

Your virtual machine wants to run the CUDA installation on your Jetson, but it can’t find the Jetson’s IP address!

time-to-connect-the-tegra-to-wifi

Now, we connect the Jetson to Wi-Fi.

IMG_20160709_134508

The password for the ubuntu user is: ubuntu.

I’m only using the keyboard to maneuver. Press ALT + F1, press Enter, and search for “Network.” Use tabs to maneuver and open Wi-Fi. Connect to a WiFi network.

IMG_20160709_134847

Now, we open a terminal by pressing CTRL + ALT + T. With the terminal open, we type:

ifconfig

IMG_20160709_135658

We see that the given IP address for our Jetson is: 192.168.1.114

Now back on our virtual machine on this screen, we hit 2 and press Enter.

time-to-connect-the-tegra-to-wifi

A JetPack window will pop up, and we can fill in the Device IP Address, User Name, and Password:

enter-device-ip

User Name and Password are both ubuntu. Hit Next, and you will be brought to Post Installation for CUDA for the Jetson.

Step 8) Post installation for CUDA.

Hit Next on this screen.

cuda-post-installation

JetPack will be copying CUDA files onto the Jetson through the Internet. It will also run CUDA installation commands on your Jetson.

post-installation-for-cuda

CUDA takes a really long time to copy and install, so you’ll be waiting a long while. After CUDA finishes, a JetPack window will pop up and Installation will be Complete.

jetpack-finishes

Step 9) Making sure that CUDA is installed on the Jetson.

Back to the Jetson, open a new terminal with CTRL + ALT + T.

cd ~/cuda-l4t

You can use cuda-l4t.sh to install CUDA 7.0. In this folder, there is also the .deb file for CUDA 7.0.

sudo ./cuda-l4t.sh ./cuda-repo-l4t-7-0-local_7.0-76_arm64.deb 7.0 7-0

Hit Y and Enter on any prompt asking for permission. CUDA 7.0 should now be installed, but its binaries aren’t on your PATH yet. An entry has been automatically added to ~/.bashrc, but you still need to reload ~/.bashrc.

source ~/.bashrc

Now, check if CUDA 7.0 is installed.

nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2015 NVIDIA Corporation
Built on Thu_May__5_22:52:38_CDT_2016
Cuda compilation tools, release 7.0, V7.0.74

Step 10) Testing if CUDA 7.0 works on the Jetson.

JetPack has set up some CUDA samples that we can use to test.

cd ~/NVIDIA_CUDA-7.0_Samples/bin/aarch64/linux/release

We can run the Ocean Simulation sample. Cool!

./oceanFFT

IMG_20160709_143708

We can test the nbody sample to check our Jetson’s performance.

./nbody -benchmark -numbodies=65536

IMG_20160709_144011

I’m getting 264.744 single-precision GFLOP/s at 20 flops per interaction. In the past, we’ve gotten 318.763 single-precision GFLOP/s at 20 flops per interaction.

But, we certainly know that CUDA 7.0 is working on the NVIDIA Jetson TX1! Leave any questions below, and run more CUDA samples for fun.

IMG_20160709_144913
./smokeParticles

How to Install CUDA on NVIDIA Jetson TX1 [Deprecated]

Update: see the detailed 2016 post above. (The method shown in this guide is outdated.) This guide shows you how to install CUDA on the NVIDIA Jetson TX1. At the time of writing, NVIDIA’s JetPack installer did not work properly, and this blog post shows a work-around for getting CUDA to work on the TX1.

Download the following files inside a directory first. Here are the two links for the files that you will need to download beforehand:

Updating your apt-get sources

Navigate to the directory where you downloaded the files and type in:

sudo dpkg -i cuda-repo-l4t-r23.1-7-0-local_7.0-71_armhf.deb

Next, you want to update the sources by typing in:

sudo apt-get update

Now, your apt-get repositories will have all of the CUDA libraries and files you may need for any future modifications.

Installing CUDA dependencies

Next, go to the directory where you downloaded the Jetpack installer and make the file executable by typing in:

chmod +x JetPack-L4T-2.0-linux-x64.run

Now, run the file:

./JetPack-L4T-2.0-linux-x64.run

The .run file should have unpacked its contents into a new directory called “_installer”.

Go into the _installer directory and type in:

./cuda-l4t.sh ../cuda-repo-l4t-r23.1-7-0-local_7.0-71_armhf.deb 7.0 7-0

**Note that ../cuda-repo-l4t-r23.1-7-0-local_7.0-71_armhf.deb is the location of the .deb file you downloaded earlier.

Now, you have every CUDA dependency installed. However, there are a few more things you have to do.

 

NVCC as a global call

NVCC is not linked globally (nvcc -V gives an error), and you need to do a few more things to fix this. First, let’s edit the .bashrc file.

vim .bashrc

The installer appends the CUDA exports to the .bashrc file. Make sure that “:$PATH” comes after “export PATH=/usr/local/cuda-7.0/bin”, so the line reads:

export PATH=/usr/local/cuda-7.0/bin:$PATH

Now, execute the .bashrc file.

source .bashrc

Now to make sure that everything is working, type in:

nvcc -V

 

Running Some CUDA Samples

Now, let’s run some CUDA samples and scale the GPU to max frequency.

Scaling GPU Frequency

You can find out your GPU rates by typing in:

cat /sys/kernel/debug/clock/gbus/possible_rates

Now, let’s set the GPU frequency to its maximum possible rate for some performance purposes.

echo 998400000 > /sys/kernel/debug/clock/override.gbus/rate
echo 1 > /sys/kernel/debug/clock/override.gbus/state
cat /sys/kernel/debug/clock/gbus/rate

You can lower the GPU frequency with the same steps above.

Running the nbody simulation and smoke particles

Navigate to the simulations (5_Simulations) directory containing both the nbody simulation and smoke particles samples:

cd /usr/local/cuda-7.0/samples/5_Simulations/

Now, navigate to the nbody directory and run make. Then, type in:

./nbody -benchmark -numbodies=65536

The results are approximately two times the performance of the previous-generation Jetson TK1, which gets about 157 GFLOP/s.

Navigate to the smokeParticles directory and run make. Then type in:

./smokeParticles

Recover Nvidia Android TV Shield Manually

Prerequisites

Files needed:

Recovery Images

This tutorial is for restoring your entire TV Shield in the event something unfortunate happens. In order for this to work, your TV Shield has to be able to boot up and display the Nvidia logo on screen. Otherwise, your TV Shield is bricked, and there is nothing you can do about it.

 

1. Unplug your power cable and connect a USB OTG cable to your computer.

Now, here is the tricky part that requires perfect timing.

 

2. Connect your power cable and almost immediately hold the power button for roughly 3 seconds.

** This may take multiple attempts to get the timing down. You may want to hold the power button for slightly longer than 3 seconds, ~3.1/3.2 seconds.

If successful, you should see the bootloader screen.

Tap the power button to navigate the menu and hold the power button to select.

 

3. Now go to the folder where you downloaded the files and type in the following commands:

./fastboot flash recovery recovery.img
./fastboot flash boot boot.img
./fastboot flash system system.img
./fastboot flash userdata userdata.img
./fastboot flash staging blob
./fastboot flash dtb

** Your device may or may not need to restart after each command, which means repeating the steps above. Also, you may not need to flash all of the above depending on what files are damaged.

How to Root Nvidia Android TV Shield

To start, you will need a few tools and files.

Make sure to back up your device so you preserve your data in the event something goes wrong.

 

1. Once your TV shield is up and running, head over to the settings.

 

2. Go to About (First Row) and scroll down to “Build Number.”

 

3. Here, click “Build Number” 7 times. This will enable the developer options.

 

4. Now go back to the previous screen (settings). You should now see the developer options in the second row at the end.

If not, you may need to wait a little and reload the settings screen.

 

6. Go to Developer options to enable USB debugging and plug in your OTG cable to your device and computer.

There will be a prompt to connect to your computer. Accept it.

 

7. Navigate to your adb and fastboot folder and type in:

./adb devices

You should see your device.

8. Once that is working type in:

./adb reboot bootloader

A word of caution – the next steps will erase any data you may have.

 

9. Check if your device is connected.

./fastboot devices

 

10. Type in:

./fastboot oem unlock


A new prompt will be displayed. Select “continue” by tapping the power button on your Shield TV to navigate and holding the power button to confirm.

** On the pro (500 GB) version, the unlock takes more than an hour. For the 16GB version, it takes a few minutes at most.

You may need to repeat steps 1-9 to re-enable the debugger.

 

11. Now navigate to the directory where you downloaded SuperSu and type in:

./adb push supersu.zip /sdcard

Note ** if an error comes up for /sdcard, try sdcard/

 

12. Now type in:

./fastboot boot <path to twrp image>

Or, instead of keeping the stock recovery image, you could flash it with:

./fastboot flash boot

 

13. Select “Install” and navigate to the SuperSu.zip file you pushed onto the SD card. Click it.

 

14. Once that is done, click reboot.

Congratulations, your device is now rooted.

If your device becomes unbootable, but is able to boot up to the Nvidia logo screen, please follow this guide.