Comparison of parallel programming in Go and CUDA

Seminar paper for KP

This is a seminar paper for KP (Konzepte der Programmiersprachen, "Concepts of Programming Languages") at the University of Applied Sciences Rosenheim. In it I compare parallel programming in Go with parallel programming in CUDA.

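This repository consists of three folders: cuda (the CUDA implementation), go (the Go implementation) and doc (the sources for the accompanying document).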

Both programs compute a matrix multiplication using the standard parallelism paradigm of the respective language. You can run them yourself by following the steps below.


CUDA

CUDA runs only on Nvidia graphics cards, so you need one of those and, in addition to its driver, the CUDA libraries and tools. For installation instructions, see the official CUDA docs (or just install them via your package manager).

After you have installed all required tools, start by compiling the program:

cd cuda
nvcc --std=c++11 -o mmul # I used some stuff from the C++ 11 standard library

Once the program is compiled, you can start it:

# start with a lower value than the one shown here; I used a 1080 Ti for my test

./mmul -m 16384 -c -p

The following command-line switches are available:

switch            meaning
-c                selects the kernel with shared cache (can be combined with -p)
-h                displays usage information and exits
-m <matrix_size>  takes an integer for the matrix size
-p                selects the kernel with prefetching (can be combined with -c)
-q                prints only the time the execution of the kernel on the GPU took in milliseconds

The program then performs the computation and prints the time the kernel needed for it:

size of one matrix: 268435456 elements, 1048576 KiB
compute kernel execution time: 6369.57ms
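The first output line can be checked by hand: with -m 16384 one matrix has 16384 × 16384 elements. The short Go sketch below redoes that arithmetic. The helper name matrixSize and the element size of 4 bytes (a 32-bit float) are assumptions made for illustration; 4 bytes is simply the value consistent with the output shown above, not something confirmed from the CUDA source.

```go
package main

import "fmt"

// matrixSize returns the number of elements and the size in KiB of a
// square matrix with the given side length.
// bytesPerElement is an assumption; 4 corresponds to a 32-bit float.
func matrixSize(side, bytesPerElement uint64) (elements, kib uint64) {
	elements = side * side
	kib = elements * bytesPerElement / 1024
	return elements, kib
}

func main() {
	elements, kib := matrixSize(16384, 4)
	// prints "size of one matrix: 268435456 elements, 1048576 KiB"
	fmt.Printf("size of one matrix: %d elements, %d KiB\n", elements, kib)
}
```

At this size a single matrix already occupies 1 GiB, which is why a smaller value for -m is a sensible starting point on cards with less memory.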


Go

If you don't have Go installed yet, start by installing it via your package manager or, if you are on Windows, install it manually by following the official docs.

After that you can just execute the program with go run:

cd go
go run mmul.go

Because of restrictions in Go, you have to set the matrix side length in the code. Go compiles quickly, though, so this shouldn't be a problem.

The code also contains a naive sequential implementation that is only there for completeness and is never called. If you want to compare the parallel execution time with this naive implementation, comment out the call to the goroutines function and uncomment the call to naive.
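To make the switch between the two variants concrete, here is a minimal sketch of how such a program could be laid out. It is not the code from mmul.go: only the function names goroutines and naive are taken from the text above, while the constant n, the helpers newMatrix and mulRow, the float32 element type and the one-goroutine-per-row scheme are assumptions chosen for illustration.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// n is the matrix side length (hypothetical name); it is set in the code.
const n = 256

// newMatrix allocates an n x n matrix with every element set to value.
func newMatrix(value float32) [][]float32 {
	m := make([][]float32, n)
	for i := range m {
		m[i] = make([]float32, n)
		for j := range m[i] {
			m[i][j] = value
		}
	}
	return m
}

// mulRow computes row i of c = a * b.
func mulRow(a, b, c [][]float32, i int) {
	for j := 0; j < n; j++ {
		var sum float32
		for k := 0; k < n; k++ {
			sum += a[i][k] * b[k][j]
		}
		c[i][j] = sum
	}
}

// naive computes the product sequentially, one row after another.
func naive(a, b, c [][]float32) {
	for i := 0; i < n; i++ {
		mulRow(a, b, c, i)
	}
}

// goroutines starts one goroutine per result row and waits for all of them.
// Each goroutine writes only to its own row, so no locking is needed.
func goroutines(a, b, c [][]float32) {
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(row int) {
			defer wg.Done()
			mulRow(a, b, c, row)
		}(i)
	}
	wg.Wait()
}

func main() {
	a, b, c := newMatrix(1), newMatrix(2), newMatrix(0)

	start := time.Now()
	goroutines(a, b, c)
	// naive(a, b, c) // swap the comment with the line above to time the sequential version
	fmt.Printf("execution time: %v\n", time.Since(start))
}
```

Splitting the work by rows keeps the goroutines independent of each other; whether the real program partitions the work the same way is not stated here.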

Build PDF

cd doc
pandoc --toc -s -o doc.pdf