
CUDA

Compute Unified Device Architecture (CUDA) is a parallel computing platform and application programming interface (API) created by NVIDIA.

1. Graphics Processing Unit

  • SIMD
  • low clock frequency (1.4 GHz), but many cores (8k+)
  • high compute throughput: 29.7 TFLOPS
  • high DRAM bandwidth: 760 GB/s (total)
  • mem. hierarchy (see the sketch after this list):
    • registers
    • shared mem.: PBSM (per-block shared mem.)
    • global mem.
    • constant mem.
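
A minimal sketch of where these levels appear in a kernel (the kernel name `demo`, the `scale` constant, and the 256-thread tile are illustrative assumptions): local scalars live in registers, `__shared__` arrays in per-block shared memory, pointers from `cudaMalloc` in global memory, and `__constant__` variables in constant memory.

```cuda
// Illustrative only; assumes a launch with 256 threads per block and
// an input of at least gridDim.x * 256 elements.
__constant__ float scale;                           // constant mem. (read-only, cached)

__global__ void demo(const float *in, float *out) { // in/out point to global mem.
    __shared__ float tile[256];                     // PBSM: visible to one thread block
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // i is kept in a register
    tile[threadIdx.x] = in[i];                      // global -> shared
    __syncthreads();                                // wait for the whole block
    out[i] = tile[threadIdx.x] * scale;             // shared * constant -> global
}
```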

GPU

  • Streaming Multiprocessor
    • Warp: a group of 32 threads that execute in lock-step on the cores (see the sketch below)
      • Core
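
A small sketch of what lock-step execution buys within a warp (the helper name `warp_sum` is an assumption for illustration): warp shuffle intrinsics such as `__shfl_down_sync` let the 32 lanes of a warp exchange register values directly, without going through shared memory.

```cuda
// Illustrative warp-level sum: all 32 lanes run the loop in lock-step,
// and lane 0 ends up holding the sum of the whole warp.
__device__ float warp_sum(float val) {
    for (int offset = warpSize / 2; offset > 0; offset /= 2)
        val += __shfl_down_sync(0xffffffffu, val, offset);
    return val;
}
```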

Host: CPU + CPU Mem.

Device: GPU + GPU Mem.
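
A minimal host-side sketch of this split (error checking omitted, buffer size arbitrary): the CPU owns host memory, the GPU owns device memory, and data moves between them explicitly through the CUDA runtime API.

```cuda
#include <cuda_runtime.h>
#include <cstdlib>

int main() {
    const size_t bytes = (1 << 20) * sizeof(float);
    float *h_buf = (float *)malloc(bytes);       // host (CPU) memory
    float *d_buf = nullptr;
    cudaMalloc(&d_buf, bytes);                   // device (GPU) memory
    cudaMemcpy(d_buf, h_buf, bytes, cudaMemcpyHostToDevice);   // host -> device
    // ... launch kernels that operate on d_buf ...
    cudaMemcpy(h_buf, d_buf, bytes, cudaMemcpyDeviceToHost);   // device -> host
    cudaFree(d_buf);
    free(h_buf);
    return 0;
}
```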

Kernel function

kernel = grid -> thread blocks -> threads

  • thread blocks should be independent; they execute in parallel
  • thread blocks are mapped to SMs

Grid: how is it mapped onto the hardware?

  • the number of threads in each thread block is limited (1024), so use multiple blocks

  • limited total number of threads

  • __device__ functions can only be called from code running on the GPU

  • __global__ functions are called from the host and run on the GPU (see the launch sketch below)
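
A sketch tying these pieces together (the names `add_one`, `add_one_kernel`, and the 256-thread block size are illustrative assumptions): a `__global__` kernel is launched from the host as a grid of blocks of at most 1024 threads, each thread rebuilds its global index from `blockIdx`/`blockDim`/`threadIdx`, and the `__device__` helper can only be called from device code.

```cuda
__device__ float add_one(float x) { return x + 1.0f; }   // GPU-only helper

__global__ void add_one_kernel(float *data, int n) {     // launched from the host
    int i = blockIdx.x * blockDim.x + threadIdx.x;        // global thread index
    if (i < n)                                            // the grid may overshoot n
        data[i] = add_one(data[i]);
}

// Host side: ceil(n / 256) blocks of 256 threads each (256 <= 1024 per-block limit).
// add_one_kernel<<<(n + 255) / 256, 256>>>(d_data, n);
```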

Compilation Process (by nvcc)

  1. separate host code from device code
  2. generic: generate PTX (code for a virtual architecture)
  3. specialized: translate PTX into binary code for a specific real architecture
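
A rough illustration of the host/device split during compilation (a sketch, not the full nvcc pipeline): `__CUDA_ARCH__` is only defined in the device-side passes, and its value reflects the virtual architecture the PTX is generated for (e.g. 700 for compute_70).

```cuda
#include <cstdio>

__global__ void which_arch() {
#ifdef __CUDA_ARCH__              // defined only while compiling device code
    if (threadIdx.x == 0 && blockIdx.x == 0)
        printf("device code built for virtual arch %d\n", __CUDA_ARCH__);
#endif
}
```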

coalesced mem. access: threads of a warp should access consecutive addresses so their loads/stores can be combined into a few wide memory transactions
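
A small sketch contrasting the two access patterns (kernel names are illustrative): consecutive threads reading consecutive elements coalesce into a few wide transactions, while a large stride scatters a warp's accesses across many transactions.

```cuda
__global__ void copy_coalesced(const float *in, float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i];                  // thread i -> element i: coalesced
}

__global__ void copy_strided(const float *in, float *out, int n, int stride) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    int j = i * stride;
    if (j >= 0 && j < n) out[j] = in[j];        // warp touches scattered addresses
}
```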


Last update: April 19, 2022
Authors: Co1lin