Parallel Computing Model

flop: floating-point operation; Flops/s = flops per second

1 KFlops/s = 10^3 Flops/s
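The prefixes continue in steps of 10^3 (K, M, G, T, P, E). A small helper shows a raw rate with the largest fitting prefix. `format_flops` is a hypothetical name used only for illustration.

```python
# Hypothetical helper: express a flop rate with SI prefixes (powers of 10^3).
PREFIXES = [(18, "E"), (15, "P"), (12, "T"), (9, "G"), (6, "M"), (3, "K")]

def format_flops(rate: float) -> str:
    """Return `rate` (Flops/s) using the largest prefix that keeps the value >= 1."""
    for exponent, prefix in PREFIXES:
        scale = 10 ** exponent
        if rate >= scale:
            return f"{rate / scale:.2f} {prefix}Flops/s"
    return f"{rate:.2f} Flops/s"

print(format_flops(1e3))    # 1.00 KFlops/s
print(format_flops(2.5e9))  # 2.50 GFlops/s
```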

1. Hardware

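SIMD (Single Instruction, Multiple Data): one controller issues each instruction to multiple processing units, each working on its own data
- vector extensions of Intel CPUs
- Single Instruction, Multiple Threads (SIMT) on NVIDIA GPUs: 1 warp = 32 threads executing the same instruction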
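Python cannot drive vector units or GPU warps, so the sketch below only models the SIMD semantics: one instruction is issued once and every lane applies it to its own operands. `simd_execute` is a hypothetical name. `WARP_SIZE = 32` matches the NVIDIA warp width; the rest is an assumption for illustration.

```python
import operator
from typing import Callable, List, Sequence

WARP_SIZE = 32  # NVIDIA: threads per warp

def simd_execute(op: Callable[[float, float], float],
                 a: Sequence[float], b: Sequence[float]) -> List[float]:
    """Conceptual model of one SIMD/SIMT instruction: a single controller
    issues `op` once, and every lane applies it to its own operand pair."""
    if len(a) != len(b):
        raise ValueError("every lane needs an operand pair")
    return [op(x, y) for x, y in zip(a, b)]

# One "warp-wide" add: lane i computes a[i] + b[i].
lanes = list(range(WARP_SIZE))
result = simd_execute(operator.add, lanes, lanes)
print(result[:4])  # [0, 2, 4, 6]
```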

Shared mem.: same address space; every processor can access all memory directly

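Uniform Memory Access (UMA)
- every processor sees the same access latency to all memory
- limited scalability
Non-Uniform Memory Access (NUMA)
- remote mem. access costs more than local mem. access
- e.g. multi-way (multi-socket) CPU servers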
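Why the local/remote split matters can be shown with a simple weighted-average cost model. The function name and the 100 ns / 200 ns latencies below are hypothetical values chosen only for illustration, not measurements of any real machine.

```python
def avg_access_ns(local_fraction: float, local_ns: float, remote_ns: float) -> float:
    """Expected latency of one memory access on a NUMA machine, assuming a
    fraction `local_fraction` of accesses hit the processor's own node."""
    if not 0.0 <= local_fraction <= 1.0:
        raise ValueError("local_fraction must be in [0, 1]")
    return local_fraction * local_ns + (1.0 - local_fraction) * remote_ns

# Hypothetical latencies: 100 ns local, 200 ns remote.
print(avg_access_ns(1.0, 100, 200))  # 100.0  (all accesses local)
print(avg_access_ns(0.5, 100, 200))  # 150.0
```

Under UMA, `local_ns == remote_ns`, so data placement does not change the average. Under NUMA, keeping data on the node that uses it lowers the cost.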

Distributed mem.: independent address spaces; a processor cannot directly access another node's memory, so data is exchanged by message passing
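This sketch imitates the message-passing discipline. Each simulated node touches only its own data, and partial results travel through explicit channels. The names are hypothetical. Threads inside one Python process do in fact share memory, so this only mimics the programming style. Real distributed-memory programs typically use a message-passing library such as MPI.

```python
import queue
import threading

def distributed_sum(data0, data1):
    """Two simulated nodes, each holding only its own data, combine their
    partial sums by sending a message instead of reading each other's memory."""
    to_node1 = queue.Queue()  # channel: node 0 -> node 1
    to_host = queue.Queue()   # channel: node 1 -> caller

    def node0():
        to_node1.put(sum(data0))          # "send" the local partial sum

    def node1():
        remote = to_node1.get()           # blocking "receive"
        to_host.put(sum(data1) + remote)  # combine with the local partial sum

    workers = [threading.Thread(target=node0), threading.Thread(target=node1)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return to_host.get()

print(distributed_sum([1, 2, 3], [4, 5]))  # 15
```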


SMP (Symmetric Multiprocessor): shared mem.
CMP (Chip Multiprocessor): applies SMP techniques within a single chip (multiple cores on one processor); e.g. Intel Core i7 (multi-core)

Threads:
- shared data, indep. instruction streams
- unit of scheduling

Processes:
- indep. data (separate address spaces)
- unit of resource allocation
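Threads of one process share data, so several threads can update the same variable. A lock is then needed so that no update is lost. The function name and default counts are hypothetical. Separate processes would each get their own copy of `counter` (independent data), so their increments would not be visible to each other.

```python
import threading

def shared_counter(n_threads: int = 4, increments: int = 1000) -> int:
    """All threads update one counter in the process's shared address space;
    the lock serializes each read-modify-write so no increment is lost."""
    counter = 0
    lock = threading.Lock()

    def worker():
        nonlocal counter
        for _ in range(increments):
            with lock:
                counter += 1

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter

print(shared_counter())  # 4000
```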

CPU <-> GPU

SIMD: Intel vector extensions (e.g. SSE, AVX)

MapReduce: map (emit key-value pairs), shuffle (group the pairs by key), reduce (combine the values of each key)
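The three phases can be sketched with the classic word-count example. This is a sequential, single-machine illustration with hypothetical function names. In a real MapReduce framework the map and reduce tasks run in parallel on many workers, and the shuffle moves data between them over the network.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

def map_phase(doc: str) -> List[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word, 1) for word in doc.lower().split()]

def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce: combine the value list of each key into one result."""
    return {key: sum(values) for key, values in groups.items()}

docs = ["the cat sat", "the dog sat"]
# Each document could be mapped by a different worker.
mapped = [pair for doc in docs for pair in map_phase(doc)]
counts = reduce_phase(shuffle_phase(mapped))
print(counts)  # {'the': 2, 'cat': 1, 'sat': 2, 'dog': 1}
```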

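Strong scaling: total problem size fixed while the number of processors grows; ideally run time falls in proportion to the processor count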
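Strong scaling is usually analyzed with Amdahl's law. With a fixed problem, suppose a fraction `p` of the serial run time parallelizes perfectly over `n` processors. The speedup is then S(n) = 1 / ((1 - p) + p / n). The function name is hypothetical.

```python
def amdahl_speedup(p: float, n: int) -> float:
    """Strong scaling (Amdahl's law): fixed problem size, a fraction `p`
    of the serial run time is perfectly parallelizable over `n` processors."""
    if not 0.0 <= p <= 1.0 or n < 1:
        raise ValueError("need 0 <= p <= 1 and n >= 1")
    return 1.0 / ((1.0 - p) + p / n)

for n in (1, 2, 4, 1024):
    print(n, round(amdahl_speedup(0.9, n), 2))
```

With p = 0.9, the serial 10% caps the speedup below 1 / (1 - p) = 10, no matter how many processors are added.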

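Weak scaling: problem size per processor fixed, so the total problem grows with the processor count; ideally run time stays constant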
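Weak scaling is commonly modeled with Gustafson's law. Here `p` is the fraction of the run time on the `n`-processor system spent in parallel work. The scaled speedup is S(n) = (1 - p) + p * n. The function name is hypothetical.

```python
def gustafson_speedup(p: float, n: int) -> float:
    """Weak scaling (Gustafson's law): the problem grows with `n`; `p` is the
    fraction of run time on the n-processor system spent in parallel work."""
    if not 0.0 <= p <= 1.0 or n < 1:
        raise ValueError("need 0 <= p <= 1 and n >= 1")
    return (1.0 - p) + p * n

for n in (1, 2, 4, 1024):
    print(n, round(gustafson_speedup(0.9, n), 2))
```

The serial part does not grow with `n`, so the scaled speedup keeps increasing linearly with `n` instead of saturating.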

Last update: March 7, 2022

Authors: Co1lin