Parallel Computing Model
flop: floating point operation
1 KFLOPS = 10^3 FLOPS, where FLOPS = flop/s (floating-point operations per second)
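A minimal sketch of the unit arithmetic, assuming a hypothetical measurement. The helper names `flops_rate` and `format_rate`, the operation count, and the timing are all illustrative and not taken from these notes.

```python
# Prefixes from largest to smallest, each a factor of 10^3 apart.
PREFIXES = [(1e12, "TFLOPS"), (1e9, "GFLOPS"), (1e6, "MFLOPS"), (1e3, "KFLOPS")]


def flops_rate(flop_count: float, seconds: float) -> float:
    """Achieved rate in flop/s: operations performed divided by elapsed time."""
    return flop_count / seconds


def format_rate(rate: float) -> str:
    """Render a rate in flop/s with the largest prefix that fits."""
    for scale, name in PREFIXES:
        if rate >= scale:
            return f"{rate / scale:.2f} {name}"
    return f"{rate:.2f} FLOPS"


# Hypothetical numbers: about 2 * n^3 flops for an n x n dense matrix
# multiply with n = 1000, assumed to finish in 0.5 s.
rate = flops_rate(2 * 1000**3, 0.5)
print(format_rate(rate))
```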
1. Hardware
1.1 Flynn's taxonomy
- SIMD (Single Instruction, Multiple Data): one controller, multiple processing units working on multiple data
  - vector extensions of Intel CPUs
  - SIMT (Single Instruction, Multiple Threads) on NVIDIA GPUs: 1 warp contains 32 threads
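The warp granularity can be illustrated with a little arithmetic. This is a sketch only: it assumes a warp size of 32 threads, and the helper names `warps_needed` and `idle_lanes` are hypothetical.

```python
WARP_SIZE = 32  # threads that execute one instruction together


def warps_needed(n_threads: int, warp_size: int = WARP_SIZE) -> int:
    """Number of warps required to hold n_threads (rounded up)."""
    return (n_threads + warp_size - 1) // warp_size


def idle_lanes(n_threads: int, warp_size: int = WARP_SIZE) -> int:
    """Lanes left without work in the last, partially filled warp."""
    return warps_needed(n_threads, warp_size) * warp_size - n_threads


for n in (32, 100, 256):
    print(n, warps_needed(n), idle_lanes(n))
```

A thread count that is not a multiple of 32 leaves part of the last warp unused, which is one reason block sizes are usually chosen as multiples of the warp size.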
1.2 Mem: shared or distributed
- Shared mem.: same address space; direct access
  - Uniform Memory Access (UMA)
    - limited scalability
  - Non-Uniform Memory Access (NUMA)
    - remote mem. access cost is higher
    - e.g. multi-socket CPU server
- Distributed mem.: independent address spaces; a processor cannot directly access another processor's memory
1.3 Processor's perspective
- SMP (Symmetric Multiprocessor): identical processors sharing one memory
- CMP (Chip Multiprocessor): applies SMP techniques within a single chip (multi-core); e.g. Intel Core i7
2. Programming Models
Threads:
- shared data, indep. instructions
- unit for scheduling
Processes:
- indep. data
- unit for allocating resources
2.1 Shared mem.
- shared var.
- threads
- models
  - Pthreads (POSIX threads)
  - OpenMP
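The shared-variable model can be sketched with Python's standard `threading` module, used here as a stand-in for Pthreads or OpenMP. The names `counter` and `add_range` are illustrative. In CPython the global interpreter lock prevents threads from running bytecode in parallel, so this shows the programming model rather than a speedup.

```python
import threading

counter = 0              # shared variable: every thread sees the same object
lock = threading.Lock()  # protects updates to the shared variable


def add_range(start: int, stop: int) -> None:
    """Sum a slice of the work privately, then add it to the shared total."""
    global counter
    local = sum(range(start, stop))  # thread-private partial result
    with lock:                       # only one thread updates at a time
        counter += local


# Four threads, each handling 250 of the 1000 integers.
threads = [
    threading.Thread(target=add_range, args=(i * 250, (i + 1) * 250))
    for i in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 499500, the same as sum(range(1000))
```

Computing a private partial sum first and taking the lock once per thread keeps contention on the shared variable low.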
2.2 Message Passing
- distributed mem.
- multiple processes
- model: Message Passing Interface (MPI)
CPU <-> GPU
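A rough sketch of the message-passing style follows. It is an assumption-laden stand-in, not MPI: the workers are threads inside one process, and `queue.Queue` mailboxes mimic explicit send and receive. In a real distributed-memory program each rank would be a separate process with its own address space. All names (`mailboxes`, `to_root`, `worker`) are hypothetical.

```python
import queue
import threading

N_WORKERS = 3
mailboxes = [queue.Queue() for _ in range(N_WORKERS)]  # one inbox per worker
to_root = queue.Queue()                                # inbox of the root


def worker(rank: int) -> None:
    """Receive a chunk, compute on it, and send the result back."""
    chunk = mailboxes[rank].get()    # blocking receive
    to_root.put((rank, sum(chunk)))  # send partial result to the root


workers = [threading.Thread(target=worker, args=(r,)) for r in range(N_WORKERS)]
for w in workers:
    w.start()

# Root: scatter the data, then gather one message per worker.
data = list(range(12))
for r in range(N_WORKERS):
    mailboxes[r].put(data[r::N_WORKERS])
partials = dict(to_root.get() for _ in range(N_WORKERS))
for w in workers:
    w.join()

total = sum(partials.values())
print(partials, total)
```

The workers never touch `data` directly; everything they use arrives in a message, which is the property that lets this style map onto independent address spaces.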
2.3 Data Parallel
- compatible hardware:
  - SIMD
  - Distributed mem.
- SIMD: Intel vector extensions
- MapReduce: map, shuffle, reduce
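The three MapReduce phases can be sketched as a sequential word count. This is a single-process illustration under assumed inputs; the function names and the sample documents are hypothetical, and a real framework would run the map and reduce phases on many machines.

```python
from collections import defaultdict


def map_phase(document: str) -> list:
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs: list) -> dict:
    """Shuffle: group all values that share the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(groups: dict) -> dict:
    """Reduce: combine the values of each key into one result."""
    return {key: sum(values) for key, values in groups.items()}


documents = ["map shuffle reduce", "map and reduce"]
pairs = [pair for doc in documents for pair in map_phase(doc)]
counts = reduce_phase(shuffle_phase(pairs))
print(counts)
```

Each call to `map_phase` depends on only one document and each key is reduced independently, so both phases are data parallel.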
3. Performance Index
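- Strong scaling: the total problem size is fixed while the number of processors grows
- Weak scaling: the problem size per processor is fixed, so the total problem size grows with the number of processors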
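Both kinds of scaling are commonly summarized by an efficiency figure. The sketch below uses the usual definitions: strong scaling keeps the total problem size fixed, and weak scaling keeps the work per processor fixed. The timings are hypothetical, not measurements, and the function names are illustrative.

```python
def speedup(t1: float, tp: float) -> float:
    """Speedup on p processors: serial time divided by parallel time."""
    return t1 / tp


def strong_efficiency(t1: float, tp: float, p: int) -> float:
    """Strong scaling: same total problem; the ideal is tp = t1 / p."""
    return t1 / (p * tp)


def weak_efficiency(t1: float, tp: float) -> float:
    """Weak scaling: problem grows with p; the ideal is tp = t1."""
    return t1 / tp


# Hypothetical timings in seconds.
t1 = 100.0        # one processor, base problem
tp_strong = 30.0  # four processors, same problem
tp_weak = 125.0   # four processors, problem four times larger

print(f"speedup           = {speedup(t1, tp_strong):.2f}")
print(f"strong efficiency = {strong_efficiency(t1, tp_strong, 4):.2f}")
print(f"weak efficiency   = {weak_efficiency(t1, tp_weak):.2f}")
```

An efficiency of 1.0 is ideal in both cases; values below 1.0 reflect overheads such as communication, synchronization, and serial sections.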
Last update: March 7, 2022
Authors: