Parallel Computing Model

flop: floating-point operation

1 Kflop/s = 10^3 flop/s
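The unit conversion can be sketched in a few lines. This is a minimal illustration only: the helper name `format_flop_rate` and the prefix list are assumptions made for this example, with each prefix step taken as a factor of 10^3.

```python
# Decimal prefixes for flop rates; each step is a factor of 10^3.
# (Hypothetical helper for illustration, not part of any library.)
PREFIXES = ["", "K", "M", "G", "T", "P", "E"]


def format_flop_rate(flops_per_second):
    """Render a rate in flop/s using the largest prefix that keeps the value >= 1."""
    value = float(flops_per_second)
    index = 0
    while value >= 1000.0 and index < len(PREFIXES) - 1:
        value /= 1000.0
        index += 1
    return f"{value:.2f} {PREFIXES[index]}flop/s"


print(format_flop_rate(1e3))     # 10^3 flop/s
print(format_flop_rate(2.5e12))  # a rate in the Tflop/s range
```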

1. Hardware

1.1 Flynn's taxonomy

  • SIMD (Single Instruction, Multiple Data): one control unit drives multiple processing units, each working on different data
    • vector extensions of Intel CPUs (e.g. SSE, AVX)
    • SIMT (Single Instruction, Multiple Threads) on NVIDIA GPUs: one warp contains 32 threads
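The lockstep idea behind SIMD can be imitated in plain Python. The sketch below is only a software model, not real vector code: `simd_add`, `warp_and_lane`, and the lane count of 4 are names and values assumed for illustration. Each loop iteration stands in for one vector instruction applied to several data elements at once, and the second helper shows how a flat thread index maps onto warps of 32 threads.

```python
WARP_SIZE = 32  # an NVIDIA warp groups 32 threads


def simd_add(a, b, lanes=4):
    """Add two equal-length lists chunk by chunk, `lanes` elements per step.

    Each loop iteration models ONE vector instruction that all lanes
    execute in lockstep on different data elements.
    """
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    out = []
    steps = 0
    for start in range(0, len(a), lanes):
        chunk_a = a[start:start + lanes]
        chunk_b = b[start:start + lanes]
        # One "instruction": the same add applied to every lane of the chunk.
        out.extend(x + y for x, y in zip(chunk_a, chunk_b))
        steps += 1
    return out, steps


def warp_and_lane(thread_id):
    """Map a flat thread index to (warp index, lane within the warp)."""
    return divmod(thread_id, WARP_SIZE)


result, steps = simd_add(list(range(8)), [10] * 8, lanes=4)
print(result, steps)      # 8 elements processed in 2 vector steps
print(warp_and_lane(70))  # thread 70 sits in warp 2, lane 6
```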

1.2 Mem: shared or distributed

Shared mem. : same address space; direct access

  • Uniform Mem. Access (UMA)

    • limited scalability
  • Non-Uniform Mem. Access (NUMA)

    • remote mem. access costs more than local mem. access
    • e.g. multi-socket CPU server

Distributed mem. : independent address spaces; a processor cannot directly access another node's mem.

1.3 Processor's perspective

  • SMP: Symmetric Multiprocessor: shared mem.
  • CMP: Chip Multiprocessor: applies SMP techniques within a single chip; e.g. Intel Core i7 (multi-core)

2. Programming Models

Threads:

  • shared data, indep. instructions
  • unit for scheduling

Processes:

  • indep. data
  • unit for allocating resources

2.1 Shared mem.

  • shared var.
  • threads
  • models
    • pthread
    • OpenMP
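A minimal sketch of the shared-variable model, using Python's `threading` module as a stand-in for pthread or OpenMP. The function name `count_with_threads` and its parameters are assumptions for illustration. All threads update the same variable, so the update is guarded by a lock to avoid lost increments.

```python
import threading


def count_with_threads(num_threads=4, increments=10_000):
    """Let several threads increment one shared counter under a lock."""
    counter = 0              # shared variable, visible to every thread
    lock = threading.Lock()  # serializes the read-modify-write below

    def worker():
        nonlocal counter
        for _ in range(increments):
            with lock:
                counter += 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter


print(count_with_threads())  # num_threads * increments
```

Without the lock, the increment is a read-modify-write that two threads can interleave, which is the typical hazard of shared-memory programming.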

2.2 Message Passing

  • distributed mem.
  • multi-processing
  • Message Passing Interface
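The send/receive style can be sketched without an MPI installation. The example below is only an emulation: two Python threads play the role of two ranks, each keeps its data private, and the only communication is through explicit queues acting as mailboxes. The names `rank_main` and `run_exchange` are hypothetical, and real MPI programs would run separate processes with calls such as `MPI_Send` and `MPI_Recv` instead.

```python
import queue
import threading


def rank_main(rank, inbox, outbox, results):
    """Body of one emulated rank; it only talks to its partner via queues."""
    local_data = [rank * 10 + i for i in range(3)]  # private to this rank
    if rank == 0:
        outbox.put(sum(local_data))  # "send" the partial sum to rank 1
        results[rank] = inbox.get()  # "receive" the global sum back
    else:
        partner_sum = inbox.get()    # "receive" rank 0's partial sum
        total = partner_sum + sum(local_data)
        outbox.put(total)            # "send" the global sum to rank 0
        results[rank] = total


def run_exchange():
    """Run two emulated ranks and return the value each one ends up with."""
    to_rank0, to_rank1 = queue.Queue(), queue.Queue()
    results = {}  # used only to collect each rank's final value for printing
    ranks = [
        threading.Thread(target=rank_main, args=(0, to_rank0, to_rank1, results)),
        threading.Thread(target=rank_main, args=(1, to_rank1, to_rank0, results)),
    ]
    for t in ranks:
        t.start()
    for t in ranks:
        t.join()
    return results


print(run_exchange())  # both ranks hold the same global sum
```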

CPU <-> GPU

2.3 Data Parallel

  • compatible hardware:
    • SIMD
    • Distributed mem.

SIMD: Intel vector extensions

MapReduce: map, shuffle, reduce
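The three MapReduce phases can be sketched as a single-machine word count. This is a toy illustration, assuming in-memory lists rather than a distributed runtime, and the function names are chosen for this example only.

```python
from collections import defaultdict


def map_phase(documents):
    """Map: emit a (word, 1) pair for every word in every document."""
    pairs = []
    for doc in documents:
        for word in doc.lower().split():
            pairs.append((word, 1))
    return pairs


def shuffle_phase(pairs):
    """Shuffle: group all values that share the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(groups):
    """Reduce: combine each key's values into a single result."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(documents):
    return reduce_phase(shuffle_phase(map_phase(documents)))


print(word_count(["a b a", "b c"]))
```

In a real deployment the map and reduce calls for different keys run on different nodes, and the shuffle is the step that moves data between them.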

3. Performance Index

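Strong scaling: total problem size is fixed while the number of processors grows

Weak scaling: problem size per processor is fixed, so the total problem size grows with the number of processors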
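Both notions are usually reported as efficiencies computed from measured run times. The sketch below uses the common textbook definitions, which are an assumption here since the notes do not spell them out: for strong scaling (fixed total problem size), speedup is T1 / Tp and efficiency is speedup / p; for weak scaling (fixed problem size per processor), efficiency is T1 / Tp. The timings in the example calls are made-up values.

```python
def strong_scaling(t1, tp, p):
    """Return (speedup, efficiency) for a fixed total problem size.

    t1: run time on 1 processor, tp: run time on p processors.
    """
    speedup = t1 / tp
    return speedup, speedup / p


def weak_scaling_efficiency(t1, tp):
    """Efficiency when the problem size per processor is held fixed.

    t1: run time for one unit of work on 1 processor,
    tp: run time for p units of work on p processors (ideal: tp == t1).
    """
    return t1 / tp


# Hypothetical timings in seconds, for illustration only.
print(strong_scaling(100.0, 25.0, 8))       # 4x speedup on 8 processors
print(weak_scaling_efficiency(10.0, 12.5))  # run time grew from 10 s to 12.5 s
```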


Last update: March 7, 2022
Authors: Co1lin