Description:

Issues:

Work partitioning:
  • Divide the work so every core has something to do, while also keeping the load balanced across cores
  • A for loop can be divided among threads if its iterations have no dependencies; a loop-carried dependence such as a[i][j] = a[i][j-1] prevents this
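  As a sketch of this distinction (the function names `scale` and `prefix` are illustrative, not from the notes):

        #include <stdio.h>

        /* Independent iterations: each b[i] depends only on a[i], so the
           loop can be divided among cores (e.g. with #pragma omp parallel for). */
        static void scale(const int *a, int *b, int n) {
            for (int i = 0; i < n; i++)
                b[i] = 2 * a[i];
        }

        /* Loop-carried dependence: a[j] needs a[j-1], written by the
           previous iteration, so the iterations cannot run in parallel as-is. */
        static void prefix(int *a, int n) {
            for (int j = 1; j < n; j++)
                a[j] = a[j] + a[j - 1];
        }

        int main(void) {
            int a[8] = {1, 1, 1, 1, 1, 1, 1, 1}, b[8];
            scale(a, b, 8);
            prefix(a, 8);
            printf("b[7]=%d a[7]=%d\n", b[7], a[7]);  /* b[7]=2 a[7]=8 */
            return 0;
        }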
Coordination and synchronization:
  • Cache Coherency
  • Synchronizing for parallel programs
    • Problem: Race Condition
    • Atomic read & write memory operations:
      • Between the read and the write, no other thread can write to that address
      • Hardware provides many atomic primitives, e.g. atomic instructions
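      One way to use such a primitive, as a sketch: C11's stdatomic.h exposes atomic_fetch_add, an indivisible read-modify-write (the function `atomic_count` is illustrative, not from the notes):

        #include <stdatomic.h>
        #include <stdio.h>

        /* Increment a shared counter n times; atomic_fetch_add guarantees
           no other write to counter lands between its read and its write,
           so no update is ever lost, however the loop is parallelized. */
        static int atomic_count(int n) {
            atomic_int counter = 0;
            #pragma omp parallel for
            for (int i = 0; i < n; i++)
                atomic_fetch_add(&counter, 1);
            return atomic_load(&counter);
        }

        int main(void) {
            printf("%d\n", atomic_count(100000));  /* always 100000 */
            return 0;
        }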
    • Critical Section:
      • Typically the section of a function or loop that uses the shared variable is the critical section
      • Only one thread can execute the critical section at a time; the other threads must wait
      • ex:

        double area, pi, x;
        int i, n;

        area = 0.0;
        #pragma omp parallel for private(x)
        for (i = 0; i < n; i++) {
            x = (i + 0.5) / n;
            #pragma omp critical
            area += 4.0 / (1.0 + x * x);
        }
        pi = area / n;
        • the time spent in the critical section can be reduced by moving as much of the calculation as possible outside it:
        double area, pi, x, tmp;
        int i, n;

        area = 0.0;
        #pragma omp parallel for private(x, tmp)
        for (i = 0; i < n; i++) {
            x = (i + 0.5) / n;
            tmp = 4.0 / (1.0 + x * x);  /* computed outside the critical section */
            #pragma omp critical
            area += tmp;
        }
        pi = area / n;
      • Mutual exclusion lock (mutex): a lock acquired before entering the critical section and released on leaving it; only the thread holding the lock may enter
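      A minimal sketch of a mutex protecting a critical section, using POSIX pthreads (the two-thread counter is a hypothetical example, not from the notes):

        #include <pthread.h>
        #include <stdio.h>

        static long counter = 0;
        static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

        /* Each thread increments the shared counter; the mutex makes the
           read-modify-write of counter a critical section. */
        static void *worker(void *arg) {
            for (int i = 0; i < 100000; i++) {
                pthread_mutex_lock(&lock);    /* only one thread past this point */
                counter++;                    /* critical section */
                pthread_mutex_unlock(&lock);  /* let the next waiting thread in */
            }
            return NULL;
        }

        int main(void) {
            pthread_t t1, t2;
            pthread_create(&t1, NULL, worker, NULL);
            pthread_create(&t2, NULL, worker, NULL);
            pthread_join(t1, NULL);
            pthread_join(t2, NULL);
            printf("%ld\n", counter);  /* 200000: no lost updates */
            return 0;
        }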
    • How to write parallel programs
      • Threads and processes
      • Critical sections, race conditions, and mutexes
Communication overhead:

Parallel Programming