Introduction
Parallel programming is a style of computing in which multiple computations run simultaneously. This approach can significantly speed up programs by leveraging multi-core processors.
To understand parallel programming, it's fundamental to know what threads and processes are.
What is a Thread?
A thread is the smallest sequence of programmed instructions that can be executed directly by the processor. To improve the performance of a process, you can split its work across multiple threads running in parallel, which reduces the total execution time.
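As a minimal sketch (plain C++ with the standard std::thread API; the function name is just for illustration), starting a second thread looks like this:

#include <iostream>
#include <thread>

void say_hello() {
    std::cout << "Hello from a worker thread!\n";
}

int main() {
    std::thread worker(say_hello); // the new thread starts running immediately
    worker.join();                 // the main thread waits for the worker to finish
}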
What is a Process?
Let's explain this topic with a simple example. When you launch a Java application on your computer, the OS starts a new process, which runs in parallel with other programs. Inside each process, it's possible to use multiple threads, taking advantage of the available CPU cores.
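To see how many hardware threads a process can actually use, the C++ standard library offers std::thread::hardware_concurrency(); a quick sketch:

#include <iostream>
#include <thread>

int main() {
    // Number of concurrent threads the hardware supports (may return 0 if unknown)
    unsigned int n = std::thread::hardware_concurrency();
    std::cout << "Hardware threads available: " << n << "\n";
}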
Why should I care about Parallel Programming?
In today’s world, the demand for faster and more efficient applications is increasing. Whether you’re working on data processing, game development, or web applications, the ability to run tasks concurrently can be a game-changer.
Case Study
To compare the performance of a parallel and a sequential run, let's analyze the following code:
// Note: Re and Im are assumed to be containers from the surrounding program
// (e.g. std::vector<fftw_complex> of size nx*ny*nz) holding the input and output data.
#include <fftw3.h>
#include <omp.h>
#include <cstring>
#include <ctime>
#include <iostream>
using namespace std;

static const int nx = 128;
static const int ny = 128;
static const int nz = 128;

double start_time, run_time;
int nThreads = omp_get_max_threads(); // 2 on the test machine below; set to 1 for a purely serial plan

// Allocate the input and output arrays with FFTW's aligned allocator
fftw_complex *input_array = (fftw_complex*) fftw_malloc((nx*ny*nz) * sizeof(fftw_complex));
memcpy(input_array, Re.data(), (nx*ny*nz) * sizeof(fftw_complex));

fftw_complex *output_array = (fftw_complex*) fftw_malloc((nx*ny*nz) * sizeof(fftw_complex));

start_time = omp_get_wtime();  // wall-clock timer
clock_t start_time1 = clock(); // CPU-time timer (sums the time spent by all threads)

fftw_init_threads();               // must be called once before creating multi-threaded plans
fftw_plan_with_nthreads(nThreads); // plans created from here on use nThreads threads

fftw_plan forward = fftw_plan_dft_3d(nx, ny, nz, input_array, output_array, FFTW_FORWARD, FFTW_ESTIMATE);
fftw_execute(forward);
fftw_destroy_plan(forward);

run_time = omp_get_wtime() - start_time;
clock_t end1 = clock();

cout << "Parallel Time in s: " << run_time << "s\n";
cout << "Serial Time in s: " << (double)(end1 - start_time1) / CLOCKS_PER_SEC << "s\n";

memcpy(Im.data(), output_array, (nx*ny*nz) * sizeof(fftw_complex));
fftw_free(input_array);
fftw_free(output_array);
fftw_cleanup_threads(); // use instead of fftw_cleanup() when the threaded interface is enabled
The code above performs a single 3D Fast Fourier Transform (FFT) using FFTW's multi-threaded interface and measures it with two timers: omp_get_wtime() returns the elapsed wall-clock time (the parallel time), while clock() returns the CPU time accumulated across all threads, which serves as a rough proxy for what a serial run would cost. Running the code on a 2-thread CPU produces the following results:
Parallel Time in s: 0.0132717s
Serial Time in s: 0.025434s
Insights on the results above
Comparing the Parallel Time (0.0132717 s) with the Serial Time (0.025434 s) gives a speedup of 0.025434 / 0.0132717 ≈ 1.92×: the parallel run took only about 52% of the serial time, a reduction of roughly 48%.
This time difference demonstrates the potential benefit of parallel programming in this specific code snippet. The parallel execution ran almost twice as fast as the sequential version, close to the ideal 2× speedup for two threads; the gap is the overhead of thread management.
In complex applications, even a moderate speedup like this can translate into significant advantages:
- Reduced processing time
- Increased throughput
- Improved resource utilization
However, there are some caveats to consider:
- Benchmarking: This small code snippet might not represent the entire application's behavior. It's essential to benchmark the complete application with realistic workloads to assess the overall impact of parallelization.
- Overhead: Parallel programming introduces overhead due to task creation, synchronization, and communication between threads. In some cases, the overhead might outweigh the benefits, especially for small tasks (see the sketch after this list).
- Problem suitability: Not all problems benefit equally from parallelization. The code needs to be designed to leverage multiple threads effectively for significant speedup.
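To make the overhead point concrete, here is a minimal sketch (plain C++ with std::thread; the measured numbers will vary by machine) that times nothing but the creation and joining of a single thread:

#include <chrono>
#include <iostream>
#include <thread>

int main() {
    auto t0 = std::chrono::steady_clock::now();
    std::thread t([] { /* an empty task: all we measure is the thread overhead */ });
    t.join();
    auto t1 = std::chrono::steady_clock::now();

    std::cout << "Spawning and joining one thread took "
              << std::chrono::duration<double, std::micro>(t1 - t0).count()
              << " microseconds\n";
}

If the task itself runs for less time than this, parallelizing it makes the program slower, not faster.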
Challenges in Parallel Programming
Race Conditions
Imagine two runners competing in a relay race, but without a clear baton passing order. In parallel programming, a race condition occurs when multiple threads access and modify the same shared data without proper synchronization. This can lead to unpredictable and often incorrect program behavior.
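As a minimal sketch (plain C++ with std::thread; the counter and loop bounds are made up for illustration), two threads incrementing a shared counter without synchronization show the problem:

#include <iostream>
#include <thread>

int main() {
    int counter = 0; // shared data, no synchronization

    auto work = [&counter] {
        for (int i = 0; i < 100000; ++i)
            ++counter; // read-modify-write: two threads can interleave here
    };

    std::thread t1(work);
    std::thread t2(work);
    t1.join();
    t2.join();

    // Expected 200000, but the printed value is unpredictable
    std::cout << "counter = " << counter << "\n";
}

Declaring the counter as std::atomic<int>, or protecting it with a std::mutex, restores a predictable result.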
Deadlocks
Deadlocks happen when multiple threads are waiting for resources held by each other, creating a frustrating stalemate. Imagine two cars meeting at a narrow crossing, each needing to move forward but blocked by the other.
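Here is a minimal sketch of that stalemate in code (plain C++ with std::mutex; the sleeps only make the unlucky interleaving near-certain, and the function names are just for illustration):

#include <chrono>
#include <mutex>
#include <thread>

std::mutex a, b;

void first() {
    std::lock_guard<std::mutex> lock_a(a); // locks a first
    std::this_thread::sleep_for(std::chrono::milliseconds(10)); // let the other thread lock b
    std::lock_guard<std::mutex> lock_b(b); // then waits forever for b
}

void second() {
    std::lock_guard<std::mutex> lock_b(b); // locks b first
    std::this_thread::sleep_for(std::chrono::milliseconds(10));
    std::lock_guard<std::mutex> lock_a(a); // then waits forever for a -> deadlock
}

int main() {
    std::thread t1(first), t2(second);
    t1.join(); // never returns once each thread holds one lock
    t2.join();
}

Locking both mutexes in the same order in every thread, or acquiring them together with std::scoped_lock(a, b) (C++17), avoids the deadlock.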
Debugging Complexity
The very essence of parallelism - multiple threads executing instructions simultaneously - makes debugging intricate. Traditional debugging techniques, which often rely on a step-by-step approach, become cumbersome when dealing with unpredictable thread interactions.
Parallel programming also gives us Heisenbugs: elusive bugs that only appear when the program runs in parallel and vanish in a single-threaded environment. Identifying and reproducing such bugs can be a significant challenge.
Well, that's enough for the first article. I hope you learned something new after finishing this read :D.
If you have any questions, please feel free to ask me.