Optimizing the Traveling Salesman Problem with Parallel Computing
Written on
Chapter 1: Introduction to High-Performance Computing
This technical document delves into the application of high-performance computing (HPC) within a cluster environment, utilizing the Message Passing Interface (MPI) to parallelize solutions for the Traveling Salesman Problem (TSP). The report includes a literature review, methodology, results, and conclusion.
The findings underscore MPI's effectiveness in addressing intricate optimization issues while emphasizing the advantages of HPC clusters in delivering timely and cost-efficient solutions to complex problems.
Literature Review
High-Performance Computing (HPC) is a specialized domain within computer science focused on developing and deploying powerful computer systems, such as clusters and supercomputers. These systems are engineered to tackle complex computational tasks, performing billions of calculations per second and proving beneficial across various applications, including scientific research, engineering, and data analytics.
To harness parallel computing capabilities, HPC integrates advanced hardware and software. Typically, these systems consist of interconnected high-performance computers that function as a unified entity. The hardware is meticulously designed for optimal performance, featuring numerous interconnected processors, fast interconnects, and substantial memory and storage. Additionally, tailored software, like MPI, orchestrates the collaboration among processors to collaboratively address complex computational challenges.
Section 1.1: The Traveling Salesman Problem Explained
The TSP, also known as the Traveling Salesman Problem, is a prominent optimization challenge in computer science. It involves identifying the shortest route that visits a specified set of cities exactly once. Classed as NP-hard, this problem is characterized by the impracticality of finding exact solutions for larger instances.
NP-hard problems cannot be efficiently resolved in polynomial time by any known algorithm, leading to exponentially increasing running times as the problem size grows. Consequently, approximate solutions or heuristics are often employed for resolution.
The TSP is significant in numerous domains, including operations research, logistics, and computer science, with real-world applications in vehicle routing, scheduling, and circuit design. Various algorithms have emerged to tackle the TSP, including exact algorithms, approximation algorithms, and heuristics.
Subsection 1.1.1: Branch and Bound Technique
The Branch and Bound (B&B) strategy is an efficient method for solving the Traveling Salesman Problem. It systematically navigates through potential solutions, effectively reducing the number of branches that need exploration.
This technique decomposes the problem into smaller sub-problems, employing recursion to search for solutions. During the recursive search, branches that exceed the best solution identified so far are eliminated as they are deemed unproductive. This process continues until all branches have been analyzed, resulting in the optimal solution. By trimming unproductive branches, B&B minimizes the number of explored paths.
Section 1.2: Objectives of the Study
- Task 1: Develop a serial algorithm for the Branch-and-Bound method.
- Task 2: Create a parallel algorithm for the Traveling Salesman Problem utilizing MPI.
- Task 3: Test the parallel program with 17 cities using various communication methods and optimizations.
- Task 4: Evaluate the performance of both serial and parallel implementations and analyze the results.
Chapter 2: Methodology and Infrastructure
The first video, "The Traveling Salesman Problem: When Good Enough Beats Perfect," discusses the intricacies of approximating solutions for the TSP and the compromises that can be made in pursuit of efficiency.
The HPC infrastructure employed for this research is Crescent, a facility at Cranfield University housing 1488 cores distributed across various node types, including 2U chassis and individual 1U and 2U nodes. These nodes are powered by Intel Sandy Bridge and Haswell CPUs, offering between 16 to 32 cores per node and shared memory capacities ranging from 64GB to 256GB. This setup is conducive for running MPI-based applications, providing abundant computational resources for our project.
The second video, "An Exact Parallel Algorithm for Traveling Salesman Problem," elucidates an efficient parallel algorithm for solving the TSP, showcasing how MPI can optimize performance.
In this initiative, we will utilize MPI to parallelize our code, maximizing the computational resources available within the HPC framework. MPI serves as a library that facilitates parallel programming in distributed systems by enabling communication between multiple processors, coordinating task execution.
Another model for parallel programming is OpenMP, primarily designed for shared memory systems where multiple threads can access the same memory. MPI is more appropriate for distributed systems, where each processor has its own memory, necessitating inter-processor communication for coordinated computation.
While parallelizing without shared memory (as in MPI) may introduce additional overhead from inter-process communication, it remains the ideal choice for large-scale systems where the number of processors exceeds the shared memory capacity.
Section 2.1: Distribution Management
Throughout this project, we will carefully manage task distribution to avoid weak distribution, which occurs when workload distribution does not yield a decrease in computation time. This often happens when communication time outstrips actual computation time. Our goal will be to minimize communication time while ensuring efficient task distribution.
Section 2.2: Algorithm Evaluation
For this research, we will assess two algorithms: the Brute Force algorithm and the Branch and Bound algorithm. The Brute Force method examines every path in the solution space, while the Branch and Bound approach selectively explores paths and prunes the search tree as efficiently as possible.
Both algorithms will employ a Depth First Search (DFS) strategy to navigate the tree of potential routes. The DFS will operate recursively, delving deeply into the tree from top to bottom and left to right.
Section 2.3: Parallel Implementation
To facilitate effective parallelization of the serial code, pre-defined paths will be generated and assigned to each process. These pre-paths will guide processes through distinct branches of the tree, enabling simultaneous computations and efficient workload distribution.
After generating the pre-paths, they will be shuffled to balance the workload across processes, preventing any single process from tackling a series of challenging paths that might cause bottlenecks.
The processes will then iterate through their assigned paths, applying the same Branch and Bound method as in the serial version, starting from their designated pre-paths. Following the exploration of their pre-paths, processes will engage in inter-process communication to share their calculated minimum cost values. Using MPI's "ReduceAll" function, processes will collectively determine the global minimum cost to enhance tree pruning in subsequent iterations.
Results will be monitored to optimize the balance between excessive and insufficient communication.
Section 2.4: Communication Strategies
The communication methods employed in this project will primarily utilize MPI's ReduceAll technique to retain only the minimum distance computed by each process. Additionally, non-blocking communication will be explored to leverage the HPC cluster's buffer for increased efficiency.
The software will also track idle time for each process to assess the impact of different parameters on implementation efficiency. Monitoring idle time will provide insights into optimizing communication strategies.
Chapter 3: Results and Discussion
This section will focus on tests and results pertaining to the case of 17 cities. To maintain consistency in our comparisons, we will initiate our search from city 1, although the final software will choose the starting city randomly. A graphical representation of the cities will be provided, illustrating the optimal path discovered.
The best route identified among the 17 cities is as follows: 1, 10, 4, 14, 15, 9, 6, 11, 13, 5, 8, 3, 16, 2, 17, 7, and 12, with a total cost of 278. This path aligns well with expectations, visually confirming its status as the shortest possible route.
Figures demonstrating the impact of pre-path shuffling on processor idle time and total computation time will be analyzed, highlighting the importance of load balancing in enhancing computational efficiency.
Finally, we will compare the performance of our Branch and Bound implementation against the Brute Force method, emphasizing the efficiency of our chosen algorithms and the significance of algorithm selection in solving computational problems effectively.
Chapter 4: Conclusion and Future Work
In summary, this project has deepened our understanding of parallel programming and the utilization of the Crescent HPC cluster. We navigated the challenges of developing an efficient parallel program, identifying the best methods for task distribution and communication management.
The primary bottleneck in parallel programming often lies in synchronization and communication, necessitating careful monitoring to mitigate overhead. Our results underscore the importance of selecting efficient algorithms to streamline computations before embarking on parallelization efforts.
Future improvements could involve the adoption of more efficient data structures to reduce memory allocation overhead and enhance overall program performance. Exploring the optimal configurations for task distribution and communication will further optimize the efficacy of our parallel implementation.
References
[1] Peter Pacheco (1997), Parallel Programming with MPI, Morgan Kaufmann.
[2] Peter Pacheco (2011), An Introduction to Parallel Programming, Elsevier.
[3] Dr. Irene Moulitsas (2022), High-Performance Technical Computing, Cranfield University.
[4] Amey Gohil, Manan Tayal, Tezan Sahu, and Vyankatesh Sawalpurkar (2021), Traveling Salesman Problem: Parallel Implementations & Analysis.
[5] Johnson, D. B., & McGeoch, L. A. (1997). The traveling salesman problem: A case study in local optimization.