The Travelling Salesman Problem

Abhijith Rao
7 min readJun 29, 2021

--

A brief introduction to the travelling salesman problem and a look at how it fits into the bigger picture of the theory of complexity classes

“It’s a Sunday, for crying out loud, I ain’t going nowhere”, screamed the salesman.

Who wouldn’t? Ask yourself this. If you were given the work to travel to every city of a said state, on a Sunday, to deliver something, wouldn’t you go mad?

“Either way”, said the salesman’s friend, “I could design you an algorithm that would find the order in which you would need to visit the cities so that you would have to travel the least distance.”

“You can do that?” asked the salesman.

“Can I?”, asked the friend, in a confident yet challenging tone, “Course I can, it’s simple.”

“Oh wait! I get it”, exclaimed the salesman, “All you have to do is write a code, that would find the closest city from my current location, and if it’s a city that I have not yet travelled to, I would go there, if not the code would look for the city that’s next closest to my current location and so on, until I have visited every city!”

At this the friend, burst out laughing.

“Oh dear”, the friend said, “you got an algorithm alright, but sadly it doesn’t guarantee the optimal result!”

“Optimal result?”, the salesman asked.

“Yes now look!”

“Say you start at location A, want to travel to location D, and for simplicity, let’s also assume for now, that you needn’t travel to all the cities. If we use your algorithm, we would rightfully travel through B, since it’s the path that’s shorter and then to D. We reach our final destination, and in the most efficient manner”, the friend explained.

“Okay, but doesn’t that make me right?”, the salesman asked.

“For this case? Yes. However, look again!”

“This time, although travelling to B instead of C is the shorter option, the total time you took to reach D, would be shorter if you had chosen C in the beginning instead of B”, explained the friend.

“Oh man! This is giving me a headache. Could you find the shortest path for me, so I can just get going?”

“Yeah sure! Just tell me how many cities you’re going to have to travel to, and I’ll start designing the algorithm!”

“20”

The friend, who was sipping hot coffee, spit the entire thing onto his computer, dumbfounded. And that my dear readers, is how the travelling salesman problem was discovered.

It is definitely inevitable, that at this point you would ask why the friend reacted the way he did? Why was he so surprised to hear “20”, from the salesman.

The answer to this question, is the same question that has haunted every computer scientist, ever since it was mathematically formulated, by William Rowan Hamilton, in the 1800s.

Sir William Rowan Hamilton

Before, we begin describing the details, as to why this problem is a nightmare to many in this field, let’s first understand the notion of computational complexity.

Simply put, it’s the number of steps a particular algorithm under consideration would take to solve the problem, given the size of the input to be n.

n here could mean a variety of things. It could be the size of input, the number of elements in the input (if it were something like an array), or in the case of the travelling salesman problem, the number of cities.

How the algorithm scales with an increase in the value of n, is what determines the efficiency of the algorithm, and the algorithm described by the friend, in which we find every possible combinations of the cities, scales horribly.

To see why, let’s first see how many possible ways there are to travel through all the cities in the case where there are say 2 cities.

In such a simple case, we could either start at one location and go to the other. Since there are two cities we could start at, there are two possible ways of traversing through the set of cities.

What about 3 cities? Say A, B and C. Again a little experimentation with combination shows us that there are 6 ways.

But what about 20 cities? Or even in the general case, n cities?

The answer is n!, where n! = n * (n-1) * (n-2) * …. * 3 * 2 * 1.

This simple equation shows us that there are approximately, 2.433 * 10¹⁸, arrangement of cities, when the total number of cities in the problem are 20. That’s a HUGE number of combinations of just 20 cities.

What if there were say a 100 cities? The number of combinations would be more than the number of atoms in the observable universe. HOW CRAZY IS THAT?

Even if the computer can find one combination of cities every nanosecond, the time required to find all the combinations would be 31.7 years. And since the salesman has to get the work done by the end of the weekend, the algorithm cannot provide any help.

The worst of this problem is yet to come. There is NO other way to solve it. At least none that we know of. There’s obviously a couple of other ways to solve it, like by recursion or by using dynamic programming. However, even then, the complexities are of exponential order and hence offer no real advantage.

Computer scientists have tried and failed numerous times in order to find a viable solution to the problem and yet the question remains open to this day.

But not all that came from this problem has been bad news. You see the travelling salesman problem has come to define a very important class of problems in the theory of complexity classes, the NP-Hard problems.

Before diving into the same any further, let’s take a look at what it means for a problem to be NP-Hard, or NP, or NP-Complete and suchlike.

According to the theory of complexity classes, P is the set of all problems, that we can solve in polynomial time, that is the algorithm scales with the size of the input as n², or n³ and so on.

It is to be noted that n is raised to some constant power in these cases. Either way, P is the simplest class of problems there are. Examples include, going through a random arrangement of cities from the Travelling salesman problem once. This would only take say 20 steps, if there are 20 cities.

The next class of problems include the NP problems. These are problems whose solutions you can verify in polynomial time.

Clearly, all problems in P are NP, as you can, if a solution is given to you, verify whether it actually is the solution to the problem, by simply finding the actual answer in polynomial time, and compare it with the one provided, hence verifying it.

However, the question begs, are there problems in NP which are not in P? This is, to date, the most important question in theoretical computer science. Why do we not know the answer to that? It’s simple, take a look at the following scenario.

Take the case of integer factorization. Clearly if a solution is given to you, you can easily verify it by dividing the number by the solution given, hence the problem is NP, however finding all possible factors isn’t similarly trivial. The problem lies in the fact that we do not know whether there exists a polynomial time solution to these problems, and if there does exist, then we would have P = NP, and if someone proves otherwise we would have P not equal to NP. Sadly, no one has done so.

Continuing, we next have the NP-Hard and NP-Complete problems.

NP-hard problems are informally defined as those that can’t be solved in polynomial time. In other words, the problems that are harder than P. The travelling salesman problem falls in this category. And finally, the NP-complete problems are the problems that are both NP-hard, and in NP.

Contrary to popular belief, the Travelling Salesman Problem (at least in the form presented here) is not an NP-Complete problem, simply because it’s in NP-Hard but not in NP.

Not in NP because, even if a combination of cities is given to us, we would have to look at all other possible combination of cities in order to verify whether it is the solution.

However, there’s a simple variation of TSP called “decision TSP” that turns it into an NP-Complete problem.

Imagine that instead of finding the shortest loop going through all cities, your goal is to determine if there exists any loop whose total length is less than some fixed number. For example, the question might be: is there a loop that goes through all of these cities, whose total distance is less than 100 km? In the negative case this is just as hard as regular TSP, because you’d end up testing all possible paths. But there’s an important difference: the solution can be verified in linear time by adding up all of the distances making up the path, and that’s what makes this variant part of NP.

With all the progress we have made so far in the field of computer science, we are still baffled by the complexity that these questions still provide us, and owing to that, we may yet build something beautiful.

Thanks for reading.

--

--

Abhijith Rao
Abhijith Rao

Written by Abhijith Rao

Undergraduate student at the National Institute of Technology, Calicut.

No responses yet