Reinforcement Learning for the Traveling Salesman Problem: Performance
Comparison of Three Algorithms
Abstract
TSP is one of the most famous problems in graph theory, as well as one
of the typical NP-hard problems in combinatorial optimization. Its
applications range from how to plan the most reasonable and efficient
road traffic to how to better set up nodes in the Internet environment
to facilitate information flow, among others. Reinforcement learning has
been widely regarded as an effective tool for solving combinatorial
optimization problems. This paper attempts to solve the TSP problem
using different reinforcement learning algorithms and evaluated the
performance of three RL algorithms (Q-learning, Sarsa, and Double
Q-Learning) under different reward functions, ε-greedy decay strategies,
and running times. The results show that the Double Q-Learning algorithm
is the best algorithm, as it could produce results closest to the
optimal solutions, and by analyzing the results, better reward
strategies and epsilon-greedy decay strategies are obtained.