Scott Barnes

We present a decentralized advantage actor-critic algorithm in which learning agents act in parallel environments and are updated by synchronous gradient descent. This parallelism decorrelates the agents' experiences, stabilizing observations and eliminating the need for a replay buffer. The algorithm requires no knowledge of the other agents' internal states during training or execution, and it runs on a single multi-core CPU.
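To make the idea concrete, the following is a minimal sketch of a synchronous, decentralized advantage actor-critic update, not the implementation described in this work. Everything in it is an assumption made for illustration: the two-agent task `ToyEnv`, the linear softmax actor and linear critic in `Agent`, the one-step advantage estimate, and all hyperparameters. Each agent updates only from its own observations, actions, and rewards, and each update averages gradients over a batch drawn from several independent environment copies rather than from a replay buffer.

```python
import numpy as np

class ToyEnv:
    """Hypothetical two-agent task: the shared reward is 1 only when every
    agent picks the action equal to the current state (0 or 1)."""
    def __init__(self, rng):
        self.rng = rng
        self.state = int(rng.integers(2))

    def observe(self):
        return np.eye(2)[self.state]          # one-hot observation

    def step(self, actions):
        reward = float(all(a == self.state for a in actions))
        self.state = int(self.rng.integers(2))  # next state is random
        return reward

class Agent:
    """Independent learner: linear softmax actor and linear critic.
    It never reads another agent's parameters or actions."""
    def __init__(self, n_obs=2, n_actions=2):
        self.theta = np.zeros((n_obs, n_actions))  # actor weights
        self.w = np.zeros(n_obs)                   # critic weights

    def policy(self, obs):
        logits = obs @ self.theta
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        p = np.exp(logits)
        return p / p.sum(axis=1, keepdims=True)

    def update(self, obs, actions, rewards, next_obs, gamma, lr):
        probs = self.policy(obs)
        # One-step advantage: r + gamma * V(s') - V(s)
        adv = rewards + gamma * (next_obs @ self.w) - obs @ self.w
        onehot = np.eye(probs.shape[1])[actions]
        # Gradients are averaged over all parallel environments, then
        # applied in a single synchronous step.
        grad_theta = obs.T @ ((onehot - probs) * adv[:, None]) / len(adv)
        grad_w = obs.T @ adv / len(adv)
        self.theta += lr * grad_theta  # ascent on the policy objective
        self.w += lr * grad_w          # semi-gradient TD(0) critic step

def train(n_envs=8, n_agents=2, steps=2000, gamma=0.9, lr=0.1, seed=0):
    rng = np.random.default_rng(seed)
    envs = [ToyEnv(rng) for _ in range(n_envs)]
    agents = [Agent() for _ in range(n_agents)]
    for _ in range(steps):
        obs = np.stack([e.observe() for e in envs])
        # Each agent samples one action per environment from its own policy.
        acts = [np.array([rng.choice(2, p=p) for p in ag.policy(obs)])
                for ag in agents]
        rewards = np.array([env.step([a[i] for a in acts])
                            for i, env in enumerate(envs)])
        next_obs = np.stack([e.observe() for e in envs])
        for ag, a in zip(agents, acts):
            ag.update(obs, a, rewards, next_obs, gamma, lr)
    return agents

if __name__ == "__main__":
    for i, ag in enumerate(train()):
        print(f"agent {i} policy per state:\n{ag.policy(np.eye(2))}")
```

For brevity the sketch steps its environment copies in a Python loop inside one process. On a multi-core CPU the copies would more plausibly run in separate worker processes, with the batch gathered before each synchronous update.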