In an intelligent IoT environment, an edge server needs to retrieve data from end devices to train the deep neural network deployed at the edge. To achieve higher training performance with limited resources, we propose a bandit-based in-network training data retrieval scheme (Bandit-TDRetrieval). Specifically, we formulate data retrieval from end devices as a multi-armed bandit (MAB) model. Each lever pull of an arm corresponds to the choice of retrieving data from a particular end device, and the sequence of pulls follows a binomial distribution. Thompson sampling is used to identify the relation between this binomial distribution and the rewards obtained through continuous data retrieval from the corresponding devices. Based on this relation, we design a training data retrieval paradigm for IoT that maximizes the rewards of retrieving training data for learning at the edge. Finally, the scheme is evaluated on a simulation platform, and the results show that it effectively improves the training efficiency of deep neural networks at the edge.
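
To make the bandit formulation concrete, the following is a minimal sketch of Thompson sampling over end devices, not the Bandit-TDRetrieval implementation itself. It assumes a binary reward per retrieval (for example, 1 if the retrieved batch is useful for training and 0 otherwise), which the abstract does not specify, and it keeps a Beta posterior per device as the conjugate prior for that binary outcome. The function name `thompson_retrieval` and the parameter `success_probs` are hypothetical and exist only to simulate the devices.

```python
import random


def thompson_retrieval(success_probs, rounds, seed=0):
    """Simulate Thompson sampling over end devices (illustrative sketch).

    success_probs: hypothetical per-device probability that one retrieval
                   yields a reward of 1 (an assumption, used only to simulate).
    rounds:        number of retrieval decisions to make.
    Returns (pulls per device, total reward).
    """
    rng = random.Random(seed)
    n = len(success_probs)
    # Beta(1, 1) prior per device, i.e. uniform over the unknown reward rate.
    alpha = [1.0] * n
    beta = [1.0] * n
    pulls = [0] * n
    total_reward = 0

    for _ in range(rounds):
        # Draw one plausible reward rate per device from its posterior.
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(n)]
        # Retrieve from the device whose sampled rate is highest.
        device = max(range(n), key=samples.__getitem__)
        # Simulated binary reward for this retrieval (assumed reward model).
        reward = 1 if rng.random() < success_probs[device] else 0
        # Posterior update: successes go to alpha, failures to beta.
        alpha[device] += reward
        beta[device] += 1 - reward
        pulls[device] += 1
        total_reward += reward

    return pulls, total_reward


if __name__ == "__main__":
    pulls, total = thompson_retrieval([0.1, 0.5, 0.9], rounds=2000)
    print("retrievals per device:", pulls)
    print("total reward:", total)
```

In this sketch, retrievals concentrate on the device with the highest simulated reward rate as its posterior sharpens, while the other devices are still sampled occasionally. In a real deployment the reward would come from the training process at the edge rather than from fixed probabilities.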