Fig 1: Generalized Telepresence Robot Framework
This study proposes a strategy for predicting teleoperator behaviour during telepresence-robot control: a recurrent neural model built on the LSTM architecture [6] is integrated with the DDPG framework [7]. The goal is a single model that can handle the distinct types of data produced by the embedded sensors, whether raw or preprocessed. The model is also used to demonstrate the significance of the data with respect to the circumstances the telepresence robot will face. Thus, each entity should specify a control signal, such as an angular or linear velocity.
Proposed Methodology: RNNs are deep learning models that automatically select the proper attributes from the training examples. By storing a wealth of past data in its internal state, an RNN is well suited to sequential data processing and has exceptional potential in time-series forecasting. The basic configuration of an LSTM memory cell consists of a long-term state component (\(C_{t}\)) and a short-term state component (\(h_{t}\)).
The basic LSTM architecture comprises input, forget, control, and output gates. The input gate decides which data to pass towards the cell and is described in Equation (1):
\(i_{t}=\sigma\left(W_{i}\cdot\left[h_{t-1},\ x_{t}\right]+b_{i}\right)\) (1)
In the above equation, W and b represent the weight matrix and bias vector, respectively, while tanh is applied within the cell to squash values into the range [-1, 1]. The output gate is defined analogously in Equation (2):
\(O_{t}=\sigma\left(W_{o}\cdot\left[h_{t-1},\ x_{t}\right]+b_{o}\right)\) (2)
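To make the gate computations of Equations (1)-(2) concrete, the following is a minimal NumPy sketch of a single LSTM step, including the forget and control (candidate cell) gates named above. All parameter names, sizes, and initialisations (W_i, W_f, W_c, W_o, hidden size, etc.) are illustrative assumptions, not the paper's implementation.

```python
# Minimal NumPy sketch of one LSTM step (Eqs. (1)-(2) plus forget/control gates).
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM step: all gates act on the concatenation [h_{t-1}, x_t]."""
    z = np.concatenate([h_prev, x_t])                   # [h_{t-1}, x_t]
    i_t = sigmoid(params["W_i"] @ z + params["b_i"])    # input gate, Eq. (1)
    f_t = sigmoid(params["W_f"] @ z + params["b_f"])    # forget gate
    g_t = np.tanh(params["W_c"] @ z + params["b_c"])    # candidate cell state in [-1, 1]
    o_t = sigmoid(params["W_o"] @ z + params["b_o"])    # output gate, Eq. (2)
    c_t = f_t * c_prev + i_t * g_t                      # long-term state C_t
    h_t = o_t * np.tanh(c_t)                            # short-term state h_t
    return h_t, c_t

# Example usage with made-up dimensions and random weights.
rng = np.random.default_rng(0)
n_in, n_hid = 4, 8
params = {k: rng.standard_normal((n_hid, n_hid + n_in)) * 0.1
          for k in ("W_i", "W_f", "W_c", "W_o")}
params.update({b: np.zeros(n_hid) for b in ("b_i", "b_f", "b_c", "b_o")})
h, c = lstm_step(rng.standard_normal(n_in), np.zeros(n_hid), np.zeros(n_hid), params)
```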
The objective of the proposed approach is to maximise the cumulative future reward \(R_{t}\), with the discount factor \(\gamma\) ranging over (0, 1], as defined in Equation (3):
\(R_{t}=r_{t}+\gamma r_{t+1}+\gamma^{2}r_{t+2}+\ldots=\sum_{k=0}^{\infty}{\gamma^{k}r_{t+k}}\) (3)
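As a small illustration of Equation (3), the discounted return of a finite reward sequence can be accumulated backwards; the reward values and \(\gamma\) below are made-up examples.

```python
# Illustrative sketch of the discounted return R_t in Equation (3).
def discounted_return(rewards, gamma):
    """R_t = sum_k gamma^k * r_{t+k}, accumulated backwards for efficiency."""
    R = 0.0
    for r in reversed(rewards):
        R = r + gamma * R
    return R

print(discounted_return([1.0, 0.0, 0.5, 1.0], gamma=0.99))  # return from t = 0
```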
Given a state \(s_{t}\) and an action \(a_{t}\), the expectation of \(R_{t}\) under a policy \(\pi\) defines the action-value function in Equation (4). The optimal action value \(P^{*}(s_{t},\ a_{t})\) is the maximum of this quantity over all policies \(\pi\); the optimal policy then selects the action that attains this value, which is what the training procedure seeks to learn.
\(P^{\pi}\left(s_{t},\ a_{t}\right)=\mathbb{E}_{\pi}\left[R_{t}\mid s_{t},\ a_{t}\right]=\mathbb{E}_{\pi}\left[\sum_{k=0}^{\infty}{\gamma^{k}r_{t+k}}\mid s_{t},\ a_{t}\right]\) (4)
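One simple way to read Equation (4) is as a Monte Carlo average of discounted returns over rollouts that start from \((s_{t}, a_{t})\) and then follow \(\pi\). The sketch below assumes a hypothetical environment interface (reset_to, step) and policy callable that are not part of the paper.

```python
# Hedged sketch: Monte Carlo estimate of P^pi(s_t, a_t) from Equation (4).
# The environment and policy interfaces below are hypothetical placeholders.
import numpy as np

def estimate_action_value(env, policy, s_t, a_t, gamma=0.99,
                          n_rollouts=100, horizon=200):
    """Average of discounted returns over rollouts starting from (s_t, a_t)."""
    returns = []
    for _ in range(n_rollouts):
        s = env.reset_to(s_t)            # assumed helper: restart from state s_t
        a = a_t                          # first action is fixed to a_t
        R, discount = 0.0, 1.0
        for _ in range(horizon):
            s, r, done = env.step(a)     # assumed interface: (state, reward, done)
            R += discount * r
            discount *= gamma
            if done:
                break
            a = policy(s)                # thereafter follow the policy pi
        returns.append(R)
    return float(np.mean(returns))
```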
This study recommends deep reinforcement learning (DRL) for predicting the teleoperator's controlling behaviour. A modified LSTM is primarily used to predict the linear velocity, angular velocity, and turning angle of the transmitted commands. The LSTM's predicted output is then forwarded to the DDPG algorithm. The suggested workflow is shown in Figure 2.
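A minimal PyTorch sketch of this two-stage workflow is given below, assuming the LSTM consumes a window of sensor/command history and its three predicted quantities are concatenated with the current observation before the DDPG actor produces the control signal. All layer sizes, input dimensions, and class names are illustrative assumptions rather than the paper's exact architecture.

```python
# Sketch of the suggested LSTM -> DDPG workflow (Figure 2), under assumed dimensions.
import torch
import torch.nn as nn

class CommandPredictor(nn.Module):
    """LSTM stage: sensor/command history -> (linear vel., angular vel., turning angle)."""
    def __init__(self, input_dim, hidden_dim=64):
        super().__init__()
        self.lstm = nn.LSTM(input_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, 3)    # three predicted command quantities

    def forward(self, history):                 # history: (batch, time, input_dim)
        _, (h_t, _) = self.lstm(history)
        return self.head(h_t[-1])               # (batch, 3)

class DDPGActor(nn.Module):
    """DDPG actor stage: [observation, predicted commands] -> control action."""
    def __init__(self, obs_dim, action_dim=2, hidden_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + 3, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, action_dim), nn.Tanh(),   # bounded velocity commands
        )

    def forward(self, obs, predicted_cmd):
        return self.net(torch.cat([obs, predicted_cmd], dim=-1))

# Example forward pass with made-up dimensions.
predictor, actor = CommandPredictor(input_dim=10), DDPGActor(obs_dim=12)
history = torch.randn(1, 20, 10)     # 20 past sensor/command frames
obs = torch.randn(1, 12)             # current robot observation
action = actor(obs, predictor(history))   # control signal (e.g. linear, angular velocity)
```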