Figure 1: Generalized Telepresence Robot Framework
This study proposes a strategy for predicting teleoperator behaviour
while controlling the telepresence robot, using a recurrent neural
model built on the LSTM architecture [6] and integrating it with
the DDPG framework [7]. The goal is to create a single model that
can handle the distinct types of data produced by the embedded sensors,
whether raw or pre-processed. The model is also used to demonstrate the
significance of the data with respect to the circumstances the
telepresence robot will face. Each entity should therefore specify a
control signal such as angular and linear velocity.
Proposed Methodology: The RNN, a deep learning approach, automatically
selects the proper attributes from the training examples. By storing a
wealth of past information in its internal state, the RNN is well suited
to sequential data processing and has exceptional potential in
time-series forecasting. The basic configuration of an LSTM memory cell
consists of the long-term state component (\(C_{t}\)) and the
short-term state component (\(h_{t}\)).
Input, forget, control, and output gates comprise the LSTM's basic
architecture. The input gate decides which data are passed towards the
cell and is described in Equation (1):
\(i_{t}=\ \sigma\left(W_{i}\times\left[h_{t-1},\ x_{t}\right]+\ b_{i}\right)\) (1)
The bias vector and weight matrix are represented by b and W in the
above equation. Tanh is applied to squash the values into the range
[-1, 1]. Similarly, the output gate, which determines what part of the
cell state is emitted through the hidden state, is described in
Equation (2):
\(o_{t}=\ \sigma\left(W_{o}\times\left[h_{t-1},\ x_{t}\right]+b_{o}\right)\) (2)
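For concreteness, the sketch below shows how the gate computations of Equations (1) and (2), together with the standard forget and candidate (control) gates, update the long-term state \(C_{t}\) and short-term state \(h_{t}\). The per-gate weights and biases (W_i, W_f, W_o, W_c, b_i, ...) are assumed parameters for illustration, not values taken from this study.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM memory-cell step; W_f, W_c, b_f, b_c for the forget and
    candidate (control) gates are assumed analogously to Eqs. (1)-(2)."""
    # Concatenate previous hidden state and current input, as in [h_{t-1}, x_t].
    z = np.concatenate([h_prev, x_t])

    i_t = sigmoid(params["W_i"] @ z + params["b_i"])    # input gate, Eq. (1)
    f_t = sigmoid(params["W_f"] @ z + params["b_f"])    # forget gate (assumed form)
    o_t = sigmoid(params["W_o"] @ z + params["b_o"])    # output gate, Eq. (2)
    c_hat = np.tanh(params["W_c"] @ z + params["b_c"])  # candidate values in [-1, 1]

    c_t = f_t * c_prev + i_t * c_hat                    # long-term state C_t
    h_t = o_t * np.tanh(c_t)                            # short-term state h_t
    return h_t, c_t
```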
The proposed approach aims to maximize the cumulative future reward
\(R_{t}\), which is defined in Equation (3), with the discount factor
\(\gamma\) ranging over (0, 1]:
\(R_{t}=\ r_{t}+\ \gamma r_{t+1}+\ \gamma^{2}r_{t+2}+\ldots=\ \sum_{k=0}^{\infty}{\gamma^{k}r_{t+k}}\) (3)
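A minimal sketch of Equation (3): the discounted return \(R_{t}\) is computed backwards over a finite episode. The reward values in the example are assumed purely for illustration.

```python
def discounted_return(rewards, gamma=0.99):
    """Cumulative future reward R_t of Equation (3) for every step t,
    computed backwards over a finite episode (gamma in (0, 1])."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

# Example: rewards collected while the robot tracks the operator's commands.
print(discounted_return([1.0, 0.0, 2.0], gamma=0.9))  # [2.62, 1.8, 2.0]
```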
Under the state \(s_{t}\) and action \(a_{t}\), the expectation of
\(R_{t}\) defines the action-value function in Equation (4). The best
action value \(P^{*}(s_{t},\ a_{t})\) is the maximum of this function
over all policies π, and the optimal policy then selects the action
that attains it:
\(P^{\pi}\left(s_{t},\ a_{t}\right)=\ \mathbb{E}_{\pi}\left[R_{t}\middle|s_{t},\ a_{t}\right]=\ \mathbb{E}_{\pi}\left[\sum_{k=0}^{\infty}{\gamma^{k}r_{t+k}}\middle|s_{t},\ a_{t}\right]\) (4)
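As an illustration of Equation (4), the sketch below estimates \(P^{\pi}(s, a)\) by Monte Carlo averaging of observed discounted returns over collected trajectories. The trajectory format and the function name are assumptions made for illustration, not part of the proposed method.

```python
from collections import defaultdict

def estimate_action_values(episodes, gamma=0.99):
    """Every-visit Monte Carlo estimate of P^pi(s, a) from Equation (4):
    average the observed discounted return over each (state, action) pair.
    `episodes` is a list of [(s_t, a_t, r_t), ...] trajectories (assumed format)."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for episode in episodes:
        running = 0.0
        # Walk backwards so `running` equals R_t at each step t.
        for s_t, a_t, r_t in reversed(episode):
            running = r_t + gamma * running
            totals[(s_t, a_t)] += running
            counts[(s_t, a_t)] += 1
    return {sa: totals[sa] / counts[sa] for sa in totals}
```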
This study recommends deep reinforcement learning (DRL) for predicting
teleoperator control behaviour. A modified LSTM is primarily used to
predict the linear velocity, angular velocity, and turning angle of the
transmitted commands. The predicted LSTM output is then forwarded to the
DDPG algorithm. The proposed workflow is shown in Figure 2.
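To make the Figure 2 workflow concrete, the sketch below pairs an LSTM-based command predictor with a DDPG actor that consumes the prediction alongside the robot state. All module names, layer sizes, and dimensions (CommandPredictor, DDPGActor, sensor_dim, etc.) are hypothetical and are not taken from this study.

```python
import torch
import torch.nn as nn

class CommandPredictor(nn.Module):
    """LSTM that predicts the linear velocity, angular velocity, and
    turning angle of the transmitted commands from a sensor sequence."""

    def __init__(self, sensor_dim, hidden_dim=64):
        super().__init__()
        self.lstm = nn.LSTM(sensor_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, 3)  # [linear vel., angular vel., turning angle]

    def forward(self, sensor_seq):
        features, _ = self.lstm(sensor_seq)   # (batch, time, hidden_dim)
        return self.head(features[:, -1, :])  # prediction at the last time step

class DDPGActor(nn.Module):
    """DDPG policy that takes the robot state plus the LSTM prediction and
    emits the final control signal (linear and angular velocity)."""

    def __init__(self, state_dim, pred_dim=3, action_dim=2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + pred_dim, 128), nn.ReLU(),
            nn.Linear(128, action_dim), nn.Tanh(),  # actions scaled to [-1, 1]
        )

    def forward(self, state, prediction):
        return self.net(torch.cat([state, prediction], dim=-1))
```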