The transition to weather-dependent renewable energy generation requires electric loads to be adjusted to the available generation. Demand response programs and home energy management systems make this possible. However, rule-based control systems, although easy to use in practice, often miss much of the optimization potential. Self-learning alternatives based on reinforcement learning often ignore the partial observability of the building control problem and consequently neglect the importance of the observation history. Adaptive control systems that do consider this history often rely on policies that suffer from catastrophic forgetting, which makes them unable to fully exploit long histories. As an alternative, we present a new reinforcement learning method for autonomous building energy management based on the soft actor-critic method and the transformer deep neural network architecture. To control a heat pump and the inlet port of a thermal storage, taking photovoltaic generation and dynamic electricity prices into account, we formulate the problem as partially observable and use the history of observations to determine the control signals. Using a validated building simulation, we show that our method outperforms rule-based control as well as reinforcement learning methods that use multilayer perceptrons or recurrent neural networks as policies.