Matteo Monopoli -

Enhancing a Soft GPU IP Reliability Against SEUs in Space: Modelling Approach and Cri...

Matteo Monopoli

and 6 more

November 27, 2024

Field Programmable Gate Arrays find extensive use across various domains, ranging from telecommunications to Machine Learning tasks. In the context of space applications, they emerge as a highly favourable choice, owing to their high degree of flexibility and their availability in different silicon-grade technologies. However, due to the high cost of radiation-hardened devices, radiation-tolerant devices are more common in practical deployment scenarios. Robustness-enhancing techniques can still be applied to these devices, allowing their use in long-duration tasks without the need for a radiation-hardened counterpart. In this paper, we propose a modelling approach for estimating the reliability of a digital design in space before carrying out fault injection tests and radiation campaigns. For this purpose, we employ a high-level tool called Möbius to obtain a first reliability estimation against Single Event Upsets by means of Fault Tree Models. We also propose a novel methodology for defining which fault mitigation techniques should be applied to each design sub-module, considering those most commonly used in the space environment. In particular, we focus on a specific use case, namely a Soft Graphic Processing Unit IP, implemented on the Xilinx RT XQRKU060 FPGA. We analyze the criticality, power and area impact of the most important design parts of the GPU when implemented in hardware, introducing a classification approach for associating an appropriate fault mitigation technique with each of them. Finally, in the last section of the paper, we hint at future development and conclude the work.

Highly-Parameterised CGRA Architecture for Design Space Exploration of Machine Learni...

Luca Zulberti

and 3 more

September 08, 2023

This work presents a highly parameterised CGRA-based accelerator that we developed for an extensive Design Space Exploration activity. The description starts from the CGRA building blocks, the Functional Units, and progresses towards the top level of the architecture, represented by the Node component, which is composed of an NxM matrix of Processing Elements. For each level of the hierarchy, we describe the HDL design parameters affecting the run-time reconfigurability of the accelerator, delving deeper into the functionality of the architecture. Outcomes are reported after synthesis on TSMC 40nm standard-cell technology.

Exploring Key Aspects of Soft GPGPU Computing for On-board Acceleration of Artificial...

Matteo Monopoli

and 3 more

September 14, 2023

Artificial Intelligence has gained widespread adoption across different industrial sectors, serving as a versatile tool to carry out a diverse array of tasks, ranging from image classification and traffic forecasting to natural language processing and speech recognition. In the space domain, however, a special focus must be placed on area overhead, power consumption, and fault-tolerant solutions. In this particular scenario, soft General-Purpose Computing on Graphic Processing Units has the potential to revolutionise space-related activities. Indeed, by leveraging both Field Programmable Gate Array technology and Graphic Processing Unit computing, it becomes feasible to achieve high-performance capabilities without compromising either power consumption or radiation tolerance features. Moreover, the use of reconfigurable hardware can facilitate the acceleration of a wide range of Machine Learning algorithms, avoiding the drawbacks of excessive specialisation. This paper explores the State-of-the-Art in terms of hardware platforms for on-the-edge acceleration of Artificial Intelligence algorithms and compares it with a possible System-on-Chip implementation based on a soft-Graphic Processing Unit. Then, the attention is shifted towards the investigation of key aspects for future space missions, such as reliability and Dynamic Partial Reconfiguration. We point out the lack of European technological solutions, emphasising the promising potential offered by NanoXplore devices. We also discuss the importance of fault detection and mitigation techniques in space applications, covering the most commonly employed hardware methods for reliability enhancement and highlighting the lack of work in the field of General-Purpose Computing for Graphic Processing Units, especially in the space sector. Furthermore, we briefly examine the implementation of Dynamic Partial Reconfiguration over a System-on-Chip featuring a soft-Graphic Processing Unit IP. Finally, in the last section of the paper, we hint at future development of the project and conclude the work.