The rise of eHealth technologies has transformed cardiac disease diagnosis, leveraging edge computing, AI, and IoT to offer critical insights into heart health. Data privacy constraints in centralized systems hinder access to large-scale ECG datasets, posing challenges for early diagnosis. While advancements in quantization and compression enable neural networks to run on edge devices, effective solutions for efficient training and inference on constrained devices remain limited. To address the challenges of training on the edge, we propose a mixed-precision quantized DNN FPGA accelerator designed for multi-class cardiac diagnosis. Our solution achieves a top-1 test accuracy of up to 93.26% while enhancing computational efficiency, optimizing resource usage, and reducing transmission power. Our mixed-precision quantized FPGA accelerator achieves up to 136x and 7.2x faster inference compared to state-of-the-art Split-CNN and DCNN-Convolutional FPGA accelerators, respectively. The accelerator delivers a throughput of up to 1439.36 samples per second, a latency of only 695 µs, and programming-logic power consumption below 600 mW. Using hardware-software co-design, our FPGA-based "Training on the Edge" approach combines software flexibility with hardware speed and improves diagnostic top-1 test accuracy by up to 2.8% within just five training cycles, making the model more robust to dataset diversity. The proposed approach also accelerates development and reduces hardware rebuild time by a factor of (Training Cycles-1)x, ensuring efficient, sustainable ML solutions on edge devices. The source code is made available at https://github.com/shakeelakram00/Continual-Learningon-FPGAs-using-FINN
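To illustrate the mixed-precision idea behind the accelerator, the sketch below applies symmetric uniform quantization with a different bit-width per layer. This is a minimal software model only, not the paper's FINN-based hardware flow; the layer names, bit-width assignments, and random weights are hypothetical placeholders.

```python
import numpy as np

def quantize_uniform(w, bits):
    """Symmetric uniform ("fake") quantization of a weight tensor to `bits` bits."""
    qmax = 2 ** (bits - 1) - 1           # e.g. 127 for 8-bit
    scale = np.max(np.abs(w)) / qmax     # per-tensor scale factor
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return q * scale                     # dequantized weights for error inspection

# Hypothetical per-layer bit-width assignment: accuracy-sensitive early
# layers keep more bits, later layers are pushed to very low precision.
layer_bits = {"conv1": 8, "conv2": 4, "fc": 2}
rng = np.random.default_rng(0)
weights = {name: rng.standard_normal((16, 16)) for name in layer_bits}

for name, w in weights.items():
    qw = quantize_uniform(w, layer_bits[name])
    err = np.mean(np.abs(w - qw))
    print(f"{name}: {layer_bits[name]}-bit, mean abs quantization error {err:.4f}")
```

Lower bit-widths shrink on-chip weight storage and arithmetic cost at the price of larger quantization error, which is the trade-off a mixed-precision design space exploration navigates per layer.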
Data availability can limit machine learning model generalizability, while privacy concerns arise when sharing data for collaborative learning. Additionally, limited device capabilities and increased communication power can hinder real-world deployment. Differentially Private Federated Learning (DPFL) addresses these challenges by sharing encrypted gradients to enhance privacy and minimize data vulnerability while supporting collaborative model training in privacy-sensitive environments. Integrating DPFL with low-power FPGAs helps overcome edge-device limitations: FPGAs' parallel processing, reconfigurability, quantization support, and hardware-software co-design enhance DPFL's efficiency and adaptability. This combined approach optimizes training, ensuring secure and efficient collaboration across applications. Importantly, as ML algorithms evolve, FPGA implementations promote sustainability by reducing energy consumption, extending hardware lifespan, and reducing electronic waste. Moreover, this framework can be used for emulation on FPGAs to explore and develop novel ASIC architectures for DPFL machine learning accelerators. We propose a novel open-source framework, "DPFL-FPGA-Accel", for carrying out design-space exploration of FPGA-based DPFL; the source code is made available at https://github.com/shakeelakram00/DPFL-FPGA-Accel-Framework.git to encourage wider adoption. To the best of our knowledge, this marks the first implementation of DPFL on FPGAs that enables users to design optimized FPGA accelerators for ML tasks implementing privacy-preserving federated learning with adjustable epsilon values. We illustrate the application of this framework in processing healthcare data, designing a DPFL-enabled FPGA system for cardiac arrhythmia classification within a typical hospital setting, where electrocardiograms are not shared among hospitals.
The FPGA accelerator achieves significant inference speedups, up to 605.2x and 107.2x faster than Arm Cortex-A53 and Intel Core i7 processors, respectively.
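The "adjustable epsilon" mentioned above can be illustrated with the standard Gaussian mechanism for gradient perturbation: each client clips its gradient to bound sensitivity, then adds noise calibrated to (epsilon, delta) before the server averages the updates. This is a minimal software sketch of that generic DP-FL step under assumed defaults (clip norm 1.0, delta 1e-5); it is not the framework's FPGA implementation, and the function and variable names are hypothetical.

```python
import numpy as np

def dp_sanitize_gradient(grad, clip_norm=1.0, epsilon=1.0, delta=1e-5, rng=None):
    """Clip a client gradient and add Gaussian noise for (epsilon, delta)-DP.

    Uses the textbook calibration sigma = clip_norm * sqrt(2 ln(1.25/delta)) / epsilon;
    a smaller epsilon yields more noise and therefore stronger privacy.
    """
    rng = rng or np.random.default_rng()
    norm = np.linalg.norm(grad)
    clipped = grad * min(1.0, clip_norm / norm)  # bound per-client sensitivity
    sigma = clip_norm * np.sqrt(2.0 * np.log(1.25 / delta)) / epsilon
    return clipped + rng.normal(0.0, sigma, size=grad.shape)

# Each client sanitizes locally; only noisy gradients reach the aggregator.
rng = np.random.default_rng(42)
client_grads = [rng.standard_normal(10) for _ in range(3)]
noisy = [dp_sanitize_gradient(g, epsilon=2.0, rng=rng) for g in client_grads]
global_update = np.mean(noisy, axis=0)   # federated averaging of sanitized updates
print(global_update.round(3))
```

Raising epsilon shrinks sigma, so the aggregated update tracks the true average gradient more closely; design-space exploration trades this accuracy against the privacy budget.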