Real-time video was recorded with a Sony IMX708 camera, which has a resolution of 11.9 megapixels, a 102-degree horizontal field of view, and a 67-degree vertical field of view. Video was then processed on a Raspberry Pi 5 equipped with a 2.4 GHz quad-core 64-bit Arm Cortex-A76 CPU, a 512 KB per-core L2 cache, a 2 MB shared L3 cache, and 8 GB of memory. This configuration provided a cost-effective computational platform for real-time weed detection. To reduce processing time, a preprocessing stage filtered video frames by green pixel density. This stage prioritized frames likely to contain vegetation (i.e., those with more than 50% green pixels), focusing computational resources on potential weed regions and reducing unnecessary processing of bare ground (Figure 3a). For weed detection and classification, the You Only Look Once X (YOLOX) architecture was used. This anchor-free architecture employs a backbone, neck, and head to perform feature extraction, feature aggregation, and bounding box prediction, respectively [25]. YOLOX was chosen over other YOLO models for its improved speed and accuracy. Two versions of the architecture, YOLOX-s and YOLOX-nano, were trained and compared.
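
The green-pixel preprocessing filter described above can be illustrated with a minimal sketch. The text gives only the 50% threshold, not the rule used to label a pixel as green, so the sketch assumes an excess-green index (ExG = 2G - R - B) with an illustrative margin. The names `green_fraction`, `keep_frame`, and `EXG_MARGIN` are hypothetical and not taken from the original implementation.

```python
import numpy as np

# From the text: frames with more than 50% green pixels are retained.
GREEN_FRACTION_THRESHOLD = 0.5

# Assumption: minimum excess-green value for a pixel to count as green.
# This margin is illustrative only; the source does not specify one.
EXG_MARGIN = 20


def green_fraction(frame_rgb: np.ndarray) -> float:
    """Return the fraction of pixels labeled green in an H x W x 3 RGB frame."""
    # Widen the dtype so that 2G - R - B cannot overflow uint8.
    rgb = frame_rgb.astype(np.int32)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    excess_green = 2 * g - r - b
    return float(np.mean(excess_green > EXG_MARGIN))


def keep_frame(frame_rgb: np.ndarray,
               threshold: float = GREEN_FRACTION_THRESHOLD) -> bool:
    """Return True when the frame should be forwarded to the detector."""
    return green_fraction(frame_rgb) > threshold


if __name__ == "__main__":
    # Synthetic 10 x 10 frame: 6 rows of vegetation-like pixels, 4 of soil-like.
    frame = np.empty((10, 10, 3), dtype=np.uint8)
    frame[:6] = (40, 160, 40)   # greenish pixels
    frame[6:] = (120, 80, 40)   # brownish pixels
    print(green_fraction(frame), keep_frame(frame))
```

In a deployment, a check of this kind would run on each captured frame before inference, so that frames dominated by bare ground skip the detector. A color-space threshold (for example, a hue range in HSV) would be an equally plausible way to define green pixels.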