To address missed and false detections in tea bud recognition under complex environments, this paper proposes YOLOv8-SIEMF, a detection model that integrates Sub-models Integral Evaluation (SIE) and Multi-objective Filtering (MF). First, we design a hierarchical detection framework in which different sub-models process the input image at different resolutions to extract complementary features. An evaluation mechanism then fuses the sub-model outputs by jointly considering detection confidence, box overlap, and image sharpness. In addition, a multi-objective filtering module enhances the model’s sensitivity to clusters of multiple targets and improves edge sharpness in grayscale space, which reduces redundant and invalid detections. Experimental results on a self-built dataset show that the proposed model outperforms mainstream YOLOv8 variants in precision and recall when recognizing fine-grained tea buds under real field conditions.
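
The abstract names three cues for fusing sub-model outputs (detection confidence, box overlap, and image sharpness) but does not give the scoring rule. The following is a minimal illustrative sketch, not the SIE mechanism itself. It assumes a weighted sum of confidence and a Laplacian-variance sharpness score, followed by greedy suppression of overlapping boxes. The function names `iou`, `sharpness`, and `fuse`, the weights `w_conf` and `w_sharp`, and the threshold `iou_thr` are all hypothetical.

```python
import numpy as np


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def sharpness(gray):
    """Variance of a 4-neighbour Laplacian, squashed into [0, 1).

    Assumption: Laplacian variance stands in for the paper's
    (unspecified) image sharpness measure.
    """
    g = np.asarray(gray, dtype=np.float64)
    lap = (g[:-2, 1:-1] + g[2:, 1:-1] + g[1:-1, :-2] + g[1:-1, 2:]
           - 4.0 * g[1:-1, 1:-1])
    if lap.size == 0:  # crop too small to have interior pixels
        return 0.0
    v = lap.var()
    return v / (v + 1.0)


def fuse(detections, gray, w_conf=0.7, w_sharp=0.3, iou_thr=0.5):
    """Fuse detections pooled from all sub-models.

    detections: list of (box, confidence), box = (x1, y1, x2, y2) in the
    coordinates of the grayscale image `gray`.
    Returns a list of (score, box, confidence), best score first.
    """
    scored = []
    for box, conf in detections:
        x1, y1, x2, y2 = (int(round(v)) for v in box)
        s = sharpness(gray[y1:y2, x1:x2])
        # Hypothetical scoring rule: weighted sum of the two cues.
        scored.append((w_conf * conf + w_sharp * s, box, conf))
    scored.sort(key=lambda t: t[0], reverse=True)

    kept = []
    for cand in scored:
        # Keep a box only if it does not overlap a better-scoring one.
        if all(iou(cand[1], k[1]) < iou_thr for k in kept):
            kept.append(cand)
    return kept
```

In this sketch, boxes from every resolution level are first mapped back to the original image coordinates and pooled into one list. Two boxes that cover the same bud are then resolved in favor of the one with the higher combined score.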