Xu Yin

Instance-level human parsing is central to human-centric analysis: it labels each pixel with a human body part and associates that pixel with a specific person instance. Despite progress in multi-person parsing, separating individuals in dense crowds remains challenging. The Visual Saliency-Based Human Parsing (ViS-HuP) algorithm addresses this within a multi-task learning framework, using visual saliency to make body pixels stand out more clearly and edge detection to refine instance boundaries. Evaluated on the Crowd Instance-level Human Parsing (CIHP) dataset [1], ViS-HuP outperforms conventional methods, with significant accuracy improvements in crowded scenes.
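
To make the multi-task idea concrete, the sketch below combines a per-pixel parsing loss with auxiliary saliency and edge losses as a weighted sum. This is only an illustration of a common multi-task formulation, not the ViS-HuP objective itself: the abstract does not specify the loss terms, so the function names, the choice of cross-entropy, and the weights `w_sal` and `w_edge` are all assumptions.

```python
import math

EPS = 1e-12  # clamp to avoid log(0)


def cross_entropy(probs, label):
    """Negative log-likelihood of the true body-part class for one pixel."""
    return -math.log(max(probs[label], EPS))


def binary_cross_entropy(p, target):
    """Binary cross-entropy for one pixel; p is the predicted positive-class probability."""
    p = min(max(p, EPS), 1.0 - EPS)
    return -(target * math.log(p) + (1 - target) * math.log(1.0 - p))


def multi_task_loss(part_probs, part_labels,
                    sal_probs, sal_labels,
                    edge_probs, edge_labels,
                    w_sal=1.0, w_edge=1.0):
    """Weighted sum of parsing, saliency, and edge losses, each averaged over pixels.

    The weighting scheme and the loss types are assumed for illustration;
    they are not taken from the ViS-HuP paper.
    """
    n = len(part_labels)
    parsing = sum(cross_entropy(p, y) for p, y in zip(part_probs, part_labels)) / n
    saliency = sum(binary_cross_entropy(p, y) for p, y in zip(sal_probs, sal_labels)) / n
    edge = sum(binary_cross_entropy(p, y) for p, y in zip(edge_probs, edge_labels)) / n
    return parsing + w_sal * saliency + w_edge * edge


if __name__ == "__main__":
    # Two pixels, two part classes; predictions are flattened per-pixel lists.
    loss = multi_task_loss(
        part_probs=[[0.9, 0.1], [0.2, 0.8]], part_labels=[0, 1],
        sal_probs=[0.95, 0.10], sal_labels=[1, 0],
        edge_probs=[0.30, 0.70], edge_labels=[0, 1],
    )
    print(f"combined loss: {loss:.4f}")
```

In a formulation like this, the saliency and edge terms act as auxiliary supervision for the parsing term; setting `w_sal` or `w_edge` to zero removes the corresponding task from the objective.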