Immunotherapy has emerged as a prominent approach in melanoma treatment; however, a substantial number of patients do not respond effectively. This highlights the critical need to accurately predict immunotherapy response in order to design personalised treatment strategies. Current practice relies predominantly on clinical data and the expertise of oncologists, but a deeper understanding of molecular interactions through tissue-based biomarkers offers a promising avenue for advancement. Whole-slide multiplex immunofluorescence (mIF) images enable detailed analysis of cells and tissues in their microenvironment, deepening insight into disease mechanisms. However, the numerous channels, large image size, and spatially dispersed information of mIF images pose analytical challenges and require advanced techniques that can learn these intricate features effectively. In this study, we introduce a novel deep-learning framework, Channel Optimisation with Multi-Instance Learning (COMIL), specifically designed to classify whole-slide mIF images for predicting immunotherapy response in melanoma patients. We demonstrate that a feature extraction method that models inter-channel relationships and captures complex interdependencies among the multiple channels of mIF images enhances classification performance. Additionally, incorporating this method within a multi-instance learning (MIL) framework, optimised at both the slide and instance levels, further improves classification performance on whole-slide mIF images. In an evaluation on mIF images from the Melanoma Institute Australia, COMIL outperformed the baseline methods, underscoring its potential for predicting immunotherapy response.