This paper investigates the impact of architectural modifications and hyperparameter tuning on the performance of Convolutional Neural Networks (CNNs) for image classification. We ran a series of experiments varying the number of kernels in the convolutional layers and the output size of the fully connected (FC) layer across several architectures. Despite these variations, performance on the training set consistently surpassed that on the tuning set, indicating that the added complexity improved fitting but not generalization. Adding dropout layers improved tuning-set accuracy, with CNN1 achieving the best performance, whereas increasing the number of convolutional layers in CNN2 and CNN3 led to overfitting. By comparison, the Bag of Visual Words (BoVW) method achieved the highest accuracy but required substantially more computation time, and the Multilayer Perceptron (MLP) classifier overfit severely. Promising directions for improvement include data augmentation, refined Region of Interest (ROI) segmentation, and fine-tuning pre-trained models such as VGG16. Overall, this study underscores the importance of balancing architectural complexity against computational cost in CNN-based image classification.
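To make the experimental setup concrete, the sketch below shows a minimal CNN of the kind described, written in PyTorch. It is an illustrative assumption, not the paper's actual code: the class name `SimpleCNN`, the 32x32 RGB input, and all default values are hypothetical, but the three knobs varied in the experiments (number of convolutional kernels, FC output size, and the dropout rate) appear as explicit parameters.

```python
# A minimal sketch (assumed architecture, not the paper's exact code).
# The hyperparameters varied in the experiments are exposed as arguments:
# num_kernels (convolutional kernels), fc_size (FC output size), dropout.
import torch
import torch.nn as nn


class SimpleCNN(nn.Module):
    def __init__(self, num_kernels: int = 32, fc_size: int = 128,
                 num_classes: int = 10, dropout: float = 0.5):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, num_kernels, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),  # 32x32 input -> 16x16 feature maps (assumed input size)
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(num_kernels * 16 * 16, fc_size),
            nn.ReLU(),
            nn.Dropout(dropout),  # the dropout layer that improved tuning-set accuracy
            nn.Linear(fc_size, num_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))


# Example: sweep the hyperparameters as in the experiments described above.
model = SimpleCNN(num_kernels=64, fc_size=256)
out = model(torch.randn(1, 3, 32, 32))  # -> logits of shape (1, 10)
```

Deeper variants in the spirit of CNN2 and CNN3 would stack additional `Conv2d`/`ReLU`/`MaxPool2d` blocks inside `features`; as the results above indicate, such added depth risks overfitting unless regularization such as dropout or data augmentation is applied.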