We benchmark five ImageNet-pretrained models (ResNet-50, DenseNet-121, EfficientNet-B0, ConvNeXt-Tiny, and ViT-B/16) for pneumonia detection on the Paul Mooney “Chest X-Ray Pneumonia” dataset using a freshly curated, leakage-audited split. A unified PyTorch recipe (AMP, AdamW, OneCycleLR, class-weighted cross-entropy, and early stopping on validation AUC) ensures like-for-like comparisons under class imbalance (approximately 1:2.7). On the held-out test set, ResNet-50 provides the best overall trade-off, with AUC 0.9972, F1 0.9850, MCC 0.9420, and the lowest Brier score (0.0163). EfficientNet-B0 is a close, efficient second (AUC 0.9946, F1 0.9777), and ViT-B/16 remains competitive at 224 px. DenseNet-121 and ConvNeXt-Tiny favor very high recall at the cost of specificity. ECE and Brier scores indicate non-trivial miscalibration, motivating temperature scaling. Results are specific to this dataset, which is small and single-source; broader clinical claims require external validation, which we leave to future work. We report multi-metric results, outline statistical testing protocols (bootstrap CIs, DeLong, McNemar), and provide calibration guidance.
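To make the calibration terms concrete, the following is a minimal sketch of temperature scaling together with the Brier score and a binned ECE for a binary classifier. It is illustrative only and is not the implementation used in this work. It assumes a single pneumonia-versus-normal logit per image (for a two-class softmax head, the difference of the two class logits), uses a simple grid search over the temperature in place of the gradient-based fit that is more common in practice, and runs on synthetic, deliberately overconfident logits. All function names are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def nll(logits, labels, temperature):
    """Mean binary negative log-likelihood of temperature-scaled logits."""
    z = logits / temperature
    # log(1 + exp(z)) - y * z, evaluated stably via logaddexp
    return float(np.mean(np.logaddexp(0.0, z) - labels * z))

def fit_temperature(logits, labels, grid=None):
    """Pick the temperature T that minimises validation NLL (grid search; an assumption)."""
    if grid is None:
        grid = np.logspace(-1, 1, 201)  # candidate T values in [0.1, 10]
    losses = [nll(logits, labels, t) for t in grid]
    return float(grid[int(np.argmin(losses))])

def brier_score(probs, labels):
    """Mean squared error between predicted probability and the 0/1 label."""
    return float(np.mean((probs - labels) ** 2))

def expected_calibration_error(probs, labels, n_bins=10):
    """Binned ECE on the positive-class probability (one common variant)."""
    bin_idx = np.minimum((probs * n_bins).astype(int), n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        mask = bin_idx == b
        if mask.any():
            gap = abs(probs[mask].mean() - labels[mask].mean())
            ece += mask.mean() * gap  # weight by the fraction of samples in the bin
    return float(ece)

# Synthetic validation data: labels follow the "true" logits, but the
# simulated model reports logits that are three times too large.
rng = np.random.default_rng(0)
true_logits = rng.normal(0.0, 2.0, size=5000)
labels = (rng.random(5000) < sigmoid(true_logits)).astype(float)
logits = 3.0 * true_logits

T = fit_temperature(logits, labels)
raw_probs = sigmoid(logits)
cal_probs = sigmoid(logits / T)

print(f"fitted temperature: {T:.2f}")
print(f"ECE   before/after: {expected_calibration_error(raw_probs, labels):.4f} / "
      f"{expected_calibration_error(cal_probs, labels):.4f}")
print(f"Brier before/after: {brier_score(raw_probs, labels):.4f} / "
      f"{brier_score(cal_probs, labels):.4f}")
```

In practice the temperature would be fitted on validation logits only and then applied unchanged to the test set. Because dividing by a positive constant preserves the ranking of scores, AUC is unaffected, and decisions at the 0.5 threshold are unchanged as well.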