| CATEGORY | GUIDING PRINCIPLE | ADDITIONAL DESCRIPTION | LITERATURE COHERENCE |
|---|---|---|---|
| Purpose & relevance | P1. Disclose which clinical problem the model addresses and how it fits in a clinical workflow | | MI-CLAIM, CONSORT-AI, FUTURE-AI |
| | P2. Collect modeling data in a consistent, clinically relevant and generalizable manner that aligns with the intended use | | DECIDE-AI, MI-CLAIM, FDA |
| | P3. Benchmark performance against existing clinical standards of care, previous AI studies, or proofs of concept | Choosing a proper benchmark is essential to demonstrate clinical relevance and show the potential for patient care. | FUTURE-AI, TRIPOD-AI, MI-CLAIM |
| Model development | M1. Design a conceptual model with a definition of the predicted outcome and its presumed relationship to the input variables | A conceptual model stimulates inclusion of domain expertise, focus and prioritization in data collection, and alignment with existing hypotheses and knowledge. | |
| | M2. Safeguard appropriate separation between training, validation, and test datasets | Ensure that model optimization is performed on the training set, with tuning of the model configuration on the validation set, without affecting the test set. | TRIPOD-AI, MI-CLAIM, MINIMAR, FDA |
| | M3. Ensure proper documentation and execution of model optimization steps | The number of configuration steps and decisions is generally extensive and requires thorough tracking and documentation. In addition, it is vital that these steps are not performed on the test set. | MINIMAR, FUTURE-AI |
| | M4. Determine the evaluation procedure, metrics and rationale up-front, before starting the modeling procedure | Defining metrics post-analysis is a common pitfall that can lead to overestimation of performance. The evaluation procedure should be in concordance with what is clinically relevant. | MI-CLAIM, FUTURE-AI |
| Replicability | R1. Evaluate model performance in a prospective study, randomized trial, or at least an independent replication cohort | Ensure that the testing conditions are clinically relevant and representative of the intended usage context. | RISE, MINIMAR, FDA |
| | R2. Perform sensitivity and robustness checks to assess whether the system is robust to changing environments or populations | Robustness can be further improved by training the model on a heterogeneous population. | TRIPOD-AI, MI-CLAIM, FUTURE-AI |
| | R3. Disclose data preprocessing and the way in which data quality is assessed and ensured | | DECIDE-AI, TRIPOD-AI, MI-CLAIM, MINIMAR, CONSORT-AI, FUTURE-AI |
| Explainability | E1. Determine and provide appropriate levels of interpretability, depending on use case and users | This often guides which algorithm and interpretability tools need to be employed. | RISE, TRIPOD-AI, MI-CLAIM, FUTURE-AI |
| | E2. Leverage interpretability toolkits and libraries for black box models | The complexity and black box nature of AI models warrant more focus on interpretability. | RISE, FUTURE-AI |
| System design & usage | S1. Focus on multidisciplinary collaboration during the full AI solution lifecycle | Involve a broad range of functional expertise, including AI leads, users, clinicians, and study-design experts, in all phases of development and implementation. | RISE, FDA, FUTURE-AI |
| | S2. Invest in the instruction of users on how to interact with the system and predictions | Establishing trust in the solution and explaining how it integrates with the clinical workflow is key for adoption and impact on clinical outcome. | DECIDE-AI, MINIMAR, CONSORT-AI, FUTURE-AI, FDA |
| | S3. Set up monitoring processes to track technical and analytical performance | | FDA, FUTURE-AI |
| | S4. Set up a feedback flow to facilitate iterative system improvement | Performance results, user interaction, and feedback can be used to improve the system in a targeted way. This can be done either periodically or in an automated, more self-learning setup, although complexity and existing legislation limit usage of the latter. | DECIDE-AI |
| Risks & ethics | R1. Define and evaluate the ethical considerations of the system, e.g. algorithmic fairness | Various frameworks are available for the responsible application of AI and ML, which provide guidance on the relevant components and methods to assess them. | DECIDE-AI, CONSORT-AI, FUTURE-AI |
| | R2. Assess the potential risks involved in the system and outline an approach to manage and mitigate them | | DECIDE-AI |
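
As an illustration of principles M2 and M3 in the table above, the following is a minimal sketch of keeping training, validation, and test data separate while logging every tuning step. Everything in it is an assumption made for illustration: the toy records, the helper names (`split_by_patient`, `fit_class_means`, `accuracy`), the 60/20/20 split, and the simple threshold classifier do not come from the cited guidelines. Splitting at the patient level, rather than the record level, is one common way to prevent records from the same patient leaking across sets.

```python
import random


def make_toy_records(n_patients=50, visits=3, seed=1):
    """Hypothetical data: (patient_id, risk_score, label), several rows per patient."""
    rng = random.Random(seed)
    records = []
    for pid in range(n_patients):
        label = pid % 2
        for _ in range(visits):
            score = rng.gauss(0.6 if label else 0.4, 0.15)
            records.append((pid, score, label))
    return records


def split_by_patient(patient_ids, fractions=(0.6, 0.2, 0.2), seed=0):
    """Assign each unique patient to exactly one of train / validation / test."""
    unique_ids = sorted(set(patient_ids))
    random.Random(seed).shuffle(unique_ids)
    n_train = int(fractions[0] * len(unique_ids))
    n_val = int(fractions[1] * len(unique_ids))
    return {
        "train": set(unique_ids[:n_train]),
        "validation": set(unique_ids[n_train:n_train + n_val]),
        "test": set(unique_ids[n_train + n_val:]),
    }


def fit_class_means(rows):
    """'Training' step: mean score of negative and positive cases."""
    neg = [score for _, score, label in rows if label == 0]
    pos = [score for _, score, label in rows if label == 1]
    return sum(neg) / len(neg), sum(pos) / len(pos)


def accuracy(threshold, rows):
    """Fraction of rows where (score >= threshold) agrees with the label."""
    hits = sum((score >= threshold) == bool(label) for _, score, label in rows)
    return hits / len(rows)


records = make_toy_records()
splits = split_by_patient([pid for pid, _, _ in records])
subset = {name: [r for r in records if r[0] in ids] for name, ids in splits.items()}

# Fit on the training set only.
mean_neg, mean_pos = fit_class_means(subset["train"])

# Tune the configuration (here: a weight between the class means) on the
# validation set only, and keep a log of every configuration tried (M3).
candidate_weights = [0.25, 0.5, 0.75]
tuning_log = []
for w in candidate_weights:
    threshold = w * mean_pos + (1 - w) * mean_neg
    tuning_log.append({"weight": w, "threshold": threshold,
                       "validation_accuracy": accuracy(threshold, subset["validation"])})
best = max(tuning_log, key=lambda entry: entry["validation_accuracy"])

# Touch the test set exactly once, after all tuning decisions are frozen (M2).
test_accuracy = accuracy(best["threshold"], subset["test"])
print(tuning_log)
print("test accuracy:", round(test_accuracy, 3))
```

In a real project the threshold classifier would be replaced by the actual model and the log would be persisted alongside the code, but the order of operations stays the same: fit on training data, choose the configuration on validation data, and report test performance once.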