Jo Weeks

Background There is an increasing number of tools that use AI to carry out assessments of published clinical research. Objective We set out to address the question: “Which AI-enhanced tools have been developed or used to help with evaluation of the trustworthiness of clinical trial publications?” Methods We searched five databases (Epistemonikos, Google Scholar, PubMed, Scopus, Web of Science) for publications of tools, checklists or methods, irrespective of how many items they had. We excluded studies if they did not apply the tool to publications of randomised clinical trials. Our search was restricted to publications in English. The date of the last search was 27 March 2025. For each tool, we identified the domains and questions for which it had been used. Where reported, we extracted information on accuracy. Results We identified 16 publications describing 17 tools tackling 4 different domains (governance, plausibility, plagiarism, reporting). We found no papers/tools addressing specific questions in the domain of statistics, but one tool was used to prepare data for statistical trustworthiness assessment. Four papers checked adherence to CONSORT and PRISMA guidelines; four papers looked for evidence of manipulation or duplication of images; seven papers used various tools to look for suggestions that the publication may not have been authored by the named authors (e.g. AI-generated); one paper checked four other governance questions and two other reporting questions, and evaluated whether data could be extracted for statistical trustworthiness assessment. Conclusion If used in conjunction with traditional software/human trustworthiness checks, AI-assisted tools can be a useful aid to assessment. We suggest that assessors should have realistic expectations about the capabilities and limitations of AI.
Any AI-assisted assessment must align with established guidelines and research practices and outputs must be checked carefully, as only humans should make final judgements on clinical and methodological relevance and plausibility. With this proviso, we predict that with the increasing quality and user-friendliness of AI, and an ever-growing demand for trustworthiness assessment, the use of AI in this area will grow exponentially.

Zarko Alfirevic

BACKGROUND Historically, peer review has focused on the importance of research questions/hypotheses, appropriateness of research methods, risk of bias, and quality of writing. Until recently, issues related to trustworthiness - including but not limited to plagiarism and fraud - have been largely neglected because of a lack of awareness and a lack of adequate tools/training. We set out to identify all relevant papers that have tackled trustworthiness assessment, in order to identify the key domains that have been suggested as an integral part of any such assessment. METHODS We searched the literature for publications of tools, checklists or methods used or proposed for the assessment of the trustworthiness of randomised trials. Data items (questions) were extracted from the included publications and transcribed into Excel, including the domain of assessment as described in the original publication. Both authors then independently assessed each data item to see if the original domain(s) could be re-categorised into 5 domains (governance, plausibility, plagiarism, reporting, statistics). RESULTS From the 41 publications we extracted a total of 284 questions and framed 77 summary questions grouped in 5 domains: governance (13 questions); plausibility (16 questions); plagiarism (4 questions); reporting (28 questions); and statistics (16 questions). CONCLUSION The proposed menu of domains and questions should encourage peer reviewers, editors, systematic reviewers and developers of guidelines to engage in a more formal trustworthiness assessment. Methodologists should aim to identify the domains and questions that should be considered mandatory, those that are optional depending on the resources available, and those that could be discarded because of lack of discriminatory power.

Zarko Alfirevic

BACKGROUND There is increasing concern that a significant proportion of randomised controlled trials (RCTs) included in Cochrane reviews may not be trustworthy. Applying a trustworthiness screening tool (TST) has already had a clinically important effect on several reviews published by the Cochrane Pregnancy and Childbirth Group. OBJECTIVES We wanted to assess the impact of removing untrustworthy RCTs from already-published Cochrane reviews in a defined clinical area (ante- and post-natal nutritional interventions). METHODS We applied the tool to 18 Cochrane reviews (375 RCTs). The tool had four domains: i) is the research governance trustworthy; ii) are the baseline characteristics trustworthy; iii) is the study feasible; iv) are the results plausible? When additional information was needed, authors were contacted using a standard template, with at least two attempts made to reach them. At the end of the evaluation process each study was classified as: i) included (YES to all domains); ii) excluded (retracted study); or iii) awaiting classification (any NO to the TST questions). RESULTS 95/375 studies (25%) were removed, affecting 14/18 (78%) reviews. 13/18 reviews (72%) showed a difference in the Summary of Findings tables (direction and size of effects and/or GRADE ratings). 6/18 Cochrane reviews (33%) were judged to require updating because of important differences in their conclusions, implications for practice, and/or implications for research. CONCLUSIONS Formal assessment of trustworthiness, and inclusion only of studies that satisfy prespecified criteria for trustworthiness, affect conclusions in a relatively large number of Cochrane reviews, with potentially important implications for clinical practice and research. The lack of consensus regarding the best tool(s) for assessing trustworthiness cannot be an excuse for ignoring this issue in future Cochrane reviews.