Presentation + Paper
Leveraging LLMs like ChatGPT for robust quality checks and medical text agreement rationale: enhancing adjudication quality and alignment in BICR for oncology clinical trials
2 April 2024
Abstract
Blinded Independent Central Review (BICR) is recommended by the US FDA for registration of oncology trials because it avoids bias in image assessment and leaves no chance of unblinding patient data. A double read with adjudication is used to reduce variability in endpoint assessment. When the two readers disagree, a third reader, called an adjudicator, reviews both radiologists' assessments and decides which is more accurate. Adjudication Rate (AR) and Adjudicator Agreement Rate (AAR) are the two indicators used to evaluate reviewer performance and overall trial variability and quality. Sentiment Analysis (SA) is based on natural language processing and can tag data as ‘positive’, ‘negative’, or ‘neutral’, although current technologies can provide a more complex analysis of the emotions in written text. Medical SA can analyze patients’ and doctors’ opinions, sentiments, attitudes, and emotions in a clinical context. Python, the programming language most frequently used for deep learning worldwide, and ChatGPT, an AI-based chatbot, can be used to assess the quality of adjudicator comments through sentiment analysis. If successful, this analysis could open another novel application of Large Language Models (LLMs) such as ChatGPT in clinical research and medical imaging. This prospective study involved the review of cases for 100 subjects by board-certified radiologists using the Response Evaluation Criteria in Solid Tumors (RECIST) 1.1. The study employed a double read with adjudication paradigm in a central imaging review setup. The agreement of adjudication was assessed and compared with the overall response, agreed reader, and medical text. The medical text entered by the adjudicator is usually a free-text field that lacks standardization and control over its content, which may affect its correlation with the reviewer selected for agreement.
Although uncommon, adjudicator errors can occur due to ambiguous text, mis-clicks, or application delays. To analyze the adjudicator’s comments, sentiment analysis was conducted using a Python plug-in with ChatGPT as the large language model. Based on this analysis, each subject was categorized as either “Potential Error” or “No Error”. The ChatGPT-supported algorithm was evaluated against a gold standard determined by a board-certified radiologist with over 20 years of experience in the BICR process. A comparison to assess accuracy and reproducibility showed that only four of the 100 subjects had different outcomes. The sensitivity was 0.857, the specificity 1.0, and the accuracy 0.96. The Natural Language Processing (NLP) capabilities of ChatGPT are evident in its ability to classify sentiment as positive, negative, or neutral from the free-text adjudicator comments provided during the review process. This classification enables a comparison with the actual assessment, adjudicator agreement, and overall patient outcome, which highlights the performance of ChatGPT on this task.
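The reported metrics can be checked with a short calculation. The abstract gives only the totals (100 subjects, four discordant) and the three rates, not the underlying confusion matrix, so the counts below are an inference rather than reported data. They assume that “Potential Error” is the positive class and that a specificity of exactly 1.0 means there were no false positives, which would make all four discordant subjects false negatives. Under those assumptions, 24 true positives and 72 true negatives appear to be the only whole-number counts that reproduce the published figures. A minimal sketch in Python:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts of algorithm output versus the gold standard.

    Positive class = "Potential Error" (an assumption, not stated in the abstract).
    """
    tp: int  # flagged by the algorithm, confirmed by the gold standard
    fn: int  # missed by the algorithm, flagged by the gold standard
    fp: int  # flagged by the algorithm, cleared by the gold standard
    tn: int  # cleared by both

    @property
    def total(self) -> int:
        return self.tp + self.fn + self.fp + self.tn


def sensitivity(c: ConfusionCounts) -> float:
    """True positive rate: TP / (TP + FN)."""
    return c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    """True negative rate: TN / (TN + FP)."""
    return c.tn / (c.tn + c.fp)


def accuracy(c: ConfusionCounts) -> float:
    """Share of subjects where algorithm and gold standard agree."""
    return (c.tp + c.tn) / c.total


# Inferred counts (hypothetical reconstruction, not taken from the paper).
counts = ConfusionCounts(tp=24, fn=4, fp=0, tn=72)

print(f"subjects:    {counts.total}")
print(f"discordant:  {counts.fn + counts.fp}")
print(f"sensitivity: {sensitivity(counts):.3f}")
print(f"specificity: {specificity(counts):.3f}")
print(f"accuracy:    {accuracy(counts):.2f}")
```

If the full paper reports a different breakdown, only the `counts` line needs to change.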
(2024) Published by SPIE. Downloading of the abstract is permitted for personal use only.
Manish Sharma, Samira Farough, Andre Burkett, Jerome Prasanth, Nabil El-Shafeey, Dominic Zygadlo, Chera Dunn, and Ron Korn "Leveraging LLMs like ChatGPT for robust quality checks and medical text agreement rationale: enhancing adjudication quality and alignment in BICR for oncology clinical trials", Proc. SPIE 12931, Medical Imaging 2024: Imaging Informatics for Healthcare, Research, and Applications, 1293103 (2 April 2024); https://doi.org/10.1117/12.3009153
KEYWORDS
Error analysis
Clinical trials
Oncology
Tumors
Analytical research
Education and training
Radiology