The use of artificial intelligence-based chatbots in ophthalmological triage

AI-based chatbots have recently emerged as accessible resources for providing medical information to patients (5). These chatbots rely on natural language processing (NLP) and machine learning to generate human-like text responses. As they become more widely used, evaluating their accuracy is essential to help both patients and doctors make informed decisions.

Despite their widespread use, evidence-based data evaluating the scientific accuracy of these chatbots in answering patient questions are scarce. Lahat et al. (9) evaluated the performance of ChatGPT in answering patient questions in the field of gastroenterology. Their results showed that ChatGPT was able to provide accurate answers to patients’ questions in some, but not all, cases. The most accurate answers were given to questions about the treatment of specific diseases, whereas answers describing disease symptoms were the least accurate.

Our work focuses on evaluating the accuracy, completeness and clarity of AI-based chatbots in answering common patient queries in the field of ophthalmology.

Our results show that both ChatGPT and Bard can provide good, clear answers to patient questions in clinical ophthalmology. This is consistent with previous studies that found chatbots to be a promising diagnostic adjunct in ophthalmology, but not a substitute for professional ophthalmologic assessment (10,11,12).

In the current study, ChatGPT received higher mean expert-review scores than Bard for accuracy (4.0 vs. 3.0), completeness (4.5 vs. 3.0), and clarity (5.0 vs. 4.0). These differences were statistically significant and indicate that ChatGPT has a relative advantage in providing accurate, comprehensive, and clear answers to ophthalmology questions.

Other recent studies comparing Bard and ChatGPT also found that the answers given by ChatGPT were more accurate (13, 14).

In our study, eight specialists from different ophthalmological subspecialties compared the answers. Both the number of expert reviewers and their subspecialty diversity are relatively high compared with previous studies (13, 14).

Our study is not without limitations. Although the expert reviewers were blinded to the specific AI model, their assessments are inherently subjective and shaped by their own clinical knowledge and experience. In addition, the conclusions are based on a specific set of questions and might differ if the questions were worded differently. Because these AI models are continually updated, posing the same question today will not necessarily yield the same answer as when the models were first queried, nor as it will in the future. Other AI-based chatbots were not evaluated in this article, and their accuracy in answering questions in clinical ophthalmology remains to be investigated. Furthermore, we used the web interface to query the models and therefore did not evaluate hyperparameter tuning or other advanced techniques such as retrieval-augmented generation or fine-tuning; likewise, we did not examine prompt engineering, but rather used a simple, straightforward prompt. However, querying through the web interface replicates the way patients typically interact with chatbots, which is the scenario we aimed to simulate.
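To illustrate the distinction drawn above, the following minimal sketch contrasts our plain, web-style querying with an API-based query in which sampling hyperparameters are set explicitly. It is an illustrative example only and was not part of this study; it assumes the OpenAI Python client, and the model name, temperature value, and question shown are hypothetical.

```python
# Minimal, illustrative sketch (not used in this study): querying a chatbot through an
# API with explicit sampling hyperparameters, in contrast to the default web interface.
# Assumes the OpenAI Python client; model name and parameter values are hypothetical.
from openai import OpenAI

client = OpenAI()  # reads the API key from the OPENAI_API_KEY environment variable

# Example patient-style question, posed as a simple, single-turn prompt
question = "What are the symptoms of acute angle-closure glaucoma?"

response = client.chat.completions.create(
    model="gpt-4o",                                      # hypothetical model choice
    messages=[{"role": "user", "content": question}],    # no prompt engineering
    temperature=0.2,   # lower temperature reduces variability across repeated queries
    max_tokens=400,    # caps the length of the generated answer
)

print(response.choices[0].message.content)
```

Unlike the web interface, such an API call would allow the temperature and response length to be fixed across questions, which is one reason results obtained through the web interface may vary between sessions.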

In conclusion, our study highlights the potential usefulness of chatbots, particularly ChatGPT, as complementary resources for answering common patient inquiries in ophthalmology. Although these AI models show promise, the differences in their performance underscore the need for continued refinement and optimization to better match expert-level responses. Future research should focus on improving the completeness, accuracy, and clarity of AI-driven responses to meet the needs of clinical ophthalmology practice.
