Evaluating ChatGPT’s Utility in Biologic Therapy for Systemic Lupus Erythematosus: Comparative Study of ChatGPT and Google Web Search
Background: Systemic lupus erythematosus (SLE) is a life-threatening, multisystem autoimmune disease, and biologic therapy is a promising treatment for it. However, public understanding of this therapy remains insufficient, and the quality of related information on the internet varies widely, which affects patients' acceptance of the treatment. The effectiveness of AI technologies such as ChatGPT in disseminating knowledge within health care has attracted significant attention, and research on ChatGPT's utility in answering questions about biologic therapy for SLE could help promote awareness of this treatment.

Objective: This study aims to evaluate ChatGPT's utility as a tool for obtaining health information about biologic therapy for SLE online.

Methods: We extracted 20 common questions related to biologic therapy for SLE, their corresponding answers, and the sources of those answers from both Google Web Search and ChatGPT-4o. Based on Rothwell's classification, the questions were categorized into three types: fact, policy, and value. Answer sources were classified into five categories: commercial, academic, medical practice, government, and social media. The accuracy and completeness of the answers were assessed using Likert scales, and readability was evaluated using the Flesch Reading Ease (FRE) and Flesch-Kincaid Grade Level (FKGL) scores.

Results: For ChatGPT-4o, fact questions accounted for the largest share (10/20), followed by policy (7/20) and value (3/20); for Google Web Search, fact questions likewise predominated (12/20), followed by value (5/20) and policy (3/20). ChatGPT-4o's answers cited 48 sources, most of them academic (29/48), while Google Web Search's answers came from 20 sources distributed evenly across all five categories.
For accuracy, ChatGPT-4o's mean score (5.83 ± 0.49) was higher than Google Web Search's (4.75 ± 0.94), with a mean difference of 1.08 (95% CI: 0.61-1.54). For completeness, ChatGPT-4o's mean score (2.88 ± 0.32) was also higher than Google Web Search's (1.68 ± 0.69), with a mean difference of 1.20 (95% CI: 0.96-1.44). For readability, the FRE and FKGL scores were 11.7 and 14.9 for ChatGPT-4o and 16.2 and 20 for Google Web Search, respectively, indicating that both sets of answers were highly difficult to read and required college graduate-level reading proficiency. When ChatGPT was asked to respond at a sixth-grade reading level, the readability of its answers improved significantly.

Conclusions: ChatGPT's answers were accurate, rigorous, supported by comprehensive and professional materials, and demonstrated humanistic care. However, its responses are difficult to read, requiring users to have a college-level education. Given the study's limitations in question scope, comparison dimensions, research perspectives, and languages covered, further in-depth comparative research is recommended.
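The FRE and FKGL metrics used above are standard readability formulas computed from word, sentence, and syllable counts. A minimal sketch of the standard formulas follows; the counts in the usage example are hypothetical and are not data from this study:

```python
def flesch_reading_ease(words, sentences, syllables):
    """Flesch Reading Ease: higher scores indicate easier text.
    Scores below ~30 correspond to college graduate-level difficulty."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)

def flesch_kincaid_grade(words, sentences, syllables):
    """Flesch-Kincaid Grade Level: approximate U.S. school grade
    needed to understand the text."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59

# Hypothetical counts for a 100-word passage (illustrative only)
fre = flesch_reading_ease(words=100, sentences=5, syllables=170)
fkgl = flesch_kincaid_grade(words=100, sentences=5, syllables=170)
print(round(fre, 2), round(fkgl, 2))  # → 42.71 12.27
```

Both formulas penalize long sentences (words per sentence) and polysyllabic vocabulary (syllables per word), which is why dense clinical prose about biologic therapy tends to score as college-level reading.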