A Brief Look at the History and Future of Patient-Facing Symptom Checkers

By Susan Carr, Senior Writer

Enabled by technology, we live in an increasingly self-serve society. Instead of standing in line to see the bank teller, we find the nearest ATM. We use the internet to book our own airline and theater tickets and tap our credit card to pay for the gas we pump. The trend extends to health care as well, including self-serve transactions like paying bills online and using a kiosk to check in before seeing the doctor, reviewing lab results on a patient portal, participating in patient-led disease communities, and searching online for health information and advice.

Patient-facing symptom checkers—interactive tools used by laypeople to understand their symptoms and decide what, if any, treatment they may need—are part of the self-serve movement. Available as independent, direct-to-consumer resources, or through websites or mobile apps offered by physician practices, health systems, and payers, symptom checkers typically ask the patient a series of questions about themselves and their symptoms and then produce a list of possible diagnoses and recommended next steps.

The resources and tools loosely referred to as symptom checkers vary widely in their features, approaches, and goals but generally fall into two distinct categories: structured, rules-based systems and artificial intelligence (AI) systems that use natural language processing (NLP) and machine learning. In rules-based systems, patients pick from a list of symptoms, identify the chief complaint (which symptom bothers them the most), and answer a series of questions to obtain a recommendation of where to seek care. NLP/machine learning AI systems ask patients to describe their symptoms in their own words, ask questions related to those symptoms, and provide care recommendations.

Some of the variation stems from the programs’ origins. Sue Riffel says the patient-facing symptom checker offered by Self Care Decisions, where she is CEO, was first developed by pediatrician Barton Schmitt, MD. Having watched his medical students and residents take after-hours calls from parents of sick children, Dr. Schmitt thought it would help if they had a structured approach to follow. He began writing evidence-based guidelines in the 1970s to help them ask the right questions, gather needed information, and guide parents through next steps. Current versions of Schmitt’s telephone triage guidelines are now used by nurses in call-in centers throughout the country.

By 2000, Dr. Schmitt had developed a version of triage guidelines for parents to use, which further evolved into symptom checkers offered by Self Care Decisions for adults and children. “We don’t try to diagnose,” says Riffel. “We help patients get to the right level of care and then let a professional who knows them and their health history diagnose the problem.” Self Care Decisions aims to guide people to the appropriate setting for care, taking into consideration the severity of their symptoms, how quickly they need care, and what resources they are likely to need.

Isabel Healthcare’s patient-facing symptom checker, first offered in 2012, is based on the company’s decision-support tool used by physicians to help build and broaden their differential diagnosis. The patient symptom checker runs on the same database, with adjustments to make it easier for patients to use for understanding their symptoms and getting triaged to the appropriate care setting. The Isabel engine has been evidence-based, independently validated, and continually updated since 2000. Isabel’s products accept free-text input (NLP) and use machine learning to access its curated database. Don Bauman, CEO of Isabel Healthcare USA, says Isabel has used AI for over 20 years and that the model is inherently flexible, allowing for rapid updates. When something new comes along (COVID-19, for example), Bauman says information can be added to the database without time-consuming coding and the need to build rigid rules around a new disease, where information may be limited.

Studies examine the user experience and accuracy of symptom checkers

Symptom checkers are popular, widely used, and by their nature, challenging to study. The results of one study, performed in 2018, give a detailed view of the priorities and experience of one group of users. The study was based on responses to a survey sent to a random sample of patients using Isabel’s web-based, direct-to-consumer symptom checker. Among 329 individuals who completed the survey:

76% used it to better understand what caused their symptoms
33% wanted help deciding whether to seek in-person health care
21% wanted help deciding what setting of care to visit
16% were looking for health care advice without having to see a doctor
13% were hoping to better understand a diagnosis made by their doctor

Further, 87% found using the symptom checker to be “satisfying,” and 84% found it “useful as a diagnostic tool.” Ninety-one percent said they would use it again.

The authors note that the self-reported data used in the study cannot be validated or used to assess the symptom checker’s accuracy, and the sample was “overwhelming female and white,” with a mean of 8 visits to physicians in the prior 12 months.

Researchers struggle to measure the accuracy of symptom checkers

Studies that examine the accuracy of direct-to-consumer symptom checkers typically aggregate the performance of many products and use medical vignettes—brief case studies—from medical courses or medical students to enter symptoms. The studies apply the known final diagnosis from the vignette to evaluate the triage recommendation. That approach may not closely mimic use by patients in the real world, and aggregated results may be of limited value to consumers, who are unlikely to use more than one or two symptom checkers. Non-aggregated results, available in limited form in some studies, show a wide range of performance, from impressive to disappointing.

That said, research done since 2015 has shown that performance is on average mediocre. In 2015, Harvard-based researchers found 23 symptom checkers that met their study criteria: publicly available, free, applied across a range of conditions and diseases, and available in English. They included both self-triage symptom checkers and self-diagnosis models and used 45 vignettes to test performance. In the study, the correct diagnosis was listed among the first three diagnoses in 51% of cases and among the top 20 in 58% of cases, with results varying significantly from program to program. The study also showed that on average, triage advice was “appropriate” in 57% of cases, with a range from 33% to 78%.

In 2020, other researchers revisited the 2015 study to assess whether and how performance had changed. Using the same 45 vignettes from 2015 to test 22 symptom checkers that met the original criteria, they found overall results that were roughly comparable.

The 2020 study also compared its results with a study that examined the triage skills of laypeople. Reviewing the same 45 clinical cases, 91 adults with no professional medical background chose the correct level of triage for 61% of the cases, slightly better than 58%, the result delivered overall in the 2015 study.

A systematic review published in 2022 similarly found “generally low” accuracy for both diagnosis and triage, with major variations in performance across 48 symptom checkers included in the 10 studies reviewed. The authors recommended future research focus on real-world versus simulated patient data and testing specific clinical pathways, as well as increased, ongoing regulatory review.

Other researchers surveyed the literature from 2014 to 2017 for studies that covered the performance of symptom checkers. They found many of the problems already mentioned, commenting that, overall, the current evidence base on DTC [direct-to-consumer], interactive diagnostic apps is sparse in scope, uneven in the information provided and inconclusive with respect to safety and effectiveness, with no studies of clinical risks and benefits involving real-world consumer use.

Artificial intelligence offers risk and opportunity for the future

Recent developments in artificial intelligence are raising new questions about future directions in self-serve triage and diagnosis. ChatGPT, a user-friendly chatbot developed by OpenAI, attracted great interest when it was released to the public in November 2022. It is reported to be the “fastest-growing consumer application in history,” gaining 100 million monthly active users in its first two months. The release has attracted both enthusiasm and skepticism among all stakeholders in health care.

ChatGPT accesses a large language model trained on vast amounts of text found on the internet. It accepts requests or “prompts” in natural language, infers relationships between words in sequence, and responds in conversational style, offering users engaging and seemingly natural communication. For patients searching for health information online, ChatGPT delivers answers written in full sentences and paragraphs versus the lists of relevant URLs that come from search engines such as Google.

Last month, researchers tested ChatGPT’s performance as a symptom checker, with provocative results. Ateev Mehrotra, MD, Andrew Beam, PhD, and research assistant Ruth Hailu used the 45 vignettes from the 2015 study, on which Mehrotra was a co-author, to test the chatbot’s accuracy. They report it did well, compared with the aggregated results of earlier studies of direct-to-consumer symptom checkers. Amanda Tomlinson, director of quality assurance at Isabel Healthcare, says that result is to be expected. She says, “ChatGPT is trained on freely available information it learns from the internet, and these studies and cases have been available since 2015.”

Noting that physicians’ rates of misdiagnosis are estimated to be 10% to 15%, the researchers say ChatGPT came “close to the performance of physicians in terms of diagnosis” in their study. Specifically, they found:

It listed the correct diagnosis within the top three options in 39 of the 45 vignettes (87%, beating symptom checkers’ 51%) and provided appropriate triage recommendations for 30 vignettes (67%).
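The percentages reported above follow directly from the vignette counts. As a minimal sketch, assuming the figures are simple proportions rounded to the nearest whole percent (the function and variable names here are illustrative and not taken from the study):

```python
def percent(hits: int, total: int) -> int:
    """Return hits/total as a percentage rounded to the nearest whole number."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100 * hits / total)


# Counts as reported in the article; the rounding rule is an assumption.
TOTAL_VIGNETTES = 45
top_three_pct = percent(39, TOTAL_VIGNETTES)  # correct diagnosis in the top three
triage_pct = percent(30, TOTAL_VIGNETTES)     # appropriate triage recommendation

print(f"Top-three diagnosis: {top_three_pct}%")
print(f"Appropriate triage: {triage_pct}%")
```

With only 45 vignettes, each case moves a result by a little more than two percentage points, which is one reason the small sample size matters.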

They also note the study’s limitations: the sample size was small, the vignettes artificial, and the results not generalizable.

Among its other recent accomplishments, ChatGPT performed “at or near the passing threshold of 60%” on the 2022 United States Medical Licensing Exam—an achievement that some believe reveals more about the exam's limitations than the model’s expertise.

Those and other results have made news, but experts agree that, to realize their potential in diagnosis and other clinical applications, ChatGPT and similar programs must be trained on health data and medical literature. AI models also need to overcome problems with errors and fabrications, known as “hallucinations,” which are well-known glitches in the performance of ChatGPT and other chatbots. In current form, they will not garner trust, which is paramount for participation in health care.

Despite these challenges and active debate about the ethics, equity, and safety of ChatGPT, these new developments in artificial intelligence are likely soon to have an impact on the habits of patients and consumers seeking health care advice and the symptom checkers many of them now use.
