3P Evaluating conversational generative pre-trained transformer (ChatGPT) as a tool in early breast cancer (eBC) cases
Palumbo F.;
2024
Abstract
Background: ChatGPT is a web-interface chatbot based on a large language model, tuned with machine learning and supervised techniques to mimic human conversation. It has attracted scientific attention as a possible aid in medical decision-making.

Methods: We tasked ChatGPT 4.0 with creating a multidisciplinary team (MDT) chat and provided it with clinical data from patients (pts) diagnosed with hormone receptor (HR)-positive, human epidermal growth factor receptor 2-negative eBC at intermediate clinico-pathological risk. These pts were candidates for the Oncotype DX® genomic test. Our goal was to compare our MDT recommendations with those generated by ChatGPT and to assess the consistency of its responses.

Results: We gathered data from 100 consecutive pts: median age 57, evenly split between stages I and II, 35 premenopausal. Supplying clinical details (age, stage, menopausal status, HR expression, grading, Ki-67, comorbidity), we asked ChatGPT to assess the need for Oncotype DX®. Each case was presented 9 times in separate chats to test repeatability, yielding a modal vector with a mean variation ratio of 0.181. Only in 31 pts did it consistently recommend a genomic test. Summarizing ChatGPT's most frequent advice for each patient, it recommended the genomic test for 61 pts. Next, we provided the Recurrence Scores of these 61 pts and asked for chemotherapy (CT) recommendations. The mean variation ratio in responses was 0.069. Cohen's kappa coefficient for inter-rater agreement between ChatGPT's and the actual CT recommendations was 0.62. For endocrine therapy, ChatGPT did not consider clinical risk but only menopausal status: tamoxifen if premenopausal, aromatase inhibitor if postmenopausal. When asked for concurrent CT and genomic-test advice, its responses were inconsistent, offering CT for almost all pts regardless of the genomic-testing recommendation.

Conclusions: ChatGPT is a generative model capable of producing output that attempts to capture the statistical distribution of its training data, but without reasoning abilities. Its low repeatability, together with suboptimal inter-rater agreement, means it cannot yet replace an MDT. Effective clinical integration requires identifying the areas where ChatGPT's knowledge is beneficial.
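The abstract summarizes repeatability with a per-patient variation ratio over the modal answer and agreement with Cohen's kappa. The sketch below, which is not taken from the paper and uses illustrative data and function names, shows how these two quantities are typically computed from repeated categorical responses.

```python
# Minimal sketch (assumed, not the authors' code): variation ratio and Cohen's kappa
# for repeated categorical recommendations.
from collections import Counter

def variation_ratio(responses):
    """1 - (frequency of the modal response / number of responses).
    0 means all repetitions agree; larger values mean lower repeatability."""
    counts = Counter(responses)
    modal_freq = counts.most_common(1)[0][1]
    return 1 - modal_freq / len(responses)

def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two lists of categorical labels."""
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    labels = set(rater_a) | set(rater_b)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    expected = sum((rater_a.count(l) / n) * (rater_b.count(l) / n) for l in labels)
    return (observed - expected) / (1 - expected)

# Hypothetical example: one patient's 9 repeated answers on ordering Oncotype DX
answers = ["test", "test", "no test", "test", "test", "test", "no test", "test", "test"]
print(variation_ratio(answers))   # 1 - 7/9 ≈ 0.222

# Hypothetical example: ChatGPT's CT advice vs. the MDT's decision for 5 patients
chatgpt = ["CT", "no CT", "CT", "no CT", "no CT"]
mdt     = ["CT", "no CT", "no CT", "no CT", "no CT"]
print(cohens_kappa(chatgpt, mdt))  # (0.8 - 0.56) / 0.44 ≈ 0.545
```

In the study, the mean of the per-patient variation ratios (0.181 for the genomic-test question, 0.069 for the CT question) and a single kappa across all pts (0.62) are the reported summaries.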
File | Description | Type | License | Size | Format
---|---|---|---|---|---
1-s2.0-S2059702924000267-main.pdf (open access) | 3P Evaluating conversational generative pre-trained transformer (ChatGPT) as a tool in early breast cancer (eBC) cases | Publisher's version (PDF) | Creative Commons | 83.77 kB | Adobe PDF