Unlocking Insights in Thyroid Care: Applying Natural Language Processing to Electronic Health Records for Improved Patient Stratification and Clinical Decision Support
Abstract
Background: The clinical management of thyroid disorders, which range from autoimmune conditions to cancer, generates large volumes of unstructured data in Electronic Health Records (EHRs). These data, embedded in clinical notes, radiology reports, and pathology summaries, contain rich phenotypic detail that is largely inaccessible to traditional analytical methods, leaving a significant information gap for research and precision care.
Aim: This narrative review synthesizes the current landscape, methodologies, challenges, and future directions of applying Natural Language Processing (NLP) to EHR data in thyroidology. It evaluates how NLP can transform unstructured text into structured data to enhance patient stratification, support clinical decisions, and advance epidemiological research.
Methods: A comprehensive literature search was conducted across PubMed, IEEE Xplore, and ACL Anthology for studies published between 2010 and 2024, using keywords related to NLP, EHRs, and thyroid disorders. Relevant studies were selected and thematically analyzed.
Results: The review identifies key NLP architectures, from rule-based systems to deep learning models, that have been successfully applied to extract thyroid-specific concepts, automate TI-RADS scoring, predict outcomes, and identify adverse events. However, significant challenges persist, including data heterogeneity, clinical nuance, and ethical concerns regarding bias and generalizability.
Conclusion: NLP is a powerful tool for thyroid care that can unlock latent insights from EHRs. Realizing its full potential requires interdisciplinary collaboration, robust validation, and the development of standardized, ethically aware frameworks to integrate these technologies into clinical workflows and research infrastructures.
Authors
Copyright (c) 2024 Shama Rshaid Bin Zwayed, Rowaf Zghir A Alrowaili, Mohammed Abdullah Ail Al Nosyan, Ali Mohammed Kleibi, Dalia Nawash Alanzi, Nawal Ibrahim Yaqub Qadah, Hanan Abraham Al Gezani, Albandary Awadh Almutairi, Majed Mislat Eid Albaqami, Zahra Ahmed Bosilly, Mofarhe Yahya Mashike, Ghaliah Musallam Alhawit

This work is licensed under a Creative Commons Attribution 4.0 International License.
