IKAR: An Interdisciplinary Knowledge-Based Automatic Retrieval Method from Chinese Electronic Medical Record
Abstract
1. Introduction
- Suitable ultrasound reports in obstetrics and gynecology are difficult to obtain. Training the model requires a large number of labeled datasets. Available Chinese datasets for public access are not found on the internet.
- The majority of ultrasound reports are unstructured text. Although a considerable quantity of relevant medical data is stored, the lack of a standard structural framework and the presence of flaws such as improper grammar, spelling errors, and semantic ambiguity make data processing and analysis more difficult.
- Traditional diagnostic approaches depend primarily on the healthcare professional’s judgment, which might be subjective at times. Two doctors may make different diagnoses based on their expertise and experience if there is no gold standard or predetermined level of agreement on diagnostic criteria [16].
- Because a diagnosis is inferred from the doctor’s knowledge and experience, the ultrasound diagnosis may contain words that do not appear in the ultrasound descriptions. Table 1 shows three cases of ultrasound reports: (1) the diagnostic results can be extracted directly from the report; (2) the diagnostic results are not in the report and are obtained by inference; (3) part of the diagnostic results is in the report and the other part is not. The left column shows the ultrasound descriptions summarized by the sonographer from the ultrasound images, and the right column shows the diagnostic result. Text in red marks words that are essentially the same in the ultrasound descriptions and the ultrasound diagnosis. Text in blue marks diagnostic wording that summarizes the corresponding descriptions without reusing their original words.
- We constructed and published a fully open dataset containing 180,000 Chinese obstetric and gynecologic ultrasound reports. Our dataset is available on GitHub at https://github.com/rachel12308/Chinese-OB-GYN-report-dataset (accessed on 6 June 2022).
- We proposed an interdisciplinary knowledge-based automatic retrieval method (IKAR) for obstetric and gynecological ultrasound, which automatically generates the ultrasound diagnosis from the ultrasonic descriptions. We applied the model to the hospital dataset to verify its effectiveness and efficiency experimentally. The results show that the model achieves an accuracy, recall, and F-score of around 90%.
- We carried out a detailed analysis of the dataset and proposed several targeted approaches to address the challenges encountered in the Chinese diagnostic task. These approaches reduce errors and significantly improve inference performance.
2. Related Work
3. Dataset Construction
3.1. Data Collection
3.2. Preprocessing
- Step 1. Dealing with typos in texts.
- Step 2. Dealing with redundant texts in the report. Ultrasound is only an ancillary item to help the doctor make a diagnosis. As a result, there are many suggestive phrases in the ultrasound report, such as “re-evaluation after 2–3 weeks” or “please correlate with clinical finding”. These statements were not useful to the clinical support system in our task, so 56 similar suggestive statements were eliminated.
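
Step 2 can be illustrated with a minimal sketch. It assumes the suggestive statements are removed by exact string matching. The phrase list here is hypothetical except for “请结合临床”, which appears in Table 1; the source reports 56 such statements but does not list them.

```python
import re

# Hypothetical phrase list: only "请结合临床" is taken from the source (Table 1).
SUGGESTIVE_PHRASES = [
    "请结合临床",        # "please correlate with clinical finding"
    "建议2-3周后复查",   # assumed wording for "re-evaluation after 2-3 weeks"
]

def strip_suggestive(report, phrases=SUGGESTIVE_PHRASES):
    """Remove suggestive statements and the comma left dangling before a full stop."""
    for phrase in phrases:
        report = report.replace(phrase, "")
    # Drop a comma (ASCII or full-width) that now sits directly before a full stop.
    report = re.sub(r"[,，]\s*(?=[。.])", "", report)
    return report.strip(" ,，")

cleaned = strip_suggestive("宫内单活胎, 胎儿脐带绕颈一周, 请结合临床。")
```

Exact matching is a deliberate simplification: a small, fixed list of statements rarely needs anything fuzzier, but typos handled in Step 1 would have to be corrected first for the match to succeed.
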
3.3. Named Entity Recognition
- First, ultrasound reports contain a large number of medical terms. Because specialized vocabulary occurs far less frequently than common vocabulary, the NER tool may segment it incorrectly. For example, the sentence “宫腔线清 (The endometrium is clearly visible)” may be split into “宫腔” and “线清”, whereas the correct result is “宫腔线” and “清”.
- Second, ultrasound reports are relatively similar in content and often use repeated words. Only 3763 words are used in the ultrasound descriptions of this dataset, while only 498 words are used in the ultrasound findings.
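
The segmentation problem described in the first point can be reproduced with a toy dictionary-based segmenter. This sketch uses forward maximum matching, which is not the tool the authors used; both vocabularies are hypothetical and serve only to show how adding the medical term “宫腔线” changes the split.

```python
def forward_max_match(text, vocab, max_len=4):
    """Greedy segmentation: at each position take the longest word found in vocab."""
    tokens = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # A single character is always accepted as a fallback.
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens

# Hypothetical vocabularies for illustration only.
general_vocab = {"宫腔", "线清"}
medical_vocab = general_vocab | {"宫腔线"}

wrong = forward_max_match("宫腔线清", general_vocab)
right = forward_max_match("宫腔线清", medical_vocab)
```

With the general vocabulary the sentence is split as “宫腔” + “线清”; once the domain term is in the dictionary the split becomes “宫腔线” + “清”.
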
3.4. Data Analysis
4. IKAR Method
4.1. Task
4.2. Implementation of the System
4.2.1. Word Embedding
4.2.2. Sequence-to-Sequence Model
4.2.3. Relation Extraction
4.3. Extrinsic Evaluation Framework
4.3.1. Synonyms Handling
Algorithm 1: Pattern-Matching Algorithm.
4.3.2. Probabilistic Accuracy
4.3.3. Evaluation Metrics
Algorithm 2: Probabilistic accuracy.
5. Experiments
5.1. The Basic Sequence-to-Sequence Models
- seq2seq+RNN is the earliest seq2seq model [47]. It uses one RNN as the encoder and another RNN as the decoder. The model can score a given pair of sequences and can also generate a target sequence from a source sequence.
- seq2seq+LSTM [48] replaces the RNN module of the above model with an LSTM module. Complete sentences are used for training instead of just phrases.
- seq2seq+copyRnn [49] was the first to apply the encoder–decoder structure to the keyword generation problem. By adding a copy mechanism to the RNN, it helps the model predict words that occur only rarely in the original text.
- seq2seq+Reinforcement Learning [50] introduces reinforcement learning to the keyword generation task for the first time. The model is set up with an adaptive reward function. The function first uses the recall value as a reward to ensure that a sufficient number of keywords is generated.
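
To make the copy mechanism mentioned for seq2seq+copyRnn more concrete, the sketch below combines generation scores over a fixed vocabulary with copy scores over source positions in one shared softmax. This is a common formulation of copying and is an assumption here; the exact scoring functions of [49] are not reproduced, and all names and scores are hypothetical.

```python
import math

def copy_augmented_distribution(gen_scores, copy_scores, source_tokens):
    """Return p(token) from a softmax shared by generation and copy scores.

    gen_scores: dict mapping each vocabulary token to a generation score.
    copy_scores: list with one copy score per source position.
    source_tokens: list of source tokens aligned with copy_scores.
    """
    mass = {tok: math.exp(score) for tok, score in gen_scores.items()}
    # A source token gains mass from every position where it occurs,
    # even if it is missing from the fixed vocabulary.
    for tok, score in zip(source_tokens, copy_scores):
        mass[tok] = mass.get(tok, 0.0) + math.exp(score)
    total = sum(mass.values())
    return {tok: m / total for tok, m in mass.items()}

dist = copy_augmented_distribution(
    gen_scores={"子宫": 0.0, "正常": 0.0},
    copy_scores=[1.0, 0.0],
    source_tokens=["脐带", "子宫"],
)
```

In this toy call “脐带” is absent from the vocabulary yet still receives probability, which is the behavior that lets a copying decoder emit rare words from the source.
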
5.2. Pattern-Matching Algorithm
5.3. Individual Examination Items
5.4. Limitations
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
RF | Random Forest |
EMR | Electronic Medical Record |
SVM | Support Vector Machine |
LR | Logistic Regression |
CNN | Convolutional Neural Networks |
UIMA | Unstructured Information Management Architecture |
UMLS | Unified Medical Language System |
NER | Named Entity Recognition |
References
- Wang, Y.; Wang, L.; Rastegar-Mojarad, M.; Moon, S.; Shen, F.; Afzal, N.; Liu, S.; Zeng, Y.; Mehrabi, S.; Sohn, S.; et al. Clinical information extraction applications: A literature review. J. Biomed. Inform. 2018, 77, 34–49.
- Chen, C.H.; Hsieh, J.G.; Cheng, S.L.; Lin, Y.L.; Lin, P.H.; Jeng, J.H. Emergency department disposition prediction using a deep neural network with integrated clinical narratives and structured data. Int. J. Med. Inform. 2020, 139, 104146.
- Arnaud, É.; Elbattah, M.; Gignon, M.; Dequen, G. Deep learning to predict hospitalization at triage: Integration of structured data and unstructured text. In Proceedings of the 2020 IEEE International Conference on Big Data (Big Data), Atlanta, GA, USA, 10–13 December 2020; pp. 4836–4841.
- Carchiolo, V.; Longheu, A.; Reitano, G.; Zagarella, L. Medical prescription classification: A NLP-based approach. In Proceedings of the 2019 Federated Conference on Computer Science and Information Systems (FedCSIS), Leipzig, Germany, 1–4 September 2019; pp. 605–609.
- Roch, A.M.; Mehrabi, S.; Krishnan, A.; Schmidt, H.E.; Kesterson, J.; Beesley, C.; Dexter, P.R.; Palakal, M.; Schmidt, C.M. Automated pancreatic cyst screening using natural language processing: A new tool in the early detection of pancreatic cancer. HPB 2015, 17, 447–453.
- Sances, G.; Larizza, C.; Gabetta, M.; Bucalo, M.; Guaschino, E.; Milani, G.; Cereda, C.; Bellazzi, R. Application of bioinformatics in headache: The I2B2-pavia project. J. Headache Pain 2010, 11, S134–S135.
- Li, X.; Wang, H.; He, H.; Du, J.; Chen, J.; Wu, J. Intelligent diagnosis with Chinese electronic medical records based on convolutional neural networks. BMC Bioinform. 2019, 20, 62.
- Cai, T.; Giannopoulos, A.A.; Yu, S.; Kelil, T.; Ripley, B.; Kumamaru, K.K.; Rybicki, F.J.; Mitsouras, D. Natural language processing technologies in radiology research and clinical applications. Radiographics 2016, 36, 176–191.
- Liu, H.; Xu, Y.; Zhang, Z.; Wang, N.; Huang, Y.; Hu, Y.; Yang, Z.; Jiang, R.; Chen, H. A natural language processing pipeline of Chinese free-text radiology reports for liver cancer diagnosis. IEEE Access 2020, 8, 159110–159119.
- Castro, S.M.; Tseytlin, E.; Medvedeva, O.; Mitchell, K.; Visweswaran, S.; Bekhuis, T.; Jacobson, R.S. Automated annotation and classification of BI-RADS assessment from radiology reports. J. Biomed. Inform. 2017, 69, 177–187.
- Lakhani, P.; Kim, W.; Langlotz, C.P. Automated detection of critical results in radiology reports. J. Digit. Imaging 2012, 25, 30–36.
- Yetisgen-Yildiz, M.; Gunn, M.L.; Xia, F.; Payne, T.H. A text processing pipeline to extract recommendations from radiology reports. J. Biomed. Inform. 2013, 46, 354–362.
- Dutta, S.; Long, W.J.; Brown, D.F.; Reisner, A.T. Automated detection using natural language processing of radiologists recommendations for additional imaging of incidental findings. Ann. Emerg. Med. 2013, 62, 162–169.
- Peng, F.; Feng, F.; McCallum, A. Chinese segmentation and new word detection using conditional random fields. In Proceedings of the COLING 2004: 20th International Conference on Computational Linguistics, Geneva, Switzerland, 23–27 August 2004; pp. 562–568.
- Zheng, X.; Chen, H.; Xu, T. Deep learning for Chinese word segmentation and POS tagging. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, Seattle, WA, USA, 18–21 October 2013; pp. 647–657.
- Schiff, G.D.; Hasan, O.; Kim, S.; Abrams, R.; Cosby, K.; Lambert, B.L.; Elstein, A.S.; Hasler, S.; Kabongo, M.L.; Krosnjar, N.; et al. Diagnostic error in medicine: Analysis of 583 physician-reported errors. Arch. Intern. Med. 2009, 169, 1881–1887.
- Savova, G.K.; Fan, J.; Ye, Z.; Murphy, S.P.; Zheng, J.; Chute, C.G.; Kullo, I.J. Discovering peripheral arterial disease cases from radiology notes using natural language processing. In Proceedings of the AMIA Annual Symposium Proceedings. American Medical Informatics Association, Washington, DC, USA, 13–17 November 2010; Volume 2010, p. 722.
- Tian, Z.; Sun, S.; Eguale, T.; Rochefort, C.M. Automated extraction of VTE events from narrative radiology reports in electronic health records: A validation study. Med. Care 2017, 55, e73.
- Hinz, E.R.M.; Bastarache, L.; Denny, J.C. A natural language processing algorithm to define a venous thromboembolism phenotype. In Proceedings of the AMIA Annual Symposium Proceedings. American Medical Informatics Association, Washington, DC, USA, 16–20 November 2013; Volume 2013, p. 975.
- Afzal, N.; Sohn, S.; Abram, S.; Liu, H.; Kullo, I.J.; Arruda-Olson, A.M. Identifying peripheral arterial disease cases using natural language processing of clinical notes. In Proceedings of the 2016 IEEE-EMBS International Conference on Biomedical and Health Informatics (BHI), Las Vegas, NV, USA, 24–27 February 2016; pp. 126–131.
- Kim, Y.; Garvin, J.; Heavirland, J.; Meystre, S.M. Improving heart failure information extraction by domain adaptation. In MEDINFO 2013; IOS Press: Amsterdam, The Netherlands, 2013; pp. 185–189.
- Chen, M.C.; Ball, R.L.; Yang, L.; Moradzadeh, N.; Chapman, B.E.; Larson, D.B.; Langlotz, C.P.; Amrhein, T.J.; Lungren, M.P. Deep learning to classify radiology free-text reports. Radiology 2018, 286, 845–852.
- Fu, S.; Leung, L.Y.; Wang, Y.; Raulli, A.O.; Kallmes, D.F.; Kinsman, K.A.; Nelson, K.B.; Clark, M.S.; Luetmer, P.H.; Kingsbury, P.R.; et al. Natural language processing for the identification of silent brain infarcts from neuroimaging reports. JMIR Med. Inform. 2019, 7, e12109.
- Zhou, X.; Wang, Y.; Sohn, S.; Therneau, T.M.; Liu, H.; Knopman, D.S. Automatic extraction and assessment of lifestyle exposures for Alzheimer’s disease using natural language processing. Int. J. Med. Inform. 2019, 130, 103943.
- Ludvigsson, J.F.; Pathak, J.; Murphy, S.; Durski, M.; Kirsch, P.S.; Chute, C.G.; Ryu, E.; Murray, J.A. Use of computerized algorithm to identify individuals in need of testing for celiac disease. J. Am. Med. Inform. Assoc. 2013, 20, e306–e310.
- Kehl, K.L.; Elmarakeby, H.; Nishino, M.; Van Allen, E.M.; Lepisto, E.M.; Hassett, M.J.; Johnson, B.E.; Schrag, D. Assessment of deep natural language processing in ascertaining oncologic outcomes from radiology reports. JAMA Oncol. 2019, 5, 1421–1429.
- Drozdov, I.; Forbes, D.; Szubert, B.; Hall, M.; Carlin, C.; Lowe, D.J. Supervised and unsupervised language modelling in Chest X-Ray radiological reports. PLoS ONE 2020, 15, e0229963.
- Wood, D.A.; Lynch, J.; Kafiabadi, S.; Guilhem, E.; Al Busaidi, A.; Montvila, A.; Varsavsky, T.; Siddiqui, J.; Gadapa, N.; Townend, M.; et al. Automated Labelling using an Attention model for Radiology reports of MRI scans (ALARM). In Proceedings of the Medical Imaging with Deep Learning, PMLR, Lima, Peru, 4 October 2020; pp. 811–826.
- Bressem, K.K.; Adams, L.C.; Gaudin, R.A.; Tröltzsch, D.; Hamm, B.; Makowski, M.R.; Schüle, C.Y.; Vahldiek, J.L.; Niehues, S.M. Highly accurate classification of chest radiographic reports using a deep learning natural language model pre-trained on 3.8 million text reports. Bioinformatics 2020, 36, 5255–5261.
- Smit, A.; Jain, S.; Rajpurkar, P.; Pareek, A.; Ng, A.Y.; Lungren, M.P. CheXbert: Combining automatic labelers and expert annotations for accurate radiology report labeling using BERT. arXiv 2020, arXiv:2004.09167.
- Bozkurt, S.; Alkim, E.; Banerjee, I.; Rubin, D.L. Automated detection of measurements and their descriptors in radiology reports using a hybrid natural language processing algorithm. J. Digit. Imaging 2019, 32, 544–553.
- Warner, J.L.; Levy, M.A.; Neuss, M.N. ReCAP: Feasibility and accuracy of extracting cancer stage information from narrative electronic health record data. J. Oncol. Pract. 2016, 12, 157–158.
- Mehrabi, S.; Krishnan, A.; Roch, A.M.; Schmidt, H.; Li, D.; Kesterson, J.; Beesley, C.; Dexter, P.; Schmidt, M.; Palakal, M.; et al. Identification of patients with family history of pancreatic cancer-Investigation of an NLP System Portability. Stud. Health Technol. Inform. 2015, 216, 604.
- Farrugia, H.; Marr, G.; Giles, G. Implementing a natural language processing solution to capture cancer stage and recurrence. In Proceedings of the European Congress of Radiology-RANZCR-AOCR 2012, Sydney, Australia, 30 August–2 September 2012.
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. arXiv 2017, arXiv:1706.03762.
- Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv 2018, arXiv:1810.04805.
- Johnson, A.E.; Pollard, T.J.; Greenbaum, N.R.; Lungren, M.P.; Deng, C.y.; Peng, Y.; Lu, Z.; Mark, R.G.; Berkowitz, S.J.; Horng, S. MIMIC-CXR-JPG, a large publicly available database of labeled chest radiographs. arXiv 2019, arXiv:1901.07042.
- Huang, Z.; Xu, W.; Yu, K. Bidirectional LSTM-CRF models for sequence tagging. arXiv 2015, arXiv:1508.01991.
- Cao, P.; Chen, Y.; Liu, K.; Zhao, J.; Liu, S. Adversarial transfer learning for Chinese named entity recognition with self-attention mechanism. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, 31 October–4 November 2018; pp. 182–192.
- Dai, X.; Karimi, S.; Hachey, B.; Paris, C. Using similarity measures to select pretraining data for NER. arXiv 2019, arXiv:1904.00585.
- Xie, H. Ultrasonographic Diagnosis in Obstetrics and Gynecology; People’s Medical Publishing House: Beijing, China, 2005.
- Bahdanau, D.; Cho, K.; Bengio, Y. Neural machine translation by jointly learning to align and translate. arXiv 2014, arXiv:1409.0473.
- Takase, S.; Kiyono, S. Lessons on parameter sharing across layers in transformers. arXiv 2021, arXiv:2104.06022.
- Vaage, A.B.; Tingvold, L.; Hauff, E.; Van Ta, T.; Wentzel-Larsen, T.; Clench-Aas, J.; Thomsen, P.H. Better mental health in children of Vietnamese refugees compared with their Norwegian peers-a matter of cultural difference? Child Adolesc. Psychiatry Ment. Health 2009, 3, 1–9.
- Takase, S.; Kiyono, S. Rethinking perturbations in encoder-decoders for fast training. arXiv 2021, arXiv:2104.01853.
- Wang, Y.; Mehrabi, S.; Sohn, S.; Atkinson, E.J.; Amin, S.; Liu, H. Natural language processing of radiology reports for identification of skeletal site-specific fractures. BMC Med. Inform. Decis. Mak. 2019, 19, 73.
- Cho, K.; Van Merriënboer, B.; Gulcehre, C.; Bahdanau, D.; Bougares, F.; Schwenk, H.; Bengio, Y. Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv 2014, arXiv:1406.1078.
- Sutskever, I.; Vinyals, O.; Le, Q.V. Sequence to sequence learning with neural networks. Advances in Neural Information Processing Systems. 2014. Available online: https://proceedings.neurips.cc/paper/2014/file/a14ac55a4f27472c5d894ec1c3c743d2-Paper.pdf (accessed on 15 February 2020).
- Meng, R.; Zhao, S.; Han, S.; He, D.; Brusilovsky, P.; Chi, Y. Deep keyphrase generation. arXiv 2017, arXiv:1704.06879.
- Chan, H.P.; Chen, W.; Wang, L.; King, I. Neural keyphrase generation via reinforcement learning with adaptive rewards. arXiv 2019, arXiv:1906.04106.
- Paszke, A.; Gross, S.; Massa, F.; Lerer, A.; Bradbury, J.; Chanan, G.; Killeen, T.; Lin, Z.; Gimelshein, N.; Antiga, L.; et al. Pytorch: An imperative style, high-performance deep learning library. Advances in Neural Information Processing Systems. 2019. Available online: https://proceedings.neurips.cc/paper/2019/file/bdbca288fee7f92f2bfa9f7012727740-Paper.pdf (accessed on 15 February 2020).
- Sun, J. Jieba (Chinese for “to Stutter”) Chinese Text Segmentation: Built to Be the Best Python Chinese Word Segmentation Module. Available online: https://github.com/fxsjy/jieba (accessed on 15 February 2020).
- Arimura, H. Image-Based Computer-Assisted Radiation Therapy; Springer: Berlin/Heidelberg, Germany, 2017.
- Gupta, K.K.; Dhanda, N.; Kumar, U. A comparative study of medical image segmentation techniques for brain tumor detection. In Proceedings of the 2018 4th International Conference on Computing Communication and Automation (ICCCA), Greater Noida, India, 14–15 December 2018; pp. 1–4.
Table 1. Three cases of ultrasound reports.

Ultrasonic Descriptions | Ultrasound Diagnosis |
---|---|
子宫前位, 正常大, 宫腔线清, 内膜厚1.2 cm, 子宫肌层回声均匀。 双卵巢正常大。 CDFI: 未见异常血流信号。 (Uterus anteverted with normal size. The endometrium is clearly visible. Endometrial thickness is 1.2 cm. No abnormality seen in the myometrial echo. The size of left and right ovary is normal. CDFI: No abnormal blood flow signal was observed.) | 子宫正常大 (The size of the uterus is normal.) |
子宫前位, 正常大, 宫腔线清, 内膜厚0.5 cm, 宫壁回声不均匀。 子宫前壁见4.0*4.1 cm低回声结节。 双卵巢正常大, CDFI: 未见异常血流信号。 (Uterus anteverted with normal size. The endometrium is clearly visible. Endometrial thickness is 0.5 cm. Abnormality seen in the myometrial echo. A 4.0∗4.1 cm hypoechoic nodule is seen in the anterior uterus wall. The size of left and right ovary is normal. CDFI: No abnormal blood flow signal was observed.) | 子宫肌瘤 (Fibroid) |
宫内单胎。 胎位: 头位; 胎心: 145次/分。 AFI: 10 cm; BPD: 5.2 cm; HC: 19.0 cm; AC: 17.2 cm; FL: 3.7 cm。 胎盘: 前壁0级。 胎儿描述: 头颅: 颅骨光环完整, 其内结构未见明显异常。 眼: 可见。 上唇: 皮肤回声连续, 未见明显异常。 四腔心: 可见。 胃泡: 显示。 膀胱: 显示。 双肾: 大小正常。 胎儿其他情况: 胎儿颈部可见“U”形压迹。 孕妇情况: 双卵巢未显示, 双附件区未及明显包块。 (Single intrauterine fetus. Vertex presentation at the time of scan. Fetal heart rate is about 145 beats/min. AFI: 10 cm; BPD: 5.2 cm; HC: 19.0 cm; AC: 17.2 cm; FL: 3.7 cm. Placenta: anterior wall, grade 0. Fetal description: Head: The cranial halo is intact and its internal structure shows no obvious abnormalities. Eyes: Visible. Upper lip: Continuous skin echogenicity, no significant abnormalities. Four-chamber view of the heart: Visible. Stomach bubble: Visible. Urinary bladder: Visible. Both kidneys: Normal size. Other conditions of the fetus: “U”-shaped pressure traces can be seen on the fetal neck. Maternal condition: Neither ovary was visualized. No obvious masses in the bilateral annex area.) | 宫内单活胎, 胎儿脐带绕颈一周, 请结合临床。 (Single viable fetus. Fetal umbilical cord wrapped around the neck once. Please correlate with clinical finding.) |
Application Areas | Methods | No. of Papers |
---|---|---|
Circulatory system diseases | cTAKES + Self-designed NLP Algorithm [17]; Rule-based algorithm [18]; KnowledgeMap Concept Identifier (KMCI) [19]; MedTagger + Self-designed NLP Algorithm [20]; CUIMANDREef + Self-designed Algorithm [21]; CNN vs. traditional NLP models [22] | 7 |
Nervous system diseases | RF/SVM/LR/CNN + Rule-based Algorithm [23]; MetaMap + Build Dictionary [24] | 2 |
Digestive system diseases | Unstructured Information Management Architecture (UIMA) + Rule-based Algorithm [5]; Self-designed NLP Algorithm [25] | 2 |
Tumor | Neural Network model [26] | 1 |
Other | 13 Neural Network models [27]; Transformer-based model [28]; BERT [29]; CheXbert [30] | 4 |
Type | Percentage |
---|---|
Words formed by two Chinese characters | 76.70% |
Words formed by three Chinese characters | 16.50% |
Words formed by more than three Chinese characters | 6.80% |
Split | #Number | #Tokens_DESCRIPTION | #Tokens_RESULT |
---|---|---|---|
Train | 19,900 | 767,026 | 81,304 |
Dev | 4,950 | 195,843 | 20,342 |
Test | 4,950 | 196,976 | 20,949 |
Total | 29,800 | 1,159,845 | 122,595 |
Definition of Patterns | Result |
---|---|
胎儿+位置+动词+压迹形状 (Fetus + position + verb + shape of pressure marks) | 胎儿脐带绕+位置+一周/两周/三周 (Fetal umbilical cord wrapped around the + position + one/two/three times) |
AFI + number (number range from 0 to 7.9) | 羊水 偏少 (Oligohydramnios) |
AFI + number (number greater than 18) | 羊水 偏多 (Hydramnion) |
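
The two AFI patterns in the table above can be sketched as a small rule. This is an illustration rather than the authors' Algorithm 1: the regular expression assumes the “AFI: 10 cm” notation seen in Table 1, the function name is hypothetical, and values between 7.9 and 18 yield no AFI-related finding because the table defines no pattern for them.

```python
import re

# Assumes the "AFI: <number> cm" notation used in the example reports.
AFI_PATTERN = re.compile(r"AFI\s*[:：]\s*(\d+(?:\.\d+)?)")

def afi_diagnosis(description):
    """Map an AFI measurement to the diagnosis phrase given in the pattern table."""
    match = AFI_PATTERN.search(description)
    if match is None:
        return None
    afi = float(match.group(1))
    if 0 <= afi <= 7.9:
        return "羊水偏少"  # oligohydramnios
    if afi > 18:
        return "羊水偏多"  # hydramnion
    return None  # no pattern is defined for this range

low = afi_diagnosis("AFI: 6.5 cm; BPD: 5.2 cm")
normal = afi_diagnosis("AFI: 10 cm; BPD: 5.2 cm")
high = afi_diagnosis("AFI: 20 cm")
```

Rules of this kind cover diagnoses whose wording never appears in the description, since the phrase is derived from a numeric threshold rather than copied from the text.
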
Model | Accuracy | Recall | F-Score |
---|---|---|---|
LSTM | 58.14% | 86.82% | 69.64% |
RNN | 86.20% | 88.89% | 87.52% |
copyRnn | 84.97% | 91.10% | 87.93% |
Reinforcement Learning | 86.21% | 89.63% | 88.39% |
Transformer | 87.42% | 90.75% | 89.05% |
Model | Accuracy | Recall | F-Score |
---|---|---|---|
LSTM | 63.00% | 85.71% | 72.62% |
RNN | 87.03% | 88.04% | 87.53% |
copyRnn | 87.32% | 90.10% | 88.69% |
Reinforcement Learning | 88.23% | 89.22% | 88.72% |
Transformer | 89.38% | 91.09% | 90.23% |
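
The F-scores in most rows of the two result tables above agree with the harmonic mean of the accuracy and recall columns, which suggests that accuracy plays the role of precision here. That reading is inferred from the reported numbers and is not a definition stated in the text; the helper below is a sketch under that assumption.

```python
def f_score(accuracy, recall):
    """Harmonic mean of accuracy (used as precision) and recall, both in [0, 1]."""
    if accuracy + recall == 0:
        return 0.0
    return 2 * accuracy * recall / (accuracy + recall)

# Transformer row of the second table: accuracy 89.38%, recall 91.09%.
transformer_f = 100 * f_score(0.8938, 0.9109)
# LSTM row of the first table: accuracy 58.14%, recall 86.82%.
lstm_f = 100 * f_score(0.5814, 0.8682)
```

Rounded to two decimals, these calls reproduce the reported 90.23% and 69.64%.
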
No. | Name | Brief Description |
---|---|---|
1 | The condition of the uterus | Describe the size, presence, shape and number of the uterus, etc. |
2 | The condition of the annex area | Describe the presence or absence of abnormal masses in the annex areas. |
3 | The condition of the ovaries | Describe the presence or absence of polycystic ovary syndrome. |
4 | The condition of the cervix | Describe the presence of lesions or abnormalities in the cervix. |
5 | The condition of the vagina | Describe the presence of lesions or abnormalities in the vagina. |
6 | Whether the patient is pregnant | Describe whether the patient is pregnant, confirm fetal viability and check for the number of fetuses. |
7 | The condition of the fetus | Describe the basic condition of the fetal head, eyes, heart, etc. |
8 | Fibroid | Describe whether the patient has fibroid and their number. |
9 | Adenomyosis | Describe whether the patient has adenomyosis. |
10 | Other abnormal conditions | Describe other abnormalities in the patient’s report. |
No. | Name of Each Part | IKAR Accuracy | IKAR Recall | IKAR F-Score |
---|---|---|---|---|
1 | The condition of the uterus | 95.56% | 93.99% | 94.77% |
2 | The condition of the annex area | 79.37% | 81.00% | 80.18% |
3 | The condition of the ovaries | 76.23% | 87.72% | 81.97% |
4 | The condition of the cervix | 83.17% | 95.45% | 88.89% |
5 | The condition of the vagina | 99.48% | 98.46% | 98.96% |
6 | Whether the patient is pregnant | 95.81% | 96.88% | 96.34% |
7 | The condition of the fetus | 80.54% | 62.16% | 70.17% |
8 | Fibroid | 86.87% | 93.8% | 90.2% |
9 | Adenomyosis | 90.48% | 90.19% | 90.34% |
10 | Other abnormal conditions | 79.19% | 90.55% | 84.49% |
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Zhao, Y.; Hu, L.; Chi, L. IKAR: An Interdisciplinary Knowledge-Based Automatic Retrieval Method from Chinese Electronic Medical Record. Information 2023, 14, 49. https://doi.org/10.3390/info14010049