Causal Pathway Extraction from Web-Board Documents
Abstract
1. Introduction
- EDU1.
- “เมื่อผู้ป่วยขาดอินซูลิน/When a patient lacks insulin,” เมื่อ/When (ผู้ป่วย/a patient)/NP1 ((ขาด/lacks)/Verbstrong (อินซูลิน/insulin)/NP2)/VP
- EDU2.
- “ทำให้ ร่างกายไม่สามารถนำน้ำตาลไปใช้เป็นพลังงานให้ส่วนต่างๆ ของร่างกายได้/causing the body to be unable to use sugar as energy in various parts of the body”. ทำให้/causing ((ร่างกาย/the body)/NP1 (ไม่สามารถ/is unable) (นำ/to take)/Verbstrong (น้ำตาล/sugar ไปใช้เป็นพลังงาน/to use as energy ให้ส่วนต่างๆของร่างกายได้/in various parts of the body)/NP2)/VP
- EDU3.
- “ทำให้ [ผู้ป่วย] มีระดับน้ำตาลในเลือดสูง/Causing [the patient] to have hyperglycaemia”. ทำให้/causing ([ผู้ป่วย/the patient])/NP1((มี/has)/Verbweak (ระดับน้ำตาลในเลือด สูง/hyperglycaemia)/NP2)/VP
- EDU4.
- “และส่งผลให้ [ผู้ป่วย]เป็นโรคเบาหวาน/And causing [the patient] to have diabetes”. … และส่งผลให้/and causing ([ผู้ป่วย/the patient])/NP1((เป็น/gets)/Verbweak (โรคเบาหวาน/diabetes)/NP2)/VP
2. Related Works
3. Problems of Causal-Pathway Extraction
3.1. How to Determine CErel on an EDU-Concept Pair/a wrdCoc Pair
- EDU1.
- “ผู้ป่วยเบาหวานอาจเป็นโรคหัวใจ/A diabetic patient might get heart disease”. (ผู้ป่วยเบาหวาน/A diabetic patient)/NP1((อาจเป็น/might get)/Verbweak (โรคหัวใจ/heart disease)/NP2)/VP; wrdCocEDU1 = getHeartDisease(person)
- EDU2.
- “เนื่องจาก[ผู้ป่วย]มีภาวะน้ำตาลในเลือดสูง/Since [the patient] has hyperglycaemia,” เนื่องจาก/Since ([ผู้ป่วย/the patient])/NP1((มี/has)/Verbweak ภาวะน้ำตาลในเลือด/hyperglycaemia)/VP; wrdCocEDU2 = haveHyperglycaemia(person)
- EDU3.
- “ทำให้สารเคมีบางชนิดเพิ่มสูงขึ้นในเลือด/causing some chemicals to increase in the blood.” … ทำให้/causing (สารเคมีบางชนิด/some chemicals)/NP1((เพิ่มสูงขึ้น/increase)/Verbstrong ใน/in เลือด/the blood)/VP; wrdCocEDU3 = increase(chemical,blood)
3.2. How to Extract the Causal Pathways
- EDU1.
- “เมื่อผู้ป่วยขาดอินซูลิน/When a patient lacks insulin,” wrdCocEDU1 = lack(person,insulin)
- EDU2.
- “อินซูลินมีหน้าที่ส่งสัญญาณให้เซลนำน้ำตาลไปใช้/Insulin has the function of signaling cells to take sugar for use”. (อินซูลิน/Insulin)/NP1((มี/has)/Verbweak หน้าที่/a function of ส่งสัญญาณให้เซล/signaling cells นำน้ำตาล/to take sugar ไปใช้/for use)/VP; wrdCocEDU2 = hasFunction(insulin,signaling)
- EDU3.
- “ทำให้ ร่างกายไม่สามารถนำน้ำตาลไปใช้เป็นพลังงานให้ส่วนต่างๆ ของร่างกายได้/causing the body to be unable to use sugar as energy in various parts of the body”. wrdCocEDU3 = beUnableToUseSugar(person)
- EDU4.
- “ทำให้ [ผู้ป่วย] มีระดับน้ำตาลในเลือดสูง/Causing [the patient] to have hyperglycaemia”. wrdCocEDU4 = haveHyperglycaemia(person)
- EDU5.
- “และส่งผลให้ [ผู้ป่วย] เป็นโรคเบาหวาน/And causing [the patient] to have diabetes”. … wrdCocEDU5 = getDiabetes(person)
3.3. How to Indicate Implicit Mediators for Explicit Mediator Representation
- EDU1.
- “ผู้ป่วยเป็นเบาหวานมานานหลายปี/A patient has had diabetes for several years”. (ผู้ป่วย/A patient)/NP1((เป็น/gets)/Verbweak โรคเบาหวาน/diabetes มานานหลายปี/for several years)/VP; wrdCocEDU1 = getDiabetes(person)
- EDU2.
- “เนื่องจาก[ผู้ป่วย]มีระดับน้ำตาลในเลือดสูงอยู่เป็นระยะเวลานาน/Since [the patient] has had hyperglycaemia for a long period of time,” เนื่องจาก/Since ([ผู้ป่วย/the patient])/NP1((มี/has)/Verbweak (ระดับน้ำตาลในเลือดสูง/hyperglycaemia อยู่เป็นระยะเวลานาน/for a long period of time)/NP2)/VP; wrdCocEDU2 = haveHyperglycaemia(person,long-time)
- EDU3.
- “ทำให้หลอดเลือดทั่วร่างกายจะแข็ง และหนา/causing blood vessels throughout the body to become stiff and thick”. ทำให้/causing (หลอดเลือดทั่วร่างกาย/blood vessels throughout the body)/NP1(จะ/will (แข็ง/be stiff)/Verbstrong และ/and (หนา/be thick)/Verbstrong)/VP; wrdCocEDU3 = beStiff&Thick(bloodVessel)
- EDU4.
- “ทำให้เลือดไปเลี้ยงได้น้อยในส่วนต่าง ๆ ของร่างกาย/causing less blood to be supplied to the various parts of the body”. ทำให้/causing (เลือด/blood)/NP1((ไปเลี้ยง/supplies)/Verbstrong น้อย/less ในส่วนต่างๆของร่างกาย/to the parts of the body)/VP; wrdCocEDU4 = beSupplied(blood,less)
- EDU5.
- “ส่งผลให้ [ผู้ป่วย]เป็นโรคไต/causing [the patient] to get chronic kidney disease”. … ส่งผลให้/causing ([ผู้ป่วย/the patient])/NP1((เป็น/gets)/Verbweak (โรคไต/kidney disease)/NP2)/VP; wrdCocEDU5 = getKidneyDisease(person)
- EDU1.
- “ถ้าระดับน้ำตาลในเลือดสูงเกิดขึ้นเป็นระยะเวลานาน/If hyperglycaemia occurs over a long period,” ถ้า/If (ระดับน้ำตาลในเลือดสูง/hyperglycaemia)/NP1((เกิดขึ้น/occurs)/Verbstrong เป็นระยะเวลานาน/over a long period)/VP; wrdCocEDU1 = occur(hyperglycaemia,long-term)
- EDU2.
- “ผนังหลอดเลือดจะอักเสบ/the vascular wall will become inflamed”. (ผนังหลอดเลือด/The vascular wall)/NP1(จะ/will (อักเสบ/become inflamed)/Verbstrong)/VP; wrdCocEDU2 = becomeInflamed(bloodVesselWall)
- EDU3.
- “ทำให้หลอดเลือดแข็งและตีบ/causing the arteries to become stiff and narrow”. ทำให้/causing (หลอดเลือด/the arteries)/NP1((แข็ง/be stiff)/Verbstrong และ/and (ตีบ/be narrow)/Verbstrong)/VP; wrdCocEDU3 = beStiff&Narrow(bloodVessel)
- EDU4.
- “ดังนั้นหลอดเลือดเล็ก ๆ เช่น หลอดเลือดไตมักจะได้รับผลกระทบก่อน/Therefore, small blood vessels such as the renal arteries are often affected first”. ดังนั้น/Therefore (หลอดเลือดเล็กๆ/small blood vessels เช่น หลอดเลือดไต/such as the renal arteries)/NP1(มักจะ/often (ได้รับ/gets)/Verbweak ผลกระทบก่อน/affected first)/VP; wrdCocEDU4 = getAffect(bloodVessel)
- EDU5.
- “ทำให้ผู้ป่วยเกิดภาวะไตวาย/causing the patient to have kidney failure”. … ทำให้/causing (ผู้ป่วย/the patient)/NP1((เกิด/have)/Verbweak ภาวะไตวาย/kidney failure)/VP; wrdCocEDU5 = haveKidneyFailure(person)
4. System Overview
4.1. Corpus Preparation
4.1.1. Word and EDU Segmentations
4.1.2. Semi-Automatic Corpus Annotation
4.2. CErel Learning on Each wrdCoc Pair
- (a)
- NB [18]. In this step, the NB learning results for each disease group are obtained with Weka (https://www.cs.waikato.ac.nz/ml/weka/, accessed on 10 August 2021). They are the CErel and nonCErel probabilities of the CwrdCoc features and the EwrdCoc features in the wrdCoc pairs, as shown in Table 2. Here, CwrdCoc ∈ CWC, the causative-wrdCo-concept set; EwrdCoc ∈ EWC, the effect-wrdCo-concept set; and CWC ∩ EWC ≠ ∅, because the same concept can be a cause in one pair and an effect in another.
- (b)
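- SVM [19]. SVM learning is applied as a linear binary classifier that assigns each wrdCoc pair in the annotated corpus to the CErel or the nonCErel class, again using Weka. The linear function f(x) of the input x = (x1, x2, …, xn) assigns x to the CErel class if f(x) > 0, and otherwise to the nonCErel class. It is given by Equation (3):
  $$f(\mathbf{x}) = \langle \mathbf{w} \cdot \mathbf{x} \rangle + b = \sum_{i=1}^{n} w_i x_i + b \qquad (3)$$
  where w is the weight vector and b is the bias.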
- (c)
- LR [20]. The logistic regression model in this research is linear logistic regression over binary vector data. What sets logistic regression apart is that the dependent (outcome) variable is binary, or dichotomous. The linear combination of the inputs can take any value from negative to positive infinity. The logistic function maps that value into the interval between 0 and 1, so the result can be read as a probability, and the fitted coefficients show which attributes are influential in predicting the outcome. The logistic function can be written as:
  $$F(\mathbf{x}) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \beta_2 x_2)}} \qquad (4)$$
  The learning results of the NB, SVM, and LR models are estimators. In the next step (Section 4.3), they determine which wrdCoc pairs in the test corpus of each disease group have CErel. All NB, SVM, and LR learning precisions on the learning corpus of each disease group are greater than 0.8.
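Each of the three learners can be summarized by the scoring function it produces. The Python below is a minimal sketch under stated assumptions and is not the Weka implementation used in the study. The NB part estimates P(CwrdCoc | class) and P(EwrdCoc | class) as plain relative frequencies over annotated pairs; the absence of smoothing is an assumption. The SVM and LR parts only evaluate Equation (3) and Equation (4) for parameters that are assumed to be learned already. The names `estimate_nb`, `svm_decision`, and `logistic` are hypothetical.

```python
import math
from collections import Counter

LABELS = ("CErel", "nonCErel")

def estimate_nb(annotated_pairs):
    """Relative-frequency estimates of P(CwrdCoc | class) and P(EwrdCoc | class).

    annotated_pairs: iterable of (cause_concept, effect_concept, label),
    with label in LABELS. Assumption: no smoothing, so unseen concepts get no entry.
    """
    cause_counts = {label: Counter() for label in LABELS}
    effect_counts = {label: Counter() for label in LABELS}
    totals = Counter()
    for cause, effect, label in annotated_pairs:
        cause_counts[label][cause] += 1
        effect_counts[label][effect] += 1
        totals[label] += 1
    p_cause = {lab: {c: n / totals[lab] for c, n in cause_counts[lab].items()} for lab in LABELS}
    p_effect = {lab: {e: n / totals[lab] for e, n in effect_counts[lab].items()} for lab in LABELS}
    return p_cause, p_effect

def svm_decision(w, b, x):
    """Equation (3): f(x) = <w, x> + b; CErel if f(x) > 0, otherwise nonCErel."""
    f = sum(wi * xi for wi, xi in zip(w, x)) + b
    return "CErel" if f > 0 else "nonCErel"

def logistic(beta0, beta1, beta2, x1, x2):
    """Equation (4): F(x) = 1 / (1 + exp(-(beta0 + beta1*x1 + beta2*x2)))."""
    return 1.0 / (1.0 + math.exp(-(beta0 + beta1 * x1 + beta2 * x2)))
```

A smoothing term is often added to estimates like those in `estimate_nb`, so that a concept never seen with one class does not drive the whole product to zero. The source does not say whether the original models used one.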
4.3. Determination of wrdCoc Pairs Having CErel
- (a)
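- NB. This step uses Equation (5) and the CErel and nonCErel probabilities of CwrdCoc and EwrdCoc (Table 2), treating CwrdCoc and EwrdCoc as independent. Each (CwrdCoc, EwrdCoc) pair from the self-Cartesian product (WC × WC) receives the Cause-EffectRelationClass with the higher value. The pairs classified as CErel are collected into WCP for each disease group:
  $$\text{Cause-EffectRelationClass} = \operatorname*{arg\,max}_{c \in \{\text{CErel},\, \text{nonCErel}\}} P(c)\, P(\text{CwrdCoc} \mid c)\, P(\text{EwrdCoc} \mid c) \qquad (5)$$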
- (b)
- SVM. SVM learning (Section 4.2 (b)) yields the bias b and the weight vector w for the CWC elements and the EWC elements in the wrdCoc pairs. They are applied in Equation (3) to every wrdCoc pair in the self-Cartesian product (WC × WC). The (CwrdCoc, EwrdCoc) pairs classified as CErel are collected into WCP for each disease group.
- (c)
- LR. The research applies Equation (4) together with Equation (6) to determine the CErel class between the CWC elements and the EWC elements in the wrdCoc pairs. Using the estimators from LR learning (Section 4.2 (c)), it evaluates both the positive (CErel) and the negative (nonCErel) class and keeps the larger:
  $$\text{CErelClass} = \operatorname*{arg\,max}\left\{F(\mathbf{x})_{\text{CErel}},\ F(\mathbf{x})_{\text{nonCErel}}\right\} \qquad (6)$$
  In Equation (6), x1 (the CwrdCoc) and x2 (the EwrdCoc) form the attribute-variable pair of each wrdCoc pair from the test corpus of each disease group. β0, β1, and β2 are obtained by supervised LR learning on the learning corpus of each disease group. The wrdCoc pairs, that is, the (CwrdCoc, EwrdCoc) pairs, assigned the CErel class are taken from the self-Cartesian product (WC × WC) and collected into WCP for each disease group.
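The sketch below makes the NB route concrete. It applies the NB decision to the self-Cartesian product of the five Diabetes & Kidney Disease concepts visible in Table 2, using their probabilities verbatim. It is illustrative only and rests on several assumptions:
- the elided rows of Table 2 are omitted;
- Table 2 lists no class priors, so equal priors are used;
- self-pairs (x, x) are skipped;
- `nb_class` and `collect_wcp` are hypothetical names.

```python
from itertools import product

LABELS = ("CErel", "nonCErel")

# Probabilities copied from the visible rows of Table 2 (Diabetes & Kidney Disease Group).
P_CAUSE = {
    "CErel": {"beLost(protein,urine)": 0.0698, "beFailure(kidney)": 0.1581,
              "haveHyperglycaemia(person)": 0.0452, "notTakeSugar(body)": 0.0266,
              "beNarrow(bloodVessel)": 0.0185},
    "nonCErel": {"beLost(protein,urine)": 0.0784, "beFailure(kidney)": 0.0784,
                 "haveHyperglycaemia(person)": 0.0294, "notTakeSugar(body)": 0.0392,
                 "beNarrow(bloodVessel)": 0.0098},
}
P_EFFECT = {
    "CErel": {"beLost(protein,urine)": 0.0074, "beFailure(kidney)": 0.0355,
              "haveHyperglycaemia(person)": 0.0411, "notTakeSugar(body)": 0.0112,
              "beNarrow(bloodVessel)": 0.0374},
    "nonCErel": {"beLost(protein,urine)": 0.0066, "beFailure(kidney)": 0.0333,
                 "haveHyperglycaemia(person)": 0.0200, "notTakeSugar(body)": 0.0133,
                 "beNarrow(bloodVessel)": 0.0533},
}

def nb_class(cause, effect, priors=(0.5, 0.5)):
    """NB decision in the spirit of Equation (5).

    Assumes CwrdCoc and EwrdCoc are independent given the class and, because
    Table 2 lists no priors, uses equal class priors by default.
    """
    score = {
        label: prior * P_CAUSE[label].get(cause, 0.0) * P_EFFECT[label].get(effect, 0.0)
        for label, prior in zip(LABELS, priors)
    }
    # Ties, including unseen concepts where both scores are 0, go to nonCErel.
    return "CErel" if score["CErel"] > score["nonCErel"] else "nonCErel"

def collect_wcp(wc):
    """Classify every pair of the self-Cartesian product WC x WC and keep the CErel ones.

    Assumption of this sketch: a concept paired with itself is skipped.
    """
    return {(c, e) for c, e in product(wc, repeat=2) if c != e and nb_class(c, e) == "CErel"}

WC = sorted(P_CAUSE["CErel"])
WCP = collect_wcp(WC)

if __name__ == "__main__":
    for pair in sorted(WCP):
        print(pair)
```

Take notTakeSugar(body) → haveHyperglycaemia(person) as an example. Its conditional-probability product is 0.0266 × 0.0411 ≈ 0.00109 under CErel and 0.0392 × 0.0200 = 0.000784 under nonCErel, so the pair enters WCP. notTakeSugar(body) → beNarrow(bloodVessel) gives 0.000995 against 0.00209 and is rejected. Replacing `nb_class` with the sign test of Equation (3) or the comparison of Equation (6) would give the SVM and LR variants.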
4.4. Causal Pathway Extraction
Algorithm 1. Causal Pathway Extraction
CAUSAL_PATHWAY_EXTRACTION
/* Extraction of several CEpair_i sequences as causal pathways.
/* Assume that each EDU is represented by (NP1 VP).
/* L is the list of EDUs from one test-corpus document after word stemming and stop-word removal.
/* CEpair_i is the wrdCoc pair with index i in a causal pathway.
/* wcc[ ] is an array of the wrdCocs collected from this test corpus.
/* WCP is the set of wrdCoc pairs having CErel.
ct = 1; j = 1; a = 0; string wcc[];
ArrayList<string>[] allPathways = new ArrayList[a];   /* array of ArrayLists
while j ≤ Length[L] do
{1  wcexp_j = getWrdCo(EDU_j);   /* get the wrdCo expression of EDU_j from the test corpus
  if wcexp_j.v ∈ Vstrong ∪ Vinf then   /* wcexp_j.v is the predicate verb v_a of the wrdCo expression with index j
  { wcc[ct] = getWrdCoConcept; ct++ };   /* match the document's wrdCo expressions against those with wrdCoc features in the wrdCo-Concept Table (Table 1)
  j++ }1;
count = ct; ct = 1; i = 1; flagce = 0; flagec = 0; fl = 0;
while ct ≤ count − 1 do
{1  while (wcc[ct] + wcc[ct+1] ∈ WCP) ∧ (ct ≤ count − 1) do   /* extract a causal pathway by matching wcc[ct] + wcc[ct+1] against wcp_k ∈ WCP
  {2  CEpair_i = wcc[ct] + wcc[ct+1];   /* CErel occurs in the text as EDU_Cause EDU_Effect
    if flagce = 0 then { a++; flagce = 1 };
    allPathways[a].AddNewCauseEffectPair(CEpair_i); i++; ct++ }2;
  if flagce = 1 then { flagce = 0; fl = 1; i = 1 };
  while (wcc[ct+1] + wcc[ct] ∈ WCP) ∧ (ct ≤ count − 1) do   /* extract another causal pathway by matching wcc[ct+1] + wcc[ct] against wcp_k ∈ WCP
  {3  CEpair_i = wcc[ct+1] + wcc[ct];   /* CErel occurs in the text as EDU_Effect EDU_Cause: wcc[ct+1] is the cause, wcc[ct] the effect
    if flagec = 0 then { a++; flagec = 1 };
    allPathways[a].AddNewCauseEffectConceptPair(CEpair_i);
    i++; ct++ }3;
  if flagec = 1 then { flagec = 0; fl = 1; i = 1 };
  if fl = 0 then ct++
  else fl = 0 }1;
return allPathways   /* return the causal pathways
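The pairing logic of Algorithm 1 can be sketched in Python. The sketch assumes that the first loop has already produced the wrdCoc list `wcc` for one document and that WCP is available as a set of (cause, effect) tuples. It mirrors the two inner loops, one for cause-before-effect runs and one for effect-before-cause runs, using 0-based indices. `extract_causal_pathways` and the example WCP are hypothetical.

```python
def extract_causal_pathways(wcc, wcp):
    """Group runs of adjacent wrdCoc pairs found in WCP into causal pathways.

    wcc: wrdCoc concepts of one document, in EDU order (assumed already collected).
    wcp: set of (cause, effect) tuples having CErel.
    Returns a list of pathways; each pathway is a list of (cause, effect) pairs.
    """
    pathways = []
    ct, n = 0, len(wcc)
    while ct < n - 1:
        advanced = False
        # Cause EDU precedes effect EDU: wcc[ct] -> wcc[ct + 1].
        if (wcc[ct], wcc[ct + 1]) in wcp:
            path = []
            while ct < n - 1 and (wcc[ct], wcc[ct + 1]) in wcp:
                path.append((wcc[ct], wcc[ct + 1]))
                ct += 1
            pathways.append(path)
            advanced = True
        # Effect EDU precedes cause EDU: wcc[ct + 1] -> wcc[ct].
        if ct < n - 1 and (wcc[ct + 1], wcc[ct]) in wcp:
            path = []
            while ct < n - 1 and (wcc[ct + 1], wcc[ct]) in wcp:
                path.append((wcc[ct + 1], wcc[ct]))
                ct += 1
            pathways.append(path)
            advanced = True
        if not advanced:
            ct += 1
    return pathways

if __name__ == "__main__":
    # Hypothetical WCP for the insulin example of Section 3.2.
    wcc = ["lack(person,insulin)", "beUnableToUseSugar(person)",
           "haveHyperglycaemia(person)", "getDiabetes(person)"]
    wcp = set(zip(wcc, wcc[1:]))
    print(extract_causal_pathways(wcc, wcp))
```

Each run of adjacent matching pairs becomes one pathway, which corresponds to the a++ step in Algorithm 1. The bound ct < n − 1 keeps wcc[ct + 1] inside the list.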
4.5. Implicit-Mediator Indication and Representation with Explicit-Mediators
- (1)
- Determine TransCEPair from all correctly extracted causal pathways (Pathways[α]; α = 1, 2, …, a).
- (2)
- Use TransCEPair to find each CEpair_i in Pathways[α] that has an implicit mediator, that is, causal transitivity. If no CEpair_i in Pathways[α] belongs to TransCEPair (i = 1, 2, …, numOfCauseEffectConceptPairs), Pathways[α] is an explicit-mediator causal pathway and is added to ExplicitPath[α].
- (3)
- If Pathways[α].CEpair_i ∈ TransCEPair, take three actions. First, mark this CEpair_i, which has causal transitivity (an implicit mediator), with ‘*’. Second, add *CEpair_i and the subsequent CEpair_i+1, CEpair_i+2, …, CEpair_numOfCauseEffectConceptPairs of Pathways[α] to ImplicitPath[α]. Third, add CEpair_1, CEpair_2, …, CEpair_i−1 of Pathways[α] to ExplicitPath[α].
- (4)
- Replace *CEpair_i with its explicit mediator(s), as shown in Algorithm 2. These are the chain of CEpair_idd in ExplicitPath that starts at the cause of CEpair_i and ends at its effect.
Algorithm 2. Explicit Causal Pathway Representation
EXPLICIT_CAUSAL_PATHWAY_REPRESENTATION(ArrayList<string>[] Pathways; a)
/* Assume that Pathways is allPathways (from Algorithm 1) with duplicate causal pathways eliminated.
/* trsvSet is TransCEPair, the set of CEpairs whose CErel is transitive.
/* ExplicitPath is a dynamic ExplicitCEpairWithCErelPathways template.
1: trsv = 0; check1 = 0; trsvSet ← ∅;
2: ArrayList<string>[] ExplicitPath = new ArrayList[a]; ArrayList<string>[] ImplicitPath = new ArrayList[a]; ArrayList<string> fill = new ArrayList<>();
3: num1 = a;   /* a is the number of Pathways elements (the Pathways array size)
4: for (α = 1 to num1; α++)   /* determine trsvSet from the transitive closure
5: { trsvSet ← trsvSet ∪ Pathways[α].transitiveClosureDetermination };
6: for (α = 1 to num1; α++)
7: {1 i = 1;   /* collect explicit-mediator causal pathways into ExplicitPath
8: while (i ≤ Pathways[α].numberOfCauseEffectConceptPairs) ∧ (Pathways[α].Get(CEpair_i) ∉ trsvSet) do   /* add explicit CEpair_i to ExplicitPath
9: { ExplicitPath[α].Add(Pathways[α].Get(CEpair_i)); i++ }
10: while (i ≤ Pathways[α].numberOfCauseEffectConceptPairs) do
11: { if (Pathways[α].Get(CEpair_i) ∈ trsvSet) then   /* mark CEpair_i having an implicit mediator (causal transitivity) with ‘*’, then add ‘*’CEpair_i and all subsequent CEpairs to ImplicitPath
12: ImplicitPath[α].Add(‘*’ + Pathways[α].Get(CEpair_i)) else ImplicitPath[α].Add(Pathways[α].Get(CEpair_i)); i++ } }1   /* replace ‘*’CEpair_i in ImplicitPath with the explicit mediator(s) from ExplicitPath
13: while check1 = 0 do
14: {1 for (α = 1 to num1; α++)
15: {2 if ImplicitPath[α] isNotEmpty then
16: {3 idi = 1; id = 1; check2 = 0; fill.clear();
17: while idi ≤ ImplicitPath[α].numberOfCEpairs do   /* find ‘*’CEpair_idi in ImplicitPath
18: {4 if ImplicitPath[α].Get(CEpair_idi).contains(‘*’) = true then
19: {5 ImplicitPath[α].Get(CEpair_idi).replace(‘*’, “”);
20: C = CEpair_idi.wcc_idi; E = CEpair_idi.wcc_idi+1; b = 1; idd = 1; f1 = 0; f2 = 0; f3 = 0;   /* get C (cause) and E (effect)
21: while b ≤ num1 ∧ f1 = 0 do   /* find an explicit mediator chain C + mediator(s) + E in ExplicitPath
22: {6 while idd ≤ ExplicitPath[b].numberOfCauseEffectConceptPairs ∧ f3 = 0 do
23: {7 if C = ExplicitPath[b].Get(CEpair_idd).wcc_idd then f2 = 1
24: elseIf (E = ExplicitPath[b].Get(CEpair_idd).wcc_idd+1) ∧ f2 = 1 then f3 = 1;
25: if f2 = 1 ∨ f3 = 1 then { fill[id] = ExplicitPath[b].Get(CEpair_idd); id++ };
26: idd++ }7;   /* fill contains C + mediator(s) + E
27: if f3 = 0 then { idd = 1; f2 = 0; fill.clear() }   /* no E in fill
28: else { f1 = 1; ExplicitPath[α].addAll(fill); check2 = 1 }; id = 1; b++ }6 }5   /* add fill to ExplicitPath
29: elseIf ImplicitPath[α].Get(CEpair_idi).contains(‘*’) = false ∧ check2 = 1 then { ExplicitPath[α].add(ImplicitPath[α].Get(CEpair_idi)) }; idi++ }4
30: if check2 = 1 then { ImplicitPath[α].clear(); check2 = 0 } }3 }2
31: for (α = 1 to num1; α++)   /* check whether every ImplicitPath is empty
32: { if ImplicitPath[α] isNotEmpty then check2 = 1 };
33: if check2 = 0 then check1 = 1;
34: }1; ExplicitPath.sortRowOfArrayOfArrayList; ExplicitPath.removeDuplicateRow;
35: return ExplicitPath
- (5)
- Check that the ExplicitPath result of the ExplicitCausalPathwayRepresentation algorithm contains no implicit mediator (causal transitivity). To do so, compare trsvSet (line 5 of Algorithm 2) with the TransCEPair determined from the ExplicitPath result returned on line 35 of Algorithm 2. If they are the same, ExplicitPath contains explicit-mediator causal pathways without causal transitivity or implicit mediators. Otherwise, copy ExplicitPath to Pathways of Algorithm 2, set ExplicitPath to empty, and re-execute the ExplicitCausalPathwayRepresentation algorithm.
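The core idea of steps (1) to (5) can be sketched in Python. A cause-effect pair is flagged as transitive when its effect can also be reached from its cause through at least one mediator, and that pair is then replaced by the mediator chain. This is a breadth-first-search reading of the procedure, not a transcription of Algorithm 2: it searches for mediators over the union of all pathways instead of scanning ExplicitPath row by row. `build_graph`, `mediator_chain`, `explicit_pathways`, and `max_rounds` are hypothetical names.

```python
from collections import deque

def build_graph(pathways):
    """Adjacency map cause -> set of effects over every pair in every pathway."""
    graph = {}
    for path in pathways:
        for cause, effect in path:
            graph.setdefault(cause, set()).add(effect)
    return graph

def mediator_chain(graph, cause, effect):
    """Shortest chain cause -> ... -> effect with at least one mediator, or None.

    A pair for which such a chain exists is treated as transitive (implicit mediator).
    """
    queue = deque([[cause]])
    seen = {cause}
    while queue:
        chain = queue.popleft()
        for nxt in sorted(graph.get(chain[-1], ())):
            if nxt == effect:
                if len(chain) >= 2:  # at least one mediator between cause and effect
                    return chain + [effect]
                continue  # ignore the direct pair itself
            if nxt not in seen:
                seen.add(nxt)
                queue.append(chain + [nxt])
    return None

def explicit_pathways(pathways, max_rounds=10):
    """Replace transitive pairs by their mediator chains until none remain."""
    graph = build_graph(pathways)
    current = [list(path) for path in pathways]
    for _ in range(max_rounds):
        changed = False
        expanded = []
        for path in current:
            new_path = []
            for cause, effect in path:
                chain = mediator_chain(graph, cause, effect)
                if chain is None:
                    new_path.append((cause, effect))
                else:
                    new_path.extend(zip(chain, chain[1:]))
                    changed = True
            if new_path not in expanded:  # drop duplicate pathways
                expanded.append(new_path)
        current = expanded
        if not changed:
            break
    return current

if __name__ == "__main__":
    paths = [[("A", "B"), ("B", "C")], [("A", "C"), ("C", "D")]]
    print(explicit_pathways(paths))
```

In the example, (A, C) is flagged as transitive because A → B → C also exists, and it is replaced by that chain, in line with steps (3) and (4). `max_rounds` stands in for the re-execution of step (5) and also guards against cycles.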
5. Evaluation and Discussion
5.1. Determination of wrdCoc Pairs Having CErel
5.2. Causal Pathway Extraction
- EDU1.
- “โรคไตเสื่อมเกิดจากการเป็นเบาหวานมาเป็นเวลานาน/Kidney disease is caused by being diabetic for a long time”. (โรคไตเสื่อม/Kidney disease)/NP1 (เกิดจาก/is caused by (การเป็นเบาหวานมาเป็นเวลานาน/being diabetic for a long time)/NP2)/VP.
- EDU2.
- “ทำให้ผนังหลอดเลือดถูกทำลาย/causing the artery walls to be destroyed”. (ทำให้/Causing)/conj (ผนังหลอดเลือด/artery wall)/NP2((ถูกทำลาย/is destroyed)/Verbstrong)/VP
- EDU3.
- “แล้วการทำหน้าที่กรองของไตจะเสื่อม/Then the filtration function of the kidneys will deteriorate”.แล้ว/Then(การทำหน้าที่กรองของไต/the filtration function of the kidneys)/NP1((จะเสื่อม/will deteriorate)/Verbstrong)/VP.
- EDU4.
- “ทำให้โปรตีนรั่วออกมาในปัสสาวะ/causing protein to leak out in the urine”. (ทำให้/Causing)/conj (โปรตีน/protein)/NP1((รั่วออกมา/leaks out)/Verbstrong ใน/in (ปัสสาวะ/the urine)/NP2)/VP.
5.3. Implicit-Mediator Indication and Representation with Explicit-Mediators
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Khoo, C.; Na, J.C. Semantic relations in information science. Annu. Rev. Inf. Sci. Technol. 2006, 40, 157–228.
- Staplin, N.; Herrington, W.G.; Judge, P.K.; Reith, C.A.; Haynes, R.; Landray, M.J.; Baigent, C.; Emberson, J. Use of causal diagrams to inform the design and interpretation of observational studies: An example from the study of heart and renal protection (SHARP). Clin. J. Am. Soc. Nephrol. 2017, 12, 546–552.
- Gaskell, A.L.; Sleigh, J.W. An Introduction to causal diagrams for anesthesiology research. Anesthesiology 2020, 132, 951–967.
- Carlson, L.; Marcu, D.; Okurowski, M.E. Building a discourse-tagged corpus in the framework of rhetorical structure theory. Curr. New Dir. Discourse Dialogue 2003, 22, 85–112.
- Girju, R. Automatic detection of causal relations for question answering. In Proceedings of the 41st annual meeting of the association for computational linguistics, workshop on multilingual summarization and question answering-Machine learning and beyond, Sapporo, Japan, 11–12 July 2003; pp. 76–83.
- Cao, M.; Sun, X.; Zhuge, H. The contribution of cause-effect link to representing the core of scientific paper-The role of Semantic Link Network. PLoS ONE 2018, 13, 0199303.
- Chang, D.-S.; Choi, K.-S. Incremental cue phrase learning and bootstrapping method for causality extraction using cue phrase and word pair probabilities. Inf. Process. Manag. 2006, 42, 662–678.
- Pechsiri, C.; Piriyakul, R. Explanation knowledge graph construction through causality extraction from texts. J. Comput. Sci. Technol. 2010, 25, 1055–1070.
- Sawamaru, H.; Kobayashi, I. An Approach to Extraction of Causal Chain among Events in Multiple Documents. SCIS-ISIS. In Proceedings of the 6th International Conference on Soft Computing and Intelligent Systems, and the 13th International Symposium on Advanced Intelligence Systems, Kobe, Japan, 20–24 November 2012; IEEE: Piscataway, NJ, USA, 2012; pp. 1104–1108.
- Kang, D.; Gangal, V.; Lu, A.; Chen, Z.; Hovy, E. Detecting and explaining causes from text for a time series event. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, Copenhagen, Denmark, 9–11 September 2017; pp. 2758–2767.
- Izumi, K.; Sakaji, H. Economic Causal-Chain Search using Text Mining Technology. In Proceedings of the 1st Workshop on Financial Technology and Natural Language Processing, Macao, China, 12 August 2019; pp. 61–65.
- Nordon, G.; Koren, G.; Shalev, V.; Kimelfeld, B.; Shalit, U.; Radinsky, K. Building causal graphs from medical literature and electronic medical records. In Proceedings of the 33rd AAAI Conference on Artificial Intelligence, Honolulu, HI, USA, 27 January–1 February 2019; pp. 1102–1109.
- Takishita, S.; Rzepka, R.; Araki, K. Implicit Knowledge Completion Using Relevance Calculation of Distributed Word Representations. In Proceedings of the IJCAI Workshop on Bridging the Gap between Human and Automated Reasoning, Macao, China, 12 August 2019; pp. 60–64.
- Song, M.-K.; Lin, F.-C.; Ward, S.E.; Fine, J.P. Composite Variables. Nurs. Res. 2013, 62, 45–49.
- Ayinde, B.O.; Inanc, T.; Zurada, J.M. Regularizing deep neural networks by enhancing diversity in feature extraction. IEEE Trans. Neural Netw. Learn. Syst. 2019, 30, 2650–2661.
- Leng, C.; Zhang, H.; Cai, G.; Cheng, I.; Basu, A. Graph regularized Lp smooth non-negative matrix factorization for data representation. IEEE/CAA J. Autom. Sin. 2019, 6, 584–595.
- Weisstein, E.W. “Cartesian Product”. Available online: www.mathworld.wolfram.com (accessed on 5 September 2020).
- Mitchell, T.M. Machine Learning; The McGraw-Hill Co., Inc.; MIT Press: Singapore, 1997.
- Cristianini, N.; Shawe-Taylor, J. An Introduction to Support Vector Machines; Cambridge University Press: Cambridge, UK, 2000.
- Freedman, D.A. Statistical Models: Theory and Practice; Cambridge University Press: Cambridge, UK, 2009.
- Weisstein, E.W. “Transitive Closure”. from MathWorld—A Wolfram Web Resource. Available online: https://mathworld.wolfram.com/TransitiveClosure.html (accessed on 30 August 2021).
- Eve, J.; Kurki-Suonio, R. On computing the transitive closure of a relation. Acta Inform. 1977, 8, 303–314.
- Sudprasert, S.; Kawtrakul, A. Thai word segmentation based on global and local unsupervised learning. In Proceedings of the NCSEC 2003, Chonburi, Thailand, 28–30 October 2003; pp. 1–8.
- Chanlekha, H.; Kawtrakul, A. Thai named entity extraction by incorporating maximum entropy model with simple heuristic information. In Proceedings of the IJCNLP 2004, Hainan Island, China, 22–24 March 2004; pp. 1–7.
- Tongtep, N.; Theeramunkong, T. Pattern-based Extraction of Named Entities in Thai News Documents. Thammasat Int. J. Sci. Technol. 2010, 15, 70–81.
- Chareonsuk, J.; Sukvakree, T.; Kawtrakul, A. Elementary discourse unit segmentation for Thai using discourse cue and syntactic information. In Proceedings of the NCSEC 2005, Bangkok, Thailand, 27–28 October 2005; pp. 85–90.
- Ketui, N.; Theeramunkong, T.; Onsuwan, C. Thai elementary discourse unit analysis and syntactic-based segmentation. Information 2013, 16, 7423–7436.
- Miller, G.A. WordNet: A lexical database. Commun. ACM 1995, 38, 39–41.
- Adhikari, B.K.; Zuo, W.; Maharjan, R.; Han, X.; Liang, S. Detection of Sensitive Data to Counter Global Terrorism. Appl. Sci. 2020, 10, 182.
wrdCo Expression Based on wrdCoPattern: V | W1 | W2 | wrdCoc Feature |
---|---|---|---|
สะสม/deposit <deposit> | ไขมัน/fat <fat> | ผนังหลอดเลือด/blood vessel wall <wall> | beDeposited(fat,bloodVessel) |
เกาะ/deposit <deposit> | ไขมัน/fat <fat> | เส้นเลือดแดง/artery <bloodVessel> | beDeposited(fat,bloodVessel) |
จับ/deposit <deposit> | การมีไขมัน/having fat <fat> | ผนังหลอดเลือด/blood vessel wall <wall> | beDeposited(fat,bloodVessel) |
มีไขมัน/have fat <haveFat> | หลอดเลือดแดง/artery <bloodVessel> | สะสม/deposit <deposit> | beDeposited(fat,bloodVessel) |
เกาะ/deposit <deposit> | ไขมัน/fat <fat> | ตะกรัน/plaque <bePlaque> | beDeposited(fat,bePlaque) |
คือตะกรันไขมัน/is fatty-plaque <bePlaque> | สิ่งอุดตันในหลอดเลือด/embolism <embolism> | null | bePlaque(embolism) |
ก่อตัว/form <form> | ตะกรันไขมัน/fatty-plaque <plaque> | หนา/thick <thick> | Form(plaque,thick) |
ตีบแคบ/be narrow <beNarrow> | หลอดเลือด/blood vessel <bloodVessel> | null | beNarrow(bloodVessel) |
ตีบแคบ/be narrow <beNarrow> | หลอดเลือด/blood vessel <bloodVessel> | ลง/more <more> | beNarrow(bloodVessel) |
หล่อเลี้ยง/supply <supply> | เลือด/blood <blood> | กล้ามเนื้อหัวใจ/myocardium <myocardium> | beSuppliedInsufficiently(blood,myocardium) |
ขาด/lack <lack> | กล้ามเนื้อหัวใจ/myocardium <myocardium> | เลือด/blood <blood> | beSuppliedInsufficiently(blood,myocardium) |
ไปเลี้ยง/supply <supply> | เลือด/blood <blood> | เนื้อเยื่อ/tissue <tissue> | beSuppliedInsufficiently(blood,tissue) |
ขาด/lack <lack> | สมอง/brain <brain> | เลือด/blood <blood> | beSuppliedInsufficiently(blood,brain) |
เกิด/occur <occur> | การสร้างสารเคมี/chemical forming <chemicalForming> | null | Occur(chemicalForming) |
เกิด/occur <occur> | หลอดเลือด/blood vessel <bloodVessel> | การอักเสบ/inflammation <inflammation> | Occur(bloodVessel,inflammation) |
เกิดขึ้น/occur <occur> | การอักเสบ/inflammation <inflammation> | หลอดเลือด/blood vessel <bloodVessel> | Occur(bloodVessel,inflammation) |
แข็งตัว/be stiff <beStiff> | หลอดเลือด/blood vessel <bloodVessel> | null | beStiff(bloodVessel) |
อุดตัน/get clogged <beClogged> | หลอดเลือด/blood vessel <bloodVessel> | null | beClogged(bloodVessel) |
มีระดับน้ำตาลในเลือดสูง/have hyperglycaemia <haveHyperglycaemia> | ผู้ป่วย/patient <person> | null | haveHyperglycaemia(person) |
แตก/break <break> | หลอดเลือด/blood vessel <bloodVessel> | null | beBroken(bloodVessel) |
ถูกทำลาย/be damaged <beDamaged> | เนื้อเยื่อ/tissue <tissue> | null | beDamaged(tissue) |
… | … | … | … |
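Algorithm 1 obtains each wrdCoc by matching a wrdCo expression against the wrdCo-Concept Table above. The following is a minimal sketch. It assumes that the Thai words have already been mapped to the concept tags shown in angle brackets, and that matching is an exact lookup on the (V, W1, W2) tag triple. The table excerpt and `get_wrdco_concept` are hypothetical illustrations. The sketch shows that several surface expressions map to one wrdCoc feature.

```python
# Excerpt of the wrdCo-Concept Table, keyed by the concept tags of (V, W1, W2);
# None stands for "null". Exact-match lookup is an assumption of this sketch.
WRDCO_CONCEPT_TABLE = {
    ("deposit", "fat", "wall"): "beDeposited(fat,bloodVessel)",
    ("deposit", "fat", "bloodVessel"): "beDeposited(fat,bloodVessel)",
    ("beNarrow", "bloodVessel", None): "beNarrow(bloodVessel)",
    ("beNarrow", "bloodVessel", "more"): "beNarrow(bloodVessel)",
    ("lack", "brain", "blood"): "beSuppliedInsufficiently(blood,brain)",
    ("haveHyperglycaemia", "person", None): "haveHyperglycaemia(person)",
}

def get_wrdco_concept(v, w1, w2=None):
    """Return the wrdCoc feature for a tagged wrdCo expression, or None if no row matches."""
    return WRDCO_CONCEPT_TABLE.get((v, w1, w2))

if __name__ == "__main__":
    # Two different expressions (สะสม and เกาะ, both tagged <deposit>) give the same feature.
    print(get_wrdco_concept("deposit", "fat", "wall"))
    print(get_wrdco_concept("deposit", "fat", "bloodVessel"))
```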
| Disease Group | CwrdCoc | CErel | nonCErel | EwrdCoc | CErel | nonCErel |
|---|---|---|---|---|---|---|
| Diabetes & Kidney Disease Group | <beLost(protein,urine)> | 0.0698 | 0.0784 | <beLost(protein,urine)> | 0.0074 | 0.0066 |
| | <beFailure(kidney)> | 0.1581 | 0.0784 | <beFailure(kidney)> | 0.0355 | 0.0333 |
| | <haveHyperglycaemia(person)> | 0.0452 | 0.0294 | <haveHyperglycaemia(person)> | 0.0411 | 0.0200 |
| | <notTakeSugar(body)> | 0.0266 | 0.0392 | <notTakeSugar(body)> | 0.0112 | 0.0133 |
| | <beNarrow(bloodVessel)> | 0.0185 | 0.0098 | <beNarrow(bloodVessel)> | 0.0374 | 0.0533 |
| | … | … | … | … | … | … |
| Heart & Artery Disease Group | <beDeposited(fat, bloodVessel)> | 0.0540 | 0.0288 | <beDeposited(fat, bloodVessel)> | 0.0011 | 0.0193 |
| | <beThick(bloodVessel)> | 0.0149 | 0.0288 | <beThick(bloodVessel)> | 0.0169 | 0.0038 |
| | <beDamages(bloodVesselWall)> | 0.0218 | 0.0041 | <beDamages(bloodVesselWall)> | 0.0101 | 0.0038 |
| | <becomeInflamed(bloodVessel)> | 0.1323 | 0.1028 | <becomeInflamed(bloodVessel)> | 0.0079 | 0.0115 |
| | <beClogged(bloodVessel,organ)> | 0.0448 | 0.0288 | <beClogged(bloodVessel,organ)> | 0.0418 | 0.0424 |
| | … | … | … | … | … | … |
Disease Group (2000 EDUs/Group) | # of Different wrdCoc Features: Cause | # of Different wrdCoc Features: Effect | Extraction of wrdCoc Pairs Having CErel: NB Precision | NB Recall | SVM Precision | SVM Recall | LR Precision | LR Recall |
---|---|---|---|---|---|---|---|---|
Diabetes and Kidney Disease Group | 40 | 92 | 0.844 | 0.777 | 0.893 | 0.803 | 0.877 | 0.795 |
Heart and Artery Disease Group | 93 | 110 | 0.826 | 0.746 | 0.841 | 0.760 | 0.831 | 0.754 |
Disease Group (2000 EDUs/Group) | Causal Pathway Extraction: Precision | Causal Pathway Extraction: Recall |
---|---|---|
Diabetes and Kidney Disease Group | 0.840 | 0.724 |
Heart and Artery Disease Group | 0.828 | 0.706 |
Disease Group | Concise Representation: Average Score by Doc Representation | Concise Representation: Average Score by Graph Representation | Comprehensible Representation: Average Score by Doc Representation | Comprehensible Representation: Average Score by Graph Representation |
---|---|---|---|---|
Diabetes and Kidney | 3 | 4.5 | 3 | 4.3 |
Heart and Artery | 2.2 | 4.3 | 2.7 | 4.2 |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Pechsiri, C.; Piriyakul, R. Causal Pathway Extraction from Web-Board Documents. Appl. Sci. 2021, 11, 10342. https://doi.org/10.3390/app112110342