Machine Learning Techniques in Predicting Delayed Pneumothorax and Hemothorax Following Blunt Thoracic Trauma
Journal of Archives in Military Medicine: May 01, 2014, 2 (2); e18133
May 18, 2014
Article Type: Brief Report
February 16, 2014
April 6, 2014
April 6, 2014
A R, Bayati
S E, Sanei
B. Machine Learning Techniques in Predicting Delayed Pneumothorax and Hemothorax Following Blunt Thoracic Trauma,
J Arch Mil Med.
Delayed pneumothorax and hemothorax are among the possible fatal complications of blunt thoracic trauma.
Finding reliable criteria for timely diagnosis of high-risk patients has been an area of interest for researchers.
Material and methods:
We assembled a database of 616 patients, 17 of whom experienced delayed complications. Using four classification techniques, namely linear regression, logistic regression, artificial neural network, and naïve Bayesian classifier, we sought a predictive pattern to recognize patients with positive results based on clinical and radiological variables recorded at the time of admission.
First, without using machine learning techniques, we tried to predict the complications based on a single variable only. We identified chest wall tenderness as the best single criterion; it classified all high-risk patients with 100% sensitivity (95% CI, 82-100) and could potentially exclude 57% (95% CI, 53-61) of low-risk patients from further observation. We then used the machine learning techniques to assess the effect of all admission-time variables together. Among the techniques used, the logistic regression model not only excluded 81% (95% CI, 77-84) of patients without complications from unnecessary observation, but also recognized all patients with true positive results for pneumothorax and hemothorax (95% CI, 82-100).
Instead of undergoing serial chest X-rays, patients with blunt chest trauma could be initially evaluated by a risk assessment model in order to avoid unnecessary work-up.
Copyright © 2014, AJA University of Medical Sciences. This is an open-access article distributed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/) which permits copy and redistribute the material just in noncommercial usages, provided the original work is properly cited.
Delayed pneumothorax (DPTX) and hemothorax (DHTX) are among the possible fatal complications of blunt thoracic trauma (BTT). Although the incidence rates of DPTX and DHTX following BTT are relatively low, reported to be as high as 7.4% for DPTX and 2% to 6% for DHTX, serious consideration is required due to the high risk of mortality (1, 2). Current medical guidelines recommend following up seemingly high-risk patients with serial chest X-rays (CXRs) at six-hour intervals (3). However, obtaining serial CXRs exposes patients to excessive radiation and is neither optimal nor economical. In this respect, finding reliable criteria to classify high-risk patients for careful observation would be of great importance.
Rib fractures have been recognized as an underlying factor for the delayed complications in different studies (4-7). Simon et al. found a high prevalence of multiple or displaced rib fractures in patients with DHTX (8). Liman et al. discovered a correlation between the number of fractured ribs and DHTX occurrence (4). Sharma et al. emphasized careful observation of these patients for well-timed diagnosis of DHTX (6, 7). However, to classify high-risk subjects accurately, it is also essential to consider the prevalence of rib fracture in patients with no delayed complication.
To exclude low-risk patients based on CXR findings, Rodriguez et al. investigated the diagnostic significance of different clinical variables (9). They exploited features such as mechanism of injury, intoxication, chest tenderness on palpation, and crepitus to classify high-risk complications. Using screening tests and based on the CXR findings, they reported the combination of tenderness on palpation and hypoxia as the best measure, excluding 46% of patients. Shekarchi et al. recorded clinical and CXR variables of 680 patients (under publication) to predict the delayed complications. To assess the combination of variables, they applied a logistic regression classifier, which achieved 64.7% sensitivity and 93% specificity. In this study, we generalized their conclusions by emphasizing classification methods such as artificial neural networks (ANNs) in order to determine better screening methods. In addition, we considered the possibility of developing a single-variable recognition method for the delayed complications. Furthermore, we introduced a new formula enabling better screening accuracy.
ANNs provide a risk assessment tool that can be applied to diagnosis, prognosis, recalling the incidence of rarely occurring disease profiles, and analysis of different treatment choices (5). Although only 20 works concerning ANNs in medical practice had been published until 1988, the method is now regularly used in the medical field (10). However, the training process of ANNs has various requirements that may not always be met, leading to inconvenience (11). To overcome this problem, statistical tests are employed to evaluate the mapping confidence by dividing the dataset into training and validation subsets.
Table 1. List of Classification Input Variables and Their Frequencies in Positive and Negative Classes a, b

| Input Variable | True Positive (TP) | Sensitivity, % (TP/17) (95% CI) | False Positive (FP) | Specificity, % (1-FP/599) (95% CI) |
| Chest wall tenderness | 17 | 100 (82-100) | 257 | 57 (53-61) |
| Chest pain | 16 | 94 (73-99) | 434 | 27 (24-31) |
| Chest wall crepitation | 4 | 24 (10-42) | 5 | 99 (98-100) |
| Rib fracture | 3 | 18 (6-41) | 5 | 99 (98-100) |
| Subcutaneous emphysema | 3 | 18 (6-41) | 8 | 99 (97-99) |
| Abdominopelvic trauma | 3 | 18 (6-41) | 23 | 96 (94-97) |
| Chest wall ecchymosis | 2 | 12 (3-34) | 26 | 96 (94-97) |
a Abbreviation: CI, confidence interval.
b Sensitivity and specificity of single variable classification are also calculated with 95% confidence interval.
In this study, we applied four classification techniques to find a predictive pattern for recognition of high-risk patients using variables recorded at admission time. The variables included the radiological and clinical criteria listed in Table 1.
We employed the dataset recorded by Shekarchi et al. from July 2009 to December 2010 in three hospitals. Only patients who agreed to participate in the study and met the inclusion criteria, such as no need for surgical intervention, were included. Our analysis included 616 patients with blunt chest trauma (BCT), consisting of 422 (68%) males and 200 (32%) females aged 18 to 96 years (mean ± SD, 44.3 ± 20.0 years). The machine learning algorithms (explained in the Methods section) determined 17 subjects to be positive for delayed complications, including nine cases with DHTX, seven with DPTX, and one with delayed hemopneumothorax, from 599 patients with negative results.
Table 1 displays the algorithm input variables as well as their frequencies in the high-risk and low-risk classes. It also displays the sensitivity and specificity of single-variable recognition with the corresponding 95% confidence intervals.
3. Materials and Methods
Classification methods provide a mapping from the input space (see Table 1) to the categorical output space, i.e., the positive and negative classes. We employed four classification methods, namely linear regression (LinReg), logistic regression (LogReg), ANNs, and the naive Bayesian classifier (NBC) (12). Each classification algorithm learned the characteristics of the classes from the training data subset in a training phase. Then, the classification performance was tested on the validation data subset to examine how well the mapping generalized to new patterns. We trained an ANN with three and five neurons in the first and the hidden layers, respectively, by minimizing classification error through the back-propagation algorithm. To train LinReg, LogReg, and NBC, we applied the matrix pseudoinverse, iteratively reweighted least squares, and single-variable histogram calculation algorithms, respectively.
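For orientation, the three non-ANN training procedures named above can be written in their common textbook forms. This is a sketch, not the authors' exact implementation. The symbols are assumptions introduced here: X is the design matrix with one row of admission variables per patient, t is the vector of 0/1 class labels, sigma is the logistic function, and C_k denotes a class.

```latex
\begin{align}
% LinReg: least-squares weights from the Moore-Penrose pseudoinverse
% (the second equality assumes X^T X is invertible)
\mathbf{w}_{\mathrm{LinReg}} &= \mathbf{X}^{+}\mathbf{t}
  = (\mathbf{X}^{\top}\mathbf{X})^{-1}\mathbf{X}^{\top}\mathbf{t} \\
% LogReg: one iteratively reweighted least squares (Newton-Raphson) step
\mathbf{w}^{\mathrm{new}} &= \mathbf{w}^{\mathrm{old}}
  - (\mathbf{X}^{\top}\mathbf{R}\mathbf{X})^{-1}\mathbf{X}^{\top}(\mathbf{y}-\mathbf{t}),
  \qquad y_n = \sigma(\mathbf{w}^{\top}\mathbf{x}_n),
  \quad R_{nn} = y_n(1-y_n) \\
% NBC: class posterior from per-variable (histogram) likelihoods
p(C_k \mid \mathbf{x}) &\propto p(C_k)\prod_{i} p(x_i \mid C_k)
\end{align}
```

In the NBC line, each factor p(x_i | C_k) would be estimated from the frequency of variable i within class k, which is what a single-variable histogram provides.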
To analyze the performance, we used four well-known diagnostic test indices, namely sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV). In addition, two ratio indices, namely positive likelihood ratio (PLR) and negative likelihood ratio (NLR), were reported as screening criteria. Confidence intervals of the diagnostic test indices and the ratio indices were calculated using the Wilson score method (13) and the method introduced by Simel et al. (14), respectively.
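As an illustration of the Wilson score method, the following minimal Python sketch computes sensitivity, specificity, and their 95% intervals from the chest wall tenderness counts in Table 1 (TP = 17 of 17, FP = 257 of 599). The function names are hypothetical, and z = 1.96 is the usual normal quantile for a 95% interval. This is not the authors' code.

```python
import math


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 for ~95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    # Clamp to [0, 1] to absorb floating-point overshoot at p = 0 or p = 1.
    return max(0.0, centre - half), min(1.0, centre + half)


def sensitivity_specificity(tp, fp, n_pos, n_neg):
    """Sensitivity = TP / positives; specificity = 1 - FP / negatives."""
    return tp / n_pos, 1 - fp / n_neg


# Chest wall tenderness row of Table 1: 17 positives, 599 negatives.
sens, spec = sensitivity_specificity(tp=17, fp=257, n_pos=17, n_neg=599)
sens_ci = wilson_interval(17, 17)          # true positives among positives
spec_ci = wilson_interval(599 - 257, 599)  # true negatives among negatives
```

Rounded to whole percentages, these intervals agree with the 82-100 and 53-61 ranges reported for this variable.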
For each classification technique, we repeated the training phase 100 times, each time with a randomly chosen two-thirds of the data as the training subset. Then, the classifier with the largest area under the receiver operating characteristics (ROC) curve was selected as the best one.
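The repeated-split selection can be sketched in plain Python as follows. Several details are assumptions and not stated in the text: the ROC area is computed on the held-out third, splits whose validation subset lacks one class are skipped, and `count_findings_fit` is a hypothetical stand-in trainer used only to make the example runnable. The toy data are invented for illustration.

```python
import random


def roc_auc(scores, labels):
    """ROC area via the rank (Mann-Whitney) formulation; ties count as half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def count_findings_fit(train_rows, train_labels):
    """Hypothetical trainer: ignores the training data and returns a scorer
    that counts the positive findings of a patient."""
    return lambda row: float(sum(row))


def best_of_repeated_splits(rows, labels, fit, repeats=100,
                            train_fraction=2 / 3, seed=0):
    """Train on random splits and keep the model with the highest held-out AUC."""
    rng = random.Random(seed)
    indices = list(range(len(rows)))
    n_train = round(train_fraction * len(rows))
    best_model, best_auc = None, None
    for _ in range(repeats):
        rng.shuffle(indices)
        train, valid = indices[:n_train], indices[n_train:]
        valid_labels = [labels[i] for i in valid]
        if len(set(valid_labels)) < 2:
            continue  # AUC is undefined unless both classes are present
        model = fit([rows[i] for i in train], [labels[i] for i in train])
        auc = roc_auc([model(rows[i]) for i in valid], valid_labels)
        if best_auc is None or auc > best_auc:
            best_model, best_auc = model, auc
    return best_model, best_auc


# Invented toy data: each row is a tuple of 0/1 findings, label 1 = complication.
toy_rows = [(1, 1, 0), (1, 1, 1), (1, 0, 0), (0, 0, 0), (0, 1, 0),
            (0, 0, 0), (1, 0, 0), (0, 0, 0), (1, 1, 0)]
toy_labels = [1, 1, 0, 0, 0, 0, 0, 0, 1]
model, auc = best_of_repeated_splits(toy_rows, toy_labels,
                                     count_findings_fit, repeats=20)
```

With only 17 positive cases in 616, a random validation third holds few positives, so the ROC area of any single split is a noisy estimate. This is one reason to repeat the split many times.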
Table 2 reports the diagnostic results on all the data, consisting of both the training and validation subsets; it provides an overall evaluation of the prediction of delayed complications in our subjects.
LogReg had a sensitivity of 100% (95% CI, 82-100) with a specificity of 81% (95% CI, 77-84), while the other three methods had a high specificity of 97% (95% CI, 95-98) with much lower sensitivity. LogReg led to the best NLR and NPV with high screening accuracy, while LinReg, ANN, and NBC had comparable PLRs. Considering the risk of missing a high-risk patient, we were interested in recognizing all subjects with delayed complications; therefore, high sensitivity with reasonable specificity was important for our screening test. In fact, it would provide a tool to classify high-risk patients while removing many low-risk ones.
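The likelihood ratios follow directly from sensitivity and specificity: PLR = sensitivity / (1 - specificity) and NLR = (1 - sensitivity) / specificity. The short sketch below applies these standard definitions to the rounded LogReg and LinReg values quoted in the text. The function name is hypothetical, and results from rounded inputs can differ slightly from values computed on raw counts.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (PLR, NLR) from sensitivity and specificity given as fractions."""
    plr = sensitivity / (1 - specificity) if specificity < 1 else float("inf")
    nlr = (1 - sensitivity) / specificity if specificity > 0 else float("inf")
    return plr, nlr


# Rounded values quoted for LogReg and LinReg.
logreg_plr, logreg_nlr = likelihood_ratios(1.00, 0.81)
linreg_plr, linreg_nlr = likelihood_ratios(0.65, 0.97)
```

An NLR of 0 for LogReg reflects its 100% sensitivity in this dataset: no patient with a delayed complication received a negative result.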
As stated before, the best screening accuracy was achieved by LogReg, with a sensitivity of 100% (95% CI, 82-100), specificity of 81% (95% CI, 77-84), PPV of 49% (95% CI, 33-64), and NPV of 100% (95% CI, 99-100). The model follows the logistic formula:
$$y = \frac{1}{1 + e^{-z}}$$
where z is defined as follows:
$$z = 25.01\,\mathrm{Ch.Tend} + 3.89\,\mathrm{Ch.Pain} + 2.01\,\mathrm{Ch.Crep} + 2.68\,\mathrm{Rib.Frac} + 2.19\,\mathrm{Sub.Emph} + 4.07\,\mathrm{Abp.Tra} + 1.32\,\mathrm{Ch.Ecch} - 27.94$$
(The variables abbreviated in the formula are listed in Table 1.) To classify the output, y should be compared with a threshold of 0.5. It should be noted that although the ANN provided a more complex method to model the data, LogReg outperformed it in terms of screening accuracy. We attribute this to the fact that more complex models need more data for correct estimation of the model coefficients. In addition, model complexity increases the chance of becoming trapped in local minima during the training phase. Thus, with this number of patterns, LogReg, which can be considered a single neuron, outperformed the multilayer ANN.
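To make the scoring rule concrete, the sketch below evaluates z with the published coefficients and applies the 0.5 threshold. Two points are assumptions and not stated in the text: each variable is coded 1 when the finding is present and 0 when absent, and y is obtained from z through the standard logistic function y = 1 / (1 + e^(-z)). The function and dictionary names are hypothetical.

```python
import math

# Coefficients and intercept as given in the formula for z.
COEFFICIENTS = {
    "Ch.Tend": 25.01,   # chest wall tenderness
    "Ch.Pain": 3.89,    # chest pain
    "Ch.Crep": 2.01,    # chest wall crepitation
    "Rib.Frac": 2.68,   # rib fracture
    "Sub.Emph": 2.19,   # subcutaneous emphysema
    "Abp.Tra": 4.07,    # abdominopelvic trauma
    "Ch.Ecch": 1.32,    # chest wall ecchymosis
}
INTERCEPT = -27.94


def risk_score(findings):
    """Return y for a dict of findings (assumed coding: 1 = present, 0 = absent)."""
    z = INTERCEPT + sum(
        coef * findings.get(name, 0) for name, coef in COEFFICIENTS.items()
    )
    return 1.0 / (1.0 + math.exp(-z))


def is_high_risk(findings, threshold=0.5):
    """Classify as high-risk when y exceeds the threshold."""
    return risk_score(findings) > threshold


tenderness_only = {"Ch.Tend": 1}
tenderness_and_pain = {"Ch.Tend": 1, "Ch.Pain": 1}
```

Under this coding, tenderness alone gives z = -2.93 and y of about 0.05, which is below the threshold. Tenderness with chest pain gives z = 0.96 and y of about 0.72, which is above it. The large tenderness coefficient therefore acts almost as a prerequisite, consistent with tenderness being present in all 17 positive cases.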
Table 2. Diagnostic Accuracies and Corresponding Confidence Intervals Obtained by Four Classification Techniques a

| Index | LinReg | LogReg | ANN | NBC |
| Sensitivity, % (95% CI) | 65 (41-83) | 100 (82-100) | 71 (47-87) | 65 (41-83) |
| Specificity, % (95% CI) | 97 (95-98) | 81 (77-84) | 97 (95-98) | 97 (95-98) |
| PPV, % (95% CI) | 65 (41-83) | 49 (33-64) | 38 (23-55) | 38 (21-53) |
| NPV, % (95% CI) | 99 (97-99) | 100 (99-100) | 99 (98-100) | 99 (98-100) |
| PLR (95% CI) | 21 (12-38) | 5 (4-6) | 21 (12-36) | 19 (11-34) |
| NLR (95% CI) | 0.36 (0.19-0.69) | 0 | 0.3 (0.15-0.64) | 0.37 (0.19-0.7) |
| ROC area | 94.9 | 95.6 | 96.1 | 95 |

a Abbreviations: ANN, artificial neural network; CI, confidence interval; LinReg, linear regression; LogReg, logistic regression; NBC, naive Bayesian classifier; NLR, negative likelihood ratio; NPV, negative predictive value; PLR, positive likelihood ratio; PPV, positive predictive value; ROC, receiver operating characteristics.
In this study, we investigated the possibility of predicting the delayed complications of BTT based on clinical and radiological variables recorded at admission time. We used a dataset consisting of 17 patients with delayed complications and 599 patients without them, recorded in three hospitals from July 2009 to December 2010. Four classification algorithms were employed to find a predictive pattern for recognizing high-risk patients. To evaluate the results, the diagnostic test indices, namely sensitivity, specificity, PPV, NPV, PLR, and NLR, were calculated with the corresponding 95% confidence intervals.
In agreement with Rodriguez et al. (9), we recognized chest wall tenderness as the best single criterion, classifying all high-risk patients with a sensitivity of 100% (95% CI, 82-100). This criterion potentially excluded 57% (95% CI, 53-61) of low-risk patients from further observation. In contrast with previous studies emphasizing the high sensitivity of rib fracture (4, 6-8), this factor could only recognize 18% (95% CI, 6-41) of subjects with delayed complications in our dataset.
We concluded that the aforementioned LogReg formula identified all high-risk subjects and potentially excluded 81% (95% CI, 77-84) of low-risk patients from serial CXR in the studied dataset. However, it should be noted that this is a preliminary result that should be validated and evaluated in larger and more comprehensive datasets before being put into practice.