This study proposes an innovative method for automated health status classification based on the analysis of two critical physiological parameters: Oxygen Saturation (SpO₂) and Heart Rate (HR). Using an optimized decision tree algorithm, we developed a model capable of discriminating between four clinical conditions with an overall accuracy of 96.2%. The results demonstrate the effectiveness of this approach for preliminary diagnosis while providing clear interpretation of decision rules, which is essential for medical applications. This method shows significant potential for telemonitoring and early warning systems.
The assessment of patient health status through physiological parameters such as Oxygen Saturation (SpO₂) and Heart Rate (HR) remains a cornerstone of clinical practice. Traditionally, this process relies on manual interpretation by healthcare professionals, which, despite its widespread use, suffers from inherent limitations in objectivity, reproducibility and time efficiency. Subjective variability among clinicians and the labor-intensive nature of continuous monitoring often lead to delayed interventions, particularly in high-volume clinical settings.
In recent years, Machine Learning (ML) has emerged as a transformative tool for automating medical diagnostics, offering the potential to overcome these challenges while maintaining (or even improving) diagnostic accuracy [1,2]. Supervised learning algorithms, for instance, have demonstrated remarkable success in classifying conditions such as hypoxia, bradycardia and tachycardia from physiological data [3]. However, a critical barrier persists: many state-of-the-art ML models, particularly "black-box" systems like deep neural networks, lack interpretability, a non-negotiable requirement for clinical adoption [4].
Clinicians must understand why a model makes specific decisions to trust and act upon its outputs, especially in life-critical scenarios.
For this study, we selected a real-world dataset comprising a population of 1,000 individuals displaying the relevant clinical symptoms under investigation [6].
This study addresses this gap by proposing a transparent, decision-tree-based classification system for health status assessment using SpO₂ and HR. Our approach uniquely balances high performance with interpretability, providing clinicians with actionable insights through clear decision rules (e.g., "SpO₂ ≤ 92.5% → Hypoxia") [5].
The specific objectives of this study are:
Data Preparation
The raw physiological dataset underwent a rigorous preprocessing pipeline to ensure robustness and mitigate biases inherent in medical data. This critical phase comprised four methodical stages [6].
Quality Control and Artifact Removal
A two-step filtration process was implemented:
Z-Score Normalization
Each parameter was standardized using:
$$z = \frac{x - \mu}{\sigma}$$
where $x$ is the raw measurement, and $\mu$ and $\sigma$ represent the population-wise mean and standard deviation, respectively.
This transformation addressed scale variance between HR (60–100 bpm) and SpO₂ (0–100%) while maintaining clinical interpretability of thresholds.
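The standardization step, z = (x − μ)/σ, can be illustrated with a minimal sketch. The readings below are hypothetical values chosen for illustration, not samples from the study dataset, and the helper name `z_score` is an assumption. The last lines show why thresholds remain interpretable: a clinical cut-off can be mapped into z-space and back with the same μ and σ.

```python
import numpy as np

def z_score(values):
    """Standardize a 1-D array: subtract the mean, divide by the standard deviation."""
    values = np.asarray(values, dtype=float)
    mu = values.mean()
    sigma = values.std()  # population standard deviation (ddof=0)
    return (values - mu) / sigma

# Hypothetical readings, for illustration only (not taken from the study dataset).
spo2 = np.array([99.2, 95.8, 92.1, 93.1, 93.3])   # SpO2 in %
hr = np.array([72.0, 55.0, 88.0, 64.0, 110.0])    # HR in bpm

spo2_z = z_score(spo2)
hr_z = z_score(hr)

# A clinical threshold (here 95% SpO2) maps into z-space with the same
# mu and sigma, and maps back to the original unit without loss.
threshold_z = (95.0 - spo2.mean()) / spo2.std()
threshold_back = threshold_z * spo2.std() + spo2.mean()
```

After this transformation both parameters have zero mean and unit variance, so neither dominates purely because of its measurement scale.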
Clinical Label Encoding
A hierarchical labeling system was developed in collaboration with pulmonologists and cardiologists:
Stratified Splitting (70% Training / 30% Testing)
The dataset was partitioned using scikit-learn's StratifiedShuffleSplit to:
Rationale
This preprocessing sequence optimizes model generalizability while adhering to Findable, Accessible, Interoperable, Reusable (FAIR) data principles for medical AI [5].
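As an illustration of the 70% / 30% stratified partition, the sketch below applies scikit-learn's `StratifiedShuffleSplit` to synthetic stand-in data. The class counts mirror Table 3, but the feature values are random, and the integer label order and the `random_state` value are assumptions made for the example rather than settings reported by the study.

```python
import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit

rng = np.random.default_rng(0)

# Synthetic stand-in data: class counts mirror Table 3, feature values are random.
# The integer codes assigned to each class are an assumption for this example.
counts = {0: 109, 1: 498, 2: 257, 3: 93, 4: 43}
y = np.concatenate([np.full(n, label) for label, n in counts.items()])
X = np.column_stack([
    rng.uniform(86, 100, size=y.size),   # SpO2 (%)
    rng.uniform(40, 128, size=y.size),   # HR (bpm)
])

# One shuffled split with 30% of the samples held out for testing.
splitter = StratifiedShuffleSplit(n_splits=1, test_size=0.30, random_state=42)
train_idx, test_idx = next(splitter.split(X, y))

X_train, X_test = X[train_idx], X[test_idx]
y_train, y_test = y[train_idx], y[test_idx]

# Class proportions in each partition; stratification keeps them close.
train_share = np.bincount(y_train) / y_train.size
test_share = np.bincount(y_test) / y_test.size
```

Comparing `train_share` and `test_share` is a quick way to confirm that minority classes such as tachycardia are represented in both partitions.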
Exploratory Analysis
Figure 1 shows the frequency distributions of SpO₂ and HR (Table 1).
Figure 2 shows the relationship between SpO₂ and HR by health status.
Table 4 presents the classification report, together with the resulting confusion matrix.
Confusion Matrix:
Decision Tree Model Architecture
The classification framework employed a Classification and Regression Tree (CART) algorithm, selected for its dual capability to handle both categorical and continuous variables while generating human-interpretable decision rules [7,8]. The model architecture was optimized through systematic hyperparameter tuning and validation protocols:
Advantages over entropy-based approaches:
Depth Control
Constrained to maximum depth of 4 to:
Node Splitting Policy
Minimum samples per split = 5 to:
Five-fold stratified cross-validation with:
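The configuration described in this section (maximum depth of 4, minimum of 5 samples per split, five-fold stratified cross-validation) could be expressed in scikit-learn roughly as follows. This is a sketch under stated assumptions: the data are synthetic and labelled by simple illustrative rules, the Gini criterion is assumed because it is the scikit-learn default for CART-style trees, and the `random_state` values are arbitrary.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Synthetic stand-in data (not the study dataset): random SpO2/HR pairs
# labelled with simple illustrative threshold rules.
spo2 = rng.uniform(86, 100, size=600)
hr = rng.uniform(40, 128, size=600)
X = np.column_stack([spo2, hr])
y = np.where(spo2 <= 95, 2, np.where(hr < 60, 0, np.where(hr > 100, 4, 1)))

model = DecisionTreeClassifier(
    criterion="gini",        # assumed; scikit-learn's default impurity measure
    max_depth=4,             # depth constraint stated in the text
    min_samples_split=5,     # node splitting policy stated in the text
    random_state=42,         # arbitrary seed for reproducibility
)

# Five-fold stratified cross-validation, scored by accuracy.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")

# Fit on all samples to inspect the resulting tree.
model.fit(X, y)
```

The per-fold accuracies in `scores` give a rough sense of how stable the tree is across resamples; `model.get_depth()` confirms the depth constraint was respected.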
Table 1: Data Overview
| | Unnamed: 0 | SpO₂ | ... | Etat_Sante | Etat_Encoded |
|---|---|---|---|---|---|
| 0 | 27 | 99.186367 | ... | Healthy | 1 |
| 1 | 33 | 95.793554 | ... | Bradycardia (Slow Heart Rate) | 0 |
| 2 | 32 | 92.148628 | ... | Mild Hypoxia | 2 |
| 3 | 36 | 93.128811 | ... | Mild Hypoxia | 2 |
| 4 | 69 | 93.300774 | ... | Mild Hypoxia | 2 |
Figure 1: Frequency distributions of SpO₂ and HR
Figure 2: Relationship between SpO₂ and HR by health status
Table 2: Descriptive Statistics
| Parameters | SpO₂ (%) | FC (HR, bpm) |
|---|---|---|
| count | 1000.000000 | 1000.000000 |
| mean | 95.878404 | 74.872285 |
| std | 2.734509 | 15.297165 |
| min | 86.551002 | 40.000000 |
| 25% | 93.913010 | 63.958917 |
| 50% | 96.024022 | 75.214861 |
| 75% | 98.024053 | 85.883527 |
| max | 100.000000 | 127.369389 |
Table 3: Class Distribution
| Etat_Sante | Count |
|---|---|
| Healthy | 498 |
| Mild Hypoxia | 257 |
| Bradycardia (Slow Heart Rate) | 109 |
| Moderate Hypoxia | 93 |
| Tachycardia (Fast Heart Rate) | 43 |
The tree construction solved the following minimization problem at each node t:
where:
s = Optimal splitting threshold
$t_L$, $t_R$ = Left/right child nodes
$n_L$, $n_R$ = Sample counts in child nodes
N = Parent node sample size
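For reference, the symbols listed above fit the standard CART split objective, in which each candidate threshold is scored by the sample-weighted impurity of the two child nodes. The form below is a sketch of that standard objective, not a recovery of the original equation. The impurity function $G$ is assumed here to be the Gini index, and $p_k(t)$ (the proportion of class $k$ in node $t$) is notation introduced for this illustration.

```latex
s^{*} = \arg\min_{s} \left[ \frac{n_L}{N}\, G(t_L) + \frac{n_R}{N}\, G(t_R) \right],
\qquad
G(t) = 1 - \sum_{k} p_k(t)^{2}
```

Under this reading, a split is preferred when it sends most samples into children dominated by a single class, since $G(t)$ is zero for a pure node.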
Clinical Interpretability Enhancements:
Comparative Advantage
This configuration achieved 96.2% accuracy while requiring only 7 binary decision rules, making it significantly more interpretable than an equivalent random forest (23 rules) or neural network (opaque) model (Figure 3). We implemented a Classification and Regression Tree (CART) classifier with:
Clinical Interpretation
Interpretation of Results
Key Decision Rules:
|--- SpO2 <= 95.00
| |--- HR <= 60.00
| | |--- class: 3
| |--- HR > 60.00
| | |--- SpO2 <= 89.96
| | | |--- class: 3
| | |--- SpO2 > 89.96
| | | |--- HR <= 100.05
| | | | |--- class: 2
| | | |--- HR > 100.05
| | | | |--- class: 4
|--- SpO2 > 95.00
| |--- HR <= 59.93
| | |--- class: 0
| |--- HR > 59.93
| | |--- HR <= 100.15
| | | |--- class: 1
| | |--- HR > 100.15
| | | |--- class: 4
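The exported rules above can be transcribed directly into a small function, which makes each decision path easy to trace by hand. The thresholds are copied from the tree. The function name `predict_class` and the `CLASS_NAMES` mapping are assumptions: Table 1 confirms only the codes 0, 1 and 2, while the names for 3 and 4 follow from assuming alphabetical label encoding.

```python
# Class names: codes 0-2 appear in Table 1; codes 3 and 4 are assumed
# to follow alphabetical label encoding.
CLASS_NAMES = {
    0: "Bradycardia (Slow Heart Rate)",
    1: "Healthy",
    2: "Mild Hypoxia",
    3: "Moderate Hypoxia",   # assumed
    4: "Tachycardia (Fast Heart Rate)",   # assumed
}

def predict_class(spo2, hr):
    """Follow the exported decision tree for one (SpO2 %, HR bpm) reading."""
    if spo2 <= 95.00:
        if hr <= 60.00:
            return 3
        if spo2 <= 89.96:
            return 3
        if hr <= 100.05:
            return 2
        return 4
    # SpO2 > 95.00: the class depends on heart rate alone.
    if hr <= 59.93:
        return 0
    if hr <= 100.15:
        return 1
    return 4

# Example: a reading of SpO2 = 97% and HR = 75 bpm follows the right branch.
example = CLASS_NAMES[predict_class(97.0, 75.0)]
```

Written this way, every prediction corresponds to at most four threshold comparisons, matching the depth constraint of the tree.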
Feature Importance
Clinical Recommendations:
- SpO₂ < 92%: hypoxia
- HR < 60 bpm: bradycardia
- HR > 100 bpm: tachycardia
- Healthy cases typically have SpO₂ > 95% and HR between 60 and 100 bpm
Limitations and Future Directions
Main limitations include:
- Need for validation on independent cohorts
- Lack of comorbidity consideration [9,10]
- Generalizability to pediatric populations
Future research could integrate:
- Additional parameters (blood pressure, temperature)
- Hybrid approaches combining decision trees and neural networks (Figure 4)
This study demonstrates that a rigorously optimized decision tree model can achieve high-performance automated classification of patient health status using only two core physiological parameters: Oxygen Saturation (SpO₂) and Heart Rate (HR). With an accuracy of 96.2% and clinically interpretable decision rules (e.g., SpO₂ ≤ 92.5% → Hypoxia), our framework addresses a critical need in medical artificial intelligence: bridging the gap between algorithmic performance and clinical usability.
Table 4: Classification Report
| | precision | recall | f1-score | support |
|---|---|---|---|---|
| Bradycardia (Slow Heart Rate) | 1.00 | 1.00 | 1.00 | 24 |
| Healthy | 1.00 | 1.00 | 1.00 | 164 |
| Mild Hypoxia | 1.00 | 1.00 | 1.00 | 74 |
| Moderate Hypoxia | 1.00 | 1.00 | 1.00 | 28 |
| Tachycardia (Fast Heart Rate) | 1.00 | 1.00 | 1.00 | 10 |
| accuracy | | | | 300 |
| macro avg | 1.00 | 1.00 | 1.00 | 300 |
| weighted avg | 1.00 | 1.00 | 1.00 | 300 |
Figure 3: Decision tree
Figure 4: Decision boundaries
Key Advancements
Interpretability-Preserving Performance
Clinical Integration Potential
Demonstrated feasibility for deployment in:
Limitations and Future Directions
While promising, these findings warrant further validation through:
Broader Implications
This work provides a proof-of-concept framework for developing interpretable-by-design medical AI systems. Future iterations could adopt hybrid architectures (e.g., decision trees guiding neural network attention) to balance accuracy and transparency in more complex diagnostic scenarios.