Incomplete Data Classification via Distribution Alignment With Evidence Combination
Abstract
Missing attribute values are a common challenge in data classification: most data-driven models assume complete data, so missing values often degrade classification accuracy. In practice, missing value imputation is typically used to obtain complete data for model construction; however, the quality of the imputed values strongly influences the performance of the downstream classifier. To address this issue, we propose a novel method, incomplete data classification based on distribution alignment with evidence combination (IDAC), designed for scenarios in which the training data are complete but the test data are incomplete. In IDAC, the missing attribute values of the test data are first imputed with the mean value vector of the complete training data, which minimizes the first-order distribution difference between the two sets. The second-order statistics (i.e., the covariance matrices) of the training data and the imputed test data are then aligned to derive two feature transformation matrices, which map the complete training data and the incomplete test data into new feature representations. For classification, two classifiers are trained, one on the original and one on the transformed feature representation of the training data, and their outputs are combined using evidence theory. The combination weights are chosen by minimizing the mean squared error (MSE) between the combined results and the ground truth on the labeled training data. We evaluate IDAC on multiple benchmark datasets, and the experimental results show that it consistently achieves better classification performance than several advanced methods.
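The pipeline summarized above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the IDAC implementation. The abstract does not specify how the two transformation matrices are derived, so the sketch assumes a simple whitening of each set, which drives both covariance matrices toward the identity. Likewise, the evidence combination is assumed to be Shafer discounting followed by Dempster's rule on singleton classes plus the whole frame, and a coarse grid search stands in for the MSE-based weight optimization. All function names are hypothetical.

```python
import numpy as np


def impute_with_train_mean(X_test, train_mean):
    """Replace NaN entries of X_test with the per-attribute training means."""
    X = X_test.copy()
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = train_mean[cols]
    return X


def _sym_matrix_power(C, p, eps=1e-6):
    """Power of a symmetric PSD matrix via eigendecomposition (eps regularizes)."""
    w, V = np.linalg.eigh(C + eps * np.eye(C.shape[0]))
    return (V * w**p) @ V.T


def whitening_transforms(X_train, X_test_imputed):
    """ASSUMED alignment: one whitening matrix per set, so that both
    transformed covariances are approximately the identity."""
    C_tr = np.cov(X_train, rowvar=False)
    C_te = np.cov(X_test_imputed, rowvar=False)
    return _sym_matrix_power(C_tr, -0.5), _sym_matrix_power(C_te, -0.5)


def discounted_dempster(P1, P2, w1, w2):
    """ASSUMED combination: discount each classifier's class probabilities
    by its weight, then apply Dempster's rule.
    Focal sets are the singleton classes and the whole frame Omega."""
    m1, m2 = w1 * P1, w2 * P2          # singleton masses, shape (n, K)
    o1, o2 = 1.0 - w1, 1.0 - w2        # masses assigned to Omega
    single = m1 * m2 + m1 * o2 + o1 * m2
    total = single.sum(axis=1, keepdims=True) + o1 * o2  # 1 - conflict
    return single / total              # normalized singleton masses


def fit_weights(P1, P2, Y_onehot, grid=np.linspace(0.1, 0.9, 9)):
    """Grid search for the weights minimizing MSE on labeled training data."""
    best = min(
        (np.mean((discounted_dempster(P1, P2, a, b) - Y_onehot) ** 2), a, b)
        for a in grid
        for b in grid
    )
    return best[1], best[2]
```

In this sketch, `P1` and `P2` would be the class-probability outputs of the classifiers trained on the original and transformed training features, and the transformed features are obtained as `(X - train_mean) @ A`, with `A` the corresponding whitening matrix. The weights are kept strictly below 1 so that some mass always remains on the whole frame and the normalization never divides by zero.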