Deep Neural Networks for High Dimension, Low Sample Size Data
Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence (IJCAI-17)
Figure 3: Test AUC scores of different methods with respect to the number of selected features.
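As an illustrative aside (not the paper's implementation), the test AUC metric reported in Figure 3 and the tables can be computed directly from classifier scores via the rank-sum (Mann-Whitney U) formulation: the probability that a randomly chosen positive example is scored above a randomly chosen negative one. A minimal self-contained sketch, using made-up labels and scores:

```python
def auc_score(labels, scores):
    """AUC via the rank-sum (Mann-Whitney U) formulation:
    the fraction of (positive, negative) pairs in which the
    positive example receives the higher score; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

print(auc_score([0, 0, 1, 1], [0.1, 0.2, 0.8, 0.9]))  # perfect ranking -> 1.0
print(auc_score([0, 1, 0, 1], [0.9, 0.1, 0.8, 0.7]))  # inverted ranking -> 0.0
```

A curve like those in Figure 3 is then obtained by re-fitting the classifier on the top-k selected features for increasing k and evaluating this score on the held-out test set.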
learns a linear decision boundary, which is insufficient for highly complex and nonlinear data. GBFS uses a regression tree as its base learner, thereby achieving an axis-parallel decision boundary. HSIC-Lasso and the proposed DNP not only model nonlinear decision boundaries but also exactly identify the two true features. As Table 1 shows, the F1 scores of feature selection reach 1 for both methods.

We further compare the performance of the different algorithms in Table 1. The test AUC scores and F1 scores are averaged across five randomly generated datasets. In terms of the test AUC score, DNP and HSIC-Lasso both show superior performance over the others. DNP performs best on all the datasets and significantly outperforms HSIC-Lasso when m = 10 according to the t-test (p-value < 0.05). LogR-l1 is significantly outperformed by DNP on three out of four synthetic datasets and becomes comparable when m = 25. In terms of the F1 score for feature selection, DNP performs best on all datasets and even outperforms the most competitive baseline, HSIC-Lasso, by 8.65% on average. GBFS consistently performs worst in terms of both classification and feature selection.

Table 1: Performance of classification and feature selection on synthetic datasets with different numbers of true features. The statistically best performance is shown in bold.

True Dim                2      5      10     25
LogR-l1     AUC       0.868  0.826  0.755  0.661
            Std       0.003  0.001  0.014  0.017
GBFS        AUC       0.748  0.721  0.757  0.565
            Std       0.184  0.130  0.011  0.029
HSIC-Lasso  AUC       0.948  0.881  0.747  0.642
            Std       0.003  0.001  0.003  0.007
DNP         AUC       0.926  0.887  0.813  0.650
            Std       0.005  0.023  0.025  0.016
LogR-l1     F1 score  0.141  0.313  0.364  0.266
            Std       0.025  0.045  0.046  0.035
GBFS        F1 score  0.182  0.050  0.429  0.105
            Std       0.183  0.056  0.048  0.052
HSIC-Lasso  F1 score  1.000  0.889  0.667  0.253
            Std       0.000  0.000  0.000  0.033
DNP         F1 score  1.000  0.862  0.857  0.378
            Std       0.000  0.075  0.085  0.054

4.2 Experiments on Real-World Biological Datasets

To investigate the performance of DNP on real-world datasets, we use six public biological datasets [3], all of which suffer from the HDLSS problem. The statistics of these datasets are shown in Table 2. We report the average results over 10 random splits, with 80% of the data used for training, 10% for validation, and 10% for testing.

Table 2: Statistics for real-world datasets.

Data            Colon   Prostate GE   Leukemia
Sample size     62      102           72
Dimensionality  2,000   5,966         7,070

Data            ALLAML  SMK-CAN-187   GLI-85
Sample size     72      187           85
Dimensionality  7,129   19,993        22,283

In Fig. 3, we investigate the average test AUC scores with respect to the number of selected features. DNP selects a single feature in each iteration. As a result, the performance curve of DNP in Fig. 3 is equivalent to how the test AUC scores change with respect to the number of iterations. On all six datasets, the test AUC scores of DNP converge quickly within fewer than 10 iterations. For LogR-l1, GBFS, and HSIC-Lasso, the number of selected features is tuned by choosing the appropriate hyper-parameters. We use a circle as an indicator when DNP is outperformed by the best baseline and

[3] Biological datasets: http://featureselection.asu.edu/datasets.php
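For clarity on the feature-selection metric above: the F1 scores in Table 1 treat feature selection as retrieval of the true feature indices, i.e. the harmonic mean of precision (fraction of selected features that are true) and recall (fraction of true features that are selected). A minimal sketch with hypothetical index sets, not the paper's code:

```python
def selection_f1(selected, true_feats):
    """F1 between the set of selected feature indices and the set
    of ground-truth feature indices: harmonic mean of precision
    (|S & T| / |S|) and recall (|S & T| / |T|)."""
    s, t = set(selected), set(true_feats)
    hit = len(s & t)
    if hit == 0:
        return 0.0
    precision, recall = hit / len(s), hit / len(t)
    return 2 * precision * recall / (precision + recall)

# 5 features selected, 2 of which are among the 5 true ones
print(selection_f1({0, 3, 7, 9, 12}, {0, 3, 4, 5, 6}))  # -> 0.4
```

An F1 of 1, as DNP and HSIC-Lasso achieve when the true dimensionality is 2, means the selected set matches the true feature set exactly.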