Semi-blind machine learning for fMRI-based predictions of intelligence
Identifying neuromarkers of cognitive abilities from fMRI has been a major focus of research in the past few years. However, it has recently been reported that many thousands of participants are required to obtain reproducible results (Marek et al., 2022). This appears to be a major impediment to obtaining neuromarkers from fMRI, because such large samples are typically not available in neuroimaging studies. Here we show that out-of-sample prediction accuracy can be dramatically improved by supplementing fMRI with readily available non-imaging information, so that reliable predictive modeling becomes feasible even for small sample sizes. Specifically, we introduce a novel machine learning method that predicts intelligence from resting-state fMRI data, leveraging educational level as supplementary information. We refer to our approach as "semi-blind machine learning" (SML) because it assumes that supplementary information, such as educational level, is available for subjects in both the training and the test set. This setup closely mirrors real-world scenarios, especially in clinical contexts, where patient background information typically exists and can be used to boost prediction accuracy. However, guarding against bias is crucial: subjects should not be classified as more intelligent simply because they have a higher level of education. Our approach therefore contains a component explicitly designed for bias control. We applied our method to three different data collections and observed marked improvements in prediction accuracy across a wide range of sample sizes. We anticipate that semi-blind machine learning will provide a promising approach to fMRI-based predictive modeling, with the potential for a wide range of future applications.
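To make the semi-blind setup concrete, the following minimal sketch contrasts a "blind" model (imaging features only) with a "semi-blind" model in which the supplementary variable is available for both training and test subjects. It is an illustration under stated assumptions, not the method introduced here: the data are synthetic, ridge regression is swapped in as a generic predictor, and all names (`make_synthetic`, `ridge_fit`, `semi_blind_demo`) are hypothetical. The bias-control component is not specified in this abstract and is therefore not sketched.

```python
import numpy as np


def make_synthetic(n=400, p=50, seed=0):
    """Synthetic stand-ins (assumption): 'fmri' features, 'education', and a target score."""
    rng = np.random.default_rng(seed)
    education = rng.normal(size=n)           # supplementary, non-imaging variable
    fmri = rng.normal(size=(n, p))           # stand-in for imaging-derived features
    signal = fmri[:, :5].sum(axis=1)         # only a few features carry signal
    score = 0.5 * signal + 1.0 * education + rng.normal(size=n)
    return fmri, education, score


def ridge_fit(X, y, alpha=1.0):
    """Closed-form ridge regression with an unpenalized intercept."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    return w, y_mean - x_mean @ w


def ridge_predict(X, w, b):
    return X @ w + b


def semi_blind_demo(n_train=200, alpha=10.0, seed=0):
    """Return out-of-sample correlations for the blind and semi-blind models."""
    fmri, education, score = make_synthetic(seed=seed)
    train, test = slice(0, n_train), slice(n_train, None)

    # Blind: imaging features only.
    w, b = ridge_fit(fmri[train], score[train], alpha)
    r_blind = np.corrcoef(ridge_predict(fmri[test], w, b), score[test])[0, 1]

    # Semi-blind: education is known for training AND test subjects,
    # so it can be appended as an extra input feature.
    X = np.column_stack([fmri, education])
    w, b = ridge_fit(X[train], score[train], alpha)
    r_semi = np.corrcoef(ridge_predict(X[test], w, b), score[test])[0, 1]
    return r_blind, r_semi


if __name__ == "__main__":
    r_blind, r_semi = semi_blind_demo()
    print(f"blind r = {r_blind:.2f}, semi-blind r = {r_semi:.2f}")
```

In this toy setting the gain comes entirely from the supplementary variable being informative about the target by construction. That is precisely the situation in which a prediction may be driven by education alone, which is why a dedicated bias-control step is needed in practice.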