Automated segmentation for cortical thickness of the medial perirhinal cortex
Participants and MRI acquisition
Training data set
The training data set (N = 126, mean age = 69.8 ± 10.8 years) consisted of 101 patients and 25 healthy control participants (NC). Written informed consent was obtained from all individuals prior to participation and the study was approved by the local ethics committee (EKNZ: Ethics Committee of Northwestern and Central Switzerland). All methods were performed in accordance with the relevant guidelines and regulations. NCs were recruited from the “Registry of Healthy Individuals Interested to Participate in Research” of the Memory Clinic FELIX PLATTER Basel, Switzerland. They had undergone a thorough medical screening and neuropsychological testing to confirm their cognitive health. In particular, the exclusion criteria encompassed severe impairments in auditory, visual, or speech abilities; substantial sensory or motor deficits; severe systemic illnesses; persistent moderate to intense pain; conditions with significant or likely effects on the central nervous system (e.g., neurological disorders such as cerebrovascular disease, generalized atherosclerosis, and psychiatric disorders); and the use of potent psychoactive substances, except for mild tranquilizers. In addition, all individuals classified as NC obtained standard scores within the normal range on the Mini-Mental State Examination (MMSE)25, California Verbal Learning Test26, Clock Drawing Test (Critchley, 1953), and the short version of the Boston Naming Test27. Of the 101 patients, 29 participants were diagnosed with mild cognitive disorder (MCI) according to DSM-IV28. Twenty-six participants were diagnosed with major depression (MD), including 14 participants recruited from the Memory Clinic FELIX PLATTER Basel, Switzerland, and 12 recruited from the University Psychiatric Clinics Basel, Switzerland. Participants with MD had to score 10 or more points on the Beck Depression Inventory29, 13 or more points on the Beck Depression Inventory-II30, or 6 or more points on the Geriatric Depression Scale31.
Eight participants were diagnosed with aMCI32 according to DSM-IV28 and Winblad et al. (2004) criteria. Eighteen participants were diagnosed with dementia due to AD (dAD) according to DSM-IV28 and NINCDS-ADRDA33 criteria. aMCI and dAD were combined into one AD group (N = 26) based on the assumption that the progression from aMCI to the early dementia stage of AD is gradual and that the time of diagnosis can differ34. Further, 20 patients were diagnosed with dementia due to etiologies other than AD (non-AD; e.g., due to Lewy body disease) according to DSM-IV. For an overview, see Table 1. All patients had been recruited either from the Memory Clinic FELIX PLATTER Basel, Switzerland, where they had received neuropsychological testing and medical and neurological examinations including blood analyses, or, in the case of the 12 participants with MD, from the University Psychiatric Clinics Basel, Switzerland. All participants were native Swiss-German or German-speaking adults.
Participants received T1-weighted 3D magnetization-prepared rapid acquisition gradient echo (MPRAGE) structural MRI using the same 3-Tesla scanner (MAGNETOM Skyra fit, Siemens; inversion time = 900 ms, repetition time = 2300 ms, echo time = 2.92 ms, flip angle = 9°; acquisition matrix = 256 × 256 mm, voxel size = 1 mm isotropic, acquisition time = 5 min 12 s) at the University Hospital Basel, Switzerland.
Test data set
The test data set (N = 103, mean age = 76.4 ± 7.0 years) is identical to the one used for group comparison in Krumm et al.14 and contained 46 healthy control participants (NC), 34 participants diagnosed with early Alzheimer’s dementia (dAD) according to NINCDS-ADRDA and DSM-IV criteria28 and 23 patients with amnestic mild cognitive disorder (aMCI) according to DSM-IV and Winblad et al.35 criteria (see Table 2). For a comprehensive overview of the inclusion and exclusion criteria, see Krumm et al.14. All patients had been recruited from the Memory Clinic FELIX PLATTER Basel, Switzerland, where they had received neuropsychological testing, and medical and neurological examinations including blood analyses. All participants were native Swiss-German or German-speaking adults.
Participants received T1-weighted 3D MPRAGE structural MRI using the same 3-Tesla scanner (MAGNETOM Verio, Siemens; inversion time = 1000 ms, repetition time = 2000 ms, echo time = 3.75 ms, flip angle = 8°; acquisition matrix = 256 × 256 mm, voxel size = 1 mm isotropic, acquisition time = 7 min 30 s) at the University Hospital Basel, Switzerland.
Preprocessing of structural MRI and manual segmentation
MRI scans were preprocessed using FreeSurfer (Massachusetts General Hospital, Boston, MA, USA; http://surfer.nmr.mgh.harvard.edu; accessed on 7 January 2020)36,37. In a semi-automated processing stream, FreeSurfer segmented the T1-weighted 3D MPRAGE volumes into grey and white matter. Next, the surface of white matter, represented by the transition area from white to grey matter, and the pial surface were modeled36. Lastly, tissue classification was visually confirmed for all participants, and, if required, manual adjustments were performed. Regions of interest (ROIs; i.e., mPRC, lPRC, and ERC) for both hemispheres were manually drawn by a blinded rater on coronal slices, according to the protocol depicted in Krumm et al.14, which takes collateral sulcus variation into account (for visual examples of the anterior-posterior borders of manual segmentation, see ref. 18).
Training and application of automated segmentation
The semi-automatic labels were mapped to the gray matter obtained by FreeSurfer and transformed to the 3D voxel space to create regional masks for mPRC, lPRC, and ERC. Using each of the masks, we trained a separate network to segment the respective region as a voxel mask (for examples, see Supplementary material). The predicted voxel mask was then mapped back to the FreeSurfer space to compute morphological characteristics such as the average cortical thickness. We used the nnU-Net38 framework to train the networks. The nnU-Net23,24 is a toolbox for training 2D and 3D U-Nets, specifically optimized for user-friendly model training and selection with biomedical imaging data. The U-Net23,24 is a multi-stage neural network architecture for semantic segmentation. The input image, a T1-weighted MRI in this work, is processed on multiple resolution levels. The features from the analysis path (with increasing voxel size) are combined with the features from the synthesis path (with decreasing voxel size) at every resolution level except the lowest. This leads to an effective combination of high-level features with large spatial context and low-level features with small spatial context. The output is a voxel-wise semantic segmentation. At the border of regions, the class labels are ambiguous: for example, after interpolation a single voxel may belong 50% to each of two classes. To better account for this ambiguity, we substituted the default sparse cross-entropy loss with a dense cross-entropy loss, which is capable of modeling a full probability distribution over the classes. The conversions from surface-based annotations to voxel labels and back were performed with FreeSurfer. Finally, we trained a separate network for each of the ERC, mPRC, and lPRC for 150 epochs.
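The difference between the sparse and the dense cross-entropy loss can be illustrated for a single border voxel. The following is a minimal sketch under stated assumptions, not the nnU-Net implementation and not the authors' training code: the function names `softmax`, `dense_cross_entropy`, and `sparse_cross_entropy` are hypothetical, and the loss is written for one voxel with a two-class target.

```python
import math

def softmax(logits):
    """Convert raw network outputs for one voxel into class probabilities."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def dense_cross_entropy(logits, target_probs):
    """Cross-entropy against a full target distribution (soft label).

    target_probs is assumed to sum to 1, e.g. [0.5, 0.5] for a border
    voxel that interpolation has split evenly between two classes.
    """
    probs = softmax(logits)
    return -sum(t * math.log(p) for t, p in zip(target_probs, probs) if t > 0)

def sparse_cross_entropy(logits, target_index):
    """Cross-entropy against a single hard class index (one-hot target)."""
    one_hot = [1.0 if i == target_index else 0.0 for i in range(len(logits))]
    return dense_cross_entropy(logits, one_hot)

# A border voxel with a 50/50 target: the dense loss is lower for an
# undecided prediction than for a confident prediction of either class.
undecided = dense_cross_entropy([0.0, 0.0], [0.5, 0.5])
confident = dense_cross_entropy([2.0, 0.0], [0.5, 0.5])
print(undecided < confident)
```

With a hard label, the same voxel would have to be assigned entirely to one class, so the sparse loss would reward a confident prediction that the annotation does not actually support.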
Inference on the MRI data was performed without additional pre-processing. In two cases, the prediction of one of the masks failed and could not be projected to the FreeSurfer space to compute cortical thickness values (the right-hemisphere ERC for one participant and the right-hemisphere lPRC for another participant). To ensure the accuracy of the automated segmentations, we performed a quality control assessment on a subset of 60 participants, with 20 randomly selected from each diagnostic group (healthy controls, aMCI, and AD). The process involved a detailed visual inspection of coronal slices in FreeSurfer, focusing on key anatomical landmarks such as the medial and lateral borders of all ROIs (ERC, mPRC, and lPRC). Each segmentation layer was inspected systematically from the anterior to the posterior border to detect any gross overextensions, under-segmentations, or incorrectly labeled pixels. A significant deviation would have included segmentation labels being entirely misplaced outside the medial temporal lobe; gross misplacement of the ROI, such as segmentation labels extending well beyond the expected anatomical boundaries; extensive gaps within the ROI where relevant pixels belonging to the cortical structure were consistently excluded; or a complete absence of labeled pixels for a given ROI. Additionally, a segmentation would have been flagged if it spanned fewer than 10 slices in the anterior-posterior direction, as this would indicate insufficient coverage of the expected anatomical region. In this sub-sample of 60 participants, the ROI masks performed as expected, with no significant deviations observed. Given these consistent results and the high ICC values between manual and automated segmentation, extending quality control to the full sample was deemed unnecessary.
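Two of the quality control criteria above, the complete absence of labeled voxels and an anterior-posterior extent of fewer than 10 slices, lend themselves to an automated check. The sketch below is illustrative only and was not part of the visual inspection described here. It assumes that a mask is given as a collection of integer voxel coordinates and that the second coordinate indexes coronal slices; the names `coronal_slice_extent` and `qc_flags` and the `ap_axis` parameter are hypothetical, and "spanned" is interpreted as the number of distinct coronal slices containing at least one labeled voxel.

```python
def coronal_slice_extent(voxels, ap_axis=1):
    """Count the distinct coronal slices that contain labeled voxels.

    voxels: iterable of (i, j, k) integer voxel coordinates of one ROI mask.
    ap_axis: index of the anterior-posterior axis (an assumption; it
             depends on the orientation of the image volume).
    """
    return len({v[ap_axis] for v in voxels})

def qc_flags(voxels, min_slices=10, ap_axis=1):
    """Return a list of QC flags for one ROI mask (empty list = no flag)."""
    voxels = list(voxels)
    flags = []
    if not voxels:
        flags.append("no labeled voxels")
    elif coronal_slice_extent(voxels, ap_axis) < min_slices:
        flags.append("fewer than %d coronal slices" % min_slices)
    return flags

# Hypothetical masks: one covering 12 coronal slices, one covering 9.
good_mask = [(40, j, 30) for j in range(100, 112)]
short_mask = [(40, j, 30) for j in range(100, 109)]
print(qc_flags(good_mask))
print(qc_flags(short_mask))
```

The remaining criteria (misplacement outside the medial temporal lobe, overextension, and gaps within the ROI) depend on anatomical judgment and are not covered by this sketch.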
Supplementary Fig. 2 shows an example of the progression of segmentation masks across consecutive coronal slices from the anterior to the posterior boundary, alongside the corresponding unsegmented T1-weighted images. In addition, all quality control criteria used for evaluating the segmentation masks are summarized in Supplementary Table 1. Based on the regions that were analyzed in the study by Krumm et al.14, we additionally trained a separate network for the parahippocampal cortex. However, since this region is not the focus of this work, it is not further discussed in this manuscript.
Statistical analyses
For each ROI, an aggregated bilateral cortical thickness value was used. Cortical thickness measurements were normalized for head size (as total intracranial volume [TIV]) as reported by Krumm et al.14 using the formula [(cortical thickness)/(TIV) × 100]. For reporting in Table 3, normalized values were retransformed to mm using the mean TIV of the two groups being compared (e.g., dAD versus NC mean TIV = 1453 cm³; aMCI versus NC mean TIV = 1480 cm³). Group differences were examined using univariate analysis of covariance (ANCOVA), incorporating age, sex, and education level as covariates. To address multiple comparisons, significance thresholds were adapted using the Bonferroni correction (e.g., p = 0.05/8 = 0.00625). In addition, to evaluate the agreement between the two methods (manual and automated segmentation), TIV-corrected bilateral cortical thickness values of all participants of the test data set were compared using intraclass correlation coefficient (ICC) estimates and their 95% confidence intervals based on a single-rating, consistency, 2-way mixed-effects model according to the guidelines of Koo and Li39. All analyses were executed in SPSS software; while Krumm et al.14 utilized SPSS 21.0, our replication utilized the subsequent version, SPSS 22.0 (IBM Corp. Released 2013. IBM SPSS Statistics for Windows, Version 22.0. Armonk, NY, USA).
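The arithmetic in this section can be sketched in a few lines. The code below is a minimal illustration, not the SPSS procedure used in the study: all function names are hypothetical, the input values in the usage example are invented, and the ICC is computed as a point estimate only (no confidence interval) using the standard mean-squares form for a single-rating, consistency, two-way model, ICC = (MSR − MSE) / (MSR + (k − 1) MSE).

```python
def normalize_thickness(thickness_mm, tiv_cm3):
    """Normalize cortical thickness for head size: thickness / TIV x 100."""
    return thickness_mm / tiv_cm3 * 100.0

def retransform_to_mm(normalized, mean_tiv_cm3):
    """Convert a normalized value back to mm using a group mean TIV."""
    return normalized * mean_tiv_cm3 / 100.0

def bonferroni_threshold(alpha, n_tests):
    """Bonferroni-corrected significance threshold."""
    return alpha / n_tests

def icc_consistency_single(ratings):
    """Point estimate of the single-rating, consistency, two-way ICC.

    ratings: list of rows, one per participant; each row holds one value
             per method (e.g. [manual, automated]).
    """
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)  # participants
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)  # methods
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)

# Invented example values, for illustration only.
norm = normalize_thickness(2.5, 1500.0)
print(retransform_to_mm(norm, 1453.0))
print(bonferroni_threshold(0.05, 8))
# A constant offset between methods does not reduce a consistency ICC.
print(icc_consistency_single([[2.4, 2.5], [2.6, 2.7], [2.9, 3.0]]))
```

Because the consistency definition ignores systematic offsets between methods, a constant difference between manual and automated values leaves the estimate unchanged; an absolute-agreement ICC would penalize it.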