Rationale: We believe the key to improving the accuracy of seizure prediction is to incorporate the structure of the data into the model. This structure is twofold: 1) the structure of the epilepsy patient population, associated with clinical categories such as pathology, multiple foci, and drug resistance; 2) the latent epilepsy state structure, comprising the interictal, preictal, ictal, and postictal states.
We propose to identify the underlying data structure using hierarchical clustering methods based on SS or GKM, and to perform statistical inference to determine the underlying number of clusters. In Section 1C.1 we focus on capturing the data structure among the patient population; in Section 1C.2 we investigate the data structure along the time scale.
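As a rough illustration of the first step, the sketch below builds an average-linkage hierarchy from a precomputed patient-by-patient dissimilarity matrix and cuts it into a fixed number of groups. It assumes SciPy's `linkage` and `fcluster`; the matrix `D` and the function `cluster_patients` are hypothetical stand-ins for an SS- or GKM-based dissimilarity, and average linkage is an illustrative choice. Here the number of clusters is fixed by hand, whereas the proposal aims to infer it statistically.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def cluster_patients(dissimilarity: np.ndarray, n_clusters: int) -> np.ndarray:
    """Average-linkage hierarchical clustering of patients, cut to n_clusters.

    `dissimilarity` is a symmetric (n x n) matrix with a zero diagonal; here it
    is a hypothetical stand-in for an SS- or GKM-based patient dissimilarity.
    """
    # linkage() expects the condensed (upper-triangular) vector form.
    condensed = squareform(dissimilarity)
    tree = linkage(condensed, method="average")
    # Cut the dendrogram so that at most n_clusters flat clusters remain.
    return fcluster(tree, t=n_clusters, criterion="maxclust")


# Toy example: two well-separated groups of five "patients" in a 3-D feature space.
rng = np.random.default_rng(0)
features = np.vstack([rng.normal(0.0, 0.1, (5, 3)), rng.normal(5.0, 0.1, (5, 3))])
D = np.linalg.norm(features[:, None, :] - features[None, :, :], axis=-1)
labels = cluster_patients(D, n_clusters=2)
print(labels)  # the first five and last five patients receive different labels
```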
Epilepsy has many different causes, including perinatal hypoxia, head injury, glioblastoma, tuberous sclerosis, viral infection, and stroke, among others [31]. Moreover, in up to 70% of all cases of epilepsy in adults and children, the cause is unknown. Therefore, using a single model to predict epileptic seizures for all patients could reduce both specificity and sensitivity, especially when the model is generalized to different populations [32, 33].
To tackle this issue, we propose to develop a novel method that detects the underlying group structure among epilepsy patients, and to build a classification model within each group to achieve better performance. Clustering is a widely used unsupervised learning method that identifies structure in data by finding groups of similar objects. Clustering has been used successfully to identify systematic patterns in a broad range of biomedical applications, including cancer research [34, 35] and Parkinson's Disease
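A minimal sketch of the cluster-then-classify idea, assuming the patient groups have already been obtained by clustering and using a nearest-centroid rule as a hypothetical stand-in for the within-group seizure prediction model. The class `GroupwiseNearestCentroid`, the routing of new samples to the nearest group centroid, and the two-state labeling (0 = interictal, 1 = preictal) are illustrative assumptions, not part of the proposal.

```python
import numpy as np


class GroupwiseNearestCentroid:
    """Hypothetical sketch: one nearest-centroid classifier per patient group.

    Group labels come from a prior clustering step. A new sample is routed to
    the group whose feature centroid is closest, then classified by that
    group's own model (e.g. 0 = interictal, 1 = preictal).
    """

    def fit(self, X, y, groups):
        X, y, groups = np.asarray(X, float), np.asarray(y), np.asarray(groups)
        self.group_ids_ = np.unique(groups)
        # Centroid of each group, used only to route new samples.
        self.group_centroids_ = np.array(
            [X[groups == g].mean(axis=0) for g in self.group_ids_])
        # Per-group model: (class labels, class centroids within that group).
        self.models_ = {}
        for g in self.group_ids_:
            Xg, yg = X[groups == g], y[groups == g]
            classes = np.unique(yg)
            centroids = np.array([Xg[yg == c].mean(axis=0) for c in classes])
            self.models_[g] = (classes, centroids)
        return self

    def predict(self, X):
        X = np.asarray(X, float)
        # Route each sample to its nearest group centroid.
        d = np.linalg.norm(X[:, None, :] - self.group_centroids_[None, :, :], axis=-1)
        routed = self.group_ids_[d.argmin(axis=1)]
        preds = []
        for x, g in zip(X, routed):
            classes, centroids = self.models_[g]
            # Nearest class centroid within the routed group.
            preds.append(classes[np.linalg.norm(centroids - x, axis=1).argmin()])
        return np.array(preds)


# Toy data: the class layout is reversed between the two groups.
X_train = [[0, 0], [0, 0.2], [0, 2], [0, 2.2], [10, 2], [10, 2.2], [10, 0], [10, 0.2]]
y_train = [0, 0, 1, 1, 0, 0, 1, 1]
groups = [0, 0, 0, 0, 1, 1, 1, 1]
model = GroupwiseNearestCentroid().fit(X_train, y_train, groups)
preds = model.predict([[0, 1.9], [10, 1.9]])
print(preds)  # [1 0]
```

In this toy data the class centroids pooled over both groups both sit at about (5, 1.1), so a single pooled nearest-centroid model cannot separate the classes, whereas the group-wise models classify both new samples as intended.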
Because the diffusion metric is indexed by the time it takes the diffusion to spread over the patient population, it yields a continuous hierarchy of clusters. Hierarchical clustering can produce any desired number of clusters, but quantifying the statistical significance of the resulting clusters is challenging. Several groups have developed cluster evaluation methods to estimate the number of clusters in the non-nested setting (see, for example, [38-43]). For the hierarchical setting, Suzuki and Shimodaira (2006) developed a testing procedure for high-dimensional data based on the multiscale bootstrap resampling of the feature space proposed by Shimodaira (2004). Because the diffusion metric is defined on a low-dimensional manifold, the multiscale bootstrap procedure cannot be applied.

To handle both low- and high-dimensional data, Kimes et al. (2015) proposed a sequential testing scheme, Significance of Hierarchical Clustering (SHC), to assess the significance of a hierarchical clustering result. While SHC controls the family-wise error rate, it assumes that the data follow a multivariate Gaussian distribution, an assumption that is questionable for SS and GKM. We thus propose to extend the sequential testing scheme of SHC using a non-parametric bootstrap method to evaluate the significance of the derived clustering on SS and
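One way a non-parametric null could replace SHC's Gaussian null is sketched below, for the top split of the hierarchy only. The assumptions are ours, for illustration: the cluster index is the within-cluster sum of squares of a two-cluster average-linkage cut divided by the total sum of squares, and the null reference is generated by resampling each feature independently with replacement, which keeps the marginal distributions but destroys joint cluster structure. The functions `cluster_index` and `bootstrap_split_pvalue` are hypothetical; this is neither SHC itself nor necessarily the proposed extension. A full version would apply the test sequentially down the dendrogram with family-wise error control, as SHC does, and would operate on coordinates derived from SS or GKM.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def cluster_index(X: np.ndarray) -> float:
    """Within-cluster SS of a 2-cluster average-linkage cut / total SS.

    Smaller values indicate a stronger two-cluster split.
    """
    labels = fcluster(linkage(X, method="average"), t=2, criterion="maxclust")
    total = ((X - X.mean(axis=0)) ** 2).sum()
    within = sum(((X[labels == k] - X[labels == k].mean(axis=0)) ** 2).sum()
                 for k in np.unique(labels))
    return within / total


def bootstrap_split_pvalue(X, n_boot: int = 200, seed: int = 0) -> float:
    """Hypothetical non-parametric test of the top split (no Gaussian assumption).

    Null samples resample each feature (column) independently with replacement.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, float)
    observed = cluster_index(X)
    null = np.empty(n_boot)
    for b in range(n_boot):
        Xb = np.column_stack(
            [rng.choice(col, size=col.size, replace=True) for col in X.T])
        null[b] = cluster_index(Xb)
    # One-sided p-value: how often a null sample splits at least as strongly.
    return (1 + np.sum(null <= observed)) / (n_boot + 1)


# Toy example: two well-separated clusters of 20 points each in 2-D.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.5, (20, 2)), rng.normal(6.0, 0.5, (20, 2))])
p = bootstrap_split_pvalue(X, n_boot=200)
print(f"p-value for the top split: {p:.3f}")
```

Resampling features independently is only one possible non-parametric reference; it loses power when a single feature is itself strongly bimodal, which is one reason the choice of null would need careful study for SS and GKM.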