PhD thesis: Statistical data analysis

Introduction
The exploration and interpretation of multivariate data has attracted considerable interest in recent years. Not only the huge amounts of data collected from financial markets and other areas of the economy, but also biomedical data, such as EEG data recorded by physicians, have to be processed with data analysis tools to reveal the underlying information. Statistical data analysis is a promising approach in which statistical methods, for example independent component analysis (ICA), random matrix theory (RMT) or time series analysis, are used to handle this kind of data.

A common problem in data analysis and signal processing is to find a suitable transformation of the data, so that the new representation reveals its essential structure. An example is given in Fig. 1: some sensor signals were recorded, but their structure only becomes visible after a suitable transformation. Independent component analysis finds this hidden structure by imposing a single restriction: the components of the new representation should be as statistically independent as possible.


Fig. 1

Such a representation seems to capture the essential structure of the data in many applications and can be used to solve the blind source separation (BSS) problem. As an example we will visualize the BSS problem with a mixture of images. Suppose we have some original images as in Fig. 2.


Fig. 2

Due to some circumstances, only a mixture of these original images can be observed, as seen in Fig. 3. The goal is now to determine the original images/sources only from the mixed images and the knowledge that the original images are independent.


Fig. 3
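In matrix notation, the setting of Fig. 2 and Fig. 3 can be summarized as follows. The symbols are introduced here for illustration and are assumptions, as is the restriction to linear, instantaneous mixing with as many mixtures as sources: s(t) denotes the original sources, x(t) the observed mixtures, A the unknown mixing matrix and W the unmixing matrix to be estimated.

```latex
\begin{align}
  \mathbf{x}(t) &= A\,\mathbf{s}(t), \\
  \mathbf{y}(t) &= W\,\mathbf{x}(t), \qquad W A = P D .
\end{align}
```

Here P is a permutation matrix and D an invertible diagonal matrix. Independence alone fixes neither the order nor the scale of the recovered components y(t), so W A = P D is the best any such algorithm can achieve.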

This problem can be solved with independent component analysis. Fig. 4 visualizes the unmixing process of an ICA algorithm. Each set of images corresponds to one iteration step. The algorithm evidently converges to the original images, up to scaling (including sign) and permutation, which the algorithm cannot determine.


Fig. 4
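The unmixing process can be tried out on a toy problem. The following is a minimal sketch, not the algorithm used for Fig. 4: it assumes a symmetric FastICA iteration with a tanh nonlinearity, two random signals standing in for the images, and a hypothetical 2x2 mixing matrix `a`. All function and variable names are chosen for this illustration only.

```python
import numpy as np


def sym_decorrelate(w):
    """Make the rows of w orthonormal: w <- (w w^T)^(-1/2) w."""
    eigval, eigvec = np.linalg.eigh(w @ w.T)
    return eigvec @ np.diag(eigval ** -0.5) @ eigvec.T @ w


def whiten(x):
    """Center each row of x, then decorrelate the rows to unit variance."""
    x = x - x.mean(axis=1, keepdims=True)
    eigval, eigvec = np.linalg.eigh(np.cov(x))
    v = eigvec @ np.diag(eigval ** -0.5) @ eigvec.T  # whitening matrix
    return v @ x, v


def fast_ica(x, n_iter=200, tol=1e-8, seed=0):
    """Symmetric FastICA with a tanh nonlinearity (illustrative sketch).

    Returns the unmixing matrix for the centered data and the estimated
    independent components (one per row).
    """
    z, v = whiten(x)
    n, t = z.shape
    rng = np.random.default_rng(seed)
    w = sym_decorrelate(rng.standard_normal((n, n)))
    for _ in range(n_iter):
        g = np.tanh(w @ z)
        # Fixed-point update: E[z g(w z)] - E[g'(w z)] w, for all rows at once.
        w_new = (g @ z.T) / t - np.diag((1.0 - g ** 2).mean(axis=1)) @ w
        w_new = sym_decorrelate(w_new)
        # Converged when every new row is parallel to the old one (sign ignored).
        done = np.max(np.abs(np.abs(np.sum(w_new * w, axis=1)) - 1.0)) < tol
        w = w_new
        if done:
            break
    return w @ v, w @ z


# Two independent, non-Gaussian toy sources (assumed stand-ins for the images).
rng = np.random.default_rng(42)
s = np.vstack([rng.uniform(-1, 1, 5000), rng.laplace(size=5000)])
a = np.array([[1.0, 0.5], [0.6, 1.0]])  # hypothetical mixing matrix
x = a @ s                                # observed mixtures

w_total, y = fast_ica(x)
# The product should be close to a scaled permutation matrix.
print(np.round(w_total @ a, 2))
```

Each row of the printed product should have one dominant entry, which is the scaling and permutation ambiguity mentioned above: the components come back in arbitrary order and with arbitrary sign and amplitude.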

The method is applicable not only to synthetic data but also to biomedical data. We are currently working with the neurosurgery group of Prof. Brawanski at the University Hospital Regensburg. Two main projects are of interest: first, electro-encephalography (EEG) to detect abnormal behaviour of the brain, and second, the neuro-monitoring of patients in the intensive care unit. Here we give two examples:

Electro-encephalography (EEG) - Data Analysis
Electro-encephalography (EEG) is a method in which electrical potentials are measured with electrodes on the surface of the head. The goal is to better understand the processes in the human brain, to diagnose brain diseases, or to monitor the depth of anaesthesia. One main problem is that the signals of the brain itself are superimposed with artifacts such as eye blinks, head movements or the heartbeat. Before any further analysis is possible, these signals have to be extracted and separated by a data analysis tool. The left side of Fig. 5 shows the signals measured at the electrodes over a period of 7 seconds.


Fig. 5

To extract the artifacts and other interesting signatures we use independent component analysis, which yields the right plot of Fig. 5. All 21 independent components (ICs) are plotted. One can clearly see the eye blink in the first IC, the heartbeat in IC #5 and the main alpha waves of the brain in IC #9. We gain more than just the separated signals, however: using the recovered mixing vectors, the origin of each IC can be projected onto the head, as shown in Fig. 6.


Fig. 6
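The role of the mixing vectors can be made concrete with a short sketch. It assumes that an ICA run has already produced an unmixing matrix `w`. Here `w` is simply the inverse of a hypothetical mixing matrix, so that the result can be checked. Column k of the estimated mixing matrix gives the weight of IC k at each electrode, which is the quantity one would draw on the head. Dropping a component before projecting back is a common way to remove an artifact and is shown as an assumption, not as the procedure of the thesis.

```python
import numpy as np


def remove_components(w, x, drop):
    """Back-project all independent components except those listed in drop.

    w: unmixing matrix (components x channels), x: data (channels x samples).
    Returns the cleaned data and the estimated mixing matrix.
    """
    y = w @ x              # independent components, one per row
    a = np.linalg.pinv(w)  # estimated mixing matrix; column k is the map of IC k
    keep = [k for k in range(y.shape[0]) if k not in drop]
    return a[:, keep] @ y[keep], a


# Toy data: 4 channels mixed from 4 independent sources (all values hypothetical).
rng = np.random.default_rng(1)
n_channels, n_samples = 4, 1000
a_true = rng.standard_normal((n_channels, n_channels))
s = rng.laplace(size=(n_channels, n_samples))
x = a_true @ s

w = np.linalg.inv(a_true)  # stands in for the unmixing matrix an ICA run would return
x_clean, a_est = remove_components(w, x, drop=[0])

print(a_est[:, 0])  # weights of the first IC at the four channels
```

In this toy setting `x_clean` equals the recorded data minus the contribution of the first source. With real EEG data the separation is only approximate, so some residual artifact should be expected.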

There is still no ordering of the independent components with respect to their relevance and meaningfulness. This is a topic of current research, as is taking time structure into account to improve the separation quality.

Another interesting topic is the analysis of EEG data with random matrix theory, which seems to give some indications of the patient's brain behaviour. This is still work in progress.
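The text does not say which random matrix statistic is used. As one possible illustration, and purely as an assumption, the sketch below compares the eigenvalues of a channel correlation matrix with the Marchenko-Pastur bounds that hold for uncorrelated noise in the limit of many channels and samples. The data are synthetic and all names are hypothetical.

```python
import numpy as np


def marchenko_pastur_bounds(n_channels, n_samples):
    """Edges of the eigenvalue bulk for a correlation matrix of pure noise."""
    q = n_channels / n_samples
    return (1 - np.sqrt(q)) ** 2, (1 + np.sqrt(q)) ** 2


def eigenvalues_outside_bulk(x):
    """Return correlation-matrix eigenvalues outside the noise bulk, and the bounds."""
    n, t = x.shape
    eigval = np.linalg.eigvalsh(np.corrcoef(x))
    lo, hi = marchenko_pastur_bounds(n, t)
    return eigval[(eigval < lo) | (eigval > hi)], (lo, hi)


# Synthetic stand-in for a recording: 21 noisy channels sharing one common signal.
rng = np.random.default_rng(7)
n_channels, n_samples = 21, 2000
noise = rng.standard_normal((n_channels, n_samples))
common = rng.standard_normal(n_samples)
x = noise + 0.5 * common  # the common signal is added to every channel

outliers, (lo, hi) = eigenvalues_outside_bulk(x)
print(lo, hi, outliers.max())
```

Eigenvalues far above the upper bound indicate correlations that noise alone would not produce. For finite data the bulk edges are not sharp, so eigenvalues close to the bounds should be interpreted with care.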

Neuro-monitoring
The goal of analysing neuro-monitoring data is to identify brain diseases through a better understanding of which parameters and processes drive the system and what influence they have on the brain. The main parameters measured are the oxygen level in the blood and in the brain tissue, the pressure in the brain, the temperature and the blood pressure (see Fig. 7).


Fig. 7

Predicting these parameters would improve the treatment of patients in the intensive care unit. This could probably be done by modelling the system "brain" and fitting parameters to this model. Possible chaotic behaviour could also be analysed and critical states identified.

References
A. Jung, Statistical analysis of biomedical data, PhD thesis, University of Regensburg, Dec. 2003.

A. Jung, A mathematical model of the hydrodynamical processes in the brain - a rigorous approach, Proceedings of Workshop GK "Nonlinearity", Regensburg, Oct. 2002.

A. Jung, An introduction to a new data analysis tool: Independent Component Analysis, Proceedings of Workshop GK "Nonlinearity", Regensburg, Oct. 2001.