Topological data analysis
Topological data analysis (TDA) rests on a principle that is as simple as it is intriguing: leverage invariants from algebraic topology to gain novel insights into data. Although TDA started as a vague idea, it is now applied by researchers working in astronomy, biology, finance and materials science. Since data sets collected by businesses are also growing rapidly in dimensionality and complexity, mastering the versatile tools of TDA is on the verge of becoming a key asset for data scientists working in industry.
However, the excitement about spectacular novel results comes with a caveat: new insights are often presented through plots that are visually compelling but lack a solid statistical underpinning. To decide scientifically whether purported findings are indeed significant or merely a product of chance, the rigor of statistical tests is essential.
The major driving force behind the success of TDA is the persistence diagram. Loosely speaking, it is built from a process of growing balls centered at the points of a data cloud. At specific times during this growth, topological features such as loops or higher-dimensional holes may form. When such features become covered by the growing balls, they disappear again. This leads to a family of birth and death times, which are collected in the persistence diagram.
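To make the growth process concrete, the following minimal sketch computes the simplest part of a persistence diagram: the death times of connected components (dimension 0), not the loops and holes mentioned above. It assumes that every component is born at radius 0 and that two components merge once the balls around their closest points overlap, that is, at half the distance between the centers. The function name `zero_dim_persistence` is hypothetical and chosen only for illustration; practical analyses of loops and holes would typically rely on a dedicated TDA library.

```python
import math
from itertools import combinations


def zero_dim_persistence(points):
    """Death radii of connected components in the growing-ball process.

    Illustrative assumption: all components are born at radius 0, and two
    balls of radius r overlap once r reaches half the distance between
    their centers.
    """
    n = len(points)
    parent = list(range(n))  # union-find forest over the data points

    def find(i):
        # Follow parent links to the root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Candidate merge events, ordered by the radius at which they happen.
    edges = sorted(
        (math.dist(points[i], points[j]) / 2.0, i, j)
        for i, j in combinations(range(n), 2)
    )

    deaths = []
    for radius, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            # Two distinct components meet: one of them dies at this radius.
            parent[root_i] = root_j
            deaths.append(radius)
    return deaths  # n - 1 finite deaths; the last component never dies


points = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)]
diagram = [(0.0, death) for death in zero_dim_persistence(points)]
print(diagram)  # [(0.0, 0.5), (0.0, 2.0)]
```

In this toy example, the two nearby points merge at radius 0.5, while the isolated third point joins them only at radius 2.0. Its long lifetime reflects the gap in the data.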
Within this stream of research, in a paper/ code with C.A.N. Biscio, N. Chenavier and A.M. Svane, we develop a goodness-of-fit test for point patterns based on a functional CLT for the persistence diagram.
Furthermore, TDA is not limited to point patterns but extends to richer structures. For instance, the persistence diagram has been used to analyze complex arterial networks in the brain. In a preprint/ code with J.T.N. Krebs, we also establish a functional CLT for spatial random networks such as the directed spanning forest.
When analyzing persistent homology, practitioners often look for features that live for exceptionally long periods of time and draw conclusions when such features occur. However, how can we decide whether the observed long lifetimes come from genuinely interesting phenomena rather than mere chance? In a preprint/ code with N. Chenavier, we move one step closer to statistical applications and establish Poisson approximation results for extremal lifetimes of loops and holes in large sampling windows.
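The question of whether a long lifetime is due to chance can be illustrated with a simple Monte Carlo test. The sketch below is not the Poisson approximation from the preprint. It swaps in a simulation-based comparison and, for simplicity, uses the longest finite lifetime of a connected component (dimension 0) instead of loops and holes. The null model of independent uniform points in the unit square and the helper names `max_lifetime` and `monte_carlo_p_value` are assumptions made purely for illustration.

```python
import math
import random
from itertools import combinations


def max_lifetime(points):
    """Longest finite lifetime of a connected component (dimension 0).

    Illustrative assumption: components are born at radius 0 and merge at
    half the distance between ball centers.
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    edges = sorted(
        (math.dist(points[i], points[j]) / 2.0, i, j)
        for i, j in combinations(range(n), 2)
    )
    longest = 0.0
    for radius, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j
            longest = radius  # merge radii arrive in increasing order
    return longest


def monte_carlo_p_value(observed, n_sim=199, seed=0):
    """Monte Carlo p-value for the maximal lifetime.

    Assumed null model: the same number of independent uniform points in
    the unit square.
    """
    rng = random.Random(seed)
    n = len(observed)
    observed_stat = max_lifetime(observed)
    exceed = 0
    for _ in range(n_sim):
        simulated = [(rng.random(), rng.random()) for _ in range(n)]
        if max_lifetime(simulated) >= observed_stat:
            exceed += 1
    # Count the observed pattern as one of the n_sim + 1 samples.
    return (exceed + 1) / (n_sim + 1)


# Two tight clusters near opposite corners of the unit square.
clustered = [(0.01 * i, 0.0) for i in range(10)]
clustered += [(1.0 - 0.01 * i, 1.0) for i in range(10)]
p_value = monte_carlo_p_value(clustered)
print(p_value)
```

A small p-value indicates that the observed maximal lifetime is larger than what the assumed null model typically produces. Poisson approximation results of the kind described above aim to give such statements a rigorous asymptotic basis in large sampling windows.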