Topological data analysis

Last updated on Aug 31, 2025

Topological data analysis (TDA) rests on a principle that is as simple as it is intriguing: leverage invariants from algebraic topology to gain novel insights into data. Although TDA started as a vague idea, it is now applied by researchers working in astronomy, biology, finance and materials science. Since data sets collected by businesses are also growing rapidly in dimensionality and complexity, mastering the versatile tools of TDA is on the verge of becoming a key asset for data scientists working in industry.

However, the excitement about spectacular new results comes with a caveat: new insights are often presented through plots that are visually compelling but lack a solid statistical underpinning. To decide scientifically whether purported findings are indeed significant or merely an artifact of chance, the rigor of statistical tests is essential.

TDA-based hypothesis tests for point patterns

The major driving force behind the success of TDA is the persistence diagram. Loosely speaking, it is constructed from a process of growing balls centered at the points of a data cloud. During this growth, topological features such as loops or higher-dimensional holes form at specific times. Once they are covered by the growing balls, these features disappear again. Collecting the resulting birth and death times yields the persistence diagram.
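The growth process can be made concrete in the simplest case. The following is a minimal sketch, restricted to connected components (dimension 0) rather than loops or holes, and using only the Python standard library. The function name `h0_persistence` is hypothetical and chosen for illustration. Every point starts its own component at radius 0, and a component dies when its ball cluster merges with another one, which happens at half the distance between the two closest points of the clusters.

```python
import itertools
import math


def h0_persistence(points):
    """Return (birth, death) pairs for connected components of growing balls.

    Balls of radius r around two points overlap once r >= distance / 2,
    so each merge is recorded at half the inter-point distance.
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise merge radii, processed in increasing order.
    edges = sorted(
        (math.dist(points[i], points[j]) / 2, i, j)
        for i, j in itertools.combinations(range(n), 2)
    )

    diagram = []
    for radius, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            # Two components merge: one of them dies at this radius.
            parent[root_i] = root_j
            diagram.append((0.0, radius))

    # The last surviving component never dies.
    diagram.append((0.0, math.inf))
    return diagram


cloud = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)]
print(h0_persistence(cloud))
```

For the three collinear points above, the two nearby points merge at radius 0.5 and the distant point joins at radius 2.0. Loops and higher-dimensional holes require a full simplicial filtration and are usually computed with a dedicated library.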

Within this stream of research, in a paper/code with C.A.N. Biscio, N. Chenavier and A.M. Svane, we develop a goodness-of-fit test for point patterns based on a functional CLT for the persistence diagram.

Furthermore, TDA is not limited to point patterns but extends to richer structures. For instance, the persistence diagram has been used to analyze complex arterial networks in the brain. In a paper/code with J.T.N. Krebs, we establish a functional CLT for spatial random networks such as the directed spanning forest. Moreover, in materials science, measuring 3D data of a material is often prohibitively expensive, whereas gathering several 2D slices is substantially more practical. Devising statistical tests in this context is challenging since the topologies in adjacent slices are highly correlated. In a paper with A. Cipriani and M. Vittorietti, we rely on the tools of persistence to track such correlations over several slices.

When analyzing persistent homology, practitioners often look for features that live for exceptionally long periods of time and draw conclusions when such features occur. However, how can we decide whether the observed long lifetimes stem from genuinely interesting phenomena rather than from chance? In a paper/code with N. Chenavier, we move one step closer to statistical applications and establish Poisson approximation results for extremal lifetimes of loops and holes in large sampling windows. In a paper with M. Otto and N. Lundbye, we consider the sparse regime, where it is possible to give a more precise description of the shape of the maximal cycles.
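To illustrate the question of whether a long lifetime is exceptional, here is a minimal Monte Carlo sketch. It is not the Poisson approximation from the papers above but a simpler simulation-based stand-in, and it rests on assumptions made only for this example: the statistic is the longest finite lifetime of a connected component (dimension 0, not loops or holes), and the null model is the same number of independent uniform points in the unit square. The function names are hypothetical.

```python
import math
import random


def max_h0_lifetime(points):
    """Largest finite death time of a connected component.

    Equals half the longest edge of the Euclidean minimum spanning tree,
    computed here with Prim's algorithm in O(n^2).
    """
    n = len(points)
    if n < 2:
        return 0.0
    # best[i] is the distance from point i to the current tree.
    best = [math.dist(points[0], p) for p in points]
    in_tree = [False] * n
    in_tree[0] = True
    longest = 0.0
    for _ in range(n - 1):
        k = min((i for i in range(n) if not in_tree[i]), key=best.__getitem__)
        longest = max(longest, best[k])
        in_tree[k] = True
        for i in range(n):
            if not in_tree[i]:
                best[i] = min(best[i], math.dist(points[k], points[i]))
    return longest / 2


def monte_carlo_p_value(observed, n_sim=99, seed=0):
    """Monte Carlo p-value for the maximal lifetime.

    Assumed null model: len(observed) uniform points in the unit square.
    """
    rng = random.Random(seed)
    t_obs = max_h0_lifetime(observed)
    exceed = 0
    for _ in range(n_sim):
        sample = [(rng.random(), rng.random()) for _ in observed]
        if max_h0_lifetime(sample) >= t_obs:
            exceed += 1
    # The +1 terms count the observed pattern itself among the samples.
    return (1 + exceed) / (1 + n_sim)


pattern = [(0.1, 0.1), (0.12, 0.1), (0.1, 0.12), (0.9, 0.9)]
print(monte_carlo_p_value(pattern))
```

A small p-value suggests that the observed maximal lifetime is unusually large under the assumed null model. Simulation of this kind becomes costly in large sampling windows, which is one reason why explicit limit results for extremal lifetimes are useful.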

TDA-based notions of continuum percolation

At the heart of many goodness-of-fit tests in increasing sampling windows lies a central limit theorem (CLT), which in the setting of TDA has a clear geometric flavor. In particular, the underlying correlations are also geometric in nature and require us to understand large-scale structures percolating over long spatial distances. For spatial random networks, the notion of percolation and large components is well-established and has been pivotal for proving consistency properties or deriving quantitative CLTs. Hence, the major missing piece for proving functional CLTs in TDA is a proper concept of percolation that is adapted to TDA. With D. Valesin, we conceptualized a novel form of percolation designed to mirror correlations in functional CLTs.

Asymptotic normality for Betti numbers of Gaussian fields

In astronomy, for instance, TDA may indicate whether the cosmic microwave background stems from an isotropic Gaussian random field. Here too, deciding whether purported findings are significant or merely an artifact of chance requires rigorous statistical tests. In a paper with R. Lachièze-Rey, we concentrate on the Gaussian case. The idea is to leverage Morse theory, which connects persistent Betti numbers to the study of critical points of Gaussian fields. A key difficulty is to control higher moments of the number of critical points.

Limit theorems for multiparameter persistence

Understanding the statistical foundations of the persistence diagram is an important problem. There are, however, many situations in which it is natural to consider multiple filtration parameters simultaneously, e.g. when a point cloud comes equipped with additional measurements taken at the locations of the data. Multiparameter persistent homology was introduced to accommodate such multifiltrations, and it has become one of the most active areas of research within TDA, with exciting progress on multiple fronts. In a paper with M.B. Botnan, we offer a first step towards a rigorous statistical foundation of multiparameter persistence. Notably, we establish the strong consistency and asymptotic normality of the multiparameter persistent Betti numbers in growing domains.
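Schematically, statements of this kind typically take the following shape. The notation is an assumption made for this illustration and is not taken from the paper: $W_n$ denotes the growing sampling window, $|W_n|$ its volume, and $\beta(W_n)$ a persistent Betti number computed from the data in $W_n$ for a fixed choice of filtration parameters. The precise assumptions and normalizations are those stated in the paper.

```latex
% Strong consistency: the normalized Betti number converges almost surely.
\[
  \frac{\beta(W_n)}{|W_n|} \xrightarrow[n \to \infty]{\text{a.s.}} \bar\beta .
\]
% Asymptotic normality: centered fluctuations are Gaussian in the limit.
\[
  \frac{\beta(W_n) - \mathbb{E}[\beta(W_n)]}{\sqrt{|W_n|}}
  \xrightarrow[n \to \infty]{d} \mathcal{N}(0, \sigma^2) .
\]
```

Here $\bar\beta$ and $\sigma^2$ stand for a deterministic limiting density and a limiting variance, respectively.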

Publications

Functional limit theorems for edge counts in dynamic random connection hypergraphs

We introduce a dynamic random hypergraph model constructed from a bipartite graph. In this model, both vertex sets of the bipartite …

C. Hirsch, B. Jahnel, P. Juhász