Spectral methods for genomic multivariate analysis

We have worked on clustering methods for genomic data that allow genes to belong to more than one cluster. In particular, we used spectral analysis methods, such as Singular Value Decomposition (SVD), to uncover global patterns of gene expression. We also looked at other methods such as association rule mining, fuzzy clustering, and the general systems problem solver. We wrote a “manual” on how to use SVD and the related principal component analysis for microarray data, which includes an overview of the method and some insights about its relationship to Fourier analysis [Wall, Rechtsteiner, and Rocha, 2003]. We have applied this method to uncover novel expression patterns in human cells subjected to Human Cytomegalovirus (Herpes) infection [Challacombe et al, 2004], in a collaboration with the life sciences division at Los Alamos and the Shenk lab at Princeton University. Another thread uses SVD in biomedical text mining for the automatic functional annotation of genes and proteins from the literature [Rechtsteiner, 2005; Maguitman et al, 2006; Haidar et al, 2008].

Singular value decomposition of microarray data: example data from human cells subjected to Human Cytomegalovirus (Herpes) infection [Challacombe et al, 2004].

Two relevant clusters of co-expressed genes are identified. Above: the first two eigengenes. Below: correlation plot of all genes in the space of the first two eigenassays.
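As a rough illustration of the SVD analysis described above, the sketch below decomposes a genes-by-assays expression matrix with NumPy. It is a minimal example under stated assumptions, not the pipeline used in the cited papers: the data are synthetic, the function name `svd_expression` and all variable names are hypothetical, and centering each gene is one common preprocessing choice among several. It follows the usual convention in which the rows of `Vt` are the eigengenes (patterns across assays) and the columns of `U` are the eigenassays (patterns across genes).

```python
import numpy as np


def svd_expression(X):
    """Center each gene (row) and return the thin SVD of the result.

    X is a genes-by-assays matrix. Row centering is an assumption made
    for this sketch; other normalizations are equally plausible.
    """
    Xc = X - X.mean(axis=1, keepdims=True)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    # Fraction of total variance captured by each singular component.
    rel_var = s**2 / np.sum(s**2)
    return Xc, U, s, Vt, rel_var


# Synthetic data (hypothetical): 200 genes measured in 12 assays, with
# two groups of 50 genes that each follow a shared expression pattern.
rng = np.random.default_rng(0)
n_genes, n_assays = 200, 12
t = np.linspace(0.0, 1.0, n_assays)
pattern_a = np.sin(2 * np.pi * t)   # oscillating response
pattern_b = t - t.mean()            # steadily increasing response

X = rng.normal(scale=0.1, size=(n_genes, n_assays))
X[:50] += 2.0 * pattern_a
X[50:100] += 2.0 * pattern_b

Xc, U, s, Vt, rel_var = svd_expression(X)

# First two eigengenes: dominant expression patterns across assays.
eigengenes = Vt[:2]

# Coordinates of every gene along the first two components. Projecting
# the centered data onto the eigengenes gives the same values as scaling
# the first two columns of U by their singular values.
gene_coords = U[:, :2] * s[:2]

print("variance captured by first two components:", rel_var[:2].sum())
```

Plotting the two columns of `gene_coords` against each other gives a scatter plot in which groups of co-expressed genes tend to separate, which is the kind of view the figure above summarizes.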



The human brain displays heterogeneous organization in both structure and function. The figure depicts a method to characterize brain regions and networks in terms of information-theoretic measures. This framework is applied to human brain fMRI recordings of resting-state activity and DSI-inferred structural connectivity [Kolchinsky et al, 2014].

Multi-scale modularity and inference

More recently, we have used SVD and information theory to cluster very large knowledge networks of gene regulation obtained from bioinformatics databases and the literature. This allows us to identify overlapping functional clusters that occur at various scales of complex networks [Correia, Navarro-Costa and Rocha, 2020], such as those characterizing gene regulation. Together with our distance backbone methodology, this has led to the discovery of novel genes involved in human infertility [Correia et al, 2022].

In addition to spectral methods, we study statistical prediction [Kolchinsky and Rocha, 2011], modularity [Kolchinsky, Gates and Rocha, 2015; Marques-Pita and Rocha, 2013], multi-scale integration in the dynamics of complex networks, such as brain networks [Kolchinsky et al, 2014], and other scalable methods to study the dynamics of networks [Rocha, 2022; Parmer, Rocha, & Radicchi, 2022]. For dynamics on networks, see our work on distance backbones.




Project Members (Current and Former)

Luis Rocha (PI)

Alaa Abi-Haidar

Rion Brattig Correia

Alex Gates

Artemy Kolchinsky

Ana Maguitman

Thomas Parmer

Olaf Sporns

Andreas Rechtsteiner




Selected Project Publications