Topological Data Analysis for Genomics and Applications to Cancer


The recent explosion of biological data has underscored the need for theoretical, mathematical and computational frameworks to extract fundamental knowledge about these systems. Cancer is no exception. Topological Data Analysis (TDA) refers to a variety of techniques based on algebraic topology for studying the structure of data.

This course will be divided into three parts:

- A comprehensive introduction with the necessary mathematical background on TDA techniques, dimensionality reduction and manifold learning.

- An introduction to cancer genomics, genomic analysis techniques, basics of population genetics, and mathematical modeling of the evolution of cancers.

- Research projects that study large cancer datasets from The Cancer Genome Atlas using genomics and TDA techniques. Students will be assigned to research teams.

Because of the broad scope of this course, each topic will be covered at an introductory level during the lectures. Problems will be assigned at each lecture so that students can work on them at home and become familiar with the topics.


This course is intended for cancer biologists, computational biologists, and quantitative scientists (mathematicians, physicists, engineers, computer scientists) with an interest in biological applications.

Although not required, it is strongly recommended that students have a basic knowledge of linear algebra (vector spaces and matrices), statistics, first-order linear differential equations, and programming.

Faculty name: Raul Rabadan / Anthea Monod / Pablo Camara

Department: Biomedical Informatics

Semester: Spring 2017

Place: Room 817 Irving Cancer Research Center (Medical Center Campus)

Schedule: M 12:30-3:30

Grading: 50% midterm, 50% presentation.


  • Andrew Blumberg, Raul Rabadan, Topological Data Analysis for Genomics and Evolution, Cambridge University Press, 2017.
  • Warren Ewens, Mathematical Population Genetics, Springer, 2004.
  • Joseph Felsenstein, Inferring Phylogenies, Sinauer, 2004.
  • Martin Nowak, Evolutionary Dynamics: Exploring the Equations of Life, Harvard University Press, 2006.
  • Matthew Hamilton, Population Genetics, Wiley-Blackwell, 2009.
  • Durbin et al., Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids, Cambridge University Press, 1998.