Research Interests
The amount of high-throughput data in biological and clinical systems, from next-generation sequencing experiments to electronic health records, is increasing dramatically, making it possible to develop a quantitative understanding of these complex systems. Our lab is an interdisciplinary team that develops mathematical and computational tools to extract useful biological information from large datasets.

Our work focuses on three distinct topics:

  • Infectious diseases.
    Evolution is a dynamic process that shapes genomes. Our team at Columbia is developing algorithms and software to analyze genomic data, with a view to understanding the molecular biology, population genetics, phylogeny, and epidemiology of viruses.

  • Cancer.
    Next-generation sequencing technologies provide an extraordinary opportunity to identify somatic mutations that contribute to the development of tumors. We are developing methods to identify cancer-driving mutations in high-throughput sequencing datasets.

  • Electronic Health Records.
    Clinical databases constitute a rich and complex source of raw data. We are using statistical and computational methods to tease out important clinical patterns in these diverse datasets.

Because of this recent explosion in biological and medical data (a 2011 New York Times article referred to the phenomenon as a "deluge of data"), tackling these research problems often requires heavy-duty computation. To facilitate this, we have access to a high-performance computer cluster maintained by the Center for Computational Biology and Bioinformatics.

