
Statistical Methods and Computational Tools for Genetics and Genomics Data


Statistical Genetics

Statistical geneticists at SPH develop statistical methods for understanding the genetic basis of human diseases and traits.  These methods involve large-scale data sets from candidate-gene, genome-wide and resequencing studies, using both unrelated and related individuals.  SPH statistical geneticists collaborate with other investigators at SPH and around the world on studies of cancer, heart disease, diabetes, respiratory disease, psychiatric disease, and health-related behaviors (e.g. smoking, diet).  They have close ties to the Program in Quantitative Genomics and Computational Biology and Bioinformatics group at SPH.  Training encompasses basic statistics; Mendelian and population genetics; design and analysis of genetic association studies; gene expression and epigenetic markers; and gene-environment interaction.

Students holding a degree in mathematics, computer science, statistics or a related field and an interest in genetics are invited to apply to our Doctoral or Master’s degree programs.  Faculty in the PGSG advise students in both the Epidemiology and Biostatistics departments. Prospective students can apply to either department. While it is possible to apply to both departments, it is typically not recommended. It is Graduate School policy that an individual may submit no more than three applications during the course of his or her academic career. Prospective students are encouraged to discuss which program will best fit their needs with potential advisors. More details about the application process can be found here .

Postdoctoral training positions are also available, with support coming from individual Principal Investigators or appropriate training grants.  Prospective students or postdoctoral fellows with an interest in statistical genetics at SPH should contact Alkes Price.


Monash University

Statistical and ascertainment problems in human genetics

Faculty of Science Theses


  • Statistical Genetics

My research interests include phylogenetics, modeling biological data, statistical analysis of molecular data, and parallel computing. I am particularly interested in reconstructing species phylogenies from multilocus sequences.

  • Big Data Analytics
  • Design of Experiments
  • Functional Magnetic Resonance Imaging (fMRI)
  • General Statistics
  • Machine Learning
  • Statistics and Education


Mathematics, genetics and evolution

  • Published: 06 February 2013
  • Volume 1, pages 9–31 (2013)

  • Warren J. Ewens


The importance of mathematics and statistics in genetics is well known. Perhaps less well known is the importance of these subjects in evolution. The main problem that Darwin saw in his theory of evolution by natural selection was solved by some simple mathematics. It is also not a coincidence that the re-writing of the Darwinian theory in Mendelian terms was carried largely by mathematical methods. In this article I discuss these historical matters and then consider more recent work showing how mathematical and statistical methods have been central to current genetical and evolutionary research.



Author information

Authors and affiliations.

Department of Biology and Statistics, The University of Pennsylvania, Philadelphia, PA, 19104, USA

Warren J. Ewens

Corresponding author

Correspondence to Warren J. Ewens.

About this article

Ewens, W.J. Mathematics, genetics and evolution. Quant Biol 1, 9–31 (2013).

Received: 08 October 2012

Revised: 22 October 2012

Accepted: 06 November 2012

Published: 06 February 2013

Issue Date: March 2013



  • Natural Selection
  • Random Mating
  • Additive Genetic Variance
  • Allelic Type
  • Allele Model

Purdue University Graduate School

Computational Methods for Population Genetics

III: Small: Novel Statistical Data Analysis Approaches for Mining Human Genetics Datasets

Directorate for Computer & Information Science & Engineering

III: Small: Fast and Efficient Algorithms for Matrix Decompositions and Applications to Human Genetics

BIGDATA: F: DKA: Collaborative Research: Randomized Numerical Linear Algebra (RandNLA) for Multi-linear and Non-linear Data

Degree Type

  • Doctor of Philosophy
  • Computer Science

Campus location

  • West Lafayette

  • Statistical and quantitative genetics
  • Genetics not elsewhere classified
  • Anthropological genetics
  • Behavioural genetics
  • Bioinformatics and computational biology not elsewhere classified
  • Applications in life sciences
  • Knowledge representation and reasoning
  • Pattern recognition
  • Data mining and knowledge discovery

CC BY 4.0



DPhil in Statistics

The Department of Statistics admits doctoral students each year to a programme of instruction and research leading to the Doctor of Philosophy (DPhil) in Statistics degree. A doctorate normally requires between three and four years of full-time study. 

In the DPhil in Statistics, you will investigate a particular project in depth and write a thesis which makes a significant contribution to the field.   It can be in any of the subject areas for which supervision is available.

The Department of Statistics in the University of Oxford is a world leader in research in probability, bioinformatics, mathematical genetics and statistical methodology, including computational statistics and machine learning. Much of the department’s research is either explicitly interdisciplinary or draws its motivation from application areas, ranging from biology and physics to the social sciences.

You will be assigned a named supervisor or supervisors, who will have overall responsibility for the direction of your work on behalf of the department. You will have the opportunity to interact with fellow students and other members of your research groups, and more widely across the department. Typically, as a research student, you should expect to have meetings with your supervisor or a member of the supervisory team with a frequency of at least once every two weeks averaged across the year. The regularity of these meetings may be subject to variations according to the time of the year, and the stage that you are at in your research programme.

There are formal assessments of progress on the research project at around 12 to 15 months and at around 30 to 36 months. These assessments involve the submission of written work and oral examination.

The final thesis is normally submitted for examination during the fourth year and is followed by the viva examination.

You will be expected to acquire transferable skills as part of your training, and to undertake a total of 100 hours broadening training outside your specialist area. Part of that broadening training is obtained through APTS, the Academy for PhD Training in Statistics; this is a joint venture with a group of leading university statistics departments which runs four weeks of appropriate courses a year. You will give a research presentation or prepare a research poster each year in the department.

Our research students are actively involved in a lively academic community by means of seminars, lectures, journal clubs, working groups and social events. They receive training in modern probability, stochastic processes, statistical methodology, computational methods and transferable skills, in addition to specialised topics relevant to specific application areas. In particular, a broad structured programme of training in modern statistical methodology is available via courses in the Academy for PhD Training in Statistics (APTS), of which the Department is a founding member.

Information about application deadlines, entry requirements and funding is available from the DPhil in Statistics prospectus page on the University of Oxford website.

Research Areas

Supervisors for DPhil projects are listed below, with links to the research group page.

Computational Statistics and Machine Learning

Supervisor: Professor François Caron Bayesian Statistics, Statistical Machine Learning, Statistical Network Analysis, Bayesian Nonparametrics.

Supervisor: Professor Mihai Cucuringu Development and mathematical & statistical analysis of algorithms that extract information from massive noisy data sets. Computationally-hard inverse problems on large graphs with applications in machine learning. Spectral and semidefinite programming algorithms with application to ranking, clustering, group synchronization, phase unwrapping. Network analysis: community and core-periphery structure, network time series. Statistical analysis of financial data, statistical arbitrage, limit order books, risk models.

Supervisor: Professor George Deligiannidis Computational statistics, in particular theory and methodology for Monte Carlo methods, especially MCMC and SMC for high-dimensional targets; limit theorems and convergence rates for Markov chains and stochastic processes in general; random walks.

Supervisor: Professor Arnaud Doucet Possible research areas: Bayesian Computation, Monte Carlo methods, Statistical Machine Learning.

Supervisor: Professor Patrick Rebeschini Investigation of fundamental principles in high-dimensional probability, statistics and optimisation to design computationally efficient and statistically optimal algorithms for machine learning.

Supervisor: Professor Yee Whye Teh Machine learning. Probabilistic modelling, learning and inference.

Econometrics and Population Statistics

Supervisor: Professor Mihai Cucuringu Development and mathematical & statistical analysis of algorithms that extract information from high-dimensional noisy data sets, network time series, and certain computationally-hard inverse problems on large graphs. Particular areas of focus include statistical arbitrage, machine-learning for asset pricing, lead-lag detection, market microstructure, limit order books, synthetic data generation, as well as nonlinear dimensionality reduction techniques for high-dimensional time series data.

Supervisor: Professor Frank Windmeijer Causal Inference, Instrumental Variables Estimation (instrument selection using machine learning, weak instrument robust inference, bootstrap), Mendelian Randomisation.

Probability


Supervisor: Professor Julien Berestycki Branching processes, branching random walks, coalescence, fragmentation, population genetics, reaction-diffusion equations, front propagation, random trees.

Supervisor: Professor Alison Etheridge Stochastic analysis, especially problems related to stochastic modelling in population genetics.

Supervisor: Professor Christina Goldschmidt Research area: random discrete structures (e.g. trees and graphs) and their scaling limits.

Supervisor: Professor James Martin Probability theory, with strong links to statistical physics and theoretical computer science. Particular interests include: random graphs; interacting particle systems; models of random growth and percolation; models of coagulation and fragmentation; queueing networks.

Supervisor: Professor Gesine Reinert Investigation of networks such as protein-protein interaction networks and social networks in a statistically rigorous fashion. Often this will require some approximation, and approximations in statistics are another of my research interests. There is an excellent method to derive distances between the distributions of random quantities, namely Stein’s method, and I am interested in Stein’s method also from a theoretical viewpoint. The general area of my research falls under the category Applied Probability and many of the problems and examples I study are from the area of Computational Biology.

Supervisor: Professor David Steinsaltz Random dynamical systems, particularly with applications to population ecology. Evolutionary and biodemographic models of ageing.

Supervisor: Professor Matthias Winkel Probability and stochastic processes, in particular problems involving branching processes, Lévy processes, fragmentation processes, random tree structures.

Protein Informatics

Supervisor: Professor Charlotte Deane Developing novel methodologies to understand and predict protein evolution, interaction, structure and function.

Supervisor: Professor Garrett Morris Developing novel therapeutics and improving our understanding of living systems at the molecular level, in particular methods development in computer-aided drug discovery. Harnessing the increasing amounts of experimental data, and the development of novel algorithms in chemoinformatics and bioinformatics, machine learning, network pharmacology, and structural biology, to help solve real-world drug discovery problems.

Oxford Protein Informatics Group

Statistical Genetics and Epidemiology

Supervisor: Professor Christl Donnelly Epidemiology of infectious disease; Real-time analysis of outbreaks; Biostatistics; Disease ecology; Applied statistics.

Supervisor: Professor Jotun Hein Algorithms in Bioinformatics, Computational Biology, Stochastic Models of Genealogies and Sequence Evolution, Mathematical Models of the Origin of Life, Stochastic Models of Network Evolution, Genome Analysis

Supervisor: Professor Simon Myers Statistical and population genomics (fine-scale population structure and migrations, recombination, natural selection on complex traits, association testing, demographic history), statistical approaches for single-cell data (RNA-seq, ATAC-seq), genetic determinants of speciation and fertility in mammals.

Supervisor: Professor Pier Palamara Computational methods for population genetics (natural selection, demographic history); statistical genetics (complex trait heritability, association); scalable methods for large genomic data sets.

Statistical Theory and Methodology

Supervisor:  Professor Robin Evans Graphical models; Causal inference; Marginal modelling; Combining causal information from different experimental settings; Confounding and selection bias; High-dimensional model selection, and low dimensional model selection in the presence of high-dimensional confounders.

Supervisor:  Professor Geoff Nicholls Applied Bayesian Statistics and Statistical Methods, focusing on building and fitting models for complex stochastic systems. Computational Statistics, in particular Monte Carlo Algorithms. Current projects: Multiple imputation and model misspecification; Monte Carlo filtering and inference for partial orders from rank data; Spatial Statistics and the location of texts; Phylogenetic inference for cultural traits.

Computational Biology and Bioinformatics

Queries about the DPhil in Statistics or MSc by Research in Statistics should be sent to [email protected]


Doctor of Philosophy

UW Biostatistics PhD student presenting at the Biostatistics Colloquium

The Doctor of Philosophy is an advanced degree that prepares you for a career as an independent investigator, collaborative biostatistician, or educator.

Learn statistical theory, skills and techniques, and develop theory and applications of biostatistics. You will learn from internationally recognized faculty in UW’s Department of Biostatistics, and complete course work in biostatistics, statistics, and one or more public health or biomedical fields. As a PhD student, you will undertake research that advances the field of biostatistics and write a dissertation presenting your work. Earning a PhD in Biostatistics opens many opportunities for careers in academia, government, non-profit organizations, and private industry.

Course Pathway Options

The PhD program offers two pathways and both are typically completed in 4 to 5 years.

  • Standard Pathway - The PhD in Biostatistics Standard Pathway includes coursework in biostatistics, statistics, and one or more public health or biomedical fields.
  • Statistical Genetics Pathway - The PhD in Biostatistics Statistical Genetics Pathway provides training in the areas of statistical genetics, population genetics, and computational molecular biology.

Earn your degree from the top-ranked public biostatistics program

At UW, you will enter a strong community of cohorts. Capstone students move through classes together which builds friendships and provides a dynamic, interactive learning environment. Plus capstone students interact with MS Thesis and PhD students in both formal and informal settings.

Antonio Olivas Martinez

“Biostatistics students may get involved in a breadth of research areas. Our faculty collaborators which include those at UW, as well as Fred Hutch, Seattle Children's, and other organizations, are working on amazing projects and are willing to involve us in their areas of expertise, so I feel like I have opportunities to work on practically any topic.”



Dissertations / Theses on the topic 'Statistical genetics'

Consult the top 50 dissertations / theses for your research on the topic 'Statistical genetics.'


Qiao, Dandi. "Statistical Approaches for Next-Generation Sequencing Data." Thesis, Harvard University, 2012.

Bruen, Trevor Cormac Vincent. "Discrete and statistical approaches to genetics." Thesis, McGill University, 2006. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=102964.

Baillie, John Kenneth. "Statistical genetics in infectious disease susceptibility." Thesis, University of Edinburgh, 2013.

Oldmeadow, Christopher. "Latent variable models in statistical genetics." Thesis, Queensland University of Technology, 2009.

Mitchell, Brittany L. "Statistical genetic analyses of neuropsychological traits." Thesis, Queensland University of Technology, 2022.

Hudson, Julie. "Maternal Gene-Environment Effects: An Evaluation of Statistical Approaches to Detect Effects and an Investigation of the Effect of Violations of Model Assumptions." Thesis, Université d'Ottawa / University of Ottawa, 2019.

Casale, Francesco Paolo. "Multivariate linear mixed models for statistical genetics." Thesis, University of Cambridge, 2016.

Csilléry, Katalin. "Statistical inference in population genetics using microsatellites." Thesis, University of Edinburgh, 2009.

Sperrin, Matthew. "Statistical methodology motivated by problems in genetics." Thesis, Lancaster University, 2010.

Lange, Christoph. "Generalized estimating equation methods in statistical genetics." Thesis, University of Reading, 2000.


Wright, David Jonathan. "Investigating statistical homogeneity of a human chromosome." Thesis, Queen Mary, University of London, 1995.

Ngong, Chiano Mathias. "Statistical problems in human genetic linkage analysis." Thesis, University of Cambridge, 1994.

Liesch, Rahel. "Statistical Genetics for the Budset in Norway Spruce." Thesis, Uppsala University, Department of Mathematics, 2005.

Jung, Min Kyung. "Statistical methods for biological applications." [Bloomington, Ind.] : Indiana University, 2007.

Yu, Xiaoqing. "Statistical Methods and Analyses for Next-generation Sequencing Data." Case Western Reserve University School of Graduate Studies / OhioLINK, 2014.

Yung, Godwin Yuen Han. "Statistical methods for analyzing genetic sequencing association studies." Thesis, Harvard University, 2016.

Zang, Yong, and 臧勇. "Robust tests under genetic model uncertainty in case-control association studies." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2011.

Choy, Yan-tsun, and 蔡恩浚. "Statistical evaluation of mixed DNA stains." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2009.

Shringarpure, Suyash. "Statistical Methods for studying Genetic Variation in Populations." Research Showcase @ CMU, 2012.

Cordell, Heather Jane. "Statistical methods in the genetic analysis of type 1 diabetes." Thesis, University of Oxford, 1995.

Mathieson, Iain. "Genes in space : selection, association and variation in spatially structured populations." Thesis, University of Oxford, 2013.

Ahiska, Bartu. "Reference-free identification of genetic variation in metagenomic sequence data using a probabilistic model." Thesis, University of Oxford, 2012.

Bos, David H. "Statistical genetics and molecular evolution of major histocompatibility complex genes." Thesis, University of Canterbury. Biological Sciences, 2005.

Lundell, Jill F. "Tuning Hyperparameters in Supervised Learning Models and Applications of Statistical Learning in Genome-Wide Association Studies with Emphasis on Heritability." DigitalCommons@USU, 2019.

Vaez, Torshizi Rasoul. "Quantitative genetic analyses of production and reproduction traits in Australian merino sheep." Thesis, The University of Sydney, 1996.

Lee, Yiu-fai, and 李耀暉. "Analysis for segmental sharing and linkage disequilibrium: a genomewide association study on myopia." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2009.

Lu, Li. "Some actuarial and statistical investigations into topics on genetics and insurance." Thesis, Heriot-Watt University, 2006.

Shen, Xia. "Novel Statistical Methods in Quantitative Genetics : Modeling Genetic Variance for Quantitative Trait Loci Mapping and Genomic Evaluation." Doctoral thesis, Uppsala universitet, Beräknings- och systembiologi, 2012.

Golding, Pauline Lindsay. "Development of a statistical method for the identification of gene-environment interactions." Thesis, University of Edinburgh, 2012.

Zorrilla, Luc. "Beyond high mutation high recombination limit in statistical genetics." Thesis, KTH, Fysik, 2021.

Ciampa, Julia Grant. "Multilocus approaches to the detection of disease susceptibility regions : methods and applications." Thesis, University of Oxford, 2012.

Guturu, Harendra. "Deciphering human gene regulation using computational and statistical methods." Thesis, Stanford University, 2014.

It is estimated that at least 10-20% of the mammalian genome is dedicated to regulating the 1-2% of the genome that codes for proteins. This non-coding, regulatory layer is a necessity for the development of complex organisms, but is poorly understood compared to the genetic code used to translate coding DNA into proteins. In this dissertation, I will discuss methods developed to better understand the gene regulatory layer. I begin, in Chapter 1, with a broad overview of gene regulation, the motivation for studying it, the state of the art in its historical context, and where the field is heading.

In Chapter 2, I discuss a computational method developed to detect transcription factor (TF) complexes. The method compares co-occurring motif spacings in conserved versus unconserved regions of the human genome to detect evolutionarily constrained binding sites of rigid TF complexes. Structural data were integrated to explore overlapping motif arrangements while ensuring physical plausibility of the TF complex. Using this approach, I predicted 422 physically realistic TF complex motifs at 18% false discovery rate (FDR). I found that the set of complexes is enriched in known TF complexes. Additionally, novel complexes were supported by chromatin immunoprecipitation sequencing (ChIP-seq) datasets. Analysis of the structural modeling revealed three cooperativity mechanisms and a tendency of TF pairs to synergize through overlapping binding to the same DNA base pairs in opposite grooves or strands. The TF complexes and associated binding site predictions are made available as a web resource at
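
As a rough illustration of what reporting a set of predictions "at 18% FDR" can involve, the sketch below applies the Benjamini-Hochberg step-up procedure to a list of p-values. The abstract does not say how its FDR was estimated, so the choice of Benjamini-Hochberg is an assumption, and the function name and the p-values are hypothetical.

```python
def benjamini_hochberg(p_values, alpha):
    """Return the indices of hypotheses rejected at FDR level alpha.

    Generic Benjamini-Hochberg step-up procedure; an assumed stand-in,
    not the estimator used in the thesis.
    """
    m = len(p_values)
    # Indices of the p-values, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k such that p_(k) <= (k / m) * alpha.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            k_max = rank
    # Reject every hypothesis ranked at or below k_max.
    return sorted(order[:k_max])


# Hypothetical p-values, one per candidate motif pair.
candidate_p_values = [0.001, 0.04, 0.03, 0.2, 0.9]
accepted = benjamini_hochberg(candidate_p_values, alpha=0.18)
print(accepted)
```

With these made-up inputs the three smallest p-values pass their rank-dependent thresholds, so the first three candidates are retained.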

Next, in Chapter 3, I discuss how gene enrichment analysis can be applied to genome-wide conserved binding sites to infer regulatory functions for a given TF complex. A genomic screen predicted 732,568 combinatorial binding sites for 422 TF complex motifs. From these predictions, I inferred 2,440 functional roles, which are consistent with known functional roles of TF complexes. In these functional associations, I found interesting themes such as promiscuous partnering of TFs (such as ETS) in the same functional context (T cells). Additionally, functional enrichment identified two novel TF complex motifs associated with spinal cord patterning genes and mammary gland development genes, respectively. Based on these predictions, I discovered novel spinal cord patterning enhancers (5/9, 56% validation rate) and enhancers active in MCF7 cells (11/19, 53% validation rate). This set, replete with thousands of additional predictions, will serve as a powerful guide for future studies of regulatory patterns and their functional roles.
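As a rough illustration of the kind of functional enrichment test described above, the sketch below computes a one-sided hypergeometric p-value. The abstract does not state which test the dissertation uses, so the choice of the hypergeometric test is an assumption, and the function name, parameter names, and numbers are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(total_genes, annotated_genes, selected_genes, selected_annotated):
    """One-sided hypergeometric tail probability P(X >= selected_annotated).

    Assumed setup (hypothetical, not taken from the dissertation):
      total_genes        -- size of the gene universe
      annotated_genes    -- genes in the universe carrying a functional annotation
      selected_genes     -- genes near predicted binding sites of one TF complex
      selected_annotated -- selected genes that carry the annotation
    """
    upper = min(annotated_genes, selected_genes)
    denom = comb(total_genes, selected_genes)
    # Sum the probabilities of seeing k annotated genes for every k at least
    # as extreme as the observed count.
    tail = sum(
        comb(annotated_genes, k) * comb(total_genes - annotated_genes, selected_genes - k)
        for k in range(selected_annotated, upper + 1)
    )
    return tail / denom


# Illustrative numbers only: 40 of 300 selected genes carry an annotation
# that 500 of 20,000 genes carry overall.
p_value = hypergeom_enrichment_p(20000, 500, 300, 40)
print(f"enrichment p-value: {p_value:.3g}")
```

A small p-value here would indicate that the annotation is over-represented among the selected genes; when many annotations are tested, the resulting p-values would still need a multiple-testing correction.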

Then, in Chapter 4, I outline a method developed to predict disease susceptibility due to gene mis-regulation. The method interrogates ensembles of conserved binding sites of regulatory factors disrupted by an individual's variants and then looks for their most significant congregation next to a group of functionally related genes. Strikingly, when the method is applied to five different full human genomes, the top enriched function for each reflects that individual's distinct medical history. These results suggest that erosion of gene regulation results in function-specific mutation loads that manifest as disease predispositions in a familial lineage. Additionally, this aggregate analysis method addresses the problem that, although many human diseases have a genetic component involving many loci, most studies are statistically underpowered to isolate the contributing loci.

Finally, in Chapter 5, I summarize my findings and outline directions for future research that follow from them.

Hu, Xianghong. "Statistical methods for Mendelian randomization using GWAS summary data." HKBU Institutional Repository, 2019.

Li, Yong-Jun. "The application of statistical physics in bioinformatics /." View Abstract or Full-Text, 2003.

Allchin, Lorraine Doreen May. "Statistical methods for mapping complex traits." Thesis, University of Oxford, 2014.

McCaskie, Pamela Ann. "Multiple-imputation approaches to haplotypic analysis of population-based data with applications to cardiovascular disease." University of Western Australia. School of Population Health, 2008.

Silver, Matthew. "Statistical methods in neuroimaging genetics : pathways sparse regression and cluster size inference." Thesis, Imperial College London, 2013.

Kecskemethy, Peter D. "Computationally intensive methods for hidden Markov models with applications to statistical genetics." Thesis, University of Oxford, 2014.

Dilthey, Alexander Tilo. "Statistical HLA type imputation from large and heterogeneous datasets." Thesis, University of Oxford, 2012.

Sharif, Maarya. "Statistical issues in modelling the ancestry from Y-chromosome and surname data." Thesis, University of Glasgow, 2012.

Fernandez, Daniel. "Cell States and Cell Fate: Statistical and Computational Models in (Epi)Genomics." Thesis, Harvard University, 2015.

Crisci, Jessica L. "On Identifying Signatures of Positive Selection in Human Populations: A Dissertation." eScholarship@UMMS, 2013.


Ramasamy, Adaikalavan. "Increasing statistical power and generalizability in genomics microarray research." Thesis, University of Oxford, 2009.

Silva, Heyder Diniz. "Aspectos biométricos da detecção de QTL'S ("Quantitative Trait Loci") em espécies cultivadas" [Biometric aspects of the detection of QTLs ("Quantitative Trait Loci") in cultivated species]. Universidade de São Paulo, 2001.


Su, Zhan. "Statistical methods for the analysis of genetic association studies." Thesis, University of Oxford, 2008.



