--- title: "Liability threshold MAPIT" output: rmarkdown::html_vignette description: > Learn how to convert binary traits to liabilities. vignette: > %\VignetteIndexEntry{Liability threshold MAPIT} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` ```{r setup} library(mvMAPIT) ``` In this tutorial we will illustrate how to use the Liability Threshold MArginal ePIstasis Test (LT-MAPIT) by [Crawford and Zhou (2018)](https://doi.org/10.1101/374983)$^1$. For this purpose we will first simulate synthetic data and then analyze it. The data are single nucleotide polymorphisms (SNPs) with simulated genotypes. For the simulation we choose the following set of parameters: 1. **population_size** - # of samples in the population 2. **n_snps** - number of SNPs or variants 3. **PVE** - phenotypic variance explained/broad-sense heritability ($H^2$) 4. **rho** - measures the portion of $H^2$ that is contributed by the marignal (additive) effects 5. **disease_prevalence** - assumed disease prevelance in the population 6. **sample_size** - # of samples to analyze ```{r parameters, eval = F} population_size <- 1e4 n_snps <- 2e3 pve <- 0.6 rho <- 0.5 disease_prevalence <- 0.3 sample_size <- 500 ``` ## Simulate random genotypes Simulate the genotypes such that all variants have minor allele frequency (MAF) > 0.05. ```{r random_genotypes, eval = F} maf <- 0.05 + 0.45 * runif(n_snps) random_genotypes <- (runif(population_size * n_snps) < maf) + (runif(population_size * n_snps) < maf) random_genotypes <- matrix(as.double(random_genotypes), population_size, n_snps, byrow = TRUE) ``` ## Simulate liabilities We can use the mvMAPIT function `simulate_traits` to simulate the liabilities. See the tutorial on simulations for more details on that. ```{r simulate_traits, eval = F} n_causal <- 100 n_epistatic <- 10 simulated_data <- simulate_traits( random_genotypes, n_causal = n_causal, n_trait_specific = n_epistatic, n_pleiotropic = 0, d = 1, H2 = pve, rho = rho ) ``` Now that we have the liabilities, we can assign case-control labels according to the disease prevelance parameter. We will treat this like the LT-MAPIT paper and take an equal number of cases and controls. ```{r case_control, eval = F} liabilities <- simulated_data$trait threshold <- qnorm(1 - disease_prevalence, mean = 0, sd = 1) case_control_trait <- rep(0, population_size) case_control_trait[liabilities > threshold] <- 1 # Subsample a particular number of cases and controls cases <- sample(which(liabilities > threshold), sample_size / 2, replace = FALSE) controls <- sample(which(liabilities <= threshold), sample_size / 2, replace = FALSE) y <- as.integer(case_control_trait[c(cases, controls)]) X <- simulated_data$genotype[c(cases, controls), ] ``` ## Run MAPIT with Case-Control trait To run MAPIT with case-control traits, we need to convert the traits back to liabilities. The function `binary_to_liability` provides this conversion. **NOTE:** The `binary_to_liability` function is an approximation that is only suited for low prevalence in the disease trait. ```{r mapit, eval = F} y_liabilities <- binary_to_liability(y, disease_prevalence) lt_mapit <- mvmapit( t(X), t(y_liabilities), test = "hybrid" ) ``` # References 1: Lorin Crawford, Xiang Zhou (2018) Genome-wide Marginal Epistatic Association Mapping in Case-Control Studies bioRxiv 374983;