In this tutorial we will illustrate how to use the Liability Threshold MArginal ePIstasis Test (LT-MAPIT) by Crawford and Zhou (2018)1. For this purpose we will first simulate synthetic data and then analyze it.
The data are single nucleotide polymorphisms (SNPs) with simulated genotypes. For the simulation we choose the following set of parameters:
population_size <- 1e4
n_snps <- 2e3
pve <- 0.6
rho <- 0.5
disease_prevalence <- 0.3
sample_size <- 500
Simulate the genotypes such that all variants have minor allele frequency (MAF) > 0.05.
We can use the mvMAPIT function simulate_traits
to
simulate the liabilities. See the tutorial on simulations for more
details on that.
n_causal <- 100
n_epistatic <- 10
simulated_data <- simulate_traits(
random_genotypes,
n_causal = n_causal,
n_trait_specific = n_epistatic,
n_pleiotropic = 0,
d = 1,
H2 = pve,
rho = rho
)
Now that we have the liabilities, we can assign case-control labels according to the disease prevelance parameter. We will treat this like the LT-MAPIT paper and take an equal number of cases and controls.
liabilities <- simulated_data$trait
threshold <- qnorm(1 - disease_prevalence, mean = 0, sd = 1)
case_control_trait <- rep(0, population_size)
case_control_trait[liabilities > threshold] <- 1
# Subsample a particular number of cases and controls
cases <- sample(which(liabilities > threshold), sample_size / 2, replace = FALSE)
controls <- sample(which(liabilities <= threshold), sample_size / 2, replace = FALSE)
y <- as.integer(case_control_trait[c(cases, controls)])
X <- simulated_data$genotype[c(cases, controls), ]
To run MAPIT with case-control traits, we need to convert the traits
back to liabilities. The function binary_to_liability
provides this conversion. NOTE: The
binary_to_liability
function is an approximation that is
only suited for low prevalence in the disease trait.
1: Lorin Crawford, Xiang Zhou (2018) Genome-wide Marginal Epistatic Association Mapping in Case-Control Studies bioRxiv 374983; https://doi.org/10.1101/374983