Liability threshold MAPIT

library(mvMAPIT)

In this tutorial we will illustrate how to use the Liability Threshold MArginal ePIstasis Test (LT-MAPIT) by Crawford and Zhou (2018)1. For this purpose we will first simulate synthetic data and then analyze it.

The data are single nucleotide polymorphisms (SNPs) with simulated genotypes. For the simulation we choose the following set of parameters:

  1. population_size - # of samples in the population
  2. n_snps - number of SNPs or variants
  3. PVE - phenotypic variance explained/broad-sense heritability (H2)
  4. rho - measures the portion of H2 that is contributed by the marignal (additive) effects
  5. disease_prevalence - assumed disease prevelance in the population
  6. sample_size - # of samples to analyze
population_size <- 1e4
n_snps <- 2e3
pve <- 0.6
rho <- 0.5
disease_prevalence <- 0.3
sample_size <- 500

Simulate random genotypes

Simulate the genotypes such that all variants have minor allele frequency (MAF) > 0.05.

maf <- 0.05 + 0.45 * runif(n_snps)
random_genotypes   <- (runif(population_size * n_snps) < maf) + (runif(population_size *
                                                                         n_snps) < maf)
random_genotypes   <- matrix(as.double(random_genotypes), population_size, n_snps, byrow = TRUE)

Simulate liabilities

We can use the mvMAPIT function simulate_traits to simulate the liabilities. See the tutorial on simulations for more details on that.

n_causal <- 100
n_epistatic <- 10
simulated_data <- simulate_traits(
    random_genotypes,
    n_causal = n_causal,
    n_trait_specific = n_epistatic,
    n_pleiotropic = 0,
    d = 1,
    H2 = pve,
    rho = rho
)

Now that we have the liabilities, we can assign case-control labels according to the disease prevelance parameter. We will treat this like the LT-MAPIT paper and take an equal number of cases and controls.

liabilities <- simulated_data$trait
threshold <- qnorm(1 - disease_prevalence, mean = 0, sd = 1)
case_control_trait <- rep(0, population_size)
case_control_trait[liabilities > threshold] <- 1

# Subsample a particular number of cases and controls
cases <- sample(which(liabilities > threshold), sample_size / 2, replace = FALSE)
controls <- sample(which(liabilities <= threshold), sample_size / 2, replace = FALSE)
y <- as.integer(case_control_trait[c(cases, controls)])
X <- simulated_data$genotype[c(cases, controls), ]

Run MAPIT with Case-Control trait

To run MAPIT with case-control traits, we need to convert the traits back to liabilities. The function binary_to_liability provides this conversion. NOTE: The binary_to_liability function is an approximation that is only suited for low prevalence in the disease trait.

y_liabilities <- binary_to_liability(y, disease_prevalence)

lt_mapit <- mvmapit(
        t(X),
        t(y_liabilities),
        test = "hybrid"
)

References

1: Lorin Crawford, Xiang Zhou (2018) Genome-wide Marginal Epistatic Association Mapping in Case-Control Studies bioRxiv 374983; https://doi.org/10.1101/374983