Meta simu

This file describes use cases related to meta-analysis.

GWAMA

To run the example provided with the GWAMA software, use the following commands:

cd $COMORMENT/containers/reference/examples/gwama
apptainer exec --home $PWD:/home $SIF/gwas.sif GWAMA -qt

Each GWA study file has mandatory column headers:

MARKERNAME – SNP name
EA – effect allele
NEA – non-effect allele
OR – odds ratio
OR_95L – lower confidence interval of OR
OR_95U – upper confidence interval of OR

Study files might also contain the following columns:

N – number of samples
EAF – effect allele frequency
STRAND – marker strand (if the column is missing, the program expects all markers to be on the positive strand)
IMPUTED – whether the marker is imputed or not (if the column is missing, all markers are counted as directly genotyped)
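
As a rough sketch, the mandatory headers can be checked from the shell before running GWAMA. The file name study1.txt, the marker ID, and all values below are made up for illustration, and the tab-separated layout is an assumption; adjust the separator to match your files.

```shell
# Hypothetical toy study file with the mandatory GWAMA columns (values are made up).
printf 'MARKERNAME\tEA\tNEA\tOR\tOR_95L\tOR_95U\n' > study1.txt
printf 'rs0000001\tA\tG\t1.10\t1.02\t1.19\n' >> study1.txt

# Collect any mandatory column that is missing from the header line.
missing=""
for col in MARKERNAME EA NEA OR OR_95L OR_95U; do
  head -n 1 study1.txt | tr '\t' '\n' | grep -qx "$col" || missing="$missing $col"
done
echo "missing columns:${missing:- none}"
```

An empty list means the header contains every mandatory column; otherwise the names of the missing columns are printed.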

GWAMA also needs a pointer file: a plain .txt file that lists the paths of the summary statistics files to include in the meta-analysis. If the column names in your files differ from the ones listed above, map them with the --name_* options, for example:

GWAMA \
-i /GWAMA_pointer_file.txt \
--name_marker ID \
--name_n OBS_CT \
--name_ea ea \
--name_nea nea \
--name_or OR \
--name_or_95l L95 \
--name_or_95u U95 \
-o /output_path
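
The pointer file passed with -i can be written from the shell. This is a minimal sketch: the study file names study1.txt and study2.txt are hypothetical placeholders, and the one-path-per-line layout is an assumption based on the example input shipped with GWAMA.

```shell
# Hypothetical study files; replace with the paths to your own summary statistics.
printf '%s\n' study1.txt study2.txt > GWAMA_pointer_file.txt

# Inspect the result: one path per line.
cat GWAMA_pointer_file.txt
```

Paths in the pointer file must be valid inside the container, so they should sit under the folder bound with --home.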

For more information about GWAMA, see genomics.ut.ee/en/tools.

METAL

The METAL tool for meta-analysis (METAL_Documentation) is available in the gwas.sif container and can be executed as follows:

apptainer exec --home $PWD:/home $SIF/gwas.sif metal

Below is an example script for METAL analysis. If the script is saved as metal_script.txt in your current folder, you can run the analysis with the command apptainer exec --home $PWD:/home $SIF/gwas.sif metal metal_script.txt. See the comments in the script, and refer to the METAL documentation for more information.

# choose STDERR scheme (variance-based meta-analysis)
SCHEME   STDERR

# define variables that need to be accumulated across GWASes for each SNP.
# such CUSTOMVARIABLE columns are optional, but it's a good practice to accumulate per-SNP sample size across studies.
# it's also reasonable to compute per-study effective sample size, i.e. 4/(1/nca + 1/nco), and accumulate this value across studies.
CUSTOMVARIABLE NCASE
CUSTOMVARIABLE NCONTROL

# Best to meta-analyze raw summary statistics. In case of major problems
# with lambdaGC this needs to be looked into manually.
# GENOMICCONTROL ON

# Define how columns are named
MARKER   SNP
ALLELE   A1 A2
EFFECT   log(OR)
STDERR   SE
PVAL     PVAL

# Process a summary statistics file using the above configuration
PROCESS <path>/PGC_MDD_2018_no23andMe.sumstats.gz

# Change configuration before processing the next file. This is optional if
# all files have the same column names. In this example only the EFFECT column
# has changed (to BETA), so it would be fine to omit the MARKER, ALLELE, STDERR
# and PVAL lines.
MARKER   SNP
ALLELE   A1 A2
EFFECT   BETA
STDERR   SE
PVAL     PVAL

# Process the next summary stats file. Keep adding PROCESS command for each 
# sumstat file that needs to be meta-analysed.
PROCESS <path>/23andMe_MDD_2016.sumstats.gz

# define output file name: OUTFILE takes a prefix and a suffix, separated by a space
OUTFILE <path>/PGC_MDD_2018_with23andMe_ .csv

ANALYZE
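
The script comments mention a per-study effective sample size, 4/(1/nca + 1/nco). As a rough sketch, it can be computed with awk before running METAL; the helper name neff and the case/control counts are hypothetical and only illustrate the formula.

```shell
# Effective sample size for a case/control study: 4 / (1/ncase + 1/ncontrol).
neff() {
  awk -v nca="$1" -v nco="$2" 'BEGIN { printf "%.1f\n", 4 / (1 / nca + 1 / nco) }'
}

# Hypothetical counts: 1000 cases and 3000 controls.
neff 1000 3000   # prints 3000.0
```

A balanced study with 1000 cases and 1000 controls gives 2000.0, which matches the total sample size; unbalanced designs give a smaller value than their total.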