EpiClass’s documentation!

Overview:

EpiClass

Optimizing and predicting performance of DNA methylation biomarkers using sequence methylation density information.

Quick Install:

We recommend to make conda environment with python==3.7 first:

conda create -n name python==3.7 pip

pip install EpiClass

Installation from GitHub:

A copy of the package can be obtained by downloading the git repository:

git clone https://github.com/bmill3r/EpiClass

Required dependencies can be found in epiclass_env.yml. Importantly, epiclass is meant for python 3.7.

samtools 1.6 was used in its development. It is required for running epiclass READtoMD on .bam sequence alignment files.

We first recommend installing in a fresh python virtual environment, either mediated by conda or virtualenv, using the epiclass_env.yml.

For conda:

cd EpiClass

conda env create -f epiclass_env.yml

conda activate epiclass

Then install:

python setup.py build

python setup.py install

Alternatively:

pip install EpiClass-*.*.*.tar.gz

(found in: /EpiClass/dist/)

Check that epiclass is installed with:

epiclass -V

or:

epiclass -h

The Vignette:

For a deeper insight into how the code works and generating the manuscript figures, check out the vignette and associated jupyter notebooks:

https://github.com/bmill3r/EpiClass/blob/master/manuscript_figures/vignette/README_Vignette.ipynb

Quick Usage:

Generate methylation density table ({DREAMtoMD|READtoMD}.DT.{timestamp}.csv)

From either sequencing alignment reads or DREAMing methylation melt data.

For sequences:

  1. Each file must be its own .sam or .bam file. Point to the directory that contains the files to analyze.

  2. Alignment files must be from Bismark (https://www.bioinformatics.babraham.ac.uk/projects/bismark/).

  3. Command to process the sequencing reads:

    epiclass READtoMD -i path/to/files --interval BED/with/genome/coordinates
    

The interval file is a bed file that has genomic coordinates for which methylation density information will be extracted for the samples. If multiple intervals passed, the analysis will be done using pooled information from all of them. Otherwise, can pass “chr:start-stop” for a single genomic region.

For DREAMing data:

  1. Use a raw melt temp .csv file as input. A custom template is provided.

  2. Command to process the DREAMing melt data:

    epiclass DREAMtoMD -i rawmelt.csv -temps tempsToMDs.csv -cpg numberCpGs
    

tempsToMDs.csv should be a two column file indicating the methylation density a melting temperature corresponds to for the given locus for which DREAMing was performed. numberCpGs indicates the number of internal CpGs within the given locus for which DREAMing was performed. Both of these are required.

Methylation Density Binary Classifier (MDBC) performance

Compute the optimal methylation density cutoff for the case and control samples, given the sequence methylation density profiles for the genomic region of interest (or pooled genomic regions:

epiclass MDBC -i densityTable.csv -a cases -b controls

By default, returns summary tables containing the TPR, 1 - FPR, AUC, and optimal read cutoff for each methylation density cutoff (MDC). Will also return ROCs and boxplots for the optimal MD.

This is done using either the sample read counts or the sample read fractions. As in, the number of reads with a given methylation density or higher, or the sample fraction of reads with a given methylation density or higher.

More extensive plots and sample information can be obtained with additional command flags.

Samples can be normalized based on relative sample fractions, and specific methylation density cutoffs can also be used.

For more information:

use:

epiclass -h

epiclass READtoMD -h

epiclass DREAMtoMD -h

epiclass MDBC -h

Troubleshooting:

Overall: 1. Using full, absolute paths is probably a good idea…

Even more detailed information:

Follow the Documentation link under ‘Contents” below:

Indices and tables