A framework for genome-wide characterization of DNA methylation and variant effect prediction

berry Unified 5mC methylation compendium

A unified inter-consortium 5mC methylation dataset.

File Name Description Download
berry_for_ML.tar.gz
  • global_mapping_ML.bed - BED file of coordinates (hg38)
  • global_mapping_tracks.h5 - HDF5 file of methylation profile values
  • global_mapping_track.labels.txt - profile names
Download
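
As a minimal sketch of how the coordinate and label files might be read, the snippet below parses a BED file and a label list using only the Python standard library. It assumes that global_mapping_ML.bed is tab-separated with chrom, start, and end in its first three columns, and that global_mapping_track.labels.txt holds one profile name per line. Neither layout is documented on this page, and the helper names are hypothetical. The HDF5 file is left out because its internal dataset names are not stated here.

```python
import csv
from typing import Iterable, List, NamedTuple


class BedInterval(NamedTuple):
    """One genomic interval taken from the first three BED columns."""
    chrom: str
    start: int
    end: int


def read_bed_intervals(lines: Iterable[str]) -> List[BedInterval]:
    """Parse BED lines (assumed tab-separated) into intervals.

    Blank lines and comment/track/browser header lines are skipped.
    Columns beyond the third are ignored.
    """
    intervals = []
    for row in csv.reader(lines, delimiter="\t"):
        if not row or row[0].startswith(("#", "track", "browser")):
            continue
        intervals.append(BedInterval(row[0], int(row[1]), int(row[2])))
    return intervals


def read_labels(lines: Iterable[str]) -> List[str]:
    """Return non-empty lines with surrounding whitespace removed
    (assumes one profile name per line)."""
    return [line.strip() for line in lines if line.strip()]
```

Both helpers accept any iterable of lines, so an open file handle can be passed directly, for example `read_bed_intervals(open("global_mapping_ML.bed"))` after extracting the archive.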

berry MeC

Methylation classes.

File Name Description Download
Methylation_classes.tsv There are four columns:
  • chrom - Chromosome identifier
  • bin_start - Start position of the genomic bin (hg38)
  • bin_end - End position of the genomic bin (hg38)
  • methylation_class - Methylation class ID. The class annotations are available in Supplementary Table 2.
Download
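
The four columns above can be read with the standard library as sketched below. This assumes the file is tab-delimited and begins with a header row using exactly the column names listed. The function names are hypothetical, and the class ID is kept as a string because its type is not specified here.

```python
import csv
from collections import Counter
from typing import Dict, Iterable, List


def read_methylation_classes(lines: Iterable[str]) -> List[Dict[str, object]]:
    """Parse TSV lines with a header row into one record per genomic bin."""
    records = []
    for row in csv.DictReader(lines, delimiter="\t"):
        records.append({
            "chrom": row["chrom"],
            "bin_start": int(row["bin_start"]),
            "bin_end": int(row["bin_end"]),
            # Kept as text; map it to an annotation separately.
            "methylation_class": row["methylation_class"],
        })
    return records


def count_bins_per_class(records: List[Dict[str, object]]) -> Counter:
    """Count how many bins are assigned to each methylation class ID."""
    return Counter(record["methylation_class"] for record in records)
```

Passing an open handle to Methylation_classes.tsv to `read_methylation_classes` and the result to `count_bins_per_class` gives a quick overview of class sizes.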

hedgehog Hedgehog

Training and evaluation data for the Hedgehog sequence model, the model weights, and a demo case applying the model.

File Name Description Download
hedgehog_h5_datasets.tar.gz Hedgehog training and evaluation HDF5 datasets Download
hedgehog.pth The final model checkpoint (.pth) encoding the Hedgehog model weights. Download
predict_example.tar.gz Variant effect prediction inputs and outputs accompanying the example in the GitHub repo. Download
predict_data.tar.gz Variant effect prediction data folder accompanying the GitHub repo.
  • README.md contains detailed descriptions of the files.
Download
HGMD_predictions.tar.gz Complete prediction results for 1,121 curated regulatory noncoding pathogenic variants. For each variant, all CpG loci within the sequence window are included.
  • README.md contains detailed descriptions of the files.
Download
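
Since the archives above ship their file descriptions in a README.md, it can be convenient to read that file without unpacking everything. The sketch below does so with the standard tarfile module. It assumes the archive is gzip-compressed and that README.md may sit inside a subdirectory. The function name is hypothetical.

```python
import tarfile
from typing import BinaryIO, Optional


def read_readme_from_archive(fileobj: BinaryIO,
                             name: str = "README.md") -> Optional[str]:
    """Return the text of the first regular file called `name` in a .tar.gz
    archive, or None if the archive contains no such file."""
    with tarfile.open(fileobj=fileobj, mode="r:gz") as archive:
        for member in archive:
            # Compare only the final path component so nested READMEs match.
            if member.isfile() and member.name.split("/")[-1] == name:
                extracted = archive.extractfile(member)
                return extracted.read().decode("utf-8")
    return None
```

For a downloaded file, open it in binary mode first, for example `read_readme_from_archive(open("predict_data.tar.gz", "rb"))`.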
Help

Please open a GitHub issue to ask questions about the data or to provide feedback.

License

The detailed license is documented here. If you are interested in obtaining the software for commercial use, please contact the Office of Technology Licensing, Princeton University (Cortney L Cavanaugh, ccavanaugh@princeton.edu or otl@princeton.edu).