By Siddharth Krishnakumar
Thomas Jefferson High School For Science and Technology, Alexandria, Virginia
The variant call format (VCF) is a common text-based genetic file format that lists variants in genomes. It includes sequencing data from one or more individuals. Each row corresponds to a genetic variant, including its chromosome, position, the nature of the variant, and annotation such as read depth. A VCF file has from thousands to millions of rows. There are relatively few genetic visualization tools for this file format. Additionally, each genetic tool typically analyzes just one type of genetic abnormality, so it is very time-consuming to get a full picture of an individual's genome. Therefore, a genetic visualization tool that works on the VCF format and includes many of the types of chromosomal analysis (duplications, deletions, genetic relatedness, etc.) is needed.

VCFDataPy is a software program written in the Python programming language. It extracts certain fields from the VCF for all individuals, including chromosome, position, genotypes, read depth, and allelic depth, and creates a table. VCFDataPy provides a visualization of identity-by-state (IBS) between pairs of individuals in a mother/father/child trio. The visualization of IBS states between any two individuals identifies related individuals, such as parent-child relationships, and distantly related individuals. IBS is calculated as the number of alleles the two individuals have in common. VCFDataPy also uses the calculated IBS values to find the type and origin of inheritance patterns such as bi-parental inheritance, uniparental disomy, and Mendelian inconsistency. IBS and inheritance analyses can help identify deletions and single nucleotide variants. VCFDataPy plots the IBS2* fraction, a metric to evaluate relatedness between two individuals. The formula is IBS2* / (IBS2* + IBS0), yielding a ratio close to 1 for parent-child relationships and 2/3 for unrelated individuals (such as mother-father). VCFDataPy plots the read depth of the father, mother, and child to analyze copy number variations between the individuals, as read depth is a proxy for copy number. If the read depth at a certain place in the chromosome is different for the child compared to the parents, then there is a chance of a copy number variation (duplication or deletion) in the child. VCFDataPy also plots the allelic depth of the father's copy passed down to the child and the mother's copy passed down to the child. We calculated p-values for a sliding window to determine whether chromosomal abnormalities in the allelic depth and IBS plots were significant or just due to chance.
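The IBS state and the IBS2* fraction described above can be illustrated with a minimal Python sketch. This is not VCFDataPy's actual code: the function names (`parse_gt`, `ibs_state`, `is_ibs2_star`, `ibs2_star_fraction`) are hypothetical, the windows are fixed-size and non-overlapping for simplicity, and the sketch assumes the common definition in which IBS2* counts sites where both individuals are heterozygous for the same alleles, which the text here does not spell out.

```python
from typing import List, Optional, Tuple

Genotype = Optional[Tuple[int, int]]  # None stands for a no-call


def parse_gt(gt: str) -> Genotype:
    """Parse a diploid VCF GT string such as '0/1' or '1|1'; no-calls like './.' give None."""
    alleles = gt.replace("|", "/").split("/")
    if len(alleles) != 2 or any(not a.isdigit() for a in alleles):
        return None
    return (int(alleles[0]), int(alleles[1]))


def ibs_state(g1: Genotype, g2: Genotype) -> int:
    """Number of alleles two genotypes share (0, 1, or 2); 3 marks a no-call."""
    if g1 is None or g2 is None:
        return 3
    remaining = list(g2)
    shared = 0
    for allele in g1:
        if allele in remaining:
            remaining.remove(allele)  # each allele in g2 can be matched only once
            shared += 1
    return shared


def is_ibs2_star(g1: Genotype, g2: Genotype) -> bool:
    """Assumed definition: both individuals heterozygous for the same pair of alleles."""
    if g1 is None or g2 is None:
        return False
    return g1[0] != g1[1] and sorted(g1) == sorted(g2)


def ibs2_star_fraction(gts1: List[str], gts2: List[str], window: int) -> List[Optional[float]]:
    """IBS2* / (IBS2* + IBS0) per non-overlapping window; None when the denominator is 0."""
    fractions: List[Optional[float]] = []
    for start in range(0, len(gts1), window):
        pairs = [
            (parse_gt(a), parse_gt(b))
            for a, b in zip(gts1[start:start + window], gts2[start:start + window])
        ]
        n_ibs2_star = sum(is_ibs2_star(a, b) for a, b in pairs)
        n_ibs0 = sum(ibs_state(a, b) == 0 for a, b in pairs)
        denom = n_ibs2_star + n_ibs0
        fractions.append(n_ibs2_star / denom if denom else None)
    return fractions
```

Under these assumptions, a parent-child pair with no opposite-homozygote sites gives a fraction of 1 in every window, while a window containing two shared heterozygous sites and one IBS0 site gives 2/3.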
RESULTS
The father-child and mother-child relationships have very little IBS=0 (noise), but the father-mother relationship has large amounts of IBS=0. IBS=3 is a proxy for a no-call, where the read depth is not enough to definitively state the nucleotide at the position.

These are the IBS2* fraction plots for a normal chromosome. The father-child and mother-child plots have IBS2* fractions close to 1, while the father-mother plot is around 2/3. Position represents the window number.

These are the IBS2* fraction plots for a maternal deletion. The IBS2* fraction for the mother-child relationship suddenly drops to 0.25. This indicates a maternal deletion.

This is a read depth plot of a chromosome with a deletion in the child close to the end of the chromosome. The difference between the parent plots and the child's plot signifies a deletion.

ACKNOWLEDGEMENTS
I would like to thank Dr. Jonathan Pevsner of the Kennedy Krieger Institute for guiding me in this project.