I am a software engineer with a background in high-performance computing (HPC). I have developed parallel software, both multi-threaded and distributed with MPI, in several languages including C, C++, and Python. Beyond HPC projects, I have built analysis and data management tools in Python, including web applications and command-line tools. I also helped design, and am currently the primary maintainer of, the Civet pipeline framework, which drives the analysis pipelines used for the JAX Cancer Treatment Profile and PDX.
We have created a high-density SNP resource encompassing 7.87 million polymorphic loci across 49 inbred strains of the laboratory mouse by combining data from public databases and training a hidden Markov model to impute missing genotypes in the combined data. The strong linkage disequilibrium found in dense sets of SNP markers in the laboratory mouse provides the basis for accurate imputation. Using genotypes from eight independent SNP resources, we empirically validated the imputed genotypes and showed that they are highly reliable for most inbred strains. The imputed resource will be useful for studies of natural variation and complex traits, and it will facilitate association study designs by providing high-density SNP genotypes for large numbers of mouse strains. We anticipate that the resource will continue to evolve as new genotype data become available for laboratory mouse strains. The data are available for bulk download or query at http://cgd.jax.org/.
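The abstract does not spell out the model, so what follows is only a rough illustration of how an HMM can fill in missing calls. It is a simplified haplotype-copying model in the spirit of Li and Stephens and should not be read as the model trained in the study. Inbred strains are nearly fully homozygous, which means each strain's genotypes can be treated as a single haplotype. The sketch models a target strain as a mosaic of reference strains: the hidden state is the strain being "copied" at each SNP. Dense markers in strong linkage disequilibrium are what make long copying stretches, and hence a low switch probability, plausible. The function name `impute_strain`, the 0/1/-1 allele coding, and the fixed `switch_prob` and `error_prob` values are hypothetical. The study trained its model on the data instead of fixing parameters by hand.

```python
import numpy as np


def impute_strain(target, panel, switch_prob=0.01, error_prob=0.01):
    """Hypothetical sketch: impute missing alleles in `target` by copying from `panel`.

    target: 1-D int array (length M), alleles coded 0/1, -1 for a missing call.
    panel:  2-D int array (K strains x M SNPs), complete 0/1 alleles.
    Returns (imputed alleles, posterior probability of allele 1 at each SNP).
    """
    target = np.asarray(target)
    panel = np.asarray(panel)
    K, M = panel.shape

    # Emissions: observed target allele matches the copied strain with 1 - error_prob.
    emit = np.where(panel == target[None, :], 1.0 - error_prob, error_prob)
    emit[:, target < 0] = 1.0  # missing calls carry no information

    # Transitions: keep copying the same strain with 1 - switch_prob, otherwise
    # jump to a uniformly chosen strain (possibly the same one).
    stay = 1.0 - switch_prob
    jump = switch_prob / K

    # Forward pass, normalized at each SNP to avoid numerical underflow.
    alpha = np.empty((K, M))
    a = emit[:, 0] / K
    alpha[:, 0] = a / a.sum()
    for j in range(1, M):
        prev = alpha[:, j - 1]
        a = (stay * prev + jump * prev.sum()) * emit[:, j]
        alpha[:, j] = a / a.sum()

    # Backward pass, normalized the same way (per-SNP constants cancel below).
    beta = np.empty((K, M))
    beta[:, M - 1] = 1.0
    for j in range(M - 2, -1, -1):
        nxt = emit[:, j + 1] * beta[:, j + 1]
        b = stay * nxt + jump * nxt.sum()
        beta[:, j] = b / b.sum()

    # Posterior probability of copying each strain at each SNP.
    post = alpha * beta
    post /= post.sum(axis=0, keepdims=True)

    # Probability the target carries allele 1; fill only the missing calls.
    p1 = (post * (panel == 1)).sum(axis=0)
    imputed = np.where(target >= 0, target, (p1 > 0.5).astype(int))
    return imputed, p1


panel = np.array([
    [0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 1, 0, 1, 0, 1],
])
target = np.array([1, 1, -1, 1, 1, 1])
imputed, p1 = impute_strain(target, panel)
print(imputed.tolist())
```

In this toy panel the target agrees with the second reference strain at every observed SNP. The posterior therefore concentrates on that strain, and the missing call at index 2 is filled with allele 1, giving `[1, 1, 1, 1, 1, 1]`. Observed calls are never overwritten. A real pipeline would also have to handle missing calls in the reference panel, and it would estimate `switch_prob` and `error_prob` from the data.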