Authors:
M. Maiers (Minneapolis, US)
M. Halagan (Minneapolis, MN, US)
W. Wang (Minneapolis, MN, US)
P. Bashyal (Minneapolis, MN, US)
E. Pearson (Minneapolis, MN, US)
F. Sheel (Ulm, DE)
M. Heuer (Berkeley, US)
Y. Bolon (Minneapolis, MN, US)
B. Milius (Minneapolis, MN, US)
C. Kennedy (Minneapolis, MN, US)
S. Mack (Oakland, US)
New sequencing technologies have increased demand for tools and methods for annotating and analyzing sequence data. The extreme allelic and structural polymorphism present in HLA and KIR renders general genetic variation nomenclatures, as well as those used within the immunogenetics field, only marginally useful for describing 1) consensus sequences with partial phasing, 2) incomplete gene sequence coverage and 3) novel variants, especially intronic variants. In preparation for the 17th IHIWS, we have introduced open source web services that perform automated analysis of NGS consensus sequences and deliver Gene Feature Enumeration (GFE) strings, a computable shorthand description of consensus sequences. The GFE service [http://gfe.b12x.org/] accepts (curated or pre-curated) consensus sequences, performs alignment and annotation, and relies on a simpler system for persisting sequence data called Feature Service [http://feature.nmdp-bioinformatics.org/]. Feature Service was developed to authoritatively assign a unique identifier to any sequence, indexed by its locus (any gene in the list maintained by the Human Genome Organization (HUGO)) and feature (any term in the list maintained by the Sequence Ontology (SO)). We have demonstrated the utility of these services by analyzing sequences generated from over 500K genotyping results from HLA, KIR, ABO and other blood group antigen gene families, with varying levels of coverage and phasing. For situations where targeted sequencing is used (e.g. exons only), we have extended and applied the Genotype List format and GL Service (gl.nmdp.org) for representing and persisting information about phase and allelic ambiguity. Applied together, these tools form a new platform for accelerating the development of NGS data analysis for population genetics (LD, HWE), disease association, peptide binding, expression and clinical histocompatibility.
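
The identifier assignment described above can be illustrated with a minimal sketch. It is not the Feature Service or GFE service API: the class FeatureRegistry, the function gfe_string, the per-feature rank argument, the "w" and hyphen layout of the output string, and the toy sequences are all assumptions made for illustration. The only idea taken from the abstract is that each distinct sequence, indexed by locus and feature, receives one stable identifier, and that a consensus sequence can then be summarized by the identifiers of its features.

```python
from dataclasses import dataclass, field


@dataclass
class FeatureRegistry:
    """Toy in-memory stand-in for a feature service (hypothetical, not the real API)."""
    _ids: dict = field(default_factory=dict)   # (locus, term, rank, sequence) -> identifier
    _next: dict = field(default_factory=dict)  # (locus, term, rank) -> last identifier issued

    def accession(self, locus: str, term: str, rank: int, sequence: str) -> int:
        """Return the identifier for this sequence, assigning a new one if it is unseen."""
        scope = (locus, term, rank)
        key = scope + (sequence.upper(),)
        if key not in self._ids:
            self._next[scope] = self._next.get(scope, 0) + 1
            self._ids[key] = self._next[scope]
        return self._ids[key]


def gfe_string(registry: FeatureRegistry, locus: str, features: list) -> str:
    """Join per-feature identifiers into one shorthand string.

    `features` is an ordered list of (term, rank, sequence) tuples.
    The "w" separator and hyphen-delimited layout are assumed here.
    """
    ids = [registry.accession(locus, term, rank, seq) for term, rank, seq in features]
    return f"{locus}w" + "-".join(str(i) for i in ids)


registry = FeatureRegistry()
# Two made-up consensus sequences that share exon 1 but differ in intron 1.
first = gfe_string(registry, "HLA-A", [("exon", 1, "ATGGCC"), ("intron", 1, "GTGAGT")])
second = gfe_string(registry, "HLA-A", [("exon", 1, "ATGGCC"), ("intron", 1, "GTGAGC")])
print(first)   # HLA-Aw1-1
print(second)  # HLA-Aw1-2
```

In this sketch the shared exon keeps the same identifier in both strings, while the novel intronic variant receives a new one, so two strings can be compared feature by feature without re-aligning the underlying sequences.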