A systematic analysis of the gene and variation content of the extended HLA region
M. Dorak (Liverpool, GB)
We explored the unique genomic features of the extended HLA (xHLA) region (chr6: 25,726,131 to 33,400,601 bp) in the latest genome assembly (GRCh38) to gain insight into its gene content. xHLA makes up 0.24% of the genome and contains 674 genes (1.1% of total genes), of which 453 are protein-coding genes (2.3% of total). The protein-coding gene proportion within xHLA is 67.2%, which is higher than in the rest of the genome (32.7%, P<0.0001). Non-coding genes make up only 8.0% of all genes within xHLA (42.6% in the rest of the genome; P<0.0001), with only 13 microRNA and seven recognised long non-coding RNA (lncRNA) genes. The pseudogene content of xHLA is similar to that of the rest of the genome (25.5 vs 24.0%). We extracted the current SNP list from Ensembl (n=470,343). xHLA contains 0.4% of all SNPs in the human genome. The most SNP-dense region is the HLA-DR region (18,071 SNPs in 32.5 to 32.6 Mb), followed by the -DQ region (12,189 in 32.6 to 32.7 Mb). xHLA contains a higher proportion of missense SNPs (7.4%) than the rest of the genome (2.7%), as reported by NCBI ENTREZ SNP. We used the PredictSNP2 algorithm to assess the functionality of xHLA SNPs and found that 45,302 (11.2%) of them were deleterious. The largest group of deleterious SNPs was intergenic (18,610 or 41.1%). Rare nonsense mutations constituted 2.7% (n=1,240) of the deleterious SNPs within xHLA. Of all xHLA SNPs, 8,139 were present in the COSMIC database as somatic cancer mutations. The proportion of COSMIC SNPs among the deleterious SNPs was higher (2.5 vs 1.9%, P<0.0001). Plotting the density of deleterious SNPs across xHLA and sliding window analysis identified a hotspot (305/477 = 63.9%) for deleterious SNPs between 31,274 kb and 31,281 kb, centromeric to HLA-C and containing two pseudogenes (USP8P1, RPL3P2).
The deleterious SNPs of this region included risk markers for type 1 diabetes (rs2524067), multiple sclerosis (rs7382297) and psoriasis (rs3132486) as well as strong eQTLs for HCG22 (rs7382307, rs9264731, rs3930575, rs7382297). Only three of the 305 deleterious SNPs in this region were also cancer somatic mutations. In summary, xHLA makes up 0.24% of the genome, but contains 2.3% of protein-coding genes (but only 0.2% of non-coding genes) and 0.4% of all SNPs with a high missense SNP proportion. We also show that deleterious SNP distribution is not homogeneous across xHLA.
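The sliding window analysis mentioned in this abstract can be illustrated with a minimal Python sketch. The function name, window size, step and toy coordinates below are assumptions made for illustration; the abstract does not state the parameters that were actually used.

```python
from bisect import bisect_left


def window_fractions(positions, deleterious, window, step):
    """Return (start, n_total, n_deleterious, fraction) for each non-empty window.

    positions   -- sorted list of SNP coordinates
    deleterious -- iterable of coordinates flagged as deleterious
    Windows are half-open intervals [start, start + window).
    """
    del_sorted = sorted(deleterious)
    results = []
    start = positions[0]
    last = positions[-1]
    while start <= last:
        end = start + window
        n_total = bisect_left(positions, end) - bisect_left(positions, start)
        n_del = bisect_left(del_sorted, end) - bisect_left(del_sorted, start)
        if n_total:  # skip windows without any SNP
            results.append((start, n_total, n_del, n_del / n_total))
        start += step
    return results


# Toy coordinates (hypothetical, not real xHLA data)
snps = [0, 100, 200, 1000, 1100, 1200, 1300]
flagged = {1000, 1100, 1200}
windows = window_fractions(snps, flagged, window=500, step=500)
hotspot = max(windows, key=lambda w: w[3])
print(hotspot)
```

The window with the highest deleterious fraction plays the role of the hotspot; the reported 305/477 corresponds to a fraction of about 0.639 in the same sense.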
A bioinformatics algorithm to detect and quantify low frequency HLA-genotypes in NGS data
S. Bentink (Dresden, DE)
Next-generation sequencing (NGS) has evolved into a cost- and time-efficient approach for routine HLA typing. The corresponding data analysis strategies are fundamentally based on the assumption of a diploid genotype with balanced numbers of sequencing reads from one (homozygous) or two (heterozygous) alleles. However, chimeric samples with more than two haplotypes may arise, in particular after partially HLA-mismatched hematopoietic stem cell transplantation, and their detection may be diagnostically relevant, for instance in the context of post-transplant relapse. To detect alleles at very low copy numbers, the strategy for NGS data analysis needs to be adapted. Here, we describe and validate a novel bioinformatics algorithm to detect rare HLA sequences. First, sequencing reads are mapped against the expected alleles in a chimeric sample. Next, only the polymorphic positions differentiating the expected alleles are excised from the alignment. Finally, reads are classified based on the frequency of discriminatory di-nucleotides. The classification algorithm was validated using in vitro admixtures of DNA samples down to allelic frequencies of 0.5%. Exons 2 and 3 of 6 HLA loci (HLA-A, -B, -C, -DQB1, -DRB1, and -DPB1) were amplified and sequenced by NGS. We found that sensitivity and specificity are allele- and locus-specific and depend on the number of sequence differences between the alleles. In each of the analyzed admixture samples, a subset of the 12 amplicons harbored sufficient sequence differences to detect and discriminate genotypes at dilutions as low as 0.5% and to predict allele frequencies at different dilutions with high accuracy (R^2 = 0.93). The proposed bioinformatics solution may support the application of cost-effective NGS for the sensitive detection of HLA microchimerism in different settings, including transplantation and pregnancy.
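The three steps described in this abstract (mapping, restriction to discriminating positions, read classification) can be sketched roughly as follows. This is a simplified illustration, not the authors' algorithm: it votes on single discriminating positions rather than di-nucleotides, assumes reads are already aligned without gaps at a known offset, and uses hypothetical function names and toy sequences.

```python
from collections import Counter


def discriminating_positions(allele_a, allele_b):
    """Positions at which two aligned, equal-length allele sequences differ."""
    return [i for i, (a, b) in enumerate(zip(allele_a, allele_b)) if a != b]


def classify_read(read, offset, allele_a, allele_b, positions):
    """Assign a gap-free read (starting at `offset`) to allele A or B by majority vote."""
    votes_a = votes_b = 0
    for pos in positions:
        idx = pos - offset
        if 0 <= idx < len(read):  # the read covers this discriminating position
            if read[idx] == allele_a[pos]:
                votes_a += 1
            elif read[idx] == allele_b[pos]:
                votes_b += 1
    if votes_a > votes_b:
        return "A"
    if votes_b > votes_a:
        return "B"
    return "unassigned"


def allele_b_fraction(reads, allele_a, allele_b):
    """Fraction of assigned reads that support allele B, or None if none are assigned."""
    positions = discriminating_positions(allele_a, allele_b)
    counts = Counter(
        classify_read(read, offset, allele_a, allele_b, positions)
        for read, offset in reads
    )
    assigned = counts["A"] + counts["B"]
    return counts["B"] / assigned if assigned else None


# Toy sequences (hypothetical, not real HLA alleles)
major = "ACGTACGT"
minor = "ACGAACCT"
reads = [("ACGTACGT", 0), ("GTAC", 2), ("ACGTACGT", 0), ("AACC", 3), ("ACG", 0)]
print(allele_b_fraction(reads, major, minor))
```

Reads that cover no discriminating position stay unassigned, which mirrors the observation that sensitivity depends on the number of sequence differences between the alleles.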
Integrative genome-wide analysis identifies variants located at microRNA binding sites associated with pemphigus foliaceus
D. Augusto (Curitiba, BR)
Pemphigus foliaceus (PF) is an autoimmune blistering skin disease characterized by the production of IgG4 antibodies against desmoglein 1, a cell adhesion protein of the desmosome. Its endemic form reaches a prevalence of 3.2% in the Terena tribe, an Amerindian population of Brazil. PF is a complex disease, with multiple genes and environmental factors associated with its susceptibility. Approximately 90% of single nucleotide polymorphisms (SNPs) associated with complex diseases are located in non-coding regions of the genome, including microRNA (miRNA) target sites. MiRNAs are involved in many biological processes, and SNPs located within the 3′ untranslated region of their target genes can alter miRNA binding sites (miRSNPs), contributing to the pathogenesis of diseases. To identify miRSNPs that could be associated with PF, we used genome-wide genotyping data (Illumina platform) of 6,916 individuals (235 cases and 6,681 controls). To select miRSNPs, we developed a Naïve Bayesian classifier approach that integrated results from three different algorithms (PolyMirts, MirSNP and mirSNP score). At a score threshold of > 0.7, 338,894 SNPs were predicted to affect the binding sites of 3,017 microRNAs. We found 2,596 of these miRSNPs on the genome-wide chip array and used them for further investigation. We then performed a logistic regression analysis using 4 principal components as covariates to correct for possible population structure. We found 5 miRSNPs associated with PF (p-value < 10^-3), predicted to affect the binding of 15 miRNAs to 5 genes. The most strongly associated SNP (rs7195, p = 1.2 x 10^-4, OR = 0.5) is predicted to disrupt a miRNA binding site in HLA-DRA. Overall, these results suggest that SNPs that alter miRNA binding sites are associated with PF. Our findings therefore indicate that miRNAs may regulate genes that influence PF etiology and contribute to its pathogenesis.
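One way to picture how a naive Bayesian approach can integrate several prediction tools into a single score is the sketch below. It is illustrative only: the binary tool calls, the per-tool sensitivity and specificity values and the prior are all hypothetical, and the abstract does not describe how the actual classifier was parameterised.

```python
def naive_bayes_posterior(calls, tool_params, prior):
    """Posterior probability that a SNP is a true miRSNP.

    calls       -- dict: tool name -> True/False (tool predicts a binding-site effect)
    tool_params -- dict: tool name -> (sensitivity, specificity), assumed values
    prior       -- assumed prior probability of a true miRSNP
    Tools are treated as conditionally independent (the "naive" assumption).
    """
    p_true = prior
    p_false = 1.0 - prior
    for tool, called in calls.items():
        sensitivity, specificity = tool_params[tool]
        p_true *= sensitivity if called else (1.0 - sensitivity)
        p_false *= (1.0 - specificity) if called else specificity
    return p_true / (p_true + p_false)


# Hypothetical parameters for three generic tools
params = {"tool_1": (0.8, 0.9), "tool_2": (0.8, 0.9), "tool_3": (0.8, 0.9)}
all_three = naive_bayes_posterior(
    {"tool_1": True, "tool_2": True, "tool_3": True}, params, prior=0.1
)
two_of_three = naive_bayes_posterior(
    {"tool_1": True, "tool_2": True, "tool_3": False}, params, prior=0.1
)
print(all_three > 0.7, two_of_three > 0.7)
```

With these assumed values, only a SNP supported by all three tools exceeds a 0.7 cut-off, which shows how a score threshold trades sensitivity for specificity.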
Easy-HLA: a package of applications to reveal the full details of HLA typing.
A. Walencik (Nantes, FR)
The major limiting factor in the retrospective analysis of large datasets with HLA genotypes is the difficulty of using archived typings that were recorded in their initial state (low/mid resolution) for clinical purposes. However, for research use and population-level analysis, assumptions and computations can be made to facilitate large-scale studies. Easy-HLA is a web application based on validated and published HLA haplotype frequencies. It combines several modules for different research uses: Easymatch-R (likelihood computation), HLA-upgrade (statistical resolution of HLA typing ambiguity), HLA-2-haplo (haplotype inference), HLA-expr (statistical imputation of HLA expression), HLA-AA (amino acid equivalence of HLA alleles) and HLA-KIRlig (classification of KIR ligands). Easymatch-R gives the likelihood of finding a specific genotype. It was validated retrospectively in 200 individuals from an unrelated stem cell donor database. HLA-upgrade can statistically resolve ambiguous typings and NMDP codes; it predicts a complete HLA-A, -B, -C, -DRB1, -DQB1 genotype from limited typing information. It was further validated in 1500 samples with all 5 loci typed at 4 digits. HLA-2-haplo predicts the pair of haplotypes from a given genotype and their frequencies in different populations. It was validated on 300 family-determined haplotypes (from the parents). HLA-expr gives the predicted HLA-C expression (based on allele-specific mean HLA-C expression). HLA-AA gives the amino acids of the HLA alleles as well as their haplotypes. HLA-KIRlig reports the KIR ligand groups to which the HLA alleles of an individual belong: C1/C2 groups, Bw4/Bw6 and KIR2DL2 ligand. Easy-HLA is a package built to uncover the full details of the HLA genotypes of an individual (single request) or a cohort (batch request), and it delivers data in the form of database-compatible related tables that can easily be used to perform analyses.
Easy-HLA is available online: http://hla.univ-nantes.fr.
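Frequency-based haplotype inference of the kind attributed to HLA-2-haplo can be illustrated by the sketch below. It is not the Easy-HLA implementation: the function name is an assumption, the haplotype frequencies are invented for the example, and pairs are weighted under an assumed Hardy-Weinberg model.

```python
from itertools import product


def phase_genotype(genotype, hap_freq):
    """Rank haplotype pairs compatible with an unphased genotype.

    genotype -- list of (allele_1, allele_2) tuples, one per locus
    hap_freq -- dict: haplotype (tuple of alleles, one per locus) -> frequency
    Returns [(haplotype_pair, probability), ...] sorted by decreasing probability.
    """
    first, rest = genotype[0], genotype[1:]
    weights = {}
    for flips in product((False, True), repeat=len(rest)):
        hap_1, hap_2 = [first[0]], [first[1]]
        for (a, b), flip in zip(rest, flips):
            hap_1.append(b if flip else a)
            hap_2.append(a if flip else b)
        pair = tuple(sorted((tuple(hap_1), tuple(hap_2))))
        f_1 = hap_freq.get(pair[0], 0.0)
        f_2 = hap_freq.get(pair[1], 0.0)
        # Hardy-Weinberg weight: 2*f1*f2 for distinct haplotypes, f1^2 otherwise
        weights[pair] = f_1 * f_2 * (1 if pair[0] == pair[1] else 2)
    total = sum(weights.values())
    if total == 0:
        return []
    ranked = [(pair, w / total) for pair, w in weights.items() if w > 0]
    return sorted(ranked, key=lambda item: -item[1])


# Invented two-locus frequencies, for illustration only
freqs = {
    ("A*01:01", "B*08:01"): 0.06,
    ("A*02:01", "B*07:02"): 0.03,
    ("A*01:01", "B*07:02"): 0.005,
    ("A*02:01", "B*08:01"): 0.002,
}
genotype = [("A*01:01", "A*02:01"), ("B*08:01", "B*07:02")]
for pair, prob in phase_genotype(genotype, freqs):
    print(pair, round(prob, 4))
```

The number of candidate phasings doubles with each additional locus, so a five-locus genotype yields at most 16 distinct pairs to score.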
An updated neXtype algorithm for high-throughput KIR genotyping delivering allele-level resolution
J. Pruschke (Tuebingen, DE)
The KIR gene family plays an important role in the immune system and has been reported to impact the outcome of hematopoietic stem cell transplantation. Due to the polymorphism of these genes, allele-level typing results are difficult to achieve. Previously, neXtype delivered absence/presence calls for the KIR genes. Now, the software has been extended to report KIR at the allelic level in genotype list (GL) string format. We employ a short amplicon approach for next generation sequencing to obtain data from exons 3, 4, 5, 7, 8, and 9 with Illumina sequencers. Allele calling is based on the IPD-KIR library. Copy numbers of the allelic variants per exon are determined employing calibrated read counts. For each gene, neXtype computes a score for each possible combination of exon-specific allele calls as potential results. The scoring evaluates the deviation between the observed copy number per exon and the copy number required for a specific result. Special attention is needed for gene-bridging exons, as the corresponding sequences cannot be assigned to a specific gene a priori. Validation with a set of 93 pre-typed samples revealed a concordance rate of >99%. To further verify consistency, a set of 190 samples (16 loci per sample) was processed 10 times. Out of the 3040 loci, 3025 were typed consistently. Of these, 2043 yielded allelic-level results, 222 loci could be typed as present only, and 760 loci as absent. Of the 15 erroneous results, 13 were due to discrepant copy numbers, while two were due to discrepancies at the absence/presence level. In our high-throughput workflow, 197,071 samples were assigned a valid KIR typing within 3 months. Of these, 84% were processed automatically. For 1,895,256 loci, allelic level was achieved, while 389,881 were reported as present and 867,999 as absent. In summary, we show that high-throughput KIR allele-level typing is possible with a low error rate.
Consequently, high-resolution KIR genotyping is now included in the DKMS recruitment typing profile.
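The copy-number-based scoring described in this abstract can be pictured with the following sketch. It is an assumption-laden illustration rather than the neXtype implementation: the squared-deviation score, the function names, the allele names and the exon variant labels are all hypothetical.

```python
from collections import Counter


def expected_copies(candidate, allele_exons):
    """Copy number of each (exon, variant) implied by a candidate allele combination."""
    expected = Counter()
    for allele in candidate:
        for exon, variant in allele_exons[allele].items():
            expected[(exon, variant)] += 1
    return expected


def deviation_score(candidate, allele_exons, observed):
    """Sum of squared differences between observed and expected copy numbers."""
    expected = expected_copies(candidate, allele_exons)
    keys = set(expected) | set(observed)
    return sum((observed.get(k, 0.0) - expected.get(k, 0)) ** 2 for k in keys)


def best_candidate(candidates, allele_exons, observed):
    """Candidate allele combination with the smallest deviation score."""
    return min(candidates, key=lambda c: deviation_score(c, allele_exons, observed))


# Hypothetical gene X with three alleles defined by two exons
allele_exons = {
    "X*001": {"exon3": "v1", "exon4": "v1"},
    "X*002": {"exon3": "v1", "exon4": "v2"},
    "X*003": {"exon3": "v2", "exon4": "v2"},
}
# Copy numbers as they might be estimated from calibrated read counts (invented)
observed = {("exon3", "v1"): 1.9, ("exon4", "v1"): 1.1, ("exon4", "v2"): 0.9}
candidates = [("X*001", "X*001"), ("X*001", "X*002"), ("X*002", "X*003")]
print(best_candidate(candidates, allele_exons, observed))
```

A small best score with a clear margin over the runner-up would suggest an unambiguous call; gene-bridging exons would need extra handling that this sketch omits.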
The influence of HLA resolution level on population comparisons strongly depends on loci and geographic ranges
D. Di (Geneva 4, CH)
For most classical HLA loci, namely HLA-A, -B, -C, -DRB1, -DQA1 and -DQB1, 1st-field and 2nd-field are the two main resolution levels of the frequency data reported in population studies, depending on the method used for HLA typing. First-field level data are sometimes considered less accurate and a potential source of bias in population comparisons, although they are still much more abundant than 2nd-field data. To better understand how the resolution level influences the results of HLA population studies, we explored a large data set from several International Histocompatibility and Immunogenetics Workshops (IHIW) and retained 524 worldwide populations for which 2nd-field data were available. First-field level typing was obtained for these data by re-coding the 2nd-field level, and several population genetics statistics were calculated at both levels. Our results show that, at the six aforementioned HLA loci, both the genetic diversity within populations and the genetic distances among populations computed on the 1st-field data are strongly correlated with those estimated on the 2nd-field data, despite a general underestimation of both measures using 1st-field data. Interestingly, the degree of correlation varies among the loci: the strongest association is found at HLA-A, which may be related to this locus being less affected by balancing selection pressure. Our approach also reveals that the impact of the resolution level substantially increases when populations from different continents are compared, because similar 1st-field level alleles observed at comparable frequencies often correspond to distinct 2nd-field level alleles across distant geographic regions. However, despite systematic distortions for some populations, the overall patterns of multi-dimensional population plots are conserved between 1st- and 2nd-field levels.
In addition to their relevance for population genetic studies, these results increase our understanding of the molecular evolution of the different HLA loci.
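The re-coding step and its effect on within-population diversity can be illustrated with a short sketch. The allele frequencies are invented, the function names are assumptions, and expected heterozygosity is used here as one common diversity measure without the sample-size correction, since the abstract does not specify which statistics were computed.

```python
from collections import defaultdict


def to_first_field(allele):
    """Truncate an allele name such as 'A*02:01' to its 1st field, 'A*02'."""
    return allele.split(":")[0]


def recode_first_field(freqs):
    """Sum 2nd-field allele frequencies into 1st-field allele groups."""
    grouped = defaultdict(float)
    for allele, freq in freqs.items():
        grouped[to_first_field(allele)] += freq
    return dict(grouped)


def expected_heterozygosity(freqs):
    """1 minus the sum of squared allele frequencies (no sample-size correction)."""
    return 1.0 - sum(f * f for f in freqs.values())


# Invented frequencies for one population at HLA-A
second_field = {"A*02:01": 0.3, "A*02:06": 0.2, "A*24:02": 0.5}
first_field = recode_first_field(second_field)
print(expected_heterozygosity(second_field), expected_heterozygosity(first_field))
```

Because several 2nd-field alleles collapse into a single 1st-field group, diversity computed at the 1st-field level can only be equal to or lower than at the 2nd-field level, in line with the underestimation reported above.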
Studies of potential miRNAs associated with the regulation of the immune checkpoint HLA-G molecule.
E. Donadi (São Paulo, BR)
The HLA-G molecule presents a limited tissue distribution, and a restricted group of cell lineages has been used to study HLA-G regulation. It is known that microRNAs are involved in post-transcriptional gene regulation, and that the HLA-G 3’ untranslated region (3’UTR) is associated with several disorders. In addition, in silico studies have revealed several microRNAs that potentially target HLA-G 3’UTR polymorphic and non-polymorphic regions. In light of this, we evaluated the microRNA profiles of cells that constitutively express HLA-G1-7 isoforms (JEG-3 cells), cells that lost HLA-G1 and express HLA-G2 (FON-), cells transfected with HLA-G5 (M8G5), and cells that do not express HLA-G (untransfected M8, M8 transfected with an empty plasmid and U251 cells). Next Generation Sequencing was used to identify the differential microRNA profiles, and various bioinformatics analyses were performed to evaluate the differential expression of miRNAs. Only microRNAs exhibiting fold change (FC) >1.5, with P values <0.05 and false discovery rates (FDR) <0.05 were considered when evaluating the microRNA profile. A total of 1495 miRNAs were identified. The multiple comparisons revealed the following numbers of differentially expressed microRNAs: i) JEG-3 versus FON- (414 microRNAs), ii) JEG-3 versus M8 (463), iii) JEG-3 versus U251 (387), iv) FON- versus M8 (412), v) FON- versus U251 (286), and vi) U251 versus M8 (325). According to in silico studies, some of the microRNAs differentially expressed between cells that express HLA-G and cells that do not have previously been described as having a high affinity and/or a high specificity (hsa-miR-148b-3p, hsa-miR-365b-5p, hsa-miR-148a-3p, hsa-miR-152) for polymorphic sites in the HLA-G 3’UTR. Other differentially expressed microRNAs also targeted non-polymorphic segments of the HLA-G 3’UTR. These microRNAs are good candidates to be studied in the context of HLA-G regulation.
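The three selection criteria (FC >1.5, P <0.05, FDR <0.05) can be applied as in the sketch below. Two points are assumptions, since the abstract does not state them: the FDR is computed with the Benjamini-Hochberg procedure, and the fold-change cut-off is applied in both directions (up- and down-regulation). The miRNA names and values are invented.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    for rank in range(n, 0, -1):  # walk from the largest p-value downwards
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def select_mirnas(records, fc_min=1.5, alpha=0.05):
    """Names of miRNAs passing the fold-change, P value and FDR cut-offs.

    records -- list of (name, fold_change, p_value); fold_change is a positive ratio
    """
    fdrs = benjamini_hochberg([p for _, _, p in records])
    selected = []
    for (name, fold_change, p_value), fdr in zip(records, fdrs):
        magnitude = max(fold_change, 1.0 / fold_change)  # symmetric cut-off
        if magnitude > fc_min and p_value < alpha and fdr < alpha:
            selected.append(name)
    return selected


# Invented example records
records = [
    ("miR-a", 2.0, 0.001),
    ("miR-b", 1.2, 0.01),
    ("miR-c", 0.5, 0.03),
    ("miR-d", 3.0, 0.5),
]
print(select_mirnas(records))
```

In this toy set, miR-b fails the fold-change cut-off and miR-d fails the P value cut-off, so only miR-a and miR-c are retained.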
Additional polymorphism of HLA-C alleles: Alternative splicing of HLA-C results in exon 5 skipping
C. Voorter (Maastricht, NL)
Human Leukocyte Antigen (HLA) molecules play a pivotal role in the immune system by presenting peptides to immune cells. The majority of HLA alleles currently known are based on incomplete DNA analysis, and despite the 17th International Histocompatibility Workshop's aim of encouraging full-length gene sequences, data on mRNA transcript sequences are still lacking. In the IPD-IMGT/HLA Database, a number of structural splice variants have been described and annotated, based on polymorphism in the splice site consensus sequence. However, because RNA/cDNA SBT analysis is rarely performed, the actual number of HLA splice variants is likely underestimated. In the present study, we performed cDNA sequencing to study alternative splicing of HLA-C. This revealed the presence of different mRNA splice variants. One of the most intriguing splice variants involves exon 5 skipping. This mRNA variant was identified in freshly isolated peripheral blood mononuclear cells of healthy individuals and in B cell lines, and it is co-expressed with the normally spliced mature transcript. For each HLA-C allele group, 5 different individuals were tested; the presence of the mRNA variant depends on the allele group present, suggesting a role for polymorphism in this alternative splicing process. Furthermore, since exon 5 encodes the transmembrane region, translation of the mRNA variant might result in soluble HLA-C molecules. The functional implications for HLA-C expression and the influence on NK cell recognition are under further investigation.