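Most HLA typing tools accept either FASTQ or BAM input, and starting from FASTQ is generally recommended. For whole-exome or large-panel data, first align the reads. Then extract the reads that fall in the HLA region from the resulting BAM file and write them to a much smaller FASTQ, which makes downstream analysis easier.
This can be done with samtools, for example: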
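```shell
# Region queries need a coordinate-sorted, indexed BAM (skip if sample.bam.bai exists)
samtools index sample.bam
# 1. Reads aligned to the MHC region on chr6 and to an hg19 alt haplotype
samtools view -b -@ 8 sample.bam \
    chr6:28477797-33448354 chr6_apd_hap1:1-4622290 \
    > sample.HLA.bam
# 2. Pairs in which both ends are unmapped (-f 12 = read unmapped + mate unmapped)
samtools view -bh -f 12 -@ 8 sample.bam > sample.unmapped.bam
# 3. Combine both sets and sort by read name so that mates are adjacent
samtools merge -@ 8 sample.merge.bam sample.HLA.bam sample.unmapped.bam
samtools sort -n -@ 8 -o sample.sort.bam sample.merge.bam
# 4. Write paired FASTQ files; singletons (reads whose mate is missing) are discarded
samtools fastq -@ 8 \
    -1 sample.HLA.R1.fastq \
    -2 sample.HLA.R2.fastq \
    -s /dev/null \
    sample.sort.bam
```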
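The second command selects reads with `-f 12`. In the SAM FLAG field, bit 4 (0x4) means the read itself is unmapped and bit 8 (0x8) means its mate is unmapped. `-f` keeps a read only when all of the requested bits are set. The short bash sketch below reproduces that check with bit arithmetic. The function name `both_unmapped` is hypothetical and only illustrates which reads the command picks up.

```shell
#!/usr/bin/env bash
# Hypothetical helper: mimics the filter applied by `samtools view -f 12`.
# SAM FLAG bits: 0x4 = read unmapped, 0x8 = mate unmapped; 12 = 0x4 | 0x8.
# `-f` keeps a read only if ALL requested bits are set.
both_unmapped() {
    local flag=$1
    (( (flag & 12) == 12 ))
}

# A few typical paired-end flags: 77 and 141 are kept; 73, 133 and 99 are dropped
for flag in 77 141 73 133 99; do
    if both_unmapped "$flag"; then
        echo "$flag kept"
    else
        echo "$flag dropped"
    fi
done
```

Reads with only one end mapped, such as flags 73 or 133 above, are therefore not selected by this command.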
For a complete extraction, replace the regions in the example above with the following list:
| hg19 region |
|---|
| chr6:28477797-33448354 |
| chr6_apd_hap1:1-4622290 |
| chr6_cox_hap2:1-4795371 |
| chr6_dbb_hap3:1-4610396 |
| chr6_mann_hap4:1-4683263 |
| chr6_mcf_hap5:1-4833398 |
| chr6_qbl_hap6:1-4611984 |
| chr6_ssto_hap7:1-4928567 |
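Typing eight regions on the command line is error-prone. One option is to keep them in a bash array, or to write them to a BED file for `samtools view -L`. BED coordinates are 0-based with an exclusive end, while `chr:start-end` regions are 1-based and inclusive, so only the start has to be decreased by 1. The sketch below assumes the hg19 list above. The function `region_to_bed`, the array `HG19_HLA_REGIONS` and the file name `hg19.HLA.bed` are hypothetical names chosen for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert samtools-style regions (chr:start-end,
# 1-based, inclusive) into BED lines (0-based start, exclusive end).
region_to_bed() {
    local region chrom range start end
    for region in "$@"; do
        chrom=${region%:*}    # text before the last ':'
        range=${region##*:}   # text after the last ':'
        start=${range%-*}
        end=${range#*-}
        printf '%s\t%d\t%d\n' "$chrom" "$((start - 1))" "$end"
    done
}

# hg19 HLA regions from the table above
HG19_HLA_REGIONS=(
    chr6:28477797-33448354
    chr6_apd_hap1:1-4622290
    chr6_cox_hap2:1-4795371
    chr6_dbb_hap3:1-4610396
    chr6_mann_hap4:1-4683263
    chr6_mcf_hap5:1-4833398
    chr6_qbl_hap6:1-4611984
    chr6_ssto_hap7:1-4928567
)

region_to_bed "${HG19_HLA_REGIONS[@]}" > hg19.HLA.bed
```

The array can be passed directly as region arguments, as in `samtools view -b sample.bam "${HG19_HLA_REGIONS[@]}" > sample.HLA.bam` (this needs the index). Alternatively, use the BED file: `samtools view -b -L hg19.HLA.bed sample.bam > sample.HLA.bam`.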
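For hg38, the ranges on the alt contigs were set with reference to the HLA genes they contain, so they are somewhat narrower and no longer span the whole contig: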
| hg38 region |
|---|
| chr6:28510120-33480577 |
| chr6_GL000250v2_alt:1066038-4433734 |
| chr6_GL000251v2_alt:1283988-4540572 |
| chr6_GL000252v2_alt:1063230-4372611 |
| chr6_GL000253v2_alt:1062914-4548533 |
| chr6_GL000254v2_alt:1062887-4416229 |
| chr6_GL000255v2_alt:1063190-4323464 |
| chr6_GL000256v2_alt:1106450-4577757 |
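Which of the two lists applies depends on the reference the BAM was aligned to. A quick way to tell is to look for the alt or haplotype contig names from the tables in the BAM header. The bash sketch below assumes UCSC-style contig names (`chr6_...`) as used in the tables. The function name `detect_build` is hypothetical. A header without these contigs, such as a b37-style header using `6` or an hg38 no-alt analysis set, is reported as `unknown`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: guess the reference build from a SAM header read on
# stdin, using the alt/haplotype contig names listed in the two tables.
# Only UCSC-style names ("chr6_...") are recognised; anything else
# prints "unknown" and returns a non-zero status.
detect_build() {
    local header
    header=$(cat)
    if grep -qE $'^@SQ\t.*SN:chr6_GL000250v2_alt(\t|$)' <<< "$header"; then
        echo hg38
    elif grep -qE $'^@SQ\t.*SN:chr6_apd_hap1(\t|$)' <<< "$header"; then
        echo hg19
    else
        echo unknown
        return 1
    fi
}
```

Usage: `samtools view -H sample.bam | detect_build`, then take the regions from the matching table.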