HLA区域reads提取

一般的HLA分析软件需求输入fastq文件或bam文件,更建议是从fastq出发。使用全外的数据或大panel数据比对后,再从比对完成后的bam文件中提取出HLA区域的reads,形成较小的fastq,方便后续分析。

可以使用samtools进行提取,例

samtools view sample.bam \
	chr6:28477797-33448354 chr6_apd_hap1:1-4622290 \
	-b > sample.HLA.bam
samtools view sample.bam -bh -f 12 -@ 8 > sample.unmapped.bam
samtools merge sample.merge.bam sample.HLA.bam sample.unmapped.bam
samtools sort -n sample.merge.bam -@ 8 -o sample.sort.bam
samtools fastq sample.sort.bam \
	-1 sample.HLA.R1.fastq \
	-2 sample.HLA.R2.fastq \
	-s /dev/null \
	-@ 8

提取区域例子修改为以下区域

hg19
chr6:28477797-33448354
chr6_apd_hap1:1-4622290
chr6_cox_hap2:1-4795371
chr6_dbb_hap3:1-4610396
chr6_mann_hap4:1-4683263
chr6_mcf_hap5:1-4833398
chr6_qbl_hap6:1-4611984
chr6_ssto_hap7:1-4928567

hg38的参考了不同的HLA基因,范围缩小了一点

hg38
chr6:28510120-33480577
chr6_GL000250v2_alt:1066038-4433734
chr6_GL000251v2_alt:1283988-4540572
chr6_GL000252v2_alt:1063230-4372611
chr6_GL000253v2_alt:1062914-4548533
chr6_GL000254v2_alt:1062887-4416229
chr6_GL000255v2_alt:1063190-4323464
chr6_GL000256v2_alt:1106450-4577757