
Comparing part of a line against every line of another file in Python

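I am trying to compare part of each line in one file against every line of another file and write the matches to an output file. For example, this is the first file: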


chr8    18      .       T       T       *       *
chr8    29      .       C       T       .       .
chr9    21      .       TA      T       .       .
chr18    22      .       C       T       .       .
chr18    23      .       A       G       .       .

And this is the other file:


chr8    ensembl CDS     1       1042    .       -       0       gene_id "ENSCAFG00000031632"; gene_version "1"; transcript_id "ENSCAFT00000048171"; transcript_version "1"; exon_number "1"; gene_source "ensembl"; gene_biotype "protein_coding"; transcript_source "ensembl"; transcript_biotype "protein_coding"; protein_id "ENSCAFP00000042624"; protein_version "1";
chr8    ensembl CDS     27     1227    .       +       0       gene_id "ENSCAFG00000032228"; gene_version "1"; transcript_id "ENSCAFT00000037896"; transcript_version "2"; exon_number "1"; gene_source "ensembl"; gene_biotype "protein_coding"; transcript_source "ensembl"; transcript_biotype "protein_coding"; protein_id "ENSCAFP00000033535"; protein_version "2";

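So I want to take each line of the first file and check it against every line of the second file: first whether column 1 matches, and if it does, whether the number in column 2 of file 1 falls within the range given by columns 4 and 5 of file 2. For every match, the output file should contain the line from the first file followed by all the matching lines from file 2. Here is what I tried: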


opt=''
with open('file1.vcf') as vfh:
    with open('file2.gtf') as gfh:
        for line in vfh:
            ct=0
            vll=line.split('\t')
            for gline in gfh:
                gll=gline.split('\t')
                if vll[0] == gll[0]:
                    if (int(vll[1]) > int(gll[3])) and (int(vll[1]) < int(gll[4])):
                        while ct < 1:
                            opt+=line
                            ct+=1
                        opt+=gline
with open('out.txt','w') as fh:
    fh.write(opt)

But I never get the output I want.


繁花如伊
2 answers

慕桂英546537

Found the problem: I just needed to move the open statement for the second file inside the loop. A file object can only be iterated once, so with both files opened up front the inner loop had nothing left to read after the first line of the first file. I also added some handling for the comment lines in my original file:

with open('a1.vcf') as vfh:
    for line in vfh:
        # skip header/comment lines, which start with '#'
        if '#' not in line[0]:
            ct=0
            vll=line.split('\t')
            # reopen the GTF for every VCF line so it is read from the start
            with open('cds.gtf') as gfh:
                for gline in gfh:
                    gll=gline.split('\t')
                    if vll[0] == gll[0]:
                        if (int(vll[1]) > int(gll[3])) and (int(vll[1]) < int(gll[4])):
                            while ct < 1:
                                opt+=line
                                ct+=1
                            opt+=gline
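Reopening the second file for every line of the first works, but it rereads the file from disk each time. As a rough sketch of an alternative (not the code from the answer above), the GTF records can be read once into a list, which can be looped over as often as needed. The function name `annotate`, the shortened sample rows, and the use of `io.StringIO` in place of real files are assumptions made for illustration. The comparison is kept strict, as in the question; GTF coordinates are inclusive, so `<=` may be what is actually wanted.

```python
import io

def annotate(vcf_lines, gtf_lines):
    """Return each matching VCF line followed by the GTF lines that span it."""
    # Parse the GTF once. Unlike a file object, a list is not exhausted
    # after the first pass, so it can be reused for every VCF line.
    features = []
    for gline in gtf_lines:
        if gline.startswith('#') or not gline.strip():
            continue  # skip comments and blank lines
        cols = gline.split('\t')
        features.append((cols[0], int(cols[3]), int(cols[4]), gline))

    out = []
    for line in vcf_lines:
        if line.startswith('#') or not line.strip():
            continue
        cols = line.split('\t')
        chrom, pos = cols[0], int(cols[1])
        # Strict bounds, mirroring the comparison used in the question.
        hits = [g for c, start, end, g in features
                if c == chrom and start < pos < end]
        if hits:
            out.append(line)   # the VCF line is written once...
            out.extend(hits)   # ...followed by every GTF line that matched
    return ''.join(out)

# Hypothetical, shortened sample data (tab-separated) based on the question.
vcf = io.StringIO(
    'chr8\t18\t.\tT\tT\t*\t*\n'
    'chr8\t29\t.\tC\tT\t.\t.\n'
    'chr9\t21\t.\tTA\tT\t.\t.\n'
)
gtf = io.StringIO(
    'chr8\tensembl\tCDS\t1\t1042\t.\t-\t0\tgene_id "ENSCAFG00000031632";\n'
    'chr8\tensembl\tCDS\t27\t1227\t.\t+\t0\tgene_id "ENSCAFG00000032228";\n'
)
result = annotate(vcf, gtf)
print(result, end='')
```

With real files, the same function could be called as `annotate(open('a1.vcf'), open('cds.gtf'))` inside `with` blocks, and the returned string written to `out.txt`.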

一只甜甜圈

I believe your indices are wrong. In

if (int(vll[1]) > int(gll[3])) and (int(vll[1]) < int(gll[4])):

"vll[1]" is 18 but "gll[3]" is 1042, because "ensembl CDS" appears to be separated by " " rather than "\t". Try stepping through with a debugger to verify the indices.
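The effect described here can be reproduced in a few lines. This is only an illustration under an assumption: the `space_line` string below is a made-up row in which "ensembl" and "CDS" are joined by a space, which may or may not be how the real file is delimited. Splitting on any whitespace with a limit of 8 splits is one possible way to get the same column positions in both cases while keeping the attribute column in one piece.

```python
# Two hypothetical versions of the same GTF row: one fully tab-separated,
# one where "ensembl" and "CDS" are separated by a single space.
tab_line = 'chr8\tensembl\tCDS\t1\t1042\t.\t-\t0\tgene_id "X";'
space_line = 'chr8\tensembl CDS\t1\t1042\t.\t-\t0\tgene_id "X";'

tab_cols = tab_line.split('\t')
space_cols = space_line.split('\t')

# With real tabs, indices 3 and 4 are the start and end coordinates.
print(tab_cols[3], tab_cols[4])      # 1 1042
# With the space, every later column shifts left by one position.
print(space_cols[3], space_cols[4])  # 1042 .

# split(None, 8) splits on runs of any whitespace, at most 8 times,
# so the first 8 columns line up and the attributes stay together.
robust_cols = space_line.split(None, 8)
print(robust_cols[3], robust_cols[4])  # 1 1042
```

Printing `repr(gline)` for one line of the real file shows whether it contains `\t` or spaces, which settles the question without a debugger.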