├── DIR1
│ ├── smp1.fastq.gz
│ ├── smp1_fastqc/
│ ├── smp2.fastq.gz
│ └── smp2_fastqc/
└── DIR2
    ├── smp3.fastq.gz
    ├── smp3_fastqc/
    ├── smp4.fastq.gz
    └── smp4_fastqc/
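I want to count the number of reads per sample, then concatenate all the counts per directory.
I build a lookup that links samples 1 and 2 to directory 1, and samples 3 and 4 to directory 2: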
from itertools import product

DIRS, SAMPLES = glob_wildcards(INDIR+'/{dir}/{smp}.fastq.gz')

# Wrap a combinator so that it only yields the combinations listed in authlist
def filter_combinator(combinator, authlist):
    def filtered_combinator(*args, **kwargs):
        for wc_comb in combinator(*args, **kwargs):
            if frozenset(wc_comb) in authlist:
                yield wc_comb
    return filtered_combinator

# Collect the (dir, smp) combinations that actually exist on disk
combine_dir_samples = []
for dir in DIRS:
    samples, = glob_wildcards(INDIR+'/'+dir+'/{smp}.fastq.gz')
    for smp in samples:
        combine_dir_samples.append( { "dir" : dir, "smp" : smp} )

combine_dir_samples = { frozenset( x.items() ) for x in combine_dir_samples }

dir_samples = filter_combinator(product, combine_dir_samples)
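To show what the filter is meant to do, here is a minimal plain-Python sketch that runs without Snakemake. The directory and sample names are hard-coded stand-ins for my layout, and the lists of `(name, value)` tuples only imitate what, as far as I understand, `expand` hands to the combinator; that part is an assumption, not something I have checked in the Snakemake source.

```python
from itertools import product

def filter_combinator(combinator, authlist):
    # Same wrapper as in my Snakefile: keep only the allowed combinations.
    def filtered_combinator(*args, **kwargs):
        for wc_comb in combinator(*args, **kwargs):
            if frozenset(wc_comb) in authlist:
                yield wc_comb
    return filtered_combinator

# Hypothetical stand-in for the (dir, smp) pairs found on disk.
allowed = {
    frozenset({("dir", "DIR1"), ("smp", "smp1")}),
    frozenset({("dir", "DIR1"), ("smp", "smp2")}),
    frozenset({("dir", "DIR2"), ("smp", "smp3")}),
    frozenset({("dir", "DIR2"), ("smp", "smp4")}),
}

# Assumed shape of the combinator arguments: one list of
# (wildcard_name, value) tuples per wildcard.
dir_pairs = [("dir", d) for d in ["DIR1", "DIR2"]]
smp_pairs = [("smp", s) for s in ["smp1", "smp2", "smp3", "smp4"]]

dir_samples = filter_combinator(product, allowed)
kept = [dict(comb) for comb in dir_samples(dir_pairs, smp_pairs)]

for combo in kept:
    print(combo["dir"], combo["smp"])
```

Of the eight combinations that `product` generates, only the four that exist in `allowed` survive, so pairs such as DIR1/smp3 are dropped.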
Then I create a rule that counts the reads for each sample:
rule all:
    input:
        expand(INDIR+'/{dir}/{smp}_Nreads.txt', dir_samples, dir=DIRS, smp=SAMPLES)

rule countReads:
    input:
        INDIR+'/{dir}/{smp}_fastqc/fastqc_data.txt'
    output:
        INDIR+'/{dir}/{smp}_Nreads.txt'
    shell:
        # Pass the directory name to awk as a variable so it is printed as a string
        "grep 'Total Sequences' {input} | awk -v dir={wildcards.dir} '{{print dir,$3}}' > {output}"
---------------------------------------------------------------
# result ok
├── DIR1
│ ├── smp1_Nreads.txt
│ └── smp2_Nreads.txt
└── DIR2
    ├── smp3_Nreads.txt
    └── smp4_Nreads.txt
> cat smp1_Nreads.txt
DIR1 15082186
For my concatenation rule, I tried several different input syntaxes:
expand(OUTFastq+'/{dir}/FastQC/{{smp}}_Nreads.txt', dir_samples, dir=DIRS)
lambda wildcards: expand(OUTFastq+'/{dir}/FastQC/{wildcards.smp}_Nreads.txt', dir_samples, dir=DIRS, smp=SAMPLES)
expand(OUTFastq+'/{dir}/FastQC/{wildcards.smp}_Nreads.txt', dir_samples, dir=DIRS, smp=SAMPLES)
None of them works; it is as if this rule ignores my dictionary.
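To make the goal concrete, this is the input list I expect for each directory, written as a plain-Python sketch. The hard-coded `DIRS` and `SAMPLES` lists stand in for the parallel lists that `glob_wildcards` returns (one entry per matching file), `INDIR = "data"` is a placeholder, the paths follow the layout of my `countReads` output, and `nreads_for_dir` is a hypothetical helper name, not an existing Snakemake function.

```python
from collections import defaultdict

# Hypothetical stand-ins for the parallel lists returned by glob_wildcards.
DIRS = ["DIR1", "DIR1", "DIR2", "DIR2"]
SAMPLES = ["smp1", "smp2", "smp3", "smp4"]
INDIR = "data"  # placeholder for the real input directory

# Group the samples by the directory they were found in.
samples_by_dir = defaultdict(list)
for d, s in zip(DIRS, SAMPLES):
    samples_by_dir[d].append(s)

def nreads_for_dir(dir_name):
    # Return the per-sample count files that belong to one directory.
    return [f"{INDIR}/{dir_name}/{smp}_Nreads.txt"
            for smp in samples_by_dir[dir_name]]

print(nreads_for_dir("DIR1"))
print(nreads_for_dir("DIR2"))
```

In other words, the rule for DIR1 should receive only the smp1 and smp2 count files, and the rule for DIR2 only smp3 and smp4.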