sample | TCGA-CA-5256-01 | TCGA-AZ-6599-11 | TCGA-AA-3655-01 | TCGA-A6-6137-01 | TCGA-CK-4952-01 | TCGA-A6-5657-01 | TCGA-AD-6963-01 | TCGA-AA-3663-11 |
ARHGEF10L | 10.1616 | 11.1212 | 11.0245 | 11.0576 | 10.566 | 10.4189 | 10.8635 | 11.0543 |
HIF3A | 3.7172 | 2.3437 | 2.0858 | 6.0759 | 1.9506 | 5.4777 | 4.4634 | 8.4492 |
RNF17 | 0 | 0 | 0.5495 | 0 | 0 | 0 | 0 | 0 |
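The first row above shows the sample names. The last two digits of each sample name give the tissue type: 01 means tumor tissue and 11 means normal tissue. Is there a way to extract only the normal-tissue columns? Many thanks in advance.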
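The naming rule described here (01 = tumor, 11 = normal) can be checked per sample name. A minimal sketch, assuming the names follow the TCGA-XX-XXXX-NN pattern shown in the table, where NN is the fourth dash-separated field (possibly followed by extra characters such as a vial letter, e.g. "11A"). The helper names `sample_type` and `is_normal` are hypothetical.

```python
def sample_type(barcode):
    """Return the two-digit sample-type code from a TCGA sample name.

    Assumes the form TCGA-XX-XXXX-NN..., where NN is the fourth
    dash-separated field (extra characters after NN are ignored).
    """
    fields = barcode.strip().split("-")
    if len(fields) < 4:
        raise ValueError(f"not a TCGA sample name: {barcode!r}")
    return fields[3][:2]


def is_normal(barcode):
    # Rule from the question: "11" marks normal tissue, "01" tumor tissue.
    return sample_type(barcode) == "11"


samples = ["TCGA-CA-5256-01", "TCGA-AZ-6599-11", "TCGA-AA-3663-11"]
print([s for s in samples if is_normal(s)])
# prints ['TCGA-AZ-6599-11', 'TCGA-AA-3663-11']
```

Checking the fourth field, rather than searching for "-11" anywhere in the name, avoids false hits on names like TCGA-A6-1134-01, whose patient ID happens to start with 11.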
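I'm new to R and don't know it well, so I wrote this in Python instead. It reads your CSV file and writes the columns you asked for into a new CSV file.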
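import csv

# Read the whole table; each row becomes a list of cells.
with open('yourfile.csv', newline='') as input_file:
    table = list(csv.reader(input_file))

# Transpose so that each item is one column: (header, value1, value2, ...).
columns = list(zip(*table))

# Keep the first column (gene names) plus every column whose sample name
# ends in "-11" (normal tissue). endswith() avoids false matches such as
# "TCGA-A6-1134-01", which contains "-11" in the middle of the name.
result_columns = [columns[0]] + [
    col for col in columns[1:] if col[0].strip().endswith('-11')
]

# Transpose back to rows and write them to the new file.
with open('result.csv', 'w', newline='') as output_file:
    writer = csv.writer(output_file)
    writer.writerows(zip(*result_columns))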
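The table in the question is pipe-separated (`|`), with a trailing `|` on every line, rather than comma-separated. If your file really looks like that, a minimal standard-library sketch can read it directly, keeping the gene-name column plus the columns whose sample name ends in `-11`. It assumes the first row holds sample names and the first column holds gene names; the function name `normal_columns` is hypothetical.

```python
def normal_columns(lines, sep="|"):
    """Keep the first column plus every '-11' column of a delimited table."""
    rows = []
    for line in lines:
        cells = [cell.strip() for cell in line.rstrip("\n").split(sep)]
        if cells and cells[-1] == "":
            cells.pop()  # drop the empty cell left by a trailing separator
        if cells:
            rows.append(cells)
    if not rows:
        return []
    header = rows[0]
    # Column 0 (gene names) is always kept; others only if normal tissue.
    keep = [0] + [i for i, name in enumerate(header) if i > 0 and name.endswith("-11")]
    return [[row[i] for i in keep] for row in rows]


text = """sample | TCGA-CA-5256-01 | TCGA-AZ-6599-11 | TCGA-AA-3663-11 |
HIF3A | 3.7172 | 2.3437 | 8.4492 |
"""
for row in normal_columns(text.splitlines()):
    print(" | ".join(row))
# prints:
# sample | TCGA-AZ-6599-11 | TCGA-AA-3663-11
# HIF3A | 2.3437 | 8.4492
```

To use it on a file, pass the open file object instead, e.g. `with open('yourfile.txt') as f: rows = normal_columns(f)` (the filename is a placeholder).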