Read a JSON file and format it as CSV

I need to read a JSON file and extract data from it to generate a CSV file.

The server is Red Hat 7, and Python is Python 2.7.5.


import time
import os
import sys
import json

with open('abcdc04_abcd11_ig_Host_metrics.json') as data_file:
    data = json.load(data_file)

with open('abcdc04_abcd11_ig_Host_metrics.txt', 'w') as f:
    for row in data:
        symmetrixID= row['symmetrixID']
        HostID= row['HostID']
        HostMBReads= row['HostMBReads']
        timestamp= row['timestamp']
        joined = ",".join([symmetrixID , HostID, HostMBReads , timestamp])
        f.write(joined)

The result is:


Traceback (most recent call last):
  File "./json_scv", line 23, in <module>
    symmetrixID= row['symmetrixID']
TypeError: string indices must be integers
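The same error can be reproduced without the file. This is a minimal sketch, assuming a hypothetical dict that only has the same top-level keys as the metrics file: looping over a dict yields its keys, so row ends up being a plain string, and indexing a string with another string raises this TypeError.

```python
# Hypothetical stand-in for the parsed file: only the top-level keys matter here.
data = {"symmetrixID": "000123401234", "HostID": "jupiter_ig", "perf_data": []}

keys_seen = []
for row in data:              # iterating over a dict yields its keys (strings)
    keys_seen.append(row)

error_message = None
try:
    # Same expression as in the question: row is a str, so this indexes a str with a str.
    row['symmetrixID']
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```

The exact wording of the message differs slightly between Python versions, but it always starts with "string indices must be integers".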

I want a CSV format like this:


SymmID,HostName,TimeStamp,HostIOs,HostMBs,ResponseTime,Reads,Writes,HostMBReads,HostMBWrites,ReadResponseTime,WriteResponseTime,SyscallCount
000123401234,jupiter_ig,1553637600000,0.12666667,0.000494792,0.15257895,0.12666667,0,0.000494792,0,0.15257895,0,0.21333334
000123401234,jupiter_ig,1553637600000,0.1264559,0.000493968,0.15828949,0.1264559,0,0.000493968,0,0.15828949,0,0.123128116
000123401234,jupiter_ig,1553637600000,0,0,0,0,0,0,0,0,0,0.2


心有法竹
1 Answer

慕沐林林

Your variable data ends up being a dictionary, not a list. So when you write "for row in data:", you are saying "do the following for each key in the dictionary", not "for each item in a list". Iterating over a dictionary yields its keys, which are strings, so row is a string such as "HostID". Dictionaries are unordered in Python 2.7, but whichever key comes first, row['symmetrixID'] tries to index that string with another string, which raises TypeError: string indices must be integers. For example, if HostID is the first key picked in the loop, row['symmetrixID'] means "HostID"['symmetrixID'].

If you look closely, the only list in the dictionary that you can iterate over is data["perf_data"], so loop over that instead. For now, put your data in a string:

s = """{
  "symmetrixID": "000123401234",
  "HostID": "jupiter_ig",
  "perf_data": [
    {
      "HostMBReads": 0.00024720083,
      "timestamp": 1553637300000,
      "Writes": 0.0,
      "ReadResponseTime": 0.15273508,
      "Reads": 0.06328341,
      "WriteResponseTime": 0.0,
      "ResponseTime": 0.15273508,
      "SyscallCount": 0.09326678,
      "HostMBWrites": 0.0,
      "HostIOs": 0.06328341,
      "MBs": 0.00024720083
    },
    {
      "HostMBReads": 0.0004939684,
      "timestamp": 1553637600000,
      "Writes": 0.0,
      "ReadResponseTime": 0.15828949,
      "Reads": 0.1264559,
      "WriteResponseTime": 0.0,
      "ResponseTime": 0.15828949,
      "SyscallCount": 0.123128116,
      "HostMBWrites": 0.0,
      "HostIOs": 0.1264559,
      "MBs": 0.0004939684
    },
    {
      "HostMBReads": 0.0,
      "timestamp": 1553637900000,
      "Writes": 0.0,
      "ReadResponseTime": 0.0,
      "Reads": 0.0,
      "WriteResponseTime": 0.0,
      "ResponseTime": 0.0,
      "SyscallCount": 0.2,
      "HostMBWrites": 0.0,
      "HostIOs": 0.0,
      "MBs": 0.0
    }
  ],
  "reporting_level": "Host"
}"""

This is how I format the data:

import json

data = json.loads(s)
symmetrixID = data['symmetrixID']
HostID = data['HostID']
for row in data['perf_data']:
    HostMBReads = row['HostMBReads']
    timestamp = row['timestamp']
    joined = ",".join([str(c) for c in [symmetrixID, HostID, HostMBReads, timestamp]])
    print(joined)

Note that I changed your joined expression: join only accepts strings, so it fails unless you first convert the float and integer values (HostMBReads, timestamp) to strings. You can replace the print call with whatever write call you need; with the file from your question that would be f.write(joined + "\n"), since write does not add a line break on its own.
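Building on the answer above, here is a sketch that writes all thirteen columns of the desired format with Python's standard csv module instead of a manual join. Assumptions: the HostMBs column is taken from the JSON field MBs (the question does not state this mapping); COLUMNS, metric_rows and json_to_csv are hypothetical names; and the code is meant to run on both Python 2.7 and Python 3.

```python
import csv
import json
import sys

# Output header from the question, paired with the JSON key each column reads.
# ASSUMPTION: "HostMBs" comes from the JSON "MBs" field.
COLUMNS = [
    ("SymmID", "symmetrixID"),
    ("HostName", "HostID"),
    ("TimeStamp", "timestamp"),
    ("HostIOs", "HostIOs"),
    ("HostMBs", "MBs"),
    ("ResponseTime", "ResponseTime"),
    ("Reads", "Reads"),
    ("Writes", "Writes"),
    ("HostMBReads", "HostMBReads"),
    ("HostMBWrites", "HostMBWrites"),
    ("ReadResponseTime", "ReadResponseTime"),
    ("WriteResponseTime", "WriteResponseTime"),
    ("SyscallCount", "SyscallCount"),
]


def metric_rows(data):
    """Yield one list of values per entry in data["perf_data"].

    symmetrixID and HostID appear only once at the top level of the JSON,
    so they are copied onto every row.
    """
    for sample in data["perf_data"]:
        merged = dict(sample)
        merged["symmetrixID"] = data["symmetrixID"]
        merged["HostID"] = data["HostID"]
        yield [merged[key] for _, key in COLUMNS]


def json_to_csv(json_path, csv_path):
    with open(json_path) as src:
        data = json.load(src)
    # The csv module expects binary mode on Python 2 and newline="" on Python 3.
    if sys.version_info[0] < 3:
        dst = open(csv_path, "wb")
    else:
        dst = open(csv_path, "w", newline="")
    with dst:
        writer = csv.writer(dst)
        writer.writerow([header for header, _ in COLUMNS])
        for row in metric_rows(data):
            writer.writerow(row)  # csv.writer converts the numbers to text itself
```

Usage with the file names from the question: json_to_csv('abcdc04_abcd11_ig_Host_metrics.json', 'abcdc04_abcd11_ig_Host_metrics.csv'). Values are written as they appear in the JSON, so zeros come out as 0.0 rather than 0, and each row keeps its own timestamp.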