How do I convert multi-record, multi-line JSON into one JSON record per line for AWS Athena?

I want to use JSON files with AWS Athena, but Athena does not support JSON records that span multiple lines.

I have the following (one of the values is XML):


{
  "id" : 10,
  "name" : "bob",
  "data" : "<some> \n <xml> \n <in here>"
},
{
  "id" : 20,
  "name" : "jane",
  "data" : "<other> \n <xml> \n <in here>"
}

This is what I need for Athena:


{ "id" : 10, "name" : "bob", "data" : "<some> <xml> <in here>" },

{ "id" : 20, "name" : "jane", "data" : "<other> <xml> <in here>" }

I am exporting the data from DB2 with RazorSQL, and I have tried writing some Python to "flatten" it, but have not succeeded yet.

Thank you!


BIG阳
3 Answers

慕尼黑8549860

I ended up doing something quick and dirty:

    import json

    with open('data.json') as jfile:
        data = json.load(jfile)  # parses the whole file into memory
        for d in data:
            # print each record on its own line, followed by a comma
            print(json.dumps(d) + ',')

which prints

    {"id": 200, "name": "bob", "data": "<other> \n <xml> \n <data>"},
    {"id": 200, "name": "bob", "data": "<other> \n <xml> \n <data>"},

I just saved the output to another file :P

In the end it failed because the file was too large, but hey, it was close!
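A variation on the same idea, as a rough sketch: build the newline-delimited text directly, with no trailing commas (as far as I know, Athena's JSON SerDes want exactly one object per line) and with the whitespace inside string values collapsed, as in the output the question asks for. `collapse_whitespace` and `to_json_lines` are hypothetical helper names, not part of any library, and the sketch assumes every record is a flat dict that already fits in memory.

```python
import json
import re


def collapse_whitespace(value):
    """Replace each run of whitespace (newlines included) in a string with one space."""
    if isinstance(value, str):
        return re.sub(r"\s+", " ", value)
    return value  # numbers, booleans, None, etc. pass through unchanged


def to_json_lines(records):
    """Return newline-delimited JSON: one object per line, no trailing commas."""
    lines = []
    for record in records:
        # Assumption: records are flat dicts, so only top-level values are cleaned.
        cleaned = {key: collapse_whitespace(val) for key, val in record.items()}
        lines.append(json.dumps(cleaned, ensure_ascii=False) + "\n")
    return "".join(lines)
```

Usage would be something like `open('out.json', 'w').write(to_json_lines(data))`, where `data` is the list returned by `json.load`.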

LEATH

Use regular expressions:

    import re

    html = '''{
      "id" : 10,
      "name" : "bob",
      "data" : "<some> \n <xml> \n <in here>"
    },
    {
      "id" : 20,
      "name" : "jane",
      "data" : "<other> \n <xml> \n <in here>"
    }'''

    def replaceReg(html, regex, new):
        return re.sub(regex, new, html)

    html = replaceReg(html, r' \n ', ' ')     # newlines inside the XML value
    html = replaceReg(html, r'\{\s+', '{ ')   # whitespace after an opening brace
    html = replaceReg(html, r'\s+\}', ' }')   # whitespace before a closing brace
    html = replaceReg(html, r',\s+', ', ')    # line breaks after commas
    html = replaceReg(html, r'\}, ', '}\n')   # one record per line, keeping the brace
    print(html)

Result:

    { "id" : 10, "name" : "bob", "data" : "<some> <xml> <in here>" }
    { "id" : 20, "name" : "jane", "data" : "<other> <xml> <in here>" }
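Regex rewrites of JSON are easy to get subtly wrong (a dropped brace, a leftover comma), so it may be worth checking the result before handing it to Athena. A small sketch of such a check: it reports every non-empty line that does not parse as a standalone JSON object. `invalid_lines` is a hypothetical helper name.

```python
import json


def invalid_lines(text):
    """Return the 1-based numbers of non-empty lines that are not a JSON object."""
    bad = []
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        try:
            parsed = json.loads(line)
        except json.JSONDecodeError:
            bad.append(number)  # truncated object, trailing comma, etc.
            continue
        if not isinstance(parsed, dict):
            bad.append(number)  # valid JSON, but not an object
    return bad
```

An empty list means every line parsed on its own; anything else points at the lines to inspect.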

HUH函数

You just need to strip the trailing newline (\n) from each line while writing to another file:

    with open('input.txt', 'r') as f_in, open('output.txt', 'w') as f_out:
        for line in f_in:
            # drop the newline at the end of each line;
            # the whole input ends up on a single line in output.txt
            f_out.write(line.replace('\n', ''))

where input.txt contains the following data:

    {
      "id" : 10,
      "name" : "bob",
      "data" : "<some> \n <xml> \n <in here>"
    },
    {
      "id" : 20,
      "name" : "jane",
      "data" : "<other> \n <xml> \n <in here>"
    }
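If the file is too large to parse in one go, or if joining every line would leave all the records on one line, one option is to read the input line by line and emit each top-level {...} block as soon as its braces balance. This is a minimal sketch under stated assumptions: `iter_records` and `convert` are hypothetical names, the input is assumed to be objects separated by commas (optionally wrapped in [ ]), and each individual record is assumed to be valid JSON.

```python
import json


def iter_records(lines):
    """Yield one parsed object for each top-level {...} block in an iterable of lines."""
    buf = []           # characters of the object currently being collected
    depth = 0          # current nesting depth of { }
    in_string = False  # True while inside a JSON string literal
    escaped = False    # True right after a backslash inside a string
    for line in lines:
        for ch in line:
            if depth == 0 and ch != "{":
                continue  # skip commas, whitespace and [ ] between objects
            buf.append(ch)
            if in_string:
                if escaped:
                    escaped = False
                elif ch == "\\":
                    escaped = True
                elif ch == '"':
                    in_string = False
            elif ch == '"':
                in_string = True
            elif ch == "{":
                depth += 1
            elif ch == "}":
                depth -= 1
                if depth == 0:
                    # a complete object has been collected; parse and reset
                    yield json.loads("".join(buf))
                    buf.clear()


def convert(src_path, dst_path):
    """Write each record from src_path to dst_path as one JSON object per line."""
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for record in iter_records(src):
            dst.write(json.dumps(record) + "\n")
```

Only one record is held in memory at a time, so `convert('input.txt', 'output.txt')` should cope with files that `json.load` cannot. Braces inside string values are ignored by the depth counter, which matters if the XML payload ever contains `{` or `}`.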