How can I use Python (or another language) to extract the Chinese terms from a file like the one below and turn them into a JSON file?

-------------------------------------------------------------------------------
File:D:\svn\aCenter\windows\dap\store\vdidc\web\vue-ui\src\components\datetime_range.vue
content:                'default': '至'
Line: 24
Time: 2018-03-26 08:46:13

-------------------------------------------------------------------------------
File:D:\svn\aCenter\windows\dap\store\vdidc\web\vue-ui\src\components\piece.vue
content:                <div><span class="branch-num">{{checkBranchNum}}</span><lang>个</lang><
Line: 6
Time: 2018-03-26 08:46:13

-------------------------------------------------------------------------------
File:D:\svn\aCenter\windows\dap\store\vdidc\web\vue-ui\src\components\piece.vue
content:                <div class="branch"><lang>分支</lang></div>
Line: 7
Time: 2018-03-26 0
........

For example, the terms "至", "个" and "分支" in the text should become JSON like this:

{
  "至": "至",
  "个": "个",
  "分支": "分支"
}

Everyone, throw whatever slick code you've got at me...

慕莱坞森
596 views · 3 answers

GCT1015

```python
import re

# Raw string so the backslashes in the Windows paths (\a, \v, \s, ...) are not treated as escapes
s = r'''File:D:\svn\aCenter\windows\dap\store\vdidc\web\vue-ui\src\components\datetime_range.vue
content:                'default': '至'
Line: 24
Time: 2018-03-26 08:46:13
-------------------------------------------------------------------------------
File:D:\svn\aCenter\windows\dap\store\vdidc\web\vue-ui\src\components\piece.vue
content:                <div><span class="branch-num">{{checkBranchNum}}</span><lang>个</lang><
Line: 6
Time: 2018-03-26 08:46:13
-------------------------------------------------------------------------------
File:D:\svn\aCenter\windows\dap\store\vdidc\web\vue-ui\src\components\piece.vue
content:                <div class="branch"><lang>分支</lang></div>
Line: 7
Time: 2018-03-26 0'''

# Split on every character outside U+4E00..U+9FA5; the non-empty pieces left are the Chinese runs
p2 = re.compile(r'[^\u4e00-\u9fa5]')
result = {i: i for i in " ".join(p2.split(s)).strip().split()}
# {'至': '至', '个': '个', '分支': '分支'}
```

To do it properly against a local file, say your file is 1.txt:

```python
import re

p2 = re.compile(r'[^\u4e00-\u9fa5]')
# encoding='utf-8' is an assumption; on a Chinese Windows system the default may be GBK
with open('1.txt', 'r', encoding='utf-8') as r:
    result = {i: i for i in ' '.join(p2.split(r.read())).strip().split()}
print(result)
# {'至': '至', '个': '个', '分支': '分支', ...}
```
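The answer above builds a dict but never writes the JSON file the question asks for, and it scans the whole report, so Chinese directory names in the `File:` paths would be picked up too. A minimal sketch, assuming every source snippet sits on a line starting with `content:` (as in the sample) and that the report is UTF-8: it keeps each term once in first-seen order and writes it out with `json.dump`. The helper names are hypothetical.

```python
import json
import re

# Runs of CJK Unified Ideographs, same range as the answer above (U+4E00..U+9FA5)
CJK_RUN = re.compile(r'[\u4e00-\u9fa5]+')


def extract_terms(text):
    """Return a {term: term} dict built only from 'content:' lines, in first-seen order."""
    terms = {}
    for line in text.splitlines():
        stripped = line.strip()
        # Assumption: the report puts the source snippet after 'content:'; skip File/Line/Time lines
        if not stripped.startswith('content:'):
            continue
        for term in CJK_RUN.findall(stripped):
            terms.setdefault(term, term)  # keep the first occurrence only
    return terms


def write_json(terms, path):
    # ensure_ascii=False writes the Chinese as-is instead of \uXXXX escapes
    with open(path, 'w', encoding='utf-8') as f:
        json.dump(terms, f, ensure_ascii=False, indent=2)
```

Usage would be something like reading 1.txt with `encoding='utf-8'`, passing its text to `extract_terms`, and handing the result to `write_json` with an output name such as `zh.json` (also hypothetical).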

HUWWW

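Use a regular expression that matches characters whose code points fall in the Chinese range. The key part is the extraction. Go seems fairly convenient for this, since its regular expressions have a class for Chinese (Han) characters, `\p{Han}`. See also: handling Chinese in Go.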
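In Python, the "code point in the Chinese range" test can also be written without a regex, which makes the range explicit. A minimal sketch, assuming the same U+4E00 to U+9FA5 range as the first answer (so it skips rarer ideographs in the extension blocks as well as Chinese punctuation); `is_cjk` and `cjk_runs` are hypothetical helper names.

```python
def is_cjk(ch):
    """True if ch lies in the CJK Unified Ideographs range U+4E00..U+9FA5."""
    return 0x4E00 <= ord(ch) <= 0x9FA5


def cjk_runs(text):
    """Yield maximal runs of consecutive Chinese characters, in order of appearance."""
    run = []
    for ch in text:
        if is_cjk(ch):
            run.append(ch)
        elif run:
            # A non-Chinese character ends the current run
            yield ''.join(run)
            run = []
    if run:
        yield ''.join(run)
```

Feeding each `content:` line through `cjk_runs` gives the same terms as the regex version.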

狐的传说

I don't really recommend the OP's approach; Chinese text isn't a great choice for keys…
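One common alternative, offered here only as an illustration since the answer does not spell one out, is to give each term a generated ASCII key and keep the Chinese as the value, as i18n message files usually do. A minimal sketch; the `text_001` naming scheme and the `to_keyed_messages` helper are hypothetical.

```python
import json


def to_keyed_messages(terms, prefix='text'):
    """Map each distinct term to a generated ASCII key, e.g. {'text_001': '至'}.

    Duplicates are dropped (first occurrence wins); the prefix and the
    zero-padded numbering are a hypothetical naming scheme.
    """
    unique = dict.fromkeys(terms)  # ordered de-duplication
    return {f'{prefix}_{i:03d}': term for i, term in enumerate(unique, start=1)}


messages = to_keyed_messages(['至', '个', '分支', '个'])
print(json.dumps(messages, ensure_ascii=False, indent=2))
```

In practice a human-readable key (for example one derived from the component name) is easier to maintain than a counter.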