How do I split strings and add all the pieces to one long column?

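I have a DataFrame with one column and multiple rows. Each row contains the lyrics of one song, with the lines separated by "\n". What I have so far is: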


import json
import pandas as pd

with open('Lyrics_Pavement.json') as json_data:
    data = json.load(json_data)

df = pd.DataFrame(data['songs'])
df1 = df.lyrics.str.split(pat="\n")

After this, df1 is a single column in which each song's lyrics have been split and are shown surrounded by "[]".


1    [It's the shouting, it's the shouting, It's the Dutchman, it's the Dutchman shout, Get it away, I don't need your shaft, It's the shouting, it's the shouting, It's the shouting, it's the Dutchman shout, Give it away, I don't need your shaft, (yes I do), It's the shouting, it's the shouting, It's the shouting, it's the Dutchman shout, Get it away, I don't need your shaft] 

That is an example of row 1. How do I get the data to look like this:


It's the shouting,

It's the shouting,

It's the dutchman

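and so on, where each new line above is one row of the DataFrame. Then, for row 2, the lyrics should be split the same way and appended to that same DataFrame.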


Thanks!

慕标琳琳
91 views, 3 answers
3 Answers

GCT1015

Try:

df1 = df.lyrics.str.split(pat="\n").explode()
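A minimal sketch of what that one-liner does, using a small made-up frame in place of the JSON file (the sample lyrics are placeholders, not the real data). `Series.explode` requires pandas 0.25 or later. The `reset_index(drop=True)` step is an optional addition, not part of the answer above: it replaces the repeated song index with a running row number.

```python
import pandas as pd

# Hypothetical stand-in for pd.DataFrame(data['songs'])
df = pd.DataFrame({"lyrics": ["line a1\nline a2\nline a3", "line b1\nline b2"]})

# Split each song into a list of lines, then give every list element its own row
exploded = df.lyrics.str.split(pat="\n").explode()

# explode() repeats the original row label for each line (0, 0, 0, 1, 1);
# reset_index(drop=True) renumbers the rows from 0 upward
df1 = exploded.reset_index(drop=True).to_frame(name="lyrics")
print(df1)
```

Because the lines of song 2 simply follow those of song 1 in the result, no separate append step should be needed.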

森林海

From your post I gather that the lyrics in df1 are just one long string rather than an actual list? If that is the case, I would simply use the built-in string method split to break the string at the commas and then reassemble the pieces into a DataFrame:

import pandas as pd

s = "[It's the shouting, it's the shouting, It's the Dutchman, it's the Dutchman shout, Get it away, I don't need your shaft, It's the shouting, it's the shouting, It's the shouting, it's the Dutchman shout, Give it away, I don't need your shaft, (yes I do), It's the shouting, it's the shouting, It's the shouting, it's the Dutchman shout, Get it away, I don't need your shaft]"
lines = [i.strip() for i in s[1:-1].split(',')]
df = pd.DataFrame(lines)

Output:

                          0
0         It's the shouting
1         it's the shouting
2         It's the Dutchman
3   it's the Dutchman shout
4               Get it away
5   I don't need your shaft
6         It's the shouting
7         it's the shouting
8         It's the shouting
9   it's the Dutchman shout
10             Give it away
11  I don't need your shaft
12               (yes I do)
13        It's the shouting
14        it's the shouting
15        It's the shouting
16  it's the Dutchman shout
17              Get it away
18  I don't need your shaft

s[1:-1] drops the brackets
.split(',') splits on commas
.strip() removes the extra whitespace

If you know there is always a comma plus one space between the lyric pieces, you could also do this:

lines = s[1:-1].split(', ')

If the full lyrics are stored in df1, you can use loc (or whatever you prefer) to access the string and then follow this answer.
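As a rough sketch of how the same three steps might be applied to every row at once, assuming the cells really are strings that only look like lists: the column name `lyrics` and the two sample strings below are made up for illustration, and the `explode()` call is borrowed from the first answer rather than from this one (it needs pandas 0.25 or later).

```python
import pandas as pd

# Hypothetical column whose cells are strings that merely look like lists
df1 = pd.DataFrame(
    {"lyrics": ["[Get it away, I don't need your shaft]", "[(yes I do), Give it away]"]}
)

# Check what a cell actually holds before choosing an approach
print(type(df1.loc[0, "lyrics"]))

pieces = (
    df1["lyrics"]
    .str[1:-1]               # drop the surrounding brackets
    .str.split(",")          # split on commas
    .explode()               # one piece per row
    .str.strip()             # remove the extra whitespace
    .reset_index(drop=True)  # renumber the rows from 0 upward
)
print(pieces)
```

If the type check prints `list` rather than `str`, the bracket-stripping and comma-splitting steps are unnecessary and `explode()` alone should be enough.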

繁星淼淼

import pandas as pd

longstring = '''It's the shouting, it's the shouting, It's the Dutchman, it's the Dutchman shout, Get it away, I don't need your shaft, It's the shouting, it's the shouting, It's the shouting, it's the Dutchman shout, Give it away, I don't need your shaft, (yes I do), It's the shouting, it's the shouting, It's the shouting, it's the Dutchman shout, Get it away, I don't need your shaft'''
splitstring = [e.strip()+"," for e in longstring.split(",")]
splitstring[-1] = splitstring[-1].replace(",","")
df1 = pd.DataFrame(splitstring)
print(df1)

#                           0
#0         It's the shouting,
#1         it's the shouting,
#2         It's the Dutchman,
#3   it's the Dutchman shout,
#4               Get it away,
#5   I don't need your shaft,
#6         It's the shouting,
#7         it's the shouting,
#8         It's the shouting,
#9   it's the Dutchman shout,
#10             Give it away,
#11  I don't need your shaft,
#12               (yes I do),
#13        It's the shouting,
#14        it's the shouting,
#15        It's the shouting,
#16  it's the Dutchman shout,
#17              Get it away,
#18   I don't need your shaft