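I'm new to regular expressions (RE), and I'm trying to parse song lyrics and isolate the verse titles, the backing vocals, and the main vocals.
Here's an example of some lyrics: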
[Intro]
D.A. got that dope!
[Chorus: Travis Scott]
Ice water, turned Atlantic (Freeze)
Nightcrawlin' in the Phantom (Skrrt, Skrrt)...
Verse titles include the square brackets and any words between them. They can be successfully isolated with
r'\[{1}.*?\]{1}'
Backing vocals are similar to verse titles, but sit between (). They've been successfully isolated with:
r'\({1}.*?\){1}'
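To make the behaviour of these two patterns concrete, here is a small sketch that runs them over the example lyrics above. The `sample` string is just the excerpt retyped by hand (without the trailing "..."), and the variable names are illustrative. Note that `{1}` means "exactly once", which is already the default, so `\[.*?\]` should match the same text.

```python
import re

# The example lyrics from above, retyped as a string (trailing "..." omitted).
sample = (
    "[Intro]\n"
    "D.A. got that dope!\n"
    "[Chorus: Travis Scott]\n"
    "Ice water, turned Atlantic (Freeze)\n"
    "Nightcrawlin' in the Phantom (Skrrt, Skrrt)\n"
)

# Lazy .*? stops at the first closing bracket; "." does not cross newlines.
verse_titles = re.findall(r'\[{1}.*?\]{1}', sample)
backing_vocals = re.findall(r'\({1}.*?\){1}', sample)

print(verse_titles)    # ['[Intro]', '[Chorus: Travis Scott]']
print(backing_vocals)  # ['(Freeze)', '(Skrrt, Skrrt)']
```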
For the main vocals, I've used
r'\S+'
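This does isolate the main_vocals, but it also matches the verse titles and the backing vocals. I can't figure out how to isolate only the main vocals with a simple RE.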
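A short illustration of the problem, using two lines from the example lyrics (the variable names are only for this sketch): `\S+` matches every run of non-whitespace characters, so bracketed and parenthesised text comes back as ordinary tokens, and a multi-word verse title is even split into pieces.

```python
import re

lyric_line = "Ice water, turned Atlantic (Freeze)"
title_line = "[Chorus: Travis Scott]"

# \S+ has no notion of brackets, so everything is treated as a token.
lyric_tokens = re.findall(r'\S+', lyric_line)
title_tokens = re.findall(r'\S+', title_line)

print(lyric_tokens)  # ['Ice', 'water,', 'turned', 'Atlantic', '(Freeze)']
print(title_tokens)  # ['[Chorus:', 'Travis', 'Scott]']
```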
Here's a Python script that gets the output I want, but I'd like to do it with a RE (as a learning exercise) and can't figure it out from the documentation.
import re

file = 'D:/lyrics.txt'
with open(file, 'r') as f:
    lyrics = f.read()

def find_spans(pattern, string):
    # Return the (start, stop) span of every match of pattern in string.
    pattern = re.compile(pattern)
    return [match.span() for match in pattern.finditer(string)]

verses = find_spans(r'\[{1}.*?\]{1}', lyrics)
backing_vocals = find_spans(r'\({1}.*?\){1}', lyrics)
main_vocals = find_spans(r'\S+', lyrics)

# Collect every character index covered by a verse title or backing vocal.
exclude = verses + backing_vocals  # new list, so verses itself is not modified
not_main_vocals = set()
for start, stop in exclude:
    not_main_vocals.update(range(start, stop))

# Keep only the \S+ spans that do not overlap any excluded index.
main_vocals_temp = []
for span in main_vocals:
    start, stop = span
    if not any(i in not_main_vocals for i in range(start, stop)):
        main_vocals_temp.append(span)
main_vocals = main_vocals_temp
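For comparison, one direction that might do this in a single pass is an alternation that consumes the bracketed and parenthesised parts first and captures only what is left. This is a rough sketch rather than a definitive answer: `MAIN_VOCALS` and `find_main_vocal_spans` are hypothetical names, and it assumes titles and backing vocals never span lines and are separated from the main vocals by whitespace (a token such as `Atlantic(Freeze)` would be captured whole).

```python
import re

# Alternatives are tried left to right at each position: bracketed and
# parenthesised text is matched (and thereby skipped) before \S+ gets a
# chance, so group 1 is only set for main-vocal tokens.
MAIN_VOCALS = re.compile(r'\[.*?\]|\(.*?\)|(\S+)')

def find_main_vocal_spans(string):
    # Return (start, stop) spans of tokens captured by group 1 only.
    return [m.span(1) for m in MAIN_VOCALS.finditer(string)
            if m.group(1) is not None]

sample = (
    "[Intro]\n"
    "D.A. got that dope!\n"
    "[Chorus: Travis Scott]\n"
    "Ice water, turned Atlantic (Freeze)\n"
    "Nightcrawlin' in the Phantom (Skrrt, Skrrt)\n"
)

spans = find_main_vocal_spans(sample)
words = [sample[start:stop] for start, stop in spans]
print(words)
```

On the example lyrics this should give the same spans as the script above, because every match that overlaps a title or backing vocal leaves group 1 empty.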