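I recently obtained data from my local gym, and I am trying to normalize it so that I can create a "gym registration" object containing everyone registered for a given session.
The text file looks like this: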
Sep 30th '20 at 9:00AM Until Sep 30th '20 at 10:00AM
JD John Doe
AW Alice Wonderland
IM Iron Man
Sep 30th '20 at 8:00AM Until Sep 30th '20 at 9:00AM
JD John Doe
AW Alice Wonderland
IM Iron Man
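I have been able to use pandas to split the registrations into the columns [initials, name], but I don't know how to detect when a line corresponds to a time slot rather than a registered person.
So after the program runs, each row should contain the columns [initials, name, time slot].
The easiest format for me to work with would be this: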
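One way to tell the two kinds of lines apart is to match the shape of the time-slot header with a regular expression. The following is only a sketch: the pattern is inferred from the two sample headers in the file, and the names `TIME_SLOT_RE` and `is_time_slot` are hypothetical helpers, not part of pandas or of the original attempt.

```python
import re

# Assumed shape of one timestamp, inferred from the sample data:
# "<Mon> <day><st|nd|rd|th> '<yy> at <h>:<mm><AM|PM>"
_STAMP = r"[A-Z][a-z]{2} \d{1,2}(?:st|nd|rd|th) '\d{2} at \d{1,2}:\d{2}[AP]M"

# A time-slot header is two such timestamps joined by " Until ".
TIME_SLOT_RE = re.compile(rf"^{_STAMP} Until {_STAMP}$")


def is_time_slot(line: str) -> bool:
    """Return True if the line looks like a time-slot header (hypothetical helper)."""
    return TIME_SLOT_RE.match(line.strip()) is not None
```

Unlike a check such as `startswith("Sep")`, this does not depend on the month, and it should not misfire on a registrant whose initials or name happen to begin with a month abbreviation. If the real file contains headers in another shape, the pattern would need adjusting.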
JD John Doe Sep 30th '20 at 9:00AM Until Sep 30th '20 at 10:00AM
AW Alice Wonderland Sep 30th '20 at 9:00AM Until Sep 30th '20 at 10:00AM
IM Iron Man Sep 30th '20 at 9:00AM Until Sep 30th '20 at 10:00AM
JD John Doe Sep 30th '20 at 8:00AM Until Sep 30th '20 at 9:00AM
AW Alice Wonderland Sep 30th '20 at 8:00AM Until Sep 30th '20 at 9:00AM
IM Iron Man Sep 30th '20 at 8:00AM Until Sep 30th '20 at 9:00AM
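Since pandas is already in use, the target layout above can probably be reached by marking the header rows and forward-filling them onto the rows that follow. This is a minimal sketch under stated assumptions: a header is any line containing " Until ", every registrant line is "initials, space, full name", and `build_registrations` is a hypothetical function name.

```python
import pandas as pd


def build_registrations(lines):
    """Return a DataFrame with columns [initials, name, time_slot]."""
    # Keep non-empty lines only, stripped of newlines.
    s = pd.Series([ln.strip() for ln in lines if ln.strip()], dtype="object")

    # Assumption: only time-slot headers contain the word " Until ".
    is_slot = s.str.contains(" Until ", regex=False)

    # Keep header text on header rows, NaN elsewhere, then carry each
    # header forward onto the registrant rows below it.
    time_slot = s.where(is_slot).ffill()

    people = s[~is_slot]
    # Split once on the first whitespace: initials, then the rest as the name.
    parts = people.str.split(n=1, expand=True)

    return pd.DataFrame(
        {
            "initials": parts[0],
            "name": parts[1],
            "time_slot": time_slot[~is_slot],
        }
    ).reset_index(drop=True)
```

With the sample file, `build_registrations(open("1-weak-gym.txt"))` should give six rows, three per time slot. Registrant lines that appear before any header would end up with a missing `time_slot`.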
I tried iterating over the lines: once a time-slot line appears, I append it to each following line until a new time slot appears.
def testSort():
    totalSheet = []
    timeSlot = ""
    with open("1-weak-gym.txt") as fp:
        for ln in fp:
            ln = ln.strip()
            if not ln:
                continue  # skip blank lines
            if ln.startswith("Sep"):  # this is a time slot (only matches September)
                timeSlot = ln  # remember the slot for the names that follow
            elif timeSlot:
                totalSheet.append(ln + " " + timeSlot)  # name line, then its time slot
            else:
                print("Hello error")  # a name appeared before any time slot
    with open("newOuput.txt", "a") as out:
        print("\n".join(totalSheet), file=out)
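The same carry-forward idea can also be written without pandas, which then makes it easy to group registrants per session for a "gym registration" object. This is a rough sketch, not a definitive implementation: `attach_time_slots` and `group_by_slot` are hypothetical names, and a header is again assumed to be any line containing " Until ".

```python
from collections import defaultdict


def attach_time_slots(lines):
    """Yield (initials, name, time_slot) for every registrant line."""
    current_slot = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if " Until " in line:  # assumed marker of a time-slot header
            current_slot = line
        elif current_slot is None:
            raise ValueError(f"registrant before any time slot: {line!r}")
        else:
            # Split on the first space: initials, then the full name.
            initials, _, name = line.partition(" ")
            yield initials, name, current_slot


def group_by_slot(rows):
    """Collect registrants into a dict keyed by time slot."""
    sessions = defaultdict(list)
    for initials, name, slot in rows:
        sessions[slot].append((initials, name))
    return dict(sessions)
```

For example, `group_by_slot(attach_time_slots(open("1-weak-gym.txt")))` should return one entry per time slot, each holding the list of people registered for it.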