How to parse a text file using pandas and create a list

3 Answers

动漫人物

I think you need to do the following to extract all the records from the file and get the review/summary values. You don't need a DataFrame.

```python
# Dictionary that stores the list of review summary values
d = {'review summary': []}

# Extract only the review summary from one (joined) record
def split_review_summary(full_line):
    # Find review/text and cut it (and everything after it) from the line
    found = full_line.find('review/text:')
    if found >= 0:
        full_line = full_line[:found]
    # Find review/summary; all text to its right is the review summary,
    # so add it to the dictionary
    marker = 'review/summary:'
    found = full_line.find(marker)
    if found >= 0:
        review_summary = full_line[found + len(marker):]
        d['review summary'].append(review_summary)

# Open the file for reading
with open("xyz.txt", "r") as f:
    # Read the first line
    new_line = f.readline().rstrip('\n')
    # Loop through the rest of the lines
    for line in f:
        # Remove the newline from the data
        line = line.rstrip('\n')
        # A line starting with product/productId begins a new entry:
        # process the previous record and extract its review summary
        if line.startswith('product/productId'):
            split_review_summary(new_line)
            # Start collecting the new record from the current line
            new_line = line
        else:
            # Append to new_line, since this line belongs to the current record
            new_line += line

# The last full record has not been processed yet,
# so send it to split_review_summary to extract its review summary
split_review_summary(new_line)

# Now dictionary d holds all the review summary items
print(d)
```

Its output will be:

```
{'review summary': [' Good Quality Dog Food ', ' Not as Advertised ']}
```

I think the scope of your question also includes writing to a new file. You can open a file and write the dictionary to it as a single line, which will contain all the details. I'll leave that part for you to work out.
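For the part left to the reader, one possible approach (an assumption, not the answerer's code) is to serialize `d` with the standard-library `json` module, which writes the whole dictionary on one line and lets you read it back later. The output filename `summaries.json` is hypothetical, and the dictionary below is a stand-in copied from the output shown above.

```python
import json

# Stand-in for the dictionary d built by the code above
# (values copied from the output shown in this answer)
d = {'review summary': [' Good Quality Dog Food ', ' Not as Advertised ']}

# Write the whole dictionary to a new file as a single JSON line.
# "summaries.json" is a hypothetical output filename.
with open("summaries.json", "w", encoding="utf-8") as out:
    out.write(json.dumps(d) + "\n")

# Reading the line back with json.loads gives an equal dictionary
with open("summaries.json", encoding="utf-8") as src:
    restored = json.loads(src.readline())

print(restored == d)
```

Writing `str(d)` would also fit on one line, but JSON can be parsed back reliably with `json.loads`.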

30秒到达战场

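CSV stands for comma-separated values, and I don't see any commas in your file. It looks like a broken dictionary (each entry is missing its separating comma):

```python
my_dict = {
    'productid': 12312312,
    'some_key': 'I am the key!',
}
```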
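If the file really is a sequence of dictionary-like records, each record can be turned into an actual Python dictionary by splitting every line at its first colon. This is a minimal sketch assuming one `key: value` pair per line; `parse_record` is a hypothetical helper, and the sample record is invented, using the keys mentioned in the other answers.

```python
def parse_record(block):
    """Turn one record of 'key: value' lines into a dict.

    Hypothetical helper: assumes one key/value pair per line.
    """
    record = {}
    for line in block.splitlines():
        # partition splits at the FIRST colon only, so colons inside
        # the value (e.g. in review text) are preserved
        key, sep, value = line.partition(":")
        if sep:  # skip lines with no colon, such as blank lines
            record[key.strip()] = value.strip()
    return record


# Invented sample record laid out like the question's file
sample = """product/productId: B000EXAMPLE
review/summary: Good Quality Dog Food
review/text: Note: a colon inside the value is kept"""

print(parse_record(sample))
```

Unlike `str.split(":")`, `str.partition(":")` always returns exactly three parts, so a value containing a colon is never cut short.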

白猪掌柜的

I looked at the link S.Ghoshal provided and came up with the following:

```python
# Open the file and read every line
with open('foods.txt') as your_file:
    reviews = your_file.readlines()

reviews_array = []
dictionary = {}

# Go through every line; a blank line marks the end of a review
for review in reviews:
    # Split only on the first ":" so values that contain a colon stay intact
    this_line = review.split(":", 1)
    if len(this_line) > 1:
        # The part before the first ":" is the key, the rest is the content
        dictionary[this_line[0]] = this_line[1].strip()
    elif dictionary:
        # A blank line has length 1 after the split: save the current
        # review in the array and reset the dictionary for the next one
        reviews_array.append(dictionary)
        dictionary = {}

# The last review is not followed by a blank line, so append it here
if dictionary:
    reviews_array.append(dictionary)

with open("output.txt", "a") as f1:
    for r in reviews_array:
        # Skip reviews that have no review/text field
        if 'review/text' in r:
            print(r['review/text'], file=f1)
```

Now all the words in the lines starting with review/text are dumped into the file. Next, I need to create a list of all the unique words.
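The next step, building a list of all the unique words, could look roughly like this. It is a minimal sketch that assumes a "word" is a run of ASCII letters, digits or apostrophes and that case should be ignored; `unique_words` is a hypothetical helper, not part of any library.

```python
import re


def unique_words(text):
    """Return each distinct word once, lowercased, in first-seen order.

    Hypothetical helper: a "word" is assumed to be a run of ASCII
    letters, digits or apostrophes; case is ignored.
    """
    words = re.findall(r"[a-z0-9']+", text.lower())
    # dict.fromkeys keeps only the first occurrence of each word and,
    # since Python 3.7, preserves insertion order
    return list(dict.fromkeys(words))


# Example on an inline string; with the file from the answer you could
# pass the contents of output.txt instead
print(unique_words("Good food. Good dog food!"))  # ['good', 'food', 'dog']
```

If order doesn't matter, `sorted(set(words))` gives an alphabetically sorted list instead.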