How do I read a txt file with tags in Python?

I am a Python beginner. I have a text file like the one below, containing thousands of documents (from id=1 to id=10000):


<doc id=1>
    <label>1</label>
    <summary>
        I think you are right
    </summary>
    <short_text>
        I think you are right. Because I have once read the book in the same topic.
    </short_text>
</doc>

Is there a convenient way to read the text file and store its contents in instances?


class ShortText:
    def __init__(self, my_id, human_label, summary, short_text):
        self.id = my_id
        self.human_label = human_label
        self.summary = summary
        self.short_text = short_text

    def __str__(self):
        '''For printing purposes.'''
        return '%d\t%s\t%s\t%s' % (self.id, self.human_label, self.summary, self.short_text)


import codecs


def load_file(filename):
    instances = {}  # maps document id -> ShortText

    # Retrieve the original text.
    with codecs.open(filename, encoding='utf-8') as f:
        data = f.read()

    # How do I get the values from the tags and put them below?
    my_id = ...
    human_label = ...
    summary = ...
    short_text = ...
    instances[my_id] = ShortText(my_id, human_label, summary, short_text)

    return instances
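
One possible way to fill in the placeholders is sketched below. It assumes every document follows the layout shown above: a numeric id, and the label, summary and short_text tags always in that order. The file is not well-formed XML (the id attribute is unquoted and there is no single root element), so the sketch uses a regular expression instead of an XML parser. The names `DOC_RE` and `parse_docs` are made up for this example, and `ShortText` is repeated in shortened form only so the block runs on its own.

```python
import re


class ShortText:
    """Same fields as the class in the question, repeated for self-containment."""

    def __init__(self, my_id, human_label, summary, short_text):
        self.id = my_id
        self.human_label = human_label
        self.summary = summary
        self.short_text = short_text


# One match per <doc id=N> ... </doc> block (assumed layout, see lead-in).
# re.DOTALL lets "." cross newlines; the lazy ".*?" stops at the first
# closing tag. The optional quotes also accept id="1".
DOC_RE = re.compile(
    r'<doc id="?(\d+)"?>\s*'
    r'<label>(.*?)</label>\s*'
    r'<summary>(.*?)</summary>\s*'
    r'<short_text>(.*?)</short_text>\s*'
    r'</doc>',
    re.DOTALL,
)


def parse_docs(data):
    """Return a dict {id: ShortText} for every <doc> block found in data."""
    instances = {}
    for match in DOC_RE.finditer(data):
        my_id = int(match.group(1))
        # strip() drops the newlines and indentation around the tag contents
        human_label = match.group(2).strip()
        summary = match.group(3).strip()
        short_text = match.group(4).strip()
        instances[my_id] = ShortText(my_id, human_label, summary, short_text)
    return instances


def load_file(filename):
    """Read the whole file as UTF-8 and parse it in one pass."""
    with open(filename, encoding='utf-8') as f:
        return parse_docs(f.read())
```

Reading the whole file at once should be fine for a few thousand short documents. Blocks that deviate from the assumed layout are silently skipped, so it may be worth comparing `len(instances)` with the expected document count.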


慕森王