Storing web scraping results in a DataFrame or dictionary

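I am taking online courses and am trying to automate the process of capturing each course's structure for my personal notes, which I keep in Markdown files.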

Here is an example chapter:

http://img1.mukewang.com/60a366720001750208430546.jpg

Here is a sample of what the HTML looks like:


  <!-- Header of the chapter -->

  <div class="chapter__header">

      <div class="chapter__title-wrapper">

        <span class="chapter__number">

          <span class="chapter-number">1</span>

        </span>

        <h4 class="chapter__title">

          Introduction to Experimental Design

        </h4>

          <span class="chapter__price">

            Free

          </span>

      </div>

      <div class="dc-progress-bar dc-progress-bar--small chapter__progress">

        <span class="dc-progress-bar__text">0%</span>

        <div class="dc-progress-bar__bar chapter__progress-bar">

          <span class="dc-progress-bar__fill" style="width: 0%;"></span>

        </div>

      </div>

  </div>

  <p class="chapter__description">

    An introduction to key parts of experimental design plus some power and sample size calculations.

  </p>

  <!-- !Header of the chapter -->


<!-- Body of the chapter -->

  <ul class="chapter__exercises hidden">

      <li class="chapter__exercise ">

        <a class="chapter__exercise-link" href="https://campus.datacamp.com/courses/experimental-design-in-r/introduction-to-experimental-design?ex=1">

          <span class="chapter__exercise-icon exercise-icon ">

            <img width="23" height="23" src="https://cdn.datacamp.com/main-app/assets/courses/icon_exercise_video-3b15ea50771db747f7add5f53e535066f57d9f94b4b0ebf1e4ddca0347191bb8.svg" alt="Icon exercise video" />

          </span>

          <h5 class="chapter__exercise-title" title='Intro to Experimental Design'>Intro to Experimental Design</h5>

          <span class="chapter__exercise-xp">

            50 xp

          </span>

        </a>

      </li>


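My question is: what is the best way to structure this data so that it is easy to access later when I write the text file? Would it be better to use a DataFrame with the columns chapter, lesson, and lesson_link? A DataFrame with a MultiIndex? A nested dictionary? If a dictionary, what should I name the keys? Or is there another option I am missing? Some kind of database?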
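To make the options concrete, here is a minimal sketch of one of them: a flat list of records, one dictionary per lesson, with the keys chapter, lesson, and lesson_link. It uses only the standard library's html.parser instead of whatever scraping tool is actually in use, and the names SAMPLE_HTML, CourseParser, and to_markdown are hypothetical, made up for this illustration. The markup is a trimmed copy of the sample above, so a real page may need more handling.

```python
from html.parser import HTMLParser

# Trimmed copy of the sample markup from the question (assumption: the
# real page repeats this header/list pattern once per chapter).
SAMPLE_HTML = """
<div class="chapter__header">
  <span class="chapter-number">1</span>
  <h4 class="chapter__title">
    Introduction to Experimental Design
  </h4>
</div>
<ul class="chapter__exercises hidden">
  <li class="chapter__exercise ">
    <a class="chapter__exercise-link" href="https://campus.datacamp.com/courses/experimental-design-in-r/introduction-to-experimental-design?ex=1">
      <h5 class="chapter__exercise-title" title='Intro to Experimental Design'>Intro to Experimental Design</h5>
    </a>
  </li>
</ul>
"""


class CourseParser(HTMLParser):
    """Hypothetical helper: collects one flat record per lesson."""

    def __init__(self):
        super().__init__()
        self.records = []      # list of {"chapter", "lesson", "lesson_link"}
        self._chapter = ""     # title of the chapter currently being read
        self._link = ""        # href of the most recent exercise link
        self._capture = None   # "chapter" or "lesson" while inside a title tag
        self._text = []        # text fragments of the title being captured

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        classes = (attr_map.get("class") or "").split()
        if tag == "h4" and "chapter__title" in classes:
            self._capture, self._text = "chapter", []
        elif tag == "a" and "chapter__exercise-link" in classes:
            self._link = attr_map.get("href") or ""
        elif tag == "h5" and "chapter__exercise-title" in classes:
            self._capture, self._text = "lesson", []

    def handle_data(self, data):
        if self._capture:
            self._text.append(data)

    def handle_endtag(self, tag):
        if self._capture == "chapter" and tag == "h4":
            self._chapter = "".join(self._text).strip()
            self._capture = None
        elif self._capture == "lesson" and tag == "h5":
            self.records.append({
                "chapter": self._chapter,
                "lesson": "".join(self._text).strip(),
                "lesson_link": self._link,
            })
            self._capture = None


def to_markdown(records):
    """Render the records as one heading per chapter and one bullet per lesson."""
    lines = []
    current = None
    for rec in records:
        if rec["chapter"] != current:
            current = rec["chapter"]
            lines.append(f"## {current}")
        lines.append(f"- [{rec['lesson']}]({rec['lesson_link']})")
    return "\n".join(lines)


parser = CourseParser()
parser.feed(SAMPLE_HTML)
records = parser.records
markdown = to_markdown(records)
print(markdown)
```

A flat list like this keeps the Markdown writer to a single loop, and it can be passed to pandas.DataFrame(records) unchanged if a DataFrame with those three columns turns out to be more convenient. A nested dictionary keyed by chapter would hold the same information, so the choice is mostly about which shape is easier to iterate over when writing the file.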

慕慕森

Related categories

Python