Multiple inserts in numpy where paired elements do not contain a substring

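This is a follow-up to an earlier post that @ecortazar answered. This time I also want to insert a value between two consecutive elements of a pd.Series when neither of them contains a specific string, using only Pandas / Numpy. Note: all of the rows that contain href in their text are different from one another.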


import pandas as pd
import numpy as np

table = pd.Series(
        ["<td class='test'>AA</td>",                  # 0
        "<td class='test'>A</td>",                    # 1
        "<td class='test'><a class='test' href=...",  # 2
        "<td class='test'>B</td>",                    # 3
        "<td class='test'><a class='test' href=...",  # 4
        "<td class='test'>BB</td>",                   # 5
        "<td class='test'>C</td>",                    # 6
        "<td class='test'><a class='test' href=...",  # 7
        "<td class='test'>F</td>",                    # 8
        "<td class='test'>G</td>",                    # 9
        "<td class='test'><a class='test' href=...",  # 10
        "<td class='test'>X</td>"])                   # 11

dups = ~table.str.contains('href') & table.shift(-1).str.contains('href')
array = np.insert(table.values, dups[dups].index, "None")
pd.Series(array)



# OUTPUT:
# 0                      <td class='test'>AA</td>
# 1                                          None
# 2                       <td class='test'>A</td>
# 3     <td class='test'><a class='test' href=...
# 4                                          None   <- Incorrect
# 5                       <td class='test'>B</td>
# 6     <td class='test'><a class='test' href=...
# 7                      <td class='test'>BB</td>
# 8                                          None
# 9                       <td class='test'>C</td>
# 10    <td class='test'><a class='test' href=...
# 11                      <td class='test'>F</td>
# 12                                         None
# 13                      <td class='test'>G</td>
# 14    <td class='test'><a class='test' href=...
# 15                      <td class='test'>X</td>



守着一只汪
2 Answers

SMILET

You can follow the same procedure as before. The only caveat is that you must apply the not (~) operator before the shift. The reason is that the shift puts an np.nan (a float) in the first position of the Series, so the Series is no longer boolean and the not operation fails on it.

import pandas as pd
import numpy as np

table = pd.Series(
        ["<td class='test'>AA</td>",                  # 0
        "<td class='test'>A</td>",                    # 1
        "<td class='test'><a class='test' href=...",  # 2
        "<td class='test'>B</td>",                    # 3
        "<td class='test'><a class='test' href=...",  # 4
        "<td class='test'>BB</td>",                   # 5
        "<td class='test'>C</td>",                    # 6
        "<td class='test'><a class='test' href=...",  # 7
        "<td class='test'>F</td>",                    # 8
        "<td class='test'>G</td>",                    # 9
        "<td class='test'><a class='test' href=...",  # 10
        "<td class='test'>X</td>"])                   # 11

# True for rows without 'href'; negate first, then shift
not_contain = ~table.str.contains('href')
# True where both this row and the previous row lack 'href'
cond = not_contain & not_contain.shift(1)
array = np.insert(table.values, cond[cond].index, "None")
pd.Series(array)
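As a side note, one way to avoid the NaN altogether is to pass fill_value=False to shift, so the shifted Series stays boolean. The sketch below is a variation on the answer above, not part of it. The names prev_not_contain, positions and result are chosen here for illustration, and np.flatnonzero is swapped in for cond[cond].index so that the insert positions do not depend on the Series having a default integer index.

```python
import numpy as np
import pandas as pd

table = pd.Series([
    "<td class='test'>AA</td>",                   # 0
    "<td class='test'>A</td>",                    # 1
    "<td class='test'><a class='test' href=...",  # 2
    "<td class='test'>B</td>",                    # 3
    "<td class='test'><a class='test' href=...",  # 4
    "<td class='test'>BB</td>",                   # 5
    "<td class='test'>C</td>",                    # 6
    "<td class='test'><a class='test' href=...",  # 7
    "<td class='test'>F</td>",                    # 8
    "<td class='test'>G</td>",                    # 9
    "<td class='test'><a class='test' href=...",  # 10
    "<td class='test'>X</td>",                    # 11
])

# True for rows that do NOT contain 'href'
not_contain = ~table.str.contains("href")

# fill_value=False keeps the shifted Series boolean (no NaN in position 0)
prev_not_contain = not_contain.shift(1, fill_value=False)

# True where this row and the previous row both lack 'href'
cond = not_contain & prev_not_contain

# Integer positions of the True entries (illustrative names, not from the answer)
positions = np.flatnonzero(cond.to_numpy())

# np.insert places "None" before each of those positions
result = pd.Series(np.insert(table.to_numpy(), positions, "None"))
print(result)
```

With the sample data, "None" is inserted before original rows 1, 6 and 9 only, so the unwanted entry before row 3 from the question no longer appears.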

眼眸繁星

This solves the problem above, but without Numpy and Pandas. If you can reproduce it with them, I will accept your answer as the correct one.

import pandas as pd
import numpy as np

table = pd.Series(
        ["<td class='test'>AA</td>",                  # 0
        "<td class='test'>A</td>",                    # 1
        "<td class='test'><a class='test' href=...",  # 2
        "<td class='test'>B</td>",                    # 3
        "<td class='test'><a class='test' href=...",  # 4
        "<td class='test'>BB</td>",                   # 5
        "<td class='test'>C</td>",                    # 6
        "<td class='test'><a class='test' href=...",  # 7
        "<td class='test'>F</td>",                    # 8
        "<td class='test'>G</td>",                    # 9
        "<td class='test'><a class='test' href=...",  # 10
        "<td class='test'>X</td>"])                   # 11

insertAt = [False]  # nothing can be inserted before the first row
for i in range(1, len(table)):
  # flag row i when neither it nor the previous row contains 'href'
  if 'href' not in table[i-1] and 'href' not in table[i]:
    print(i, ' is duplicated')
    insertAt.append(True)
  else:
    print(i, ' is not duplicated')
    insertAt.append(False)

insertAt = pd.Series(insertAt)
array = np.insert(table.values, insertAt[insertAt].index, "None")
pd.Series(array) # back to series if necessary
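For comparison, the same pairwise rule can be written against a plain Python list, with no Numpy or Pandas at all. This is a minimal sketch rather than the poster's code. The function name insert_between_plain_rows and its marker and filler parameters are made up for illustration.

```python
def insert_between_plain_rows(rows, marker="href", filler="None"):
    """Return a new list with `filler` placed between any two
    consecutive rows that both lack `marker`.

    Hypothetical helper, written only to illustrate the rule.
    """
    result = []
    prev_plain = False  # there is no row before the first one
    for row in rows:
        plain = marker not in row
        if plain and prev_plain:
            # both this row and the previous one lack the marker
            result.append(filler)
        result.append(row)
        prev_plain = plain
    return result


rows = [
    "<td class='test'>AA</td>",                   # 0
    "<td class='test'>A</td>",                    # 1
    "<td class='test'><a class='test' href=...",  # 2
    "<td class='test'>B</td>",                    # 3
    "<td class='test'><a class='test' href=...",  # 4
    "<td class='test'>BB</td>",                   # 5
    "<td class='test'>C</td>",                    # 6
    "<td class='test'><a class='test' href=...",  # 7
    "<td class='test'>F</td>",                    # 8
    "<td class='test'>G</td>",                    # 9
    "<td class='test'><a class='test' href=...",  # 10
    "<td class='test'>X</td>",                    # 11
]

padded = insert_between_plain_rows(rows)
for position, row in enumerate(padded):
    print(position, row)
```

Building the output list directly avoids tracking insert positions, at the cost of a Python-level loop that will be slower than the vectorized version on large inputs.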