Pandas: using netaddr on 2 dataframes to check whether an IP column belongs to a range, with a boolean result

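I am using the netaddr Python library. I have 2 dataframes: one holds IP ranges that can be converted to CIDR notation, and the other holds IP addresses that I want to check against those ranges to see whether they fall within any of them.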


Creating the ranges dataframe:


import pandas as pd
import netaddr
from netaddr import *

a = {'StartAddress': ['65.14.88.64', '148.77.37.88', '65.14.41.128', '65.14.40.0'],
     'EndAddress': ['65.14.88.95', '148.77.37.95', '65.14.41.135', '65.14.40.255']}
df1 = pd.DataFrame(data=a)

# Convert range to netaddr CIDR format
def rangetocidr(row):
    return netaddr.iprange_to_cidrs(row.StartAddress, row.EndAddress)

df1["CIDR"] = df1.apply(rangetocidr, axis=1)

df1

    StartAddress  EndAddress    CIDR
0   65.14.88.64   65.14.88.95   [65.14.88.64/27]
1   148.77.37.88  148.77.37.95  [148.77.37.88/29]
2   65.14.41.128  65.14.41.135  [65.14.41.128/29]
3   65.14.40.0    65.14.40.255  [65.14.40.0/24]

df1["CIDR"].iloc[0]

[IPNetwork('65.14.88.64/27')]


Creating the IP dataframe:


b = {'IP': ['65.13.88.64', '148.65.37.88', '65.14.88.65', '148.77.37.93', '66.15.41.132']}
df2 = pd.DataFrame(data=b)

# Convert IP to netaddr format
def iptonetaddrformat(row):
    return netaddr.IPAddress(row.IP)

df2["IP_Format"] = df2.apply(iptonetaddrformat, axis=1)

df2

    IP            IP_Format
0   65.13.88.64   65.13.88.64
1   148.65.37.88  148.65.37.88
2   65.14.88.65   65.14.88.65
3   148.77.37.93  148.77.37.93
4   66.15.41.132  66.15.41.132

df2["IP_Format"].iloc[0]

IPAddress('65.13.88.64')


I would like to add a column to df2 that indicates whether each IP address falls within a CIDR block from df1. So it would look like:


df2

    IP            IP_Format     IN_CIDR
0   65.13.88.64   65.13.88.64   False
1   148.65.37.88  148.65.37.88  False
2   65.14.88.65   65.14.88.65   True
3   148.77.37.93  148.77.37.93  True
4   66.15.41.132  66.15.41.132  False

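
I would prefer to do this using only the columns of the 2 dataframes, but I have tried converting the columns to lists and using the following, which does not seem to work: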


df2list = repr(df2[['IP_Format']])
df1list = df[['CIDR']]

def ipincidr (row):
    return netaddr.largest_matching_cidr(df2list, df1list)

df2['INRANGE'] = df2.apply(ipincidr, axis=1)


慕容森
1 answer

开满天机

The following solution assumes that, within each range, only the fourth octet of the IP changes while the first three octets stay the same, as shown in the question.

# Split each address into 2 parts: the first three octets (__.__.__) and the last octet (__).
# Do this for IP from df2 along with the Start and End columns from df1.
# n is passed by keyword; newer pandas versions reject it positionally.
ip = df2.IP.str.rsplit('.', n=1, expand=True)
ip.columns = ['IP_init', 'IP_last']
start = df1.StartAddress.str.rsplit('.', n=1, expand=True)
start.columns = ['start_init', 'start_last']
end = df1.EndAddress.str.rsplit('.', n=1, expand=True)
end.columns = ['end_init', 'end_last']
df = pd.concat([ip, start, end], axis=1)

# Check whether each IP belongs to any of the given blocks; if so, note its index.
index = []
for idx, _ in enumerate(df.itertuples()):
    for i in range(df.start_init.count()):
        if df.loc[idx, 'IP_init'] == df.loc[i, 'start_init']:
            # Compare the last octet as an integer, not as a string.
            last = int(df.loc[idx, 'IP_last'])
            if int(df.loc[i, 'start_last']) <= last <= int(df.loc[i, 'end_last']):
                index.append(idx)
                break

# Create column IN_CIDR and mark True for the rows whose IP exists in a block.
df2['IN_CIDR'] = False
df2.loc[index, 'IN_CIDR'] = True

df2

    IP            IP_Format     IN_CIDR
0   65.13.88.64   65.13.88.64   False
1   148.65.37.88  148.65.37.88  False
2   65.14.88.65   65.14.88.65   True
3   148.77.37.93  148.77.37.93  True
4   66.15.41.132  66.15.41.132  False

Note: you can also use np.where to skip the first iteration:

np.where(df.IP_init.isin(df.start_init), True, False)

which results in

[False, False,  True,  True, False]

You can then focus only on the True rows later, which reduces the overhead.