Pandas: rename "Unnamed: *" or "NaN" columns in a DataFrame

Here is my code so far:

import numpy as np

import pandas as pd

df = pd.read_excel(r'file.xlsx', index_col=0)

It looks like this:

I want to rename the "Unnamed: *" columns to the last valid name.

Here is what I tried, and the result:

df.columns = df.columns.str.replace('Unnamed.*', method='ffill')

---------------------------------------------------------------------------

TypeError Traceback (most recent call last)

<ipython-input-253-c868b8bff7c7> in <module>()

----> 1 df.columns = df.columns.str.replace('Unnamed.*', method='ffill')

TypeError: replace() got an unexpected keyword argument 'method'

If I do this instead, it "works":

df.columns = df.columns.str.replace('Unnamed.*', '')

But then I am left with blank values, or NaN if I replace '' with 'NaN'. Then I tried:

df.columns = df.columns.fillna('ffill')

That had no effect. So I tried inplace=True:

df.columns = df.columns.fillna('ffill', inplace=True)

---------------------------------------------------------------------------

TypeError Traceback (most recent call last)

<ipython-input-279-cce486472d5b> in <module>()

----> 1 df.columns = df.columns.fillna('ffill', inplace=True)

TypeError: fillna() got an unexpected keyword argument 'inplace'

Then I tried another approach:

i = 0
while i < len(df.columns):
    if df.columns[i] == 'NaN':
        df.columns[i] = df.columns[i-1]
    print(df.columns[i])
    i += 1

Which gave me this error:

Oil

158 RGN Mistura

Access West Winter Blend

---------------------------------------------------------------------------

TypeError Traceback (most recent call last)

<ipython-input-246-bc8fa6881b1a> in <module>()

2 while i < len(df.columns):

3 if df.columns[i] == 'NaN':

----> 4 df.columns[i] = df.columns[i-1]

5 print(df.columns[i])

6 i += 1

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\indexes\base.py in __setitem__(self, key, value)

2048

2049 def __setitem__(self, key, value):

-> 2050 raise TypeError("Index does not support mutable operations")

2051

2052 def __getitem__(self, key):

TypeError: Index does not support mutable operations

暮色呼如

Views: 352, Answers: 3

3 Answers

郎朗坤

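The problem you are running into comes from the fact that columns and indexes are pd.Index objects. The fillna method of a pandas Index does not take the same arguments as the fillna method of a pandas Series or DataFrame. I made a toy example below:

import pandas as pd
import numpy as np

df = pd.DataFrame(
    {'a': [1], 'Unnamed:1': [1], 'Unnamed:2': [1], 'b': [1], 'Unnamed:3': [1]},
    columns=['a', 'Unnamed:3', 'Unnamed:1', 'b', 'Unnamed:2'])
df
#    a  Unnamed:3  Unnamed:1  b  Unnamed:2
# 0  1          1          1  1          1

A pattern such as 'Unnamed:*' does not capture the whole column name, so let's fix that first.

# regex=True is passed explicitly: recent pandas versions treat the pattern as a literal string by default
df.columns.str.replace('Unnamed:*', '', regex=True)
# Index(['a', '3', '1', 'b', '2'], dtype='object')

df.columns.str.replace(r'Unnamed:\d+', '', regex=True)
# Index(['a', '', '', 'b', ''], dtype='object')

df.columns.str.replace('Unnamed:.+', '', regex=True)
# Index(['a', '', '', 'b', ''], dtype='object')

Now let's convert the Index to a pd.Series, so that we can use the regex-aware .replace method of pd.Series to turn the relevant column names into NaN and then forward-fill them (.ffill(), which is what .fillna(method='ffill') does in older pandas versions). Finally, we convert the result back to a pd.Index.

pd.Index(
    pd.Series(
        df.columns
    ).replace(r'Unnamed:\d+', np.nan, regex=True).ffill()
)
# Index(['a', 'a', 'a', 'b', 'b'], dtype='object')

df.columns = pd.Index(pd.Series(df.columns).replace(r'Unnamed:\d+', np.nan, regex=True).ffill())
df.head()
#    a  a  a  b  b
# 0  1  1  1  1  1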
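For comparison, here is a minimal sketch of the loop-based idea from the question, adapted so that it never assigns into the Index (which is immutable) and instead builds a new list of labels. The helper name ffill_unnamed, its default pattern, and the sample column names are illustrative assumptions, not part of pandas or of the original data. The sketch also shows why df.columns.fillna('ffill') appeared to do nothing useful: Index.fillna takes a fill value, so the string 'ffill' is inserted literally.

```python
import re

import numpy as np
import pandas as pd


def ffill_unnamed(columns, pattern=r"^Unnamed:\s*\d+"):
    """Return a new Index in which placeholder labels take the last valid label.

    Hypothetical helper: a label is a placeholder if it is NaN or matches
    `pattern`. Leading placeholders with no valid label before them are kept.
    """
    filled = []
    last_valid = None
    for label in columns:
        is_placeholder = pd.isna(label) or re.match(pattern, str(label)) is not None
        if is_placeholder and last_valid is not None:
            filled.append(last_valid)
        else:
            filled.append(label)
            if not is_placeholder:
                last_valid = label
    return pd.Index(filled)


# Assumed sample frame with read_excel-style placeholder names
df = pd.DataFrame(
    [[1, 2, 3, 4, 5]],
    columns=["Oil", "Unnamed: 1", "Unnamed: 2", "Gas", "Unnamed: 4"],
)

# Index.fillna expects a fill value, so 'ffill' is used as a literal label
literal = pd.Index(["Oil", np.nan]).fillna("ffill")
print(list(literal))  # ['Oil', 'ffill']

# Assign a whole new Index instead of mutating df.columns[i]
df.columns = ffill_unnamed(df.columns)
print(list(df.columns))  # ['Oil', 'Oil', 'Oil', 'Gas', 'Gas']
```

The Series-based one-liner above is shorter; the explicit loop may be easier to adapt when the placeholder rule is more involved than a single regex.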
