Pandas: rename "Unnamed: *" or "NaN" columns in a DataFrame

This is my code so far:


import numpy as np

import pandas as pd

df = pd.read_excel(r'file.xlsx', index_col=0)

The result looks like this:

http://img.mukewang.com/607011e4000123ad10270450.jpg

I want to rename each "Unnamed: *" column to the last valid column name before it.


Here is what I tried:


df.columns = df.columns.str.replace('Unnamed.*', method='ffill')

---------------------------------------------------------------------------

TypeError                                 Traceback (most recent call last)

<ipython-input-253-c868b8bff7c7> in <module>()

----> 1 df.columns = df.columns.str.replace('Unnamed.*', method='ffill')


TypeError: replace() got an unexpected keyword argument 'method'

This "works" if I do:


df.columns = df.columns.str.replace('Unnamed.*', '')

But then I am left with blank column names, or with NaN (if I replace '' with 'NaN'). Then I tried:


df.columns = df.columns.fillna('ffill')

That has no effect. So I tried inplace=True:


df.columns = df.columns.fillna('ffill',inplace = True)


---------------------------------------------------------------------------

TypeError                                 Traceback (most recent call last)

<ipython-input-279-cce486472d5b> in <module>()

----> 1 df.columns = df.columns.fillna('ffill', inplace=True)


TypeError: fillna() got an unexpected keyword argument 'inplace'
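
A note on why both fillna attempts behave this way: as far as I can tell, Index.fillna expects a fill value rather than a fill method, so 'ffill' is used as a literal replacement string, and it only touches entries that are actually missing (an empty string or the text 'NaN' does not count). The labels below are invented for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical column labels: one real NaN and one empty string.
cols = pd.Index(['Oil', np.nan, '', 'Gas'], dtype=object)

# 'ffill' is treated as the value to insert, not as a fill method.
filled = cols.fillna('ffill')
print(list(filled))
```

Only the real NaN is replaced, and it is replaced with the string 'ffill'; the empty label is left alone.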

Then I tried another way:


i = 0

while i < len(df.columns):

    if df.columns[i] == 'NaN':

        df.columns[i] = df.columns[i-1]

    print(df.columns[i])

    i += 1

This gives me the following output and error:


Oil

158 RGN Mistura

Access West Winter Blend 


---------------------------------------------------------------------------

TypeError                                 Traceback (most recent call last)

<ipython-input-246-bc8fa6881b1a> in <module>()

      2 while i < len(df.columns):

      3     if df.columns[i] == 'NaN':

----> 4         df.columns[i] = df.columns[i-1]

      5     print(df.columns[i])

      6     i += 1


~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\indexes\base.py in __setitem__(self, key, value)

   2048 

   2049     def __setitem__(self, key, value):

-> 2050         raise TypeError("Index does not support mutable operations")

   2051 

   2052     def __getitem__(self, key):


TypeError: Index does not support mutable operations
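
The loop fails because an Index cannot be modified one element at a time. A minimal sketch of the same idea that builds a new list of labels and assigns it in a single step, assuming the unwanted labels are either NaN or start with 'Unnamed' (the sample frame is invented; real data would come from pd.read_excel):

```python
import pandas as pd

# Invented sample frame standing in for the Excel data.
df = pd.DataFrame([[1, 2, 3, 4]],
                  columns=['Oil', 'Unnamed: 1', 'Gas', 'Unnamed: 3'])

new_cols = []
last_valid = None
for name in df.columns:
    # Treat NaN and "Unnamed..." labels as missing.
    missing = pd.isna(name) or str(name).startswith('Unnamed')
    if missing and last_valid is not None:
        new_cols.append(last_valid)   # reuse the last valid label
    else:
        new_cols.append(name)         # keep the label as it is
        if not missing:
            last_valid = name

# Replacing the whole set of labels is allowed; item assignment is not.
df.columns = new_cols
print(list(df.columns))
```

A missing label that appears before any valid one is kept unchanged, since there is nothing to carry forward yet.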


暮色呼如
3 Answers

郎朗坤

The problem you are running into comes from the fact that columns and indexes are pd.Index objects. The fillna method of a pandas Index does not take the same arguments as the fillna method of a pandas Series or DataFrame. I made a toy example below:

import pandas as pd
import numpy as np

df = pd.DataFrame(
    {'a': [1], 'Unnamed:1': [1], 'Unnamed:2': [1], 'b': [1], 'Unnamed:3': [1]},
    columns=['a', 'Unnamed:3', 'Unnamed:1', 'b', 'Unnamed:2'])
df
#    a  Unnamed:3  Unnamed:1  b  Unnamed:2
# 0  1          1          1  1          1

A regex such as 'Unnamed:*' does not capture the whole column name, because the * only repeats the colon. Let's fix that. (regex=True is passed explicitly because recent pandas versions treat the pattern as a literal string by default.)

df.columns.str.replace(r'Unnamed:*', '', regex=True)
# Index(['a', '3', '1', 'b', '2'], dtype='object')

df.columns.str.replace(r'Unnamed:\d+', '', regex=True)
# Index(['a', '', '', 'b', ''], dtype='object')

df.columns.str.replace(r'Unnamed:.+', '', regex=True)
# Index(['a', '', '', 'b', ''], dtype='object')

Now let's convert the index to a Series so that we can use the .replace and forward-fill methods of pd.Series, together with a working regex, to replace the relevant column names with the last valid one. Finally, we convert the result back to a pd.Index.

# .ffill() is equivalent to fillna(method='ffill'), which newer pandas versions deprecate
pd.Index(
    pd.Series(df.columns)
    .replace(r'Unnamed:\d+', np.nan, regex=True)
    .ffill()
)
# Index(['a', 'a', 'a', 'b', 'b'], dtype='object')

df.columns = pd.Index(pd.Series(df.columns).replace(r'Unnamed:\d+', np.nan, regex=True).ffill())
df.head()
#    a  a  a  b  b
# 0  1  1  1  1  1
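
The same approach can be wrapped in a small helper. This is only a sketch: the function name ffill_unnamed_columns and the default pattern are my own assumptions, and the pattern allows the space that read_excel normally puts after the colon (as in "Unnamed: 1").

```python
import pandas as pd

def ffill_unnamed_columns(df, pattern=r'^Unnamed:\s*\d+$'):
    """Return a copy of df whose placeholder column labels are forward-filled."""
    cols = pd.Series(df.columns)
    # Compare on string form so non-string labels do not break the match.
    is_placeholder = cols.astype(str).str.match(pattern)
    # Blank out placeholders, carry the last valid label forward, and
    # fall back to the original label where nothing precedes it.
    new_cols = cols.mask(is_placeholder).ffill().fillna(cols)
    out = df.copy()
    out.columns = new_cols.tolist()
    return out

# Hypothetical frame shaped like the output of pd.read_excel.
df = pd.DataFrame([[1, 2, 3, 4, 5]],
                  columns=['a', 'Unnamed: 1', 'Unnamed: 2', 'b', 'Unnamed: 4'])
renamed = ffill_unnamed_columns(df)
print(list(renamed.columns))
```

Returning a copy keeps the original frame untouched, which makes it easy to compare the labels before and after. Note that the result has duplicate column names, so selecting renamed['a'] returns several columns at once.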