Split a string with one-hot encoding and reshape a df from long to wide format

Here is a script that builds a simplified version of the df in question:


import pandas as pd

df = pd.DataFrame({
    'id': [1, 1, 2, 2, 3, 3],
    'feature': ['colour', 'interior_features', 'colour', 'interior_features', 'colour', 'interior_features'],
    'feature_value': ['blue', 'cd_player<->sat_nav<->usb_port', 'red', 'cd_player<->usb_port', 'red', 'cd_player<->sat_nav<->sub_woofer'],
})

df


   id  feature             feature_value
0   1  colour              blue
1   1  interior_features   cd_player<->sat_nav<->usb_port
2   2  colour              red
3   2  interior_features   cd_player<->usb_port
4   3  colour              red
5   3  interior_features   cd_player<->sat_nav<->sub_woofer

First, I want to convert the strings in the 'interior_features' rows into lists, using '<->' as the delimiter, like this:


    id  feature             feature_value
0   1   colour              blue
1   1   interior_features   [cd_player, sat_nav, usb_port]
2   2   colour              red
3   2   interior_features   [cd_player, usb_port]
4   3   colour              red
5   3   interior_features   [cd_player, sat_nav, sub_woofer]
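A minimal sketch of this first step, assuming only rows whose feature is 'interior_features' should be split while colour values stay plain strings. The name `listed` is illustrative and not from the question.

```python
import pandas as pd

# Sample frame from the question
df = pd.DataFrame({
    'id': [1, 1, 2, 2, 3, 3],
    'feature': ['colour', 'interior_features'] * 3,
    'feature_value': ['blue', 'cd_player<->sat_nav<->usb_port',
                      'red', 'cd_player<->usb_port',
                      'red', 'cd_player<->sat_nav<->sub_woofer'],
})

# Work on a copy so the original df is left unchanged
listed = df.copy()

# Split only the interior_features strings on the literal '<->';
# colour values are kept as plain strings
listed['feature_value'] = [
    value.split('<->') if feature == 'interior_features' else value
    for feature, value in zip(df['feature'], df['feature_value'])
]
print(listed)
```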

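Then I want to unnest (explode) those lists and use one-hot encoding so that, for each id, every interior feature gets its own row with a binary value (1 if present, 0 if absent) in the 'feature_value' column.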


Expected df:


    id  feature     feature_value
0   1   colour      blue
1   1   cd_player   1
2   1   sat_nav     1
3   1   usb_port    1
4   1   sub_woofer  0
5   2   colour      red
6   2   cd_player   1
7   2   sat_nav     0
8   2   usb_port    1
9   2   sub_woofer  0
10  3   colour      red
11  3   cd_player   1
12  3   sat_nav     1
13  3   usb_port    0
14  3   sub_woofer  1
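For comparison, here is a sketch of the whole transformation using pandas `Series.str.get_dummies`, a different technique from the answer's crosstab approach, which splits on a separator and one-hot encodes in a single call. The names `dummies`, `onehot` and `out` are illustrative. One assumption to flag: the one-hot rows come out in alphabetical order (sub_woofer before usb_port), not in the exact order shown in the expected df.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 1, 2, 2, 3, 3],
    'feature': ['colour', 'interior_features'] * 3,
    'feature_value': ['blue', 'cd_player<->sat_nav<->usb_port',
                      'red', 'cd_player<->usb_port',
                      'red', 'cd_player<->sat_nav<->sub_woofer'],
})

is_interior = df['feature'] == 'interior_features'

# One 0/1 column per distinct item, split on the literal separator '<->'
dummies = df.loc[is_interior, 'feature_value'].str.get_dummies(sep='<->')
dummies.index = df.loc[is_interior, 'id']

# Wide -> long: one row per (id, item) pair, zeros included
onehot = dummies.reset_index().melt(
    id_vars='id', var_name='feature', value_name='feature_value')

# Rows that are not unnested (colour) are kept as they are
colour = df.loc[~is_interior, ['id', 'feature', 'feature_value']]

out = (pd.concat([colour, onehot])
         .sort_values('id', kind='mergesort')  # stable: colour stays first within each id
         .reset_index(drop=True))
print(out)
```

Because `get_dummies` collects every item seen in any row, ids that lack an item automatically get a 0 for it, which is what fills in rows such as sub_woofer for id 1.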

Any help would be greatly appreciated.


开满天机
1 Answer

慕尼黑5688855

You can try split, then explode, then use crosstab to fill in the missing rows for each id:

df1 = df.loc[df['feature'] == 'colour']  # slice out the rows that do not need to be unnested
df2 = df.drop(df1.index)
df2['feature'] = df2['feature_value'].str.split('<->')
s = df2.explode('feature')
s = pd.crosstab(s['id'], s['feature']).stack().reset_index(name='feature_value')
# a stable sort keeps each id's colour row ahead of its one-hot rows
out = pd.concat([df1, s]).sort_values('id', kind='mergesort')
out

Out[356]:
    id     feature feature_value
0    1      colour          blue
0    1   cd_player             1
1    1     sat_nav             1
2    1  sub_woofer             0
3    1    usb_port             1
2    2      colour           red
4    2   cd_player             1
5    2     sat_nav             0
6    2  sub_woofer             0
7    2    usb_port             1
4    3      colour           red
8    3   cd_player             1
9    3     sat_nav             1
10   3  sub_woofer             1
11   3    usb_port             0
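The title also mentions a wide format. If one row per id is wanted, the long result can be pivoted with `DataFrame.pivot`. A sketch, assuming the (id, feature) pairs are unique (true here, since crosstab yields one row per pair); `wide` is an illustrative name, and the long frame is rebuilt with the answer's steps so the block runs on its own.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 1, 2, 2, 3, 3],
    'feature': ['colour', 'interior_features'] * 3,
    'feature_value': ['blue', 'cd_player<->sat_nav<->usb_port',
                      'red', 'cd_player<->usb_port',
                      'red', 'cd_player<->sat_nav<->sub_woofer'],
})

# Long result, built with the same steps as the answer
df1 = df.loc[df['feature'] == 'colour']
df2 = df.drop(df1.index)
df2['feature'] = df2['feature_value'].str.split('<->')
s = df2.explode('feature')
s = pd.crosstab(s['id'], s['feature']).stack().reset_index(name='feature_value')
out = pd.concat([df1, s]).sort_values('id', kind='mergesort')

# Long -> wide: one row per id, one column per feature (no aggregation)
wide = out.pivot(index='id', columns='feature', values='feature_value')
wide.columns.name = None
print(wide)
```

`pivot` raises an error on duplicate (id, feature) pairs; if the real data can contain them, `pivot_table` with an explicit `aggfunc` would be needed instead.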