Below is a script that builds a simplified version of the DataFrame in question:
import pandas as pd

df = pd.DataFrame({
    'id': [1, 1, 2, 2, 3, 3],
    'feature': ['colour', 'interior_features', 'colour', 'interior_features', 'colour', 'interior_features'],
    'feature_value': ['blue', 'cd_player<->sat_nav<->usb_port', 'red', 'cd_player<->usb_port', 'red', 'cd_player<->sat_nav<->sub_woofer'],
})
df
    id  feature            feature_value
0   1   colour             blue
1   1   interior_features  cd_player<->sat_nav<->usb_port
2   2   colour             red
3   2   interior_features  cd_player<->usb_port
4   3   colour             red
5   3   interior_features  cd_player<->sat_nav<->sub_woofer
First, I want to convert the strings in the 'interior_features' rows into lists, using '<->' as the delimiter, like this:
    id  feature            feature_value
0   1   colour             blue
1   1   interior_features  [cd_player, sat_nav, usb_port]
2   2   colour             red
3   2   interior_features  [cd_player, usb_port]
4   3   colour             red
5   3   interior_features  [cd_player, sat_nav, sub_woofer]
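For reference, one possible way to get this intermediate frame is sketched below. It is only a sketch: the name `split_df` is made up for illustration, and it uses plain Python `str.split` on the 'interior_features' rows rather than any particular pandas string method, so the 'colour' rows stay untouched.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 1, 2, 2, 3, 3],
    'feature': ['colour', 'interior_features'] * 3,
    'feature_value': [
        'blue', 'cd_player<->sat_nav<->usb_port',
        'red', 'cd_player<->usb_port',
        'red', 'cd_player<->sat_nav<->sub_woofer',
    ],
})

# split_df is a hypothetical name. Only rows whose feature is
# 'interior_features' are split on '<->'; all other values are kept as-is.
split_df = df.assign(
    feature_value=[
        value.split('<->') if feature == 'interior_features' else value
        for feature, value in zip(df['feature'], df['feature_value'])
    ]
)
print(split_df)
```

The resulting column mixes strings and lists, so it ends up with the generic object dtype.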
Then I want to un-nest those lists and one-hot encode them, so that every interior feature gets its own row per id, with a binary value in the 'feature_value' column.
Expected DataFrame:
    id  feature     feature_value
0   1   colour      blue
1   1   cd_player   1
2   1   sat_nav     1
3   1   usb_port    1
4   1   sub_woofer  0
5   2   colour      red
6   2   cd_player   1
7   2   sat_nav     0
8   2   usb_port    1
9   2   sub_woofer  0
10  3   colour      red
11  3   cd_player   1
12  3   sat_nav     1
13  3   usb_port    0
14  3   sub_woofer  1
Any help would be greatly appreciated.
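To make the target concrete, here is a minimal sketch of one way the expected frame could be produced. It is not the only approach, and it rests on two assumptions: each id has exactly one 'interior_features' row, and no feature is repeated within a row. The names `present`, `grid`, `flags` and `result` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 1, 2, 2, 3, 3],
    'feature': ['colour', 'interior_features'] * 3,
    'feature_value': [
        'blue', 'cd_player<->sat_nav<->usb_port',
        'red', 'cd_player<->usb_port',
        'red', 'cd_player<->sat_nav<->sub_woofer',
    ],
})

is_interior = df['feature'].eq('interior_features')
interior = df[is_interior]

# One row per (id, interior feature) that is actually present.
present = interior.assign(
    feature=[value.split('<->') for value in interior['feature_value']]
).explode('feature')

# Every (id, feature) combination, with features in order of first appearance.
grid = pd.MultiIndex.from_product(
    [interior['id'].unique(), present['feature'].unique()],
    names=['id', 'feature'],
)

# Mark present combinations with 1 and fill the missing ones with 0.
# Assumption: (id, feature) pairs are unique, otherwise reindex raises.
flags = (
    present.assign(feature_value=1)
    .set_index(['id', 'feature'])['feature_value']
    .reindex(grid, fill_value=0)
    .reset_index()
)

# Put the untouched rows (e.g. 'colour') first, then a stable sort by id
# keeps that relative order inside each id group.
result = (
    pd.concat([df[~is_interior], flags])
    .sort_values('id', kind='mergesort')
    .reset_index(drop=True)
)
print(result)
```

Because the final 'feature_value' column holds both strings and integers, it has object dtype; if that is a problem downstream, keeping the flags in a separate frame may be a better design.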
慕尼黑5688855