Insert or update in a pandas DataFrame
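I want to merge storage_df and processed_df as shown below. Assume phone is the primary key: 1. If the value exists, update the fields (and create the remaining columns, such as Gender in the example below). 2. If the value does not exist, insert it into the final DataFrame, like 382837371 in the example.
Note that the number of columns keeps growing as we process more information, but there is a limit of 32 columns up to which processed_df/storage_df will grow.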
storage_df
________________________
Phone      Name
918348483  Sumit
874647474  Saurabh
238362633  NA
processed_df
_______________________________
Phone      Name     Gender
874647474  Saurabh  Male
238362633  NA       Female
382837371  NA       Male
final_df
_______________________________
Phone      Name     Gender
918348483  Sumit    NA
874647474  Saurabh  Male
238362633  NA       Female
382837371  NA       Male
For this, I used pandas' combine_first:
final_df = processed_df.set_index('phone').combine_first(storage_df.set_index('phone'))
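But as the DataFrames grow, the system runs out of memory (16 GB of RAM) and cannot combine shape (88488, 6) with shape (7307, 8).
Would it be possible to store both DataFrames in SQL using sqlite and then use UPSERT? Can you guide me on the syntax for this? That said, I would really prefer to do this in memory rather than in a database.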
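For the UPSERT syntax asked about above, here is a minimal sketch using Python's built-in sqlite3 module. It assumes SQLite 3.24 or newer (where INSERT ... ON CONFLICT ... DO UPDATE is available), lowercase column names, and a hypothetical table called people; the upsert helper is also just an illustrative name. Loading storage_df first and processed_df second, with COALESCE keeping the old value when the new one is NULL, is meant to mimic processed_df.combine_first(storage_df).

```python
import sqlite3

import pandas as pd

# Sample frames from the question (lowercase column names are an assumption).
storage_df = pd.DataFrame({
    "phone": [918348483, 874647474, 238362633],
    "name": ["Sumit", "Saurabh", None],
})
processed_df = pd.DataFrame({
    "phone": [874647474, 238362633, 382837371],
    "name": ["Saurabh", None, None],
    "gender": ["Male", "Female", "Male"],
})

conn = sqlite3.connect(":memory:")  # use a file path to keep data off the heap
conn.execute(
    "CREATE TABLE people (phone INTEGER PRIMARY KEY, name TEXT, gender TEXT)"
)


def upsert(conn, df, table="people", key="phone"):
    """Insert rows of df; on a key conflict, overwrite only with non-NULL values."""
    cols = list(df.columns)
    col_list = ", ".join(f'"{c}"' for c in cols)
    placeholders = ", ".join("?" for _ in cols)
    updates = ", ".join(
        f'"{c}" = COALESCE(excluded."{c}", {table}."{c}")'
        for c in cols if c != key
    )
    action = f"DO UPDATE SET {updates}" if updates else "DO NOTHING"
    sql = (
        f"INSERT INTO {table} ({col_list}) VALUES ({placeholders}) "
        f'ON CONFLICT("{key}") {action}'
    )
    # Convert missing values to None and NumPy scalars to plain Python values.
    rows = [
        tuple(
            None if pd.isna(v) else (v.item() if hasattr(v, "item") else v)
            for v in row
        )
        for row in df.itertuples(index=False, name=None)
    ]
    conn.executemany(sql, rows)
    conn.commit()


upsert(conn, storage_df)    # older data first
upsert(conn, processed_df)  # newer data wins wherever it is not NULL

final_df = pd.read_sql_query(
    "SELECT * FROM people ORDER BY phone", conn, index_col="phone"
)
print(final_df)
```

Since the set of columns is expected to grow (up to 32), the table would either need all columns declared up front or an ALTER TABLE ... ADD COLUMN step before upserting a frame with new columns; that part is not shown here.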
File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/pandas/core/frame.py", line 5364, in combine_first
return self.combine(other, combiner, overwrite=False)
File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/pandas/core/frame.py", line 5229, in combine
this, other = self.align(other, copy=False)
File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/pandas/core/frame.py", line 3792, in align
broadcast_axis=broadcast_axis)
File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/pandas/core/generic.py", line 8423, in align
fill_axis=fill_axis)
File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/pandas/core/generic.py", line 8459, in _align_frame
allow_dups=True)
File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/pandas/core/generic.py", line 4490, in _reindex_with_indexers
copy=copy)
File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/pandas/core/internals/managers.py", line 1220, in reindex_indexer
self._consolidate_inplace()
File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/pandas/core/internals/managers.py", line 929, in _consolidate_inplace
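Since the traceback ends inside the align/reindex step of combine_first, one in-memory alternative that skips that alignment is to stack the two frames and take the first non-null value per phone. This is only a sketch on the sample data, with lowercase column names assumed; whether it fits in memory on the real data is untested. One possible cause of the blow-up, which the traceback alone does not confirm, is duplicate phone values in the index. The approach below collapses duplicates as a side effect.

```python
import pandas as pd

# Sample frames from the question (lowercase column names are an assumption).
storage_df = pd.DataFrame({
    "phone": [918348483, 874647474, 238362633],
    "name": ["Sumit", "Saurabh", None],
})
processed_df = pd.DataFrame({
    "phone": [874647474, 238362633, 382837371],
    "name": ["Saurabh", None, None],
    "gender": ["Male", "Female", "Male"],
})

# Put processed_df first so its values take priority, as in
# processed_df.combine_first(storage_df).
combined = pd.concat([processed_df, storage_df], ignore_index=True, sort=False)

# GroupBy.first() returns the first non-null value in each column per group,
# so gaps in processed_df are filled from storage_df.
final_df = combined.groupby("phone", sort=False).first()
print(final_df)
```

The result is indexed by phone, like the combine_first version; call final_df.reset_index() if phone should be a regular column again.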