How to find missing pairs in a pandas DataFrame and fill them with dummy values

Use MultiIndex.from_product to build all combinations of the index levels (taken from MultiIndex.levels) and pass the result to DataFrame.reindex:

    import pandas as pd

    # Move the pair columns into a MultiIndex, then reindex to the full product.
    df = df.set_index(['Name','Type'])
    df = df.reindex(pd.MultiIndex.from_product(df.index.levels), fill_value='0000-00-00')
    print (df)

                     Date
    Name Type
    A    X     2019-08-06
         Y     2019-08-08
         Z     0000-00-00
    B    X     0000-00-00
         Y     2019-08-01
         Z     0000-00-00
    C    X     0000-00-00
         Y     0000-00-00
         Z     2019-10-12

EDIT: The error "ValueError: cannot handle a non-unique multi-index!" means there are duplicated (Name, Type) pairs in the data. Sample data that reproduces it:

    df = pd.DataFrame({'Date':['2019-08-06','2019-08-08','2019-08-01','2019-10-12'],
                       'Name':['A','A','B','C'],
                       'Type':['X','X','Y','Z'],
                       'col':list('abcd')})
    print (df)

             Date Name Type col
    0  2019-08-06    A    X   a
    1  2019-08-08    A    X   b   <- duplicated pair `A, X` (Name, Type)
    2  2019-08-01    B    Y   c
    3  2019-10-12    C    Z   d

The solution is to first set the duplicated rows aside with DataFrame.duplicated, then reindex the remaining rows to all combinations:

    # True for every repeat of a (Name, Type) pair after its first occurrence.
    mask = df.duplicated(['Name','Type'])
    df1 = df[~mask].set_index(['Name','Type'])
    df1 = (df1.reindex(pd.MultiIndex.from_product(df1.index.levels))
              .fillna({'Date':'0000-00-00', 'col':'missing'})
              .reset_index())
    print (df1)

      Name Type        Date      col
    0    A    X  2019-08-06        a
    1    A    Y  0000-00-00  missing
    2    A    Z  0000-00-00  missing
    3    B    X  0000-00-00  missing
    4    B    Y  2019-08-01        c
    5    B    Z  0000-00-00  missing
    6    C    X  0000-00-00  missing
    7    C    Y  0000-00-00  missing
    8    C    Z  2019-10-12        d

Finally, add the duplicated rows back with concat:

    df = pd.concat([df1, df[mask]]).sort_values(['Name','Type'], ignore_index=True)
    print (df)

      Name Type        Date      col
    0    A    X  2019-08-06        a
    1    A    X  2019-08-08        b
    2    A    Y  0000-00-00  missing
    3    A    Z  0000-00-00  missing
    4    B    X  0000-00-00  missing
    5    B    Y  2019-08-01        c
    6    B    Z  0000-00-00  missing
    7    C    X  0000-00-00  missing
    8    C    Y  0000-00-00  missing
    9    C    Z  2019-10-12        d
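If you only want to see which (Name, Type) pairs are missing before filling anything, a set difference between the full product and the pairs actually present is one possible approach. This is a minimal sketch, not part of the answer above: the sample data is reconstructed from the first printed result, and the variable names `present`, `full` and `missing` are only illustrative.

```python
import pandas as pd

# Sample data reconstructed from the first printed result (an assumption).
df = pd.DataFrame({
    'Date': ['2019-08-06', '2019-08-08', '2019-08-01', '2019-10-12'],
    'Name': ['A', 'A', 'B', 'C'],
    'Type': ['X', 'Y', 'Y', 'Z'],
})

# Pairs that occur in the data; drop_duplicates keeps this safe
# even when the same pair appears more than once.
present = pd.MultiIndex.from_frame(df[['Name', 'Type']].drop_duplicates())

# Every possible combination of the observed Name and Type values.
full = pd.MultiIndex.from_product(
    [df['Name'].unique(), df['Type'].unique()], names=['Name', 'Type']
)

# Pairs in the full product that never occur in the data.
missing = full.difference(present)
missing_pairs = list(missing)
print(missing_pairs)
```

Because the pairs are de-duplicated first, this check does not hit the non-unique MultiIndex error, so it can be run before deciding how to handle duplicated rows.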

