Regrouping column values in a pandas df

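I have a script that assigns a value based on two columns of a pandas df. The code below achieves the first step, but I'm struggling with the second.

So the script should initially:

1) Assign a Person to each individual string in [Area], covering the first 3 unique values in [Place]

2) Look to reassign People with fewer than 3 unique values. Example: the df below has 6 unique values in [Area] and [Place], but 3 People are assigned. Ideally, 2 People would have 3 unique values each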


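import numpy as np
import pandas as pd

d = {
    'Time': ['8:03:00', '8:17:00', '8:20:00', '10:15:00', '10:15:00', '11:48:00', '12:00:00', '12:10:00'],
    'Place': ['House 1', 'House 2', 'House 1', 'House 3', 'House 4', 'House 5', 'House 1', 'House 1'],
    'Area': ['X', 'X', 'Y', 'X', 'X', 'X', 'X', 'X'],
}

df = pd.DataFrame(data=d)

def g(gps):
    # number the unique places within one Area; every 3 places share a number
    s = gps['Place'].unique()
    d = dict(zip(s, np.arange(len(s)) // 3 + 1))
    gps['Person'] = gps['Place'].map(d)
    return gps

# group_keys=False keeps the original row index
# (pandas >= 2.0 would otherwise add 'Area' as an index level)
df = df.groupby('Area', sort=False, group_keys=False).apply(g)

# combine the per-Area number with the Area, then renumber globally
s = df['Person'].astype(str) + df['Area']
df['Person'] = pd.Series(pd.factorize(s)[0] + 1).map(str).radd('Person ')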

Output:


       Time    Place Area    Person
0   8:03:00  House 1    X  Person 1
1   8:17:00  House 2    X  Person 1
2   8:20:00  House 1    Y  Person 2
3  10:15:00  House 3    X  Person 1
4  10:15:00  House 4    X  Person 3
5  11:48:00  House 5    X  Person 3
6  12:00:00  House 1    X  Person 1
7  12:10:00  House 1    X  Person 1

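As you can see, the first step works fine: for each individual string in [Area], the first 3 unique values in [Place] are assigned to one Person. This leaves Person 1 with 3 values, Person 2 with 1 value and Person 3 with 2 values.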
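The per-Person counts quoted above can be checked directly. The following is a minimal sketch, not part of the original script: it rebuilds step one with `groupby(...).transform` (an alternative to the `apply` call, used here because it does not depend on how `apply` treats the grouping column) and then counts the unique (Area, Place) pairs per Person. The helper name `chunk_id` is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Time': ['8:03:00', '8:17:00', '8:20:00', '10:15:00',
             '10:15:00', '11:48:00', '12:00:00', '12:10:00'],
    'Place': ['House 1', 'House 2', 'House 1', 'House 3',
              'House 4', 'House 5', 'House 1', 'House 1'],
    'Area': ['X', 'X', 'Y', 'X', 'X', 'X', 'X', 'X'],
})

def chunk_id(places, size=3):
    # factorize numbers the places in order of first appearance (0, 1, 2, ...);
    # integer division then puts every `size` distinct places in one chunk
    codes = pd.factorize(places)[0] // size + 1
    return pd.Series(codes, index=places.index)

# Step one: chunk number within each Area, then a global Person label
chunk = df.groupby('Area', sort=False)['Place'].transform(chunk_id)
key = chunk.astype(str) + df['Area']
df['Person'] = ['Person ' + str(code + 1) for code in pd.factorize(key)[0]]

# Unique (Area, Place) pairs held by each Person
counts = df.drop_duplicates(['Area', 'Place']).groupby('Person').size()
print(counts.to_dict())  # {'Person 1': 3, 'Person 2': 1, 'Person 3': 2}
```

Counting on de-duplicated (Area, Place) pairs rather than on rows matters here: Person 1 appears in five rows but only holds three distinct places.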


The second step is where I'm struggling.

If a Person has fewer than 3 unique values assigned to them, alter this so that each Person has up to 3 unique values.

Expected output:


       Time    Place Area    Person
0   8:03:00  House 1    X  Person 1
1   8:17:00  House 2    X  Person 1
2   8:20:00  House 1    Y  Person 2
3  10:15:00  House 3    X  Person 1
4  10:15:00  House 4    X  Person 2
5  11:48:00  House 5    X  Person 2
6  12:00:00  House 1    X  Person 1
7  12:10:00  House 1    X  Person 1
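
One way to read step two is as a small packing problem: leave every Person that already holds 3 unique (Area, Place) pairs alone, and combine the remaining Persons into shared groups of at most 3 pairs. The sketch below is one hedged interpretation, not a definitive solution. The function name `regroup_small` and the `cap` parameter are invented here, and the input is the Person column from the first output. It merges whole Persons first-fit in order of appearance and never splits one, so it may not find the tightest packing in every case.

```python
import pandas as pd

df = pd.DataFrame({
    'Place': ['House 1', 'House 2', 'House 1', 'House 3',
              'House 4', 'House 5', 'House 1', 'House 1'],
    'Area': ['X', 'X', 'Y', 'X', 'X', 'X', 'X', 'X'],
    'Person': ['Person 1', 'Person 1', 'Person 2', 'Person 1',
               'Person 3', 'Person 3', 'Person 1', 'Person 1'],
})

def regroup_small(frame, cap=3):
    # Count unique (Area, Place) pairs per Person, in order of first appearance
    uniq = frame.drop_duplicates(['Area', 'Place'])
    counts = uniq.groupby('Person', sort=False).size()

    bins = []       # each bin: [representative Person, pairs held so far]
    mapping = {}    # old Person label -> label of the bin it joins
    for person, n in counts.items():
        if n >= cap:
            mapping[person] = person   # already full, leave untouched
            continue
        for b in bins:
            if b[1] + n <= cap:        # first bin with enough room
                b[1] += n
                mapping[person] = b[0]
                break
        else:
            bins.append([person, n])   # no bin fits, open a new one
            mapping[person] = person

    merged = frame['Person'].map(mapping)
    # Renumber so the labels run Person 1, Person 2, ... without gaps
    codes = pd.factorize(merged)[0] + 1
    out = frame.copy()
    out['Person'] = ['Person ' + str(c) for c in codes]
    return out

result = regroup_small(df)
print(result)
```

On the example data this folds Person 3 (2 pairs) into Person 2 (1 pair), which reproduces the Person column of the expected output.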


慕姐4208626
3 answers

慕田峪4524236

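As I understand it, you are happy with everything up to the Person assignment. So here is a plug-and-play solution that "merges" Persons with fewer than 3 unique values, so that each Person ends up with 3 unique values, except obviously the last one. It starts from the second-to-last df you posted (under "Output:"), leaves those that already have 3 unique values untouched, and merges the others.

Edit: greatly simplified code. Again, taking your df as input:

n = 3
# flag Persons that appear in exactly n rows (note: this counts rows, not unique Place values)
df['complete'] = df.Person.apply(lambda x: 1 if df.Person.tolist().count(x) == n else 0)
df['num'] = df.Person.str.replace('Person ', '')
df.sort_values(by=['num', 'complete'], ascending=True, inplace=True)  # get all persons that are complete to the top

c = 0
person_numbers = []
for x in range(0, 999):  # create the numbering [1,1,1,2,2,2,3,3,3,...] with n defining how often a person is 'repeated'
    if x % n == 0:
        c += 1
    person_numbers.append(c)

df['Person_new'] = person_numbers[0:len(df)]  # add the numbering to the df
df.Person = 'Person ' + df.Person_new.astype(str)  # fill the Person column with the new numbering
df.drop(['complete', 'Person_new', 'num'], axis=1, inplace=True)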
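
The counting loop in this answer builds the sequence 1, 1, 1, 2, 2, 2, ... one element at a time. As an aside, the same sequence can be produced with integer division, the same trick the question's own code uses. The sketch below assumes NumPy is available, and the helper name `block_numbers` is made up for illustration.

```python
import numpy as np

def block_numbers(length, n=3):
    # Positions 0..length-1 floor-divided by n give 0,0,0,1,1,1,...;
    # adding 1 makes the numbering start at 1
    return (np.arange(length) // n + 1).tolist()

print(block_numbers(8))  # [1, 1, 1, 2, 2, 2, 3, 3]
```

This sizes the list to the DataFrame directly, so the fixed upper bound of 999 is no longer needed.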

慕无忌1623718

How about this for step 2:

def reduce_df(df):
    values = df['Area'] + df['Place']
    df1 = df.loc[~values.duplicated(),:] # ignore duplicate values for this part..
    person_count = df1.groupby('Person')['Person'].agg('count')
    leftover_count = person_count[person_count < 3] # the 'leftovers'

    # try merging pairs together
    nleft = leftover_count.shape[0]
    to_try = np.arange(nleft - 1)
    to_merge = (leftover_count.values[to_try] +
                leftover_count.values[to_try + 1]) <= 3
    to_merge[1:] = to_merge[1:] & ~to_merge[:-1]
    to_merge = to_try[to_merge]
    merge_dict = dict(zip(leftover_count.index.values[to_merge+1],
                          leftover_count.index.values[to_merge]))

    def change_person(p):
        if p in merge_dict.keys():
            return merge_dict[p]
        return p

    reduced_df = df.copy()
    # update df with the merges you found
    reduced_df['Person'] = reduced_df['Person'].apply(change_person)
    return reduced_df

print(
    reduce_df(reduce_df(df)) # call twice in case 1,1,1 -> 2,1 -> 3
)

Output:

  Area    Place      Time    Person
0    X  House 1   8:03:00  Person 1
1    X  House 2   8:17:00  Person 1
2    Y  House 1   8:20:00  Person 2
3    X  House 3  10:15:00  Person 1
4    X  House 4  10:15:00  Person 2
5    X  House 5  11:48:00  Person 2
6    X  House 1  12:00:00  Person 1
7    X  House 1  12:10:00  Person 1
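
The comment about calling `reduce_df` twice reflects that a single pass only merges neighbouring pairs, so three leftovers of size 1 need two passes (1,1,1 -> 2,1 -> 3). The sketch below illustrates that idea on a plain list of counts rather than on the DataFrame. `merge_pass` and `merge_until_stable` are hypothetical helpers, and the pairing is a simplified sequential version, not the vectorised logic of the answer.

```python
def merge_pass(counts, cap=3):
    # Merge non-overlapping neighbouring pairs whose combined size fits in cap
    out, i = [], 0
    while i < len(counts):
        if i + 1 < len(counts) and counts[i] + counts[i + 1] <= cap:
            out.append(counts[i] + counts[i + 1])
            i += 2
        else:
            out.append(counts[i])
            i += 1
    return out

def merge_until_stable(counts, cap=3):
    # Repeat passes until a pass changes nothing
    while True:
        merged = merge_pass(counts, cap)
        if merged == counts:
            return merged
        counts = merged

print(merge_pass([1, 1, 1]))          # [2, 1]
print(merge_until_stable([1, 1, 1]))  # [3]
```

Looping until nothing changes, instead of calling a fixed number of times, would cover inputs with more leftovers than the two-pass call anticipates.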