I am trying to build a very simple item recommender system based on how many times items were purchased together, so first I created an item2item dictionary of Counters:
from collections import Counter

# People purchased A with B 4 times, and A with C 3 times.
item2item = {'A': Counter({'B': 4, 'C': 3}),
             'B': Counter({'A': 4, 'C': 2}),
             'C': Counter({'A': 3, 'B': 2})}

# Recommend to a user who purchased A and C.
samples_list = [['A', 'C'], ...]
So for samples = ['A', 'C'], I recommend the items with the largest counts in item2item['A'] + item2item['C'], as in the sketch below.
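For reference, this is a minimal single-process sketch of that merge-and-recommend step, using item2item as defined above (the recommend step is just my assumption that we pick the highest-count item the user has not already bought):

from functools import reduce
from operator import add
from collections import Counter

samples = ['A', 'C']
# Merge the co-purchase counters of every purchased item.
combined = reduce(add, [item2item[s] for s in samples], Counter())
# combined == Counter({'B': 6, 'C': 3, 'A': 3})
# Recommend the highest-count item not already in the basket.
recommendation = next(i for i, _ in combined.most_common() if i not in samples)
print(recommendation)  # 'B'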
However, for a large matrix this merge is heavy, so I tried to use multiprocessing as below:
from operator import add
from functools import reduce
from concurrent.futures import ProcessPoolExecutor
from collections import Counter

with ProcessPoolExecutor(max_workers=10) as pool:
    for samples in samples_list:
        # Without the PoolExecutor:
        # combined = reduce(add, [item2item[s] for s in samples], Counter())
        future = pool.submit(reduce, add, [item2item[s] for s in samples], Counter())
        combined = future.result()
However, this did not speed up the process at all.
Referring to "Python multiprocessing and a shared counter" and https://docs.python.org/3/library/multiprocessing.html#sharing-state-between-processes, I suspect that the Counter passed to reduce is not shared between the processes; a sketch of my understanding of that shared-state pattern follows.
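To show what I mean by "shared", here is a minimal sketch of the Manager-based pattern from that docs section applied to my merge; merge_into is a hypothetical helper name I made up for this illustration, not something from my real code:

from collections import Counter
from multiprocessing import Manager, Process

item2item = {'A': Counter({'B': 4, 'C': 3}),
             'B': Counter({'A': 4, 'C': 2}),
             'C': Counter({'A': 3, 'B': 2})}

def merge_into(shared, lock, counts):
    # Merge one item's co-purchase counts into the shared dict;
    # the lock keeps each read-modify-write step atomic.
    with lock:
        for item, n in counts.items():
            shared[item] = shared.get(item, 0) + n

if __name__ == '__main__':
    with Manager() as manager:
        shared = manager.dict()  # proxy object visible to all processes
        lock = manager.Lock()
        procs = [Process(target=merge_into, args=(shared, lock, item2item[s]))
                 for s in ['A', 'C']]
        for p in procs:
            p.start()
        for p in procs:
            p.join()
        combined = Counter(dict(shared))  # e.g. Counter({'B': 6, 'C': 3, 'A': 3})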
Any help is appreciated.