www wrote:
Since the question does not include any sample data, I am assuming the wordCounts variable was prepared with the following code.

    import pprint
    from pyspark.context import SparkContext

    sc = SparkContext('local', 'test')
    pairs = sc.parallelize([("a", 1), ("b", 1), ("b", 1), ("b", 1), ("b", 1), ("b", 1),
                            ("d", 1), ("e", 1), ("a", 1), ("f", 1), ("c", 1)])
    wordCounts = pairs.reduceByKey(lambda x, y: x + y)

You can print the values in wordCounts in any of the following ways:

    print(wordCounts.collect()[:5])  # Pick 5 elements
    print(wordCounts.take(5))  # Pick 5 elements
    print(sorted(wordCounts.collect())[:5])  # Sort the tuples, and pick the first 5 elements
    print(sorted(wordCounts.collect(), key=lambda x: x[1], reverse=False)[:5])  # Sort by the second entry (i.e. count) in ascending order, and pick the first 5 elements

which produce, respectively:

    [('a', 2), ('b', 5), ('d', 1), ('e', 1), ('f', 1)]
    [('a', 2), ('b', 5), ('d', 1), ('e', 1), ('f', 1)]
    [('a', 2), ('b', 5), ('c', 1), ('d', 1), ('e', 1)]
    [('d', 1), ('e', 1), ('f', 1), ('c', 1), ('a', 2)]

Note that the first two calls return elements in whatever order the RDD happens to hold them, so that order is not guaranteed; only the sorted variants give a defined order.

I strongly recommend providing a minimal reproducible example next time.
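If you want to check what the sorted variants should return without starting Spark, the same aggregation can be sketched in plain Python. This is only a local stand-in, not PySpark code: `Counter` plays the role of `reduceByKey`, and the names `word_counts`, `first_five_by_key`, and `five_smallest` are made up for this illustration. Ties on the count are broken by key here, which is an added assumption so that the result is deterministic.

```python
from collections import Counter
import heapq

# Same sample pairs as in the answer above.
pairs = [("a", 1), ("b", 1), ("b", 1), ("b", 1), ("b", 1), ("b", 1),
         ("d", 1), ("e", 1), ("a", 1), ("f", 1), ("c", 1)]

# Plain-Python stand-in for reduceByKey(lambda x, y: x + y):
# sum the values that share a key.
totals = Counter()
for word, n in pairs:
    totals[word] += n
word_counts = list(totals.items())

# Sort the (word, count) tuples by word and keep the first 5.
first_five_by_key = sorted(word_counts)[:5]

# Keep the 5 smallest counts; ties are broken by word (an added assumption).
five_smallest = heapq.nsmallest(5, word_counts, key=lambda kv: (kv[1], kv[0]))

print(first_five_by_key)
print(five_smallest)
```

On the RDD itself, `wordCounts.takeOrdered(5, key=lambda x: x[1])` should give the five smallest counts without collecting everything to the driver first, which matters once the data is large.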