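How do I append multiple columns into a single column?

I am grouping addresses. The data has several address columns, and I need to combine them into one column, then group the values, count them, and sort.

Input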

Address1  Address2   Address3
    a1       b1        c1 
    b2       a2        e2

Required output

Address4
a1
b2
b1
a2
c1
e2



青春有我
3 Answers

沧海一幻觉

The same implementation as already given, but using the higher-level Dataset API:

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.functions;

    // Select each address column on its own. union() matches columns by position,
    // so the result keeps the column name of the first Dataset (Address4).
    Dataset<Row> Ad1 = df.select(functions.col("Address1").as("Address4"));
    Dataset<Row> Ad2 = df.select("Address2");
    Dataset<Row> Ad3 = df.select("Address3");
    Dataset<Row> Union_DS = Ad1.union(Ad2).union(Ad3);
    Union_DS.show();

    // Count the occurrences of each address and sort by count, highest first.
    Dataset<Row> Union_Sorted = Union_DS
            .groupBy("Address4")
            .agg(functions.count(functions.col("Address4")).as("Count"))
            .sort(functions.desc("Count"));
    Union_Sorted.show();
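To trace what the union, groupBy, count, and sort steps compute without starting a Spark session, here is a minimal plain-Java sketch of the same logic on in-memory lists. It is only an illustration of the semantics, not Spark code: the class name `AddressCounter`, the method `countAddresses`, and the sample values are all hypothetical, and the tie-break by address is an extra assumption added to make the order deterministic.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: mirrors union -> groupBy -> count -> sort on plain lists.
class AddressCounter {

    // Each inner list holds the values of one address column.
    static List<Map.Entry<String, Long>> countAddresses(List<List<String>> columns) {
        Map<String, Long> counts = new LinkedHashMap<>();
        for (List<String> column : columns) {          // stack the columns (union)
            for (String address : column) {
                counts.merge(address, 1L, Long::sum);  // group by value and count
            }
        }
        List<Map.Entry<String, Long>> rows = new ArrayList<>(counts.entrySet());
        // Sort by count, highest first; ties are broken by address (an assumption).
        rows.sort((a, b) -> {
            int byCount = Long.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        return rows;
    }

    public static void main(String[] args) {
        // Hypothetical sample data, one list per address column.
        List<List<String>> columns = List.of(
                List.of("a1", "b2"),   // Address1
                List.of("b1", "a1"),   // Address2
                List.of("c1", "a1"));  // Address3
        for (Map.Entry<String, Long> row : countAddresses(columns)) {
            System.out.println(row.getKey() + " -> " + row.getValue());
        }
    }
}
```

Because the columns are stacked before counting, a value that appears in more than one column (here `a1`) ends up in a single row with its total count.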

慕娘9325324

You should be able to solve this with UNION in Spark SQL. The first branch aliases its column as Address4 so that the outer query can select it:

    spark.sql(
      """
        |SELECT Address4
        |FROM (
        |  SELECT Address1 AS Address4 FROM table
        |  UNION
        |  SELECT Address2 FROM table
        |  UNION
        |  SELECT Address3 FROM table
        |)
      """.stripMargin).show()
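One point to keep in mind with the query above: in SQL, UNION removes duplicate rows, whereas UNION ALL keeps them. If the combined column is going to be counted afterwards, as the question asks, UNION ALL is probably the variant you want. The difference can be sketched in plain Java on in-memory lists; the class `UnionSemantics`, its methods, and the sample values are hypothetical and serve only as an illustration, not as Spark code.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

// Hypothetical illustration of UNION vs UNION ALL semantics, no Spark required.
class UnionSemantics {

    // UNION ALL: concatenate the columns and keep every value, duplicates included.
    static List<String> unionAll(List<List<String>> columns) {
        List<String> out = new ArrayList<>();
        for (List<String> column : columns) {
            out.addAll(column);
        }
        return out;
    }

    // UNION: same concatenation, but each distinct value is kept only once.
    // SQL gives no ordering guarantee; first-seen order is kept here only
    // to make the output predictable.
    static List<String> union(List<List<String>> columns) {
        return new ArrayList<>(new LinkedHashSet<>(unionAll(columns)));
    }

    public static void main(String[] args) {
        List<List<String>> columns = List.of(
                List.of("a1", "a1"),   // hypothetical Address1 values
                List.of("b1", "a1"));  // hypothetical Address2 values
        System.out.println(unionAll(columns)); // [a1, a1, b1, a1]
        System.out.println(union(columns));    // [a1, b1]
    }
}
```

With UNION, every address would later be counted exactly once, so the counts carry no information; with UNION ALL, `a1` is counted three times in this sample.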

慕的地8271018

You can do it with the approach below, which counts each column separately and then unions the results:

    import org.apache.spark.sql.functions.{col, count}
    import spark.implicits._  // needed for toDF on an RDD

    val input_rdd = spark.sparkContext.parallelize(List(("a1", "b1", "c1"), ("a1", "b2", "c1"), ("a1", "b1", "c2"), ("a2", "b2", "c2")))
    val input_df = input_rdd.toDF("Address1", "Address2", "Address3")
    input_df.show()

    +--------+--------+--------+
    |Address1|Address2|Address3|
    +--------+--------+--------+
    |      a1|      b1|      c1|
    |      a1|      b2|      c1|
    |      a1|      b1|      c2|
    |      a2|      b2|      c2|
    +--------+--------+--------+

    // Count the values of each address column and rename to a common schema.
    val out_address1_df = input_df.groupBy("Address1").agg(count(input_df("Address1")).as("count_address1")).
      select(input_df("Address1").as("ADDRESS"), col("count_address1").as("COUNT"))
    //out_address1_df.show()
    val out_address2_df = input_df.groupBy("Address2").agg(count(input_df("Address2")).as("count_address2")).
      select(input_df("Address2").as("ADDRESS"), col("count_address2").as("COUNT"))
    //out_address2_df.show()
    val out_address3_df = input_df.groupBy("Address3").agg(count(input_df("Address3")).as("count_address3")).
      select(input_df("Address3").as("ADDRESS"), col("count_address3").as("COUNT"))

    // Stack the three per-column results into one DataFrame.
    val output_df = out_address1_df.unionAll(out_address2_df).unionAll(out_address3_df)
    output_df.show()

    +-------+-----+
    |ADDRESS|COUNT|
    +-------+-----+
    |     a2|    1|
    |     a1|    3|
    |     b2|    2|
    |     b1|    2|
    |     c1|    2|
    |     c2|    2|
    +-------+-----+
