Spark: subtracting values within a row of the same dataset

Given the following dataset:

| title | start | end |
| bla   | 10    | 30  |

I want to compute the difference between the two numbers (end - start) and put it in a new column, so the result looks like:

| title | time_spent |
| bla   | 20         |

As I saw in that question, the data is of type Dataset<Row>, so I tried:

```java
dataset = dataset.withColumn("millis spent: ", col("end") - col("start")).as("Time spent");
```

I expected this to work, but it doesn't — perhaps because that thread was about DataFrames rather than Datasets, or perhaps because Scala allows this syntax while it is illegal in Java?


眼眸繁星
1 Answer

千巷猫影

You can use the static functions. In short:

```java
import static org.apache.spark.sql.functions.expr;
...
df = df
    .withColumn("time_spent", expr("end - start"))
    .drop("start")
    .drop("end");
```

expr() will evaluate the expression over the values in your columns. Here is the full example with the correct imports. Sorry that most of the example is about creating the dataframe.

```java
package net.jgp.books.sparkInAction.ch12.lab990Others;

import static org.apache.spark.sql.functions.expr;

import java.util.ArrayList;
import java.util.List;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;

/**
 * Use of expr().
 *
 * @author jgp
 */
public class ExprApp {

  /**
   * main() is your entry point to the application.
   *
   * @param args
   */
  public static void main(String[] args) {
    ExprApp app = new ExprApp();
    app.start();
  }

  /**
   * The processing code.
   */
  private void start() {
    // Creates a session on a local master
    SparkSession spark = SparkSession.builder()
        .appName("All joins!")
        .master("local")
        .getOrCreate();

    StructType schema = DataTypes.createStructType(new StructField[] {
        DataTypes.createStructField(
            "title",
            DataTypes.StringType,
            false),
        DataTypes.createStructField(
            "start",
            DataTypes.IntegerType,
            false),
        DataTypes.createStructField(
            "end",
            DataTypes.IntegerType,
            false) });

    List<Row> rows = new ArrayList<Row>();
    rows.add(RowFactory.create("bla", 10, 30));

    Dataset<Row> df = spark.createDataFrame(rows, schema);
    df.show();

    df = df
        .withColumn("time_spent", expr("end - start"))
        .drop("start")
        .drop("end");
    df.show();
  }
}
```
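On the asker's original attempt: `col("end") - col("start")` cannot compile in Java because Java has no operator overloading; in Scala, `-` on a `Column` is sugar for the `minus` method, which Java callers must invoke explicitly. A minimal sketch of that alternative, assuming a `Dataset<Row>` named `dataset` with integer `start` and `end` columns and an active SparkSession:

```java
import static org.apache.spark.sql.functions.col;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

// Column.minus() is the Java-friendly equivalent of Scala's `-` operator,
// so this mirrors the expr("end - start") approach shown in the answer.
Dataset<Row> result = dataset
    .withColumn("time_spent", col("end").minus(col("start")))
    .drop("start")
    .drop("end");
result.show();
```

Both forms produce the same `time_spent` column; `expr()` parses a SQL expression string, while `minus()` stays in the typed Column API.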
