How to read BigQuery into Apache Spark using Java

I want to use Java to read data from a Google BigQuery table into Spark. How do I do this in Java, what dependencies do I need, and what is the resulting data type?

Everything I can find is in Scala, but I need it in Java.


FFIVE
1 Answer

动漫人物

Here is the Java equivalent of the Scala Shakespeare example:

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class JavaShakespeare {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder()
        .appName("spark-bigquery-demo")
        .getOrCreate();

    // Use the Cloud Storage bucket for temporary BigQuery export data used
    // by the connector. This assumes the Cloud Storage connector for
    // Hadoop is configured.
    String bucket = spark.sparkContext().hadoopConfiguration().get("fs.gs.system.bucket");
    spark.conf().set("temporaryGcsBucket", bucket);

    // Load data in from BigQuery.
    Dataset<Row> wordsDF = spark.read().format("bigquery")
        .option("table", "publicdata.samples.shakespeare")
        .load()
        .cache();
    wordsDF.show();
    wordsDF.printSchema();
    wordsDF.createOrReplaceTempView("words");

    // Perform word count.
    Dataset<Row> wordCountDF = spark.sql(
        "SELECT word, SUM(word_count) AS word_count FROM words GROUP BY word");

    // Save the data to BigQuery.
    wordCountDF.write().format("bigquery")
        .option("table", "wordcount_dataset.wordcount_output")
        .save();
  }
}

The data read from BigQuery comes back as a Dataset<Row>. The "bigquery" format is provided by the spark-bigquery-connector, which has to be on the classpath in addition to Spark itself (for example the com.google.cloud.spark:spark-bigquery-with-dependencies artifact built for your Scala version).
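To make clear what the SQL aggregation in the answer computes, here is a minimal plain-Java sketch of the same GROUP BY / SUM logic that runs without Spark or BigQuery. The WordRow class, the sumByWord method, and the sample rows are all hypothetical and exist only for illustration; in the real job Spark performs this step on the Dataset<Row>.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class WordCountSketch {

  /** Hypothetical stand-in for one table row: a word and its count. */
  static final class WordRow {
    final String word;
    final long wordCount;

    WordRow(String word, long wordCount) {
      this.word = word;
      this.wordCount = wordCount;
    }
  }

  /** Plain-Java analogue of: SELECT word, SUM(word_count) FROM words GROUP BY word. */
  static Map<String, Long> sumByWord(List<WordRow> rows) {
    return rows.stream()
        .collect(Collectors.groupingBy(
            row -> row.word,
            TreeMap::new, // sorted keys, only so the printed output is stable
            Collectors.summingLong(row -> row.wordCount)));
  }

  public static void main(String[] args) {
    // Made-up sample rows, not real BigQuery data.
    List<WordRow> rows = Arrays.asList(
        new WordRow("love", 3),
        new WordRow("king", 5),
        new WordRow("love", 4));

    System.out.println(sumByWord(rows)); // prints {king=5, love=7}
  }
}
```

Each distinct word ends up with a single total, which is the shape of the rows that wordCountDF holds before it is written back to BigQuery.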