DataFrame to RDD in Python / Spark / PySpark

I am working with an old PySpark script, and I am trying to convert the DataFrame df to an RDD.


# Import the required libraries
import pandas as pd
from pyspark.sql.types import *
from pyspark.ml.regression import RandomForestRegressor
from pyspark.mllib.util import MLUtils
from pyspark.ml import Pipeline
from pyspark.ml.tuning import CrossValidator, ParamGridBuilder
from pyspark.ml.evaluation import RegressionEvaluator
from pyspark.ml.linalg import Vectors
from pyspark.mllib.fpm import *
from pyspark.sql import SparkSession


spark = (
    SparkSession.builder
    .appName("Python Spark")
    .config("spark.some.config.option", "some-value")
    .getOrCreate()
)


# Read the data
df = pd.read_json("events.json")

df = (
    df.rdd
    .map(lambda x: (x[1], [x[0]]))
    .reduceByKey(lambda x, y: x + y)
    .sortBy(lambda k_v: (k_v[0], sorted(k_v[1], key=lambda x: x[1], reverse=True)))
    .collect()
)

Here is the error output: AttributeError: 'DataFrame' object has no attribute 'rdd'


What am I missing? How do I convert a DataFrame to an RDD?


I have Anaconda 3.6.1 and Spark 2.3.1 installed.


慕妹3242003