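from pyspark.sql import SparkSession

APP_NAME = "DataFrames"
SPARK_URL = "local[*]"

# Start (or reuse) a Spark session with the console progress bar disabled.
spark = SparkSession.builder.appName(APP_NAME) \
    .config('spark.ui.showConsoleProgress', 'false') \
    .getOrCreate()

# Read the CSV with a header row and let Spark infer the column types.
taxi = spark.read.load('/datasets/pickups_terminal_5.csv',
                       format='csv', header='true', inferSchema='true')

# Replace missing values in numeric columns with 0, then expose the DataFrame to SQL as "taxi".
taxi = taxi.fillna(0)
taxi.createOrReplaceTempView("taxi")  # replaces the deprecated registerTempTable

# show() prints the table itself and returns None, so it is not wrapped in print().
spark.sql('SELECT hour, AVG(pickups) FROM taxi '
          'GROUP BY hour ORDER BY AVG(pickups) DESC LIMIT 10').show()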
Untitled
From Alex, 4 Months ago, written in Plain Text, viewed 101 times.
URL http://codebin.org/view/fac01e65