Untitled

From Alex, 4 months ago, written in Plain Text.
URL http://codebin.org/view/fac01e65
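from pyspark.sql import SparkSession

APP_NAME = "DataFrames"
SPARK_URL = "local[*]"

# Start (or reuse) a Spark session on the local master, with console progress bars disabled.
spark = SparkSession.builder.appName(APP_NAME) \
        .master(SPARK_URL) \
        .config('spark.ui.showConsoleProgress', 'false') \
        .getOrCreate()

# Read the CSV with a header row and let Spark infer the column types.
taxi = spark.read.load('/datasets/pickups_terminal_5.csv',
                       format='csv', header='true', inferSchema='true')

# Replace nulls in numeric columns with 0.
taxi = taxi.fillna(0)

# Expose the DataFrame to Spark SQL as "taxi" (registerTempTable is deprecated).
taxi.createOrReplaceTempView("taxi")

# show() prints the table itself and returns None, so it is not wrapped in print().
spark.sql('SELECT hour, AVG(pickups) FROM taxi '
          'GROUP BY hour ORDER BY AVG(pickups) DESC LIMIT 10').show()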
