Example Python for pyspark mapreduce dataframe-283279

CODE EXAMPLE FOR PYTHON

pyspark mapreduce dataframe

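# Rows are read positionally: x[0] = station, x[1] = country, x[2] = temperature.
# The chain is wrapped in parentheses so it can span several lines.
result = (
    df.rdd
    .filter(lambda x: x[1] == "france")  # keep only French stations
    .map(lambda x: (x[0], x[2]))  # (station, temp) pairs
    .mapValues(lambda x: (x, 1))  # (station, (temp, 1)) so readings can be counted
    .reduceByKey(lambda x, y: (x[0] + y[0], x[1] + y[1]))  # per-station (sum, count)
    .mapValues(lambda x: x[0] / x[1])  # average = sum / count
    .sortBy(lambda x: x[1], ascending=False)  # highest average first
    .take(100)  # returns the first 100 (station, average) pairs as a list
)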
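
To see what each stage of the chain produces, here is a minimal plain-Python sketch of the same sum-and-count averaging. It does not use Spark. The sample rows, the (station, country, temperature) layout, and the function name average_temp_by_station are assumptions made for illustration only.

```python
from collections import defaultdict

# Hypothetical sample rows, assumed layout: (station, country, temperature)
rows = [
    ("paris", "france", 20.0),
    ("paris", "france", 24.0),
    ("lyon", "france", 30.0),
    ("berlin", "germany", 15.0),
]


def average_temp_by_station(rows, country="france", limit=100):
    # filter + map + mapValues: keep one country, emit (station, (temp, 1))
    pairs = [(station, (temp, 1)) for station, ctry, temp in rows if ctry == country]

    # reduceByKey: add up the sums and the counts for each station
    totals = defaultdict(lambda: (0.0, 0))
    for station, (temp, count) in pairs:
        running_sum, running_count = totals[station]
        totals[station] = (running_sum + temp, running_count + count)

    # mapValues: sum / count gives the average per station
    averages = [(station, s / c) for station, (s, c) in totals.items()]

    # sortBy(..., ascending=False) followed by take(limit)
    averages.sort(key=lambda kv: kv[1], reverse=True)
    return averages[:limit]


print(average_temp_by_station(rows))  # [('lyon', 30.0), ('paris', 22.0)]
```

Carrying a (sum, count) pair through the reduce step, rather than averaging directly, is what lets partial results be merged in any order, because an average of averages would be wrong when stations have different numbers of readings.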

Source: stackoverflow.com

Tagged: #pyspark #mapreduce #dataframe
