CODE EXAMPLE FOR PYTHON

PySpark MapReduce on a DataFrame

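# Assumed column order (inferred from the comments below): station, country, temperature.
# The parentheses let the method chain span several lines without a syntax error.
top_100 = (
    df.rdd
    .filter(lambda x: x[1] == "france")  # keep only French stations
    .map(lambda x: (x[0], x[2]))  # select (station, temp) pairs
    .mapValues(lambda x: (x, 1))  # pair each temp with a count of 1
    .reduceByKey(lambda x, y: (x[0] + y[0], x[1] + y[1]))  # per-station sum and count
    .mapValues(lambda x: x[0] / x[1])  # average = sum / count
    .sortBy(lambda x: x[1], ascending=False)  # highest average first
    .take(100)  # first 100 results, returned as a list
)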
Source: stackoverflow.com
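
To see what each step of the chain produces without a Spark session, here is a minimal plain-Python sketch of the same sum-and-count pattern. It is an illustration only, not the original author's code: the sample rows, the function name average_temp_by_station, and its parameters are hypothetical, and the (station, country, temperature) layout is the same assumption the snippet's comments suggest.

```python
from collections import defaultdict

# Hypothetical sample rows: (station, country, temperature)
rows = [
    ("paris", "france", 12.0),
    ("paris", "france", 16.0),
    ("lyon", "france", 20.0),
    ("berlin", "germany", 9.0),
]

def average_temp_by_station(rows, country="france", limit=100):
    # filter + map: keep rows for the country as (station, temp) pairs
    pairs = [(station, temp) for station, c, temp in rows if c == country]

    # mapValues + reduceByKey: accumulate a running (sum, count) per station
    totals = defaultdict(lambda: (0.0, 0))
    for station, temp in pairs:
        total, count = totals[station]
        totals[station] = (total + temp, count + 1)

    # mapValues: turn each (sum, count) into an average
    averages = [(station, total / count) for station, (total, count) in totals.items()]

    # sortBy(ascending=False) + take(limit): highest averages first
    averages.sort(key=lambda kv: kv[1], reverse=True)
    return averages[:limit]

result = average_temp_by_station(rows)
print(result)  # [('lyon', 20.0), ('paris', 14.0)]
```

Carrying the count alongside the sum is what makes the reduce step work: averages cannot be merged pairwise, but (sum, count) tuples can, so the division is deferred until every value for a station has been combined.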
Tagged: #pyspark #mapreduce #dataframe