CODE EXAMPLE FOR PYTHON

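groupByKey() groups the values for each key in the RDD into a single sequence. Each key is paired with an iterable of its values, so the example applies mapValues(len) to count them and mapValues(list) to display them.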

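from pyspark import SparkContext

sc = SparkContext.getOrCreate()  # reuse the active context, or start one
rdd = sc.parallelize([("a", 1), ("b", 1), ("a", 1)])
# Count the values grouped under each key
sorted(rdd.groupByKey().mapValues(len).collect())
# [('a', 2), ('b', 1)]
# Materialize each group of values as a list
sorted(rdd.groupByKey().mapValues(list).collect())
# [('a', [1, 1]), ('b', [1])]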
Source: spark.apache.org
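
For readers without a Spark installation, the grouping shown above can be sketched in plain Python. This is only an illustration of the result that groupByKey() followed by mapValues() produces on a small list. It is not how Spark implements the operation, and group_by_key is a hypothetical helper name, not part of the PySpark API.

```python
from collections import defaultdict


def group_by_key(pairs):
    """Hypothetical helper: collect the values of (key, value) pairs per key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


pairs = [("a", 1), ("b", 1), ("a", 1)]
grouped = group_by_key(pairs)

# Equivalent of mapValues(len): count the values under each key
counts = {key: len(values) for key, values in grouped.items()}

print(sorted(grouped.items()))  # [('a', [1, 1]), ('b', [1])]
print(sorted(counts.items()))   # [('a', 2), ('b', 1)]
```

The sketch runs on a single machine and holds every group in memory, whereas the RDD version distributes the pairs across partitions before grouping them.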
 