PySpark: split a DataFrame by rows

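from pyspark.sql import SparkSession
from pyspark.sql.window import Window
from pyspark.sql.functions import monotonically_increasing_id, ntile

spark = SparkSession.builder.getOrCreate()

values = [(str(i),) for i in range(100)]
df = spark.createDataFrame(values, ('value',))

def split_by_row_index(df, num_partitions=4):
    # Assume the DataFrame has no column recording row order, so generate one.
    t = df.withColumn('_row_id', monotonically_increasing_id())
    # monotonically_increasing_id() is increasing but not consecutive across
    # partitions, so use ntile() to bucket the ordered rows into num_partitions groups.
    # Note: a Window with orderBy() but no partitionBy() moves all rows into a
    # single partition, so this approach does not scale to very large DataFrames.
    t = t.withColumn('_partition', ntile(num_partitions).over(Window.orderBy(t._row_id)))
    # Return one DataFrame per bucket, with the helper columns dropped.
    return [t.filter(t._partition == i + 1).drop('_row_id', '_partition')
            for i in range(num_partitions)]

# Each element is the list of Row objects in one chunk (25 rows each here).
[part.collect() for part in split_by_row_index(df)]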
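The note that monotonically_increasing_id is discontinuous across partitions comes from how Spark lays out the generated IDs. Per the PySpark documentation, the partition ID occupies the upper 31 bits and the record number within the partition the lower 33 bits. The plain-Python sketch below (no Spark required; `simulated_increasing_ids` and `partition_sizes` are hypothetical names for illustration) reproduces that layout for two partitions of three rows each. The IDs jump after the first partition instead of counting 0 to 5, which is why the raw IDs cannot be used directly to cut the DataFrame into equal slices.

```python
def simulated_increasing_ids(partition_sizes):
    """Mimic monotonically_increasing_id(): partition ID in the upper 31 bits,
    record number within the partition in the lower 33 bits (plain Python, not Spark)."""
    ids = []
    for partition_id, size in enumerate(partition_sizes):
        for record_number in range(size):
            ids.append((partition_id << 33) + record_number)
    return ids


# Two hypothetical partitions with three rows each.
ids = simulated_increasing_ids([3, 3])
print(ids)  # [0, 1, 2, 8589934592, 8589934593, 8589934594]
```

The IDs are still strictly increasing, so ordering by them preserves row order, which is all the window in split_by_row_index relies on.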
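To see how ntile() divides the ordered rows into groups, here is a hedged plain-Python sketch of standard SQL NTILE(n) bucketing, assuming Spark's ntile() follows the same rule: with num_rows rows and n buckets, every bucket gets num_rows // n rows and the first num_rows % n buckets get one extra. `ntile_buckets` and `bucket_sizes` are hypothetical helpers, not Spark APIs.

```python
def ntile_buckets(num_rows, n):
    """Return the 1-based bucket number for each of num_rows ordered rows,
    following standard SQL NTILE(n) semantics (assumed to match Spark's ntile())."""
    base, remainder = divmod(num_rows, n)
    buckets = []
    for bucket in range(1, n + 1):
        # The first `remainder` buckets each receive one extra row.
        size = base + 1 if bucket <= remainder else base
        buckets.extend([bucket] * size)
    return buckets


def bucket_sizes(num_rows, n):
    """Count how many rows land in each bucket 1..n."""
    assigned = ntile_buckets(num_rows, n)
    return [assigned.count(b) for b in range(1, n + 1)]


print(bucket_sizes(100, 4))  # [25, 25, 25, 25]
print(bucket_sizes(10, 4))   # [3, 3, 2, 2]
```

So with the 100-row example and the default num_partitions=4, each returned DataFrame holds 25 rows; when the row count does not divide evenly, chunk sizes differ by at most one.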