
dask read csv

>>> import dask.dataframe as dd
>>> df = dd.read_csv('s3://bucket/myfiles.*.csv')
>>> df = dd.read_csv('hdfs:///myfiles.*.csv')
>>> df = dd.read_csv('hdfs://namenode.example.com/myfiles.*.csv')

"""
urlpathstring or list
Absolute or relative filepath(s). Prefix with a protocol like s3:// to read from alternative filesystems. To read from multiple files you can pass a globstring or a list of paths, with the caveat that they must all have the same protocol.

blocksizestr, int or None, optional
Number of bytes by which to cut up larger files. Default value is computed based on available physical memory and the number of cores, up to a maximum of 64MB. Can be a number like 64000000 or a string like "64MB". If None, a single block is used for each file.

sampleint, optional
Number of bytes to use when determining dtypes

assume_missingbool, optional
If True, all integer columns that aren’t specified in dtype are assumed to contain missing values, and are converted to floats. Default is False.

storage_optionsdict, optional
Extra options that make sense for a particular storage connection, e.g. host, port, username, password, etc.

include_path_columnbool or str, optional
Whether or not to include the path to each particular file. If True a new column is added to the dataframe called path. If str, sets new column name. Default is False.

**kwargs
Extra keyword arguments to forward to pandas.read_csv().
"""
Source: docs.dask.org
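
Below is a minimal sketch putting several of these parameters together. The glob 'data/part-*.csv' and the dtype mapping are made-up placeholders for illustration, not part of the Dask documentation quoted above.

>>> import dask.dataframe as dd
>>> df = dd.read_csv(
...     'data/part-*.csv',                   # hypothetical local glob
...     blocksize='64MB',                    # cut each file into ~64MB blocks
...     assume_missing=True,                 # unspecified integer columns become floats
...     include_path_column='source_file',   # record the originating file per row
...     dtype={'id': 'int64'},               # extra kwargs are forwarded to pandas.read_csv
... )
>>> df.dtypes    # column metadata is available without reading the data
>>> df.head()    # reads only the first block to show a small preview

For remote storage, credentials usually go through storage_options; for example, with S3 something like storage_options={'key': '...', 'secret': '...'} (the exact keys depend on the filesystem backend in use).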
 
Tagged: #dask #read #csv