PYTHON

Read a SAS file larger than memory in Python

import pandas as pd
import pyreadstat

filename = 'foo.SAS7BDAT'
CHUNKSIZE = 50000
offset = 0

# Read the first chunk and store every column as categorical to save memory
allChunk, _ = pyreadstat.read_sas7bdat(filename, row_limit=CHUNKSIZE, row_offset=offset)
allChunk = allChunk.astype('category')

while True:
    offset += CHUNKSIZE
    # for xpt data, use pyreadstat.read_xpt()
    chunk, _ = pyreadstat.read_sas7bdat(filename, row_limit=CHUNKSIZE, row_offset=offset)
    if chunk.empty:
        break  # an empty chunk means the whole file has been read

    # union_categoricals only accepts categorical input, so convert the new chunk first
    chunk = chunk.astype('category')

    for eachCol in chunk:  # give both frames the same categories, column by column
        colUnion = pd.api.types.union_categoricals([allChunk[eachCol], chunk[eachCol]])
        allChunk[eachCol] = pd.Categorical(allChunk[eachCol], categories=colUnion.categories)
        chunk[eachCol] = pd.Categorical(chunk[eachCol], categories=colUnion.categories)

    # Append the chunk; matching categories keep the columns categorical after concat
    allChunk = pd.concat([allChunk, chunk], ignore_index=True)
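
The category-union step can be tried without a SAS file. Below is a minimal sketch on two small in-memory frames; the helper name append_chunk and the sample column "site" are made up for illustration and are not part of pandas or pyreadstat. It shows that aligning the categories before pd.concat keeps the column categorical, while a plain concat of mismatched categoricals falls back to a non-categorical dtype.

```python
import pandas as pd
from pandas.api.types import union_categoricals


def append_chunk(acc: pd.DataFrame, chunk: pd.DataFrame) -> pd.DataFrame:
    """Append chunk to acc while keeping every column categorical.

    Hypothetical helper for illustration; acc must already be categorical.
    """
    acc = acc.copy(deep=False)        # shallow copy, caller's frame stays untouched
    chunk = chunk.astype("category")  # union_categoricals needs categorical input
    for col in chunk.columns:
        merged = union_categoricals([acc[col], chunk[col]])
        # Re-encode both sides against the same, merged category list
        acc[col] = pd.Categorical(acc[col], categories=merged.categories)
        chunk[col] = pd.Categorical(chunk[col], categories=merged.categories)
    return pd.concat([acc, chunk], ignore_index=True)


# Two stand-in "chunks" with overlapping but different values
first = pd.DataFrame({"site": ["A", "B", "A"]}).astype("category")
second = pd.DataFrame({"site": ["B", "C"]})

combined = append_chunk(first, second)

# Without aligning categories, concat drops the categorical dtype
naive = pd.concat([first, second.astype("category")], ignore_index=True)

print(combined["site"].dtype)  # categorical
print(naive["site"].dtype)     # not categorical
```

Keeping the columns categorical matters most for columns with few distinct values, since each value is then stored once and rows hold only integer codes.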