Arabic text recognition from PDF using Python

import os
from time import strftime

from pypdf import PdfReader  # pypdf is the maintained successor to pyPdf


def check_path(prompt):
    ''' (str) -> str
    Prompts until the provided absolute path exists, then returns it.
    '''
    abs_path = input(prompt)
    while not os.path.exists(abs_path):
        print("\nThe specified path does not exist.\n")
        abs_path = input(prompt)
    return abs_path


print("\n")

folder = check_path("Provide absolute path for the folder: ")

# Collect every PDF under the folder, including subfolders
pdf_paths = []
for root, dirs, files in os.walk(folder):
    for filename in files:
        if filename.endswith('.pdf'):
            # Join with root (not the top folder) so files in subfolders resolve
            pdf_paths.append(os.path.join(root, filename))

print(len(pdf_paths))

for pdf_path in pdf_paths:
    print(pdf_path)
    # Write the text file next to the PDF, with a .txt extension
    txt_path = os.path.splitext(pdf_path)[0] + ".txt"

    content = ""
    # Load PDF into pypdf
    reader = PdfReader(pdf_path)
    # Iterate pages
    for page in reader.pages:
        # Extract text from page and add to content
        content += page.extract_text() + "\n"
    print(strftime("%H:%M:%S"), " pdf  -> txt ")
    # Write as UTF-8 so Arabic characters are preserved
    with open(txt_path, 'w', encoding='utf-8') as f:
        f.write(content)
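
The file discovery, output naming, and UTF-8 writing steps of the script above can be tried on their own, without any PDF library. The following is a minimal Python 3 sketch of those steps only; the helper names `find_pdfs`, `txt_path_for`, and `write_text` are hypothetical and not part of the original script, and the case-insensitive extension match is an assumption added for illustration.

```python
import os


def find_pdfs(folder):
    """Return sorted paths of all .pdf files under folder, including subfolders."""
    found = []
    for root, _dirs, files in os.walk(folder):
        for filename in files:
            # Assumption: match the extension case-insensitively (.pdf, .PDF)
            if filename.lower().endswith(".pdf"):
                # Join with root so files in subfolders get a valid path
                found.append(os.path.join(root, filename))
    return sorted(found)


def txt_path_for(pdf_path):
    """Return the path of the .txt file to write next to pdf_path."""
    return os.path.splitext(pdf_path)[0] + ".txt"


def write_text(txt_path, content):
    """Write content as UTF-8 so non-ASCII text such as Arabic is preserved."""
    with open(txt_path, "w", encoding="utf-8") as f:
        f.write(content)
```

With these helpers, `txt_path_for("/docs/report.pdf")` gives `"/docs/report.txt"`, and the text extracted from each path returned by `find_pdfs(folder)` could be passed to `write_text`. Opening the output file with an explicit `encoding="utf-8"` is what keeps Arabic characters intact regardless of the platform's default encoding.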