PYTHON

extract text from a pdf python

# pip3 install pdfplumber
import pdfplumber

# a single page (the first one)
with pdfplumber.open(r'test.pdf') as pdf:
    first_page = pdf.pages[0]
    print(first_page.extract_text())

# for every page
# with pdfplumber.open(r'test.pdf') as pdf:
#     for page in pdf.pages:
#         print(page.extract_text())

extract text from pdf python

# using PyMuPDF (pip3 install PyMuPDF)
import sys
import fitz  # PyMuPDF

fname = sys.argv[1]  # get document filename from the command line
doc = fitz.open(fname)  # open document
with open(fname + ".txt", "wb") as out:  # open text output (binary mode)
    for page in doc:  # iterate the document pages
        text = page.get_text().encode("utf8")  # get plain text, encoded as UTF-8
        out.write(text)  # write text of page
        out.write(bytes((12,)))  # write page delimiter (form feed 0x0C)
doc.close()
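
The PyMuPDF snippet writes a form feed (0x0C) after each page, so the output file can be split back into pages later. A minimal sketch of that step, assuming the file was written as UTF-8 as above and that the page text itself contains no form feed characters; `split_pages` is a hypothetical helper name, not part of PyMuPDF:

```python
def split_pages(data: bytes) -> list:
    """Split form-feed-delimited bytes into one string per page."""
    pages = data.decode("utf8").split("\x0c")
    # the writer emits a delimiter after every page, so the final piece is empty
    if pages and pages[-1] == "":
        pages.pop()
    return pages


# usage (hypothetical file name):
# with open("test.pdf.txt", "rb") as f:
#     for number, text in enumerate(split_pages(f.read()), start=1):
#         print(number, len(text))
```

Only the standard library is needed here, so the text file can be processed without PyMuPDF installed.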
