SCRIPT & CODE EXAMPLE
 

PYTHON

scrape PDF files linked from a web page

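import io

import requests
from bs4 import BeautifulSoup
from PyPDF2 import PdfReader  # PdfFileReader was removed in PyPDF2 3.0


url = "https://www.geeksforgeeks.org/how-to-extract-pdf-tables-in-python/"
read = requests.get(url, timeout=30)
read.raise_for_status()
html_content = read.content
soup = BeautifulSoup(html_content, "html.parser")

# Collect candidate PDF links from the anchors inside the first <p> tag.
list_of_pdf = set()
first_paragraph = soup.find("p")
anchors = first_paragraph.find_all("a") if first_paragraph else []

for link in anchors:
    href = link.get("href")
    if not href:
        continue  # skip anchors that have no href attribute
    # Swap the last five characters (e.g. ".html") for ".pdf".
    # This assumes every link on the page follows that pattern.
    pdf_link = href[:-5] + ".pdf"
    print(pdf_link)
    list_of_pdf.add(pdf_link)


def info(pdf_path):
    """Download a PDF, print its metadata and page count, return the metadata."""
    response = requests.get(pdf_path, timeout=30)
    response.raise_for_status()

    # Read every field while the in-memory stream is still open.
    with io.BytesIO(response.content) as f:
        pdf = PdfReader(f)
        information = pdf.metadata  # None when the PDF has no document info
        number_of_pages = len(pdf.pages)

        txt = f"""
    Information about {pdf_path}:

    Author: {getattr(information, "author", None)}
    Creator: {getattr(information, "creator", None)}
    Producer: {getattr(information, "producer", None)}
    Subject: {getattr(information, "subject", None)}
    Title: {getattr(information, "title", None)}
    Number of pages: {number_of_pages}
    """
    print(txt)
    return information


for pdf_url in list_of_pdf:
    try:
        info(pdf_url)
    except Exception as exc:  # HTTP error, bad URL, or not a valid PDF
        print(f"Skipping {pdf_url}: {exc}")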
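
The snippet above builds PDF URLs by rewriting the last five characters of each href, which only works when every link follows that pattern. A hedged alternative is to keep only the links that already point at a `.pdf` file and to resolve relative hrefs against the page URL. The sketch below uses the standard library's `html.parser` rather than BeautifulSoup so that it runs without extra packages. `PdfLinkParser`, `extract_pdf_links`, and the sample HTML are hypothetical names and data introduced for illustration; they are not part of the original snippet.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class PdfLinkParser(HTMLParser):
    """Collect absolute URLs of <a href> targets whose path ends in .pdf."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.pdf_links = set()

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return  # anchor without a usable href
        absolute = urljoin(self.base_url, href)  # resolve relative links
        # Compare the path only, so query strings such as ?download=1 are ignored.
        if urlparse(absolute).path.lower().endswith(".pdf"):
            self.pdf_links.add(absolute)


def extract_pdf_links(html, base_url):
    """Return the set of PDF URLs found in an HTML string."""
    parser = PdfLinkParser(base_url)
    parser.feed(html)
    return parser.pdf_links


# Hypothetical sample page, for illustration only.
sample_html = """
<p><a href="/docs/report.pdf">Report</a>
<a href="https://example.com/paper.PDF?download=1">Paper</a>
<a href="/about.html">About</a>
<a>No href</a></p>
"""
links = extract_pdf_links(sample_html, "https://example.com/index.html")
print(sorted(links))
```

The returned set could be passed to `info()` in place of `list_of_pdf`, assuming the page links to PDF files directly.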