
PYTHON

python arabic web scraping

import sys
from bs4 import BeautifulSoup
import requests
import pandas as pd

pagesToGet = 1
filename = "NEWS.csv"

# collect results across all pages, so later pages do not overwrite earlier ones
texts = []
links = []

for page in range(1, pagesToGet + 1):
    print('processing page :', page)
    url = 'http://norumors.net/?post_type=rumors/?page=' + str(page)
    print(url)

    # the request can fail (network error, timeout, bad status), so wrap it in try-except
    try:
        response = requests.get(url, timeout=30)
        response.raise_for_status()                           # treat HTTP 4xx/5xx responses as errors too
    except Exception:
        error_type, error_obj, error_info = sys.exc_info()    # get the exception information
        print('ERROR FOR LINK:', url)                         # print the link that caused the problem
        print(error_type, 'Line:', error_info.tb_lineno)      # print the error type and the line that raised it
        continue                                              # skip this page and move on to the next one

    soup = BeautifulSoup(response.text, 'html.parser')

    Statement = soup.find("div", attrs={'class': 'row d-flex'})
    if Statement is None:                                     # the expected container is missing on this page
        print('No results container found for:', url)
        continue
    divs = Statement.find_all("div", attrs={'class': 'col-lg-4 col-md-4 col-sm-6 col-xs-6'})

    for div in divs:
        txt = div.find("img", attrs={'class': 'rumor__thumb'})
        lnk = div.find("a", attrs={'class': 'rumor--archive'})
        if txt is None or lnk is None:                        # skip cards missing the image or the link
            continue
        texts.append(txt['alt'])
        links.append(lnk['href'])

# write everything once, after all pages are processed; utf-8 keeps the Arabic text intact
data = pd.DataFrame(list(zip(texts, links)), columns=['Statement', 'Link'])
data.to_csv(filename, encoding='utf-8', index=False)
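
The script above saves the scraped Arabic text as plain UTF-8. Some spreadsheet programs, Excel in particular, may only recognize a CSV as UTF-8 when the file starts with a byte-order mark. The sketch below shows one way to handle that with the standard library's `utf-8-sig` codec. The sample rows and the `example.com` links are made up for illustration, and `write_rows` / `read_rows` are hypothetical helper names, not part of the original snippet.

```python
import csv
import os
import tempfile

# Hypothetical sample rows standing in for scraped (statement, link) pairs.
rows = [
    ("خبر تجريبي", "http://example.com/1"),
    ("إشاعة أخرى", "http://example.com/2"),
]


def write_rows(path, rows):
    """Write rows to a CSV file as UTF-8 with a leading byte-order mark."""
    # 'utf-8-sig' prepends the BOM; newline='' lets the csv module manage line endings.
    with open(path, "w", encoding="utf-8-sig", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["Statement", "Link"])
        writer.writerows(rows)


def read_rows(path):
    """Read the CSV back, skipping the header row."""
    # Reading with 'utf-8-sig' strips the BOM so it does not leak into the first cell.
    with open(path, "r", encoding="utf-8-sig", newline="") as f:
        reader = csv.reader(f)
        next(reader)  # skip the header
        return [tuple(row) for row in reader]


# Write to a temporary directory so the example does not touch an existing NEWS.csv.
path = os.path.join(tempfile.mkdtemp(), "NEWS.csv")
write_rows(path, rows)
round_trip = read_rows(path)
print(round_trip == rows)
```

With pandas, the same effect should be available by passing `encoding='utf-8-sig'` to `to_csv` in place of `'utf-8'`.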