Python 2.7 Tutorial Pt 17

In this video tutorial I show you how to create dynamic websites with Python. This is the quickest and easiest way to start using Python in much the same way as PHP is used.

I previously taught you how to scrape websites for information in the tutorial Python Website Scraping. Here I’ll show you how to scrape a site and then display that information dynamically on the web.

Leave any questions or comments below, and here are all of my previous Python video tutorials:


Here is all the Code from the Video


from urllib import urlopen
from BeautifulSoup import BeautifulSoup
import re

# The CGI protocol requires a blank line after the headers, so the \n
# below gives the Content-type header its terminating empty line
print "Content-type: text/html\n"
print "<html><head>"
print "<title>Huffington Post Feed</title>"
print "</head><body>"
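As a quick aside on that Content-type line: under CGI, everything before the first blank line is treated as HTTP headers, and the HTML body starts only after it. A minimal sketch of the response shape (written in Python 3, so the print syntax differs from the 2.7 code above):

```python
# Sketch of a CGI response: headers, a mandatory blank line, then the body.
header = "Content-type: text/html"
body = "<html><head><title>Huffington Post Feed</title></head><body></body></html>"
response = header + "\n\n" + body  # the empty line ends the header block
print(response)
```

If the blank line is missing, the server cannot tell where the headers end and will typically report a malformed-header error instead of serving the page.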

def cleanHtml(i):
    i = str(i) # Convert the Beautiful Soup Tag to a string
    bS = BeautifulSoup(i) # Pass the string to Beautiful Soup to strip out html

    # Find all of the text between paragraph tags and strip out the html
    i = bS.find('p').getText()

    # Strip ampersand codes and WATCH:
    i = re.sub('&\w+;', '', i)
    i = re.sub('WATCH:', '', i)
    return i
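The old `BeautifulSoup` 3 module used above doesn't exist on Python 3 (the project became `bs4`), so here is a hedged sketch of the same cleanup using only the standard library's `html.parser`; the sample snippet at the bottom is made up for illustration:

```python
import re
from html.parser import HTMLParser

class FirstParagraphText(HTMLParser):
    """Collects the text inside the first <p>...</p> pair."""
    def __init__(self):
        super().__init__(convert_charrefs=False)
        self.depth = 0     # how many <p> tags deep we are
        self.done = False  # set once the first <p> closes
        self.parts = []
    def handle_starttag(self, tag, attrs):
        if tag == 'p' and not self.done:
            self.depth += 1
    def handle_endtag(self, tag):
        if tag == 'p' and self.depth:
            self.depth -= 1
            if self.depth == 0:
                self.done = True
    def handle_data(self, data):
        if self.depth and not self.done:
            self.parts.append(data)

def clean_html(fragment):
    # Same steps as cleanHtml above: pull the first paragraph's text,
    # then strip entity codes and the feed's "WATCH:" prefix.
    parser = FirstParagraphText()
    parser.feed(fragment)
    text = ''.join(parser.parts)
    text = re.sub(r'&\w+;', '', text)
    return re.sub(r'WATCH:', '', text)

print(clean_html('<p>WATCH: Breaking <b>big</b> news</p>').strip())
```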

def cleanHtmlRegex(i):
    i = str(i)
    regexPatClean = re.compile(r'<[^<]*?/?>')
    i = regexPatClean.sub('', i)
    # Strip ampersand codes and WATCH:
    i = re.sub('&\w+;', '', i)
    return re.sub('WATCH:', '', i)
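As a quick check of the pattern above: it matches anything from a `<` up to the next `>`, so it removes the tags themselves but leaves the text between them alone. The sample string here is made up for illustration:

```python
import re

# The same tag-stripping pattern used in cleanHtmlRegex
regexPatClean = re.compile(r'<[^<]*?/?>')

sample = '<p>Hello <a href="#">world</a> &amp; WATCH: more</p>'
stripped = regexPatClean.sub('', sample)        # remove the tags
stripped = re.sub(r'&\w+;', '', stripped)       # drop entities like &amp;
stripped = re.sub(r'WATCH:', '', stripped)      # drop the feed's WATCH: prefix
print(stripped)
```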

# Copy all of the content from the provided web page
webpage = urlopen('').read()

# Grab everything that lies between the title tags using a REGEX
titleString = '<title>(.*)</title>'
patFinderTitle = re.compile(titleString)

# Grab the link to the original article using a REGEX
origArticleLink = '<link rel.*href="(.*)" />'
patFinderLink = re.compile(origArticleLink)
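To see what those two patterns capture, here is a made-up feed fragment with one tag per line, which mirrors how the real feed is laid out (the greedy `(.*)` works here because `.` does not match newlines, so each match stays on its own line):

```python
import re

# Invented sample data, standing in for the downloaded feed
feed = '''<title>Feed Title</title>
<link rel="alternate" href="http://example.com/one" />
<title>Story One</title>
<link rel="alternate" href="http://example.com/two" />'''

patFinderTitle = re.compile('<title>(.*)</title>')
patFinderLink = re.compile('<link rel.*href="(.*)" />')

print(patFinderTitle.findall(feed))  # the captured titles, in order
print(patFinderLink.findall(feed))   # the captured article URLs
```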

# Store all of the titles and links found in 2 lists
findPatTitle = re.findall(patFinderTitle,webpage)
findPatLink = re.findall(patFinderLink,webpage)

# Build a list of article indexes 2 through 15, skipping the first
# couple of feed entries
listIterator = []
listIterator[:] = range(2,16)

# Print out the results to screen
for i in listIterator:
print “<h3>” + findPatTitle[i]+ “</h3><br />” # The title
print “<a href='” + findPatLink[i] + “‘>Original Article</a><br />” # The link to the original article

articlePage = urlopen(findPatLink[i]).read() # Grab all of the content from original article

divBegin = articlePage.find(‘<div>’) # Locate the div provided
article = articlePage[divBegin:(divBegin+1000)] # Copy the first 1000 characters after the div

# Pass the article to the Beautiful Soup Module
soup = BeautifulSoup(article)

# Tell Beautiful Soup to locate all of the p tags and store them in a list
paragList = soup.findAll(‘p’)

# Print all of the paragraphs to screen
for i in paragList:
# i = cleanHtml(i)
i = cleanHtmlRegex(i)
print i + “<br />”

print “<br /></body></html>”
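To show the whole pipeline end to end without a network connection (and on Python 3, since `urllib.urlopen` and the old `BeautifulSoup` import are 2.x-only), here is a hedged sketch that swaps the live Huffington Post pages for invented inline strings; every URL and HTML fragment below is an assumption, not real feed data:

```python
import re

# Stand-in for urlopen(feedUrl).read()
feed = '''<title>Feed</title>
<link rel="alternate" href="http://example.com/story-1" />
<title>Story 1</title>'''

# Stand-in "article pages" keyed by URL, replacing urlopen(link).read()
pages = {
    'http://example.com/story-1':
        '<html><body><div><p>First paragraph.</p><p>WATCH: clip</p></div></body></html>',
}

patFinderTitle = re.compile('<title>(.*)</title>')
patFinderLink = re.compile('<link rel.*href="(.*)" />')
tagPat = re.compile(r'<[^<]*?/?>')

titles = patFinderTitle.findall(feed)
links = patFinderLink.findall(feed)

html_out = []
for title, link in zip(titles[1:], links):  # skip the feed's own title
    html_out.append('<h3>%s</h3>' % title)
    article = pages[link]
    divBegin = article.find('<div>')            # locate the div
    chunk = article[divBegin:divBegin + 1000]   # first 1000 chars after it
    for p in re.findall(r'<p>.*?</p>', chunk):  # each paragraph
        text = re.sub('WATCH:', '', tagPat.sub('', p))
        html_out.append(text.strip())

print('\n'.join(html_out))
```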

9 Responses to “Python 2.7 Tutorial Pt 17”

  1. Thanks for the information in this tutorial on Python website scraping. Thanks for the post.

  2. mike says:

    Do you know how to view it in Firefox or Chrome on Ubuntu 10.10?
    I installed Apache and I can see the file, but the browser will only download it. I can’t view it.

  3. Jeremi Bauer says:


    Once again I find myself at a loss. I get the code, but I cannot get the program to run on localhost. I have spent the last 6 hours trying to figure out how you got sudo chmod 755 to work in the terminal. I keep getting chmod: No such file or directory. I have been literally everywhere looking for answers. Anything???

    • admin says:

      I have to go back and dig up that code. Maybe the Huff Post changed their tags? I’ll send you a message when I pull together the code

    • admin says:

      I found a link to the original code in this zipped archive. Does that help?

      • Jeremi Bauer says:


        You’re awesome, seriously.

        Unfortunately I’ll have to keep trying to figure this out. I have followed every one of your tutorials, and this one (17) just stumped me when you instructed us to go into the terminal and change the permissions. Is there a special location where I should be saving these .py documents? For some reason my localhost doesn’t know where to look (probably I don’t know where to point my localhost, lol).

        Thanks though for taking the time.


        • admin says:

          Are you on a Mac or PC? PCs don’t have a terminal like Macs and Linux do. I think the problem is that this tutorial requires a bunch of things to stay the same: BeautifulSoup, the Huffington Post, PyDev, Eclipse, etc. All of them have changed since I made this. If you understand the basic concepts, though, you should be able to make the changes. I also show how to scrape websites with PHP. I’ll have to revisit and correct this tutorial based on the changes. Sorry about that.
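For anyone stuck on the permissions step discussed in this thread: a CGI script has to be executable by the web server, and `chmod: No such file or directory` usually means the path given to `chmod` is wrong, so run it from the directory that actually contains the script. Here is a hedged sketch in a throwaway directory; the filename `feed.py` is an assumption, not the tutorial's actual file:

```shell
# Work in a temp dir so this demo touches nothing real
cd "$(mktemp -d)"

# A stand-in CGI script (its contents don't matter for the permission step)
printf '%s\n' '#!/usr/bin/env python3' 'print("Content-type: text/html\n")' > feed.py

# 755 = owner read/write/execute, everyone else read/execute,
# which is what the web server needs to run the script
chmod 755 feed.py
ls -l feed.py
```

On a stock Apache install the script usually also has to live in the configured CGI directory (often /usr/lib/cgi-bin) or in a directory where CGI execution is enabled, or the browser will download the file instead of running it.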
