Tuesday, June 11, 2013

Fill Out Web Forms - Python

I've been doing quite a bit lately with games and pygame, so I thought I'd take a break and do something useful. Actually, I found an online drawing that I wanted to win, and it allowed you to submit an entry every day over the course of a month. Obviously I didn't want to log in every day, and even if I did have that level of enthusiasm, my memory would fail at some point and I'd miss a few days here or there.

So the only real solution was to script something and set up a cron job to do it for me every morning. Now, I'd never written a script to crawl web pages or anything like that, so I did a bit of digging and found there are actually quite a few options available for Python. Since I was more concerned with getting something together quickly, I went for what looked to be the easiest option: Mechanize.

Mechanize is a library that can open a URL, find the forms on the page, and then let you fill in form fields, follow links, or submit forms. Perfect for what I wanted.

Here's an example script:

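#!/usr/bin/env python
from mechanize import Browser

# Log file; its modification time shows when the last submission happened.
fname = "submitResults.txt"

br = Browser()
# Send HTTP requests through a proxy.
br.set_proxies({"http": "5.5.5.5:3128"})
br.open("http://www.site.com/giveaway/")
# nr is zero-based, so nr=1 selects the second form on the page.
br.select_form(nr=1)
br['fvFirst'] = 'firstName'
br['fvLast'] = 'lastName'
br.submit()
br.close()

# Write the log line only after submit() has returned without raising.
with open(fname, 'w') as f:
    f.write('submitted to site\n')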

Nothing special here. I found out the form and field names ahead of time by loading up the Python interpreter, creating the browser object, and opening the page. Then I called the browser's forms() method to list the available forms, and hard-coded the form and field names into the script. If you really wanted to automate it better, you could set up a function or two to handle that step for you: have it parse the fields and look for keywords to make sure you fill in the right values.
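A rough sketch of that idea, not something from the original script: the helper names find_form_index and list_forms are made up for illustration, and the code assumes each form object exposes a controls list whose items have name and type attributes, as mechanize's form objects do.

```python
def find_form_index(forms, wanted_fields):
    """Return the index of the first form that has every wanted field name.

    Returns None when no form matches. The index can be passed to
    br.select_form(nr=index).
    """
    wanted = set(wanted_fields)
    for index, form in enumerate(forms):
        names = {control.name for control in form.controls if control.name}
        if wanted <= names:
            return index
    return None


def list_forms(url):
    """Print every form on the page with its control types and names."""
    # Imported here so find_form_index can be used without mechanize installed.
    from mechanize import Browser

    br = Browser()
    br.open(url)
    for index, form in enumerate(br.forms()):
        print("form %d: name=%r" % (index, form.name))
        for control in form.controls:
            print("    %s %r" % (control.type, control.name))
    br.close()
```

With something like this, the hard-coded nr=1 could become br.select_form(nr=find_form_index(br.forms(), ['fvFirst', 'fvLast'])), after checking that the result is not None.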

As you may have noticed, I have it write a little message to a text file. That was more for me to validate that cron was running every day: the text file's modification date shows the last time the script ran. If I had thought about it, I would've added some exception handling that wrote errors to the log as well.
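One way that error logging might look, as a minimal sketch rather than what the script actually did: run_and_log is a hypothetical helper, and it appends timestamped lines instead of overwriting the file so that earlier runs stay visible.

```python
import datetime


def run_and_log(task, log_path):
    """Call task() and append a timestamped success or error line to log_path.

    Returns True if task() finished without raising, False otherwise.
    """
    stamp = datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    try:
        task()
    except Exception as exc:
        # Record what went wrong instead of letting cron swallow it.
        with open(log_path, "a") as log:
            log.write("%s ERROR %s: %s\n" % (stamp, type(exc).__name__, exc))
        return False
    with open(log_path, "a") as log:
        log.write("%s submitted to site\n" % stamp)
    return True
```

The form-filling steps would go into a function of their own, say submit_entry (also a made-up name), and the script would then call run_and_log(submit_entry, "submitResults.txt").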

In any case, this worked out great. I gave the script its own directory and set up a cron job to run it every morning around 1am. Still waiting on the results for the giveaway, but at least I know I've entered every day possible. And I learned how to use the mechanize library for web scraping and automated form submissions.
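One thing to watch with cron: jobs usually start in the user's home directory, so a relative file name like submitResults.txt may not end up next to the script. A small sketch of one way around that follows; log_path_for is a hypothetical helper, and the crontab line and paths in the comments are made up for illustration.

```python
import os

# Hypothetical crontab entry for a 1am daily run (paths are assumptions):
#   0 1 * * * /usr/bin/env python /home/user/giveaway/submit.py


def log_path_for(script_file, filename="submitResults.txt"):
    """Return filename placed in the same directory as script_file."""
    script_dir = os.path.dirname(os.path.abspath(script_file))
    return os.path.join(script_dir, filename)
```

Inside the script, log_path_for(__file__) would give a log path that stays the same no matter which directory cron starts in.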

-Newt
