Tuesday, August 12, 2008

Parsing Google Search Results for URLs

I had a visitor on the blog yesterday who was interested in my old Google search results parsing code. Apparently he found Goohackle's Gooparser and liked what it does (it returns just a clean list of URLs from Google's search results), but he wanted a way around its limit on the number of results and a way to pull the URLs directly into Excel.


I cleaned up the old code for him. The result is a nice, clean worksheet that returns just the URLs from Google search results pages. You can set how many results per page and how many pages you'd like to return. Be careful, though: if you hit Google too many times too quickly, they'll start to block you, asking you for your API key, etc. I've built in a random pause (3 to 7 seconds) between pages in a multi-page query in an effort to prevent this, but no guarantees. Google is a pretty sophisticated outfit, so they may still be able to detect automated harvesting of data that doesn't go through the API.
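
For anyone who wants to see the idea outside of Excel, here is a rough sketch of the paging-and-pausing logic in Python. This is not the code in the worksheet (that lives in VBA), just an illustration under some assumptions: the function names (build_page_urls, random_pause, harvest) are made up for this example, the q, num and start query parameters are my assumption about how the results URL is put together, and the actual download and URL extraction is left to a fetch function you supply yourself.

```python
import random
import time
from urllib.parse import urlencode


def build_page_urls(query, results_per_page=10, pages=3):
    """Return one search URL per results page (hypothetical helper).

    Assumes the results URL accepts q (query), num (results per page)
    and start (offset of the first result) as query parameters.
    """
    urls = []
    for page in range(pages):
        params = {
            "q": query,
            "num": results_per_page,
            "start": page * results_per_page,
        }
        urls.append("https://www.google.com/search?" + urlencode(params))
    return urls


def random_pause(low=3.0, high=7.0, sleep=time.sleep):
    """Sleep for a random time between low and high seconds; return the delay."""
    delay = random.uniform(low, high)
    sleep(delay)
    return delay


def harvest(query, fetch, results_per_page=10, pages=3, sleep=time.sleep):
    """Collect result URLs across pages, pausing randomly between pages.

    fetch is a caller-supplied function that takes a page URL and
    returns the list of result URLs found on that page.
    """
    found = []
    for i, url in enumerate(build_page_urls(query, results_per_page, pages)):
        if i > 0:
            # Pause only between pages, never before the first request.
            random_pause(sleep=sleep)
        found.extend(fetch(url))
    return found
```

Passing fetch and sleep in as arguments keeps the sketch easy to try without touching the network: hand it a fake fetch that returns canned URLs and a sleep that does nothing, and you can check the paging logic before pointing it at anything real.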

Geoff, enjoy the file. Insiders, the file's on the way.



1 comment:

Camped said...

Would you be willing to undertake something that's never been done? I don't know of any bookmarking service with a spreadsheet view. Can you pull Google Bookmarks into Excel and save changes back? I know GMarks does this in Firefox, but the author doesn't value the power of a spreadsheet interface.