Saturday, October 21, 2006

Advanced Excel + VBA Web Queries & Parsing Tips

Those of you who've been reading my blog for some time know that one of the techniques I utilize quite often in completing projects is directly querying web pages using the "MSXML.ServerXMLHTTP" object.


Set objHTTP = CreateObject("MSXML2.ServerXMLHTTP")
       URL = ""
       objHTTP.Open "GET", URL, False
       objHTTP.setRequestHeader "User-Agent", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)"
       objHTTP.send ("")

This works great for many applications. By simply returning the source HTML code for the web page we've requested...

SourceHTMLText = objHTTP.responseText

...and then using regular expressions to identify specify strings to parse out, you can return specific pieces of text or numeric values from the page.

Say you want to create a spreadsheet containing all of the pertinent data for a specific stock, say Google, from Yahoo Finance. You can simply set the URL you wish to parse to "". Then return the source HTML and parse out the current price, daily change, volume, high, low, etc., returning each of these values to specific cells in the spreadsheet.

Extending this, say you have an entire invenstment portfolio in a spreadsheet and you want to run a script that runs each hour (or day or week or every ten minuntes) and updates each stocks' data. You can simply run multiple queries, making the URL you request a dynamic variable, changing out the ticker symbol in the URL's query string.

But what if you encounter a website that doesn't use URLs with query strings? What if it instead does its database queries "in the background" and then returns the results on a more generic URL?

Example: A website, say, has a search function that allows visitors to enter a zip code and returns all gas stations in that zip code and their respective fuel prices. Instead of using URLs with query strings that we could access directly -- say, -- it returns results on a URL like this: What do we do? How did we get here?

Take a look at the source HTML for the page on which the search form is located, and look for the code relating specifically to the seach form (where you type in the zip code and click the "search" or "submit" button), which could look something like this:

< form action="zipcodesearch.asp" method="POST" name="zcsearch" id="zcsearch" >
< input type="text" size="7" maxlength="5" name="zipcode" value="" >
< input type="submit" value="Search" name="btntype" >
< / form >

(Note that I aded extra spaces after the "<"s and before the ">"s above to avoid errors when posting this article on this Blogger blog.)

The file specified in the "form action" tag -- "zipcodesearch.asp" -- is the specific page of this website that your zip code is "posted" to in order to generate the page. So what can we do with it?

We can use VBA to post a zip code value to this URL, which will generate the results page, then parse it. It might look something like this:

Set objHTTP = CreateObject("MSXML2.ServerXMLHTTP")
       URL = ""
       ZipcodeToPost = "zipcode=12345"
       objHTTP.Open "POST", URL, False
       objHTTP.setRequestHeader "lastCached", Now()
       objHTTP.setRequestHeader "Content-Type", "application/x-www-form-urlencoded"
       objHTTP.send ZipcodeToPost

Now you can simply return the source HTML for the resulting page, and parse out the strings containing the data you're after, again using regular expressions.

Easy right?

There are countless ways this sort of technique can be used. Compiling stock price data, weather data, product pricing data, even replicating a portion of a website's underlying database, regardless of what sort of data it contains. I've used this technique to pull down data relating to thousands of U.S. patents, for example. If you know of a website that contains data you're interested in, this technique can likely be used to automatically pull that data down into Excel for you.

Questions? Comments? Concerns? Trying it out but getting stuck? Use my Meembo Me chat feature and shoot me a quick question. And remember, you can always just ask me to do it as a $50 Project.




Anonymous said...

what about multiple values to pass to the form?

Excel_Geek said...

Well, you'd string together the input names followed by an equal sign (=) followed by the input value, separated by ampersands (&). Example:

DataToPost = "firstname=excel&lastname=geek"

Then you'd use the same...

objHTTP.send DataToPost

command to post it.

Anonymous said...

I want to get Hotels rate from online booking website using excel web query,i am facing some issues , the results data (tables) are not fixed , i can't link it to my main table ,every time i refresh it change the data location .also ,how can i store this dynamic data so i can generate statistics reports.

David Harris said...

Dear Excel_Geek,

Do you know where I might find more information about (1) how to become good at making web queries in Excel and (2) parameter lists for various websites, please?

This site shows a parameter list ("special tags") for Yahoo and I have tried but failed to find similar lists for MSN and other websites. Any ideas would be gratefully received.

Have you thought about writing an e-book on web queries? There is a surprising lack of information about how to do more than just establish a basic connection. I think the world might quickly become an even better place if advanced web query stuff was more widely known about.

Thank you.

David Harris

Ramon Andrews said...

These are indeed great Excel tips. A lot of users are surely grateful of these useful techniques. Thanks for sharing.

Clarissa Lucas said...

A very informative post on advanced Excel techniques. Those who frequently use such are surely grateful. Thank you for sharing.