Wednesday, February 6, 2013

PowerShell Script DIY: Downloading HTML code from the internet using PowerShell


So you need to pull information from a webpage?  Just the raw HTML can be useful, especially if you are scouring for specific or variable strings on regularly accessed sites, like, say, a printer's toner levels.
First we start with the variable:

$web

I prefer to name my variables something easy to remember, given the context of the script.  We then tell the variable what it is:

$web = New-Object Net.WebClient

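This creates a new Net.WebClient object and stores it in the variable; in simpler terms, $web is now a client that can talk to web servers.  To see what it can do, we pipe it through Get-Member, which lists the object's methods and properties (this step is for exploring and is not required for the download to work):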

$web | Get-Member
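
Get-Member on a WebClient prints a long list. As a small sketch, the output can be narrowed down to the download methods; the Download* wildcard filter is my own choice for illustration, not something the script needs:

```powershell
# Create the client, then list only the methods whose names start with "Download"
$web = New-Object Net.WebClient
$methods = $web | Get-Member -MemberType Method -Name 'Download*'
# Show just the method names (DownloadString is the one used in this post)
$methods | Select-Object -ExpandProperty Name
```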

I usually like to name all of my variables at once, so we can go ahead and name the output file (see my blog post on Creating unique temp file names).  The HH in the format string is the 24-hour clock, so a morning run and an afternoon run cannot end up with the same name:

$foo = "c:\Temp\foo-$(Get-Date -format 'yyyy-MM-dd HH-mm-ss').log"
New-Item "$foo" -itemType File
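
New-Item will fail if the target folder does not exist. The following is a minimal sketch of one way to guard against that, assuming you are happy to write under the system temp folder; the foo-demo folder name and the $dir and $stamp variables are hypothetical and only there for illustration:

```powershell
# Hypothetical output folder under the system temp path
$dir = Join-Path ([System.IO.Path]::GetTempPath()) 'foo-demo'
# Create the folder only if it is missing
if (-not (Test-Path $dir)) { New-Item $dir -ItemType Directory | Out-Null }
# Timestamp with a 24-hour clock (HH) so names do not repeat between AM and PM
$stamp = Get-Date -Format 'yyyy-MM-dd HH-mm-ss'
$foo = Join-Path $dir "foo-$stamp.log"
# Create the empty output file
New-Item $foo -ItemType File | Out-Null
```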

So far we should be about here:

$web = New-Object Net.WebClient
$web | Get-Member
$foo = "c:\Temp\foo-$(Get-Date -format 'yyyy-MM-dd HH-mm-ss').log"
New-Item "$foo" -itemType File

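Creating $foo up front with New-Item is optional, because Set-Content will create the file if it does not exist, but the c:\Temp folder itself must already be there or you will not have anywhere to put the HTML code that you pulled.
Now we pull the actual data.  This is accomplished by calling the DownloadString method on the $web object, which fetches a page and returns its contents as a string: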

$web.DownloadString

We need to give it a source.  To do this we wrap the URL in quotes, and then in parentheses.  The quotes make the URL a string, so special characters are taken literally, and the parentheses hold the argument that is passed to the method.  The source data should look something like this:

("http://192.168.1.1/thisismywebsite.html")

Attach this to the $web.DownloadString and you get:

$web.DownloadString("http://192.168.1.1/thisismywebsite.html")
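
DownloadString throws an error if the page cannot be reached. As a hedged sketch of how that could be handled, the call can be wrapped in try/catch; the Get-PageHtml function name and its $Url parameter are hypothetical names of my own, not part of the original script:

```powershell
# Hypothetical helper: returns the page contents, or $null if the download fails
function Get-PageHtml {
    param([string]$Url)
    $client = New-Object Net.WebClient
    try {
        # Fetch the resource and hand back its contents as a string
        return $client.DownloadString($Url)
    }
    catch {
        # Report the failure instead of stopping the whole script
        Write-Warning "Download of $Url failed: $($_.Exception.Message)"
        return $null
    }
    finally {
        # Release the client whether or not the download worked
        $client.Dispose()
    }
}
```

With this in place, Get-PageHtml "http://192.168.1.1/thisismywebsite.html" gives you either the HTML or $null, which you can check before writing anything to disk.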

What about the output?  You can’t just leave the string hanging!  You need to pipe it into the file named by your $foo variable.  We accomplish this with a Set-Content command:

| set-content $foo

To produce the complete command:

$web.DownloadString("http://192.168.1.1/thisismywebsite.html") | set-content $foo

Run it all together and you get:

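# Create the web client used to fetch the page
$web = New-Object Net.WebClient
# Optional: list the client's methods and properties while exploring
$web | Get-Member
# Build a timestamped output file name and create the empty file
$foo = "c:\Temp\foo-$(Get-Date -format 'yyyy-MM-dd HH-mm-ss').log"
New-Item "$foo" -itemType File
# Download the page and write the HTML to the file
$web.DownloadString("http://192.168.1.1/thisismywebsite.html") | Set-Content $foo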
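
Once you have the HTML, the point is usually to find one value in it, like the toner level mentioned at the top. Here is a rough sketch using the -match operator; the sample HTML, the "Black Toner" label, and the pattern are all made up for illustration, since every printer page is laid out differently:

```powershell
# Hypothetical sample of what a printer status page might contain
$html = '<tr><td>Black Toner</td><td>42%</td></tr>'
# Capture the digits that follow the label; $Matches[1] holds the captured group
if ($html -match 'Black Toner</td><td>(\d+)%') {
    $level = [int]$Matches[1]
    "Black toner at $level%"
}
else {
    "Toner level not found"
}
```

In a real script you would replace the sample string with the downloaded page and adjust the pattern to match your printer's markup.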

As always, please feel free to give us a better way to do things with the comments below!




Sources: rob_campbell@centraltechnology.net posted online
