Thursday, February 7, 2013

PowerShell Script DIY: Removing Special Characters



     Tired of those pesky special characters infiltrating the data you pulled from unreliable sources? I have a solution for you!  In this post we are going to use PowerShell's -replace operator together with the [regex]::Escape method.  Confused? I aim for you not to be.  First we need to set our location, if you are not already there.  I always default to C:\temp:


                Set-Location "C:\temp"

Define your variable, let’s say foo, with:


$foo = "foo.txt"


Remember to encapsulate the value in quotations and put a $ before foo. Now, we need to pull the data from the file named in $foo with:


                Get-Content $foo

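This reads the file named in $foo and sends it down the pipeline one line at a time; $foo itself still holds only the file name.  But we need to do something with the data, so we use the pipe operator |.  A space between the variable and the pipe operator is not required, but it keeps the command readable: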


                Get-Content $foo |

The pipe operator tells PowerShell to pass the output of one command to the next command as its input; you can keep adding pipes as long as each command produces output for the next one to work on.  I will go into more detail in a future post.  In the meantime, we need to pipe it into ForEach-Object.  This command says just that: "For each object, do this".  So, your command should look like this:

                Get-Content $foo | ForEach-Object

But of course you need to tell it what to do, and that is where we need the {} to encapsulate your next set of commands:

                Get-Content $foo | ForEach-Object {

We need the special pipeline variable $_ in front of our first -replace operator.  Only the first one needs it, because each later -replace works on the result of the one before it.  $_ represents the current pipeline object, which here is the current line of the file:

                Get-Content $foo | ForEach-Object {
                                $_ -replace
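
To see what $_ holds before any replacing happens, here is a minimal sketch that is separate from the script being built. The $lines and $lengths names and the sample strings are made up for illustration:

```powershell
# Hypothetical sample data; any collection of strings works.
$lines = 'first?', 'second%', 'third'

# ForEach-Object runs the script block once per incoming item,
# and $_ holds the item currently being processed.
$lengths = $lines | ForEach-Object { $_.Length }

# One number per input string.
$lengths
```

Each string goes through the script block on its own, which is why a -replace on $_ in our script cleans one line of the file at a time.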

Now we tell it to treat the special character ? as a literal character by passing it to [regex]::Escape, and to replace it with nothing:

                Get-Content $foo | ForEach-Object {
                                $_ -replace [regex]::Escape('?'), '' `

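Note that the special character is quoted, and the quoted string sits inside the parentheses of [regex]::Escape.  After the closing parenthesis come a comma and two single quotes, followed by a `.  The two single quotes matter: '' is not one double quote " but an empty string, so each match is replaced with nothing.  The ` is the line-continuation character; it tells PowerShell that the command continues on the next line, and it must be the very last character on its line.  You can follow this up with as many characters as you want to replace, such as % :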

                Get-Content $foo | ForEach-Object {
                                $_ -replace [regex]::Escape('?'), '' `
                                     -replace [regex]::Escape('%'), ''
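
It can help to look at what [regex]::Escape actually hands to -replace. The following is a small sketch with made-up variable names ($escapedQuestion, $escapedPercent, $cleaned); it is not part of the script we are building:

```powershell
# Escape puts a backslash in front of characters that have a special regex meaning.
$escapedQuestion = [regex]::Escape('?')

# % has no special meaning in a regular expression, so Escape leaves it unchanged.
$escapedPercent = [regex]::Escape('%')

# With the escaped pattern, -replace removes the literal character.
$cleaned = 'what?' -replace $escapedQuestion, ''

$escapedQuestion
$escapedPercent
$cleaned
```

A bare ? is a quantifier in a regular expression, which is why it cannot be used as a pattern on its own. Escaping % is harmless but not strictly needed.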

In order to finish this off you need to finish the encapsulation with a }:

                Get-Content $foo | ForEach-Object {
                                $_ -replace [regex]::Escape('?'), '' `
                                     -replace [regex]::Escape('%'), ''
                }

Then use the pipe operator to send the result to your output file.  You can use either:

                Set-Content $foolog
                                or
                Out-File foo.log

For the purpose of this exercise we will use Out-File foo.log, since it keeps the example short.  Set-Content works as well, with either a literal path or a variable such as $foolog that you define first.  The last line should look like this:

                } | Out-File foo.log
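
For comparison, here is a rough sketch of the same pipeline ending in Set-Content, which can take the output path directly. The file names and the use of the temp directory are assumptions chosen so the sketch does not touch C:\temp:

```powershell
# Hypothetical input and output files in the user's temp directory.
$inPath  = Join-Path ([System.IO.Path]::GetTempPath()) 'foo-demo.txt'
$outPath = Join-Path ([System.IO.Path]::GetTempPath()) 'foo-demo.log'

# Create a small input file to work on.
Set-Content -Path $inPath -Value 'a?b%c'

# Same replacement chain as in the post, written to a different file than it reads from.
Get-Content $inPath | ForEach-Object {
    $_ -replace [regex]::Escape('?'), '' -replace [regex]::Escape('%'), ''
} | Set-Content -Path $outPath

# Read the result back to check it.
$written = Get-Content $outPath
$written
```

Both Out-File and Set-Content accept an -Encoding parameter if the encoding of the output file matters to whatever reads it next.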


The final result should be:

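Set-Location "C:\temp"
$foo = "foo.txt"
# Read foo.txt line by line, strip ? and % from each line, write the result to foo.log.
Get-Content $foo | ForEach-Object {
    # The backtick must be the last character on its line; the final -replace needs none.
    $_ -replace [regex]::Escape('?'), '' `
       -replace [regex]::Escape('%'), ''
} | Out-File foo.log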


If you feed it a file containing:


“%?!?!?!?!?!abc?!?!?!?!?!%”


You should get the result:


“!!!!!abc!!!!!”
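
If you want to check the replacement without creating any files, the same chain can be applied to a string in memory. This is only a sketch; $sample and $result are illustrative names:

```powershell
# The sample line from above, without the surrounding quotation marks.
$sample = '%?!?!?!?!?!abc?!?!?!?!?!%'

# Same two replacements as in the script, applied to a single string.
$result = $sample -replace [regex]::Escape('?'), '' -replace [regex]::Escape('%'), ''

$result
```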



You can swap "?" and "%" for any other characters you need to remove, though for characters with no special meaning in a regular expression there are simpler ways, since they do not need [regex]::Escape. As usual, I invite comments; I have learned in the past that there is always a better, or different, way to do things!
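
As one example of a different way, here are two alternatives sketched against the same sample string. The variable names are made up, and neither is the approach used in the script above:

```powershell
$sample = '%?!?!?!?!?!abc?!?!?!?!?!%'

# Inside a character class, ? and % are treated literally,
# so a single -replace can remove both characters.
$viaClass = $sample -replace '[?%]', ''

# The .NET String.Replace method does plain text replacement with no regex involved.
$viaMethod = $sample.Replace('?', '').Replace('%', '')

# A character with no special regex meaning needs no escaping at all.
$noDashes = 'a-b-c' -replace '-', ''

$viaClass
$viaMethod
$noDashes
```

The character class keeps everything in one pattern, while String.Replace sidesteps escaping entirely. Which one reads better is mostly a matter of taste.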

