Regular Expressions in JavaScript – thanks MDN docs

The MDN docs are so useful. I wrote out some points by hand as reinforcement for me. I know JavaScript regex is basically the same as most other regex engines; it just amused me to do this. I do find it hard to get stylus handwriting looking quite like the real thing, hence the greyness of the photographed page.

 

Microsoft Word: regex… sort of

So why don’t I just do my text manipulation in, say, Notepad++, and then paste the result into Word? Because the document I’m working on already has colour and formatting that I don’t want to lose.

What I’ve found (I might be a bit off, but I’ve proven enough to understand it for my needs):

Test 1 – note the different cases for [simpler]:

Simpler test by far
Next line of the simpler test

Search string [simple]

  • Non-wild card test finds both entries (meaning it is case-insensitive)
  • Wild card (see below for the settings used) test finds [simpler] only, meaning a wild card search is case-sensitive

Search string [simple*]

  • Non-wild card test finds neither entry (i.e. it treats the [*] in [simple*] as a literal)
  • Wild card test finds [simpler] only, meaning a wild card search is case-sensitive

Test 2 – wild card search including paragraph marker

Control Line 1
Simpler test by far
Control Line 2
Next line of the Simpler test

Search string [simple*^13]

In Word wild card searches, this means “find ‘simple’ anywhere on a line of text, then select from there up to and including the end of the paragraph” (^13 is the paragraph marker). If the search text and the found text differ at all in case, the match is ignored.

Stating the obvious, a non-wild card search finds neither entry (i.e. it treats the [*^13] in [simple*^13] as a literal… which it will never find).

A wild card search finds all entries up to and including the paragraph marker.


Test 3 – wild card replace (sure it’s the same text basically, but this now just makes it closer to the use-case I have)

Control Line 1
UniqueId test by far
Control Line 2
Next line of the UniqueId test

I want to delete the lines that contain [UniqueId], anywhere on the line, to give:

Control Line 1
Control Line 2

 

These options achieve that. I was a bit puzzled that this is enough to remove the whole of line 4, when I might have expected it to keep [Next line of the]. But good enough for my needs.
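For comparison only: where the text is plain and formatting is not a concern, the same "drop every line containing [UniqueId]" step can be sketched in PowerShell. This is not what Word does internally. The sample lines are the ones from Test 3, and the variable names are made up for the illustration.

```powershell
# Sample lines from Test 3, held in an array so the sketch needs no input file.
$lines = @(
    'Control Line 1'
    'UniqueId test by far'
    'Control Line 2'
    'Next line of the UniqueId test'
)

# -cnotmatch is the case-sensitive form of -notmatch. Applied to an array,
# it returns only the elements that do NOT match the pattern.
$kept = $lines -cnotmatch 'UniqueId'

$kept
# Control Line 1
# Control Line 2
```

Because the test is applied per line, the whole of line 4 goes, wherever [UniqueId] sits on it.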

PowerShell: dates and -replace

I sometimes need to do a naive replacement of dates from a typical UK format (dd/mm/yyyy) to a typical US format (mm/dd/yyyy).

For example, I might want to take

The Battle of Hastings took place on 14/10/1066 between Harold and William. The war between the 2 was triggered by Edward the Confessor’s death on 12/06/1066.

and convert it to:

The Battle of Hastings took place on 10/14/1066 between Harold and William. The war between the 2 was triggered by Edward the Confessor’s death on 06/12/1066.

or to…

The Battle of Hastings took place on 10-14-1066 between Harold and William. The war between the 2 was triggered by Edward the Confessor’s death on 06-12-1066.

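This line achieves the second transformation, assuming the text is in history.txt. Note the single quotes around the replacement string: they stop PowerShell from expanding $2, $1 and $3 as its own variables before the regex engine sees them.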

$(Get-Content -Path .\history.txt) -replace '([0-9]{2})/+([0-9]{2})/+([0-9]{4})','$2-$1-$3'

Gist.

Credit to Don Jones for a similar example.
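As a small sketch of the first transformation (keeping the slashes), here is the same pattern with a different replacement string. It works on an in-memory string rather than history.txt so it can be run as-is. The variable names are only for illustration.

```powershell
$text    = 'The Battle of Hastings took place on 14/10/1066 between Harold and William.'
$pattern = '([0-9]{2})/+([0-9]{2})/+([0-9]{4})'

# Single-quoted replacement: $2, $1 and $3 reach the regex engine untouched.
$usSlashes = $text -replace $pattern, '$2/$1/$3'

# A double-quoted replacement needs each $ escaped with a backtick, otherwise
# PowerShell tries to expand $2, $1 and $3 as its own variables first.
$usDashes = $text -replace $pattern, "`$2-`$1-`$3"

$usSlashes   # The Battle of Hastings took place on 10/14/1066 between Harold and William.
$usDashes    # The Battle of Hastings took place on 10-14-1066 between Harold and William.
```

This stays naive in the same way as the original line: it swaps any two-digit pairs around slashes without checking that they form a real date.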

Notepad++: As this is essentially regex after all, I will mention that you can do the same kind of thing in Notepad++, and indeed in anything that has a regex capability. In Notepad++ the “Find what” pattern is exactly as for PowerShell, and the “Replace with” differs a bit: [\2/\1/\3]

PowerShell: regex

cd C:\Sandbox\PowerShell
# This returns True
$regex = [regex]"xxx"
$regex.IsMatch("xxx")
# Both of the next two match, because u? means the u is optional
$regex = [regex]"colou?r"
$regex.IsMatch("color")
$regex.IsMatch("colour")
# Both match, because (i|e) means exactly one of those alternatives must appear at that position
$regex = [regex]"art(i|e)fact"
$regex.IsMatch("artefact")
$regex.IsMatch("artifact")
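Two things the examples above don't show, sketched here on the same patterns. A [regex] object is case-sensitive by default, while PowerShell's -match operator is not. IsMatch also succeeds on a match anywhere in the input unless the pattern is anchored. The test strings are made up for the illustration.

```powershell
$regex = [regex]'colou?r'

# The .NET regex class is case-sensitive unless told otherwise...
$regex.IsMatch('Colour')                  # False
# ...whereas the -match operator ignores case by default.
'Colour' -match 'colou?r'                 # True

# The inline option (?i) switches the pattern to case-insensitive.
$insensitive = [regex]'(?i)colou?r'
$insensitive.IsMatch('Colour')            # True

# IsMatch looks for the pattern anywhere in the input.
$regex.IsMatch('discoloured')             # True
# Anchoring with ^ and $ requires the whole string to match.
$anchored = [regex]'^colou?r$'
$anchored.IsMatch('discoloured')          # False
```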

Validating or parsing dates with a regex alone is far more involved. My advice would be to write a .NET class or method and/or use a date TryParse, with the challenging bit often being the locale and the many possible formats.
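A minimal sketch of the TryParse route, assuming the input is always dd/MM/yyyy and using the invariant culture so the result does not depend on the machine's locale. The function name Convert-UkDateToUs is hypothetical, and real input would probably need more formats than this one.

```powershell
# Hypothetical helper: returns the date as MM/dd/yyyy, or $null if the text
# is not a valid dd/MM/yyyy date.
function Convert-UkDateToUs {
    param([string]$Value)

    $culture = [System.Globalization.CultureInfo]::InvariantCulture
    $parsed  = [datetime]::MinValue

    # TryParseExact returns $false instead of throwing on bad input.
    $ok = [datetime]::TryParseExact(
        $Value,
        'dd/MM/yyyy',
        $culture,
        [System.Globalization.DateTimeStyles]::None,
        [ref]$parsed)

    if ($ok) {
        return $parsed.ToString('MM/dd/yyyy', $culture)
    }
    return $null
}

Convert-UkDateToUs '14/10/1066'   # 10/14/1066
Convert-UkDateToUs '31/02/2020'   # nothing: 31 February is not a date
```

Unlike the -replace approach, this rejects text that only looks like a date, at the cost of handling one value at a time.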

Removing blank lines (not strictly PowerShell)

[\n\r]+$
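The pattern above is meant for an editor's find and replace. For completeness, here is a rough PowerShell sketch of the same goal. It is not a translation of that pattern. The sample data and variable names are assumptions for the illustration.

```powershell
# Line-by-line: keep only lines containing at least one non-whitespace character.
$lines    = @('Control Line 1', '', 'Control Line 2', '   ', 'Control Line 3')
$nonBlank = $lines | Where-Object { $_ -match '\S' }
$nonBlank.Count                   # 3

# Single string: collapse any run of line breaks into one line break.
$text      = "Line 1`r`n`r`n`r`nLine 2"
$collapsed = $text -replace '(\r?\n)+', "`r`n"
$collapsed
# Line 1
# Line 2
```

The first form also drops lines that contain only spaces. The second only removes lines that are completely empty.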