Wednesday, July 4, 2012

Regex Toolkit - Some Important Regular Expressions for Data Cleansing

Here are some regex patterns I found myself using over and over in find-and-replace operations for data cleansing.

Match starts with
^pattern


Match ends with
pattern$
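
As a rough sketch of how the two anchors behave in a replace, here is a small Python example using the standard `re` module. The sample names and the "Mr." / "Jr." patterns are made up for illustration and are not from any particular dataset.

```python
import re

# Hypothetical sample values, for illustration only.
rows = ["Mr. John Smith", "John Smith Jr.", "Smith"]

# ^ anchors the pattern to the start: strip "Mr. " only when it leads the value.
no_prefix = [re.sub(r"^Mr\.\s+", "", r) for r in rows]

# $ anchors the pattern to the end: strip " Jr." only when it closes the value.
no_suffix = [re.sub(r"\s+Jr\.$", "", r) for r in rows]

print(no_prefix)  # ['John Smith', 'John Smith Jr.', 'Smith']
print(no_suffix)  # ['Mr. John Smith', 'John Smith', 'Smith']
```

Values without the anchored text are left untouched, which is usually what you want when cleaning a column in bulk.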

Match last two words only
\s(\S+)\s(\S+)$

Match last word only
\s(\S+)$
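
The two "last word" patterns can be tried out like this in Python. This is only a sketch; the company name is an invented example, and the capture groups are read with `group()` / `groups()`.

```python
import re

# Hypothetical sample value.
name = "Acme Trading Company Ltd"

# Capture the last word (the text after the final whitespace character).
last_word = re.search(r"\s(\S+)$", name).group(1)

# Capture the last two words as separate groups.
last_two = re.search(r"\s(\S+)\s(\S+)$", name).groups()

# Used in a replace, the same pattern drops the last word entirely.
without_last = re.sub(r"\s(\S+)$", "", name)

print(last_word)     # Ltd
print(last_two)      # ('Company', 'Ltd')
print(without_last)  # Acme Trading Company
```

Note that both patterns need a whitespace character before the captured word, so a single-word value will not match at all.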

Match www prefix (e.g. the "ww2." in ww2.blah.com)
^w{2,3}[0-9]{0,5}[.]
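
A quick way to see what the www pattern does is to use it to strip the prefix from a list of host names. The hosts below are placeholders I made up, and the name `WWW_PREFIX` is just for this sketch.

```python
import re

# Same pattern as above: 2-3 "w" characters, up to 5 digits, then a literal dot.
WWW_PREFIX = re.compile(r"^w{2,3}[0-9]{0,5}[.]")

# Hypothetical host names for illustration.
hosts = ["www.example.com", "ww2.blah.com", "example.com", "web.example.com"]

stripped = [WWW_PREFIX.sub("", h) for h in hosts]

print(stripped)
# ['example.com', 'blah.com', 'example.com', 'web.example.com']
```

"web.example.com" is left alone because the pattern requires at least two consecutive "w" characters at the start.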

Match Case Insensitive
(?i)pattern
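
In Python the inline `(?i)` flag has to sit at the very start of the pattern, and `flags=re.IGNORECASE` is an equivalent alternative. A small sketch, with invented company names and an "inc" suffix chosen only as an example:

```python
import re

# Hypothetical sample values with inconsistent casing.
values = ["ACME inc", "Acme Inc.", "acme INC"]

# Inline flag: (?i) makes the whole pattern case insensitive.
cleaned = [re.sub(r"(?i)\s+inc\.?$", "", v) for v in values]

# Same result using the flags argument instead of the inline flag.
cleaned_flag = [re.sub(r"\s+inc\.?$", "", v, flags=re.IGNORECASE) for v in values]

print(cleaned)       # ['ACME', 'Acme', 'acme']
print(cleaned_flag)  # ['ACME', 'Acme', 'acme']
```

Other tools may spell the option differently, so check how your editor or database handles case-insensitive matching.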


Multiple Spaces
\s+     (one or more whitespace characters, including tabs and newlines)
\s{2,}  (two or more whitespace characters)
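
Here is a minimal Python sketch of collapsing whitespace with these two patterns. The input strings are invented. Keep in mind that `\s` also matches tabs and newlines, not just the space character.

```python
import re

# Hypothetical messy value with leading, trailing, and repeated whitespace.
raw = "  New   York \t City  "

# Replace every run of whitespace with one space, then trim the ends.
collapsed = re.sub(r"\s+", " ", raw).strip()

# \s{2,} only touches runs of two or more, leaving single spaces as they are.
only_runs = re.sub(r"\s{2,}", " ", "a  b c")

print(collapsed)  # New York City
print(only_runs)  # a b c
```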

Match text in parentheses (inclusive)
\(.*\)
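
One thing to watch with this pattern is that `.*` is greedy, so when a value contains more than one pair of parentheses it matches from the first "(" to the last ")". The sketch below shows the difference; the product text is made up, and the `[^)]*` variant is my own suggested alternative, not part of the original list.

```python
import re

# Hypothetical value with two parenthesized notes.
text = "Widget (blue) large (discontinued)"

# Greedy: removes everything from the first "(" through the last ")".
greedy = re.sub(r"\(.*\)", "", text)

# Alternative: [^)]* stops at the first ")", so each pair is removed separately.
# The leading \s* also eats the space in front of each pair.
per_pair = re.sub(r"\s*\([^)]*\)", "", text)

print(repr(greedy))    # 'Widget '
print(repr(per_pair))  # 'Widget large'
```

If each value has at most one pair of parentheses, the original pattern works fine as is.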

Match text followed by a literal period (escape the dot)
text\.
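
The backslash matters because an unescaped `.` matches any character. A short Python sketch of the difference, using an invented version string; `re.escape` is a standard library helper that does the escaping for you.

```python
import re

# Hypothetical input: one real "v1.0" and one look-alike.
sample = "v1.0 v1x0"

# Unescaped dot matches any character, so both tokens match.
unescaped = re.findall(r"v1.0", sample)

# Escaped dot matches only a literal period.
escaped = re.findall(r"v1\.0", sample)

# re.escape builds the escaped pattern from a plain string.
auto_escaped = re.findall(re.escape("v1.0"), sample)

print(unescaped)     # ['v1.0', 'v1x0']
print(escaped)       # ['v1.0']
print(auto_escaped)  # ['v1.0']
```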
