
This post collects several code examples that use regular expressions in R for string processing. It is one part of A Beginner Guide to String Pattern Matching in R by Regular Expression.


Examples

Chunk 1: Remove every occurrence of the word "a" from a string vector

```r
library(stringr)
# define a pattern for the word 'a'
pattern <- "\\ba\\b"

collection <- c("A book", "What a beautiful day!")
collection.lowercase <- tolower(collection)

# remove 'a' from the collection
str_replace_all(collection.lowercase, pattern, "")
```
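If the original capitalization should be kept, one option is to make the match itself case-insensitive instead of lowercasing the input. The following is a minimal sketch, assuming a stringr version that provides `regex()` and `str_squish()`; the variable names `removed` and `cleaned` are illustrative.

```r
library(stringr)

collection <- c("A book", "What a beautiful day!")

# Match the standalone word "a" in either case, without lowercasing the input
pattern <- regex("\\ba\\b", ignore_case = TRUE)

removed <- str_replace_all(collection, pattern, "")

# Removing a word leaves stray spaces behind; str_squish() trims both ends
# and collapses internal runs of whitespace into a single space
cleaned <- str_squish(removed)
print(cleaned)
```

With this approach the untouched words keep their case, so "What" stays capitalized.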

Chunk 2: Removing stop words

Stop words are common, non-content words that are usually dropped before starting a text analysis project. To remove specific stop words, we can create a pattern that uses the OR operator (|) to combine all the stop words.

The following sample snippet removes certain stop words using the gsub function.

```r
collection.raw <- c("This is good news", "are you okay", "on the desk")

words <- c("this", "is", "are", "a", "the", "he", "you", "your", "and", "on")

# Join the stop words with the OR operator and wrap the group in word
# boundaries; collapsing in one step avoids a trailing "|" in the pattern
pattern <- paste0("\\b(", paste(words, collapse = "|"), ")\\b")

replacement <- ""

# ignore.case = TRUE lets the lowercase stop words match "This" as well
collection.new <- gsub(pattern, replacement, collection.raw, ignore.case = TRUE)
```
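One detail worth seeing in isolation is the role of the word boundary `\b` when stop words are combined with the OR operator. The sketch below uses a made-up sentence and a two-word list (both illustrative, not part of the example data) to compare a pattern without boundaries against one with them.

```r
# Illustrative inputs, chosen so the stop words also occur inside longer words
words <- c("is", "on")
text <- "This is on the horizon"

# Without word boundaries, "is" and "on" also match inside other words
without_boundaries <- paste(words, collapse = "|")

# With word boundaries, only whole words match
with_boundaries <- paste0("\\b(", without_boundaries, ")\\b")

loose <- gsub(without_boundaries, "", text, ignore.case = TRUE)
strict <- gsub(with_boundaries, "", text, ignore.case = TRUE)

print(loose)   # "This" and "horizon" are damaged
print(strict)  # only the standalone "is" and "on" are removed
```

The leftover runs of spaces in both results can be collapsed afterwards with a whitespace pattern.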

Chunk 3: Replace runs of whitespace characters with a single space

```r
library(stringr)

data <- c("extremely  simple

          ",
          "Double    double",
          "no
          double")

# one or more whitespace characters (spaces, tabs, newlines)
whitespace <- "\\s{1,}"

replacement <- " " # replace whitespace with a single space

str_replace_all(data, whitespace, replacement)
```
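The same idea can be sketched in base R, without stringr. This version assumes the POSIX class `[[:space:]]` as the whitespace pattern and adds `trimws()` as an extra step, because collapsing alone still leaves one space where a string started or ended with whitespace. The names `collapsed` and `trimmed` are illustrative.

```r
data <- c("extremely  simple\n\n          ",
          "Double    double",
          "no\n          double")

# Collapse every run of whitespace (spaces, tabs, newlines) into one space
collapsed <- gsub("[[:space:]]+", " ", data)

# Drop the single space that can remain at the start or end of a string
trimmed <- trimws(collapsed)
print(trimmed)
```

Whether to trim depends on the task; for tokenizing or comparing strings, trailing spaces are usually unwanted.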
