Before reading this post, read the previous part.
A regular expression is a pattern that describes a set of strings.
1. Escaping Characters
\: Escapes a metacharacter in a regular expression. The metacharacters are:
$ * + . ? [ ] ^ { } | ( ) \
Because \ itself must be escaped inside an R string, R requires a double backslash to escape these metacharacters, for example \\?.
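As a minimal sketch of the double-backslash rule, the snippet below compares escaped and unescaped patterns with base R's grepl(). The sample strings and variable names are made up for illustration.

```r
# Hypothetical sample strings, chosen only for illustration.
x <- c("Is it done?", "cost: $5", "a.b", "plain text")

# "\\?" in R source is the regex \? and matches a literal question mark.
has_question <- grepl("\\?", x)

# "\\$" matches a literal dollar sign rather than the end-of-string anchor.
has_dollar <- grepl("\\$", x)

# An unescaped "." matches any character, so every non-empty string matches.
any_char <- grepl(".", x)

# "\\." matches only a literal dot.
literal_dot <- grepl("\\.", x)

# fixed = TRUE treats the pattern as a plain string, so no escaping is needed.
literal_dot_fixed <- grepl(".", x, fixed = TRUE)
```

When the whole pattern is meant literally, fixed = TRUE is often simpler than escaping each metacharacter by hand.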
2. Special Metacharacters
\\t: Tab
\\n: New line
\\v: Vertical tab
\\f: Form feed
\\r: Carriage return
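A small sketch of how the tab and newline metacharacters might be used with base R functions. The input string and variable names are invented for this example.

```r
# Hypothetical string containing a real tab and a real newline.
s <- "name\tvalue\nid\t42"

# The regex \t (written "\\t" in R) matches the tab character.
tab_found <- grepl("\\t", s)

# Split into rows on the newline, then split each row into fields on the tab.
rows <- strsplit(s, "\\n")[[1]]
fields <- strsplit(rows, "\\t")

# Replace every tab with a single space.
spaced <- gsub("\\t", " ", s)
```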
3. Quantifiers
Quantifiers specify how many times the preceding pattern should occur.
*: matches at least 0 times.
+: matches at least 1 time.
?: matches at most 1 time.
{n}: matches exactly n times.
{n,}: matches at least n times.
{,m}: matches at most m times.
{n,m}: matches between n and m times.
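The quantifiers can be compared side by side on one small vector. This is only a sketch: the test strings are made up, and the patterns are wrapped in ^ and $ so that the quantifier has to account for the whole string rather than any part of it.

```r
# Hypothetical strings: "a" followed by zero to four "b"s.
x <- c("a", "ab", "abb", "abbb", "abbbb")

zero_or_more <- grepl("^ab*$", x)      # "b" may occur any number of times
one_or_more  <- grepl("^ab+$", x)      # at least one "b"
at_most_one  <- grepl("^ab?$", x)      # zero or one "b"
exactly_two  <- grepl("^ab{2}$", x)    # exactly two "b"s
two_or_more  <- grepl("^ab{2,}$", x)   # at least two "b"s
two_to_three <- grepl("^ab{2,3}$", x)  # between two and three "b"s

# Keep only the strings with two or three "b"s.
x[two_to_three]
```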
Exercise
4. Position Anchors
^: Start of the string.
$: End of the string.
\\b: Empty string at either edge of a word.
\\B: Empty string, not at the edge of a word.
\\<: Beginning of a word.
\\>: End of a word.
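To see how the anchors differ, the sketch below looks for "cat" at various positions using base R's default regex engine. The strings and variable names are illustrative only.

```r
# Hypothetical strings with "cat" in different positions.
x <- c("cat", "concat", "category", "the cat sat")

starts_with <- grepl("^cat", x)         # "cat" at the start of the string
ends_with   <- grepl("cat$", x)         # "cat" at the end of the string
whole_word  <- grepl("\\bcat\\b", x)    # "cat" as a complete word
inside_word <- grepl("\\Bcat", x)       # "cat" not starting at a word edge
word_start  <- grepl("\\<cat", x)       # "cat" at the beginning of a word
word_end    <- grepl("cat\\>", x)       # "cat" at the end of a word
```

Note that "category" starts a word with "cat" but is not matched by the whole-word pattern, because the trailing \\b requires a word edge right after the "t".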
5. Characters and Operators
.: Any single character except \n.
[...]: A permitted character list. Use - inside the brackets to specify a range of characters.
[^...]: An excluded character list. Matches any character except those inside the square brackets.
|: An OR operator; matches the pattern on either side of the |.
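The following sketch tries each of these operators on a few short words. The vector and the variable names are invented for illustration.

```r
# Hypothetical three-letter words plus one two-letter string.
x <- c("cat", "cot", "cut", "ct", "dog")

any_middle <- grepl("^c.t$", x)        # any single character between c and t
listed     <- grepl("^c[ao]t$", x)     # only "a" or "o" allowed in the middle
excluded   <- grepl("^c[^ao]t$", x)    # anything except "a" or "o" in the middle
ranged     <- grepl("^[a-c]", x)       # first letter in the range a to c
either     <- grepl("^(cat|dog)$", x)  # exactly "cat" or exactly "dog"
```

The parentheses in the last pattern group the alternatives so that ^ and $ apply to both sides of the |.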
6. Character Classes
[[:digit:]] or \\d or [0-9]: digits 0 1 2 3 4 5 6 7 8 9
\\D or [^0-9]: non-digits
[[:lower:]] or [a-z]: lower-case letters
[[:upper:]] or [A-Z]: upper-case letters
[[:alpha:]] or [[:lower:][:upper:]] or [A-Za-z]: alphabetic characters
[[:alnum:]] or [[:alpha:][:digit:]] or [A-Za-z0-9]: alphanumeric characters
\\w or [[:alnum:]_] or [A-Za-z0-9_]: word characters, i.e. alphanumeric characters (0-9, a-z, A-Z) and the underscore (_)
\\W or [^A-Za-z0-9_]: non-word characters
[[:xdigit:]] or [0-9A-Fa-f]: hexadecimal digits (base 16) 0 1 2 3 4 5 6 7 8 9 A B C D E F a b c d e f
[[:blank:]]: space and tab
[[:space:]] or \\s: space characters: tab, newline, vertical tab, form feed, carriage return, space
\\S: non-space characters
[[:punct:]]: punctuation characters ! " # $ % & ' ( ) * + , - . / : ; < = > ? @ [ \ ] ^ _ ` { | } ~
[[:graph:]] or [[:alnum:][:punct:]]: graphical (human readable) characters
[[:print:]]: printable characters, i.e. [[:graph:]] plus the space character
[[:cntrl:]]: control characters, like \n or \r etc.
Avoid the range [A-z] as a shorthand for letters: in ASCII order it also covers [ \ ] ^ _ and `.
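A brief sketch of a few character classes in use, based on base R's regmatches(), gregexpr(), gsub(), and strsplit(). The sentence and variable names are made up for the example.

```r
# Hypothetical sentence mixing letters, digits, and punctuation.
s <- "Order #42: 3 apples, 7 pears!"

# Extract every run of digits.
digits <- regmatches(s, gregexpr("[[:digit:]]+", s))[[1]]

# Extract the upper-case letters.
upper <- regmatches(s, gregexpr("[[:upper:]]", s))[[1]]

# Remove all punctuation characters.
no_punct <- gsub("[[:punct:]]", "", s)

# Split the cleaned string on runs of whitespace.
words <- strsplit(no_punct, "[[:space:]]+")[[1]]

# \\d is shorthand for [[:digit:]]; both find a digit here.
same_result <- grepl("\\d", s) == grepl("[[:digit:]]", s)
```

The POSIX classes such as [[:digit:]] must sit inside an outer pair of brackets, which is why they appear with double square brackets.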
Exercise:
Continue to Part 3.