Before reading this post, read the previous part.
A regular expression is a pattern that describes a group of strings.
1. Escaping Characters
\: Escapes metacharacters in a regular expression, i.e.
$ * + . ? [ ] ^ { } | ( ) \
As \ itself needs to be escaped in an R string, R requires a double backslash to escape these metacharacters, like \\?.
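To make the double-backslash rule concrete, here is a minimal sketch using base R's grepl(). The sample strings and variable names are made up for illustration.

```r
# Hypothetical sample strings, chosen only for illustration.
x <- c("Is it done?", "3.14", "a+b", "plain text")

# In R source, "\\?" is the two-character regex \? (a literal question mark).
has_question <- grepl("\\?", x)

# "\\." matches a literal dot; an unescaped "." would match any character.
has_dot <- grepl("\\.", x)

# fixed = TRUE treats the pattern as a plain string, so "+" needs no escaping.
has_plus <- grepl("+", x, fixed = TRUE)

print(has_question)  # TRUE FALSE FALSE FALSE
print(has_dot)       # FALSE TRUE FALSE FALSE
print(has_plus)      # FALSE FALSE TRUE FALSE
```

When the whole pattern is meant literally, fixed = TRUE is often simpler than escaping every metacharacter.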
2. Special Metacharacters
- \\t: Tab
- \\n: New line
- \\v: Vertical tab
- \\f: Form feed
- \\r: Carriage return
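As a small illustration of these escapes, the sketch below splits and rewrites a string that contains a real tab and a real newline. The string itself is a made-up example, and the default regex engine is assumed.

```r
# Hypothetical string: "\t" and "\n" here are the actual tab and newline characters.
s <- "col1\tcol2\nrow2"

# "\\n" is the regex escape for a newline: split the string into lines.
lines <- strsplit(s, "\\n")[[1]]

# "\\t" is the regex escape for a tab: replace it with a visible separator.
visible <- gsub("\\t", " | ", s)

print(lines)    # "col1\tcol2" "row2"
print(visible)  # "col1 | col2\nrow2"
```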
3. Quantifiers
Quantifiers specify how many times the preceding pattern should occur.
- *: matches at least 0 times.
- +: matches at least 1 time.
- ?: matches at most 1 time.
- {n}: matches exactly n times.
- {n,}: matches at least n times.
- {,m}: matches at most m times.
- {n,m}: matches between n and m times.
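The quantifiers above can be compared side by side on one small vector. This is only a sketch: the strings are made up, and each pattern applies the quantifier to the letter "a" between "c" and "t".

```r
# Hypothetical strings with 0, 1, 2 and 3 occurrences of "a".
x <- c("ct", "cat", "caat", "caaat")

star     <- grepl("ca*t", x)      # "a" zero or more times
plus     <- grepl("ca+t", x)      # "a" one or more times
optional <- grepl("ca?t", x)      # "a" zero or one time
exactly2 <- grepl("ca{2}t", x)    # "a" exactly twice
atleast2 <- grepl("ca{2,}t", x)   # "a" two or more times
between  <- grepl("ca{1,2}t", x)  # "a" once or twice

print(star)      # TRUE TRUE TRUE TRUE
print(plus)      # FALSE TRUE TRUE TRUE
print(optional)  # TRUE TRUE FALSE FALSE
print(exactly2)  # FALSE FALSE TRUE FALSE
print(atleast2)  # FALSE FALSE TRUE TRUE
print(between)   # FALSE TRUE TRUE FALSE
```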
Exercise
4. Position Anchors
- ^: Start of the string.
- $: End of the string.
- \\b: Empty string at either edge of a word.
- \\B: Empty string, not at the edge of a word.
- \\<: Beginning of a word
- \\>: End of a word
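A short sketch of how the anchors change what "cat" matches. The strings are made up, and the default regex engine is assumed (the \\< and \\> anchors are not available with perl = TRUE).

```r
# Hypothetical strings containing "cat" in different positions.
x <- c("cat", "concat", "catalog", "the cat sat")

at_start   <- grepl("^cat", x)        # "cat" at the start of the string
at_end     <- grepl("cat$", x)        # "cat" at the end of the string
whole_word <- grepl("\\bcat\\b", x)   # "cat" as a whole word
word_start <- grepl("\\<cat", x)      # "cat" at the beginning of a word
inside     <- grepl("\\Bcat", x)      # "cat" not starting at a word edge

print(at_start)    # TRUE FALSE TRUE FALSE
print(at_end)      # TRUE TRUE FALSE FALSE
print(whole_word)  # TRUE FALSE FALSE TRUE
print(word_start)  # TRUE FALSE TRUE TRUE
print(inside)      # FALSE TRUE FALSE FALSE
```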
5. Characters and Operators
- .: Any single character except \n
- [...]: a permitted character list. Use - inside the brackets to specify a range of characters.
- [^...]: an excluded character list. Match any characters except those inside the square brackets.
- |: an OR operator, matches patterns on either side of the |.
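The operators above can be tried on a handful of similar words. This is an illustrative sketch with made-up strings, not a full tour of bracket expressions.

```r
# Hypothetical strings that differ mainly in their third character.
x <- c("gray", "grey", "groy", "gr8y", "green")

any_char  <- grepl("gr.y", x)        # any single character between "gr" and "y"
permitted <- grepl("gr[ae]y", x)     # only "a" or "e" allowed
excluded  <- grepl("gr[^ae]y", x)    # anything except "a" or "e"
in_range  <- grepl("gr[0-9]y", x)    # a range: any digit
either    <- grepl("gray|green", x)  # OR: "gray" or "green"

print(any_char)   # TRUE TRUE TRUE TRUE FALSE
print(permitted)  # TRUE TRUE FALSE FALSE FALSE
print(excluded)   # FALSE FALSE TRUE TRUE FALSE
print(in_range)   # FALSE FALSE FALSE TRUE FALSE
print(either)     # TRUE FALSE FALSE FALSE TRUE
```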
6. Character Classes
- [[:digit:]] or \\d or [0-9]: digits 0 1 2 3 4 5 6 7 8 9
- \\D or [^0-9]: non-digits
- [[:lower:]] or [a-z]: lower-case letters
- [[:upper:]] or [A-Z]: upper-case letters
- [[:alpha:]] or [[:lower:][:upper:]] or [A-Za-z]: alphabetic characters
- [[:alnum:]] or [[:alpha:][:digit:]] or [A-Za-z0-9]: alphanumeric characters
- \\w or [[:alnum:]_] or [A-Za-z0-9_]: word characters, i.e. alphanumeric characters (0-9, a-z, A-Z) and underscores (_).
- \\W or [^A-Za-z0-9_]: non-word characters
- [[:xdigit:]] or [0-9A-Fa-f]: hexadecimal digits (base 16) 0 1 2 3 4 5 6 7 8 9 A B C D E F a b c d e f
- [[:blank:]]: space and tab
- [[:space:]] or \\s: space characters: tab, newline, vertical tab, form feed, carriage return, space
- \\S: not space characters
- [[:punct:]]: punctuation characters ! " # $ % & ' ( ) * + , - . / : ; < = > ? @ [ \ ] ^ _ ` { | } ~
- [[:graph:]] or [[:alnum:][:punct:]]: graphical (human readable) characters
- [[:print:]] or [[:alnum:][:punct:]\\s]: printable characters
- [[:cntrl:]] or \\c: control characters, like \n or \r etc.
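A few of these classes in action, as a minimal sketch. The input string is made up, and the results assume ASCII input with the default regex engine.

```r
# Hypothetical input mixing letters, digits, punctuation, spaces and an underscore.
s <- "R 4.3, released_in 2023!"

# Extract every run of digits.
digits <- regmatches(s, gregexpr("[[:digit:]]+", s))[[1]]

# Remove punctuation; note that the underscore counts as punctuation.
no_punct <- gsub("[[:punct:]]", "", s)

# Remove non-word characters; the underscore is a word character, so it stays.
only_word <- gsub("\\W", "", s)

# Replace each run of whitespace with a single underscore.
no_space <- gsub("[[:space:]]+", "_", s)

print(digits)     # "4" "3" "2023"
print(no_punct)   # "R 43 releasedin 2023"
print(only_word)  # "R43released_in2023"
print(no_space)   # "R_4.3,_released_in_2023!"
```

The second and third results show why [[:punct:]] and \\W are not interchangeable: they treat the underscore differently.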
Exercise:
Continue to Part 3.
 
         
         
         
        