In this part of my C++ tutorial I continue showing tons of examples about regular expressions. If you missed the first part, watch it first.
This time I'll show more ways to match what you are trying to grab. We'll also talk about greedy vs. lazy matching, boundaries, and grabbing multiple subexpressions. This tutorial also contains 2 problems for you to solve. All of the code from the video follows it below to help you learn.
If you like videos like this consider donating $1, or simply turn off Ad Blocking software. Either helps me to continue making free tutorials.
Code from Video
    #include <cstddef>
    #include <iostream>
    #include <regex>
    #include <string>

    // Prints capture group 1 of every match of reg in str
    void PrintMatches(std::string str, std::regex reg){

        // Used when you're searching a std::string
        std::smatch matches;

        // Determines if there is a match; the match
        // results are stored in matches
        while(std::regex_search(str, matches, reg)){

            // Print the first capture group of the match
            std::cout << matches.str(1) << "\n";

            // Eliminate the previous match and create
            // a new string to search
            str = matches.suffix().str();
        }
        std::cout << "\n";
    }

    int main()
    {
        // Everything covered previously
        // [ ]   : Match what is in the brackets
        // [^ ]  : Match anything not in the brackets
        // .     : Match any 1 character (including a space) except a newline
        // +     : Match 1 or more of what precedes
        // \n    : Newline
        // \d    : Any 1 digit
        // \D    : Anything but a digit
        // \w    : Same as [a-zA-Z0-9_]
        // \W    : Same as [^a-zA-Z0-9_]
        // \s    : Same as [ \f\n\r\t\v]
        // \S    : Same as [^ \f\n\r\t\v]
        // {5}   : Match 5 of what precedes the curly brackets
        // {5,7} : Match between 5 and 7 of what precedes
        // ()    : Capture (return) only what is between ()

        // ---------- MATCHING ZERO OR ONE ----------
        // ? matches zero or one of what precedes it
        std::string str1 = "cat cats";
        std::regex reg1 ("([cat]+s?)");
        PrintMatches(str1, reg1);

        // ---------- MATCHING ZERO OR MORE ----------
        // * matches zero or more of what precedes it
        std::string str2 = "doctor doctors doctor's";
        std::regex reg2 ("([doctor]+['s]*)");
        PrintMatches(str2, reg2);

        // ---------- PROBLEM ----------
        // On Windows, newlines are sometimes \n and other times \r\n
        // Create a regex that will grab each of the lines in this
        // string, then print out the number of matches and each line
        std::string str3 = "Just some words\n"
                           "and some more\r\n"
                           "and more\n";

        // \r? makes the carriage return optional, so each line
        // may end with either \n or \r\n
        std::regex reg3 ("([^\\r\\n]+)\\r?\\n");
        std::smatch lineMatch;
        std::string rest = str3;
        int numLines = 0;
        while(std::regex_search(rest, lineMatch, reg3)){
            numLines++;
            std::cout << lineMatch.str(1) << "\n";
            rest = lineMatch.suffix().str();
        }
        std::cout << "Matches : " << numLines << "\n\n";

        // ---------- GREEDY & LAZY MATCHING ----------
        // Let's try to grab everything between <name> tags.
        // Because * is greedy (it grabs the biggest match possible),
        // we can't get what we want, which is each individual tag
        // match
        std::string str4 = "<name>Life On Mars</name>"
                           "<name>Freaks and Geeks</name>";
        std::regex reg4 ("<name>(.*)</name>");
        PrintMatches(str4, reg4);

        // When we want to grab the smallest match we use *?, +?,
        // or {n,}? instead
        std::regex reg5 ("<name>(.*?)</name>");
        PrintMatches(str4, reg5);

        // ---------- WORD BOUNDARIES ----------
        // We use word boundaries to define where our matches start
        // and end
        // \b matches the start or end of a word
        // (written "\\b" inside a C++ string literal)

        // If we want ape, it will match ape and the beginning of apex
        std::string str6 = "ape at the apex";
        std::regex reg6 ("(ape)");
        PrintMatches(str6, reg6);

        // If we use a word boundary, only the word ape matches
        std::regex reg7 ("(\\bape\\b)");
        PrintMatches(str6, reg7);

        // ---------- STRING BOUNDARIES ----------
        // ^ : Matches the beginning of a string if outside of
        //     a [ ]
        // $ : Matches the end of a string

        // Grab everything from the start to the @
        std::string str8 = "Match everything up to @";
        std::regex reg8 ("(^.*[^@])");
        PrintMatches(str8, reg8);

        // Grab everything after the @ and the space up to the
        // end of the string
        std::string str9 = "@ Get this string";
        std::regex reg9 ("([^@\\s].*$)");
        PrintMatches(str9, reg9);

        // ---------- PROBLEM ----------
        // Get just the numbers minus the area codes from
        // this string
        std::string str10 = "206-709-3100 202-456-1111 212-832-2000";
        std::regex reg10 ("\\d{3}-(\\d{3}-\\d{4})");
        PrintMatches(str10, reg10);

        // ---------- MULTIPLE SUBEXPRESSIONS ----------
        // You can have multiple subexpressions as well

        // Get the 2 numbers that follow the area code separately
        std::string str11 = "My number is 904-285-3700";
        std::regex reg11 ("\\d{3}-(\\d{3})-(\\d{4})");
        std::smatch matches;
        if(std::regex_search(str11, matches, reg11)){
            // matches[0] is the whole match; the groups start at 1
            for(std::size_t i = 1; i < matches.size(); i++){
                std::cout << matches[i] << "\n";
            }
        }

        return 0;
    }