Main Classes

basic_regex

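The class template for regular expressions (std::regex and std::wregex are its char and wchar_t specializations). It is constructed from a pattern string and is used together with the helper classes and algorithms described below.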

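    #include <iomanip>   // std::quoted
    #include <iostream>
    #include <regex>
    #include <string>

    // Prints every match of `pattern` in `text`, numbered from 1
    void matchAndPrint(const std::string &text, const std::regex &pattern) {
        std::sregex_iterator it(text.begin(), text.end(), pattern);
        std::sregex_iterator end;
        int count = 0;
        for (; it != end; ++it) {
            const std::smatch& match = *it;
            std::cout << ++count << "." << std::quoted(match.str()) << '\n';
        }
        std::cout << (count ? "\n" : "no match found\n\n");
    }

    int main() {
        const std::string text1 = "hello 114514hell999o";
        // Meant to match one or more digits, but under the POSIX basic grammar
        // "\d" is not a digit class and "+" is an ordinary character, so nothing
        // matches (some implementations may throw std::regex_error instead)
        matchAndPrint(text1, std::regex("\\d+", std::regex::basic));
        // The default ECMAScript grammar reads the same pattern as intended
        matchAndPrint(text1, std::regex("\\d+"));
    }
no match found

1."114514"
2."999"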

sub_match

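Derives from std::pair<BidirIt, BidirIt> (first and second are the iterators delimiting the matched range), but it must not be treated as a std::pair: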

std::sub_match inherits from std::pair<BidirIt, BidirIt>, although it cannot be treated as a std::pair object because member functions such as assignment will not work as expected.

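It provides conversion to a string: str() returns a copy of the matched characters, and a sub_match also converts implicitly to the corresponding std::basic_string type.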

match_results

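Usually used to store the result of a match. It holds a sequence of sub_match objects representing the captures: index 0 is the entire match and index n is the n-th capture group. std::smatch and std::cmatch are its specializations for std::string and const char* targets.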

Algorithms

regex_match

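Matches the regular expression against the entire target sequence; it succeeds only if the whole string matches.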

    //regex_match
    const std::string fnames[] = {"foo.txt", "bar.txt", "baz.dat", "zoidberg"};
    const std::regex pieces_regex("([a-z]+)\\.([a-z]+)");
    std::smatch pieces_match;
    for (const auto &item: fnames){
        if (std::regex_match(item, pieces_match, pieces_regex))
        {
            std::cout << item << '\n';
            for (std::size_t i = 0; i < pieces_match.size(); ++i)
            {
                std::ssub_match sub_match = pieces_match[i];
                std::string piece = sub_match.str();
                std::cout << "  submatch " << i << ": " << piece << '\n';
            }
        }
    }
foo.txt
  submatch 0: foo.txt
  submatch 1: foo
  submatch 2: txt
bar.txt
  submatch 0: bar.txt
  submatch 1: bar
  submatch 2: txt
baz.dat
  submatch 0: baz.dat
  submatch 1: baz
  submatch 2: dat

It returns a bool; the match details are usually retrieved through a std::match_results argument passed by reference.

regex_search

Searches for a match anywhere within the target sequence; its usage is much the same as regex_match.

    //regex_search
    std::string lines[] = {"Roses are #ff0000",
                           "violets are #0000ff",
                           "all of my base are belong to you"};

    std::regex color_regex("#([a-f0-9]{2})"
                           "([a-f0-9]{2})"
                           "([a-f0-9]{2})");
    // show contents of marked subexpressions within each match
    std::smatch color_match;
    for (const auto& line : lines)
        if (std::regex_search(line, color_match, color_regex))
        {
            std::cout << "matches for '" << line << "'\n";
            std::cout << "Prefix: '" << color_match.prefix() << "'\n";
            for (std::size_t i = 0; i < color_match.size(); ++i)
                std::cout << i << ": " << color_match[i] << '\n';
            std::cout << "Suffix: '" << color_match.suffix() << "\'\n\n";
        }
    // repeated search (see also std::regex_iterator)
    std::string log(R"(
        Speed:	366
        Mass:	35
        Speed:	378
        Mass:	32
        Speed:	400
	Mass:	30)");
    std::regex r(R"(Speed:\t\d*)");
    for (std::smatch sm; std::regex_search(log, sm, r);)
    {
        std::cout << sm.str() << '\n';
        log = sm.suffix();
    }
matches for 'Roses are #ff0000'
Prefix: 'Roses are '
0: #ff0000
1: ff
2: 00
3: 00
Suffix: ''

matches for 'violets are #0000ff'
Prefix: 'violets are '
0: #0000ff
1: 00
2: 00
3: ff
Suffix: ''

Speed:  366
Speed:  378
Speed:  400

regex_replace

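Replaces every match of a regular expression according to a format string. The result is either written through an output iterator or returned as a new string.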

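    std::string text = "Quick brown fox";
    std::regex vowel_re("a|e|i|o|u");
    // Overload 1: write the result through an output iterator
    // ($& in the format string stands for the whole match;
    // std::back_inserter comes from <iterator>)
    std::string output1;
    std::regex_replace(std::back_inserter(output1), text.begin(), text.end(), vowel_re, "[$&]");
    // Overload 2: return the result as a new string
    auto t = std::regex_replace(text, vowel_re, "[$&]");
    if (t != output1) {
        return 1;
    }
    std::cout << output1 << '\n';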
Q[u][i]ck br[o]wn f[o]x

Iterators

Both iterator types below are read-only and satisfy LegacyForwardIterator.

regex_iterator

Dereferencing it yields a std::match_results (std::smatch for std::sregex_iterator).

Here is the repeated-search example from the regex_search section, rewritten with an iterator:

    // std::regex_iterator (log was consumed by the regex_search loop above, so refill it)
    log=R"(
        Speed:	366
        Mass:	35
        Speed:	378
        Mass:	32
        Speed:	400
	Mass:	30)";
    auto matches_begin = std::sregex_iterator(log.begin(), log.end(), r);
    auto matches_end = std::sregex_iterator();
    // std::distance (from <iterator>) advances a copy, so matches_begin is still usable below
    std::cout << "Found " << std::distance(matches_begin, matches_end) << " matches:\n";
    for (auto i = matches_begin; i != matches_end; ++i) {
        const std::smatch& match = *i;
        std::cout << match.str() << '\n';
    }
Found 3 matches:
Speed:  366
Speed:  378
Speed:  400

regex_token_iterator

Dereferencing it yields a std::sub_match (std::ssub_match for std::sregex_token_iterator).

What is the difference between regex_token_iterator and regex_iterator?

There is indeed a difference between the two. cppreference describes std::regex_iterator as follows:

std::regex_iterator is a read-only ForwardIterator that accesses the individual matches of a regular expression within the underlying character sequence.

and std::regex_token_iterator as:

std::regex_token_iterator is a read-only ForwardIterator that accesses the individual sub-matches of every match of a regular expression within the underlying character sequence. It can also be used to access the parts of the sequence that were not matched by the given regular expression (e.g. as a tokenizer).

So std::regex_token_iterator also lets you access the non-matched parts of the sequence, or the n-th sub-expression of each match.

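In other words, the submatch index passed to regex_token_iterator determines whether it yields the non-matched parts, the whole matches, or the n-th capture.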

The submatch argument is the index of the submatch to return: 0 represents the entire match, -1 represents the parts that are not matched (e.g. the text between matches), and a positive n selects the n-th capture group of each match.

    //std::regex_token_iterator
    std::copy(std::sregex_token_iterator(log.begin(),log.end(),r,-1),
              std::sregex_token_iterator(),
              std::ostream_iterator<std::string>(std::cout));
    std::copy(std::sregex_token_iterator(log.begin(),log.end(),r,0),
              std::sregex_token_iterator(),
              std::ostream_iterator<std::string>(std::cout));
    std::cout<<'\n';
    // Iterating the first submatches
    const std::string html = R"(<p><a href="http://google.com">google</a> )"
                             R"(< a HREF ="http://cppreference.com">cppreference</a>\n</p>)";
    const std::regex url_re(R"!!(<\s*A\s+[^>]*href\s*=\s*"([^"]*)")!!", std::regex::icase);
    std::copy(std::sregex_token_iterator(html.begin(), html.end(), url_re, 1),
              std::sregex_token_iterator(),
              std::ostream_iterator<std::string>(std::cout, "\n"));
        Mass:   35

        Mass:   32

        Mass:   30Speed:        366Speed:       378Speed:       400
http://google.com
http://cppreference.com

Exceptions

regex_error

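Thrown to report errors in regular expressions, for example a malformed pattern passed to the std::regex constructor. Here is the official cppreference example: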

    try
    {
        std::regex re("[a-b][a");
    }
    catch (const std::regex_error& e)
    {
        std::cout << "regex_error caught: " << e.what() << '\n';
        if (e.code() == std::regex_constants::error_brack)
            std::cout << "The code was error_brack\n";
    }