Main Classes
basic_regex
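The class template for regular expressions. It is constructed from a pattern and has to be combined with the other helper classes and algorithms to do any actual matching.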
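// Intended to match one or more digits.
// Note: std::regex::basic selects the POSIX basic grammar, where neither \d nor +
// has its ECMAScript meaning, which is why the output below is "no match found".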
const std::string text1="hello 114514hell999o";
std::string patternMatchOneOrMoreDigits_s="\\d+";
auto patternMatchOneOrMoreDigits=std::regex(patternMatchOneOrMoreDigits_s,std::regex::basic);
matchAndPrint(text1,patternMatchOneOrMoreDigits);
void matchAndPrint(const std::string &text,const std::regex &pattern){
std::sregex_iterator it(text.begin(),text.end(),pattern);
std::sregex_iterator end;
int count=0;
for(;it!=end;++it){
const std::smatch& match=*it;
std::cout<<++count<<"."<<std::quoted(match.str())<<'\n';
}
std::cout << (count ? "\n" : "no match found\n\n");
}
no match found
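The "no match found" output above comes from the grammar flag rather than from the text. As a minimal sketch of the difference, the helpers below (`findAll`, `digitsEcma`, and `digitsBasic` are hypothetical names, not part of the original code) run the same search once with the default ECMAScript grammar and once with a pattern written in POSIX basic syntax:

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: collect the text of every match of `pattern` in `text`.
std::vector<std::string> findAll(const std::string& text, const std::regex& pattern) {
    std::vector<std::string> out;
    for (std::sregex_iterator it(text.begin(), text.end(), pattern), end; it != end; ++it) {
        out.push_back(it->str());
    }
    return out;
}

// Default grammar (ECMAScript): \d is a digit and + means "one or more".
std::vector<std::string> digitsEcma(const std::string& text) {
    const std::regex re(R"(\d+)");
    return findAll(text, re);
}

// POSIX basic grammar: spell the same idea with a bracket expression and *.
std::vector<std::string> digitsBasic(const std::string& text) {
    const std::regex re("[0-9][0-9]*", std::regex::basic);
    return findAll(text, re);
}
```

With the text from the example, both helpers should report the two digit runs `114514` and `999`.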
sub_match
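A subclass of std::pair, but it cannot be used as a std::pair. In the words of cppreference:
std::sub_match inherits from std::pair<BidirIt, BidirIt>, although it cannot be treated as a std::pair object because member functions such as assignment will not work as expected.
It provides conversion to a string, either through str() or through an implicit conversion.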
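A small sketch of how a sub_match is typically read: the `matched` flag tells whether the group took part in the match, and the string conversion gives its text. The `KeyValue` struct, the `parseKeyValue` function, and the `key=value` input format are assumptions made up for this illustration.

```cpp
#include <regex>
#include <string>

// Hypothetical result type for this illustration.
struct KeyValue {
    std::string key;
    std::string value;
    bool hasValue;
};

// Parses "key" or "key=value"; the second capture group is optional.
KeyValue parseKeyValue(const std::string& line) {
    static const std::regex re(R"((\w+)(?:=(\w+))?)");
    KeyValue kv{"", "", false};
    std::smatch m;
    if (std::regex_match(line, m, re)) {
        const std::ssub_match& keyPart = m[1];
        const std::ssub_match& valuePart = m[2];
        std::string key = keyPart;      // implicit conversion to std::string
        kv.key = key;
        kv.hasValue = valuePart.matched; // false when the optional group did not participate
        kv.value = valuePart.str();      // empty string for an unmatched group
    }
    return kv;
}
```

For `"name=alice"` this should yield key `name` and value `alice`; for `"flag"` the value stays empty and `hasValue` is false.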
match_results
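Usually used to store the result of a match. It holds the sub_match objects in a linear sequence, one per capture: index 0 is the entire match, and index n is the n-th capture group.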
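The index layout described above can be inspected directly. The following sketch walks a match_results and records the text and starting offset of each entry; `Capture` and `describeMatch` are hypothetical names introduced only for this example.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical record of one entry in a match_results.
struct Capture {
    std::size_t index;       // 0 = whole match, n = n-th capture group
    std::string text;        // matched text
    std::ptrdiff_t position; // offset of the entry from the start of the input
};

// Returns one Capture per entry of the first match, or an empty vector if nothing matches.
std::vector<Capture> describeMatch(const std::string& text, const std::regex& re) {
    std::vector<Capture> out;
    std::smatch m;
    if (std::regex_search(text, m, re)) {
        for (std::size_t i = 0; i < m.size(); ++i) {
            out.push_back({i, m.str(i), m.position(i)});
        }
    }
    return out;
}
```

For the input `"date: 2024-05"` and the pattern `(\d{4})-(\d{2})`, entry 0 should be `2024-05`, entry 1 `2024`, and entry 2 `05`.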
Algorithms
regex_match
Matches the whole input: it succeeds only if the pattern matches the entire character sequence.
//regex_match
const std::string fnames[] = {"foo.txt", "bar.txt", "baz.dat", "zoidberg"};
const std::regex pieces_regex("([a-z]+)\\.([a-z]+)");
std::smatch pieces_match;
for (const auto &item: fnames){
if (std::regex_match(item, pieces_match, pieces_regex))
{
std::cout << item << '\n';
for (std::size_t i = 0; i < pieces_match.size(); ++i)
{
std::ssub_match sub_match = pieces_match[i];
std::string piece = sub_match.str();
std::cout << " submatch " << i << ": " << piece << '\n';
}
}
}
foo.txt
submatch 0: foo.txt
submatch 1: foo
submatch 2: txt
bar.txt
submatch 0: bar.txt
submatch 1: bar
submatch 2: txt
baz.dat
submatch 0: baz.dat
submatch 1: baz
submatch 2: dat
It returns a bool; the match details are usually obtained through the match_results argument.
regex_search
Searches for a match anywhere in the input. Its usage is much the same as regex_match.
//regex_search
std::string lines[] = {"Roses are #ff0000",
"violets are #0000ff",
"all of my base are belong to you"};
std::regex color_regex("#([a-f0-9]{2})"
"([a-f0-9]{2})"
"([a-f0-9]{2})");
// show contents of marked subexpressions within each match
std::smatch color_match;
for (const auto& line : lines)
if (std::regex_search(line, color_match, color_regex))
{
std::cout << "matches for '" << line << "'\n";
std::cout << "Prefix: '" << color_match.prefix() << "'\n";
for (std::size_t i = 0; i < color_match.size(); ++i)
std::cout << i << ": " << color_match[i] << '\n';
std::cout << "Suffix: '" << color_match.suffix() << "\'\n\n";
}
// repeated search (see also std::regex_iterator)
std::string log(R"(
Speed: 366
Mass: 35
Speed: 378
Mass: 32
Speed: 400
Mass: 30)");
// \s+ accepts either a tab or a space between the label and the number
std::regex r(R"(Speed:\s+\d+)");
for (std::smatch sm; regex_search(log, sm, r);)
{
std::cout << sm.str() << '\n';
log = sm.suffix();
}
matches for 'Roses are #ff0000'
Prefix: 'Roses are '
0: #ff0000
1: ff
2: 00
3: 00
Suffix: ''
matches for 'violets are #0000ff'
Prefix: 'violets are '
0: #0000ff
1: 00
2: 00
3: ff
Suffix: ''
Speed: 366
Speed: 378
Speed: 400
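To make the difference between the two algorithms concrete, here is a minimal sketch that applies the same pattern with regex_match and with regex_search. The function names `isWholeNumber` and `containsNumber` are hypothetical.

```cpp
#include <regex>
#include <string>

// True only if the entire string consists of digits (whole-sequence match).
bool isWholeNumber(const std::string& s) {
    static const std::regex digits(R"(\d+)");
    return std::regex_match(s, digits);
}

// True if a run of digits appears anywhere in the string (substring search).
bool containsNumber(const std::string& s) {
    static const std::regex digits(R"(\d+)");
    return std::regex_search(s, digits);
}
```

With `"abc123"` as input, `isWholeNumber` should return false while `containsNumber` returns true.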
regex_replace
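The regex replacement function. The result is obtained either through an output iterator argument or as the return value.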
std::string text="Quick brown fox";
std::regex vowel_re("a|e|i|o|u");
// Pre-size the buffer: 15 input characters plus 2 brackets for each of the 4 vowels = 23
std::string output1(23,'\0');
std::regex_replace(output1.begin(),text.begin(),text.end(),vowel_re,"[$&]");
auto t=std::regex_replace(text,vowel_re,"[$&]");
if (t!=output1){
return 1;
}
std::cout<<output1<<'\n';
Q[u][i]ck br[o]wn f[o]x
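The example above pre-sizes the output string, which only works when the final length is known in advance. A possible alternative, sketched here under that assumption, is to write through std::back_inserter; the second function shows the `$1`-style capture references in the format string. `bracketVowels`, `swapDate`, and the date format are hypothetical choices for this illustration.

```cpp
#include <iterator>
#include <regex>
#include <string>

// Output-iterator form: the string grows as needed, so no length has to be computed.
std::string bracketVowels(const std::string& text) {
    static const std::regex vowel_re("a|e|i|o|u");
    std::string out;
    std::regex_replace(std::back_inserter(out), text.begin(), text.end(), vowel_re, "[$&]");
    return out;
}

// Return-value form: $1, $2, $3 refer to the capture groups of the current match.
std::string swapDate(const std::string& text) {
    static const std::regex date_re(R"((\d{4})-(\d{2})-(\d{2}))");
    return std::regex_replace(text, date_re, "$3/$2/$1");
}
```

`bracketVowels("Quick brown fox")` should give the same `Q[u][i]ck br[o]wn f[o]x` as the example above.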
Iterators
regex_iterator
Dereferencing it yields a match_results.
Here is the repeated-search example from the regex_search section above, rewritten with an iterator:
//std::regex_iterator
log=R"(
Speed: 366
Mass: 35
Speed: 378
Mass: 32
Speed: 400
Mass: 30)";
auto words_begin = std::sregex_iterator(log.begin(), log.end(), r);
auto words_end = std::sregex_iterator();
std::cout << "Found " << std::distance(words_begin, words_end) << " words:\n";
for (auto i=std::sregex_iterator(log.begin(),log.end(),r);i!=words_end;++i) {
const std::smatch& match= *i;
std::cout<<match.str()<<'\n';
}
Found 3 words:
Speed: 366
Speed: 378
Speed: 400
regex_token_iterator
Dereferencing it yields a sub_match.
What is the difference between regex_token_iterator and regex_iterator?
There is indeed a difference between them. cppreference describes std::regex_iterator as follows:
std::regex_iterator is a read-only ForwardIterator that accesses the individual matches of a regular expression within the underlying character sequence.
and std::regex_token_iterator as:
std::regex_token_iterator is a read-only ForwardIterator that accesses the individual sub-matches of every match of a regular expression within the underlying character sequence. It can also be used to access the parts of the sequence that were not matched by the given regular expression (e.g. as a tokenizer).
So std::regex_token_iterator also allows you to access the non-matched tokens or the n-th sub-expression.
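In other words, regex_token_iterator uses the submatch index passed to it to decide what to return: the unmatched parts, the entire match, or the n-th capture. The parameter is documented as:
The index of the submatch that should be returned. "0" represents the entire match, and "-1" represents the parts that are not matched (e.g., the stuff between matches).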
//std::regex_token_iterator
std::copy(std::sregex_token_iterator(log.begin(),log.end(),r,-1),
std::sregex_token_iterator(),
std::ostream_iterator<std::string>(std::cout));
std::copy(std::sregex_token_iterator(log.begin(),log.end(),r,0),
std::sregex_token_iterator(),
std::ostream_iterator<std::string>(std::cout));
std::cout<<'\n';
// Iterating the first submatches
const std::string html = R"(<p><a href="http://google.com">google</a> )"
R"(< a HREF ="http://cppreference.com">cppreference</a>\n</p>)";
const std::regex url_re(R"!!(<\s*A\s+[^>]*href\s*=\s*"([^"]*)")!!", std::regex::icase);
std::copy(std::sregex_token_iterator(html.begin(), html.end(), url_re, 1),
std::sregex_token_iterator(),
std::ostream_iterator<std::string>(std::cout, "\n"));
Mass: 35
Mass: 32
Mass: 30Speed: 366Speed: 378Speed: 400
http://google.com
http://cppreference.com
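A common use of the `-1` index is splitting a string on a delimiter pattern. The following is a minimal sketch of that idea; `splitOn` is a hypothetical helper name, and the comma-separated input is only an example.

```cpp
#include <regex>
#include <string>
#include <vector>

// Returns the pieces of `text` that lie between matches of `delimiter`.
std::vector<std::string> splitOn(const std::string& text, const std::regex& delimiter) {
    return std::vector<std::string>(
        std::sregex_token_iterator(text.begin(), text.end(), delimiter, -1),
        std::sregex_token_iterator());
}
```

For example, splitting `"a, b,c"` on the pattern `\s*,\s*` should produce the three tokens `a`, `b`, and `c`.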
Exceptions
regex_error
The exception type thrown for regex errors. Here is the example from the reference documentation:
try
{
std::regex re("[a-b][a");
}
catch (const std::regex_error& e)
{
std::cout << "regex_error caught: " << e.what() << '\n';
if (e.code() == std::regex_constants::error_brack)
std::cout << "The code was error_brack\n";
}
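Because the regex constructor throws on a malformed pattern, patterns that come from user input are usually checked before use. One way to do that, sketched here with the hypothetical helper `isValidPattern`, is to attempt construction and catch std::regex_error:

```cpp
#include <regex>
#include <string>

// Returns true if `pattern` compiles under the default (ECMAScript) grammar.
bool isValidPattern(const std::string& pattern) {
    try {
        std::regex re(pattern);
        (void)re; // only the construction matters here
        return true;
    } catch (const std::regex_error&) {
        return false;
    }
}
```

The malformed pattern from the example above, `[a-b][a`, should be rejected by this check.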