{"id":638,"date":"2014-11-07T11:45:17","date_gmt":"2014-11-07T04:45:17","guid":{"rendered":"http:\/\/blog.trichev.com\/?p=638"},"modified":"2017-08-10T09:24:37","modified_gmt":"2017-08-10T02:24:37","slug":"perl-regular-expressions","status":"publish","type":"post","link":"https:\/\/trichev.com\/blog\/2014\/11\/07\/perl-regular-expressions\/","title":{"rendered":"Perl. Regular expressions"},"content":{"rendered":"<table border=\"0\" cellspacing=\"15\" cellpadding=\"0\">\n<tbody>\n<tr>\n<td valign=\"top\">\n<h2>Metacharacters<\/h2>\n<table border=\"1\" cellspacing=\"0\" cellpadding=\"4\">\n<tbody>\n<tr>\n<th>char<\/th>\n<th>meaning<\/th>\n<\/tr>\n<tr>\n<th><big><code>^<\/code><\/big><\/th>\n<td>beginning of string<\/td>\n<\/tr>\n<tr>\n<th><big><code>$<\/code><\/big><\/th>\n<td>end of string<\/td>\n<\/tr>\n<tr>\n<th><big><code>.<\/code><\/big><\/th>\n<td>any character except newline<\/td>\n<\/tr>\n<tr>\n<th><big><code>*<\/code><\/big><\/th>\n<td>match 0 or more times<\/td>\n<\/tr>\n<tr>\n<th><big><code>+<\/code><\/big><\/th>\n<td>match 1 or more times<\/td>\n<\/tr>\n<tr>\n<th><big><code>?<\/code><\/big><\/th>\n<td>match 0 or 1 times; <em>or<\/em>: shortest match<\/td>\n<\/tr>\n<tr>\n<th><big><code>|<\/code><\/big><\/th>\n<td>alternative<\/td>\n<\/tr>\n<tr>\n<th><big><code>( )<\/code><\/big><\/th>\n<td>grouping; \u201cstoring\u201d<\/td>\n<\/tr>\n<tr>\n<th><big><code>[ ]<\/code><\/big><\/th>\n<td>set of characters<\/td>\n<\/tr>\n<tr>\n<th><big><code>{ }<\/code><\/big><\/th>\n<td>repetition modifier<\/td>\n<\/tr>\n<tr>\n<th><big><code>\\<\/code><\/big><\/th>\n<td>quote or special<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p><small>To present a metacharacter as a data character standing for itself, precede it with <code>\\<\/code> (e.g. 
<code>\\.<\/code> matches the full stop character <code>.<\/code> only).<\/small><\/p>\n<p><small>In the table above, the characters themselves, in the first column, are links to descriptions of characters in my <cite><a href=\"https:\/\/www.cs.tut.fi\/%7Ejkorpela\/latin1\">The ISO Latin 1 character repertoire\u00a0&#8211; a description with usage notes<\/a><\/cite>. Note that the physical appearance (<a title=\"The glyph concept (in: A tutorial on character code issues)\" href=\"https:\/\/www.cs.tut.fi\/%7Ejkorpela\/chars.html#glyph\">glyph<\/a>) of a character may vary from one device or program or font to another.<\/small><\/td>\n<td valign=\"top\">\n<h2>Repetition<\/h2>\n<table border=\"1\" cellspacing=\"0\" cellpadding=\"4\">\n<tbody>\n<tr>\n<td><var>a<\/var><code>*<\/code><\/td>\n<td>zero or more <var>a<\/var>\u2019s<\/td>\n<\/tr>\n<tr>\n<td><var>a<\/var><code>+<\/code><\/td>\n<td>one or more <var>a<\/var>\u2019s<\/td>\n<\/tr>\n<tr>\n<td><var>a<\/var><code>?<\/code><\/td>\n<td>zero or one <var>a<\/var>\u2019s (i.e., optional <var>a<\/var>)<\/td>\n<\/tr>\n<tr>\n<td><var>a<\/var><code>{<\/code><var>m<\/var><code>}<\/code><\/td>\n<td>exactly <var>m<\/var> <var>a<\/var>\u2019s<\/td>\n<\/tr>\n<tr>\n<td><var>a<\/var><code>{<\/code><var>m<\/var><code>,}<\/code><\/td>\n<td>at least <var>m<\/var> <var>a<\/var>\u2019s<\/td>\n<\/tr>\n<tr>\n<td><var>a<\/var><code>{<\/code><var>m<\/var><code>,<\/code><var>n<\/var><code>}<\/code><\/td>\n<td>at least <var>m<\/var> but at most <var>n <\/var><var>a<\/var>\u2019s<\/td>\n<\/tr>\n<tr>\n<td><var>repetition<\/var><code>?<\/code><\/td>\n<td>same as <var>repetition<\/var> but the <em>shortest<\/em> match is taken<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p><small>Read the notation <var>a<\/var>\u2019s as \u201coccurrences of strings, each of which matches the pattern <var>a<\/var>\u201d. Read <var>repetition<\/var> as any of the repetition expressions listed above it. 
Shortest match means that the shortest string matching the pattern is taken. The default is <a href=\"https:\/\/www.cs.tut.fi\/%7Ejkorpela\/perl\/course.html#greedy\">\u201cgreedy matching\u201d<\/a>, which finds the longest match. The <var>repetition<\/var><code>?<\/code> construct was introduced in Perl version\u00a05.<\/small><\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<h2><a name=\"esc\"><\/a>Special notations with <code>\\<\/code><\/h2>\n<table>\n<tbody>\n<tr>\n<td>\n<table border=\"1\" cellspacing=\"0\" cellpadding=\"4\">\n<caption>Single characters<\/caption>\n<tbody>\n<tr>\n<td><code>\\t<\/code><\/td>\n<td>tab<\/td>\n<\/tr>\n<tr>\n<td><code>\\n<\/code><\/td>\n<td>newline<\/td>\n<\/tr>\n<tr>\n<td><code>\\r<\/code><\/td>\n<td>return (CR)<\/td>\n<\/tr>\n<tr>\n<td><code>\\x<\/code><var>hh<\/var><\/td>\n<td>character with hex. code <var>hh<\/var><\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/td>\n<td valign=\"top\">\n<table border=\"1\" cellspacing=\"0\" cellpadding=\"4\">\n<caption>\u201cZero-width assertions\u201d<\/caption>\n<tbody>\n<tr>\n<td><code>\\b<\/code><\/td>\n<td>\u201cword\u201d boundary<\/td>\n<\/tr>\n<tr>\n<td><code>\\B<\/code><\/td>\n<td>not a \u201cword\u201d boundary<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<table border=\"1\" cellspacing=\"0\" cellpadding=\"4\">\n<caption>Matching<\/caption>\n<tbody>\n<tr>\n<td><code>\\w<\/code><\/td>\n<td>matches any <em>single<\/em> character classified as a \u201cword\u201d character (alphanumeric or \u201c<code>_<\/code>\u201d)<\/td>\n<\/tr>\n<tr>\n<td><code>\\W<\/code><\/td>\n<td>matches any non-\u201cword\u201d character<\/td>\n<\/tr>\n<tr>\n<td><code>\\s<\/code><\/td>\n<td>matches any whitespace character (space, tab, newline)<\/td>\n<\/tr>\n<tr>\n<td><code>\\S<\/code><\/td>\n<td>matches any non-whitespace character<\/td>\n<\/tr>\n<tr>\n<td><code>\\d<\/code><\/td>\n<td>matches any digit character, equiv. 
to <code>[0-9]<\/code><\/td>\n<\/tr>\n<tr>\n<td><code>\\D<\/code><\/td>\n<td>matches any non-digit character<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<h2><a name=\"sets\"><\/a>Character sets: special rules inside <code>[<\/code>&#8230;<code>]<\/code><\/h2>\n<p><strong class=\"warning\">Different meanings<\/strong> apply inside a character set (\u201ccharacter class\u201d) denoted by <code>[<\/code>&#8230;<code>]<\/code> so that, <strong>instead of<\/strong> the normal rules given here, the following apply:<\/p>\n<table border=\"1\" cellspacing=\"0\" cellpadding=\"4\">\n<tbody>\n<tr>\n<td><code>[<\/code><var>characters<\/var><code>]<\/code><\/td>\n<td>matches any one of the characters in the sequence<\/td>\n<\/tr>\n<tr>\n<td><code>[<\/code><var>x<\/var><code>-<\/code><var>y<\/var><code>]<\/code><\/td>\n<td>matches any one character from <var>x<\/var> to <var>y<\/var> (inclusive) in ASCII code order<\/td>\n<\/tr>\n<tr>\n<td><code>[\\-]<\/code><\/td>\n<td>matches the hyphen character\u00a0\u201c<code>-<\/code>\u201d<\/td>\n<\/tr>\n<tr>\n<td><code>[\\n]<\/code><\/td>\n<td>matches a newline; other <a href=\"https:\/\/www.cs.tut.fi\/%7Ejkorpela\/perl\/regexp.html#esc\">single-character notations with\u00a0<code>\\<\/code><\/a> apply normally, too<\/td>\n<\/tr>\n<tr>\n<td><code>[^<\/code><var>something<\/var><code>]<\/code><\/td>\n<td>matches any character <em>except<\/em> those that <code>[<\/code><var>something<\/var><code>]<\/code> denotes; that is, immediately after the leading \u201c<code>[<\/code>\u201d, the circumflex \u201c<code>^<\/code>\u201d means \u201cnot\u201d applied to all of the rest<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<h2><a name=\"ex\"><\/a>Examples<\/h2>\n<table border=\"1\" cellspacing=\"0\" cellpadding=\"4\">\n<tbody>\n<tr align=\"left\">\n<th>expression<\/th>\n<th>matches&#8230;<\/th>\n<\/tr>\n<tr>\n<td><code>abc<\/code><\/td>\n<td><code>abc<\/code> (that exact character sequence, but anywhere in the 
string)<\/td>\n<\/tr>\n<tr>\n<td><code>^abc<\/code><\/td>\n<td><code>abc<\/code> at the <em>beginning<\/em> of the string<\/td>\n<\/tr>\n<tr>\n<td><code>abc$<\/code><\/td>\n<td><code>abc<\/code> at the <em>end<\/em> of the string<\/td>\n<\/tr>\n<tr>\n<td><code>a|b<\/code><\/td>\n<td>either <code>a<\/code> or <code>b<\/code><\/td>\n<\/tr>\n<tr>\n<td><code>^abc|abc$<\/code><\/td>\n<td>the string <code>abc<\/code> at the beginning or at the end of the string<\/td>\n<\/tr>\n<tr>\n<td><code>ab{2,4}c<\/code><\/td>\n<td>an <code>a<\/code> followed by two, three or four <code>b<\/code>\u2019s followed by a <code>c<\/code><\/td>\n<\/tr>\n<tr>\n<td><code>ab{2,}c<\/code><\/td>\n<td>an <code>a<\/code> followed by at least two <code>b<\/code>\u2019s followed by a <code>c<\/code><\/td>\n<\/tr>\n<tr>\n<td><code>ab*c<\/code><\/td>\n<td>an <code>a<\/code> followed by any number (zero or more) of <code>b<\/code>\u2019s followed by a <code>c<\/code><\/td>\n<\/tr>\n<tr>\n<td><code>ab+c<\/code><\/td>\n<td>an <code>a<\/code> followed by one or more <code>b<\/code>\u2019s followed by a <code>c<\/code><\/td>\n<\/tr>\n<tr>\n<td><code>ab?c<\/code><\/td>\n<td>an <code>a<\/code> followed by an optional <code>b<\/code> followed by a <code>c<\/code>; that is, either <code>abc<\/code> or <code>ac<\/code><\/td>\n<\/tr>\n<tr>\n<td><code>a.c<\/code><\/td>\n<td>an <code>a<\/code> followed by any single character (not newline) followed by a <code>c<\/code><\/td>\n<\/tr>\n<tr>\n<td><code>a\\.c<\/code><\/td>\n<td><code>a.c<\/code> exactly<\/td>\n<\/tr>\n<tr>\n<td><code>[abc]<\/code><\/td>\n<td>any one of <code>a<\/code>, <code>b<\/code> and <code>c<\/code><\/td>\n<\/tr>\n<tr>\n<td><code>[Aa]bc<\/code><\/td>\n<td>either <code>Abc<\/code> or <code>abc<\/code><\/td>\n<\/tr>\n<tr>\n<td><code>[abc]+<\/code><\/td>\n<td>any (nonempty) string of <code>a<\/code>\u2019s, <code>b<\/code>\u2019s and <code>c<\/code>\u2019s (such as <code>a<\/code>, <code>abba<\/code>, 
<code>acbabcacaa<\/code>)<\/td>\n<\/tr>\n<tr>\n<td><code>[^abc]+<\/code><\/td>\n<td>any (nonempty) string that does <em>not<\/em> contain any of <code>a<\/code>, <code>b<\/code> and <code>c<\/code> (such as <code>defg<\/code>)<\/td>\n<\/tr>\n<tr>\n<td><code>\\d\\d<\/code><\/td>\n<td>any two decimal digits, such as <code>42<\/code>; same as <code>\\d{2}<\/code><\/td>\n<\/tr>\n<tr>\n<td><code>\\w+<\/code><\/td>\n<td>a \u201cword\u201d: a nonempty sequence of alphanumeric characters and underscores, such as <code>foo<\/code>, <code>12bar8<\/code> and <code>foo_1<\/code><\/td>\n<\/tr>\n<tr>\n<td><code>100\\s*mk<\/code><\/td>\n<td>the strings <code>100<\/code> and <code>mk<\/code> optionally separated by any amount of whitespace (spaces, tabs, newlines)<\/td>\n<\/tr>\n<tr>\n<td><code>abc\\b<\/code><\/td>\n<td><code>abc<\/code> when followed by a word boundary (e.g. in <code>abc!<\/code> but not in <code>abcd<\/code>)<\/td>\n<\/tr>\n<tr>\n<td><code>perl\\B<\/code><\/td>\n<td><code>perl<\/code> when <em>not<\/em> followed by a word boundary (e.g. in <code>perlert<\/code> but not in <code>perl stuff<\/code>)<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<h3>Examples of simple use in Perl statements<\/h3>\n<p><small>These examples use very simple regexps only. The intent is just to show <em>contexts<\/em> where regexps might be used, as well as the effect of some \u201cflags\u201d on matching and replacement. 
Note in particular that matching is by default <em>case-sensitive<\/em> (<code>Abc<\/code> does not match <code>abc<\/code> unless the <code>i<\/code> flag is given).<\/small><\/p>\n<p><code>s\/foo\/bar\/;<\/code><br \/>\nreplaces the <em>first<\/em> occurrence of the exact character sequence <code>foo<\/code> in the \u201ccurrent string\u201d (the special variable <code class=\"rpad\">$_<\/code>) with the character sequence <code>bar<\/code>; for example, <code>foolish bigfoot<\/code> would become <code>barlish bigfoot<\/code><\/p>\n<p><code>s\/foo\/bar\/g;<\/code><br \/>\nreplaces <em>every<\/em> occurrence of the exact character sequence <code>foo<\/code> in the \u201ccurrent string\u201d with the character sequence <code>bar<\/code>; for example, <code>foolish bigfoot<\/code> would become <code>barlish bigbart<\/code><\/p>\n<p><code>s\/foo\/bar\/gi;<\/code><br \/>\nreplaces every occurrence of <code>foo<\/code> <em>case-insensitively<\/em> in the \u201ccurrent string\u201d with the character sequence <code>bar<\/code> (e.g. 
<code>Foo<\/code> and <code>FOO<\/code> get replaced by <code>bar<\/code> too)<\/p>\n<p><code>if(m\/foo\/)<\/code>&#8230;<br \/>\ntests whether the current string contains the string <code>foo<\/code><\/p>\n<p>Links: <a href=\"https:\/\/www.cs.tut.fi\/~jkorpela\/perl\/regexp.html\">https:\/\/www.cs.tut.fi\/~jkorpela\/perl\/regexp.html<\/a><br \/>\n<a href=\"http:\/\/www.troubleshooters.com\/codecorn\/littperl\/perlreg.htm\">http:\/\/www.troubleshooters.com\/codecorn\/littperl\/perlreg.htm<\/a><br \/>\n<a href=\"http:\/\/www.skillz.ru\/dev\/php\/article-Regulyarnye_vyrazheniya_dlya_chaynikov.html\">http:\/\/www.skillz.ru\/dev\/php\/article-Regulyarnye_vyrazheniya_dlya_chaynikov.html<\/a><br \/>\n<a href=\"http:\/\/www.ultraedit.com\/support\/tutorials_power_tips\/ultraedit\/non-greedy-perl-regex.html\">http:\/\/www.ultraedit.com\/support\/tutorials_power_tips\/ultraedit\/non-greedy-perl-regex.html<\/a><br \/>\n<a href=\"http:\/\/www.regular-expressions.info\/wordboundaries.html\">http:\/\/www.regular-expressions.info\/wordboundaries.html<\/a><br \/>\n<a href=\"http:\/\/www.cs.xu.edu\/csci380\/99s\/perlstuff\/anchorreg.html\">http:\/\/www.cs.xu.edu\/csci380\/99s\/perlstuff\/anchorreg.html<\/a><br \/>\n<a href=\"http:\/\/www.tutorialspoint.com\/perl\/perl_regular_expression.htm\">http:\/\/www.tutorialspoint.com\/perl\/perl_regular_expression.htm<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Metacharacters char meaning ^ beginning of string $ end of string . any character except newline * match 0 or more times + match 1 or more times ? 
match 0 or 1 times; or: shortest match | alternative ( ) grouping; \u201cstoring\u201d [ ] set of characters { } repetition modifier \\ quote or [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[231],"tags":[32,198,199,28,14,11],"aioseo_notices":[],"_links":{"self":[{"href":"https:\/\/trichev.com\/blog\/wp-json\/wp\/v2\/posts\/638"}],"collection":[{"href":"https:\/\/trichev.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/trichev.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/trichev.com\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/trichev.com\/blog\/wp-json\/wp\/v2\/comments?post=638"}],"version-history":[{"count":3,"href":"https:\/\/trichev.com\/blog\/wp-json\/wp\/v2\/posts\/638\/revisions"}],"predecessor-version":[{"id":641,"href":"https:\/\/trichev.com\/blog\/wp-json\/wp\/v2\/posts\/638\/revisions\/641"}],"wp:attachment":[{"href":"https:\/\/trichev.com\/blog\/wp-json\/wp\/v2\/media?parent=638"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/trichev.com\/blog\/wp-json\/wp\/v2\/categories?post=638"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/trichev.com\/blog\/wp-json\/wp\/v2\/tags?post=638"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}