@c Copyright (C) 1994, 1996, 1998, 2000, 2001, 2003, 2004, 2005, 2006,
@c 2007, 2009, 2010, 2011 Free Software Foundation, Inc.
@c
@c Permission is granted to copy, distribute and/or modify this document
@c under the terms of the GNU Free Documentation License, Version 1.3 or
@c any later version published by the Free Software Foundation; with no
@c Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts.
@c A copy of the license is included in the ``GNU Free
@c Documentation License'' file as part of this distribution.

@c this regular expression description is for: findutils

@menu
* findutils-default regular expression syntax::
* awk regular expression syntax::
* egrep regular expression syntax::
* emacs regular expression syntax::
* gnu-awk regular expression syntax::
* grep regular expression syntax::
* posix-awk regular expression syntax::
* posix-basic regular expression syntax::
* posix-egrep regular expression syntax::
* posix-extended regular expression syntax::
@end menu

@node findutils-default regular expression syntax
@subsection @samp{findutils-default} regular expression syntax

The character @samp{.} matches any single character.

@table @samp

@item +
indicates that the regular expression should match one or more
occurrences of the previous atom or regexp.
@item ?
indicates that the regular expression should match zero or one
occurrence of the previous atom or regexp.
@item \+
matches a @samp{+}.
@item \?
matches a @samp{?}.
@end table

Bracket expressions are used to match ranges of characters.  Bracket
expressions where the range is backward, for example @samp{[z-a]}, are
ignored.  Within square brackets, @samp{\} is taken literally.
Character classes are not supported, so for example you would need to
use @samp{[0-9]} instead of @samp{[[:digit:]]}.

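
Because character classes are unavailable in this syntax, a pattern
written with POSIX class names has to be rewritten with explicit
ranges.  The following Python sketch shows one way this could be done.
The helper @code{expand_posix_classes} and its table are hypothetical,
the ranges assume ASCII, and the replacement is purely textual, so it
does not check that a class name really sits inside a bracket
expression.  Python's @code{re} module is used only to try out the
result; it is not the matcher that findutils uses.

@example
```python
import re

# Hypothetical helper: ASCII-only expansions for a few POSIX class names.
POSIX_CLASSES = [
    ("[:digit:]", "0-9"),
    ("[:lower:]", "a-z"),
    ("[:upper:]", "A-Z"),
    ("[:alpha:]", "A-Za-z"),
    ("[:alnum:]", "A-Za-z0-9"),
]


def expand_posix_classes(pattern):
    """Replace each known class name with an explicit character range."""
    for name, ranges in POSIX_CLASSES:
        pattern = pattern.replace(name, ranges)
    return pattern


portable = expand_posix_classes("[[:digit:]]+[[:alpha:]_]")
print(portable)  # prints [0-9]+[A-Za-z_]
```
@end example
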
GNU extensions are supported:
@enumerate
@item @samp{\w} matches a character within a word
@item @samp{\W} matches a character which is not within a word
@item @samp{\<} matches the beginning of a word
@item @samp{\>} matches the end of a word
@item @samp{\b} matches a word boundary
@item @samp{\B} matches characters which are not a word boundary
@item @samp{\`} matches the beginning of the whole input
@item @samp{\'} matches the end of the whole input
@end enumerate

Grouping is performed with backslashes followed by parentheses
@samp{\(}, @samp{\)}.  A backslash followed by a digit acts as a
back-reference and matches the same thing as the previous grouped
expression indicated by that number.  For example @samp{\2} matches the
second group expression.  The order of group expressions is determined
by the position of their opening parenthesis @samp{\(}.

The alternation operator is @samp{\|}.

The character @samp{^} only represents the beginning of a string when
it appears:
@enumerate
@item At the beginning of a regular expression
@item After an open-group, signified by @samp{\(}
@item After the alternation operator @samp{\|}
@end enumerate

The character @samp{$} only represents the end of a string when it
appears:
@enumerate
@item At the end of a regular expression
@item Before a close-group, signified by @samp{\)}
@item Before the alternation operator @samp{\|}
@end enumerate

@samp{*}, @samp{+} and @samp{?} are special at any point in a regular
expression except:
@enumerate
@item At the beginning of a regular expression
@item After an open-group, signified by @samp{\(}
@item After the alternation operator @samp{\|}
@end enumerate

The longest possible match is returned; this applies to the regular
expression as a whole and (subject to this constraint) to
subexpressions within groups.

@node awk regular expression syntax
@subsection @samp{awk} regular expression syntax

The character @samp{.} matches any single character except the null
character.

@table @samp

@item +
indicates that the regular expression should match one or more
occurrences of the previous atom or regexp.
@item ?
indicates that the regular expression should match zero or one
occurrence of the previous atom or regexp.
@item \+
matches a @samp{+}.
@item \?
matches a @samp{?}.
@end table

Bracket expressions are used to match ranges of characters.  Bracket
expressions where the range is backward, for example @samp{[z-a]}, are
invalid.  Within square brackets, @samp{\} can be used to quote the
following character.  Character classes are supported; for example
@samp{[[:digit:]]} will match a single decimal digit.

GNU extensions are not supported and so @samp{\w}, @samp{\W},
@samp{\<}, @samp{\>}, @samp{\b}, @samp{\B}, @samp{\`}, and @samp{\'}
match @samp{w}, @samp{W}, @samp{<}, @samp{>}, @samp{b}, @samp{B},
@samp{`}, and @samp{'} respectively.

Grouping is performed with parentheses @samp{()}.  An unmatched
@samp{)} matches just itself.  A backslash followed by a digit matches
that digit.

The alternation operator is @samp{|}.

The characters @samp{^} and @samp{$} always represent the beginning and
end of a string respectively, except within square brackets.  Within
brackets, @samp{^} can be used to invert the membership of the
character class being specified.

@samp{*}, @samp{+} and @samp{?} are special at any point in a regular
expression except:
@enumerate
@item At the beginning of a regular expression
@item After an open-group, signified by @samp{(}
@item After the alternation operator @samp{|}
@end enumerate

The longest possible match is returned; this applies to the regular
expression as a whole and (subject to this constraint) to
subexpressions within groups.

@node egrep regular expression syntax
@subsection @samp{egrep} regular expression syntax

The character @samp{.} matches any single character.

@table @samp

@item +
indicates that the regular expression should match one or more
occurrences of the previous atom or regexp.
@item ?
indicates that the regular expression should match zero or one
occurrence of the previous atom or regexp.
@item \+
matches a @samp{+}.
@item \?
matches a @samp{?}.
@end table

Bracket expressions are used to match ranges of characters.  Bracket
expressions where the range is backward, for example @samp{[z-a]}, are
invalid.  Within square brackets, @samp{\} is taken literally.
Character classes are supported; for example @samp{[[:digit:]]} will
match a single decimal digit.

GNU extensions are supported:
@enumerate
@item @samp{\w} matches a character within a word
@item @samp{\W} matches a character which is not within a word
@item @samp{\<} matches the beginning of a word
@item @samp{\>} matches the end of a word
@item @samp{\b} matches a word boundary
@item @samp{\B} matches characters which are not a word boundary
@item @samp{\`} matches the beginning of the whole input
@item @samp{\'} matches the end of the whole input
@end enumerate

Grouping is performed with parentheses @samp{()}.  An unmatched
@samp{)} matches just itself.  A backslash followed by a digit acts as
a back-reference and matches the same thing as the previous grouped
expression indicated by that number.  For example @samp{\2} matches the
second group expression.  The order of group expressions is determined
by the position of their opening parenthesis @samp{(}.

The alternation operator is @samp{|}.

The characters @samp{^} and @samp{$} always represent the beginning and
end of a string respectively, except within square brackets.  Within
brackets, @samp{^} can be used to invert the membership of the
character class being specified.

The characters @samp{*}, @samp{+} and @samp{?} are special anywhere in
a regular expression.

Intervals are specified by @samp{@{} and @samp{@}}.
Invalid intervals are treated as literals, for example @samp{a@{1} is
treated as @samp{a\@{1}.

The longest possible match is returned; this applies to the regular
expression as a whole and (subject to this constraint) to
subexpressions within groups.

@node emacs regular expression syntax
@subsection @samp{emacs} regular expression syntax

The character @samp{.} matches any single character except newline.

@table @samp

@item +
indicates that the regular expression should match one or more
occurrences of the previous atom or regexp.
@item ?
indicates that the regular expression should match zero or one
occurrence of the previous atom or regexp.
@item \+
matches a @samp{+}.
@item \?
matches a @samp{?}.
@end table

Bracket expressions are used to match ranges of characters.  Bracket
expressions where the range is backward, for example @samp{[z-a]}, are
ignored.  Within square brackets, @samp{\} is taken literally.
Character classes are not supported, so for example you would need to
use @samp{[0-9]} instead of @samp{[[:digit:]]}.

GNU extensions are supported:
@enumerate
@item @samp{\w} matches a character within a word
@item @samp{\W} matches a character which is not within a word
@item @samp{\<} matches the beginning of a word
@item @samp{\>} matches the end of a word
@item @samp{\b} matches a word boundary
@item @samp{\B} matches characters which are not a word boundary
@item @samp{\`} matches the beginning of the whole input
@item @samp{\'} matches the end of the whole input
@end enumerate

Grouping is performed with backslashes followed by parentheses
@samp{\(}, @samp{\)}.  A backslash followed by a digit acts as a
back-reference and matches the same thing as the previous grouped
expression indicated by that number.  For example @samp{\2} matches the
second group expression.  The order of group expressions is determined
by the position of their opening parenthesis @samp{\(}.

The alternation operator is @samp{\|}.

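
Readers used to syntaxes where grouping and alternation are written
without backslashes may find it helpful to see the two spellings side
by side.  The Python sketch below swaps the roles of the escaped and
bare forms of the parentheses and the vertical bar, leaving everything
else alone.  The function name @code{backslash_ops_to_python} is
hypothetical, only these three operators are handled (the GNU word
operators are passed through untouched), and Python's @code{re} module
stands in for illustration only; it is not the findutils matcher.

@example
```python
import re


def backslash_ops_to_python(pattern):
    """Swap the roles of escaped and bare ( ) | outside bracket expressions."""
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "[":
            # Find the end of the bracket expression; a leading ] is literal.
            j = i + 1
            if pattern[j:j + 1] == "^":
                j += 1
            if pattern[j:j + 1] == "]":
                j += 1
            end = pattern.find("]", j)
            if end == -1:
                end = len(pattern) - 1
            # A backslash is literal inside brackets here, so double it.
            out.append(pattern[i:end + 1].replace("\\", "\\\\"))
            i = end + 1
        elif ch == "\\" and i + 1 < len(pattern):
            nxt = pattern[i + 1]
            # Escaped ( ) | are operators: emit them bare. Keep other escapes.
            out.append(nxt if nxt in "()|" else ch + nxt)
            i += 2
        elif ch in "()|":
            # Bare ( ) | are ordinary characters: escape them.
            out.append("\\" + ch)
            i += 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)


converted = backslash_ops_to_python(r"\(ab\|cd\)\1(x)")
print(converted)  # prints (ab|cd)\1\(x\)
```
@end example

The converted pattern matches @samp{cdcd(x)} but not @samp{abcd(x)},
since the back-reference must repeat whatever the first group matched.
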
The character @samp{^} only represents the beginning of a string when
it appears:
@enumerate
@item At the beginning of a regular expression
@item After an open-group, signified by @samp{\(}
@item After the alternation operator @samp{\|}
@end enumerate

The character @samp{$} only represents the end of a string when it
appears:
@enumerate
@item At the end of a regular expression
@item Before a close-group, signified by @samp{\)}
@item Before the alternation operator @samp{\|}
@end enumerate

@samp{*}, @samp{+} and @samp{?} are special at any point in a regular
expression except:
@enumerate
@item At the beginning of a regular expression
@item After an open-group, signified by @samp{\(}
@item After the alternation operator @samp{\|}
@end enumerate

The longest possible match is returned; this applies to the regular
expression as a whole and (subject to this constraint) to
subexpressions within groups.

@node gnu-awk regular expression syntax
@subsection @samp{gnu-awk} regular expression syntax

The character @samp{.} matches any single character.

@table @samp

@item +
indicates that the regular expression should match one or more
occurrences of the previous atom or regexp.
@item ?
indicates that the regular expression should match zero or one
occurrence of the previous atom or regexp.
@item \+
matches a @samp{+}.
@item \?
matches a @samp{?}.
@end table

Bracket expressions are used to match ranges of characters.  Bracket
expressions where the range is backward, for example @samp{[z-a]}, are
invalid.  Within square brackets, @samp{\} can be used to quote the
following character.  Character classes are supported; for example
@samp{[[:digit:]]} will match a single decimal digit.

GNU extensions are supported:
@enumerate
@item @samp{\w} matches a character within a word
@item @samp{\W} matches a character which is not within a word
@item @samp{\<} matches the beginning of a word
@item @samp{\>} matches the end of a word
@item @samp{\b} matches a word boundary
@item @samp{\B} matches characters which are not a word boundary
@item @samp{\`} matches the beginning of the whole input
@item @samp{\'} matches the end of the whole input
@end enumerate

Grouping is performed with parentheses @samp{()}.  An unmatched
@samp{)} matches just itself.  A backslash followed by a digit acts as
a back-reference and matches the same thing as the previous grouped
expression indicated by that number.  For example @samp{\2} matches the
second group expression.  The order of group expressions is determined
by the position of their opening parenthesis @samp{(}.

The alternation operator is @samp{|}.

The characters @samp{^} and @samp{$} always represent the beginning and
end of a string respectively, except within square brackets.  Within
brackets, @samp{^} can be used to invert the membership of the
character class being specified.

@samp{*}, @samp{+} and @samp{?} are special at any point in a regular
expression except:
@enumerate
@item At the beginning of a regular expression
@item After an open-group, signified by @samp{(}
@item After the alternation operator @samp{|}
@end enumerate

Intervals are specified by @samp{@{} and @samp{@}}.  Invalid intervals
are treated as literals, for example @samp{a@{1} is treated as
@samp{a\@{1}.

The longest possible match is returned; this applies to the regular
expression as a whole and (subject to this constraint) to
subexpressions within groups.

@node grep regular expression syntax
@subsection @samp{grep} regular expression syntax

The character @samp{.} matches any single character.

@table @samp

@item \+
indicates that the regular expression should match one or more
occurrences of the previous atom or regexp.
@item \?
indicates that the regular expression should match zero or one
occurrence of the previous atom or regexp.
@item + and ?
match themselves.
@end table

Bracket expressions are used to match ranges of characters.  Bracket
expressions where the range is backward, for example @samp{[z-a]}, are
invalid.  Within square brackets, @samp{\} is taken literally.
Character classes are supported; for example @samp{[[:digit:]]} will
match a single decimal digit.

GNU extensions are supported:
@enumerate
@item @samp{\w} matches a character within a word
@item @samp{\W} matches a character which is not within a word
@item @samp{\<} matches the beginning of a word
@item @samp{\>} matches the end of a word
@item @samp{\b} matches a word boundary
@item @samp{\B} matches characters which are not a word boundary
@item @samp{\`} matches the beginning of the whole input
@item @samp{\'} matches the end of the whole input
@end enumerate

Grouping is performed with backslashes followed by parentheses
@samp{\(}, @samp{\)}.  A backslash followed by a digit acts as a
back-reference and matches the same thing as the previous grouped
expression indicated by that number.  For example @samp{\2} matches the
second group expression.  The order of group expressions is determined
by the position of their opening parenthesis @samp{\(}.

The alternation operator is @samp{\|}.

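
The numbering rule for nested groups can be checked with a short
experiment.  The Python sketch below is only an illustration: Python's
@code{re} module writes groups with bare parentheses, so the pattern
shown corresponds to what would be spelled
@samp{\(\(ab\)\(cd\)\)\3} in the syntax described here, and it is
assumed that both engines number groups the same way, by the position
of the opening parenthesis.

@example
```python
import re

# Three groups, numbered by where each opening parenthesis appears:
# group 1 is the outer pair, group 2 is (ab), group 3 is (cd).
pattern = re.compile(r"((ab)(cd))\3")

match = pattern.fullmatch("abcdcd")
print(match.group(1), match.group(2), match.group(3))  # prints abcd ab cd
```
@end example

The trailing back-reference repeats the text of the third group, so
@samp{abcdcd} matches while @samp{abcdab} does not.
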
The character @samp{^} only represents the beginning of a string when
it appears:
@enumerate
@item At the beginning of a regular expression
@item After an open-group, signified by @samp{\(}
@item After a newline
@item After the alternation operator @samp{\|}
@end enumerate

The character @samp{$} only represents the end of a string when it
appears:
@enumerate
@item At the end of a regular expression
@item Before a close-group, signified by @samp{\)}
@item Before a newline
@item Before the alternation operator @samp{\|}
@end enumerate

@samp{*}, @samp{\+} and @samp{\?} are special at any point in a regular
expression except:
@enumerate
@item At the beginning of a regular expression
@item After an open-group, signified by @samp{\(}
@item After a newline
@item After the alternation operator @samp{\|}
@end enumerate

Intervals are specified by @samp{\@{} and @samp{\@}}.  Invalid
intervals such as @samp{a\@{1z} are not accepted.

The longest possible match is returned; this applies to the regular
expression as a whole and (subject to this constraint) to
subexpressions within groups.

@node posix-awk regular expression syntax
@subsection @samp{posix-awk} regular expression syntax

The character @samp{.} matches any single character except the null
character.

@table @samp

@item +
indicates that the regular expression should match one or more
occurrences of the previous atom or regexp.
@item ?
indicates that the regular expression should match zero or one
occurrence of the previous atom or regexp.
@item \+
matches a @samp{+}.
@item \?
matches a @samp{?}.
@end table

Bracket expressions are used to match ranges of characters.  Bracket
expressions where the range is backward, for example @samp{[z-a]}, are
invalid.  Within square brackets, @samp{\} can be used to quote the
following character.  Character classes are supported; for example
@samp{[[:digit:]]} will match a single decimal digit.

GNU extensions are not supported and so @samp{\w}, @samp{\W},
@samp{\<}, @samp{\>}, @samp{\b}, @samp{\B}, @samp{\`}, and @samp{\'}
match @samp{w}, @samp{W}, @samp{<}, @samp{>}, @samp{b}, @samp{B},
@samp{`}, and @samp{'} respectively.

Grouping is performed with parentheses @samp{()}.  An unmatched
@samp{)} matches just itself.  A backslash followed by a digit acts as
a back-reference and matches the same thing as the previous grouped
expression indicated by that number.  For example @samp{\2} matches the
second group expression.  The order of group expressions is determined
by the position of their opening parenthesis @samp{(}.

The alternation operator is @samp{|}.

The characters @samp{^} and @samp{$} always represent the beginning and
end of a string respectively, except within square brackets.  Within
brackets, @samp{^} can be used to invert the membership of the
character class being specified.

@samp{*}, @samp{+} and @samp{?} are special at any point in a regular
expression except the following places, where they are not allowed:
@enumerate
@item At the beginning of a regular expression
@item After an open-group, signified by @samp{(}
@item After the alternation operator @samp{|}
@end enumerate

Intervals are specified by @samp{@{} and @samp{@}}.  Invalid intervals
are treated as literals, for example @samp{a@{1} is treated as
@samp{a\@{1}.

The longest possible match is returned; this applies to the regular
expression as a whole and (subject to this constraint) to
subexpressions within groups.

@node posix-basic regular expression syntax
@subsection @samp{posix-basic} regular expression syntax
This is a synonym for ed.

@node posix-egrep regular expression syntax
@subsection @samp{posix-egrep} regular expression syntax
This is a synonym for egrep.

@node posix-extended regular expression syntax
@subsection @samp{posix-extended} regular expression syntax

The character @samp{.} matches any single character except the null
character.

@table @samp

@item +
indicates that the regular expression should match one or more
occurrences of the previous atom or regexp.
@item ?
indicates that the regular expression should match zero or one
occurrence of the previous atom or regexp.
@item \+
matches a @samp{+}.
@item \?
matches a @samp{?}.
@end table

Bracket expressions are used to match ranges of characters.  Bracket
expressions where the range is backward, for example @samp{[z-a]}, are
invalid.  Within square brackets, @samp{\} is taken literally.
Character classes are supported; for example @samp{[[:digit:]]} will
match a single decimal digit.

GNU extensions are supported:
@enumerate
@item @samp{\w} matches a character within a word
@item @samp{\W} matches a character which is not within a word
@item @samp{\<} matches the beginning of a word
@item @samp{\>} matches the end of a word
@item @samp{\b} matches a word boundary
@item @samp{\B} matches characters which are not a word boundary
@item @samp{\`} matches the beginning of the whole input
@item @samp{\'} matches the end of the whole input
@end enumerate

Grouping is performed with parentheses @samp{()}.  An unmatched
@samp{)} matches just itself.  A backslash followed by a digit acts as
a back-reference and matches the same thing as the previous grouped
expression indicated by that number.  For example @samp{\2} matches the
second group expression.  The order of group expressions is determined
by the position of their opening parenthesis @samp{(}.

The alternation operator is @samp{|}.

The characters @samp{^} and @samp{$} always represent the beginning and
end of a string respectively, except within square brackets.  Within
brackets, @samp{^} can be used to invert the membership of the
character class being specified.

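
The GNU word and input-boundary operators listed earlier in this
subsection have no direct spelling in some other regular expression
engines.  As a rough illustration of what they mean, the Python sketch
below rewrites @samp{\<}, @samp{\>}, @samp{\`} and @samp{\'} into
constructs that Python's @code{re} module understands.  The mapping
table and the function @code{translate_gnu_operators} are assumptions
made for this example, bracket expressions are not treated specially,
and Python's notion of a word character may differ in detail from the
one used by findutils.

@example
```python
import re

# Assumed Python equivalents for four GNU operators (illustrative only).
GNU_TO_PYTHON = [
    ("<", r"\b(?=\w)"),   # beginning of a word
    (">", r"\b(?<=\w)"),  # end of a word
    ("`", r"\A"),         # beginning of the whole input
    ("'", r"\Z"),         # end of the whole input
]


def translate_gnu_operators(pattern):
    r"""Rewrite \< \> \` \' and copy every other escape through unchanged."""
    table = dict(GNU_TO_PYTHON)
    out = []
    i = 0
    while i < len(pattern):
        if pattern[i] == "\\" and i + 1 < len(pattern):
            nxt = pattern[i + 1]
            out.append(table.get(nxt, "\\" + nxt))
            i += 2
        else:
            out.append(pattern[i])
            i += 1
    return "".join(out)


whole_word = translate_gnu_operators(r"\<cat\>")
print(re.search(whole_word, "a cat sat") is not None)    # prints True
print(re.search(whole_word, "concatenate") is not None)  # prints False
```
@end example
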
@samp{*}, @samp{+} and @samp{?} are special at any point in a regular
expression except the following places, where they are not allowed:
@enumerate
@item At the beginning of a regular expression
@item After an open-group, signified by @samp{(}
@item After the alternation operator @samp{|}
@end enumerate

Intervals are specified by @samp{@{} and @samp{@}}.  Invalid intervals
such as @samp{a@{1z} are not accepted.

The longest possible match is returned; this applies to the regular
expression as a whole and (subject to this constraint) to
subexpressions within groups.
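
The longest-match rule is easiest to appreciate by contrast with an
engine that does not follow it.  Python's @code{re} module takes the
first alternative that succeeds, so the sketch below emulates the rule
for the expression as a whole by trying ever shorter prefixes of the
text.  The function @code{longest_prefix_match} is hypothetical, it
only considers matches that start at the beginning of the text, and it
does not attempt to reproduce the rule for subexpressions within
groups.

@example
```python
import re


def longest_prefix_match(pattern, text):
    """Return the longest prefix of text that the whole pattern matches."""
    compiled = re.compile(pattern)
    # Try the longest candidate first and shrink until one matches fully.
    for end in range(len(text), -1, -1):
        if compiled.fullmatch(text, 0, end):
            return text[:end]
    return None


print(re.match(r"a|ab", "abc").group(0))     # prints a
print(longest_prefix_match(r"a|ab", "abc"))  # prints ab
```
@end example
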