• Regular Expressions

    Regular Expression Pre-qualifier: these definitions describe how regular expressions are generally used in .htaccess files. Most of them apply to regular expressions everywhere, but some may not.

    There are some predefined ‘terms’ in regular expressions to make your life easier. (At least, they are supposed to make your life easier.) Here is a short list, with what each does in the mod_rewrite setting.

    [ ] enclose a set of characters within the expression. (Used for determining the characters, or range of characters, that a single position may match.)

    letter-letter (EG [a-z] matches any single lowercase alphabetical character in the range of a to z), so [c-e] will match any single character that is the lowercase letter c, d, or e.

    LETTER-LETTER (EG [A-Z] matches any single capital alphabetical character in the range of A to Z), so [C-E] will match any single character that is the capital letter C, D, or E.

    number-number (EG [0-9] matches any single number in the range of 0 to 9), so [4-6] would match any single number 4, 5, or 6.

    character list (EG [dog123] matches any single character that is either d, o, g, 1, 2, or 3.)

    ^ has two purposes. When used as the first character inside of [ ], it designates ‘not’. (EG [^0-9] would match any character that is not 0 to 9 and [^abc] would match any character that is not a lowercase a, b, or c.) When used at the beginning of a pattern in mod_rewrite, it designates the beginning of a ‘line’.

    It is very important to understand and remember that [dog] does not match the word ‘dog’; it matches any individual lowercase letter d, o, or g anywhere in the comparison. In the same way, [^dog] does not exclude the word ‘dog’ from matching; it excludes the lowercase letters d, o, and g from matching individually.

    To match a ‘word’ or a group of characters in order, you do not need to use [] so ^dog$ would match the word dog, and not d, o, or g as a single character.
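    The difference between a bracketed set and a plain word is easy to check outside of Apache. The following is a minimal sketch in Python, on the assumption that its re module treats these basic constructs the same way as the Perl-compatible expressions Apache uses; the helper names class_matches and found_anywhere are made up for this illustration.

```python
import re

def class_matches(char_class: str, text: str) -> bool:
    """True if the whole of `text` is one character accepted by `char_class`."""
    return re.fullmatch(char_class, text) is not None

def found_anywhere(pattern: str, text: str) -> bool:
    """True if `pattern` matches somewhere in `text` (an unanchored search)."""
    return re.search(pattern, text) is not None

print(class_matches(r"[c-e]", "d"))       # True: d lies in the range c to e
print(class_matches(r"[dog123]", "2"))    # True: 2 is in the list
print(class_matches(r"[^0-9]", "7"))      # False: digits are excluded
print(class_matches(r"[dog]", "dog"))     # False: a set matches one character only
print(found_anywhere(r"[^dog]", "dogs"))  # True: the 's' is not d, o, or g
print(found_anywhere(r"^dog$", "dog"))    # True: the literal word, anchored
print(found_anywhere(r"^dog$", "dogs"))   # False: extra character before the end
```

    Note how [^dog] still finds a match in ‘dogs’: it only needs one character that is not d, o, or g.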

    . (a dot) matches any single character, except the ending of a line.

    ? matches 0 or 1 of the characters or set of characters in brackets or parentheses immediately before it. (EG a? would match the lowercase letter ‘a’ 0 or 1 time, (abc)? would match the phrase ‘abc’ 0 or 1 time, while [a-z]? would match any lowercase letter from ‘a to z’ 0 or 1 time.)

    + matches 1 or more of the characters or set of characters in brackets or parentheses immediately before it. (EG a+ would match the lowercase letter ‘a’ 1 or more times, (abc)+ would match the phrase ‘abc’ 1 or more times, while [a-z]+ would match 1 or more lowercase letters from ‘a to z’.)

    * matches 0 or more of the characters or set of characters immediately before it. (EG a* would match the lowercase letter ‘a’ 0 or more times, (abc)* would match the phrase ‘abc’ 0 or more times, while [a-z]* would match 0 or more lowercase letters from ‘a to z’.)
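    The dot and the three quantifiers above can be tried out the same way. This is only a sketch in Python, assuming its re module agrees with Apache's regular expressions on these basics; whole is a hypothetical helper that checks whether a pattern accounts for the entire string.

```python
import re

def whole(pattern: str, text: str) -> bool:
    """True if `pattern` matches the entire `text`, start to finish."""
    return re.fullmatch(pattern, text) is not None

print(whole(r"a?", ""))            # True: zero occurrences of 'a' is allowed
print(whole(r"a?", "aa"))          # False: ? allows at most one
print(whole(r"(abc)+", "abcabc"))  # True: the group repeats twice
print(whole(r"(abc)+", ""))        # False: + needs at least one occurrence
print(whole(r"[a-z]*", ""))        # True: * accepts zero letters
print(whole(r"[a-z]+", "hello"))   # True: five lowercase letters
print(whole(r".", "x"))            # True: the dot takes any single character
print(whole(r".", "\n"))           # False: but not the end of a line
```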

    These are the basic building blocks of regular expressions as used in .htaccess and associated with mod_rewrite. By themselves, they do little, but when you put them together, they become very powerful.

    Along with regular expressions, mod_rewrite allows for the use of special characters. It’s a good thing to understand what these are before you begin writing rules. (Mainly because you need one or more of them in almost every rule.)

    RewriteRule tells the server to interpret the following information as a rule.

    RewriteCond tells the server to interpret the following information as a condition of the rule that comes immediately after it.

    ^ defines the beginning of a ‘line’ (starting anchor). Remember, ^ also designates ‘not’ inside [ ] in a regular expression, so please don’t get confused.

    ( ) creates a variable to be stored and possibly used later, and is also used to group text for use with the quantifiers ?, +, and * described above.

    $ defines the ending of a ‘line’ (ending anchor). When followed by a number from 1 to 9, it instead references a variable defined in the RewriteRule pattern (used for variables on the right side of the equation or to match a variable from the rule in a condition, see example below).

    % followed by a number from 1 to 9 references a variable defined in the last matched rewrite condition. (Used for variables on the right side of the equation, see example below.)

    *note* – The right side of the equation is the substitution: everything that follows the pattern (which usually ends with $) in a RewriteRule.

    Examples: All variables are given a number according to the order in which they appear. The following rule and condition each have two variables, defined by parentheses, so to use them you would put them where you need them in the results:

    (the ‘-’ is for spacing only to make the line more readable, and is not necessary to use variables.)

    RewriteRule ^(var1)/no-var/(var2)$ /to-use-variables-type-$1-and-$2

    The final result would look like this:

    to-use-variables-type-var1-and-var2
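    To see the numbering at work, here is a rough simulation of how $1 and $2 are filled in. It is a sketch in Python, not Apache's actual code: the rewrite function is invented for this illustration, it only handles $0 to $9, and it assumes the path is given without a leading slash, as it is when rules live in an .htaccess file. The second call uses a made-up path.

```python
import re
from typing import Optional

def rewrite(pattern: str, substitution: str, path: str) -> Optional[str]:
    """Return the substituted result, or None when the pattern does not match."""
    m = re.search(pattern, path)
    if m is None:
        return None

    def expand(ref):
        # $N becomes the text captured by the N-th pair of parentheses.
        n = int(ref.group(1))
        return (m.group(n) or "") if n <= m.re.groups else ""

    return re.sub(r"\$(\d)", expand, substitution)

# The rule from the example above, applied to a path that matches it literally.
print(rewrite(r"^(var1)/no-var/(var2)$",
              "/to-use-variables-type-$1-and-$2",
              "var1/no-var/var2"))
# /to-use-variables-type-var1-and-var2

# A more realistic pattern with a hypothetical path.
print(rewrite(r"^([a-z]+)/no-var/([0-9]+)$", "/type-$1-and-$2", "shoes/no-var/42"))
# /type-shoes-and-42
```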

    RewriteCond %{CONDITION_STUFF} ^(var1)/no-var/(var2)

    RewriteRule ^no-var/no-var/no-var$ /to-use-variables-type-%1-and-%2

    The final result would look like this:

    to-use-variables-type-var1-and-var2

    To use a combination of the Condition and Rule Variables:

    RewriteCond %{CONDITION_STUFF} ^(var1)/no-var/(var2)

    RewriteRule ^(var1)/no-var/(var2)$ /to-use-variables-type-$1-and-%2-$2

    The final result would look like this:

    to-use-variables-type-var1-and-var2-var2
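    The mix of $ and % references can be simulated as well. The sketch below is written in Python with invented names (apply_with_condition, and a plain string standing in for the value of %{CONDITION_STUFF}); it assumes a single condition and mirrors the order mod_rewrite works in, where the rule pattern is tested first and the condition only afterwards.

```python
import re
from typing import Optional

def apply_with_condition(cond_pattern: str, cond_value: str,
                         rule_pattern: str, path: str,
                         substitution: str) -> Optional[str]:
    """Simulate one RewriteCond followed by one RewriteRule."""
    rule_match = re.search(rule_pattern, path)        # the rule pattern goes first
    if rule_match is None:
        return None
    cond_match = re.search(cond_pattern, cond_value)  # then the condition
    if cond_match is None:
        return None

    def expand(ref):
        # $N reads from the rule pattern, %N from the condition pattern.
        source = rule_match if ref.group(1) == "$" else cond_match
        n = int(ref.group(2))
        return (source.group(n) or "") if n <= source.re.groups else ""

    return re.sub(r"([$%])(\d)", expand, substitution)

print(apply_with_condition(r"^(var1)/no-var/(var2)", "var1/no-var/var2",
                           r"^(var1)/no-var/(var2)$", "var1/no-var/var2",
                           "/to-use-variables-type-$1-and-%2-$2"))
# /to-use-variables-type-var1-and-var2-var2

# Hypothetical values that make the two sources easier to tell apart.
print(apply_with_condition(r"^([a-z]+)\.([a-z]+)\.com$", "www.example.com",
                           r"^([a-z]+)/([0-9]+)$", "shop/42",
                           "/$1-%2-$2"))
# /shop-example-42
```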

    The exception to the above examples is that you can also use the %{CONDITION_STUFF} server variables on the right side of a rule, but each must appear exactly as it does in the condition:

    RewriteRule ^(var1)/no-var/(var2)$ /type-%{CONDITION_STUFF}

    | (bar) stands for ‘or’, normally used with alternate text or expressions grouped with parentheses. (EG (with|without) matches the string ‘with’ or the string ‘without’. Keep in mind that since these are inside parentheses, the match is also stored as a variable.)

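    \ is called an escaping character; it removes the function from a ‘special character’. (EG if you needed to match index.php?, which has both a . (dot) and a ?, you would have to ‘escape’ the special characters . (dot) and ? with a \ to remove their ‘special’ value, so it looks like this: index\.php\?) Keep in mind that a RewriteRule pattern only sees the URL path, so anything after the ? in a request has to be tested in a RewriteCond against %{QUERY_STRING} instead.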
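    Both the bar and the backslash are quick to verify. This is a small sketch in Python, assuming its re module handles alternation and escaping the same way Apache does; chosen_alternative and the two pattern constants are names invented here.

```python
import re
from typing import Optional

def chosen_alternative(text: str) -> Optional[str]:
    """Return which side of (with|without) matched the whole text, if any."""
    m = re.fullmatch(r"(with|without)", text)
    return m.group(1) if m else None

print(chosen_alternative("without"))  # without (also stored as variable 1)
print(chosen_alternative("within"))   # None

UNESCAPED = r"^index.php?$"    # . means any character, ? makes the last p optional
ESCAPED = r"^index\.php\?$"    # a literal dot and a literal question mark

print(re.search(UNESCAPED, "indexXph") is not None)    # True: not what was intended
print(re.search(UNESCAPED, "index.php?") is not None)  # False: the real ? is never matched
print(re.search(ESCAPED, "index.php?") is not None)    # True
print(re.search(ESCAPED, "indexXph") is not None)      # False
```

    Python can also add the backslashes for you: re.escape("index.php?") produces the same escaped text, index\.php\?.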

    ! stands for ‘not’, much like the ^ inside [ ], but it can only be used at the beginning of a rule or condition pattern, not in the middle.

    - on the right side of the equation stands for No Rewrite. (It is often used in conjunction with a condition to check and see if a file or directory exists.)
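    The effect of a leading ! and of a lone - can be sketched in a few lines. This is a simplified Python model with an invented function name (apply_rule); it ignores variables and flags, and the paths and targets are made up.

```python
import re
from typing import Optional

def apply_rule(pattern: str, substitution: str, path: str) -> Optional[str]:
    """Return the resulting path, or None when the rule does not apply."""
    negate = pattern.startswith("!")   # a leading ! inverts the test
    if negate:
        pattern = pattern[1:]
    matched = re.search(pattern, path) is not None
    if matched == negate:              # matched but negated, or unmatched and plain
        return None
    if substitution == "-":            # the rule applies, but the path stays as it is
        return path
    return substitution

print(apply_rule(r"!^admin/", "/public.html", "blog/post"))    # /public.html
print(apply_rule(r"!^admin/", "/public.html", "admin/login"))  # None
print(apply_rule(r"^images/", "-", "images/logo.png"))         # images/logo.png
```

    In the last call the rule applies but leaves the path unchanged, which is what the - is for.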
