lillesvin.net

PCRE Branch Reset Operator

In Perl Compatible Regular Expressions (PCRE) and other dialects you sometimes end up with a flurry of capture groups, and if you want to use alternatives (OR, branching), you’ll end up with empty matches, because capture groups are numbered by their position in the pattern regardless of which branch actually matched.

So a pattern like /([a-z])|([0-9])/ will result in a match array like the following (only the capture groups are listed, zero-indexed; the full match is left out):

# Matching on '5'
[
    0: '',
    1: '5'
]

# Matching on 'e'
[
    0: 'e',
    1: ''
]

simply because ([a-z]) is the first capture group and ([0-9]) is the second one, regardless of branching.
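To make this concrete, here is a minimal sketch in Python using the standard re module (my choice of language for illustration; the behaviour described above is not specific to it). Two caveats: Python reports an unvisited group as None rather than as an empty string, and the tuple returned by groups() holds groups 1 and 2, without the full match. The helper name capture_groups is made up for this example.

```python
import re

# Two alternatives, each with its own capture group (numbered 1 and 2).
pattern = re.compile(r"([a-z])|([0-9])")


def capture_groups(text):
    """Return every capture group of the first match in `text` (hypothetical helper)."""
    match = pattern.search(text)
    return match.groups() if match else None


print(capture_groups("5"))  # (None, '5')  -> group 1 unvisited, group 2 matched
print(capture_groups("e"))  # ('e', None)  -> group 1 matched, group 2 unvisited
```

Either way, the unvisited branch still occupies a slot in the result, which is exactly the clutter described above.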

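However, using the branch reset operator, (?|...), every alternative inside the group starts numbering its capture groups from the same index, so you can effectively exclude the unvisited branches like so: /(?|([a-z])|([0-9]))/ and the resulting matches will be: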

# Matching on '5'
[
    0: '5'
]

# Matching on 'e'
[
    0: 'e'
]
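Not every engine understands (?|...). Python's built-in re module, for one, does not support it (the third-party regex module is documented as doing so). As a rough stand-in, and explicitly not a branch reset itself, the sketch below emulates the same result by dropping the groups that did not take part in the match. The helper name visited_groups is made up for this example.

```python
import re

# Same pattern as before, without branch reset, since `re` lacks (?|...).
pattern = re.compile(r"([a-z])|([0-9])")


def visited_groups(text):
    """Return only the capture groups that took part in the match (hypothetical helper)."""
    match = pattern.search(text)
    if match is None:
        return []
    # Unvisited groups are reported as None by `re`; filter them out.
    return [group for group in match.groups() if group is not None]


print(visited_groups("5"))  # ['5']
print(visited_groups("e"))  # ['e']
```

This only mimics the simple case shown here. A real branch reset also keeps group numbers stable when other groups follow the alternation, which a filter like this cannot do.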

I’m aware that the example is totally contrived; I’m merely trying to keep it simple to make the concept more approachable.