Mptn patterns

An Mptn pattern is applied to a string. The result is the set of all possible variable assignments under which the string matches the pattern.
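To make "the set of all possible variable assignments" concrete, here is a minimal Python sketch. It is not Mptn's implementation. It assumes that a matcher is a function taking the subject string, a start index and the current assignments, and yielding every (end index, assignments) pair it can reach. The names lit, var, seq and match_all are hypothetical.

```python
from typing import Callable, Dict, Iterator, List, Tuple

Env = Dict[str, str]
# Hypothetical matcher shape: (subject, start, assignments) -> (end, assignments)*
Matcher = Callable[[str, int, Env], Iterator[Tuple[int, Env]]]


def lit(text: str) -> Matcher:
    """Match a literal string."""
    def m(s: str, i: int, env: Env) -> Iterator[Tuple[int, Env]]:
        if s.startswith(text, i):
            yield i + len(text), env
    return m


def var(name: str) -> Matcher:
    """{name}: repeat an existing value, or assign any substring."""
    def m(s: str, i: int, env: Env) -> Iterator[Tuple[int, Env]]:
        if name in env:
            if s.startswith(env[name], i):
                yield i + len(env[name]), env
        else:
            for j in range(i, len(s) + 1):
                yield j, {**env, name: s[i:j]}
    return m


def seq(*parts: Matcher) -> Matcher:
    """Concatenation of several patterns."""
    def m(s: str, i: int, env: Env) -> Iterator[Tuple[int, Env]]:
        if not parts:
            yield i, env
            return
        for j, env1 in parts[0](s, i, env):
            yield from seq(*parts[1:])(s, j, env1)
    return m


def match_all(p: Matcher, s: str) -> List[Env]:
    """The set of assignments under which p matches the whole string s."""
    result: List[Env] = []
    for end, env in p(s, 0, {}):
        if end == len(s) and env not in result:
            result.append(env)
    return result


# {x}b{y} against "abcb": either b can separate x from y.
print(match_all(seq(var("x"), lit("b"), var("y")), "abcb"))
# [{'x': 'a', 'y': 'cb'}, {'x': 'abc', 'y': ''}]
```

An empty result means that the pattern does not match. A result containing a single empty assignment means that it matches without assigning anything.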

Pattern syntax

Atomic patterns

character

Matches itself. To match any of the characters ., [, ], {, }, (, ), `, ', <, >, ?, *, +, \, & and | literally, precede it with a backslash.

[character…]

Matches any character from the listed set. The set may contain ranges, written as the first and last characters of the range separated by -. To include the character ^ in the set, put it anywhere except immediately after the opening bracket.

[^character…]

Matches any character except those listed.

.

Matches any character.

{identifier}

If the variable named by identifier already has a value, the matched string must be equal to this value. Otherwise, the variable is assigned the matched string as its new value. If a pattern is associated with the variable name, the string must also match this pattern. Variable assignments made inside the associated pattern do not propagate to the match in which the variable occurs.

{identifier:character…}

The matcher procedure associated with identifier is called, and the string after the : is passed to it as a parameter. The procedure is allowed to set variable values in the match from which it is called.
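Here is a minimal Python sketch of the atomic patterns. It is not Mptn's code, and it rests on several assumptions. Matchers are built directly instead of being parsed from pattern text, so backslash escapes never come up. A matcher is a function from (subject, start, assignments) to every reachable (end, assignments) pair. All names, including the digits procedure, are hypothetical. A plain character behaves like a one-element set. An associated pattern is checked by running it in a fresh scope, which discards its assignments.

```python
from typing import Callable, Dict, Iterator, List, Optional, Tuple

Env = Dict[str, str]
# Hypothetical matcher shape: (subject, start, assignments) -> (end, assignments)*
Matcher = Callable[[str, int, Env], Iterator[Tuple[int, Env]]]


def char_class(body: str) -> Matcher:
    """[body], or [^body] when body starts with ^: exactly one character."""
    negate = body.startswith("^")  # only a leading ^ negates
    if negate:
        body = body[1:]
    singles, ranges = set(), []
    k = 0
    while k < len(body):
        if k + 2 < len(body) and body[k + 1] == "-":
            ranges.append((body[k], body[k + 2]))  # inclusive range x-y
            k += 3
        else:
            singles.add(body[k])  # assumption: a - that starts no range is literal
            k += 1

    def m(s: str, i: int, env: Env) -> Iterator[Tuple[int, Env]]:
        if i < len(s):
            ch = s[i]
            inside = ch in singles or any(lo <= ch <= hi for lo, hi in ranges)
            if inside != negate:
                yield i + 1, env
    return m


def dot() -> Matcher:
    """.: any single character."""
    def m(s: str, i: int, env: Env) -> Iterator[Tuple[int, Env]]:
        if i < len(s):
            yield i + 1, env
    return m


def var(name: str, assoc: Optional[Matcher] = None) -> Matcher:
    """{name}: repeat the current value, or assign the matched substring."""
    def fits(text: str) -> bool:
        # The associated pattern must match all of text. It runs with fresh
        # assignments that are thrown away, so nothing leaks into the outer match.
        return assoc is None or any(end == len(text) for end, _ in assoc(text, 0, {}))

    def m(s: str, i: int, env: Env) -> Iterator[Tuple[int, Env]]:
        if name in env:
            value = env[name]
            if s.startswith(value, i) and fits(value):
                yield i + len(value), env
        else:
            for j in range(i, len(s) + 1):
                if fits(s[i:j]):
                    yield j, {**env, name: s[i:j]}
    return m


# {identifier:parameter}: a procedure receives the parameter text and may assign
# variables. "digits" is a made-up procedure that matches one or more digits
# and stores them in the variable named by its parameter.
Procedure = Callable[[str, int, Env, str], Iterator[Tuple[int, Env]]]


def digits(s: str, i: int, env: Env, param: str) -> Iterator[Tuple[int, Env]]:
    j = i
    while j < len(s) and s[j] in "0123456789":
        j += 1
        yield j, {**env, param: s[i:j]}


PROCEDURES: Dict[str, Procedure] = {"digits": digits}


def call(identifier: str, param: str) -> Matcher:
    proc = PROCEDURES[identifier]
    return lambda s, i, env: proc(s, i, env, param)


def full_matches(p: Matcher, s: str) -> List[Env]:
    """Assignments under which p matches the whole of s."""
    return [env for end, env in p(s, 0, {}) if end == len(s)]


vowel = char_class("aeiou")
print(full_matches(var("v", vowel), "e"))            # [{'v': 'e'}]
print(full_matches(call("digits", "year"), "1999"))  # [{'year': '1999'}]
```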

Non-atomic patterns

pattern?

The string should either be empty or match pattern.

pattern*

Matches zero or more occurrences of pattern.

pattern+

Matches one or more occurrences of pattern.

pattern1 pattern2

Matches the concatenation of pattern1 and pattern2.

pattern1<pattern2

Same as concatenation, but only one assignment is produced for pattern1: the one that makes the corresponding part of the string as short as possible.

pattern1>pattern2

Same as concatenation, but only one assignment is produced for pattern1: the one that makes the corresponding part of the string as long as possible.

pattern1&pattern2

The string should match both pattern1 and pattern2.

pattern1|pattern2

The string should match at least one of pattern1 or pattern2.

(pattern)

pattern is matched against the string; the parentheses only group.

`pattern'

pattern is matched against the string; all the variable assignments done inside the pattern are forgotten.
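The next Python sketch models each non-atomic operator as a combinator. It is an illustration under stated assumptions, not Mptn's implementation. A matcher is assumed to be a function from (subject, start, assignments) to every reachable (end, assignments) pair, and every name below is hypothetical. Parentheses need no combinator of their own, because nesting the calls already groups. Two behaviours are choices made for this sketch only. First, both runs pattern2 with the assignments made by pattern1, so any shared variables must agree. Second, star skips repetitions that match the empty string, which guarantees that it terminates.

```python
from typing import Callable, Dict, Iterator, List, Tuple

Env = Dict[str, str]
# Hypothetical matcher shape: (subject, start, assignments) -> (end, assignments)*
Matcher = Callable[[str, int, Env], Iterator[Tuple[int, Env]]]

def lit(text: str) -> Matcher:                    # a literal such as abcd
    def m(s, i, env):
        if s.startswith(text, i):
            yield i + len(text), env
    return m

def var(name: str) -> Matcher:                    # {name}, no associated pattern
    def m(s, i, env):
        if name in env:
            if s.startswith(env[name], i):
                yield i + len(env[name]), env
        else:
            for j in range(i, len(s) + 1):
                yield j, {**env, name: s[i:j]}
    return m

def seq(p1: Matcher, p2: Matcher) -> Matcher:     # pattern1 pattern2
    def m(s, i, env):
        for j, env1 in p1(s, i, env):
            yield from p2(s, j, env1)
    return m

def opt(p: Matcher) -> Matcher:                   # pattern?
    def m(s, i, env):
        yield i, env              # the empty string
        yield from p(s, i, env)   # or one occurrence of p
    return m

def star(p: Matcher) -> Matcher:                  # pattern*
    def m(s, i, env):
        yield i, env
        for j, env1 in p(s, i, env):
            if j > i:  # this sketch skips empty repetitions so it terminates
                yield from m(s, j, env1)
    return m

def plus(p: Matcher) -> Matcher:                  # pattern+
    return seq(p, star(p))

def pick_cat(p1: Matcher, p2: Matcher, longest: bool) -> Matcher:  # < and >
    def m(s, i, env):
        # Every way p1 can match here, shortest first (longest first for >).
        firsts = sorted(p1(s, i, env), key=lambda r: r[0], reverse=longest)
        for end in range(i, len(s) + 1):
            for j, env1 in firsts:
                tails = [r for r in p2(s, j, env1) if r[0] == end] if j <= end else []
                if tails:
                    yield from tails  # p2 may still yield several assignments,
                    break             # but only one p1 assignment is kept
    return m

def both(p1: Matcher, p2: Matcher) -> Matcher:    # pattern1&pattern2
    def m(s, i, env):
        for j, env1 in p1(s, i, env):
            # p2 sees p1's assignments and must cover exactly the same span.
            for k, env2 in p2(s, i, env1):
                if k == j:
                    yield j, env2
    return m

def either(p1: Matcher, p2: Matcher) -> Matcher:  # pattern1|pattern2
    def m(s, i, env):
        yield from p1(s, i, env)
        yield from p2(s, i, env)
    return m

def forget(p: Matcher) -> Matcher:                # `pattern'
    def m(s, i, env):
        for j, _ in p(s, i, env):
            yield j, env  # assignments made inside p are dropped
    return m

def match_all(p: Matcher, s: str) -> List[Env]:
    """The set of assignments under which p matches the whole of s."""
    out: List[Env] = []
    for end, env in p(s, 0, {}):
        if end == len(s) and env not in out:
            out.append(env)
    return out

x_before_last_b = pick_cat(var("x"), seq(lit("b"), var("y")), longest=True)
print(match_all(x_before_last_b, "abcbd"))  # [{'x': 'abc', 'y': 'd'}]
```

pick_cat interprets "as short (or as long) as possible" as follows. For every span that the whole pattern could cover, it considers the pattern1 matches in order of length and keeps the first one after which pattern2 can complete that span. With longest=True, this reproduces the behaviour of {x}>b{y} described in the examples.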

Examples

In this section I give several examples of Mptn usage. I assume that the pattern [aeiou] is associated with the variable v, and the pattern [bcdfghjklmnpqrstvwxyz] with the variables c, c1 and c2.

Pattern: abcd

Matches with: The string abcd.

Pattern: a*

Matches with: A (possibly empty) string of letters a.

Pattern: {x}

Matches with: Any string (assigned to x).

Pattern: {x}{y}

Matches with: Any string. For a string of n bytes, the iterator returns n+1 variants, one for each way of splitting the string between the variables x and y.

Pattern: {x}>b{y}

Matches with: Any string containing at least one b. The variable x gets the part of the string before the last occurrence of b, and the variable y gets the part after it.

Pattern: {c}

Matches with: Any consonant.

Pattern: {c}{v}

Matches with: An open syllable.

Pattern: ({x}&`{c1}{v}{c2}?')({y}&`{c1}{v}{c2}?'*)

Matches with: A sequence of syllables, each of structure CV or CVC. The first syllable is assigned to the variable x and the rest of the string to y. The variables c1, c2 and v remain unassigned, because they are assigned only inside `...', where assignments are forgotten.

The last example could be expressed with traditional regular expressions as ([bcdfghjklmnpqrstvwxyz][aeiou][bcdfghjklmnpqrstvwxyz]?)(([bcdfghjklmnpqrstvwxyz][aeiou][bcdfghjklmnpqrstvwxyz]?)*). It seems to me that the Mptn version is easier to write and to understand.
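Several of these examples have direct counterparts in Python's standard library, which gives a quick way to check what they should produce. This is an illustrative sketch. The function names are hypothetical, and the consonant and vowel sets are the ones assumed for c and v in this section. In Python, (?:...) is a non-capturing group, so group 2 holds everything after the first syllable. Unlike Mptn, re.fullmatch returns a single match rather than every assignment. For the syllable pattern nothing is lost, because each vowel needs its own consonant onset, which makes the split unique.

```python
import re
from typing import Dict, List, Optional, Tuple

CONSONANT = "[bcdfghjklmnpqrstvwxyz]"
VOWEL = "[aeiou]"
SYLLABLE = f"{CONSONANT}{VOWEL}{CONSONANT}?"
# Group 1 plays the role of x (first syllable), group 2 that of y (the rest).
SYLLABLES = re.compile(f"({SYLLABLE})((?:{SYLLABLE})*)")


def split_variants(s: str) -> List[Dict[str, str]]:
    """{x}{y}: one assignment per split point, len(s) + 1 in total."""
    return [{"x": s[:k], "y": s[k:]} for k in range(len(s) + 1)]


def split_at_last_b(s: str) -> Optional[Dict[str, str]]:
    """{x}>b{y}: x is the text before the last b, y the text after it."""
    x, sep, y = s.rpartition("b")
    return {"x": x, "y": y} if sep else None


def is_open_syllable(s: str) -> bool:
    """{c}{v}, with c and v restricted as in the examples."""
    return re.fullmatch(CONSONANT + VOWEL, s) is not None


def split_first_syllable(word: str) -> Optional[Tuple[str, str]]:
    """The syllable example: (first syllable, rest), or None if no match."""
    m = SYLLABLES.fullmatch(word)
    return (m.group(1), m.group(2)) if m else None


print(len(split_variants("abc")))     # 4
print(split_at_last_b("abcbd"))       # {'x': 'abc', 'y': 'd'}
print(split_first_syllable("bantu"))  # ('ban', 'tu')
```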

Character sets

At present, Mptn works only with 8-bit characters. However, some effort has gone into keeping the code clean enough to allow a later move to Unicode strings. So far, the biggest problem I see is representing sets of characters. In the future I may borrow Henry Spencer's code from Tcl's excellent regular expression package, but so far the only provision I have made for this is consistently renaming char to chr throughout the Mptn source code. Anyone willing to help is welcome.