Mptn pattern matching library

Language bindings.

One of my primary intentions was to have Mptn bindings for a variety of scripting languages. I have a set of bindings designed for Scheme; it works with the MzScheme system. Bindings for other languages will also be gratefully accepted.

Scheme.

The interface to the Mptn library for Scheme, as well as the C interface, divides naturally into two parts: the functions you need to know to use the library for pattern matching, and the functions which are only needed when you write your own matchers. I start with a brief description of all Mptn-related types, and then I give a list of functions in each of these categories.

It should be possible to implement this interface in any R⁵RS conformant implementation.[1]

Types.

Scheme Mptn bindings define the following data types:

mptn: A compiled Mptn pattern.
mptn-ectl: A step in the execution of pattern matching. There are two kinds of mptn-ectls: those that can be used to make another step of pattern matching (via mptn-ectl-step!), and those which only represent a set of variable assignments.
mptn-matcher: A group of procedures comprising an Mptn matcher.

User level functions.

In this section I am going to describe the functions you would normally use for pattern matching.

Working with patterns.

(mptn-parse str)

Creates an Mptn pattern object by parsing a string str.

(mptn? obj)

Checks whether obj is a compiled pattern.

(mptn-exec-map mptn str data func)
(mptn-exec-for-each mptn str data func)

Each of these functions finds all the matches of mptn against str with additional parameter data. For every match, func is called with one argument, an mptn-ectl object containing the relevant variable bindings. In mptn-exec-map, the values returned from calls to func are collected into a list; this list becomes the result of mptn-exec-map. Mptn-exec-for-each does not collect these values and its result is a boolean value: #t indicates that at least one match has been found, #f means the matching was unsuccessful.

The ectl values passed to func are short-lived: you should not expect to save them in a variable of a wider scope and get a meaningful result later when accessing this variable.
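
Because the ectl passed to func is short-lived, a callback would normally copy what it needs before returning. The following is only a sketch: the helper name all-assignments is hypothetical, the pattern source string is supplied by the caller (the pattern syntax is not covered in this section), passing #f as the additional data is an assumption, and so is the expectation that the alist produced by mptn-ectl->alist remains usable after the ectl has expired.

```scheme
;; Hypothetical helper: returns a list holding one alist of variable
;; assignments per match. Passing #f as `data` is an assumption.
(define (all-assignments pattern-source subject)
  (let ((pat (mptn-parse pattern-source)))
    ;; mptn-ectl->alist runs inside the callback, while the ectl is
    ;; still valid; only the resulting alist is kept.
    (mptn-exec-map pat subject #f mptn-ectl->alist)))
```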

(mptn-exec-one mptn str data)

Matches pattern mptn against string str with additional data data. Returns the variable assignments found in the first match as the result (of type mptn-ectl). The result cannot be used to continue matching. If the matching is unsuccessful, the return value is #f.
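
As an illustration, mptn-exec-one can be combined with mptn-ectl-ref (described later in this section) to fetch a single variable from the first match. The helper name first-value is hypothetical, and #f as the additional data is an assumption.

```scheme
;; Hypothetical helper: the value of `varname` in the first match of
;; `pat` against `subject`, or #f.
(define (first-value pat subject varname)
  (let ((ectl (mptn-exec-one pat subject #f)))   ; #f as data: assumption
    ;; `and` short-circuits when the matching was unsuccessful
    (and ectl (mptn-ectl-ref ectl varname))))
```

Note that with this sketch a result of #f is ambiguous: either there was no match, or the variable is not set in the match.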

The above functions should be the most convenient way to call Mptn matching. However, if you need finer-grained control over the matching process, a lower-level interface is provided, similar to the C interface to pattern matching.

(mptn-exec-start mptn str data)
(mptn-ectl-step! ectl)
(mptn-ectl-destroy! ectl)

Mptn-exec-start starts a matching process with pattern mptn, string str and additional argument data. If the matching is successful, it returns an mptn-ectl object; if not, #f.

Mptn-ectl-step! takes an mptn-ectl object and makes a step in matching; if successful, it returns another mptn-ectl object representing a new set of variable bindings. The old one becomes invalid.

Finally, mptn-ectl-destroy! invalidates an mptn-ectl object, saving some space. (In Scheme, it is not mandatory to call mptn-ectl-destroy!; the memory will be reclaimed during the next garbage collection cycle.)
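
The three functions might be combined into a loop along the following lines. This is a sketch under assumptions, not a reference implementation: the helper name collect-up-to is hypothetical, #f is passed as the additional data, and mptn-ectl-step! is assumed to return #f when there are no further matches (the text above only states what it returns on success).

```scheme
;; Hypothetical helper: collects the variable assignments of at most
;; `limit` matches as a list of alists.
(define (collect-up-to limit pat subject)
  (let loop ((ectl (mptn-exec-start pat subject #f))
             (acc '()))
    (cond ((not ectl) (reverse acc))        ; no (more) matches
          ((>= (length acc) limit)
           (mptn-ectl-destroy! ectl)        ; stopping early: release the ectl
           (reverse acc))
          (else
           ;; copy the assignments first: stepping invalidates `ectl`
           (let ((vars (mptn-ectl->alist ectl)))
             (loop (mptn-ectl-step! ectl) (cons vars acc)))))))
```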

(mptn-subst mptn ectl-vars data)

This function substitutes the variable values contained in ectl-vars into the pattern mptn, possibly using the additional data. The result is either a string, or #f if it is impossible to recover the information.

Working with variable assignments.

Objects of type mptn-ectl, produced by the family of mptn-exec-... functions, contain information about variable assignments.

(mptn-ectl? obj)
(mptn-ectl-valid? ectl)

Mptn-ectl? checks whether a given object has type mptn-ectl. Mptn-ectl-valid? returns #f if the object is invalid (i.e. mptn-ectl-step! or mptn-ectl-destroy! has been called on it, or a function to which it was passed as an argument has finished its work). If the object is valid, one of two symbols is returned: valid for an object which can be sent to mptn-ectl-step!, or vars if it can only be used to query variable assignments.
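
Since mptn-ectl-valid? has three possible results, a small predicate can make the common test explicit. The name steppable? is hypothetical.

```scheme
;; Hypothetical predicate: true only for an ectl that may still be
;; passed to mptn-ectl-step!. It is false both for invalid objects (#f)
;; and for objects that only carry variable assignments (vars).
(define (steppable? ectl)
  (eq? (mptn-ectl-valid? ectl) 'valid))
```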

(mptn-ectl-ref ectl varname)
(mptn-ectl->alist ectl)

Mptn-ectl-ref gets the assignment for variable varname. Varname can be either a symbol or a string. The value of a variable is a string. If the variable is not set in ectl, #f is returned.

Mptn-ectl->alist converts the set of assignments contained in ectl into an alist.

Restricting values of variables.

(mptn-var-restrict! varname mptn)

This function restricts the possible values for Mptn variable varname to strings which can be matched against mptn.

Writing matchers in Scheme.

This section describes a lower-level interface from Scheme to Mptn, which gives users enough power to write their own matcher procedures.

The interface given here is little more than a thin layer over the corresponding C functions. Unlike C, Scheme has enough expressive power to make writing matchers a much less mind-boggling task: closures could be used to store matcher-specific and matching-loop-specific data, and with the help of continuations one could write the whole matching loop as one function, instead of three. However, since Mptn aims to be efficient, I decided against making such an improved interface the default one for Scheme. Continuations could be especially harmful for performance. A convenience layer could be created on top of the current interface with very little effort; I might do this in the future.

More mptn and mptn-ectl functions.

There are several more functions which give you more control over objects of types mptn and mptn-ectl.

(mptn-exec-start-with-ectl-and-stage mptn str vars-ectl stage data)

This function, like mptn-exec-start, starts a matching cycle for mptn against str with additional parameter data, and it also returns either an mptn-ectl object or #f. Here, however, the set of variables vars-ectl will be affected by the matching, and the starting stage will be set to stage (which must be an integer).

(mptn-ectl-set! ectl varname value [stage])

Mptn-ectl-set! assigns a value to variable varname inside ectl. The value needs to be a string.

If stage is present, the stage of the variable assignment is set to that value. Otherwise, the stage is taken from ectl.

(mptn-ectl-stage ectl)

This function returns the current stage of ectl.

Creating matchers.

(mptn-matcher keyword kvalue ...)

This function creates an object of type mptn-matcher. The argument list should consist of keyword-kvalue pairs, where each keyword is one of the symbols init, destroy, iter-start, iter-step, ictl-free, stage, subst and print. The corresponding kvalue should be a procedure satisfying the element of the matcher protocol named by keyword. The matcher protocol is described in the section called "The matcher protocol".

You don't have to give a procedure for every possible keyword. If some procedure is not given, a default action will be taken.

(mptn-matcher-set! mname matcher [udata])

This function associates matcher with the name mname (symbol or string). Also, the init function of the matcher will get udata as one of its arguments. (If udata is omitted, #f will be used.)

The matcher protocol.

The procedures supplied as arguments to mptn-matcher should obey certain conventions. Those conventions are described in this section. The names of the functions listed here correspond to the keywords in the mptn-matcher argument list.

(init args udata)

This member is called whenever a pattern containing a reference to the matcher is compiled. Args (a string) contains the arguments for the matcher (the portion after : in the pattern expression). Udata is the value passed as the last argument to mptn-matcher-set!.

The function should compute a number of values and return them to the caller. The typical return will be
(values mctl min-len max-len new-vars)
Mctl is the value that will be passed to most other functions comprising the matcher when those are called: this is the main data storage for the matcher object. Min-len and max-len are, respectively, the minimum and maximum length a string fitting the matcher may have. They are used by the internal Mptn mechanisms when evaluating the possibilities for matching. New-vars should be a list of strings or symbols: the names of the mptn-ectl variables the matcher is likely to use. Later, when the iter-start and iter-step members are called, the mptn-ectl they get will have space allocated for these variables.

You don't need to compute all these values. You may skip some of them, returning #f in their place, or you can simply return fewer than four results.

(destroy mctl)

This function is called when a matcher object is being destroyed. You may wish to perform some cleanup operations in this case. (Actually, since Scheme is a language with garbage collection, this member is probably of less value for Scheme programmers than for C programmers.)

(iter-start mctl str vars-ectl stage data)

This member is called whenever a matching loop is started using the matcher for a particular string. Mctl is the value returned from the init member, str is the string that must be matched, and vars-ectl contains the variables that you may inspect or set (but you cannot call mptn-ectl-step! on this object). Stage is the starting stage for the matching, and data is the additional data passed to one of the mptn-exec-... routines.

This function should return an object (called ictl) that contains all the data relevant to the process of matching. If the function returns #f, the matching fails. The function can also set values in vars-ectl.

(iter-step ictl)

This member is called to get the next possible variable assignment in a matching loop. It is passed an ictl and, if the step is successful, should return an ictl that will be used for the next match. If the step fails, iter-step should return #f.

(ictl-free ictl)

This member is called when the matching loop terminates prematurely, so that the matcher can discard any data in ictl.

(stage ictl)

This member is called to determine the final stage reached by the matcher loop in ictl.

(subst mctl ectl-vars data)

This member is called when there is an attempt to substitute values from ectl-vars into the pattern described by mctl, using the additional argument data. It should return a string, or #f if the string cannot be recovered.

(print mctl port offset)

This member is called to dump the matcher description (from the data contained in mctl) into the output port port, skipping offset spaces from the left margin.
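
To tie the protocol together, here is a rough sketch of a matcher that accepts exactly the text given as its argument in the pattern. It rests on several assumptions that the text above does not confirm: a non-#f result from iter-start counts as the first match, the default actions for the omitted members (destroy, ictl-free, stage, subst, print) are adequate, and str is exactly the text the matcher must cover. All procedure names, and the matcher name literal, are hypothetical.

```scheme
;; init: `args` is the matcher's argument string; `udata` is unused.
;; The argument string itself serves as mctl.
(define (literal-init args udata)
  (let ((len (string-length args)))
    ;; mctl, min-len, max-len, new-vars (no variables are used)
    (values args len len '())))

;; iter-start: return a non-#f ictl only if str equals the literal.
(define (literal-iter-start mctl str vars-ectl stage data)
  (and (string=? mctl str)
       (list 'literal-ictl mctl)))

;; iter-step: a literal has no alternative matches, so always fail.
(define (literal-iter-step ictl) #f)

(define literal-matcher
  (mptn-matcher 'init literal-init
                'iter-start literal-iter-start
                'iter-step literal-iter-step))

;; Register the matcher; udata is omitted, so init will receive #f.
(mptn-matcher-set! 'literal literal-matcher)
```

Keeping the literal string as mctl avoids any per-matcher allocation beyond the argument itself; a matcher with real state would more likely return a record or closure from init.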

Using Mptn with MzScheme.

The Scheme interface to Mptn currently only works with MzScheme; this implementation can be obtained from http://www.cs.rice.edu/CS/PLT/packages/mzscheme/.

After you run make and make install in the bindings/mzscheme directory of the Mptn distribution, you should be able to use
(require-library "mptn.scm" "mptn")
to load the Mptn library into your MzScheme interpreter.

MzScheme bindings are not automatically installed with the Mptn RPM, because I don't expect the majority of the people trying Mptn to have MzScheme on their computer. You will have to get the source tarball and recompile it.


Notes