Mptn pattern matching library

Library C Interface

This section describes how to call Mptn from your C programs. To use the functions listed here, you need to say
#include <mptn.h>
near the beginning of your file.

mptn_control_t structure.

The mptn_control_t structure holds the global context for the library, including patterns that restrict variable values and matcher functions. The structure declaration is hidden from the library user; you should only use pointers to mptn_control_t. You can also save yourself the trouble of creating one: all the functions that accept an argument of type mptn_control_t * will work if you pass NULL as the value. In this case, a global structure will be used.

Creating and destroying mptn_control_t.

mptn_control_t * mptn_control_new(void);

void mptn_control_free(mptn_control_t *control);

Mptn_control_new creates a new mptn_control_t structure and returns a pointer to it. Mptn_control_free frees such a structure.

Restricting the variable values.

void mptn_var_restrict(mptn_control_t *control, GQuark vcode, mptn_t *mptn);

void mptn_vname_restrict(mptn_control_t * control, char *vname, mptn_t *mptn);

void mptn_var_restrict_str(mptn_control_t *control, GQuark vcode, char *expr);

void mptn_vname_restrict_str(mptn_control_t *control, char * vname, char *expr);

All these functions restrict the possible values a variable can take. They differ only in the types of their arguments.

Variable names are stored internally as GQuarks; functions with _var_ in their names take a GQuark as the argument. Functions (actually, macros) whose names contain _vname_ take strings representing variable names as their arguments.

Functions whose name does not end with _str receive a pattern that has already been compiled (of type mptn_t). Those that do end with _str get a string, which they compile themselves using mptn_parse.

Associating matcher functions with names.

void mptn_matcher_set(mptn_control_t *control, GQuark mcode, mptn_matcher_ops_t *matcher);

void mptn_mname_set(mptn_control_t *control, char *mname, mptn_matcher_ops_t *matcher);

These functions serve to extend the syntax of Mptn patterns and associate objects called matchers with names. Matchers are actually structures containing pointers to functions; they are discussed in detail in the section called Matchers: extending the pattern syntax.

Again, as when restricting the values of a variable, names are stored internally as GQuarks. Accordingly, mptn_matcher_set takes a GQuark as its argument. The macro mptn_mname_set takes a string and translates it internally.

void mptn_matcher_param_set(mptn_control_t *control, GQuark mcode, gpointer data);

void mptn_mname_param_set(mptn_control_t * control, char *mname, void *data);

You may associate an arbitrary pointer (data) with a matcher name. When the matcher function gets called, it will receive data as one of its arguments. This feature may be useful, for example, in writing a single set of matcher functions to handle all the morphological alternations in a language, then setting it to different names and differentiating between particular alternations using this parameter.

The function with _matcher_ in its name receives the matcher name as GQuark, the one with _mname_ gets a string.
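The idea of one set of matcher functions registered under several names can be pictured with a small self-contained sketch. Everything in it is a stand-in: toy_match_fn, toy_entry_t, toy_registry and toy_match are hypothetical names, and the callback signature is an assumption made for illustration (the real matcher interface is described in the section called Matchers: extending the pattern syntax). It only shows how the data pointer lets a single function behave differently per name.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical callback signature; NOT the real Mptn matcher interface. */
typedef int (*toy_match_fn)(char c, void *data);

/* Hypothetical registry entry: a name, a function, and its parameter. */
typedef struct toy_entry {
    const char *name;
    toy_match_fn fn;
    void *data;
} toy_entry_t;

/* One function serves every alternation; data selects the character class. */
int in_class(char c, void *data)
{
    const char *members = data;
    return c != '\0' && strchr(members, c) != NULL;
}

static char vowels[] = "aeiou";
static char sibilants[] = "sz";

/* The same function registered under two names with different parameters. */
toy_entry_t toy_registry[] = {
    { "vowel",    in_class, vowels },
    { "sibilant", in_class, sibilants },
};

/* Look a name up and call its function with the associated parameter. */
int toy_match(const char *name, char c)
{
    size_t i;
    assert(name != NULL);
    for (i = 0; i < sizeof toy_registry / sizeof toy_registry[0]; i++)
        if (strcmp(toy_registry[i].name, name) == 0)
            return toy_registry[i].fn(c, toy_registry[i].data);
    return 0; /* unknown name: no match */
}
```

With the real library, the registration step would presumably be one mptn_mname_set call plus one mptn_mname_param_set call per name.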

Compiled patterns

Mptn patterns need to be compiled before they are used. A compiled pattern is stored in a structure called mptn_t (as you have probably guessed, this is considered to be the most important type in the whole library). Mptn_t, unlike most of the library's structures, is not opaque; its definition can be found in mptn.h. However, only matcher writers need to look inside it, so we defer that discussion until the section called Matchers: extending the pattern syntax. This section only shows how to use mptn_t in "normal" work.

Creating, referencing and freeing mptn_t

mptn_t *mptn_parse(mptn_control_t *control, chr *str_ptn);

void mptn_refinc(mptn_t *mptn);

void mptn_free(mptn_t *mptn);

The natural way to obtain a compiled pattern is, of course, to compile it. This is what mptn_parse does. As its first argument, it takes a pointer to an mptn_control_t structure (and remember, as said in the section called mptn_control_t structure, you may leave it NULL). As the second argument, it takes the string to be compiled. The return value is either a pointer to a newly created mptn_t, or NULL if the compilation fails.

mptn_parse uses bison, and therefore is probably the only thread-unsafe function in the library.

Mptn_t's are reference counted. You can increase a structure's count by calling mptn_refinc.

Mptn_free, on the other hand, decreases the reference count of an mptn_t, and if it drops to zero, frees the structure.
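The reference-counting contract can be illustrated with a self-contained model. The toy_ptn_* names below are hypothetical stand-ins, not Mptn functions, and the sketch assumes that a freshly created object starts with a count of one (the text above does not state the initial value).

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a reference-counted object; not the real mptn_t. */
typedef struct toy_ptn {
    unsigned refcount; /* number of current owners */
    int *freed_flag;   /* set to 1 on destruction, for demonstration only */
} toy_ptn_t;

/* Plays the role of mptn_parse: the creator holds the first reference
   (assumed initial count of 1). */
toy_ptn_t *toy_ptn_new(int *freed_flag)
{
    toy_ptn_t *p = malloc(sizeof *p);
    assert(p != NULL);
    p->refcount = 1;
    p->freed_flag = freed_flag;
    *freed_flag = 0;
    return p;
}

/* Plays the role of mptn_refinc: register one more owner. */
void toy_ptn_refinc(toy_ptn_t *p)
{
    p->refcount++;
}

/* Plays the role of mptn_free: drop one reference, destroy on zero. */
void toy_ptn_free(toy_ptn_t *p)
{
    assert(p->refcount > 0);
    if (--p->refcount == 0) {
        *p->freed_flag = 1;
        free(p);
    }
}
```

The practical rule that follows is that every owner pairs its own mptn_refinc (or the original mptn_parse) with exactly one mptn_free.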

Applying patterns to strings: iterators

Since pattern matching with Mptn may return several variants of variable assignments, the most natural way to perform it is to start an iterator. An iterator is represented by (you guessed it) still another opaque structure, called mptn_ectl_t.

mptn_ectl_t *mptn_exec_start(mptn_t *mptn, chr *begin, chr *end, gpointer data);

mptn_ectl_t *mptn_exec_step(mptn_ectl_t *ectl);

void mptn_ectl_free(mptn_ectl_t *ectl);

You start a pattern matching iterator by calling mptn_exec_start. The first parameter, of course, is the compiled pattern itself. The next two represent the beginning and the end of the string you want to match the pattern with. End should point at the next character after the part of the string you're interested in (so that you don't have to copy it). Alternatively, you may leave it NULL; in this case, the whole string starting at begin will be matched. Finally, you can pass an arbitrary pointer as the last parameter. This is designed as a means of communication with the matchers you may write. One of the matcher procedures will receive data as one of its arguments.
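The begin/end convention is the usual half-open range: end points one past the last character of interest, so a substring can be selected without copying it. The sketch below shows the convention on its own, using plain char instead of the library's chr type; span_length and second_word are hypothetical helper names, not part of Mptn.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Length of the half-open range [begin, end). A NULL end means
   "up to the terminating NUL", mirroring the convention described above. */
size_t span_length(const char *begin, const char *end)
{
    assert(begin != NULL);
    if (end == NULL)
        return strlen(begin);
    assert(end >= begin);
    return (size_t)(end - begin);
}

/* Select the second space-separated word of text without copying it.
   If there is no second word, the resulting range is empty. */
void second_word(const char *text, const char **begin, const char **end)
{
    const char *space = strchr(text, ' ');
    *begin = space ? space + 1 : text + strlen(text);
    *end = *begin + strcspn(*begin, " ");
}
```

A pair produced this way could be passed directly as the begin and end arguments, leaving the surrounding text untouched.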

If the match fails, mptn_exec_start returns NULL. If it succeeds, the result is a pointer to mptn_ectl_t. You can use it to get the values of variable assignments (see the section called Getting variable values). Once you're done with the match, you can get the next one by calling mptn_exec_step. It takes the ectl value and returns a new one if there is one more match, and NULL if not. The old ectl is no longer valid after you call mptn_exec_step.

Finally, if you want to get out of the iterator prematurely, without having to scan all the possible matches, you should call mptn_ectl_free for the last value of ectl.
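The start/step/free protocol can be modelled with a toy iterator that compiles without the library. All toy_* names are hypothetical, and each "match" here is simply one occurrence of a character. The sketch assumes, as the text above suggests, that a step call which returns NULL has already released the iterator, so an explicit free is needed only when leaving the loop early.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for mptn_ectl_t: one match = one occurrence of ch. */
typedef struct toy_ectl {
    const char *pos; /* current occurrence */
    char ch;
} toy_ectl_t;

/* Like mptn_exec_start: the first match, or NULL if there is none. */
toy_ectl_t *toy_exec_start(const char *str, char ch)
{
    const char *hit;
    toy_ectl_t *ectl;
    assert(str != NULL && ch != '\0');
    hit = strchr(str, ch);
    if (hit == NULL)
        return NULL;
    ectl = malloc(sizeof *ectl);
    assert(ectl != NULL);
    ectl->pos = hit;
    ectl->ch = ch;
    return ectl;
}

/* Like mptn_exec_step: consumes the old value, returns the next or NULL. */
toy_ectl_t *toy_exec_step(toy_ectl_t *ectl)
{
    const char *hit = strchr(ectl->pos + 1, ectl->ch);
    if (hit == NULL) {
        free(ectl); /* exhausted: nothing left for the caller to free */
        return NULL;
    }
    ectl->pos = hit;
    return ectl;
}

/* Like mptn_ectl_free: needed only when leaving the loop early. */
void toy_ectl_free(toy_ectl_t *ectl)
{
    free(ectl);
}

/* Runs to exhaustion, so no explicit free is required. */
int count_matches(const char *str, char ch)
{
    int n = 0;
    toy_ectl_t *ectl;
    for (ectl = toy_exec_start(str, ch); ectl != NULL; ectl = toy_exec_step(ectl))
        n++;
    return n;
}

/* Offset of the k-th match (0-based) or -1; exits early, so it must free. */
long nth_match(const char *str, char ch, int k)
{
    toy_ectl_t *ectl;
    for (ectl = toy_exec_start(str, ch); ectl != NULL; ectl = toy_exec_step(ectl)) {
        if (k-- == 0) {
            long off = (long)(ectl->pos - str);
            toy_ectl_free(ectl);
            return off;
        }
    }
    return -1;
}
```

Note that both loops always continue with the value returned by the step function and never touch the previous one again.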

Getting variable values

GArray *mptn_ectl_vars(mptn_ectl_t *ectl);

Once you have a successful match variant, you naturally want to know which values have been assigned to the variables inside your pattern. This is done via mptn_ectl_vars, which takes ectl as its argument.

The return value of mptn_ectl_vars is a GArray of structures of type mptn_var_t. Unlike other Mptn types, this is an open structure[1] . Here is the definition:
struct mptn_var {
    GQuark vcode;       /* Representing variable's name */
    chr *beg;           /* The start of a var value */
    chr *end;           /* The end of var value */
    gboolean allocated; /* Is it allocated separately? */
    guint stage;
};
The fields that interest us here are three:

vcode: The name of the variable encoded as GQuark. You can call g_quark_to_string to obtain the name as a string.
beg: The beginning of the substring the variable is matched to.
end: The end of the matched substring.

You should not assume ownership of the memory occupied by and pointed to from mptn_var_t. After the next time you call mptn_exec_step, this memory may be lost.

Some variables present in the array may remain unassigned (for example, if the variable is present in one alternative, and another alternative was chosen). In such a case the beg field of the corresponding structure will contain NULL.

The end of an array returned by mptn_ectl_vars is marked by a structure that has 0 in the vcode field.
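Walking such an array therefore means stopping at the zero vcode and skipping entries that were never assigned. The sketch below uses a simplified stand-in type, toy_var_t, with a plain C array so that it compiles without GLib; with the real library the elements would be reached through GLib's g_array_index macro. It also assumes that end points one past the last matched character, which the text does not state explicitly.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for mptn_var_t: only the three fields discussed above,
   with plain C types instead of GQuark and chr. */
typedef struct toy_var {
    unsigned vcode;  /* 0 marks the end of the array */
    const char *beg; /* NULL when the variable is unassigned */
    const char *end; /* assumed to point one past the last character */
} toy_var_t;

/* Count the assigned variables and add up the lengths of their values. */
size_t assigned_vars(const toy_var_t *vars, size_t *total_len)
{
    size_t n = 0;
    assert(vars != NULL && total_len != NULL);
    *total_len = 0;
    for (; vars->vcode != 0; vars++) {
        if (vars->beg == NULL)
            continue; /* variable from an alternative that was not chosen */
        n++;
        *total_len += (size_t)(vars->end - vars->beg);
    }
    return n;
}
```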

Pattern matching without an iterator

Sometimes you may be interested only in whether a string matches the pattern, or care only about the first match. In this case, the following function may serve as a shortcut.

GArray *mptn_exec_one(mptn_t *mptn, chr *begin, chr *end, gpointer data);

void mptn_vars_array_free(GArray *vars);

Mptn_exec_one performs only one match of a string against a pattern; its return value tells you whether that match succeeded.

Parameters mptn, begin, end and data for this function are the same as for mptn_exec_start (see the section called Applying patterns to strings: iterators). The return value will be a GArray of mptn_var_t.

Such an array is allocated inside mptn_exec_one; you will need to free it later using mptn_vars_array_free.

If the match does not succeed, mptn_exec_one returns NULL.

Substituting variable values.

chr *mptn_subst(mptn_t *mptn, GArray *vars, gpointer data);

After you have matched a string against a pattern and obtained a set of variable assignments, you may want to substitute these into another pattern to produce a string which presumably matches it. This is what mptn_subst is designed for. It takes an mptn_t, a set of variable assignments, and an arbitrary data pointer to pass to the matcher functions. The result is a string allocated via g_malloc.

However, mptn_subst is not always guaranteed to work, and when it does, it is not guaranteed to return a result that can be matched against the same pattern. This is due to the lossy nature of some of the pattern types. For example, if the pattern contains . (which matches any character), and the corresponding part of the string is not being saved in a variable (via the & operator), the string cannot in principle be recovered. In such cases, mptn_subst will return NULL.

Printing the pattern's internal structure.

void mptn_print(mptn_t *mptn, FILE *fp, guint offset);

This function is mostly useful for debugging purposes. It dumps the structure of mptn into an open file fp with offset offset from the left margin.

Error reporting.

Most Mptn functions, when they detect some erroneous conditions, just call g_error. It is the responsibility of the library user (i.e. a C programmer) to avoid such conditions.

However, errors which occur during the parsing of Mptn expressions are really errors caused by the end user of the program built with Mptn. Therefore it is reasonable to expect that the programmer will want to catch them.

Mptn defines a function which resides in a separate file, so that the programmer who uses Mptn can easily override it with his own definition:

void mptn_parse_error(char *fmt, char *str);

Fmt is the string "Mptn parsing error:%s", str is the explanation of the error given by bison. It is expected that str will end up as the value for the %s format specifier in fmt. The default implementation just calls g_error.
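A replacement definition might record the message instead of aborting, so that the application can show it to the end user. The sketch below keeps the documented signature of mptn_parse_error; the buffer last_parse_error and the helper take_parse_error are hypothetical application-side names, not part of Mptn. It assumes that, once the hook returns, mptn_parse reports the failure by returning NULL as described earlier.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical application-side buffer holding the most recent parse error. */
char last_parse_error[256];

/* Same signature as the documented hook; stores the formatted message
   instead of calling g_error. */
void mptn_parse_error(char *fmt, char *str)
{
    assert(fmt != NULL && str != NULL);
    snprintf(last_parse_error, sizeof last_parse_error, fmt, str);
}

/* Hypothetical helper: if an error is pending, copy it to out, clear it,
   and return 1; otherwise return 0. */
int take_parse_error(char *out, size_t out_size)
{
    assert(out != NULL && out_size > 0);
    if (last_parse_error[0] == '\0')
        return 0;
    strncpy(out, last_parse_error, out_size - 1);
    out[out_size - 1] = '\0';
    last_parse_error[0] = '\0';
    return 1;
}
```

An application would then call take_parse_error after mptn_parse returns NULL. Since the parser is described above as thread-unsafe, a single global buffer adds no further restriction.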


Notes