Matchers: extending the pattern syntax

This section describes the interface for extending the pattern syntax through specialized procedures, called matchers. The interface is not particularly beautiful; anyone who can propose a better design is encouraged to contact the author.

Extensions of the regular Mptn API

Matchers are plugged right into the middle of the pattern matching process. They need to be able to preserve some characteristics of the program state and change others. This ability is not needed at the level of ordinary library usage, so this section describes the functions and data structures that allow matchers to mess with the guts of Mptn.

Stages and variable assignments

The most important result of matching a string against a pattern is a set of variable bindings. Matching, however, involves a process of trial and error, and a need may arise to undo some of the assignments. This is where the concept of a stage comes in.

Every matcher, when it is invoked with a string as its argument, is given a GArray of variables (an array of mptn_var_t) and an integer called the stage. When it assigns a value to a previously unassigned variable, it should mark the stage. This is done by storing the stage number in the stage field of the mptn_var_t structure, whose definition is repeated here for convenience:
struct mptn_var {
  GQuark vcode;			/* Representing variable's name */
  chr *beg;			/* The start of a var value */
  chr *end;			/* The end of var value */
  gboolean allocated;		/* Is it allocated separately? */
  guint stage;			/* The stage at which the value was set */
};
When some of the assignments have to be undone, the library rolls back to a particular earlier stage. The variables whose stage field is greater than this value (i.e. those that were set relatively late) lose their value.

While we are at it, I need to explain the allocated field of the mptn_var_t structure. When you set a variable's value, you either refer to parts of the matched string or allocate new memory. In the former case, set the allocated field to FALSE; in the latter, set it to TRUE. When unsetting variables, Mptn frees the memory occupied by the variables that have TRUE in the allocated field.
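The bookkeeping described above can be sketched without the library itself. What follows is only an illustration under stated assumptions: struct demo_var mirrors struct mptn_var with plain C types in place of the GLib ones, chr is assumed to be char, an unassigned variable is assumed to have a NULL beg, and malloc/free stand in for whatever allocator Mptn really uses. All demo_ names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef char chr;                  /* assumption: chr is a plain char */

struct demo_var {                  /* mirrors struct mptn_var */
  unsigned int vcode;              /* GQuark in the real structure */
  chr *beg;
  chr *end;
  int allocated;                   /* gboolean in the real structure */
  unsigned int stage;
};

/* Bind a variable to a slice of the matched string; nothing is copied,
   so allocated stays false. */
static void demo_var_set_ref(struct demo_var *v, chr *beg, chr *end,
                             unsigned int stage)
{
  assert(v != NULL && v->beg == NULL);
  v->beg = beg;
  v->end = end;
  v->allocated = 0;
  v->stage = stage;
}

/* Bind a variable to a freshly allocated copy of text; allocated is
   set so that the rollback knows it must release the memory. */
static int demo_var_set_copy(struct demo_var *v, const char *text,
                             unsigned int stage)
{
  size_t len = strlen(text);
  chr *copy = malloc(len + 1);
  if (copy == NULL)
    return -1;
  memcpy(copy, text, len + 1);
  v->beg = copy;
  v->end = copy + len;
  v->allocated = 1;
  v->stage = stage;
  return 0;
}

/* Unset every variable that was assigned later than the given stage. */
static void demo_rollback(struct demo_var *vars, size_t n,
                          unsigned int stage)
{
  size_t i;
  for (i = 0; i < n; i++) {
    if (vars[i].beg != NULL && vars[i].stage > stage) {
      if (vars[i].allocated)
        free(vars[i].beg);
      vars[i].beg = vars[i].end = NULL;
      vars[i].allocated = 0;
      vars[i].stage = 0;
    }
  }
}
```

Rolling back to stage 1 after one variable was set at stage 1 and another at stage 3 keeps the first and unsets (and, if needed, frees) the second.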

Calling Mptn recursively

Sometimes you need to match a pattern recursively from a matcher, operating on the same set of variables. This is done by the following procedure:

mptn_ectl_t *mptn_exec_start_with_vars(mptn_t *mptn, chr *begin, chr *end, GArray *vars, guint stage, gpointer data);

The function mptn_exec_start_with_vars is similar to mptn_exec_start, except for two additional parameters. The parameters mptn, begin, end and data have exactly the same meaning here. The vars parameter should be the array of variables that the matcher received as its own argument, and stage must be set to at least the value that the matcher itself received. If the matcher tries several variants of recursive calls, stage should be set to a greater value, so that the assignments made inside the recursive call can be undone without unsetting the variables determined inside the matcher proper.

guint mptn_ectl_stage(mptn_ectl_t *ectl);

Once the recursive call is complete, its result is an iterator structure of type mptn_ectl_t. If you want to make yet another recursive call, you should give it a stage value greater than the one ectl has reached, so that you can undo the second call without undoing the first. You can find out exactly which stage ectl has reached by calling the mptn_ectl_stage function.
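To make the stage arithmetic concrete, here is a toy model of two consecutive recursive calls. It does not call Mptn at all: demo_recursive_call is a hypothetical stand-in for mptn_exec_start_with_vars that simply binds one variable at the stage it is given, demo_ectl_stage stands in for mptn_ectl_stage, and the variable structure is reduced to the two fields that matter here. Treat it as a sketch of the numbering scheme, not of the real calls.

```c
#include <assert.h>
#include <stddef.h>

struct demo_var {                /* simplified variable: value and stage */
  const char *value;
  unsigned int stage;
};

struct demo_ectl {               /* stands in for mptn_ectl_t */
  unsigned int stage;
};

/* Stand-in for mptn_ectl_stage: the stage the call has reached. */
static unsigned int demo_ectl_stage(const struct demo_ectl *ectl)
{
  return ectl->stage;
}

/* Stand-in for a recursive call: binds one variable at the given stage
   and reports that stage as the one reached. */
static struct demo_ectl demo_recursive_call(struct demo_var *var,
                                            const char *value,
                                            unsigned int stage)
{
  struct demo_ectl ectl;
  var->value = value;
  var->stage = stage;
  ectl.stage = stage;
  return ectl;
}

/* Unset every variable assigned later than the given stage. */
static void demo_rollback(struct demo_var *vars, size_t n,
                          unsigned int stage)
{
  size_t i;
  for (i = 0; i < n; i++)
    if (vars[i].stage > stage) {
      vars[i].value = NULL;
      vars[i].stage = 0;
    }
}

/* A matcher body making two recursive calls.  The first call gets a
   stage above the matcher's own; the second gets a stage above what
   the first call reached, so it can be undone separately. */
static void demo_two_calls(struct demo_var *vars, unsigned int own_stage,
                           int undo_second)
{
  struct demo_ectl first =
    demo_recursive_call(&vars[0], "first", own_stage + 1);
  struct demo_ectl second =
    demo_recursive_call(&vars[1], "second", demo_ectl_stage(&first) + 1);
  (void) second;
  if (undo_second)
    demo_rollback(vars, 2, demo_ectl_stage(&first));
}
```

Because the second call runs at a stage strictly above the one the first call reached, rolling back to the first call's stage discards only the second call's binding.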

The mptn_t structure

In the section called Library C Interface we viewed mptn_t as an opaque type. However, when writing matchers, we need to work directly with the fields of this structure. This is the definition:
struct mptn {
  guint magic;
  int refcnt;

  mptn_control_t *control;
  GArray *varnames;		/* Variable names used in this expression,
				   stored as GQuarks */
  mptn_matcher_ops_t *ops;	/* Matching operations */

  gpointer mctl;		/* What the matcher initialization returned */
  gint min_len;			/* The minimum length of the matched string */
  gint max_len;			/* The maximum length of the matched string */
  gint flags;

#define MPTN_LONGER 	0x01	/* Prefer longer matches */
#define MPTN_SHORTER	0x02	/* Prefer shorter matches */
};

The best way to start the discussion is probably to say which fields of mptn_t you do not need to deal with — in fact, most of them. These are magic (for the library to check its internal consistency), refcnt (for reference counting — use mptn_refinc/free), control (will be set when you allocate the structure), varnames (an array of variable names used; there is a special function to deal with that), and ops (matcher operations; will be filled for you).

The remaining fields, those which you do want to deal with, are:

min_len

This should contain the minimum length of the string matched against the pattern represented by your matcher.

max_len

Contains the maximum possible length of a string corresponding to your matcher. If there is no limit, use G_MAXINT.

flags

At present there are two possible values of this field (except for 0). MPTN_LONGER means that if the pattern corresponding to your matcher happens to be an element in a concatenation, the matching routine will attempt to occupy as big a part of the string as possible. MPTN_SHORTER, naturally, means just the opposite — that in a concatenation your pattern will get as little space as possible. These two flags are what the symbols > and < in the pattern language translate to.

Of course, we have skipped one field, mctl, but that was on purpose. The mctl field can contain any value you please. This is where you save the data specific to the task your matcher accomplishes.
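As an illustration of the four fields a matcher author owns, here is a minimal sketch that fills them in for a matcher accepting any string of at least a given length. The structure demo_mptn is a cut-down stand-in for struct mptn (only the matcher-owned fields, with plain C types), the DEMO_ flags repeat the values of MPTN_LONGER and MPTN_SHORTER, and INT_MAX is used where real code would write G_MAXINT. The function name is hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define DEMO_LONGER  0x01        /* same value as MPTN_LONGER */
#define DEMO_SHORTER 0x02        /* same value as MPTN_SHORTER */

struct demo_mptn {               /* matcher-owned fields of struct mptn */
  void *mctl;                    /* gpointer in the real structure */
  int min_len;
  int max_len;
  int flags;
};

/* Describe a matcher that accepts at_least or more characters.
   greedy selects between preferring longer and shorter matches;
   state is whatever private data the matcher wants to keep. */
static void demo_fill_open_ended(struct demo_mptn *m, int at_least,
                                 int greedy, void *state)
{
  assert(m != NULL && at_least >= 0);
  m->min_len = at_least;
  m->max_len = INT_MAX;          /* no upper limit; G_MAXINT in real code */
  m->flags = greedy ? DEMO_LONGER : DEMO_SHORTER;
  m->mctl = state;
}
```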

The mptn_matcher_ops_t structure

When you register a matcher (see the section called Associating matcher functions with names), you associate a name (mname) with a structure containing several functions. Here is the definition of that structure; the following sections describe how its elements are used.
struct mptn_matcher_ops {
  mptn_matcher_init_ft 		*init;
  mptn_matcher_destroy_ft	*destroy;
  mptn_matcher_iter_start_ft 	*iter_start;
  mptn_matcher_iter_step_ft	*iter_step;
  mptn_matcher_ictl_free_ft	*ictl_free;
  mptn_matcher_stage_ft 	*stage;
  mptn_matcher_subst_ft		*subst;
  mptn_matcher_print_ft		*print;
};

Initializing and destroying the matcher.

mptn_t *mptn_matcher_init_ft(mptn_control_t *control, chr *args_begin, chr *args_end, GArray *varnames, gpointer data);

void mptn_matcher_destroy_ft(gpointer mctl);

When the pattern is compiled and it turns out that it includes a call to your matcher, the member init of the corresponding mptn_matcher_ops_t structure is called. It should be of the type mptn_matcher_init_ft.

The arguments to init are:

control

The mptn_control_t structure with respect to which mptn_parse was called.

args_begin

The beginning of the argument list for the matcher (the arguments start after : in the pattern).

args_end

The end of the arguments.

varnames

An array of variable names used in the top-level pattern.

data

An argument corresponding to the matcher name, set via mptn_matcher_param_set (see the section called Associating matcher functions with names.). If no argument has been set, NULL will be passed.

Given all this information, init is expected to produce a partially filled mptn_t structure. The structure itself should be obtained via a call to the following function:

mptn_t *mptn_new(mptn_control_t *control);

This function allocates the memory and sets some of the fields. Others, such as ops, are filled in for you. The task of init itself is to fill in the length fields, flags and mctl.

If the matcher decides it would need a variable to be present in the mptn_var_t array when the actual matching happens, it should indicate this by a call to the following function

guint mptn_var_get_index(GArray *arr, GQuark var);

with its own varnames parameter as the first argument and the GQuark corresponding to the variable name as the second. This ensures that when the matcher is actually called, the variables array already has a slot for the variable.

When the life cycle of the pattern is over, the destroy field of the matcher structure is called. It receives as its argument the mctl field; presumably, you might want to free some memory in this function. If there is no such need, you may leave the destroy field of your mptn_matcher_ops_t equal to NULL.
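Here is a rough sketch of an init/destroy pair for an imaginary matcher that takes a literal string as its argument. It is written against stand-ins so that it compiles on its own: demo_mptn_new plays the role of mptn_new, demo_mptn holds only the matcher-owned fields, chr is assumed to be char, and the control, varnames and data parameters are left out for brevity. Returning NULL when allocation fails is a choice made for this sketch; the text above does not say how init should report errors.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef char chr;                /* assumption: chr is a plain char */

struct demo_mptn {               /* matcher-owned fields of struct mptn */
  void *mctl;
  int min_len;
  int max_len;
  int flags;
};

struct demo_literal {            /* hypothetical private matcher state */
  size_t len;
  char *text;
};

/* Stand-in for mptn_new: allocates a zeroed structure. */
static struct demo_mptn *demo_mptn_new(void)
{
  return calloc(1, sizeof(struct demo_mptn));
}

/* Shaped like an init member: copies the argument text into private
   state and fills in the length fields, flags and mctl. */
static struct demo_mptn *demo_literal_init(const chr *args_begin,
                                           const chr *args_end)
{
  size_t len = (size_t) (args_end - args_begin);
  struct demo_mptn *m = demo_mptn_new();
  struct demo_literal *lit = malloc(sizeof *lit);
  char *text = malloc(len + 1);

  if (m == NULL || lit == NULL || text == NULL) {
    free(m);
    free(lit);
    free(text);
    return NULL;
  }
  memcpy(text, args_begin, len);
  text[len] = '\0';
  lit->len = len;
  lit->text = text;

  m->min_len = (int) len;        /* a literal matches exactly len chars */
  m->max_len = (int) len;
  m->flags = 0;
  m->mctl = lit;
  return m;
}

/* Shaped like a destroy member: receives mctl and releases it. */
static void demo_literal_destroy(void *mctl)
{
  struct demo_literal *lit = mctl;
  if (lit != NULL) {
    free(lit->text);
    free(lit);
  }
}
```

Note that destroy only releases what init stored in mctl; in this sketch the structure itself is freed by the caller, since the text does not say who owns it in the real library.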

Iterating over a matcher.

gpointer mptn_matcher_iter_start_ft(mptn_t *mptn, chr *begin, chr *end, GArray *vars, guint stage, gpointer data);

gpointer mptn_matcher_iter_step_ft(gpointer mictl);

void mptn_matcher_ictl_free_ft(gpointer mictl);

When a pattern containing the matcher is applied to a string, the iter_start field of the mptn_matcher_ops_t structure gets called. It receives:

mptn

The matcher structure itself.

begin

The start of the string to be matched against the pattern.

end

The end of this string (points to the character just past the last one to be matched).

vars

An array of variables (may be partially filled).

stage

The stage to start from.

data

The parameter passed to mptn_exec_start.

The function should find the first match, set the corresponding variables and return a pointer to a memory area representing the state of the iterator. This very pointer will be passed later as an argument to iter_step or ictl_free.

If the proposed string does not match, just return NULL.

The iter_step member gets called when there is a need to advance the iterator. Again, it should return a pointer representing its state (if the next attempt to match is successful) or NULL (if it is not).

ictl_free is supposed to deallocate any memory that may be occupied by state information.

If a matcher cannot produce several variants of a match, you may leave iter_step and ictl_free equal to NULL. In this case, return an arbitrary non-NULL value from iter_start on a successful match, for example (gpointer)1.
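The iterator protocol can be illustrated with a toy matcher whose variants are the positions of commas inside the string. The sketch below keeps only the begin and end parameters of iter_start, assumes chr is char, and does not set any variables or stages, which a real matcher would do for every variant. All demo_ names are hypothetical. It also assumes that the state stays valid after iter_step returns NULL and is released by a later call to ictl_free; the text above does not spell this out.

```c
#include <assert.h>
#include <stdlib.h>

typedef char chr;                /* assumption: chr is a plain char */

struct demo_ictl {               /* iterator state for this matcher */
  chr *end;                      /* end of the string being matched */
  chr *split;                    /* position of the current comma */
};

/* Find the next comma in [from, end), or NULL if there is none. */
static chr *demo_find_comma(chr *from, chr *end)
{
  for (; from < end; from++)
    if (*from == ',')
      return from;
  return NULL;
}

/* Shaped like iter_start: find the first variant and return the
   iterator state, or NULL when the string does not match at all. */
static void *demo_iter_start(chr *begin, chr *end)
{
  chr *hit = demo_find_comma(begin, end);
  struct demo_ictl *ictl;

  if (hit == NULL)
    return NULL;
  ictl = malloc(sizeof *ictl);
  if (ictl == NULL)
    return NULL;
  ictl->end = end;
  ictl->split = hit;
  return ictl;
}

/* Shaped like iter_step: advance to the next variant, returning the
   state on success and NULL when the variants are exhausted. */
static void *demo_iter_step(void *mictl)
{
  struct demo_ictl *ictl = mictl;
  chr *hit = demo_find_comma(ictl->split + 1, ictl->end);

  if (hit == NULL)
    return NULL;
  ictl->split = hit;
  return ictl;
}

/* Shaped like ictl_free: release the iterator state. */
static void demo_ictl_free(void *mictl)
{
  free(mictl);
}
```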

guint mptn_matcher_stage_ft(gpointer mictl);

The member stage may be called to determine the stage at which the iterator stopped. If you leave this member empty, the iterator's initial stage (same as passed to iter_start) will be given to the caller.

Substituting variables in a matcher.

chr *mptn_matcher_subst_ft(mptn_t *mptn, GArray *vars, gpointer data);

The member subst gets called when there is a need to substitute variable values and produce a string. The returned string should be allocated via g_malloc. The parameters are:

mptn

The pattern structure corresponding to the matcher.

vars

The variable assignments.

data

Additional parameter passed to mptn_subst.
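A subst member essentially turns variable values back into a newly allocated string. The sketch below shows that core step for a single variable: it copies the slice between beg and end into a fresh NUL-terminated buffer. It is a simplification under assumptions: chr is taken to be char, demo_var keeps only the two pointer fields of mptn_var_t, malloc stands in for g_malloc, and returning NULL for an unset variable is a choice of this sketch, not documented behaviour. The name demo_subst_var is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef char chr;                /* assumption: chr is a plain char */

struct demo_var {                /* the two pointer fields of mptn_var_t */
  chr *beg;
  chr *end;
};

/* Return a newly allocated copy of the variable's value.  The caller
   owns the result and must free it. */
static chr *demo_subst_var(const struct demo_var *var)
{
  size_t len;
  chr *out;

  assert(var != NULL);
  if (var->beg == NULL)
    return NULL;                 /* unset variable: nothing to copy */
  len = (size_t) (var->end - var->beg);
  out = malloc(len + 1);         /* g_malloc in real code */
  if (out == NULL)
    return NULL;
  memcpy(out, var->beg, len);
  out[len] = '\0';
  return out;
}
```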

Printing out a pattern.

void mptn_matcher_print_ft(mptn_t *mptn, FILE *fp, guint offset);

Finally, the print member of the mptn_matcher_ops_t structure gets called when there is a need to dump the internal structure of the matcher. The arguments it receives are similar to those of mptn_print.