Yo! (Again!) A couple of folks have asked me what all this "lex" and "yacc" stuff is all about. Still others have wondered about whether "lex" or "yacc" are relevant to CircleMUD. Well, instead of tailoring a bazillion individual replies, I've decided to just briefly describe what the hey "lex" and "yacc" are and (hopefully) convey just exactly how useful it would be in CircleMUD. So, here goes! =8-) I'll first discuss lex, and then yacc. Lex is a "lexical analyzer." It's used to take an arbitrary stream of characters and convert that stream into tokens. I wrote a simple lexer in lex in five minutes that would parse the CircleMUD movement directions for "north," "south," "east," and "west." I've included the source below. (And, no, I don't expect you to understand it, but I just wanted to show you how easy to understand a lex file is. =8-) ----->8 snip! 8<-------- %{ /* Sample lex script for tokenizing directions /* C declarations and includes go here */ #include <stdio.h> %} %% [\t ]+ /* this regular expression pattern will ignore whitespace */ n | no | nor | nort | north { printf("Going to the Great White North, a?\n"); } s | so | sou | sout | south { printf("Going south for the winter!\n"); } e | ea | eas | east { printf("Go to where the sun rises!\n"); } w | we | wes | west { printf("Go west, young man!\n"); } [a-zA-Z]+ { printf("Eh, wot?\n"); /* bogus input */ } .|\n { ECHO; /* normal default anyhoo */ } %% int main() { yylex(); /* this starts the lexer, which will start reading stdin and echoing to stdout */ exit(0); } ----->8 snip! 8<-------- I compiled, linked and ran it as follows: # this is the lex file with the above source resdgw17 ~/tmp/src 48 > flex direx.l # running flex, which is a public domain version of lex, # produces the following C source code file (I chose flex, # because it generates superior lexers and has fewer bugs) resdgw17 ~/tmp/src 50 > ll lex* -rw-rw-r-- 1 mcoletti general 37123 Oct 31 10:56 lex.yy.c # now I just compile and link it resdgw17 ~/tmp/src 52 > gcc lex.yy.c -o direx -ll # and now I can run it! resdgw17 ~/tmp/src 54 > direx n Going to the Great White North, a? s Going south for the winter! e Go to where the sun rises! w Go west, young man! nor Going to the Great White North, a? east s west nor Go to where the sun rises! Going south for the winter! Go west, young man! Going to the Great White North, a? bogosity Eh, wot? Well, that's cute, but somewhat useless. What we need next is a _parser_ that will take tokens issued by the lexer and, well, parse them. :) That is, the lexer will read in a stream of characters, convert those characters into tokens, and pass those tokens up to the parser. The parser, in turn, will do something meaningful with those tokens. As you've probably already guessed, yacc builds parsers just as lex (or flex) builds lexers. However, before diving into a yacc file, I need ta make a few niggling changes to the lexer so that it actually returns tokens instead of printing out what it found. The changes follow here: ----->8 snip! 8<-------- %{ /* Sample lex script for tokenizing directions /* C declarations and includes go here */ /* we need to get the token definitions from yacc, which are found in this header file that's automagically generated by yacc (or bison) */ #include "y.tab.h" %} %% [\t ]+ /* this regular expression pattern will ignore whitespace */ n | no | nor | nort | north { return NORTH; } s | so | sou | sout | south { return SOUTH; } e | ea | eas | east { return EAST; } w | we | wes | west { return WEST; } [a-zA-Z]+ { printf("bogus direction %s\n", yytext); /* bogus input */ } [0-9]+ { yylval.ival = atoi(yytext); return NUMBER; } .|\n { ECHO; /* normal default anyhoo */ } %% ----->8 snip! 8<-------- There are a few subtle changes I've made to this lexer. First, I now return tokens instead of printing stuff. I've also added a new token, NUMBER, which returns an integer to the parser. You'll see why I added that in a minute. ;) Okiley dokiley, now for, ta daaaa! the parser. Again, I just put this up for illustrative purposes. I don't intend for this to be a for real parser that will be used in CircleMud; this is something I kludged up over 'bout a half hour or so. ----->8 snip! direx.y 8<-------- %{ #include <stdio.h> %} %union { int ival; } %token NORTH SOUTH EAST WEST %token <ival>NUMBER %right NUMBER %% direction_list: direction | multi_direction | direction_list direction | direction_list multi_direction ; direction: NORTH { printf("you move north\n"); } | SOUTH { printf("you move south\n"); } | EAST { printf("you move east\n"); } | WEST { printf("you move west\n"); } ; multi_direction: direction NUMBER { printf(" %d times\n", $2); } ; %% ----->8 snip! direx.y 8<-------- In this yacc file I've defined a grammar that consists of direction tokens for each of the cardinal directions and another token for numbers. I've also defined a union type that will store any integers the underlying lexer runs across. The grammar is very simple. It consists of a list of directions. Directions themselves can be merely any one of the cardinal directions, or a cardinal direction _and_ a number; the number represents how many times you want to move in that direction. Ok, now to build the sucker: % flex direx.l % bison -d --yacc direx.y % gcc y.tab.c lex.yy.c -o direx -ly -ll I used ``bison'' which is the GNU version of yacc. This is preferable because it generates a better parser, gives better diagnostics, and apparently supports re-entrant parsers (which is big-time relevant to CircleMUD). Ok, now to run it! % direx n s e w you move north you move south you move east you move west n 1 sout 2 e wes 3 you move north 1 times you move south 2 times you move east you move west 3 times foo bogus direction foo bar ba bogus direction bar bogus direction ba 4 parse error Exit 1 Hopefully I've shown that lex and yacc (or their equivalents) are way useful at building parsers. In a matter of about half an hour, I was able to cobble together a very, very simple parser that would've taken much longer to write out by hand. Better yet, the generated parser is more efficient than one I could write in a reasonable amount of time. To quote from _Lex and Yacc_ by John Levine, Tony Mason and Doug Brown, "A lex lexer is almost always faster than a lexer you might write in C by hand." (pg. 1) Also, it's certainly _much_ easier to understand than the one found (somewhere) in CircleMUD, IMHO. =8-) I hope this helps! Mark! -- Mark Coletti | DBA Systems, Inc. Fairfax, VA mcoletti@clark.net | United States Geological Survey http://www.clark.net/pub/mcoletti | Office of Standards & Technology A toast to bread, for without bread, there could be no toast.
This archive was generated by hypermail 2b30 : 12/07/00 PST