Jeremy Elson pounded furiously on the keyboard:

[...]

> Lex is a great tool, but I have avoided including a lex-built parser for
> a number of reasons.  Mainly, because the efficiency gains in using a
> lex-built parser are not worth the complexities it introduces.

I disagree.  I feel that using lex and yacc reduces the overall code
complexity.

> For example, in your example lex file, you had to enumerate
> n
> no
> nor
> north
>
> For north!  This is complex.

I re-emphasize that the code I posted earlier was for illustrative
purposes.  All the examples were something that I cobbled together in
little over half an hour just to show lex and yacc's capabilities.  I
didn't intend for it to be a real basis for a lexer and parser for
CircleMUD.

As to being complex, I heartily disagree!  In this case, the possible
permutations for "north" commands are explicitly stated.  Nowhere in the
1500+ lines of ``interpreter.c'' did I see a way to automagically come up
with shortened prefixes for commands.  And yet, in four lines, I've
explicitly done so in a way that, to me, is intuitive.  :)

> What if you have 400 commands?

Then you might have too many commands?  ;)

Seriously, though, the more appropriate (and more typical) method would be
to have a core set of tokens that the lexer explicitly looks for, with the
remaining keywords kept in some form of symbol table.  The book I
mentioned actually goes on to illustrate just such a set-up.  Granted,
CircleMUD already has such a symbol table, cmd_info, located in
``interpreter.c''.  However, the structure of that table is not clearly
defined, requiring the programmer to reverse engineer the parsing
mechanism to deduce the underlying semantics of the various fields.

(I've just re-read the preceding paragraph.  I've been working with the
government for too long!  Aiee!)

> How will you handle conflicting commands?

Conflicting command syntax is a sign of linguistic ambiguity, which is a
badism.

	"Remember that conflicts mean that yacc can't properly parse a
	grammar, probably because it's ambiguous, which means that there
	are multiple possible parses for the same input.  ...  this
	usually points to a mistake in your language design.  If a
	grammar is ambiguous to yacc, it's almost certainly ambiguous to
	humans, too."

	pg. 64, _Lex & Yacc_

> You'd have to do something like..
>
> n       -> north
> no      -> north
> nor     -> north
> nos     -> noshout
> nort    -> north
> north   -> north
> nosh    -> noshout
> nosho   -> noshout
> noshou  -> noshout
> noshout -> noshout
>
> What would you have to do if you decide that "n" means "noshout" instead
> of north?  What if you have 15 different commands that start with n?

Lex and its derivatives always match the longest rule.  So, given:

n |
no      { return NORTH;   /* <1> */ }

noshout { return NOSHOUT; /* <2> */ }

a string of "noshout" will always match the second rule even though its
first two characters, 'no', match the first rule.  Again, this is because
lex prefers the rule that matches the longest stretch of input.

Besides, I'm leaning towards having only two strings match for the NORTH
token: 'n' and 'north'.  If you want any other variations, you can use an
alias to create them.  And, for toggling various switches, I'd have a
general language construct like this:

	no { SHOUT | TELL | GOSSIP }

If you wanted "nos" to stand for "no shout," then you can make that an
alias.  I'm not hung up on this; I'm still working out the command
language syntax.
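To make the above concrete, here's a rough sketch of the sort of lexer I
have in mind, written for flex.  Everything in it is illustrative -- the
token names, the ``y.tab.h'' include, and the lookup_keyword() symbol-table
hook are stand-ins I've made up, not existing CircleMUD code -- but it shows
the longest-match behavior and the core-tokens-plus-symbol-table approach
in one place:

%{
/* Sketch only.  Token values (NORTH, NO, NOSHOUT) would come from the
 * yacc-generated y.tab.h; lookup_keyword() stands in for a search of a
 * cmd_info-style symbol table and returns the token for whatever
 * keyword it finds.
 */
#include "y.tab.h"
extern int lookup_keyword(const char *word);
%}

%option noyywrap

%%

n |
north     { return NORTH;   /* only 'n' and 'north' mean NORTH       */ }
no        { return NO;      /* 'no' opens a toggle; the grammar
                               decides what is being toggled          */ }
noshout   { return NOSHOUT; /* longest match wins, so "noshout" never
                               falls into the 'n' or 'no' rules       */ }
[a-z]+    { return lookup_keyword(yytext); /* everything else goes
                                              through the symbol table */ }
[ \t]+    ;                 /* eat whitespace */
\n        { return '\n'; }

%%

Adding a new keyword then means adding a symbol-table entry, not touching
the lexer at all; only the handful of words with special shorthand or
grammatical behavior live in the rules above.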
> CircleMUD's parser is not efficient at all but it gets the job done and
> it's easy to change.  (In 3.0 you just change the order the terms appear
> in the list to change their precedences.)

It might be (sorta) easy to add new commands to the command symbol table,
but it's not easy to _change the grammar_.  For example, when I was
dreaming up the earlier examples, I arbitrarily decided to extend the
example grammar to allow an optional numeric parameter that specified the
number of moves to make in a given direction.  It took just a few minutes
to add the NUMBER token and its support, and to extend the yacc grammar
definition to accommodate specifying the number of moves.  I've looked
through ``interpreter.c'' and I don't know where even to begin to make the
necessary changes to do the same thing.  :-\

In short, adding new keywords and their respective support functions seems
to be relatively easy in CircleMUD; modifying the command line _grammar_
is a different story altogether.
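To give a feel for just how small a change that is, here's a stripped-down
sketch of how such a movement rule might read in yacc.  The names here --
do_move(), the direction tokens, the overall grammar shape -- are made up
for illustration and aren't from my earlier post or from CircleMUD; the
point is that the optional count amounts to the one extra alternative
flagged in the comment:

%{
/* Grammar sketch only.  do_move() and the token names are stand-ins,
 * not CircleMUD interfaces; yylval for NUMBER is assumed to be set by
 * the lexer (e.g. with atoi(yytext)).
 */
int  yylex(void);
void yyerror(const char *s);
void do_move(int direction, int count);
%}

%token NORTH SOUTH EAST WEST NUMBER

%%

input     : /* empty */
          | input command '\n'
          ;

command   : direction          { do_move($1, 1);  }
          | direction NUMBER   { do_move($1, $2); }   /* e.g. "north 3";
                                                         this alternative plus
                                                         the NUMBER token was
                                                         the entire change    */
          ;

direction : NORTH   { $$ = NORTH; }
          | SOUTH   { $$ = SOUTH; }
          | EAST    { $$ = EAST;  }
          | WEST    { $$ = WEST;  }
          ;

%%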
Besides, it's safe to say that more programmers understand lex and yacc
than know how CircleMUD's parser works -- if the parser were written using
lex and yacc, it's more likely that newbie programmers would be able to
get up to speed on the code, as they're more likely to have some
familiarity with lex and yacc.  Anyhoo, they can always run down the
street to the local Borders or whatever and get a lex and yacc textbook;
they can't do the same for the CircleMUD parser.

> Many people have said in the past that the CircleMUD parser is
> inefficient and should be changed.  I agree that it's inefficient but I
> do not think it should be changed because it's simple and it's efficient
> *enough*.  Have you ever profiled a running CircleMUD?  If you do,
> you'll find an incredibly small amount of time (say, 0.5% or something,
> if I remember correctly) is spent in command_interpreter() and its
> subsidiaries.
>
> If you make CircleMUD's parser *10 times* more efficient, you'll make
> CircleMUD 0.45% faster.  Is that really worth it?

Apparently my mistake in the original post was to over-emphasize the
parsing speedup that using lex would garner.  You're absolutely right --
in fact, the principle is called Amdahl's Law.  Why spend an inordinate
amount of time optimizing code that's using an insignificant number of
cycles?  If your goal is truly to speed up program execution, then you
should concentrate on the "hot spots" where your program is spending most
of its time.

However, my goal in using lex and yacc is not necessarily to speed up
CircleMUD (although that'd be a likely by-product), but to reduce the
source code's complexity and to increase its scalability.

> Besides, CPU time is cheap -- people's time is *not* cheap.  If I've
> learned anything about software engineering, it is that your goal should
> be to reduce the amount of work *people* have to do or the amount of
> time people have to wait, not the amount of work computers have to do.
> Computers don't care.

... and it's ironic that you should mention this, as this is primarily the
reason why I wanted to re-write the CircleMUD command line parser using
lex and yacc!  =8-)

I looked through ``interpreter.c'' and had great difficulty making heads
or tails of _exactly_ how it worked.  There are a number of ancillary
lexing/parsing functions whose purpose is unclear and whose relationships
with the rest of the program are difficult to determine.  If something
goes awry in any of the parsing functions, I'm not sure where to start
looking for the cause of the problem.

I do note that a lot of the work done by some of the lexing functions is
handled implicitly by lex.  Using lex would therefore deprecate a lot of
existing code -- that alone would mean reducing the parser's complexity.

> In other words, if you want to rewrite a huge chunk of CircleMUD code in
> a way that's more complex than it already is, your time would be much
> better spent attacking a piece of the code that could benefit much more
> from improvement -- say, the way mobile specprocs are called every 10
> seconds, or changing the specproc structure over to being event driven,
> or some such.

Before making significant enhancements to a program, the underlying
infrastructure should be sound.  I've worked on adding enhancements to
crufty code before, and trust me, it's an arduous process.  =8-)  Although
CircleMUD has its good sides, it needs some basic things done first before
new features are added:

  o The Makefile should not have explicit dependencies; ``makedepend'' or
    gcc -M should be used to generate them automagically
  o The Makefile should rely as much as possible on implicit pattern
    matching rules to eliminate redundancy and complexity
  o Add header file sentinels
  o Reduce dependency on the pre-processor wherever possible
  o Simplify header files by moving as much as possible to implementation
    files
  o Introduce opaque types to reduce code complexity
  o Rely on lex and yacc for command line parsing
  o Re-visit the command language grammar and its implementation
  o Rely on lex and yacc for world, zone, shop, and object file parsing
  o Re-visit the world, zone, shop, and object file language syntax

I don't mean for this to sound like an out-and-out diatribe on CircleMUD.
It's a good program with a lot of potential -- as you've emphasized in the
documentation, now the program works, so optimization and polishing can
begin.  Working on the above points first, before adding features, would
go a long way towards achieving those goals.

Then again, that's JMHO and YMMV.  ;-)


Mark!

-- 
Mark Coletti                      | DBA Systems, Inc.  Fairfax, VA
mcoletti@clark.net                | United States Geological Survey
http://www.clark.net/pub/mcoletti | Office of Standards & Technology

code as if whoever maintains your code is a violent psychopath who
knows where you live