On Wed, May 14, 2003 at 06:57:41PM +0200, Templar Viper wrote:
> From: "Thomas Arp" <t_arp@stofanet.dk>
> > From: "Templar Viper" <templarviper@HOTMAIL.COM>
> > > A while ago, I fixed this code together. It checks the argument for
> > > cursing (placed in curse_list), and replaces it with a harmless beep. I
> > > had to use str_strr, as strstr isn't case sensitive. However, I want to
> > > ignore certain characters, such as spaces, full stops and more of those.
> >
> > I'm a bit unclear on your intent here;
> > Do you wish to make sure to catch "CUR SE", too? Or do you mean you have
> > 'multiword' curse words ?
>
> Yes, I do want to catch "CUR SE".

I had to help out with something like this a few years ago for a
semi-major online game from a semi-major online game company (both of
which shall remain nameless <grin>). We didn't bother to filter
chat--that was nominally covered by the TOS agreement (<chuckle>), but
we did want to prevent people from choosing handles based on various
'illegal' words.

We did this:

  1) Generate a list of invalid words (BADWORDS), in plain English.[1]
  2) Generate a list of substitutions that could be used, stuff like
     'l' => 'l', and 'k' => '|<', and 'U' => 'V' or '|_|', etc.
  3) Permute BADWORDS against all of our mappings, and generate a HUGE
     list of 'invalid' names.
  4) On each name creation, check the proposed name against this list.

Note 1: This was, perhaps, the single most fun thing I've ever done in
a paying job. :-) I also discovered just how twisted the minds of two
of my co-workers are...

It worked quite well, and the permutation, which was essentially an
N*N operation, only had to be done whenever we updated the BADWORDS
list (which wasn't very often...see note 1...). The new checks added
zero CPU time overhead. These were also running on hardware that was
already old (HP 715 pizza boxes).

Now, if you want to filter on the fly, you now have, at worst, a linear
search to perform per word.
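A minimal sketch of steps 1-4 in C, assuming a small fixed substitution
table and whole-name matching. The table holds only the substitutions
mentioned above; the size limits, the names add_banned() and
is_banned(), and the stand-in word "duck" are hypothetical, not the
original implementation.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

#define MAX_NAME   32    /* longest stored variant, incl. NUL (hypothetical) */
#define MAX_BANNED 4096  /* capacity of the generated list (hypothetical) */

/* Step 2: substitutions a player might use for a given letter. */
struct subst { char from; const char *to; };
static const struct subst substs[] = {
    { 'k', "|<" }, { 'u', "v" }, { 'u', "|_|" },
};

/* Step 3 output: every generated spelling of every bad word. */
static char banned[MAX_BANNED][MAX_NAME];
static size_t n_banned;

/* Recursively build every spelling of word[pos..] into out[0..len). */
static void expand(const char *word, size_t pos, char *out, size_t len)
{
    if (word[pos] == '\0') {
        if (n_banned < MAX_BANNED) {       /* silently drop on overflow */
            out[len] = '\0';
            strcpy(banned[n_banned++], out);
        }
        return;
    }
    if (len + 1 < MAX_NAME) {              /* keep the letter as-is */
        out[len] = word[pos];
        expand(word, pos + 1, out, len + 1);
    }
    for (size_t i = 0; i < sizeof substs / sizeof substs[0]; i++) {
        size_t n = strlen(substs[i].to);
        if (substs[i].from == word[pos] && len + n < MAX_NAME) {
            memcpy(out + len, substs[i].to, n);
            expand(word, pos + 1, out, len + n);
        }
    }
}

/* Steps 1 and 3: lowercase one bad word and store all of its variants. */
void add_banned(const char *word)
{
    char lower[MAX_NAME], out[MAX_NAME];
    size_t i;

    assert(word != NULL);
    for (i = 0; word[i] != '\0' && i + 1 < MAX_NAME; i++)
        lower[i] = (char)tolower((unsigned char)word[i]);
    lower[i] = '\0';
    expand(lower, 0, out, 0);
}

/* Step 4: linear search, comparing the whole lowercased name. */
int is_banned(const char *name)
{
    char lower[MAX_NAME];
    size_t i;

    assert(name != NULL);
    if (strlen(name) >= MAX_NAME)
        return 0;                          /* longer than any stored variant */
    for (i = 0; name[i] != '\0'; i++)
        lower[i] = (char)tolower((unsigned char)name[i]);
    lower[i] = '\0';
    for (i = 0; i < n_banned; i++)
        if (strcmp(banned[i], lower) == 0)
            return 1;
    return 0;
}
```

Calling add_banned("duck") once at boot stores six variants (duck,
duc|<, dvck, dvc|<, d|_|ck, d|_|c|<). After that, is_banned("DVCK")
returns 1 and is_banned("ducky") returns 0, since only whole names are
compared. All the expensive work happens when the list is built, so the
per-name check is nothing but string compares.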
Naturally, if you do some clever data structure work, you can improve
on this.

All of the patterns were for matching whole words only, so we didn't
have to worry about words like "hassle". Adding a rule to permute your
words into spaced out versions and adding them to your checklist can
also be done. This might be a fair bit faster than recomputing each
word on the fly, especially if you have a good searching algorithm.

--Hawson

-- 
   +---------------------------------------------------------------+
   | FAQ: http://qsilver.queensu.ca/~fletchra/Circle/list-faq.html |
   | Archives: http://post.queensu.ca/listserv/wwwarch/circle.html |
   | Newbie List: http://groups.yahoo.com/group/circle-newbies/    |
   +---------------------------------------------------------------+
This archive was generated by hypermail 2b30 : 06/26/03 PDT