[leafnode-list] WANTED: leafnode-1 filter examples from real-life (was: groupinfo has lost some moderated flags)

Matthias Andree matthias.andree at gmx.de
Thu Mar 23 02:00:35 CET 2006


"Michael R. McCarrey" <wa7qzr at myrealbox.com> writes:

> This msg is in response to the last two questions which Matthias asked
> above.

Thanks.

> I'm a new Leafnode user and as such am still trying to sort it out. 
> Fortunately, it's very intuitive; meaning there's little I actually have to 
> do to make it work. I do have one problem, and that's with the documentation 
> concerning writing filters.

Okay.

> The examples supplied are sufficient, I suppose, for Perl programmers and such 
> who, for example, are familiar with regular expressions.

Actually, with Perl-flavored regular expressions. There are many
flavors, as you suggest, and it's quite a mess.

> What seems to work there, doesn't work very well in practice.

Sorry to hear that. Does regexcoach understand/train Perl-compatible
regular expressions? What other assumptions does regexcoach make?

Leafnode compiles PCRE patterns with just PCRE_MULTILINE set.
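
To illustrate what multiline mode changes, here is a minimal sketch. It
uses Python's re module as a stand-in for PCRE (an assumption: re.MULTILINE
plays the role of PCRE_MULTILINE here, but the two engines are not
identical), and it assumes the pattern is run against a whole header
block. The header block itself is made up for illustration.

```python
import re

# Hypothetical header block; names and values are invented for illustration.
headers = (
    "Path: news.example.org!not-for-mail\n"
    "From: Some One <someone@example.org>\n"
    "Subject: Make money fast\n"
)

pattern = r"^Subject:.*[Mm]oney"

# Without multiline mode, ^ only matches at the very start of the text,
# so a Subject line in the middle of the block is never found.
plain = re.search(pattern, headers)

# With multiline mode, ^ also matches right after each newline.
multi = re.search(pattern, headers, re.MULTILINE)

print(plain)
print(multi.group(0))
```

This is why patterns anchored with ^Header: can pick out one header line
from the block.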

> Most of mine end up being either too greedy (matching far too 
> much), or they never match completely. The O'Reilly book on the subject 
> doesn't help much either. Searching the Internet for working examples has led 
> me to believe that this is not an isolated situation, peculiar to me only. 
> Lots of requests for help with regex, but few useful answers are to be found.
>
> So, as a user, I'd like to see more, and better examples of regex usage in 
> Leafnode; Examples that go over the syntax that actually works with the 
> program because, things like the following, which are supposed to be 
> legitimate regex, don't seem to pass muster:

I don't use leafnode-1 patterns in production, and thus am asking for
people to show their working patterns that we can then include in
filters.example.

> # Filter "Content" headers
> ^Content-Type:.*text/html #-No HTML articles
>
> # Filter "Subject" headers
> ^Subject:.*[Mm][Oo][Nn][Ee][Yy] # money
> ^Subject:.*[Pp][Rr][Oo][Ff][Ii][Tt] # profit
> ^Subject:.*[Ii][Nn][Cc][Oo][Mm][Ee] # income
> ^Subject:.*[Aa][Ll][Ww][Aa][Yy][Ss].*[Ww][Aa][Nn][Tt][Ee][Dd] # always wanted
> ^Subject:.*[Ww][Aa][Nn][Tt][Ee][Dd].*[Aa][Ll][Ww][Aa][Yy][Ss] # wanted always

These all have comments inside the regular expression - PCRE does not
recognize these as comments, because leafnode compiles regexps with
PCRE_EXTENDED unset. Put the comments alone on the line before or after
the regular expression.

Alternatively (UNTESTED - no warranties!), put (?x) at the very start of
the pattern line. This enables PCRE_EXTENDED for the rest of the pattern,
which has other implications though: unescaped whitespace in the pattern
is then ignored, for example. Check the PCRE man pages.
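
The effect of a trailing comment can be sketched as follows. Python's re
module stands in for PCRE here (an assumption; re.VERBOSE, switched on by
(?x), is Python's counterpart to PCRE_EXTENDED). This only shows regex
engine behaviour, not how leafnode itself reads its filter file.

```python
import re

subject = "Subject: easy money"

# Comment left on the pattern line: without extended mode, " # money" is
# part of the pattern and would have to appear literally in the header.
with_comment = re.compile(r"^Subject:.*[Mm][Oo][Nn][Ee][Yy] # money", re.MULTILINE)

# Comment moved to its own line: the pattern is just the regex.
bare = re.compile(r"^Subject:.*[Mm][Oo][Nn][Ee][Yy]", re.MULTILINE)

# (?x) at the start: the trailing comment is ignored ...
extended = re.compile(r"(?x)^Subject:.*[Mm][Oo][Nn][Ee][Yy] # money", re.MULTILINE)

# ... but so is every unescaped space, so this is really "^Subject:easy".
spaced = re.compile(r"(?x)^Subject: easy", re.MULTILINE)

print(with_comment.search(subject))
print(bare.search(subject).group(0))
print(extended.search(subject).group(0))
print(spaced.search(subject))
```

The last case is the kind of side effect to watch for: in extended mode a
literal space has to be written as "\ " or "[ ]".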

Additionally, you can switch to case insensitive mode with (?i) in the
pattern, for instance:

^Subject:.*(?i)money
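
A small sketch of the same idea, again with Python's re module standing in
for PCRE (an assumption). Recent Python versions reject a bare (?i) in the
middle of a pattern, so the sketch uses the scoped form (?i:...), which
PCRE also documents; the sample subject lines are made up.

```python
import re

# Only "money" is matched case-insensitively; "Subject:" itself stays
# case-sensitive because it lies outside the (?i:...) group.
pat = re.compile(r"^Subject:.*(?i:money)", re.MULTILINE)

samples = ["Subject: MONEY now", "Subject: Money", "subject: money"]
results = [bool(pat.search(line)) for line in samples]

for line, hit in zip(samples, results):
    print(line, hit)
```

This replaces the [Mm][Oo][Nn][Ee][Yy] style of spelling out every letter
in both cases.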

> # Filter "From" headers
> ^From:.*Nasty Poster \<_-_Stinky917_-_ at somedomain\.com>

Should be fine, although the \ isn't needed for < in PCRE - this doesn't
hurt though. Check whether the header contains Nasty Poster unquoted or
quoted as "Nasty Poster", since the pattern only matches the unquoted
form.

> ^From:.*[a-zA-Z]\<whatisitallabout at somedomain\.com>

This matches only if a letter sits immediately in front of the opening
angle bracket, with no space or quote in between.
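
A quick sketch of that behaviour, using Python's re module as a stand-in
for PCRE (an assumption) and a hypothetical address, user@example.com,
in place of the one quoted above:

```python
import re

# \< is simply a literal "<" here; [a-zA-Z] must match the character
# directly before it.
strict = re.compile(r"^From:.*[a-zA-Z]\<user@example\.com>", re.MULTILINE)

# Dropping the letter requirement accepts whatever precedes the bracket.
loose = re.compile(r"^From:.*<user@example\.com>", re.MULTILINE)

glued = "From: Nasty<user@example.com>"
spaced = "From: Nasty <user@example.com>"
quoted = 'From: "Nasty"<user@example.com>'

for header in (glued, spaced, quoted):
    print(header, bool(strict.search(header)), bool(loose.search(header)))
```

Since most From: headers put a space before the bracket, the strict form
will miss them.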

> ^From: *[Rr]obert* && ^User-Agent: *Outlook* && ^Path: *Sausage*
> ^From:.*[Na]sty* && ^User-Agent:.*Forte* && ^Organization:.*Road Runner High Speed Online*

You cannot combine regular expressions like this (with &&) in leafnode
unfortunately.
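
Why such a line cannot work as a single regular expression can be sketched
like this. Python's re module stands in for PCRE (an assumption), the
header block is made up, and the final all() check only spells out the
meaning the && was meant to have; it is not something the leafnode-1
filter file offers.

```python
import re

# Hypothetical header block for illustration.
headers = (
    "From: Robert <robert@example.org>\n"
    "User-Agent: Outlook\n"
)

# As one pattern, " && " is literal text, and the second ^ would have to
# match directly after a space, which it never can.
combined = re.compile(r"^From: *[Rr]obert* && ^User-Agent: *Outlook*", re.MULTILINE)
combined_hit = combined.search(headers)

# "*" is a quantifier, not a shell wildcard: " *" means "zero or more
# spaces" and "t*" means "zero or more t".
glob_like = re.compile(r"^From: *[Rr]obert*")
short = glob_like.search("From: Rober").group(0)
prefixed = glob_like.search("From: Mr Robert")

# The intended meaning: every pattern must match on its own.
parts = [r"^From:.*[Rr]obert", r"^User-Agent:.*Outlook"]
all_match = all(re.search(p, headers, re.MULTILINE) for p in parts)

print(combined_hit, short, prefixed, all_match)
```

Note that ".*" rather than " *" is what expresses "anything in between".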

While looking through the code to document applyfilter better (it is
the reference for the filter files in leafnode-1), I also found two
bugs; these will be fixed in 1.11.5.

> So, if anyone has constructed filters that actually work, they could, with a 
> little modification to protect privacy, be the genesis of some good filter 
> documentation for Leafnode.

Bring them on :-)

Thanks for taking the time.

-- 
Matthias Andree


