[leafnode-list] leafnode2 + lua
clemens fischer
ino-news at spotteswoode.dnsalias.org
Fri Jul 25 17:54:26 CEST 2008
hi leafnode2 users,
a few weeks ago i announced my plans to add lua scripting to at least
fetchnews, to extend its filtering capabilities.
presently i have a version in test mode on my PC. it gives access to
all of lua and allows adding to or replacing the complete header, adding
to or replacing the cached part of the body, and changing the
newsgroup(s) an article will be stored in. of course, lua's built-in,
regex-like pattern matching can be used for all of this.
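to make the header part concrete, here is a minimal sketch of reading and replacing a single-line header field with nothing but lua's built-in patterns. `get_field` and `set_field` are hypothetical helper names, not part of the fetchnews lua interface; the sketch assumes the header is a plain string of "\n"-terminated lines and ignores folded (continuation) lines.

```lua
-- hypothetical helpers, not fetchnews API: read or replace one
-- single-line header field in a raw header string whose lines end in "\n".

-- escape lua pattern magic characters, e.g. the "-" in "Message-ID"
local function escape(s)
  return (s:gsub("%p", "%%%0"))
end

local function get_field(header, name)
  -- prepend "\n" so the first line matches like every other line;
  -- note: lua patterns are case-sensitive, RFC header names are not
  return ("\n" .. header):match("\n" .. escape(name) .. ":%s*([^\n]*)")
end

local function set_field(header, name, value)
  local line = name .. ": " .. value
  -- a function replacement avoids "%" in `value` being interpreted
  local new, n = ("\n" .. header):gsub("\n" .. escape(name) .. ":[^\n]*",
                                     function() return "\n" .. line end, 1)
  if n == 0 then
    new = new .. line .. "\n"  -- field absent: append it at the end
  end
  return new:sub(2)            -- drop the helper "\n" again
end
```

changing the group an article is stored in would then be something like `set_field(header, "Newsgroups", "local.spam")`, assuming the hook hands the header over as a string.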
now, lua is a highly dynamic language, and it is easy to shoot oneself
in both feet with risky constructs. so i'm trying to find an easy way to
configure some of the more important filtering-related features.
i have two problems left to solve:
1. lua does not include any socket support. you would have to use
a module called "luasocket", available for linux, the BSDs and
possibly others. then again, bogofilter, my favourite bayes filter,
might be able to do socket I/O, but it doesn't provide a "daemon mode"
that would let you push many articles through it without restarting it
each time.
not being quite satisfied with the various, sometimes incomplete
luasocket packages for the many platforms and "distros", and not
willing to roll and support my own, i settled on a very simple solution
using only what stock lua gives me. you may be aware of the unix library
function "popen" in manual section 3 of almost any unix. it is tied into
stream-buffered standard I/O (stdio), which is its biggest problem.
while freebsd provides a two-way pipe implementation, it is unusable
from lua. but even if it were usable, we could not pump arbitrary
amounts of data into it: if the popen'ed program starts to produce
output and that output isn't read in time, the whole machinery blocks.
i tried a few variants, and the most reliable one i came up with is this:
io.popen(bogofilter .. " > " .. tmp_result, "w")
the tmp_result file will contain only one line: bogofilter's result.
without two-way communication, nothing can block. putting this file on
a memory disk makes it fast enough, but there is still no daemon mode:
bogofilter is started once for every article. at least with luasocket,
if available, we have the possibility to connect to any external filter
that supports sockets.
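the popen trick above can be sketched end to end as follows, assuming the external filter reads the article on stdin and prints a one-line verdict on stdout. `classify_article` is a hypothetical name, not part of fetchnews; for bogofilter you would pass whatever command line makes it print a one-line result (check bogofilter(1) for the option your version uses).

```lua
-- hypothetical helper: feed one article to an external filter through a
-- write-only popen, with the filter's stdout redirected to a temp file.
local function classify_article(command, article, tmp_result)
  -- the shell redirect means we never read from the pipe, so a filter
  -- that starts writing early cannot block waiting for us
  local pipe = assert(io.popen(command .. " > " .. tmp_result, "w"))
  pipe:write(article)
  -- close() waits for the child to exit (pclose), so the result file
  -- is complete once it returns
  pipe:close()

  local f = assert(io.open(tmp_result, "r"))
  local verdict = f:read("*l")  -- the filter is expected to print one line
  f:close()
  os.remove(tmp_result)
  return verdict
end
```

for a quick check of the plumbing without bogofilter, `classify_article("wc -l", article, os.tmpname())` returns the article's line count as a string. the one-process-per-article cost is exactly the missing daemon mode described above.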
i have this working and filtering USENET articles, with a few kinks
left. for example, my bayes database was trained only on email, so it
has never seen USENET headers. also, even when reading only a few dozen
newsgroups, the material is so diverse that it might be advisable to
set up a separate bayes database for USENET material.
btw, how much of the body do you think needs to be cached in memory
for effective filtering? i have simply defined a macro with the value
15000 (bytes), which is about 187 lines of 80 bytes, but i think this
should be made runtime-configurable by a fetchnews option.
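a sketch of how such a limit might behave, assuming the cached part should end on a complete line where possible. `truncate_body` and `DEFAULT_BODY_CACHE` are hypothetical names; the real limit is a C macro inside fetchnews, and this only illustrates the arithmetic with a configurable value.

```lua
-- hypothetical: keep at most `limit` bytes of a body for filtering,
-- cutting back to the last complete line when there is one.
local DEFAULT_BODY_CACHE = 15000  -- same value as the macro in the text

local function truncate_body(body, limit)
  limit = limit or DEFAULT_BODY_CACHE
  if #body <= limit then
    return body
  end
  local head = body:sub(1, limit)
  -- greedy ".*" backtracks to the last "\n"; "()" captures its position
  local last_nl = head:match(".*()\n")
  if last_nl then
    return head:sub(1, last_nl)
  end
  return head  -- one overlong line: cut it hard at the limit
end
```

with the default, a body of 80-byte lines (79 characters plus "\n") is cut after 187 full lines, i.e. 14960 bytes.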
2. i don't really know what you people want to filter for. me, i need
a bayes filter, but i also need to strip all that HTML crap in some
groups. then there are groups where a googlegroups message-ID
almost always means spam.
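the two simpler wishes could look roughly like this. both functions are hypothetical sketches: the googlegroups check assumes such message-IDs end in "googlegroups.com>", and the HTML stripper is deliberately crude (it removes anything between "<" and ">", does not decode entities and gets confused by ">" inside attribute values).

```lua
-- hypothetical: does the Message-ID look like it came from google groups?
local function has_googlegroups_mid(header)
  -- header names are case-insensitive, so compare in lower case
  local h = ("\n" .. header):lower()
  return h:match("\nmessage%-id:%s*<[^>\n]*googlegroups%.com>") ~= nil
end

-- hypothetical, crude: drop everything that looks like an HTML tag
local function strip_html_tags(body)
  return (body:gsub("<[^>]*>", ""))  -- parentheses drop gsub's count
end
```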
for this, one might want to define a hierarchy of group parameters,
with "prototypes" to take defaults from. lua is quite capable in
object-oriented terms: one can keep the parameters in tables and provide
a so-called "__index" metamethod pointing to a prototype, which in turn
can have its own prototype table, and so on. thus you can have the
equivalent of the current filtering system's "." catch-all, with all
sorts of variations.
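a minimal sketch of such a prototype chain. the parameter names (`strip_html`, `max_body`, `articlefilter`) and the `derive` helper are made up for illustration; only the `setmetatable`/`__index` mechanism is standard lua.

```lua
-- hypothetical catch-all defaults, playing the role of "."
local root = {
  strip_html = false,
  max_body = 15000,
  articlefilter = "bogofilter",
}

-- each level stores only what differs; missing keys fall through to proto
local function derive(proto, overrides)
  return setmetatable(overrides or {}, { __index = proto })
end

local html_groups = derive(root, { strip_html = true })
local feudel = derive(html_groups, { articlefilter = "none" })
```

a lookup like `feudel.max_body` walks feudel, then html_groups, then root and finds 15000 there, while the value itself is stored only once.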
what i think we need is a table structure where each table describes
filtering parameters for one group:
patterns = {}
patterns.root = {
  -- lists of { match = pattern, replace = replacement } pairs
  header = {
    { match = "some pattern possibly with captures",
      replace = "some string possibly with captures" },
    -- { match = ..., replace = ... },
  },
  body = {
    { match = "some pattern possibly with captures",
      replace = "some string possibly with captures" },
    -- { match = ..., replace = ... },
  },
  articlefilter = { "bogofilter" },
}
patterns.group = {}
-- group names contain dots, so they are used as string keys;
-- anything not set for the group is looked up in patterns.root
patterns.group["blarp.feudel"] = setmetatable({}, { __index = patterns.root })
here the nested tables are written directly as lua table constructors
(table.insert works just as well for building them up), and the
prototype is set with setmetatable(table, metatable), whose "__index"
field points to the prototype. i hope you get the idea from this
description. i'm interested in the way _you_ want to use filtering. if
we can agree on something robust, pleasant to use and easy to implement,
i will provide it as sample code.
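applying such a rule list is then a simple loop over `string.gsub`. this is a self-contained sketch under the assumption that rules are `{ match = ..., replace = ... }` pairs as in the table layout above; `apply_rules` and `filter_header` are hypothetical names, and the example rule (collapsing "Re: Re:" in the subject) is just an illustration of a capture.

```lua
-- hypothetical: run every { match, replace } rule over `text` in order
local function apply_rules(text, rules)
  for _, rule in ipairs(rules or {}) do
    -- gsub returns the new string plus a count; keep only the string
    text = text:gsub(rule.match, rule.replace)
  end
  return text
end

local root = {
  header = {
    -- "%1" re-inserts the captured "\nSubject: " prefix
    { match = "(\nSubject:%s*)Re:%s*Re:", replace = "%1Re:" },
  },
}
-- a group without its own header rules inherits root's via __index
local group = setmetatable({}, { __index = root })

local function filter_header(header, params)
  -- prepend "\n" so the first header line matches like the others
  return apply_rules("\n" .. header, params.header):sub(2)
end
```

since `group.header` is not set, the lookup falls through to `root.header`, so `filter_header("From: a\nSubject: Re: Re: hello\n", group)` applies the root rule.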
of course, if your filtering needs are simpler, you don't have to use
any lua Object Oriented bloat or tables, but i think if we find
a generic pattern, it will be much easier to maintain.
note that when using the lua-extension and messing up your own code, it
is you who will be in for long debugging sessions unless you can
interest others from this list in your concept!
regards, clemens