[leafnode-list] add lua-scripting to leafnode2
clemens fischer
ino-news at spotteswoode.dnsalias.org
Thu Jun 19 21:43:33 CEST 2008
i'm planning to add lua-scripting support to leafnode2.
the first "milestone" will be an interface to fetchnews allowing people
to filter on the complete headers of articles as well as (part of) the
body. "part of" means: you can cache only so much of large articles in
memory, so there will be an arbitrary, configurable amount set aside for
this task. in addition to filtering, users will be able to change the
names of the newsgroups an article will be filed under. the idea is to
either filter using lua's builtin string-matching primitives or hand an
article off to an external program such as a bayesian filter. then the
lua script can decide to e.g. prefix "spam." to every newsgroup according
to the classification result. articles thus tagged won't clog the
original newsgroups, but will still be available some place else for
later inspection.
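
a minimal sketch of the tagging idea, not the final interface. the names
`looks_spammy` and `tag_newsgroups` are hypothetical, and the keyword check
merely stands in for a real classifier such as an external bayesian filter:

```lua
-- hypothetical stand-in for a real classifier: flags a subject line
-- using only lua's builtin string functions
function looks_spammy(subject)
  local s = string.lower(subject or "")
  -- plain (non-pattern) substring search for an illustrative keyword
  if string.find(s, "free money", 1, true) then
    return true
  end
  -- pattern search: three or more consecutive exclamation marks
  return string.find(s, "!!!+") ~= nil
end

-- returns a new list of newsgroup names; when is_spam is true, every
-- name gets the "spam." prefix, otherwise the names are copied unchanged
function tag_newsgroups(newsgroups, is_spam)
  local result = {}
  for i, name in ipairs(newsgroups) do
    result[i] = is_spam and ("spam." .. name) or name
  end
  return result
end
```

with this, an article posted to "de.test" and classified as spam would be
filed under "spam.de.test" instead.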
currently, there are no plans to add scripting to leafnode, the server
part of the package, because my main motivation is to have something to
fight the growing vandalism in some groups.
there will be a compile-time option to include scripting. if turned on,
the extension will read a special file of lua functions to be called from
so-called "hooks" scattered about the fetchnews code. if not, leafnode2 won't change in any
way.
the hooks i currently ponder:
function fetchnews_headertable_lua(header_table),
returns table-of-newsgroup-names
the names are modelled after the pattern "fetchnews_" + argument-type(s)
+ "_lua", because somebody might want to implement scripting for
leafnode (the server) or in another language. the hook mentioned above
receives only the headers, for people using the "delaybody" option.
the return value will always be a table containing a status code and the
new or unchanged newsgroup names.
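
a sketch of what such a header-only hook could look like. since the
interface is not fixed yet, several things here are assumptions: that
header_table maps lower-case header names to string values, that the
result table uses the field names `status` and `newsgroups`, and that 0
means success. the crossposting threshold is arbitrary:

```lua
-- assumption: header_table maps lower-case header names to strings
-- assumption: the result is { status = <number>, newsgroups = { ... } }
local STATUS_OK = 0        -- hypothetical status code
local MAX_CROSSPOSTS = 5   -- arbitrary threshold, for illustration only

-- split a comma-separated Newsgroups header into a list of names
function split_newsgroups(value)
  local groups = {}
  for name in string.gmatch(value or "", "[^,%s]+") do
    groups[#groups + 1] = name
  end
  return groups
end

function fetchnews_headertable_lua(header_table)
  local groups = split_newsgroups(header_table["newsgroups"])
  -- treat heavy crossposting as suspicious and refile the article
  if #groups > MAX_CROSSPOSTS then
    for i, name in ipairs(groups) do
      groups[i] = "spam." .. name
    end
  end
  return { status = STATUS_OK, newsgroups = groups }
end
```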
another one is:
function fetchnews_headertable_text_lua(header_table, article_string),
returns table-of-newsgroup-names
it gets the same table of headers plus the entire article formatted as
a (lua-) string. the result is the same as above.
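
a sketch of a body-matching hook under the same assumptions (lower-case
header names, a result table with `status` and `newsgroups` fields). it
additionally assumes that article_string holds headers and body separated
by the first empty line. the phrase list and the helper `article_body`
are made up for illustration:

```lua
-- illustrative phrases only; a real setup would use a proper filter
local SPAM_PHRASES = { "buy now", "click here" }

-- return everything after the first empty line, or "" if there is none
function article_body(article_string)
  local _, sep_end = string.find(article_string, "\r?\n\r?\n")
  if not sep_end then
    return ""
  end
  return string.sub(article_string, sep_end + 1)
end

function fetchnews_headertable_text_lua(header_table, article_string)
  local groups = {}
  for name in string.gmatch(header_table["newsgroups"] or "", "[^,%s]+") do
    groups[#groups + 1] = name
  end
  -- case-insensitive plain-text search through the (possibly truncated) body
  local body = string.lower(article_body(article_string))
  for _, phrase in ipairs(SPAM_PHRASES) do
    if string.find(body, phrase, 1, true) then
      for i, name in ipairs(groups) do
        groups[i] = "spam." .. name
      end
      break
    end
  end
  return { status = 0, newsgroups = groups }
end
```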
if feasible, there might be yet another hook like:
function fetchnews_headertable_file_lua(header_table, article_file),
returns table-of-newsgroup-names
this function would receive the same table of headers plus a
file-descriptor to a temporary file containing the entire article
unabridged. don't count on this last one: all the filtering and
scripting is supposed to be as lightweight as possible to not interfere
with fetchnews operation all that much. a modern operating system might
be good at memory caching, but disk operations mean seeks, which mean
time possibly wasted on a spammy article.
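
should this hook materialize, a script could keep disk reads short by
stopping at the first hit. the sketch below assumes article_file arrives
as an ordinary lua file handle opened for reading (rather than a raw
descriptor) and reuses the assumed `status`/`newsgroups` result layout.
the search phrase is illustrative:

```lua
function fetchnews_headertable_file_lua(header_table, article_file)
  local groups = {}
  for name in string.gmatch(header_table["newsgroups"] or "", "[^,%s]+") do
    groups[#groups + 1] = name
  end
  local spam = false
  -- read line by line and stop as soon as a match is found, so the
  -- rest of a large article is never read from disk
  for line in article_file:lines() do
    if string.find(string.lower(line), "click here", 1, true) then
      spam = true
      break
    end
  end
  if spam then
    for i, name in ipairs(groups) do
      groups[i] = "spam." .. name
    end
  end
  return { status = 0, newsgroups = groups }
end
```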
the entire interface is still subject to change. for example,
it might be better or more convenient to hand over C-strings
instead of lua-strings, or the return-values might be a (number,
table-of-newsgroups) tuple.
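
the tuple variant maps directly onto lua's multiple return values. a
sketch, in which the status values 0 and 1 and the lower-case header
name are assumptions, not part of any agreed interface:

```lua
function fetchnews_headertable_lua(header_table)
  local groups = {}
  for name in string.gmatch(header_table["newsgroups"] or "", "[^,%s]+") do
    groups[#groups + 1] = name
  end
  if #groups == 0 then
    return 1, groups   -- hypothetical status: no newsgroups found
  end
  return 0, groups     -- hypothetical status: ok, names unchanged
end

-- the caller picks up both values in one assignment
local status, newsgroups = fetchnews_headertable_lua({ newsgroups = "de.test" })
```

on the C side, both values would simply be left on the lua stack after
the call, so no result table has to be unpacked.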
i picked lua as the language for the extensions because (a) matthias
mentioned it in a private mail, (b) it is quite easy to implement, (c)
users can easily add a lua-sockets module and use it for persistent
connections to their favourite filter-engine, and (d) it is nice and
easy to program in. you might want to check <url:http://www.lua.org>
for further information on the language.
i am mentioning all this not only for your entertainment, but also to
solicit criticism and suggestions. i don't have all that much time, but
i want to do it "right" and i'm going at a slow pace here.
regards, clemens