[leafnode-list] Re: add lua-scripting to leafnode2

Mon Jun 23 02:55:33 CEST 2008

On Sun, 22 Jun 2008 16:47:22 -0400 Cory Albrecht wrote:

> Since SpamAssassin is written completely in perl, I can't imagine that
> a lua interface to SA would be a small undertaking. However, I know
> zero about lua, other than it exists.

the idea behind the scripting extension is to avoid a fork(2)/exec(2)
combination for every single article, without prohibiting it if people
want it. so i wanted something that can hook up any anti-spam tool.

AFAIK SpamAssassin is typically run as a daemon: it sits there waiting
for incoming connections on a socket, be it unix-domain or TCP.

with lua scripting, you can load the luasocket module and use it to
connect to any persistent server. this is no different from doing it
in perl.
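to illustrate the "connect once, reuse for every article" idea in C rather than lua, here is a minimal sketch, assuming the daemon listens on a unix-domain socket at some path. the helper name filter_connect_unix() is hypothetical, and the wire protocol spoken by spamd is deliberately left out; only the connection setup is shown.

```c
#define _POSIX_C_SOURCE 200809L
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* hypothetical helper: open ONE stream connection to a filter daemon that
 * listens on a unix-domain socket.  called once per fetchnews run, the
 * descriptor can then be reused for every article, so no fork(2)/exec(2)
 * is needed per article.  returns the descriptor, or -1 on error. */
static int filter_connect_unix(const char *path)
{
    struct sockaddr_un addr;
    int fd;

    if (strlen(path) >= sizeof(addr.sun_path))
        return -1;                          /* path too long for sun_path */

    fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strcpy(addr.sun_path, path);            /* length checked above */

    if (connect(fd, (const struct sockaddr *)&addr, sizeof(addr)) < 0) {
        close(fd);                          /* nobody listening, etc. */
        return -1;
    }
    return fd;
}
```

for a daemon reachable over TCP the same pattern applies, using getaddrinfo(3) and an AF_INET/AF_INET6 socket instead.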

OTOH, people might not want a tool as large as SA.  they might go for
a simple fork/exec of, e.g., "pcregrep -f <file-of-patterns>", or even do
without any external programs and just use lua's built-in pattern matcher.
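the "no external program at all" variant can be sketched in C as well. the post has lua's built-in patterns in mind; as a stand-in this sketch uses POSIX regcomp(3)/regexec(3), compiling the patterns once per run and then applying them to every line of an article. struct pattern_set, the patterns_* functions and the limit of 16 patterns are hypothetical choices for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <regex.h>
#include <stddef.h>

#define MAX_PATTERNS 16               /* arbitrary limit for the sketch */

/* hypothetical container for pre-compiled patterns */
struct pattern_set {
    regex_t re[MAX_PATTERNS];
    size_t  n;                        /* number of compiled entries */
};

/* compile all patterns once per run; returns 0 on success, -1 on error */
static int patterns_compile(struct pattern_set *ps,
                            const char *const *pats, size_t n)
{
    ps->n = 0;
    if (n > MAX_PATTERNS)
        return -1;
    for (size_t i = 0; i < n; i++) {
        if (regcomp(&ps->re[i], pats[i], REG_EXTENDED | REG_NOSUB) != 0) {
            for (size_t j = 0; j < i; j++)   /* undo partial work */
                regfree(&ps->re[j]);
            return -1;
        }
    }
    ps->n = n;
    return 0;
}

/* per article: 1 if any line matches any pattern, else 0 */
static int patterns_match_any(const struct pattern_set *ps,
                              const char *const *lines, size_t nlines)
{
    for (size_t l = 0; l < nlines; l++)
        for (size_t p = 0; p < ps->n; p++)
            if (regexec(&ps->re[p], lines[l], 0, NULL, 0) == 0)
                return 1;
    return 0;
}

/* once per run, at the end */
static void patterns_free(struct pattern_set *ps)
{
    for (size_t i = 0; i < ps->n; i++)
        regfree(&ps->re[i]);
    ps->n = 0;
}
```

compiling up front is the point of the design: the per-article cost is then just the regexec(3) calls, with no process creation at all.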

> Since perl is already there on 99.9% of un*x systems by default, one
> would only need to link with the appropriate embedding *.so file. I've done it
> under Windows, but not under un*x, so I can't say I am 100% sure of the exact
> mechanics in un*x.

of course, with any hook-based scripting, the script-extension author
has to provide the hooks' implementation, and the user provides their
semantics in the form of some lua/perl/whatever functions.

> Also, I was assuming that one would use autoconf (or whatever)
> --with-<feature>/--enable-<feature> arguments when setting up the make
> environment. That would negate extra size because an admin could build
> with it or without based on personal preferences.

yes, this is exactly what happens.  but you should consider the extra
processing time incurred by scripting:  you need to initialize scripting
once per fetchnews run, and re-initialize some variables for every
article processed.  then comes the processing time per article.  also,
there's quite a bit of malloc/realloc/free going on behind the scenes,
which has costs in both time _and_ space.
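the cost structure described above (set up once per run, reset some state per article, malloc/realloc/free along the way) can be sketched like this. struct script_state and its three functions are hypothetical, and the expensive interpreter start-up is only indicated by a comment.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* hypothetical per-run scripting state */
struct script_state {
    int    initialized;
    size_t articles_seen;   /* kept across articles */
    char  *msgid;           /* per-article copy, replaced every time */
};

/* once per fetchnews run: this is where an interpreter would be started
 * and the user's script loaded (not shown) */
static int script_init(struct script_state *st)
{
    st->initialized = 1;
    st->articles_seen = 0;
    st->msgid = NULL;
    return 0;
}

/* once per article: re-initialize per-article variables.  the realloc
 * stands for the allocation churn that costs both time and space. */
static int script_begin_article(struct script_state *st, const char *msgid)
{
    size_t len = strlen(msgid);
    char *p = realloc(st->msgid, len + 1);

    if (p == NULL)
        return -1;          /* old buffer is still valid in st->msgid */
    memcpy(p, msgid, len + 1);
    st->msgid = p;
    st->articles_seen++;
    return 0;
}

/* once per run, at the end */
static void script_exit(struct script_state *st)
{
    free(st->msgid);
    st->msgid = NULL;
    st->initialized = 0;
}
```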

> From my perspective, preferring perl, all I would ask is that the
> scripting interface be language agnostic.

where did i say it isn't? what i will provide is some function calls
throughout fetchnews with names like "script_gurgle", "script_blarp" and
so on. these functions get to see some internal read-only structures as
their arguments, like the array of header or body lines. in addition,
i'll provide the implementation of these hooks for lua, consisting mostly
of repackaging the internal data into lua-accessible items. there will be
example code in the user part of these hooks, but the user can do
whatever is needed.
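a minimal sketch of what such a hook could look like on the C side, assuming it receives the header lines as a read-only array. the name script_check_headers() and the status codes are made up (the post only promises names like "script_gurgle"), and the body is a plain C policy standing in for the lua glue, which would instead copy the lines into a lua table and call the user's function.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <strings.h>

/* hypothetical status codes returned by a hook */
enum script_status { SCRIPT_ERROR = -1, SCRIPT_OK = 0, SCRIPT_REJECT = 1 };

/* hypothetical hook: sees the header lines read-only.  "const char *const *"
 * means neither the array nor the strings can be modified through it. */
static enum script_status
script_check_headers(const char *const *hdr, size_t nhdr)
{
    size_t newsgroups = 0;

    if (hdr == NULL)
        return SCRIPT_ERROR;

    for (size_t i = 0; i < nhdr; i++)
        /* header field names are case-insensitive */
        if (strncasecmp(hdr[i], "Newsgroups:", 11) == 0)
            newsgroups++;

    /* example policy only: exactly one Newsgroups: header is expected */
    return newsgroups == 1 ? SCRIPT_OK : SCRIPT_REJECT;
}
```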

again: if you want perl, you won't need to change anything in fetchnews,
you would just write up script-perl.[hc] to handle those hooks and add
some includes to script.h. plus, you'd need the autoconf magic for the
"--with-scripting=perl" option.

> What I'm thinking is sort of like this:
>
> 1. the main leafnode2 config file has a way of saying dlopen <blank>.so
> for a scripting language (allows for multiple modules), i.e. :
>
>    load_script_lib = perl /usr/lib/leafnode2/leafnode-perl.so
>    load_script_lib = lua /usr/lib/leafnode2/leafnode-lua.so
>
> 2. in the filters file you have a way to specify the .so file from the
> load_script_lib line and the name of the filter, e.g.
>
>    language = perl
>    name = CheckWithSpamAssassin
>    action = select
>
> 3. When processing filters on a received message, leafnode2 determines
> whether the filter is an old-style filter or an external filter. ...

well, these features will have to be a later milestone.  systems with
usable dlopen(3) implementations won't pose many problems for this type
of configuration, but currently my first priority is to get lua
scripting working.

nevertheless, your notation for the configuration section is well
chosen. i'm not sure if matthias wants all this new-school stuff in the
config file(s). first of all: if we let people use _several different_
scripting engines at the same time, how do we isolate them, how do we
make sure every one of them gets its data intact, and in what order do
we call them? which scripting module gets the final say on the future of
an article? what happens when one of the modules is broken?

to me this is way too complicated.  i want the possibility of having
several scripting engines, but, at least for the first version, only one
can be picked by the admin.

there's another point to ponder: as the configuration section
becomes more and more complicated, we might want to redo it in a
scripting language. lua was invented for exactly this purpose, but has
proven useful far beyond it. today it is used in game software for the
parts of a game which have to change frequently and dynamically.

and don't forget the tasks leafnode/fetchnews must handle:  the NNTP
protocol engine, the storing of articles and overview data, and the
serving of articles to local newsreaders.  i'll be quite content, even
proud, if i can add lua scripting without breaking anything.  without
matthias' suggestions and clarifications i would have made serious
conceptual mistakes already, with only a few lines of code written so
far!

> it's an external filter, send it off to routines in hypothetical
> script.c part of leafnode2 which take care of calling the external
> .so and passing it the entire NNTP article and then reporting back to
> the filter processor the possibly altered article and a true or false
> value indicating whether action happens or not. In my hypothetical
> perl CheckWithSpamAssassin filter, it always returns true but it adds
> various X-Spam-* headers.

yes, i think it's easy to add some headers, although it may be
impossible to add to the body.  USENET articles may get enormous, but
fetchnews' cache allowance isn't, so it will be impossible to attach
a signature to very large articles.  header processing is far more
involved and detailed than body processing, so bodies won't allow
more than inspection:  fetchnews has elaborate machinery for the
headers, partly because overview handling is vital for an NNTP server,
but bodies just get copied along line by line.
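the asymmetry between headers and body can be sketched in C, assuming extra headers are collected in a small growable list while the body is streamed through a read-only inspection callback and copied unchanged. all names (struct hdr_list, hdr_add(), hdr_free(), has_url(), body_copy_inspect()) are hypothetical; this is not fetchnews' actual code.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* hypothetical growable list of headers added by a filter */
struct hdr_list { char **line; size_t n, cap; };

static int hdr_add(struct hdr_list *h, const char *line)
{
    size_t len = strlen(line);
    char *copy;

    if (h->n == h->cap) {                     /* grow geometrically */
        size_t cap = h->cap ? 2 * h->cap : 4;
        char **p = realloc(h->line, cap * sizeof(*p));
        if (p == NULL)
            return -1;
        h->line = p;
        h->cap = cap;
    }
    if ((copy = malloc(len + 1)) == NULL)
        return -1;
    memcpy(copy, line, len + 1);
    h->line[h->n++] = copy;
    return 0;
}

static void hdr_free(struct hdr_list *h)
{
    for (size_t i = 0; i < h->n; i++)
        free(h->line[i]);
    free(h->line);
    h->line = NULL;
    h->n = h->cap = 0;
}

/* example inspector: flags chunks that contain a URL */
static int has_url(const char *line, void *ctx)
{
    (void)ctx;
    return strstr(line, "http://") != NULL || strstr(line, "https://") != NULL;
}

/* body: each chunk is shown read-only, then copied unchanged.  only one
 * buffer is held, so article size does not matter.  fgets() splits lines
 * longer than the buffer into several chunks.
 * returns the number of flagged chunks, or -1 on write error. */
static long body_copy_inspect(FILE *in, FILE *out,
                              int (*inspect)(const char *, void *), void *ctx)
{
    char buf[1024];
    long hits = 0;

    while (fgets(buf, sizeof buf, in) != NULL) {
        if (inspect(buf, ctx))
            hits++;
        if (fputs(buf, out) == EOF)
            return -1;
    }
    return hits;
}
```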

> Of course there are many ways this could be done. Maybe the
> external .so files only need to export the function int
> DoLeafnodeScript(char *lpScriptName, struct _article *pArticle).
>
> Or maybe the command load_script_lib in config only needs one param,
> the name of the .so to load, and not the language. Leafnode2, when
> parsing config, calls char **ListAllFilters(void) in the .so, which
> returns an array of strings which are the symbols in the .so as well
> as what goes in the "name =" line in filters. The filters processor
> sees the name parameter and hands it off to script.c, which takes care
> of calling int CheckWithSpamAssassin(struct _article *) from the
> appropriate .so and again returning the possibly altered article along
> with the pass/fail result.

nice idea.  the protocol/behaviour i'm thinking of is much less dynamic:
it just knows about magic names for the hooks, their arguments and their
return values.  for you this means that you'd have to implement all of
this in lua code (which isn't difficult with lua's metatables), and for
me it means that the task doesn't grow without bounds  8-).

btw, there is no script.c. don't be disappointed. script.h contains
the preprocessor statements for selecting which files have to be
"#include"ed and which stubs/hooks to nullify or expand.
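a minimal sketch of such a script.h, assuming configure turns "--with-scripting=lua" or "--with-scripting=perl" into a WITH_SCRIPTING_LUA or WITH_SCRIPTING_PERL define. those macro names, the headers script-lua.h/script-perl.h and the hook names are all hypothetical. when no engine is configured, the hooks collapse into constant no-op macros, so the call sites in fetchnews stay unchanged; the #error enforces "only one engine at a time".

```c
/* script.h: hypothetical sketch, not leafnode's actual header */
#ifndef SCRIPT_H
#define SCRIPT_H

#if defined(WITH_SCRIPTING_LUA) && defined(WITH_SCRIPTING_PERL)
#  error "only one scripting engine can be selected"
#endif

#if defined(WITH_SCRIPTING_LUA)
#  include "script-lua.h"    /* declares the hooks, implemented in script-lua.c */
#elif defined(WITH_SCRIPTING_PERL)
#  include "script-perl.h"   /* declares the hooks, implemented in script-perl.c */
#else
/* no scripting: every hook becomes a no-op that reports "ok" (0) */
#  define script_init(file)            ((void)(file), 0)
#  define script_check_headers(h, n)   ((void)(h), (void)(n), 0)
#  define script_exit()                ((void)0)
#endif

#endif /* SCRIPT_H */
```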

i get the feeling that you're about to invent a combination of ASN.1 and
XML.  i'll take notes on your suggestions, but i will stick to my
concept as long as nobody finds show-stoppers or no-go areas.

so far i think all your ideas are doable, but my goal is to get
rid of some very bad cases of spam and vandalism in some of the groups
i read, and i need that badly enough to get going with the coding really
soon.

let the future decide in what way to expand.

> Like I said, I don't know lua from a hole in the ground, but I do know
> that by using the embedding magic for perl one can take values in a
> C/C++ program and link them with perl values so that the perl script can
> read and maybe modify variables internal to the program. Thus the
> calling of the external filters could get very complex and possibly
> modify the state of the running fetchnews rather than merely returning a
> pass/fail value on a filter.

oh, that would be far from what i'll implement and what i wrote about in
my previous articles! currently, the filter code is supposed to return a
tuple (or maybe three separate values): a status or score, a table of
newsgroup names (returned even if it doesn't change the original ones),
and an extra header table.  the score is currently not implemented in
fetchnews' store.c, but we need a way of telling whether something went
wrong with the filtering, hence the status.  then come the new newsgroup
names.  if the article passes as ham, they may be the same as specified
in the original article, but they might also change in order to re-file
the article into some "spam." or "biz." or "pr0n." hierarchy.  in these
cases the user might want to see the results in special headers added by
the extension, hence the extra header table.
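in C terms, the three return values described above might map onto something like the following struct. the field names, the fixed array sizes, the X-Spam-Flag header and the "spam.filtered" group are all hypothetical, and filter_decide() is just an example policy showing both outcomes: ham keeps its newsgroups, spam is re-filed and annotated.

```c
#include <stddef.h>
#include <string.h>

#define MAX_GROUPS  8      /* arbitrary limits for the sketch */
#define MAX_XHDRS   4

enum filter_status { FILTER_FAILED = -1, FILTER_OK = 0 };

/* hypothetical C view of what a filter hands back to fetchnews */
struct filter_result {
    enum filter_status status;            /* did the filtering itself work? */
    int         score;                    /* reserved: store.c ignores it */
    const char *newsgroups[MAX_GROUPS];   /* always filled, even if unchanged */
    size_t      n_newsgroups;
    const char *extra_headers[MAX_XHDRS]; /* shown to the user */
    size_t      n_extra_headers;
};

/* example policy: ham keeps its groups, spam is re-filed and annotated */
static void filter_decide(struct filter_result *r,
                          const char *const *groups, size_t ngroups,
                          int is_spam)
{
    memset(r, 0, sizeof(*r));
    r->status = FILTER_OK;

    if (ngroups == 0 || ngroups > MAX_GROUPS) {
        r->status = FILTER_FAILED;        /* tell the caller something broke */
        return;
    }
    if (is_spam) {
        r->score = 1;
        r->newsgroups[0] = "spam.filtered";   /* hypothetical group */
        r->n_newsgroups = 1;
        r->extra_headers[0] = "X-Spam-Flag: YES";
        r->n_extra_headers = 1;
    } else {
        for (size_t i = 0; i < ngroups; i++)
            r->newsgroups[i] = groups[i];     /* same groups as the original */
        r->n_newsgroups = ngroups;
    }
}
```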

regards, clemens

PS: <url:http://www.lua.org> <url:http://www.lua.org/docs.html> !!