[leafnode-list] Re: trying to decide which is better

Thu Nov 20 22:10:35 CET 2008

Eric S. Johansson wrote on 2008-11-20:

> I'm trying to setup an internal NNTP based conference for a client and
> when I originally took on the job, they told me there was a relatively
> small (5000ish) existing set of messages.  Then they said "oh by the
> way" and pointed me to another archive of messages they want
> stored.  I haven't counted them but it's something like eight years
> worth of internal traffic.  I need to reevaluate which nntp
> environment I should use and I'm trying to decide between leaf node
> and WendzelNNTPd (http://www.wendzel.de/?sub=softw&ssub=wendzelnntpd).
> I'm not really a fan of the hairball of complexity known as INN but I
> will use that if I have to.

Hi Eric,

I haven't got time to look at WendzelNNTPd.

If IMAP (rather than NNTP) is an option, have a look at Dovecot and see
if it meets your requirements.

> in addition to a large pool of messages, the customer never wants to
> expire messages.  Turning off expire is easy, but it has some
> serious implications for the message store in the back end especially
> when planning for another five to 10 years of message accumulation
> (10-50/day).

Leafnode uses what INN would call tradspool - files 1 2 3 4 5... in a
group directory. So, that would add some 80,000 files over the course of
the years. Quick file access can be achieved with directory indexing on
modern file systems, such as "dirhash" on UFS or "dir_index" on ext3, and
most tree-structured file systems should also be able to deal with this.
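
For a rough idea of where a figure like 80,000 comes from, here is a
minimal sketch of the arithmetic, using only the 10-50/day and five to
ten year numbers quoted above. The function name is made up for this
example, and the existing archives are left out because their size is
not known.

```python
def projected_articles(per_day, years, existing=0):
    """Rough count of spool files after `years` of steady traffic.

    Ignores leap days and any variation in traffic; `existing` is the
    number of articles already in the group (unknown here, so 0).
    """
    return existing + round(per_day * 365 * years)


if __name__ == "__main__":
    print(projected_articles(10, 5))    # low end of the quoted range
    print(projected_articles(50, 10))   # high end of the quoted range
    print(projected_articles(30, 7.5))  # midpoint of both ranges
```

The midpoint lands close to 80,000; the low and high ends differ from
it by a factor of four or so in one direction and about two in the
other, so any test should cover the high end as well.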

What I cannot answer off-hand is how often leafnode would have to
rebuild the overview files (and thus rescan all articles), or whether
there are non-scalable parts in leafnode. I haven't looked at the threading
code in texpire (that you do not need anyways, if you do not want to
expire), but outside that I'm not aware of leafnode doing really stupid
things in the main code paths. (This isn't saying that texpire does, but
I simply haven't looked at the complexity.)

I'm happy to help with patches addressing non-scalability - I'd say do
some test runs on a test machine and see if it gets slow with "many
messages". If it does, let's profile and fix the slow parts.
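
As a starting point for such a test run, here is a minimal sketch that
fills a directory with numbered files, tradspool style, and times
random lookups. It only exercises the file system, not leafnode itself
(no overview files, no real article headers). The group path, the
function names and the file contents are all made up for this example.

```python
import os
import random
import tempfile
import time


def populate_group(group_dir, count):
    """Create files 1..count in group_dir, one tiny dummy article each."""
    os.makedirs(group_dir, exist_ok=True)
    for n in range(1, count + 1):
        with open(os.path.join(group_dir, str(n)), "w") as fh:
            fh.write(f"Subject: test article {n}\n\nbody\n")


def time_lookups(group_dir, count, samples=1000, seed=0):
    """Return the seconds spent on `samples` random stat() calls."""
    rng = random.Random(seed)
    start = time.perf_counter()
    for _ in range(samples):
        n = rng.randint(1, count)
        os.stat(os.path.join(group_dir, str(n)))
    return time.perf_counter() - start


if __name__ == "__main__":
    # Hypothetical group name; raise the counts to match your archive.
    with tempfile.TemporaryDirectory() as spool:
        for count in (1000, 10000, 80000):
            group = os.path.join(spool, f"local.conference.{count}")
            populate_group(group, count)
            elapsed = time_lookups(group, count)
            print(f"{count:6d} files: {elapsed:.4f}s for 1000 lookups")
```

If the lookup time stays roughly flat as the count grows, the file
system is not the bottleneck, and the next thing to measure would be
leafnode's own handling of a group of that size.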

> I also need to inject messages (converted from a custom-built web
> forum and others, as I mentioned, from an earlier usenet environment)
> and specify the message ID and threading.  Leafnode seems to have a
> problem with me specifying my own message ID

In what context exactly? I'm not quite sure what you mean.
If you need to replace the ID or References, you need to do that outside
leafnode. If you POST, leafnode suggests a Message-ID that the client
can either pick up and use, or the client can generate its own; leafnode
doesn't care beyond some basic syntax and duplicate checks.
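
To illustrate setting the ID and References outside leafnode, here is
a minimal sketch that builds an article with a caller-chosen Message-ID
before it is handed to whatever does the POST. The function name, the
example.invalid domain and the regular expression are assumptions made
for this example; the regex is only a rough shape check, not a copy of
leafnode's actual syntax check.

```python
import re
from email.message import EmailMessage
from email.utils import make_msgid

# Rough "<local@domain>" shape check; an assumption, not leafnode's rule.
MSGID_RE = re.compile(r"^<[^<>@\s]+@[^<>@\s]+>$")


def build_article(group, sender, subject, body,
                  message_id=None, references=()):
    """Return an EmailMessage with our own Message-ID and References."""
    if message_id is None:
        # Fall back to a generated ID if the converter has none.
        message_id = make_msgid(domain="example.invalid")
    if not MSGID_RE.match(message_id):
        raise ValueError(f"malformed Message-ID: {message_id!r}")
    msg = EmailMessage()
    msg["From"] = sender
    msg["Newsgroups"] = group
    msg["Subject"] = subject
    msg["Message-ID"] = message_id
    if references:
        # Threading: parent IDs, oldest first, separated by spaces.
        msg["References"] = " ".join(references)
    msg.set_content(body)
    return msg


if __name__ == "__main__":
    art = build_article(
        "local.conference", "Forum User <user@example.invalid>",
        "Re: converted thread", "Converted from the web forum.\n",
        message_id="<forum-42@example.invalid>",
        references=["<forum-41@example.invalid>"],
    )
    print(art.as_string())
```

Keeping the original IDs stable across the conversion matters because
References in later articles point at them, and because the duplicate
check means a re-run of the import will not inject articles twice.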

Frankly, I haven't used leafnode with really crammed groups of tens of
thousands of articles, and I don't recall the complexities of all those
little algorithms all over the place, but I do recall having used
red-black-trees and quicker sort functions for pre-sorted material (such
as mergesort) in various places.

> my primary question is "have my just-discovered requirements
> eliminated leaf node as a viable platform for internal conferencing?".
> leaf node has been great to work with but I really need to know if
> I've exceeded its operational boundaries.

> if it has, then there's no need to deal with the other problems I've
> found.  :-)

> I really appreciate the feedback.

-- 
Matthias Andree