[leafnode-list] Re: better hash for message.id/ ?

Matthias Andree matthias.andree at gmx.de
Wed Nov 5 01:39:03 CET 2008


clemens fischer schrieb am 2008-11-04:

> hi,
> 
> I just ran a test (on my freebsd box) to test the "even-ness" of the
> distribution of Message-IDs in var/spool/news/message.id/* :

Hi Clemens,

you've confirmed a suspected (or known, to some) fact that the hash
function is rather cheap and not very good at actually shuffling bits.

> The result shows that the hash calculating the location of a particular
> message-id might be considered good enough, but propably far from
> optimal.  The code is this:

Definitely. This is known to be suboptimal, but feel free to change for
leafnode-2. If you want to give it a show while waiting for the next
precinct or state results, change the code, install and run texpire -r
(-r for repair), it is supposed to fix wrong hashes (but best to keep a
backup of /var/spool/news anyways).

If it results in a better distribution (and perhaps if it maintains that
property with r = r % 1000 rather than r = r % 999 - or would a prime be
better here?), send me a patch, I'm open to change this.

I'm not sure though how many places still hardcode the actual layout of
the name (i. e. that there are three digits), so best to go with the
current scheme and play with the hashes.

> There's a simple hash function by Weinberger, see Aho, Sethi, Ullman
> book on compilers from 1986 or search for hashpjw.  It basically goes

What's the license on that, or did you derive the code yourself?

On the other hand, there is already an RC4-based PRNG, I wonder if that
can be recycled for hashing and might even yield better results.

> like this:
> 
> unsigned long int
> string_hash(char *s)
> {
...

-- 
Matthias Andree



More information about the leafnode-list mailing list