[leafnode-list] better hash for message.id/ ?
clemens fischer
ino-news at spotteswoode.dnsalias.org
Tue Nov 4 23:41:39 CET 2008
hi,
I just ran a test (on my freebsd box) to test the "even-ness" of the
distribution of Message-IDs in var/spool/news/message.id/* :
# find /news/message.id/* -type d -print |
xargs -I \% -R 1 -n 1 sh -c "ls % |wc -l" |
sed 1d |
ministat -s -w74
result:
x <stdin>
+--------------------------------------------------------------------------+
| x |
| x x x |
| x x x x |
| x x x x |
| x x x x x |
| x x x x x |
| x xx x x x |
| x x xx x x x |
| x x xx x x x |
| x x xx x x x |
| x x x xx x xx x |
| x x xx xx x x xx x |
| x x xx xx x x xx x |
| x x xx xx x x xx x |
| x xx xx xx xx xx xx xx xx |
| x x xx xx xx xx xx xx xx xx xx |
| x x xx xx xx xx xx xx xx xx xx |
| x x xx xx xx xxx xx xx xx xx xx |
| x x xx xx xx xxx xx xx xx xx xx |
| x x xx xx xx xxx xx xx xx xx xx |
| x x xx xx xx xxx xx xx xx xx xx |
| x x xx xxxxx xxx xx xx xx xx xx |
| x x xx xxxxx xxx xx xx xx xx xx x |
| x x xx xxxxx xxx xx xx xx xx xx x |
| x xxxxxxxxxx xxx xx xx xx xx xx x x |
| x xxxxxxxxxxx xxxxxx xxxxx xx xx x x |
| xx xxxxxxxxxxxxxxxxxx xxxxx xx xx xxx x |
| xx xxxxxxxxxxxxxxxxxx xxxxx xx xx xxx x |
| xx xxxxxxxxxxxxxxxxxx xxxxx xx xxxxxx xx |
| xx xxxxxxxxxxxxxxxxxx xxxxxxxx xxxxxx xx |
| xx xxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxx xxxx |
| xx xxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxx |
| xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx x |
| x x xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xx x x |
| xxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xx x x |
| x xxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx x x |
| x xxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx x xx |
| x xxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx x x xx |
| xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xx x xxx |
|x xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxx xxx x x|
| |___________MA____________| |
+--------------------------------------------------------------------------+
N Min Max Median Avg Stddev
x 999 47 170 97 99.132132 22.308067
The result shows that the hash calculating the location of a particular
message-id might be considered good enough, but propably far from
optimal. The code is this:
unsigned int
msgid_hash(const char *name)
{
int i = 0;
unsigned int r = 0;
int stop = 0;
do {
if (name[i] == '>')
++stop;
r += (int)((unsigned char)name[i]);
r += ++i;
} while (!stop && name[i]);
r = r % 999 + 1;
return r;
}
Freebsd has this simple BSD-licensed function for hashing strings in
/usr/include/sys/hash.h:
/* Convenience */
#ifndef HASHINIT
#define HASHINIT 5381
#define HASHSTEP(x,c) (((x << 5) + x) + (c))
#endif
...
/*
* Return a 32-bit hash of the given string.
*/
static __inline uint32_t
hash32_str(const void *buf, uint32_t hash)
{
const unsigned char *p = buf;
while (*p)
hash = HASHSTEP(hash, *p++);
return hash;
}
There's a simple hash function by Weinberger, see Aho, Sethi, Ullman
book on compilers from 1986 or search for hashpjw. It basically goes
like this:
unsigned long int
string_hash(char *s)
{
register char *p;
register unsigned long int h = 0, g;
ASSERT((s != NULL) && (strlen(s) > 0));
p = s;
while (*p)
{
h = (h<<4) + *p++;
g = h & 0xf0000000;
if (g != 0)
{
h ^= (g>>24);
h ^= g;
}
}
return h;
}
I should note that "ministat" is in freebsd base, but it may not be in
eg. linux distributions.
Nothing much to do today, I'm just waiting for the outcome of the US
elections ...
-c
More information about the leafnode-list
mailing list