$Id: CHANGELOG,v 1.39 2005/03/02 18:29:12 jonz Exp $

Version 3.2.8
-------------

[20050302.0730] jonz: fixed bug in TOE mode with automatic whitelisting

fixed a bug causing the "spam hit" in a false positive's whitelist address to
never be reversed

Version 3.2.7
-------------

[20050218.0800] jonz: signature to write to all text segments

reverted back to signature writing to all text segments of an email, and not
just the top level parts.

[20050128.0245] jonz: fixed segfault on post-signature failure

fixed a segfault that only occurs when dspam_process() fails after loading a
signature

Version 3.2.6
-------------

[20050119.0245] jonz: set default sedation level to 0

set default sedation level to 0; a level must be specified (somewhere) for
statistical sedation to instantiate

Version 3.2.5
-------------

[20050119.0230] jonz: fix for statisticalSedation ignore

statisticalSedation preference was ignored

Version 3.2.4
-------------

[20041229.1925] jonz: fix for broken boundary rfc

added fix for intentionally broken mime boundaries

[20041203.0800] jonz: performance fixes for pgsql_drv

minor performance fixes for pgsql_drv that may have a big effect on some
implementations. you should also consider creating the (unnecessary) index
below to prevent the pgsql query builder from getting confused:

CREATE INDEX id_token_data_04 ON dspam_token_data(uid);

[20041203.0800] jonz: applied patch to fix build fail when CFLAGS defined

fixed a bug causing tools to fail to build when CFLAGS was specified

[20041203.0745] jonz: fixed addition of spurious colons after delimiter

fixed bug causing a colon to be added to lines after -- delimiters that
were not actual boundary delimiters. this also caused certain encoded portions
of the body to not be decoded, giving the appearance of equal signs (=) in
message bodies.

Version 3.2.3
-------------

[20041125.0925] jonz: rewrote boundary extraction in decode.c

rewrote boundary extraction in decode.c to fix a bug where messages could
get mangled if boundary was specified without quotes, but other tags used
quotes
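
a minimal sketch of quote-tolerant boundary extraction, for illustration only;
this is not the code in decode.c. extract_boundary() is a hypothetical name,
and the matching is simplified (case-sensitive, no escaped quotes):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy the boundary parameter of a Content-Type value into out.
 * Handles both boundary="abc" and boundary=abc; an unquoted value ends at
 * ';' or whitespace. Returns 0 on success, -1 if absent or too long.
 * Simplification: matching is case-sensitive and ignores escaped quotes. */
int extract_boundary(const char *content_type, char *out, size_t size)
{
    static const char key[] = "boundary=";
    assert(content_type != NULL && out != NULL && size > 0);

    const char *p = strstr(content_type, key);
    if (p == NULL)
        return -1;
    p += strlen(key);

    size_t len;
    if (*p == '"') {
        p++;                                /* quoted: read to closing quote */
        const char *end = strchr(p, '"');
        if (end == NULL)
            return -1;
        len = (size_t)(end - p);
    } else {
        len = strcspn(p, "; \t\r\n");       /* unquoted: stop at a separator */
    }
    if (len == 0 || len >= size)
        return -1;

    memcpy(out, p, len);
    out[len] = '\0';
    return 0;
}
```

the quoted and unquoted cases are decided per parameter, so quotes used by
other tags on the same line do not affect how the boundary is read.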

[20041123.1830] jonz: fixed multipart blocks with no content-type

fixed bug causing the DSPAM signature to NOT be written to multipart
blocks without a content-type (broken RFC)

[20041123.0800] jonz: fixed bug in _ds_get_spamrecord broken on mysql 4.1

changed token = '' to token in('') in _ds_get_spamrecord, to fix bug in
mysql 4.1 with respect to numeric fields and quoted conditionals. in('') seems
to work without problem.

[20041117.0800] jonz: fixed critical bug in Bayesian Noise Reduction

fixed a critical bug in Bayesian Noise Reduction causing the algorithm to
never instantiate

Version 3.2.2
-------------

[20041114.2245] jonz: fixed optOut preferences option

fixed a bug causing optOut preference to be ignored

[20041112.0800] jonz: fixed source address tracking bug w/TOE

fixed a bug causing source address tracking to fail when TOE used

[20041112.0800] jonz: fixed LocalMX bug

fixed a bug causing LocalMX to be ignored in dspam.conf

[20041110.0800] jonz: set permissions on dspam.conf to 640

permissions on dspam.conf were defaulted to 750, changed to 640

[20041101.0700] jonz: changed loose signature matching

changed loose signature to X-DSPAM-Signature from the ever useless DSPAM: to
allow the use of signature headers in forwarded attachments.

[20041109.0800] jonz: fixed source address tracking

fixed source address tracking by removing an old #ifdef that never got 
defined in 3.2. also changed 'ham' to 'nonspam' in dspam.conf.

[20041109.0800] jonz: adjusted chi-square cutoff

changed chi-square cutoff from 0.5000 to 0.5010 to avoid erroneous
classifications when there is no data
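
a small sketch of why the nudge matters, assuming a message with no data
scores exactly 0.5 and that the result is compared to the cutoff with >=.
classify_is_spam() and both constants are hypothetical names, not libdspam's:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical constants illustrating the old and new cutoffs. */
#define CUTOFF_OLD 0.5000
#define CUTOFF_NEW 0.5010

/* Returns true when the combined probability reaches the cutoff.
 * Assumption: a message with no usable data scores exactly 0.5. */
bool classify_is_spam(double probability, double cutoff)
{
    assert(probability >= 0.0 && probability <= 1.0);
    return probability >= cutoff;
}
```

under these assumptions a neutral 0.5 is flagged with CUTOFF_OLD but passes
as innocent with CUTOFF_NEW.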

[20041108.0800] jonz: fixed multiline token bug

fixed a bug where tokens on a multiline header would be ignored past the first
line
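
one way to keep continuation lines in play is to unfold the header before
tokenizing it. the sketch below is an illustration under that assumption, not
the actual fix; unfold_header() is a hypothetical helper:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Unfold a header in place: each line break followed by a space or tab
 * (RFC 2822 folding) collapses into a single space, so tokens on
 * continuation lines stay part of the same header value. */
void unfold_header(char *header)
{
    assert(header != NULL);
    char *out = header;
    for (const char *in = header; *in != '\0'; ) {
        const char *p = in;
        if (*p == '\r')
            p++;
        if (*p == '\n' && (p[1] == ' ' || p[1] == '\t')) {
            p++;                              /* skip the newline */
            while (*p == ' ' || *p == '\t')
                p++;                          /* skip folding whitespace */
            *out++ = ' ';
            in = p;
        } else {
            *out++ = *in++;                   /* ordinary byte: copy as-is */
        }
    }
    *out = '\0';
}
```

a line break that is not followed by whitespace ends the header and is left
untouched.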

[20041103.0745] jonz: fixed segfault on signature scan

fixed a bug causing segfault during scanning of some messages for a signature

[20041103.0745] jonz: fixed signature encoding bug in sqlite_drv

fixed a bug causing signature inserts to fail in sqlite.

Version 3.2.1
-------------

[20041029.0800] jonz: added needed c/r at end of pgp messages

added needed c/r at end of pgp messages

[20041029.0800] jonz: fixed invalid read of free()'d memory

fixed invalid read of free()'d memory caused when parsing multi-line 
header tokens

[20041029.0800] jonz: fixed pragma bug in sqlite_drv

fixed a bug in sqlite_drv causing pragmas in dspam.conf to be ignored

[20041028.0700] jonz: support for mysql 4.1's ON DUPLICATE KEY

added support for mysql 4.1's ON DUPLICATE KEY functionality, so that compiling
with 4.1+ will perform a single insert query without causing duplicate key
failures

[20041025.0600] jonz: memory leaks in dspam_clean

fixed minor memory leaks in dspam_clean

[20041025.0600] jonz: sqlite fixes

fixes to sqlite driver; started using sqlite_[encode|decode]_binary and fixed
calls to sqlite_finalize causing segfaults.

[20041024.2000] jonz: added patch for parsing signature from body

added a patch to parse leading whitespace from signature keys found in
messages with malformed signature lines
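
the idea can be sketched as follows; skip_leading_whitespace() is a
hypothetical name and the patch itself may be structured differently:

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Return a pointer to the first non-whitespace character of key.
 * The cast to unsigned char keeps isspace() defined for bytes >= 0x80. */
const char *skip_leading_whitespace(const char *key)
{
    assert(key != NULL);
    while (*key != '\0' && isspace((unsigned char)*key))
        key++;
    return key;
}
```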

[20041024.1845] jonz: added patch for pgsql and PQfreemem()

added patch to search for PQfreemem() and use free() as an alternative

[20041024.0650] jonz: fixed bug with mysql_drv and duplicate key entries

fixed a bug caused by performing multiple inserts simultaneously on the
database

[20041023.0923] jonz: fixed memory malpractice in pgsql_drv

fixed some bugs in pgsql_drv with memory mishandling

[20041021.0730] jonz: fixed attachment for PGP signed messages

fixed a bug in the dspam.txt attachment added to PGP signed messages, causing
the attachment to have an invalid boundary delimiter

[20041021.0700] jonz: put --with-delivery-agent back, minor config fixes

put --with-delivery-agent flag back (formerly, would try and just autodetect)
and made some fixes to escape commas.

[20041020.1400] jonz: changed default logdir to dspam_home/log

default logdir has been changed to dspam_home/log; this prevents confusion
around permissions on /var/log

[20041020.1400] jonz: applied patch for man page install

applied patch adding $(DESTDIR) to man page install

[20041020.1200] jonz: applied patch for URL parsing bug

applied a patch fixing an invalid memory read when an email ends with
http://
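
a sketch of the kind of guard involved, assuming the parser read the host
portion without first checking that anything follows the scheme.
find_url_host() is a hypothetical helper, not the patched function:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return the start of the host that follows the first "http://" in text,
 * or NULL when there is no URL or the text ends right after the scheme.
 * Checking for the terminator before reading the host is the step an
 * unguarded parser would skip. */
const char *find_url_host(const char *text)
{
    static const char scheme[] = "http://";
    assert(text != NULL);

    const char *url = strstr(text, scheme);
    if (url == NULL)
        return NULL;

    const char *host = url + strlen(scheme);
    if (*host == '\0')      /* message ends with "http://": nothing to read */
        return NULL;
    return host;
}
```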

Version 3.2.0
-------------

[20041020.0100] jonz: fixed mysql performance bottleneck

fixed mysql performance bottleneck with inserts by using multi-row inserts
instead of hundreds of individual inserts
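
a rough sketch of building one multi-row statement. the two-column layout
(uid, token) is a simplification of dspam_token_data, and
build_multirow_insert() is a hypothetical helper rather than the driver's code:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Write one INSERT statement covering all tokens into buf.
 * Returns the statement length, or -1 if buf is too small. */
int build_multirow_insert(char *buf, size_t size, int uid,
                          const unsigned long long *tokens, size_t count)
{
    assert(buf != NULL && tokens != NULL && count > 0);

    int n = snprintf(buf, size,
                     "INSERT INTO dspam_token_data (uid, token) VALUES ");
    if (n < 0 || (size_t)n >= size)
        return -1;
    size_t used = (size_t)n;

    for (size_t i = 0; i < count; i++) {
        /* Every row after the first is preceded by a comma. */
        n = snprintf(buf + used, size - used, "%s(%d,%llu)",
                     i == 0 ? "" : ",", uid, tokens[i]);
        if (n < 0 || (size_t)n >= size - used)
            return -1;
        used += (size_t)n;
    }
    return (int)used;
}
```

the statement is sent once, so the per-query round trip is paid a single time
instead of once per token.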

[20041019.2315] jonz: changes to mysql 4.1 purge script

changed IN() to left-join query for faster purging
rewrote all fields as 'not null' 

[20041019.2200] jonz: made all rows not null in mysql

made all fields in the mysql scripts 'not null'; this conserves 1-2 bytes per
row and speeds things up just a hair

[20041018.0800] jonz: added qmail/vpopmail instructions

added qmail/vpopmail instructions contributed by John Peacock 
<jpeacock@rowman.com>
 
[20041018.0800] jonz: split up MTA configuration into multiple README files

split up the MTA configuration section of the README into multiple files.

[20041018.0730] jonz: fix for write of .stats files on notrain

.stats files shouldn't be getting written when in notrain mode

[20041018.0700] jonz: memory leak fixes for pgsql

many memory leak fixes for postgresql driver

[20041017.1725] jonz: added mysql4-initialization configure option

added an option to disable mysql client library initialization and
cleanup. this is only really useful if you're using libdspam with a third
party application that requires this (e.g. the application accesses
libmysqlclient itself, and therefore needs to manage startup and shutdown
of the library).

[20041016.1935] jonz: fixed massive number of memory leaks

fixed a massive number of memory leaks in libdspam and the agent and
incorrect memory management practices.

[20041015.1700] jonz: fixed sedation deactivation

fixed a bug causing statistical sedation to _not_ deactivate even when the
training buffer level was set to 0.

[20041015.0800] jonz: fixed dspam_admin segfaults on invalid syntax

fixed bugs in dspam_admin causing a segfault (instead of printing usage
information) when too few arguments for a function were specified

[20041015.0800] jonz: fixed preferences extensions in admin.cgi

fixed bugs in admin.cgi causing server errors when preferences extensions
were used.

[20041014.1130] jonz: bugfix for dspam_dump

applied bugfix for dspam_dump to handle the username correctly when
commandline options are specified.
 
Version 3.2.pr1
---------------

[20041014.0800] jonz: added WITHOUT OIDS to all pgsql tables

turned off OIDS for all pgsql tables, speeding up table access significantly

[20041014.0000] jonz: added DSPAM_BIN to path in CGI

added DSPAM_BIN to the path in configure.pl; some CGIs weren't finding
the DSPAM binaries

[20041013.1800] jonz: added mysql 4.1 objects script, renamed .sql files

did some minor renaming of .sql files. added a mysql 4.1 object script which
uses bigint/unsigned instead of char(20) for tokens. put neural networking
in its own file.

[20041013.0900] jonz: added mysql/postgres purge scripts for TOE and TUM

added purge-pe.sql for mysql/postgres with preferences extensions.
additional logic skips certain purges for TOE and TUM-mode users.

[20041013.0830] jonz: consolidated error messages in language.h

consolidated error messages and other important output in language.h to
centralize most commonly used output, and to make translation easier.

[20041012.2330] jonz: dspam_clean fix for toe training mode

added patch to dspam_clean to skip certain unused token operations for users 
with toe training mode, since their last_hit value is never updated. left token
probability operation, as it will use the date the token was first hit which
is still useful.

[20041012.0300] jonz: removed !DSPAM tag from X-DSPAM-Signature header

when using signatureLocation=headers, the !DSPAM tag is no longer written to
the header, just the signature. backward-compatible.

[20041012.0300] jonz: changed location of dspam_home and dspam.conf

dspam.conf now defaulted to sysconfdir (default: /usr/local/etc)

dspam home now defaulted to prefix/var/dspam (default /usr/local/var/dspam)

can still override dspam home using --with-dspam-home

[20041011.0300] jonz: added --signature= functionality

for commandline signature correction where the admin would prefer to just
specify the signature on the commandline, --signature=[signature] can be
used. only the signature itself should be provided, and not the !DSPAM tag.

[20041009.1830] jonz: bugfixes for inline decoding

fixed a bug which caused a segfault on malformed inline encoding blocks

added better support for inline encoding; now encodes all blocks in a header
and not just the first block.

[20041009.1345] jonz: fix for sqlite permissions

added fix for sqlite permissions to create database as 0660

[20041009.1100] jonz: added storage profile support

Implemented this from my blog:

5. Distributed database configurations. I'd like to add a commandline
   option or environment variable to set a storage "profile". This
   profile would refer to a MySQL or PgSQL server config in dspam.conf.
   For example:
                                                                                
     MySQLServer.Sun420R 10.0.0.5
     MySQLPort.Sun420R   3306
     ...
                                                                                
     MySQLServer.DECAlpha 10.0.0.6
     MySQLPort.DECAlpha   3306
     ...
                                                                                
     Providing --profile=DECAlpha on the commandline would cause DSPAM to
     use that particular storage profile. This is especially useful in
     distributed environments where a user might be mapped to a particular
     server.
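
the attribute naming above can be sketched like this. profile_attribute() is a
hypothetical helper, and falling back to the plain attribute name when no
profile is given is an assumption:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Build the dspam.conf attribute name for a profile, e.g.
 * ("MySQLServer", "DECAlpha") -> "MySQLServer.DECAlpha".
 * Assumption: a NULL or empty profile yields the plain attribute name.
 * Returns 0 on success, -1 if out is too small. */
int profile_attribute(char *out, size_t size,
                      const char *attribute, const char *profile)
{
    assert(out != NULL && attribute != NULL);

    int n;
    if (profile == NULL || profile[0] == '\0')
        n = snprintf(out, size, "%s", attribute);
    else
        n = snprintf(out, size, "%s.%s", attribute, profile);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```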

[20041008.0420] jonz: added new logo

added new logo for cgi

[20041008.0420] jonz: added index to dspam_signature_data for created_on dates

added index on dspam_signature_data and updated purge.sql to use it, which
greatly improves purge speed.

Version 3.2.rc2
---------------

[20041007.2300] jonz: made LARGE_SCALE and DOMAIN_SCALE autodetect in CGI

made filesystem scaling auto-detect based on dspam --version

[20041007.1950] jonz: added preferences verbose output

added verbose debugging of preference attributes and values loaded

[20041007.0400] jonz: added autogen support for freebsd

added support for freebsd to autogen script, so freebsd users can build
from cvs.

[20041006.2245] jonz: merged group cgi bugfix

fixed a bug in the cgi where merged groups would be added to the user totals
when displayed under statistics.

[20041006.0430] jonz: added algorithm and pvalue choice to dspam.conf

added support for selecting the combination algorithm(s) and pvalue
technique to dspam.conf. for third-party agent compatibility, configure
options have remained active, but the agent will override these if it finds
options in dspam.conf.

[20041005.1930] jonz: added debug option to dspam.conf

added DebugOpt option to specify which types of messages to route to debug

[20041005.0400] jonz: syslogging of more error messages

added the syslogging of more types of failures

[20041005.0355] jonz: libdspam debug: all calculations only on verbose

changed libdspam's debugging output so that all calculations are made only
when verbose is active; this will allow users to run with standard debug
enabled without using as many resources

[20041005.0102] jonz: applied pgsql patches for performance/bugfixes

applied pgsql patch submitted by Rustam Aliyev to improve performance by
an estimated 30% and fix some minor issues.

[20041003.0515] jonz: --version to return configure parms only when trusted

--version should print configuration parameters only when running as a
trusted user

[20041002.1730] jonz: added return codes on quarantine failure

the agent now returns a failure code if it was unable to quarantine an
incoming spam. 

[20041002.1700] jonz: added unlearning functionality

added unlearning functionality which can be triggered in one of two ways:

1. using --mode=unlearn on the commandline will unlearn the message passed in;
   useful for some cases of error and such. Use --source=error and set --class
   to the original classification that the message was LEARNED WITH (e.g.
   if it was originally classified as spam, set --class=spam to unlearn it
   as spam).

2. by setting OnFail to unlearn in dspam.conf, DSPAM will unlearn a message
   on delivery or quarantine failure. this will fix problems on some servers 
   where the message is requeued, and then reprocessed.

[20041002.1040] jonz: minor bugfix for inoculation

fixed a minor bug which may have caused message inoculations to be overly
inoculated (5 hits instead of 2).

[20041001.2145] jonz: algorithm cleanup

cleaned up algorithm definitions in configure:

1. --disable-traditional-bayesian is now --disable-graham-bayesian
2. --disable-alternative-bayesian is now --disable-burton-bayesian
3. configure will no longer allow you to enable chi-square without disabling
   both bayesian calculations; this is to avoid rainstorms of false positives
   and accidental configurations by users who don't realize you need to
   disable one to enable the other

Version 3.2.rc1
---------------

[20041001.0035] jonz: signatureLocation "headers" preserves encoding

when signatureLocation is set to "headers", the message is treated as if
signed and the original encoded body is preserved.

[20041001.0030] jonz: a few features removed/changed

a few features have been removed from the agent and/or libdspam to improve
functionality and/or to restore libdspam's function as a text classifier and
not an encoding/decoding engine. some functionality has simply been "moved"
into the agent and out of libdspam. changes are very minor and shouldn't
affect a majority of third party applications or many end-users.

1. dropped "attachment" signatureLocation

signatureLocation = 'attachment' has been officially dropped. I realize one or
two people were using it, but the amount of black magic that had to be
used to maintain this function was just too time consuming, and nobody
liked having paper clips on every message. 

2. dropped DSF_COPYBACK; libdspam no longer permanently decodes anything

copyback feature to copy back decoded message no longer used by DSPAM agent,
not very useful for any other applications. applications looking to decode
should consider either self-actualizing the message (as the dspam agent does)
or using a different approach to decoding.

3. libdspam to treat all messages as "signed"

libdspam now preserves the original message body and transfer-encoding,
treating all messages as signed. the agent is the piece responsible now for
modifying the message and appending signatures.

[20040930.2215] jonz: added parse-to-headers to dspam.conf

setting "ParseToHeaders on" in dspam.conf is now used to parse the To: 
line for extracting a username when forwarding spam/fp's to catchall
domains (see README)

[20040930.1900] jonz: changes to homedir option

1. --enable-homedir-dotfiles is now --enable-homedir
2. all .nodspam and .dspam files are now .optout and .optin, respectively
3. when --enable-homedir is used at configure time, not only are opt-in/out
   files looked for in the user's home directory, but all data files are stored
   in ~/.dspam including the .inoc file previously stored in ~.

NOTE: This option requires dspam to run as setuid root (automatically
      configured) and is incompatible with the DSPAM CGI (can't read mailboxes).
      If you require users to be able to opt themselves in/out and use the CGI,
      use the CGI's opt-in/out preference or configure a small tool to manage
      them from DSPAM Home.
 
[20040930.1845] jonz: added TrackSources attribute

moved source address tracking into dspam.conf

[20040930.1830] jonz: added Opt attribute

Opt can be set to in or out in dspam.conf to specify whether the system is 
opt-in or opt-out.

[20040930.1815] jonz: added TrainPristine attribute

TrainPristine replaces --enable-webmail and is used to put the DSPAM agent in
a training mode where it assumes the original message is provided for 
retraining. This ceases the writing of any signatures, and is ideal for
webmail or imap systems where the original message is preserved on the server
and can be used to retrain.

[20040930.0129] jonz: implemented thread-safe functionality

implemented thread-safe functionality in libdspam and two storage drivers
(mysql_drv and pgsql_drv). each thread will require its own context, however
if you check out the libdspam man page, you'll see it's possible to set
up multiple contexts with the same database handle. 

[20040929.0710] jonz: added libdspam man page

added libdspam man page, API reference. symlinked to:
dspam_init, dspam_create, dspam_attach, dspam_addattribute, dspam_process,
dspam_getsource, dspam_destroy

[20040929.0700] jonz: fixed bugs in preferences extension/dspam.cgi

applied fixes submitted by Marty Pauley to fix bugs with dspam.cgi's handling
of preferences extension calls

[20040928.1900] jonz: added new API functions

added new API functions to support the libdspam attribute API. dspam_init has
remained in the code for backward compatibility with other applications,
however a new set of create/attach functions have been added with the
attribute API so that storage attributes (such as server information) can be
set prior to connecting to storage. see the updated example.c's example 4 for
a more thorough explanation.

to retain backward-compatibility, contexts instantiated using dspam_init
will revert to their legacy behavior of looking for a [driver].data file in
the dspam home. dspam_init() has been slightly tweaked, however, to require
the path to the dspam home as an argument. DSPAM_HOME can be passed in by
legacy applications that already have it defined.

This functionality also allows multithreaded applications (which must have
a separate context) to share a single database handle.

[20040928.0501] jonz: added IgnoreHeader option to dspam.conf

added IgnoreHeader option to dspam.conf, allowing specific headers from other
virus tools/spam filters on the network to be ignored.

[20040928.0210] jonz: made dspam.conf operational

dspam.conf now operational; will copy at install time into prefix/etc if a
copy does not already exist. 

NOTE: See the UPGRADING file for a full explanation of changes and be sure
to read it before attempting an upgrade

[20040928.0200] jonz: moved show/hide factors as a preference

showFactors is now a preference (and suitable as an option for each user or
globally); set to "on" to enable factors in the message headers. added to cgi. 

[20040925.0817] jonz: fixed empty spamSubject bug

fixed bug causing subject of spam to be truncated if spamSubject left blank

[20040923.0300] jonz: rating sort default quarantine view

made "rating sort" the default quarantine view in cgi. if a delete all fails,
will revert to a chronological sort so users can see the last spam to be
quarantined.

Version 3.2.beta-1
------------------

[20040922.0500] jonz: fixed toe/zero signature bug

fixed a bug created in 3.1.2 causing toe-mode training to write zeros for
signatures (causing toe users to cease all learning)

[20040922.0400] jonz: fixed mysql buffer overrun bug

fixed a bug in the mysql driver where escaping a large number of
characters caused an unexploitable overrun.

Version 3.1.2
-------------

[20040906.0939] jonz: implemented sparse binary polynomial hashing

implemented Bill Yerazunis' sparse binary polynomial hashing (tokenizer
method only). use --feature=sbph to generate an SBPH-based token set.

[20040901.0800] jonz: bugfix for --debug

fixed --debug so it doesn't get passed along to MTA

[20040830.0800] jonz: pgsql fixes

Applied Rustam Aliyev's patch to fix the following issues with pgsql_drv:

- Added support for Preferences Extensions.
- BUGFIX: 'length' field's type changed from 'smallint' to 'int';
  'smallint' is not enough for big signatures.
- All values passed to columns with 'smallint' type now are quoted. 
  This will enable casting and make indexes on these columns available. 
- Added new index on dspam_token_data (token) which helps speed up 
  some operations. 
- Number of fixes to keep memory cleaner.

[20040826.2100] jonz: fixed bugs in classify and inoculation

fixed a bug where noise reduction and chained tokens weren't applied to
user classification and message inoculation

[20040826.0600] jonz: tweaked mysql where clauses

tweaked mysql where clauses for better indexing

[20040825.0225] jonz: added --disable-factors option

added option to disable factors in message headers

Version 3.1.1
-------------

[20040819.0800] jonz: minor CGI template changes

minor changes to CGI templates
 
[20040819.0800] jonz: added X-DSPAM-Factors

added determining factors header to emails containing a list of tokens that
played a role in the decision. if multiple algorithms are defined, only one
is used. if the message is spam, the factor set from an algorithm returning
a spam result will be used. 

[20040818.1900] jonz: cast smallints in postgres

cast all smallints in postgres, so indexes should be used now (major
performance increase)

[20040818.1845] jonz: fixed memory leaks

fixed some miscellaneous memory leaks

[20040818.1845] jonz: added optIn / optOut preference

added optIn and optOut preference support; whichever one is used depends on
whether dspam is configured for opt-in or opt-out.

[20040811.0900] jonz: fixed totals bug with merged groups

fixed a small bug preventing totals from traveling < 0 when using merged
groups

Version 3.1.0
-------------

[20040724.2000] jonz: fixes to Bayesian Noise Reduction

made fixes to Bayesian Noise Reduction to fix bugs related to 3.1 Beta

[20040723.0630] jonz: added --deliver=summary option

added 'summary' delivery option which will deliver (to stdout) a summary
identical to the output of message classification:

X-DSPAM-Result: user; result="Innocent"; probability=0.0023; confidence=1.00

Obviously, should not be used with --stdout
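
for tools that need to produce or compare against this line, the layout can be
sketched as below. format_summary() is a hypothetical helper; the field
precision is inferred only from the single example above:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Format a summary line in the layout shown in the changelog entry.
 * Assumption: four decimals for probability, two for confidence.
 * Returns 0 on success, -1 if buf is too small. */
int format_summary(char *buf, size_t size, const char *user,
                   const char *result, double probability, double confidence)
{
    assert(buf != NULL && user != NULL && result != NULL);
    int n = snprintf(buf, size,
                     "X-DSPAM-Result: %s; result=\"%s\"; probability=%.4f; "
                     "confidence=%.2f",
                     user, result, probability, confidence);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```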

Version 3.1.0-beta-2
--------------------

[20040721.0800] jonz: added single spam hit purge to purge.sql

added purge of tokens with single spam hit to purge.sql. adjusted purge times

[20040721.0800] jonz: place signature before /HTML tags

in spams without a </body> tag, signatures are now placed before /html tags
in order to ensure they are passed on with some email clients (such as
outlook). this might explain users who receive the same spam over and over.

[20040720.2130] jonz: rewrites to bayesian noise reduction

rewrite of BNR algorithm with minor tweaks, code cleanup

[20040720.0800] jonz: applied patches to CGI

applied patches to CGI submitted by Craig Hockenberry to add configure.pl
functionality for configuring the CGIs.

[20040716.0800] jonz: removed 2500 message threshold for TOE

TOE-mode training now kicks in immediately after 100 learned innocent
messages, rather than waiting for 2500 messages. as a result, more initial 
errors are likely to occur (just as with any other filter implementing TOE) 
but final accuracy should be better.

[20040711.2300] jonz: fixed field names in dspam_2sql

updated field names in dspam_2sql to reflect present-day database field names.

Version 3.1.0.beta.1.1
----------------------

[20040711.2200] jonz: fixed --disable-trusted-user-security compile errors

fixed compiler errors when users disabled trusted user security

[20040711.2200] jonz: removed debug output

removed a line of debug output causing problems with implementations using
stdout 

Version 3.1.0.beta.1
--------------------

[20040709.0700] jonz: fixed bug with subject encoding and spam tags

fixed a bug where spam tags would not be added to encoded subjects

[20040709.0700] jonz: added --debug commandline argument

if --enable-debug is specified, --debug can be passed on the commandline to
activate debugging. alternatively, dropping a .debug file in DSPAM_HOME or
user.debug file in the user's DSPAM_HOME data directory will still work.

[20040709.0700] jonz: fixed error bug with snprintf

fixed a bug in error reporting where not using vsnprintf as required
caused crashing on some systems.
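
a sketch of the pattern, assuming the problem was a variadic error function
that did not forward its arguments through the v-variant. format_error() is a
hypothetical name, not the function in the source tree:

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Format an error message into buf. Because the arguments arrive as a
 * va_list, they must be forwarded with vsnprintf(); passing the va_list
 * to snprintf() is undefined behavior and may crash.
 * Returns the length the full message needs, as vsnprintf() does. */
int format_error(char *buf, size_t size, const char *fmt, ...)
{
    assert(buf != NULL && fmt != NULL);

    va_list ap;
    va_start(ap, fmt);
    int n = vsnprintf(buf, size, fmt, ap);
    va_end(ap);
    return n;
}
```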

[20040709.0700] jonz: added header support for automatic whitelisting

instead of X-DSPAM-Probability: -2 to identify automatic whitelisted emails,
a header of X-DSPAM-Result: Whitelisted will be used, and the original 
probability (even if guilty) will be provided in each message.

[20040707.0730] jonz: added dynamic noise reduction extension support

added support for dynamic noise reduction extension; designed to track SNR
in emails for each user to dynamically determine noise thresholds and perform
calibration. extensions supported in libdspam, but is still experimental and
only used for tracking noise margins at the moment.

[20040707.0700] jonz: added whitelistThreshold preference

the whitelistThreshold preference will set the threshold for innocent hits
before automatically whitelisting a recipient. the default value is 10. do
not set this value too low!

[20040707.0500] jonz: added NOTRAIN preference for trainingMode

added NOTRAIN preference for trainingMode, which will result in messages being
processed but not trained.

[20040707.0408] jonz: signature location now a preference

signature location (headers, message, attachment) now moved to 
signatureLocation preference and added to CGI. configure-time arguments
will set a default preference if user hasn't overridden.

[20040706.2000] jonz: applied win32 patches

applied patch portion of win32 build supplement; win32/README updated. 
visual c++ project updated. initial testing shows all systems go =) 

[20040706.0800] jonz: added ignoreGroups preference

ignoreGroups, when set to 'on', will ignore any group memberships the user
should belong to (including system-wide). useful to allow some users to remove
themselves from any memberships. 

[20040705.2000] jonz: utilities to require trusted user permissions

utilities modified to require the caller be a trusted user. this is normally
done with groups, but as an extra security measure is also done with trusted
users.

[20040705.2000] jonz: rewrote preferences, added preferences extension support

preference functions entirely rewritten. added preferences extension support to
dspam, added first extension to mysql_drv, and added preference administration 
to dspam_admin.

[20040705.0800] jonz: added sort option to cgi quarantine 

added ability to sort by rating or date to cgi's quarantine

[20040630.0800] jonz: added preliminary win32 support files

added Vadim Zeitlin's preliminary win32 files into win32/ directory 

[20040630.0800] jonz: added transactional blocks to postgres driver

applied rustam's patch to add transactional blocks to pgsql_drv for
performance increase

[20040629.1945] jonz: untrusted user error to report username

untrusted user error (specifying --user) should report active username

[20040629.0800] jonz: fixed domain scale in dspam.cgi

domain scale pathname was missing /data/ in dspam.cgi

[20040629.0800] jonz: fixed segfault on empty body

fixed a bug causing libdspam to segfault with some email having an empty body.

[20040628.1945] jonz: added removal option for merged groups

added removal option for merged groups, by specifying -username, username is
removed from the group. This is useful if you want system-wide merged groups
but have a few users who want to unsubscribe

[20040628.0700] jonz: fixed bug in spam-subject

fixed a bug in spam-subject causing:
  1. the last character of the subject to be truncated
  2. spam tags to be repeated for each local recipient

[20040627.1330] jonz: added sql-formatted output support to dspam_dump

added support for sql-formatted output in dspam_dump using the -d [driver]
command. only driver supported is sqlite_drv. use dspam_2sql for all other
drivers (dspam_dump dumps one user at a time, so is only useful for sqlite
at the moment).

[20040625.2300] jonz: rewrote locking in bdb drivers

rewrote locking in bdb drivers to use fcntl locking instead of db env
locking. kernel-level locking works over nfs and automatically removes
stale locks if a process should crash or the system fail.

[20040625.2200] jonz: fixed a locking bug with fcntllocking/quarantine

fixed a quarantine locking bug where fcntl locking was not waiting for a lock,
but returning a failure immediately if already locked
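
the difference between waiting and failing immediately comes down to the
fcntl command used. a minimal POSIX sketch follows; lock_file() and
unlock_file() are hypothetical helpers, not the functions in the drivers:

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Take an exclusive whole-file lock on fd.
 * wait != 0 uses F_SETLKW, which blocks until the lock is free;
 * wait == 0 uses F_SETLK, which fails at once if another process holds it.
 * Returns 0 on success, -1 on error (errno set by fcntl). */
int lock_file(int fd, int wait)
{
    assert(fd >= 0);
    struct flock fl = {0};
    fl.l_type = F_WRLCK;
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;               /* 0 means "to end of file" */
    return fcntl(fd, wait ? F_SETLKW : F_SETLK, &fl);
}

/* Release the lock; the kernel also drops it when the process exits. */
int unlock_file(int fd)
{
    assert(fd >= 0);
    struct flock fl = {0};
    fl.l_type = F_UNLCK;
    fl.l_whence = SEEK_SET;
    return fcntl(fd, F_SETLK, &fl);
}
```

because the kernel owns the lock, a crashed process cannot leave a stale lock
behind, which is the property the bdb driver entry above relies on.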

[20040625.0800] jonz: added configure arguments to --version output

added a list of arguments DSPAM was configured with to --version output

[20040624.0425] jonz: applied CGI facelift

applied CGI facelift submitted by Craig Hockenberry <craig@iconfactory.com>

[20040623.0700] jonz: bugfix for encoded multiline header mangling

fixed a bug that caused encoded, multiline headers to lose any lines of text
after the first.

[20040621.2135] jonz: made sqlite_drv default storage driver

made sqlite_drv the default storage driver 

[20040621.2135] jonz: added SQLite storage driver
                                                                                
added SQLite storage driver. see tools.sqlite_drv/README for more information

[20040621.0245] jonz: committed minor patch for Solaris builds

another patch for declaring u_int32_t's on Solaris

[20040617.0220] jonz: fixed configure help text for --enable-webmail

fixed configure help text for --enable-webmail, which was mangled

[20040617.0211] jonz: fixed typo in admin.cgi for $CONFIG{'LARGE_SCALE'}

fixed a typo in admin.cgi where $CONFIG{'LARGE_SCLAE'} = 0;

Version 3.0.0
-------------

[20040614.0700] jonz: fixed 14-day user graphs

fixed a bug causing the 14-day user graphs to appear empty

[20040612.0018] jonz: oracle storage driver fixes

made several bugfixes to oracle storage driver
added --with-oracle-version[=10] configure flag for linking to 10g libraries

[20040609.0205] jonz: fixed a bug in --enable-signature-attachments

fixed two bugs when using --enable-signature-attachments: 1 compiler error
and 1 segfault (uninitialized value)

[20040608.0715] jonz: fixed compile bug with --enable-webmail

fixed compile errors resulting from --enable-webmail

[20040607.1800] jonz: replaced quarantine locking with fcntl locking

replaced quarantine .lock'ing with fcntl locking and also applied it to
locking .log files. fcntl should work over NFS.

[20040607.0730] jonz: fixed rare segfault (strlen on NULL)

fixed a rare segfault in decode.c

[20040607.0730] jonz: minor aesthetic changes to cgi

minor aesthetic changes to cgi

[20040606.1445] jonz: added training left option to dspam_stats -H

modified dspam_stats to display # of training messages left when using -H 
command

[20040606.1441] jonz: fixed bug in training threshold

fixed a bug in the training threshold, which miscalculated the mail left to
train.

[20040605.1521] jonz: added statistical sedation to cgi

added level of sensitivity-during-training to cgi preferences

[20040605.1450] jonz: added ability to edit user preferences from admin suite

added the ability to edit user preferences (and the default preferences)
from the admin suite.

[20040605.1100] jonz: fixed a bug with user processing flag

fixed a bug where some parameters may be added as users instead of parameters.
this was particularly the case if no mailer flags prepended %u.

[20040604.0525] jonz: fixed blank dspam signature on reclassification

fixed a problem where reclassified messages would receive:

X-DSPAM-Signature: !DSPAM!

fixed this by NOT stripping the old X-DSPAM-Signature header, since a new one
is not created upon reclassification

[20040604.0525] jonz: fixed untrusted.mailer_args

fixed a bug where the last argument of untrusted.mailer_args was ignored.

Version 3.0.0.rc2
-----------------

[20040603.2215] jonz: added user-logging option

added --disable-user-logging option to disable user logging

[20040603.0500] jonz: auto-whitelisting now works with toe-mode training

added code to cause automatic whitelisting to function with toe-mode training

[20040602.0030] jonz: added administration suite cgi

added administration suite cgi

[20040602.0030] jonz: added system logging of execution time

added system logging of execution time

[20040602.0025] jonz: fixed spam subject

fixed spam subject headings to support variable length titles

[20040601.2230] jonz: added system logging

added system logging to DSPAM_HOME/system.log for future sysadmin interface

[20040601.1822] jonz: removed mysql delay_key_write

removed mysql's delay_key_write feature from the sql scripts, because of a
bug in mysql that leads to database corruption when using it.

[20040601.0330] jonz: added To: header parsing

added --enable-parse-to-header, which will parse spam-username and fp-username
from the To: header of a message to determine the username. This can be
used in lieu of using spam/fp aliases by creating a wildcard subdomain
(such as spam.yourdomain.com) and piping all email into dspam without a
--user flag, for example:

wildcard: "|/usr/local/bin/dspam --mode=toe --class=spam --source=error"
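
A minimal sketch of the kind of parsing described, assuming the recipient
takes the form spam-USERNAME@... or fp-USERNAME@...; the exact address
format, the function name and the returned labels are assumptions, not
taken from the DSPAM source.

```python
from email.utils import parseaddr

def parse_to_header(to_value):
    """Return (classification, username), or None for an ordinary address."""
    _, addr = parseaddr(to_value)
    local = addr.split("@", 1)[0]
    # Assumed prefixes: "spam-" reports a missed spam, "fp-" a false positive.
    for prefix, classification in (("spam-", "spam"), ("fp-", "innocent")):
        if local.startswith(prefix) and len(local) > len(prefix):
            return classification, local[len(prefix):]
    return None
```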

[20040531.2245] jonz: added pkgconfig files

added installation of pkgconfig files submitted by Ronald Hummelink
<ronald@hummelink.xs4all.nl>

[20040531.2120] jonz: added --enable-broken-return-codes

added --enable-broken-return-codes configure option which causes DSPAM to 
return an exit code of 99 if the message being processed is believed to be
spam, 0 if not, and any other code to suggest an error has occurred. this is
useful for some MTAs such as qmail.
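
A caller could interpret these exit codes along the following lines. The
function name and the returned labels are hypothetical; only the code
values 99 and 0 come from the entry above.

```python
def interpret_exit_code(code):
    """Map an exit status to a verdict under --enable-broken-return-codes."""
    if code == 99:
        return "spam"
    if code == 0:
        return "innocent"
    # Any other status suggests that an error occurred.
    return "error"
```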

[20040531.2100] jonz: fixed error.h overwrite bug

fixed a bug where libc's error.h would be overwritten if --prefix=/usr. DSPAM
headers are now written to includedir/dspam.

[20040531.1915] jonz: added man pages

added man pages to distribution 

[20040531.0830] jonz: fixed header signature stripping

signatures no longer stripped if --enable-signature-headers is used; to allow
for re-re-training

[20040531.0830] jonz: fixed cgi graphs falling below zero

minor fix to cgi graphs preventing data points from falling below zero

Version 3.0.0.rc1
-----------------

[20040528.0100] jonz: added logging support

added support for message logging (enabled by default). logs all classification 
calls to $DSPAM_HOME/data/user/user.log. disable with --disable-logging.

[20040527.2200] jonz: added new CGI

added new CGI

[20040527.0730] jonz: added support for profiling

added support for profiling using gmon output. this allows developers to use
profiling tools such as gprof to analyze the performance of the software.

[20040527.0730] jonz: applied patch submitted by Mark Femal

applied a patch submitted by Mark Femal <mark@beantree.com> which:
1. Includes select *.h files and incorporates them into the installation
2. Fixes some issues in compiling with Sun's Pro C compiler
3. Makes some minor changes to header files to avoid conflicts

Version 3.0.0.beta.3.1
----------------------

[20040525.0830] jonz: fixed compiler error on verbose debug

fixed compiler errors when verbose debug was enabled

Version 3.0.0.beta.3
--------------------

[20040524.2024] jonz: bugfix for null bodies

applied bugfix for a segfault that occurred when the message body of some parts
was null. rare occurrence.

[20040524.1903] jonz: implemented Robinson's technique for combining p-values

added support for using Robinson's technique for combining p-values, as
described at http://www.linuxjournal.com/article.php?sid=6467. This technique
is presently used for chi-square calculations, but using 
--enable-robinson-pvalues will use this technique for *all* calculations in 
place of Graham's approach. Appears to provide slightly better results
(on the order of 1 message per thousand).

[20040524.0529] jonz: implemented *real* chi-square

implemented Fisher-Robinson's Inverse Chi-Square algorithm...the real stuff.
use --enable-chi-square to use.
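
For reference, the Fisher-Robinson combination as described in Robinson's
Linux Journal article can be sketched like this. It is not DSPAM's
implementation; the function names are hypothetical and every token
probability is assumed to lie strictly between 0 and 1.

```python
import math

def chi2q(x2, dof):
    """Chance that a chi-square variable with even `dof` exceeds x2."""
    assert dof % 2 == 0
    m = x2 / 2.0
    term = total = math.exp(-m)
    for i in range(1, dof // 2):
        term *= m / i
        total += term
    return min(total, 1.0)

def combine(probs):
    """Combine per-token spam probabilities into one indicator in [0, 1]."""
    n = len(probs)
    ln_spam = sum(math.log(p) for p in probs)
    ln_ham = sum(math.log(1.0 - p) for p in probs)
    h = chi2q(-2.0 * ln_spam, 2 * n)  # near 0 when the tokens look innocent
    s = chi2q(-2.0 * ln_ham, 2 * n)   # near 0 when the tokens look spammy
    return (1.0 + h - s) / 2.0        # near 1 = spam, near 0 = innocent
```

Conflicting or neutral evidence lands near 0.5, which is what lets a filter
report an "unsure" result rather than forcing a decision.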

[20040522.2350] jonz: renamed chi-square to robinson's naive bayesian

renamed chi-square because it really isn't chi-square, but robinson's first
algorithm for naive bayesian combination. use --enable-robinson to use.

[20040520.0800] jonz: bugfix for attachments

fixed a bug that caused message headers in attachment sections to be ignored

Version 3.0.0.beta.2.1
----------------------

[20040518.0630] jonz: bugfix: seg faults on rare occasions

fixed a strlen(NULL) bug that caused an occasional segfault

[20040514.1130] jonz: applied dspam_genaliases patch

applied dspam_genaliases patch supplied by Scott Moorhouse 
<smoorhouse@ae-solutions.com> which adds the following functionality:

--exclude NAME     Do not generate an alias for username / usernames.
--excludeuid NUM   Do not generate an alias for UID / UIDS.
--minuid NUM       Minimum UID for which to generate an alias.
--maxuid NUM       Maximum UID for which to generate an alias.

It also uses setpwent/getpwent to get passwd information instead
of /etc/passwd. This allows the tool to be used with any default system
authentication.
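
The filtering these options describe might look roughly like the sketch
below. It is written in Python purely for illustration (the tool itself is
not Python); the function and parameter names are hypothetical.

```python
import pwd

def select_alias_users(entries, exclude=(), exclude_uids=(),
                       min_uid=None, max_uid=None):
    """Filter (name, uid) pairs the way the new options are described."""
    selected = []
    for name, uid in entries:
        if name in exclude or uid in exclude_uids:
            continue
        if min_uid is not None and uid < min_uid:
            continue
        if max_uid is not None and uid > max_uid:
            continue
        selected.append(name)
    return selected

def system_entries():
    """Enumerate accounts via the passwd database API, not /etc/passwd."""
    return [(e.pw_name, e.pw_uid) for e in pwd.getpwall()]
```

For example, select_alias_users(system_entries(), min_uid=1000) would list
only accounts with a UID of 1000 or above, whatever the backing
authentication source is.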

[20040514.0830] jonz: modified mode=notrain to ignore signature

when setting mode=notrain, the signature is NOT stored, and not appended to
an email.

Version 3.0.0.beta.2
--------------------

[20040513.1845] jonz: updated configure.ac

updated configure.ac to work with newer versions of autoconf (with warnings)

[20040513.0157] jonz: segfault patch for sql drivers

applied patch to prevent segfaults in mysql and pgsql drivers under certain
conditions

[20040512.0830] jonz: user directories moved to $DSPAM_HOME/data

user directories have been moved to $DSPAM_HOME/data. it will be necessary to
move all user directories into this folder when upgrading

[20040512.0830] jonz: default $DSPAM_HOME changed

default dspam home has been changed from /etc/mail/dspam to /var/dspam. use
--with-dspam-home to change this.

[20040512.0830] jonz: patch for sql drivers

applied patch for mysql and pgsql drivers to prevent errors in sql due to 
lack of commas

Version 3.0.0.beta.1.2
----------------------

[20040504.1835] jonz: bugfix for signed message signature

corrected a bug where the boundary for a signed message would be missing
a carriage return.

[20040504.0548] jonz: bugfix for token storage bug

fixed a token storage bug, where some tokens would not be stored if they
were preceded by a token that was found in the database

[20040503.0830] jonz: bugfix for corpus spam delivery

fixed a bug where corpusfed messages would be delivered if a quarantine agent
was specified at configure time.

[20040501.1052] jonz: added spam-subject feature

added a spam-subject feature which can be activated with --enable-spam-subject.
when enabled, DSPAM will prepend [SPAM] to the subject headers of all messages
suspected to be spam.

Version 3.0.0.beta.1.1
----------------------

[20040501.0630] jonz: fixed critical problems with pgsql_drv driver

fixed a critical problem with the postgres storage driver to correct sql errors
in processing

Version 3.0.0.beta.1
--------------------

[20040430.0800] jonz: fix for sql driver subtractions

implemented GREATEST(0, [Argument] ) functions for subtractions, which fixes a
problem in which error corrections are not made to tokens where there are
zero hits for the classification being subtracted from.  should also
definitively prevent negative values in hit totals.
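
The effect of wrapping a subtraction in GREATEST(0, ...) is a clamp at
zero. A sketch of the same arithmetic, with a hypothetical function name:

```python
def subtract_hits(current_hits, amount=1):
    """Subtract from a hit counter without letting it fall below zero."""
    return max(0, current_hits - amount)
```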

[20040430.0800] jonz: bugfix: corpus feeding invoked test-conditional training

fixed a bug where corpus feeding would invoke test-conditional training.
 
[20040430.0800] jonz: test-conditional training to subtract only once

test-conditional training modified to subtract from misclassified corpus only
once, and corpus feed for all other iterations

[20040430.0800] jonz: fixed bug in sql-drivers/test-conditional training

fixed a bug in the sql drivers where test-conditional training would make
exponential changes instead of incremental.  this was due to not resetting
the control token on every call to _ds_getall_spamrecords.
 
[20040430.0745] jonz: fixed bug in web stats

fixed bug where merged group web stats wouldn't get written

[20040430.0730] jonz: fixed bug in TOE totals

fixed a bug where spam/innocent classified wasn't updated when TOE was used

[20040427.0433] jonz: fixed bug in mysql and pgsql drivers

fixed a bug in mysql and pgsql drivers where dspam_merge was functioning
incorrectly, due to the token count on record insertion being set to 1 or 0,
and not the actual token value.

[20040427.0155] jonz: merged groups shouldn't merge with themselves

corrected a situation where the actual user in a merged group could be merged
with themselves, if they were the target user.

[20040427.0119] jonz: applied bdb patch for solaris

applied a patch for building on Solaris 9 with BDB drivers

[20040425.0757] jonz: updated pgsql drivers

applied pgsql_drv storage driver updates submitted by Rustam Aliyev

Version 3.0.0.alpha.6
---------------------

[20040424.2235] jonz: fixed header tokenization

fixed header tokenization from previous alpha; was suddenly leaving out
heading from token names.

[20040424.1427] jonz: added merged groups

merged groups are similar to global groups, only instead of the global user
being used in lieu of per-user statistics, the global user in a merged group
is merged with the user's own training data.  this allows immediate correction
to take place and no training loop.

NOTE: merged groups are storage driver dependent.  presently they have only
been implemented for the mysql driver.

[20040422.1900] jonz: messages with empty bodies should still be processed

fixed bug where messages with empty bodies failed into delivery 

[20040422.1829] jonz: added encoding strip patch

added patch to fix the stripping of the content-transfer-encoding

[20040421.1809] jonz: added training mode 'notrain'

added training mode 'notrain' which will process the message, but not train any
user data; this is ideal for implementations where a global dictionary is
used, but the administrator doesn't want to accumulate training data for each
user.

[20040421.0310] jonz: fixed TOE-mode totals updating

fixed bug where TOE-mode would update totals when it shouldn't

Version 3.0.0.alpha.5
---------------------

[20040421.0100] jonz: fixed totaling problems with classification groups

fixed totaling problems with global users and classification groups, where
spams wouldn't get counted, and some innocents

[20040421.0100] jonz: fix for dspam_stats

fix for dspam_stats, identifying individual users

[20040420.0734] jonz: fix for builds on Solaris w/BDB

fixed compiler error when building on Solaris w/BDB drivers

[20040419.0758] jonz: fix for X-DSPAM-Result header problem with TOE

TOE resulted in the X-DSPAM-Result being sent to stdout, which broke all
implementations of TOE where --stdout was used.  bug fixed.

[20040419.0700] jonz: added support for multipart/encrypted messages

added the same support for multipart/encrypted messages as is provided
for multipart/signed

[20040418.1840] jonz: changes to pgsql objects

changes to pgsql objects to fix performance issues

[20040417.1105] jonz: more global user tweaks

if the global user thinks the message was innocent, but the user thinks it was
spam, retrain the message as a false positive into the user's dictionary
automatically, but don't update FP totals (internal function)

[20040417.1050] jonz: implemented totals checking

implemented totals checking to ensure no totals travel below 0

[20040417.1045] jonz: don't retrain some classification catches

patch added not to retrain some spams in a global user catch if the user's
own dictionary already learned it as spam

[20040417.1037] jonz: patch for non-user creation

patch made to sql-based drivers to avoid creating virtual users in cases where
a message isn't being directly processed (e.g. tools, error correction, etc.)

[20040417.2006] jonz: added human-readable patch to dspam_stats

added patch for human-readable format to dspam_stats, submitted by Alan
Shields

Version 3.0.0.alpha.4
---------------------

[20040416.0000] jonz: fix for global users to prevent FPs

applied bugfix for global users code where false positives were getting
generated because the user's dictionary wasn't completely ignored.  

[20040416.0000] jonz: applied dspam_corpus division by zero patch

applied div by zero patch for dspam_corpus submitted by Nick Burnett

[20040415.0010] jonz: added end-of-token truncated symbols

added support for end-of-token symbols, such as exclamation point.  slight
boost in accuracy in testing.

[20040414.0052] jonz: added abbreviated feature references

the first two letters of a feature can be used instead of the
whole feature name; for example --feature=ch,no,wh
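
A sketch of how two-letter abbreviations could be expanded. The feature
list here (chained, noise, whitelist) is an assumption inferred from the
ch,no,wh example, and expand_features is a hypothetical name.

```python
KNOWN_FEATURES = ("chained", "noise", "whitelist")  # assumed feature names

def expand_features(spec):
    """Expand a comma-separated feature list, accepting 2-letter forms."""
    by_abbrev = {name[:2]: name for name in KNOWN_FEATURES}
    expanded = []
    for item in spec.split(","):
        item = item.strip()
        if item in KNOWN_FEATURES:
            expanded.append(item)
        elif item in by_abbrev:
            expanded.append(by_abbrev[item])
        else:
            raise ValueError("unknown feature: %r" % item)
    return expanded
```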

[20040411.0100] jonz: added X-DSPAM-Confidence header

added X-DSPAM-Confidence header to all processed messages to identify the
confidence level of the decision made.

[20040410.0930] jonz: tum maturity level increased to 50 hits

train-until-mature level increased from 25 hits to 50; doesn't appear to work
well in classification groups.

[20040409.0201] jonz: added support for domain scale

added support for domain scale applying patches submitted by 
Patrick Tudor <ptudor@ptudor.net>

[20040409.0153] jonz: applied pgsql patches

applied more pgsql patches

[20040409.0129] jonz: fixed headers to preserve original encoding

headers are now delivered with original encodings

[20040407.2254] jonz: added mass false positive button to CGI

added a button to reverse multiple false positives by clicking on checkboxes.

[20040407.2248] jonz: fixed bug in classification groups

fixed a bug in classification groups, where a "classify catch" would cause
the DSPAM signature to be empty, and thus irreversible.

[20040407.0255] jonz: tweaks to postgres m4

tweaks to postgres m4 to test headers and library on configure

Version 3.0.0.alpha.3
---------------------

[20040406.0124] jonz: suppress extra newline in message body

corrected message reassembly behavior by suppressing newline characters at the
end of the message body.

[20040405.0524] jonz: added postgresql driver to project

added pgsql_drv (PostgreSQL) submitted by Rustam Aliyev <rustam@azernews.com>
to project, added to configure with its own set of configuration commands.
see tools.pgsql/README for more information.  Applied recent SQL fixes.
 
[20040405.0330] jonz: virtual users should not be created on reclassification

if a message is being submitted for reclassification, a virtual user should not
be created, but fail instead - e.g. spam could be getting sent to the alias,
and shouldn't create new uids.

[20040405.0233] jonz: fixed SQL-driver hits-below-zero bug

fixed a bug causing some tokens to drop below zero hits using the mysql
driver.

[20040405.0149] jonz: fixed BNR bug

fixed a bug caused by Bayesian Noise Reduction which caused some messages
never to get learned if the control token was filtered; or caused filtered
tokens never to be learned.

[20040403.1745] jonz: rewrite of libdspam API

rewrite of libdspam's API.  in short:

- Operating modes DSM_ADDSPAM and DSM_FALSEPOSITIVE dropped
- CTX->classification added: DSR_ISSPAM | DSR_ISINNOCENT | DSR_NONE
- CTX->source added: DSS_ERROR | DSS_INOCULATION | DSS_CORPUS | DSS_NONE

provides a much cleaner and less ambiguous interface

[20040403.1215] jonz: removed signature deletion

removed signature deletion from agent, so messages can be re-re-classified.
also prevents mysql errors.

[20040403.1125] jonz: added dotfile debugging support

--enable-debug and --enable-verbose-debug flags now require a .debug file
to be dropped in order to log debug messages, providing you with the ability
to dynamically activate/deactivate debug messages for some or all users.  A 
.debug file can either be dropped in DSPAM_HOME to activate debugging for all 
users, or a username.debug file can be dropped in DSPAM_HOME/userpath/ to 
activate debugging for a subset of users.  
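
The lookup described above amounts to two file-existence checks. In this
sketch, user_dir stands in for the DSPAM_HOME/userpath/ directory, whose
actual layout is not spelled out here, and debug_enabled is a hypothetical
name.

```python
import os

def debug_enabled(dspam_home, user_dir, username):
    """True if a global .debug or a per-user username.debug file exists."""
    if os.path.exists(os.path.join(dspam_home, ".debug")):
        return True  # debugging switched on for all users
    return os.path.exists(os.path.join(user_dir, username + ".debug"))
```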

[20040402.1839] jonz: added support for domain-name groups

added support for groups based on domain name

Version 3.0.0.alpha.2
---------------------

[20040402.0730] jonz: improved agent classification output

agent classification output improved to include username, result, probability,
and confidence level in MIME format for easy parsing

[20040402.0730] jonz: added broken MTA support

--enable-broken-mta
You should enable this if your MTA is broken and passes messages into DSPAM
with CTRL-M's (^M) in them.

[20040402.0730] jonz: added training loop buffering feature

Training loop buffering is the amount of statistical sedation performed to
water down statistics and avoid false positives during the user's training loop.
The training buffer sets the buffer sensitivity, and should be a number 
between 0 (no buffering whatsoever) and 10 (heavy buffering).  The default is 5,
half of what previous versions of DSPAM used.  To avoid dulling down 
statistics at all during the training loop, set this to 0.

The training buffer can be set using tb=N as a feature, where N is the level of
buffering (0-10).  For example:

--feature=chained,noise,tb=10

Causes the buffer level to be set to 10, the highest level of safety, whereas

--feature=chained,noise,tb=0

Removes all buffering constraints
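
A sketch of how a tb=N item could be pulled out of a --feature value, with
the documented default of 5 and range of 0-10. The function name and the
error handling are assumptions.

```python
DEFAULT_BUFFER = 5  # default level stated in this entry

def parse_training_buffer(feature_spec):
    """Return the tb=N level from a feature list, or the default."""
    level = DEFAULT_BUFFER
    for item in feature_spec.split(","):
        key, sep, value = item.strip().partition("=")
        if key == "tb" and sep:
            level = int(value)
            if not 0 <= level <= 10:
                raise ValueError("training buffer must be between 0 and 10")
    return level
```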

[20040402.0723] jonz: fixed bug in dspam_dump

fixed a bug in dspam_dump causing unknown tokens to be displayed with 
uninitialized values

[20040402.0720] jonz: fixed bug in agent for signature dropping

when a signature can't be found, the message is dropped; unfortunately the
agent forgot to shut down the dspam context which caused BDB to lock up.
 
[20040402.0700] jonz: added switch for webmail

The webmail switch is designed for systems where the original message remains
server side and can therefore be presented in pristine format for retraining.

   --enable-webmail
   The webmail switch is designed for systems where the original message
   remains server side and can therefore be presented in pristine format for
   retraining.  This option will cause DSPAM to cease all writing of
   signatures and DSPAM headers to the message, and deliver the message in as
   pristine format as possible.  This mode REQUIRES that the original message
   in its pristine format (as of delivery) be presented for retraining, as in
   the case of webmail or other applications where the message is actually
   kept server-side during reading, and is preserved.  DO NOT use this switch
   unless the original message can be presented for retraining with the
   ORIGINAL HEADERS and NO MODIFICATIONS.
 
[20040401.2243] jonz: fix for signature headers

applied patch to fix multipart boundary bug when signature-headers is enabled

Version 3.0.0.alpha.1
---------------------

[20040401.1230] jonz: patches to corpus locking

made patches for corpus locking, to help prevent corruption with BDB drivers.  
DSPAM agent now drops a .corpuslock file upon processing a corpus which in 
turn tells the drivers not to run automatic recovery.  this should prevent 
corruption when an email comes in while you are corpus training with the BDB 
drivers.  this was not an issue with the SQL-based drivers.

[20040401.1230] jonz: deleted libdb4_purge, libdb3_purge

libdb4_purge and libdb3_purge have been obsoleted by the new rewritten 
dspam_clean tool

[20040401.0720] jonz: extended group line length to 10k

extended length of a single group line to 10k, from 1k

[20040401.0720] jonz: new dspam_clean functionality

dspam_clean has been rewritten to support the following different clean
operations:

1. Using the -s flag, dspam_clean will continue to perform stale signature
   purging.  If an age is specified, for example -s14, the age defined as the
   default will be overridden.  Specifying an age of 0 will delete all
   signatures for the users processed.

2. Using the -p flag, dspam_clean will delete all tokens from a user's database
   whose probability is between 0.35 and 0.65 (fairly neutral, useless tokens)
   that fall beyond the default age.  If an age is specified, for example
   -p30, the age defined as the default will be overridden.  It is a good
   idea to use this type of clean with an age of 0 on users after a lot of
   corpus training.  

3. Using the -u flag, dspam_clean will delete all unused tokens from a user's
   database.  There are four different types of unused tokens:

     - Tokens which have not been used for a long time
     - Tokens which have a total hit count below 5
     - Tokens which have only one spam hit
     - Tokens which have only one innocent hit

   Ages may be overridden by specifying a format such as -u30,15,10,10
   where each number represents the respective age.  Specifying an age of
   zero will delete all unused tokens in the category. 

Optionally, usernames may be specified to override the default behavior of
processing all users.

Examples:

Process all users on the system using all clean operations:
  dspam_clean -s -p15 -u90,30,15,15 

Delete all of user 'dick' and 'jane's signatures
  dspam_clean -s0 dick jane

Perform a post-corpus training clean on user 'spot'
  dspam_clean -p0 -u0,0,0,0 spot

Perform nightly maintenance using all default values, for all users, with all
options enabled:
  dspam_clean -p -u -s

NOTE: You may wish to only run certain cleaning modes depending on the type of
storage driver you are using.  For example, the MySQL storage driver
includes a purge.sql script which performs signature and unused operations,
leaving only the probability operation as a useful operation.  If you are 
using a SQL-based storage driver, it is strongly recommended that you use
the maintenance scripts wherever possible.
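
The -p (probability) clean can be pictured as the filter below. It is a
sketch only: whether the 0.35 and 0.65 bounds and the age limit are
inclusive is an assumption, chosen so that an age of 0 removes every
neutral token, and the names are hypothetical.

```python
def is_neutral_and_stale(probability, age_days, max_age_days):
    """True for a fairly neutral token that is at least max_age_days old."""
    return 0.35 <= probability <= 0.65 and age_days >= max_age_days

def purge_neutral(tokens, max_age_days):
    """Drop neutral, stale tokens; tokens maps name -> (probability, age)."""
    return {
        name: (prob, age)
        for name, (prob, age) in tokens.items()
        if not is_neutral_and_stale(prob, age, max_age_days)
    }
```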

[20040401.0720] jonz: added _ds_delall_spamrecords and _ds_del_spamrecord

added spamrecord deletion functionality to storage driver, increased version
to 5:0:0

[20040331.2000] jonz: applied some memory leak patches

applied some memory leak patches submitted by 
William Ahern <wahern@barracudanetworks.com>

[20040328.2200] jonz: renamed USERDIR to DSPAM_HOME

all references to USERDIR are now known as DSPAM_HOME, including the 
--with-dspam-home configure flag, and mode settings.

[20040328.2200] jonz: moved several features to commandline

many features have been MOVED from the configure script to the
commandline, including chained tokens, bayesian noise reduction, automatic
whitelisting, and training modes.  please see the documentation for a complete
list of commandline arguments.

configure functions which have changed:

--with-userdir-*			changed all to dspam-home
--with-local-delivery-agent		changed to --with-delivery-agent
--enable/disable-chained-tokens		removed from configure
--enable/disable-bnr			removed from configure
--enable/disable-whitelist		removed from configure
--enable/disable-toe			removed from configure
--enable/disable-tum			removed from configure
--enable/disable-spam-delivery		removed from configure
--enable-deliver-to-stdout		removed from configure

[20040328.1745] jonz: completely reworked commandline arguments 

please see documentation for new commandline arguments. 

[20040328.1745] jonz: removed free-pass of arguments by untrusted users

removed ability to pass in arguments by untrusted users, when the file
untrusted.mailer_args didn't exist

[20040327.2230] jonz: CGI to allow logo-click to return

changed CGI to allow a click on the DSPAM logo to return the user to the
main page

[20040327.2222] jonz: thresholds to include all totals

thresholds changed to include all 3 totals: learned, classified, corpusfed

[20040327.2221] jonz: test-conditional training threshold dropped

test-conditional training threshold dropped to 1000 messages

[20040326.0730] jonz: extended DAF flagset

extended DAF flagset to four bytes

[20040326.0730] jonz: temporarily removed blackbox framework

archived and removed blackbox framework from cvs; not likely i'll be working
on it any time soon

[20040325.2129] jonz: extended context flags to u_int32_t

extended context flags to 4 bytes, to add additional commandline features

[20040325.2129] jonz: compatibility fixes for TOE

compatibility fixes for TOE for web client and stats

[20040325.1939] jonz: code cleanup

commented headers, cleaned up code

[20040325.1930] jonz: converted total_spam, total_innocent

converted total_spam, total_innocent to spam_learned, innocent_learned, and
added spam_classified, innocent_classified for stats use with TOE.  

NOTE: changes are required to SQL-based drivers for this version

MySQL Example:

alter table dspam_stats add spam_learned int;
alter table dspam_stats add innocent_learned int;
alter table dspam_stats add spam_classified int;
alter table dspam_stats add innocent_classified int;
update dspam_stats set spam_learned = total_spam;
update dspam_stats set innocent_learned = total_innocent;
update dspam_stats set spam_classified = 0;
update dspam_stats set innocent_classified = 0;
alter table dspam_stats drop column total_spam;
alter table dspam_stats drop column total_innocent;
alter table dspam_stats add spam_misclassified int;
alter table dspam_stats add innocent_misclassified int;
update dspam_stats set spam_misclassified = spam_misses;
update dspam_stats set innocent_misclassified = false_positives;
alter table dspam_stats drop column spam_misses;
alter table dspam_stats drop column false_positives;

[20040325.1930] jonz: addspam to fail on failed signature retrieval

due to a lot of misconfigurations of dspam, addspam will now fail if a 
signature cannot be retrieved.  this should help pinpoint problem installs
and clients, and prevent poor accuracy. 

Version 2.11.1
--------------

[20040325.0757] jonz: added --help

added --help commandline argument

[20040325.0757] jonz: fixed division by zero bug in dspam.cgi

small chance of division by zero bug fixed

[20040325.0740] jonz: fixed toe

fixed toe, which had been accidentally disabled in testing

[20040325.0740] jonz: provided runtime arguments for training mode

added run-time arguments --toe --tum --teft to specify training mode.  the
default is based on configure-time options.

also added training_mode variable to dspam context, should not affect
compatibility.

Version 2.10.2
--------------

[20040319.2138] jonz: added shell quoting of special characters

special characters are now quoted, instead of filtered, when calling the LDA.

Version 2.11.0 / Version 2.10.2
-------------------------------

[20040319.1845] jonz: fixed bash special characters problem

fixed special characters problem in bash by encapsulating all arguments in
quotes
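
The idea of quoting rather than filtering can be illustrated with Python's
shlex.quote. This shows the technique only, not the agent's C code, and
build_lda_command is a hypothetical helper.

```python
import shlex

def build_lda_command(lda, args):
    """Join an LDA path and its arguments, quoting anything shell-special."""
    return " ".join([shlex.quote(lda)] + [shlex.quote(a) for a in args])
```

An argument such as "bob; rm -rf /" then reaches the shell as one literal
word instead of being split into a second command.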

[20040319.0730] jonz: added train-on-mature training option

--enable-tum
train-on-mature (TuM) is a hybrid of train-everything and train-on-error.  
all tokens are candidates for training as in train-everything, but only tokens
whose total number of "hits" doesn't exceed 100 are trained.  on error, all
tokens are trained.  this provides a good balance between the volatility of
train-everything and the lack of behavioral learning in train-on-error.  it 
also has the added benefit of not breaking the things that toe presently
breaks in dspam (whitelists, stats, etc).
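
The token selection rule described above can be sketched as follows; the
names are hypothetical and the threshold is the 100 hits quoted in this
entry.

```python
MATURITY_HITS = 100  # threshold quoted in this entry

def tokens_to_train(token_hits, is_error):
    """Pick the tokens to train; token_hits maps token -> total hits."""
    if is_error:
        # On a misclassification every token is retrained.
        return set(token_hits)
    # Otherwise only tokens that have not yet matured are trained.
    return {tok for tok, hits in token_hits.items() if hits <= MATURITY_HITS}
```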

[20040319.0700] jonz: fixed source address bug

fixed a bug in source address tracking where messages were reported as innocent
even if they were guilty, if the user had < 2500 messages in corpus

[20040318.1932] jonz: fixed compile-time warning in dspam_tools.c

fixed warning for uninitialized crc variable

[20040318.0259] jonz: post-training features dropped to 2500

post-training features such as TOE and BNR have had their prerequisite ham count
dropped from 4000 to 2500.

[20040318.0241] jonz: fixed up headers so developers only need libdspam.h

fixed up header dependencies so developers only need include libdspam.h to
use libdspam.

[20040318.0124] jonz: added support for header-based signatures

for implementations where a signature in the body is unacceptable, using
--enable-signature-headers will place the signature in the header, and not
in the body.

IMPORTANT: This will -require- that the headers be forwarded with the message
when being reported as spam.  This usually requires bouncing the message,
forwarding it as an attachment, or using a macro.  The header will otherwise
be lost with standard forwarding.

[20040316.2315] jonz: added support for userlist termination

userlist can now be terminated using --

Version 2.10.1
--------------

[20040314.0128] jonz: bugfix for segfaults in dspam.c

segfaults can occur on some systems (predominantly Solaris) when mail is sent
to multiple local recipients.  bugfix required the header insert pointer to
be reset.

Version 2.10.0
--------------

[20040307.1828] jonz: new dspam_corpus tool by Gary Funck

replaced old dspam_corpus tool with a better one contributed by Gary Funck 
<gary@intrepid.com>

[20040305.0320] jonz: added postfix documentation

added documentation for postfix local delivery

[20040305.0320] jonz: added support for domain filesystem structure

use of --enable-domain-scale configures filesystem for domain-based
support.  when used, username@domain should be passed in as the userid and
$USERDIR/domain/username/ will be used instead of $USERDIR/username or
$USERDIR/u/us/username as done with large scale
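
The three directory layouts mentioned can be compared side by side in a
short sketch. The function name and the layout labels are hypothetical;
the path shapes follow the entry above.

```python
import posixpath

def user_dir(userdir, userid, layout="flat"):
    """Build a user's directory for the flat, large or domain layout."""
    if layout == "domain":
        username, sep, domain = userid.partition("@")
        if not sep or not domain:
            raise ValueError("domain scale expects username@domain")
        return posixpath.join(userdir, domain, username)
    if layout == "large":
        return posixpath.join(userdir, userid[:1], userid[:2], userid)
    return posixpath.join(userdir, userid)
```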

[20040303.2208] jonz: applied bugfix patch by dennis pedersen

applied a bugfix to libdb3 and libdb4 fixing a bug that was introduced in rc2
causing loop hangs.  submitted by dennis pedersen <dennis@moellegaard.dk>

[20040303.0243] jonz: added long username support

by default, the username length uses the same limits as the operating system.
if --enable-long-usernames is specified, however, the limit will be set to
256.

Version 2.10-rc2
----------------

[20040302.0007] jonz: implemented auto-whitelisting

implemented auto-whitelisting using --enable-whitelist function.  automatic
whitelisting will automatically whitelist any full 'From' addresses (including
the name) that have appeared in at least 10 innocent messages and zero spams.
when a message is forwarded as a spam, any automatic whitelisting for that
address is permanently deactivated.
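
the whitelisting rule described above can be sketched roughly as follows.
the function and argument names are assumptions made for the example; only
the thresholds (10 innocent messages, zero spams) and the permanent
deactivation are taken from this entry.

```python
WHITELIST_MIN_INNOCENT = 10  # threshold stated in the entry above


def is_auto_whitelisted(innocent_hits, spam_hits, deactivated=False):
    """Decide whether a full 'From' address counts as whitelisted."""
    if deactivated:
        # a message from this address was forwarded as spam at some point,
        # so automatic whitelisting stays off for it
        return False
    return innocent_hits >= WHITELIST_MIN_INNOCENT and spam_hits == 0
```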

[20040301.2339] jonz: fixed purge.sql

fixed some bugs in MySQL's purge.sql, optimized for speed thanks to another
patch submitted by bob glamm.

[20040229.1245] jonz: applied patch submitted by Sascha Blank

applied patch submitted by Sascha Blank for dspam_dump to allow lookup of
individual tokens.

[20040228.1618] jonz: train-on-error to perform source address tracking

train-on-error mode fixed to perform source address tracking

[20040224.2008] jonz: fixed high cpu utilization on large messages

fixed an iteration problem which caused high cpu utilization on large (2MB+)
text messages

[20040223.0350] jonz: fixed compile error in libdspam.c

fixed compile error in libdspam.c when HAVE_ISO_VARARGS isn't defined

Version 2.10-rc1
----------------

[20040222.1606] jonz: added support for global groups

global groups allows DSPAM to provide a "SpamAssassin type out-of-the-box
filtering" for all new users until they have built their own useful
dictionaries.  to create a global classification group, add something like
this to $USERDIR/group:

groupname:classification:*globaluser

This will automatically add globaluser as a classification peer to all users.
Any user who has less than 1000 innocent messages or 250 spam messages in 
their corpus, or whose filter is uncertain about a particular message will 
consult the global dictionary for an answer.

global groups will need to be trained using corpus or other means, or by
using the dspam_merge tool.  the global user (in this case 'globaluser') is
treated just as any other user on the system.
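
a minimal sketch of the consultation rule, assuming the totals and an
"uncertain" flag are already known.  should_consult_global() is a
hypothetical helper; only the 1000/250 thresholds come from this entry.

```python
MIN_INNOCENT = 1000  # innocent messages needed before the user stands alone
MIN_SPAM = 250       # spam messages needed before the user stands alone


def should_consult_global(innocent_total, spam_total, uncertain=False):
    """Return True when the global dictionary should be asked for an answer."""
    immature = innocent_total < MIN_INNOCENT or spam_total < MIN_SPAM
    return immature or uncertain
```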

[20040221.2155] jonz: format changes to dspam_dump

dspam_dump formatting changes + display of token probability

[20040220.1700] jonz: added quick fix for \r stripping in dspam_corpus

added a quick fix to strip \r's in mailboxes when using dspam_corpus

[20040220.1700] jonz: fixed segfault bug

fixed a bug that caused DSPAM to segfault on empty MIME delimiters.  This
generally only occurred with spams, as legitimate messages have RFC-compliant
delimiters.

[20040219.0150] jonz: added support for neural networking

see README for more details

[20040218.2300] jonz: added tweaking to BNR for small text samples

added tweaking of thresholds to BNR for small text samples < 3.5k

[20040217.0724] jonz: fixed some miscellaneous compile warnings

fixed some miscellaneous compile warnings.  2 for when trusted user security
is disabled, 1 for dspam_2mysql.c:126

Version 2.10-beta-2
-------------------

[20040214.1632] jonz: added TOE support

added TOE (Train on Error) support using the --enable-toe configure function.
see the README file for more details.

[20040213.1549] jonz: fixed X-DSPAM header duplication bug

fixed a bug which caused X-DSPAM headers to be cumulatively appended when
a single message addresses multiple local users.

[20040214.1327] jonz: added --enable-client-compression configure flag

added option --enable-client-compression to use compression option between
data source and its clients (where available).  presently only available with
the mysql_drv storage driver.  you should enable this if the data source
is on a separate machine from the DSPAM agent(s), as it conserves bandwidth
at the expense of a few CPU cycles.

[20040214.1258] jonz: created speed and space optimized MySQL scripts

created both speed and space optimized mysql_objects.sql scripts.

[20040214.1235] jonz: added new stats to CGI

added FP stats + overall accuracy to CGI

[20040214.1235] jonz: added debug output for noise filtering

added noise level, spammy tokens, and eliminations to debug output

Version 2.10-beta-1
-------------------

[20040212.2208] jonz: added stale data purge / PURGE_ANY

added stale data purge to libdb3 and libdb4 purge tools.  based on PURGE_ANY,
defined in config.h, any stale data is removed after six months.

[20040212.2205] jonz: added DSF_NOISE flag

added DSF_NOISE flag to libdspam interface for activating Bayesian Noise 
Reduction. 

[20040211.0158] jonz: disabled mysql_drv _ds_delete_signature

disabled _ds_delete_signature in mysql_drv due to errors; added signature
purge to purge.sql script.  no longer necessary to run dspam_clean if using
the mysql storage driver.

[20040211.0155] jonz: mysql_drv get_one update

check to ensure there was at least one token to be loaded, otherwise do not
perform query

Version 2.9.6
-------------
[20040208.1906] jonz: bugfix for BNR

BUGFIX: when BNR is activated on users with < 4000 innocent
messages, the filter forgets to load token stats for the user and marks
all messages as innocent.

Version 2.9.5
-------------
[20040204.0413] jonz: implemented Bayesian Dolby

implemented Bayesian Noise Reduction
(see http://www.nuclearelephant.com/projects/dspam/bnr.html)

[20040202.2216] jonz: added multipart frequency thresholds

body tokens in multipart messages now require a minimum frequency of 2 to be
included in the calculation.
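
as a rough sketch, and assuming "frequency" means the number of times a token
occurs in the message body, the filter could look like this.  the function
name is made up for the example.

```python
from collections import Counter


def frequent_body_tokens(tokens, minimum=2):
    """Keep only tokens that occur at least `minimum` times."""
    counts = Counter(tokens)
    return {token: n for token, n in counts.items() if n >= minimum}
```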

[20040128.2021] jonz: only report source-addresses in mature corpuses

only report source-addresses when the user has >4000 innocent messages in
their corpus.

Version 2.9.4
-------------

[20030128.0334] jonz: added DSPAM SBL dropfile support

added support to source address tracking to drop SBL files to /var/spool/sbl
if exists, where client in directory watch mode can read.

Version 2.9.3 
-------------

[20040122.0700] jonz: hex decoding
                                                                                
a small piece of code to perform hex-decoding on 8bit encodings.  very useful, 
although hex encoding is still somewhat rare.
                                                                                
[20040121.0805] jonz: new stats watering-down code for high-spam users
                                                                                
implemented new code for watering down statistics during the learning phase to
compensate for users with a high percentage of spam.  this should only affect
accuracy of normal (average spam) users for the first 1000 messages.
significant watering down takes place up to 1000 spams.  limited watering
down takes place up to 2500 spams if the user has more spam in their corpus
than innocent mail.
                                                                                
[20040121.0805] jonz: priority given to complex tokens
                                                                                
slight code tweak to give priority to more complex tokens (e.g. chained
tokens) to help improve accuracy.
                                                                                
[20030121.0805] jonz: signature should not be stored when using --corpus
                                                                                
signatures are no longer stored when using the --corpus flag

Version 2.9.1
-------------

[20031220.1442] jonz: added notification emails

three different notification emails can be configured to get sent:

- to a user the first time they receive a message through dspam (first run)
- to a user the first time a spam is caught through dspam on their behalf
- to a user when their quarantine box is > 2MB in size

to use notification emails, copy the txt/ directory from the distribution
into USERDIR and configure the emails accordingly.  more information is
available in the README.

Version 2.8.1
-------------

[20031205.0821] jonz: html preformatting only for html parts

html preformatting to be done only to html parts; html comments in
plain text parts should not be filtered out.

[20031205.0156] jonz: high-byte tokens not ignored

fixed a small bug causing tokens consisting of all high-bytes to be
ignored.
 
[20031205.0122] jonz: tweaked cgi spam ratio

tweaked cgi spam ratio to include misclassifications

[20031130.1016] jonz: dspam_merge to corpusfy totals

dspam_merge now moves all totals to corpusfed, so that a merged user can
easily start with fresh stats.

[20031129.1619] jonz: fixed quarantine agent arg skip bug

fixed minor bug which caused some arguments to be skipped when using a custom
quarantine agent
 
[20031129.1443] jonz: implemented opt-in/opt-out storage directory

moved all user.dspam and user.nodspam files to USERDIR/opt-in and
USERDIR/opt-out, respectively.  this saves from needing to have and set up
a directory for each user.
 
Version 2.8
-----------

[20031126.1633] jonz: stepped down insert query error to debug info

stepped down the query error on insert down to debug info, as it is a common
occurrence on busy servers.

[20031124.0523] jonz: corrected buffer overrun in BDB drivers

corrected buffer overrun vulnerability in BDB drivers dealing with copying
tokens into memory.  discovered when working with corrupt dictionaries which
caused segfaults.  the dictionary would have to be manipulated in order to 
exploit, so risk was minimal.

[20031124.0459] jonz: fixed bug in dspam_2mysql

dspam_2mysql failed to place quotes around token value.

[20031123.1351] jonz: fixed libdb4,libdb3 shared group bug

fixed a bug that caused shared groups to fail with the following error:

DB_ENV->open failed: No such file or directory

[20031120.0405] jonz: fixed HTML boundary corruption with signature removal

fixed a bug that caused boundary corruption after an HTML part where a DSPAM
signature from a previous reply was removed by the agent.

[20031120.0405] jonz: do not remove old signatures from signed messages

corrected the dspam agent so that older signatures from signed messages were
not parsed out.  this caused the message to fail to authenticate.

Version 2.8-rc-1
----------------

[20031115.2042] jonz: fixed minor memory leak on initialization failure

minor memory leak caused in libdspam when dspam_init fails.  does not affect
DSPAM agent, only library.

[20031115.2042] jonz: DSM_CLASSIFY generated truncated signatures

fixed a bug where DSM_CLASSIFY generated truncated signatures 

[20031115.1540] jonz: corrected multipart analysis bug

corrected a bug that caused parts of a multipart message that were not
specifically marked as text with the "Content-Type" header to be ignored from
analysis.

[20031114.1949] jonz: corrected DSM_CLASSIFY in-memory totals bug

corrected a bug that changed in-memory totals when DSM_CLASSIFY was used

[20031113.1938] jonz: corrected DSM_CLASSIFY bug in libdspam

corrected two bugs in libdspam regarding the DSM_CLASSIFY mode:

1. CTX->signature would overwrite the provided signature with a new signature
   resulting in a potential memory leak

2. If no signature was provided, DSM_CLASSIFY would segfault instead of
   creating a new signature

Version 2.8-beta-2
------------------

[20031103.1119] awn: libdspam version changed to the '4:0:0'

libdspam version changed to '4:0:0' because introducing and requiring
dspam_init_driver() at start and dspam_shutdown_driver() at end is a
backward-incompatible change.

[20031031.0402] jonz: fixed web stats for shared groups

shared group webstats fixed

[20031031.0340] jonz: added commandline options

added --stdout commandline option to deliver messages to stdout
added --deliver-spam commandline option to deliver spams to user's mailbox
changed --deliver flag to --deliver-fp, although --deliver still supported
  for backward compatibility.  option still only necessary when configuring
  with --enable-spam-delivery

[20031031.0324] jonz: changed default configure options

enabled the following as defaults in configure:

alternative-bayesian	(alternative Bayesian algorithm)
test-conditional	(test-conditional, iterative based training)

[20031030.1120] jonz: fixed caching bug

fixed caching bug in the mysql_drv and ora_drv drivers causing dspam_stats
to return the first user's stats for all users

[20031029.0538] jonz: added --classify commandline flag

the --classify commandline flag will classify the input message and output
to stdout "SPAM" or "HAM" depending on the result.  No changes will be made
to the user's tokens or totals.

[20031029.0538] jonz: changed totals mechanism

the following changes have been made to the totals mechanism:

- spam_misses has been changed to spam_misclassified
- false_positives has been changed to innocent_misclassified
- spam_corpusfed and innocent_corpusfed have been added

IMPORTANT UPGRADE NOTE: Please see the README for information on updating your
SQL databases to accept these changes if you are using a SQL-based driver.  If
you are using a BDB-based driver, these changes will automatically be 
implemented.

[20031028.2000] jonz: corrected CLASSIFY bug in mysql_drv and ora_drv

corrected a significant bug in mysql_drv and ora_drv which caused tokens and
totals to be incremented on all CLASSIFY calls.

[20031028.2000] jonz: changed DSF_CLASSIFY (flag) to DSM_CLASSIFY (mode)

the DSF_CLASSIFY flag is now a mode called DSM_CLASSIFY.

Version 2.8-beta-1
------------------

[20031028.0531] jonz: added customizable header for cgi

cgi spam account now has customizable header

[20031028.0448] jonz: classification catches to add as spam

spam catches by a member of a classification group should result in the
message being added as spam, as opposed to innocent.  this has been corrected.

[20031028.0204] jonz: X-DSPAM-User header only considered in managed groups

the X-DSPAM-User header field is only paid attention to when the user is
a member of a managed group (the only time where the original user is
necessary).

the parsing of the X-DSPAM-User header has also been corrected to chomp the
newline character, which was resulting in some systems including the character
in the username.

[20031028.0116] jonz: corrected a critical error in classification groups

corrected a critical error in classification groups causing DSPAM to crash
(and the message get delivered by the MTA's failsafe in most cases) when a
user in a classification group resulted in a spam being caught.

[20031027.0137] jonz: added mta whitelists for source address tracking

file USERDIR/mta.whitelist may now contain a list of internal MTA ip addresses,
which will cause DSPAM to skip to the next 'Received' header when processing
the source address.  each IP should be on a newline.
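
the skip-ahead behaviour can be sketched as below.  this assumes the IP
addresses have already been pulled out of the 'Received' headers, in the order
in which DSPAM would examine them; both function names are hypothetical.

```python
def load_whitelist(text):
    """Parse mta.whitelist content: one IP address per line."""
    return {line.strip() for line in text.splitlines() if line.strip()}


def source_address(received_ips, whitelist):
    """Return the first address that is not an internal MTA, or None."""
    for ip in received_ips:
        if ip not in whitelist:
            return ip
    return None
```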

[20031026.1706] jonz: added signal handling to tools

added signal handling to tools, to unlock databases upon SIGINT, SIGPIPE or 
SIGTERM to avoid stale locks.

[20031025.1111] jonz: added rolling filter accuracy stats to cgi

rolling filter accuracy stats allows the user to measure their filtering
accuracy over a period of time (usually monthly or quarterly).  stats should
be reset after a good learning period (approximately 4000 spams and nonspams)
to measure accuracy accurately =)

[20031024.0007] jonz: libdb drivers reworked

libdb drivers reworked for better:
- locking (exclusive)
- recovery (simple recovery run on open)
- environment management (individual user environments)

IMPORTANT UPGRADE NOTE:

run the script 'dspam_movefiles [userdir]' in the tools directory to upgrade to
this new directory storage format.  after running, make sure you chown the
correct file ownership to the newly created directories.  this should be done
with the MTA shut down and no dspam processes running.

you will also need to reinstall/reconfigure the CGI

[20031023.1949] jonz: update to cgi to avoid missed messages

cgi now tracks the size of the quarantine between viewing and deleting all
messages, to avoid deleting messages that came in while reviewing the
quarantine.

[20031023.1727] jonz: compensated for converged boundaries

compensated for a slight break of RFC where two boundaries in a nested 
message appear without a blank space in-between, leading to message corruption.
fortunately, this type of behavior is extremely scarce.

[20031023.0900] jonz: fixed classification group bug

fixed a bug that caused classification groups never to fire; datatype
CTX->confidence should be float, not int.

[20031022.2229] jonz: added "-d %u" to default cgi flags

added "-d %u" to default dspam cgi flags to assist new users

[20031022.0930] jonz: fixed bug preventing multiple group subscriptions

fixed a bug that prevented a user from being subscribed to multiple
groups

Version 2.7.6.10
----------------

[20031022.0930] jonz: added support for managed shared groups

the group type 'shared' can be appended with ',managed' to convert the shared
group into a managed shared group.  a managed shared group is the same as a
shared group, only the managed version will share the quarantine box as well,
enabling one user (named after the group) to manage the handling of all
quarantine functions (false positive reporting, etc.).

this is generally not what users want, as personal information could potentially
be shared with the administrator of the group, however there are some
circumstances where this would be appropriate.

a regular shared group:

groupname:shared:user1,user2,userN

a managed shared group:

groupname:shared,managed:user1,user2,userN
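
for illustration, a group line in either form above could be parsed like
this.  the Group tuple and parse_group_line() are names invented for the
sketch and do not mirror DSPAM's own parser.

```python
from typing import List, NamedTuple


class Group(NamedTuple):
    name: str
    kind: str
    managed: bool
    members: List[str]


def parse_group_line(line):
    """Split 'groupname:type[,managed]:user1,user2,userN' into its parts."""
    name, kind_field, members_field = line.strip().split(":", 2)
    # the type field may carry a ',managed' suffix
    kind, *flags = [part.strip() for part in kind_field.split(",")]
    members = [m.strip() for m in members_field.split(",") if m.strip()]
    return Group(name, kind, "managed" in flags, members)
```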

[20031022.0930] jonz: corrected long-time stdin bug

corrected a long-time, just discovered bug that caused stdin to be read in very
small chunks (32 bytes each).  correcting this bug has caused DSPAM to read
in messages much quicker.

[20031022.0930] jonz: cgi to use X-DSPAM-Signature

when message-id is not present, the cgi will now use the X-DSPAM-Signature
field to uniquely identify each message.

[20031022.0930] jonz: extended header assembly buffer to 4k

header assembly buffer extended to 4k; was truncating some longer fields at 1k.

[20031022.0930] jonz: minor crash bugfix

an obscure bug has been corrected which caused dspam to crash if the word
"boundary" was placed on a line in the message body, and that line began
with a space or tab.

[20031022.0900] jonz: false positives not delivered when spam-delivery enabled

false positives shouldn't be delivered when --enable-spam-delivery is enabled,
since they will be mailed in (or otherwise processed) directly from the user's
inbox.

to force false positives to be delivered, use the --deliver commandline
argument

Version 2.7.6.9
---------------

[20031021.1300] jonz: significant changes to mysql driver

the data type for the 'token' field in the dspam_token_data table has been
changed from BIGINT to VARCHAR.  This is due to a bug in MySQL being unable to
handle some of the large numeric values used for tokens.  

BEFORE UPGRADING, SHUT DOWN YOUR MTA AND ISSUE THE FOLLOWING MYSQL QUERY:

alter table dspam_token_data modify token varchar(32);

[20031021.1206] awn: Convenience symlinks for libdb{3,4}_deadlock

Convenience symlinks dspam_deadlock.libdb4 (in case of libdb4_drv),
dspam_deadlock.libdb3 (in case of libdb3_drv) and dspam_deadlock (in
case of both libdb*_drv) are added and pointed to the appropriate
libdb{3,4}_deadlock binary.

[20031021.1016] awn: configure: mysql and network-related libraries

-lnsl and -lsocket are added to the mysql client library check where
needed (e.g. on Solaris).

[20031021.0000] jonz: changed signature format to include frequency

WARNING: You should delete all your temporary signature information before
upgrading to this version, as the signature format has changed.  You can do
this by deleting all your .sig files or issuing a 
"delete from dspam_signature_data" query if using a SQL-based driver.

RATIONALE: When performing classification queries with signatures, the
frequency is necessary to ensure an identical calculation.

[20031021.0000] jonz: added support for 'CLASSIFICATION' group

A 'CLASSIFICATION' group type has been added.  Classify groups are groups of 
users who share the results of spams against their own personal dictionaries.  
This means that for every message that comes in for any user in the group, 
dspam classifies that message for every user and if any user believes the 
message to be spam, it is marked as spam for the destination user.

To avoid false positives, external classification is only used when there is
a confidence level of 0.30 or higher of spam.  The confidence level is
calculated with Chi-Square.

Members of this type of group should only join after their initial training
period.  Members may also be part of an inoculation group, but users can
not be a part of both a classify group and a shared group.
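
A minimal sketch of the group decision, assuming each peer's result is
available as an (is_spam, confidence) pair.  group_verdict() is a
hypothetical name; only the 0.30 confidence threshold comes from this entry.

```python
CONFIDENCE_THRESHOLD = 0.30  # minimum confidence stated in the entry above


def group_verdict(own_is_spam, peer_results):
    """Return True if the message should be marked as spam.

    peer_results is an iterable of (is_spam, confidence) pairs, one per
    other member of the classification group.
    """
    if own_is_spam:
        return True
    return any(
        is_spam and confidence >= CONFIDENCE_THRESHOLD
        for is_spam, confidence in peer_results
    )
```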

[20031021.0000] jonz: changed default probability for single-corpus tokens

changed the probability for tokens that appear only in one corpus:

TYPE			FROM		TO
Appears +10 in Spam	.9901		.9999
Appears <10 in Spam	.9900		.9998
Appears +10 in Innocent	.0099		.0001
Appears <10 in Innocent	.0100		.0002
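
the new values in the table can be written as a small lookup.  this is a
sketch only: it reads "+10" as "10 or more", and the function name and the
None result for tokens seen in both corpora are assumptions.

```python
def single_corpus_probability(spam_hits, innocent_hits):
    """Return the default probability for a token seen in only one corpus."""
    if spam_hits > 0 and innocent_hits == 0:
        return 0.9999 if spam_hits >= 10 else 0.9998
    if innocent_hits > 0 and spam_hits == 0:
        return 0.0001 if innocent_hits >= 10 else 0.0002
    # token appears in both corpora (or in neither): not covered by the table
    return None
```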

[20031019.2200] jonz: added test-conditional training support

added configure flag --enable-test-conditional which will enable
test-conditional training.  test-conditional training will automatically re-train
the user's dictionary on spam or false positive until the message condition is
met (e.g. until the user's dictionary no longer results in misclassification of
the message being retrained).  this training has a maximum number of 5
iterations, and will only invoke when:

- The user has > 4000 innocent messages in their corpus, and is reporting
  a spam

- The user is reporting a false positive (regardless of the number of
  messages in their corpus)
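
the retraining loop and its two trigger conditions can be sketched as
follows.  train and classify stand in for whatever performs one training pass
and one classification; they, and both function names, are hypothetical.

```python
MAX_ITERATIONS = 5  # maximum stated in the entry above


def should_retrain(reporting, innocent_total):
    """reporting is 'spam' or 'false_positive'."""
    if reporting == "false_positive":
        return True
    return reporting == "spam" and innocent_total > 4000


def retrain_until_correct(train, classify, expected,
                          max_iterations=MAX_ITERATIONS):
    """Train until classify() returns `expected`; return iterations used."""
    iterations = 0
    while iterations < max_iterations:
        train()
        iterations += 1
        if classify() == expected:
            break
    return iterations
```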

[20031019.2016] jonz: added support for shared groups in mysql_drv driver

support has been added for shared groups using the mysql_drv driver, but with
one caveat: if you will NOT be enabling "virtual users" support, you will need
to create a user on your system for each group you add.  This is because the
mysql_drv driver maps user ids in the database to users on the system.  this
is not an issue when "virtual users" support is enabled.

Version 2.7.6.8
---------------

[20031019.1722] jonz: added mysql.sock functionality

added functionality for connecting via mysql.sock instead of TCP.  specify
pathname to socket in lieu of hostname to implement.

[20031019.1700] jonz: eliminated false-positive retrain headers

eliminated the additional X-DSPAM headers added when reclassifying a 
false positive.  the headers from the original classification are
preserved.

[20031019.1530] jonz: centralized syslog logging of mysql query errors

centralized/standardized syslog logging of all mysql query errors

[20031019.1530] jonz: corrected bug in virtual users w/mysql

corrected a bug causing some tools to fail when virtual users is enabled while
using the mysql_drv driver.

[20031018.1050] jonz: corrected typo in dspam_corpus.in

fixed close(PIPIE) typo in dspam_corpus.in

Version 2.7.6.7
---------------

[20031017.2230] jonz: enhanced overall inoculation processing

code cleanup of inoculation processing; one central subroutine.  fixed some
minor related bugs.

[20031017.2129] jonz: corrected external inoculation processing

external inoculations (--corpus --inoculate --addspam combination) resulted in
an error causing the user to never be inoculated, however all users in the
inoculation group were.  corrected this bug so that the destination user would
also be inoculated. 

Version 2.7.6.6
---------------

[20031017.1930] jonz: fixed bugs in CGI 'From' line reporting

fixed a bug that caused malformatting in the 'From' line when placing in spam
quarantine

[20031017.1930] jonz: fixed bugs in false positive processing

fixed a bug, which now strips out any quarantine message 'From' line added by
DSPAM prior to processing.

[20031017.1930] jonz: fixed variable definition problems with experimental code

fixed bugs in experimental code; should not affect normal users, but broke
the build anyway.

Version 2.7.6.5
---------------

[20031017.1730] jonz: added --enable-experimental

added --enable-experimental flag which activates experimental code, moved
the following code bases to experimental:

- Versatile Language Message Inoculation Format
  (standard for sending/receiving inoculations across multiple anti-spam
   platforms and systems)

- Counting of unknown tokens in messages

[20031017.1700] jonz: only inoculate users who require inoculation

inoculation now only inoculates users who would otherwise have misclassified
the message being presented
 
[20031017.1600] jonz: changed all /tmp files to USERDIR

all /tmp files now outputted to USERDIR to avoid a race condition.

[20031016.2207] awn: libdb detection is changed again (sigh)

Probing for -ldb-<major> and -ldb<major> is resurrected again (needed
for some version of Debian with libdb v3.2.9).  The difference from the
previous approach is that libtool is used to link the test program at the
"header vs. library version" check stage.

[20031016.1837] jonz: changed high characters to 'z' instead of ignored

changed all high characters to z's; previously ignored them.  effective way to
improve filter rate on spams using wide characters.  credit for this technique
given to Brian Burton.
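
a minimal sketch of the substitution, assuming "high characters" means bytes
with the high bit set (0x80 and above).  fold_high_bytes() is a name made up
for the example.

```python
def fold_high_bytes(data: bytes) -> bytes:
    """Replace every byte >= 0x80 with 'z' instead of dropping it."""
    return bytes(ord("z") if byte >= 0x80 else byte for byte in data)
```

folding rather than ignoring keeps the length and shape of wide-character
words, so they still produce tokens.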

[20031016.1400] jonz: added warning about MySQL bug to README

added information about the bug in MySQL versions < 4.0.15.stable to the
MySQL README.

[20031016.1227] jonz: compensated for mysql_drv insert bug

compensated for mysql_drv insert bug; made better code in both mysql_drv and
ora_drv to handle insert failures with more grace

[20031016.1142] jonz: corrected token insert debug output

corrected debug output for token inserts to display correct query and disk
state.

Version 2.7.6.4
---------------

[20031016.0946] jonz: switched to MyISAM MySQL tables

InnoDB turned out to be much slower than MyISAM, so all MySQL objects have
been changed to be of type "MyISAM".

[20031015.1434] jonz: added exit code mirroring of LDA

added exit code mirroring of LDA; if any calls to LDA fail, dspam will return
the last failed exit code
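
the mirroring rule can be sketched as below, assuming the exit codes of the
individual LDA calls have been collected in call order.  the function name is
hypothetical.

```python
def mirrored_exit_code(lda_exit_codes):
    """Return the last non-zero LDA exit code, or 0 if every call succeeded."""
    last_failure = 0
    for code in lda_exit_codes:
        if code != 0:
            last_failure = code
    return last_failure
```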

[20031015.1045] jonz: added caching of getpwnam() and getpwuid() information

added caching of getpwnam() and getpwuid() information for non-virtual users
(already caches for virtual users).  this was added to keep some tools from
hammering on LDAP or other local authentication mechanisms.

Version 2.7.6.3
---------------

[20031014.2211] jonz: fixed 100% cpu utilization bug in libdbX_deadlock

fixed a bug in libdbX_deadlock causing 100% cpu utilization on linux
 
[20031014.1935] jonz: fixed auto-recovery in libdb drivers

fixed bugs in auto-recovery mechanism in libdb drivers

[20031014.1545] jonz: added support for accepting inoculation messages

Added support for "Inoculation Message Format", a new standard which
is currently in the form of an Internet-Draft, to allow inoculation
via email and trusted checksums.

[20031014.0824] jonz: added X-DSPAM-Signature

X-DSPAM-Signature is NOT a replacement for having in-line signatures
but is useful for debugging purposes

[20031014.0842] jonz: enhanced boundary recognition

enhanced boundary recognition to catch boundaries with malformatted 
definition lines

[20031013.2217] jonz: fixed bug in dspam_2mysql

fixed typo in field name: 'false-positives' is now false_positives

[20031013.1949] jonz: better html filtering

implemented better filtering of some useless html tag data, focus more on
content; resulted in the catching of a few more spams

[20031013.1832] jonz: added --inoculate flag

added support for inoculation using --inoculate flag.  this can be used in
conjunction with external inoculation as described in the README file.

Version 2.7.6.2
---------------

[20031013.1443] jonz: fixed algorithm initialization bug

fixed a bug in the initialization of algorithm data, which caused some
miscalculations whenever the first token was very innocent.

[20031013.1413] jonz: changed token sorting algorithm

token sorting now sorts by delta first, then by frequency; this means 
tiebreakers will be based in part on token frequency
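
a rough sketch of the ordering, assuming "delta" means a token's distance
from a neutral probability of 0.5 and that larger values sort first.  the
tuple layout and function name are assumptions for the example.

```python
def sort_tokens(tokens):
    """Sort (name, probability, frequency) tuples by delta, then frequency."""
    return sorted(
        tokens,
        key=lambda token: (abs(token[1] - 0.5), token[2]),
        reverse=True,  # most decisive tokens first, more frequent wins ties
    )
```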

[20031013.1329] jonz: added deadlock detection tool

for large-volume implementations, added a deadlock detection tool, 
libdb3_deadlock or libdb4_deadlock.  this tool can be run at system start and
will continue to perform deadlock operations in the background.
 
[20031013.1317] jonz: implemented deadlock detection

Implemented calls to libdb's deadlock detection mechanism

[20031013.1250] jonz: modified Chi-Square algorithm for better performance

Chi-Square algorithm changed to use 25 tokens, ignoring mid-range

[20031012.1831] jonz: changed group file format, added inoculation type

changed group format to:

groupname:grouptype:user1,user2,userN

BE SURE TO UPDATE IN YOUR GROUP FILE

there are now two types of groups: shared and inoculation.  the shared group
is the group everyone is used to, sharing dictionaries and signature dbs.

the inoculation group allows each member of the group to maintain their own
private dictionary and signature database, but members of the group will
automatically train each other's dictionaries with spams they manually forward
in, which will help 'inoculate' all other group members from new spams going
out.

examples:

development:shared:bob,tom,bill

company:inoculation:jim,ted,robert

a user can be a member of multiple inoculation groups, but cannot be a member
of both a shared group and an inoculation group.

[20031012.0009] jonz: fixed freed-memory bug in decode.c

fixed freed-memory bug in decode.c, which caused an occasional crash when
decoding encoded headers.

Version 2.7.6.1
---------------

[20031011.1236] jonz: added support for multiple algorithms

added support for multiple algorithms; e.g. if any of the enabled algorithms
suspect the message is spam, it is spam.  you can use the following flags:

--enable-chi-square
--enable-alternative-bayesian
--disable-traditional-bayesian

traditional bayesian is enabled by default

[20031011.1034] jonz: added Chi-Square specific per-token calculations

when using Chi-Square, added Chi-Square's expanded per-token calculations

[20031011.0923] jonz: fixed alternative bayesian calculations

fixed problem with the wrong definition names being used, which caused
alternative bayesian never to get invoked

[20031011.0923] jonz: fixed a bug in all calculations

fixed a bug in 2.7.6 that caused spams to be missed if there were fewer than
15 tokens available for calculation.  this could only occur in the rarest of
circumstances, so it should not have affected much.

Version 2.7.6
-------------

[20031008.2200] jonz: added alternative calculation modes

added --enable-alternative-bayesian flag which invokes Brian Burton's 
alternative Bayesian algorithm 

added --enable-chi-square flag which invokes Chi-Square algorithm

only one or neither (for default bayesian) flags should be used.  debug
information for all three calculations is generated regardless.

[20031008.2029] jonz: fixed bug in libdb drivers

fixed a bug which used memory that had already been freed causing
some occasional unpredictable behavior.
 
[20031008.1431] jonz: added support for multipart/signed messages

added support for multipart/signed messages without altering message body.
signature is appended as a text attachment.

[20031007.1904] jonz: fixed bug in boundary detection

fixed a bug in boundary detection where boundary would fail to be detected if
it wasn't the first definition on the Content-Type heading.  For example:

Content-Type: multipart/signed; protocol="application/x-pkcs7-signature"; 
  boundary="------------ms010307080208090601090900"

would have failed.  this bug fix also improves overall boundary detection. 
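
to illustrate position-independent detection, here is a small sketch that
finds the boundary parameter wherever it sits in the header value.  it is not
the code in decode.c; the regular expression and function name are
assumptions, and it does not guard against "boundary=" appearing inside
another quoted parameter.

```python
import re

# match 'boundary=' at the start of the value or after any ';',
# with the value either quoted or bare
_BOUNDARY_RE = re.compile(
    r'(?:^|;)\s*boundary\s*=\s*(?:"([^"]*)"|([^;\s]+))',
    re.IGNORECASE,
)


def extract_boundary(content_type_value):
    """Return the boundary string from a Content-Type value, or None."""
    match = _BOUNDARY_RE.search(content_type_value)
    if match is None:
        return None
    quoted, bare = match.group(1), match.group(2)
    return quoted if quoted is not None else bare
```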

[20031007.1724] jonz: added source address reporting

the source address for all messages is now reported via syslog. this uses
the new dspam_getsource() function added to the API.  depending on whether the
message is spam or innocent, the message will be reported either to MAIL.INFO
or MAIL.DEBUG.  for example:

dspam[30965]: spam detected from X.X.X.X 

dspam[30414]: innocent message from X.X.X.X 

this can be used for creating automatic blacklists.  more to come.

[20031007.1557] awn: configure script changes

Configure script now detects version of libdb headers and guesses
appropriate library name from this version.  Probed libraries are:

    -ldb-<major>.<minor>
    -ldb<major><minor>

As a consequence, for example, symlinking libdb41.so to libdb-4.so is no
longer required on FreeBSD.

Version 2.7.5
-------------

[20031007.0930] jonz: date field no longer ignored

date field is no longer ignored; time of day can sometimes play an effective
role in identifying spam or preventing false positives.

[20031006.1911] jonz: Oracle storage driver

first release of ora_drv; storage driver for Oracle.  please see README file
for more information.

[20031004.1423] awn: support for program-name transformation.

Configure options `--program-prefix', `--program-suffix' and
`--program-transform-name' are fully supported now except CGI.
(Was: dspam_corpus and dspam_genaliases don't honor transformed name of
dspam binary).

[20031003.1832] jonz: fix for base64-encoded binary messages 

bug fixed which caused corruption in some base64-encoded single-part
messages in which the only component was a binary file.

[20031003.0031] jonz: automatic recovery for libdb drivers

automatic recovery has been implemented for libdb drivers 

[20031003.0031] jonz: DB_ENV implemented for libdb drivers

DB_ENV locking has been implemented for libdb drivers.  This obsoletes 
storage driver dot-lock file locking, which is no longer used.  quarantine 
dot-lockfile locking is still used when writing to the quarantine.

Version 2.7.4
-------------

[20031002.1728] jonz: modified corpus flag to force results

use of corpus flag now forces results to match commandline flags, meaning
innocent messages no longer need to be fed in first.
 
[20031002.0800] jonz: added unique id to dspam_ngstats

for systems without a static public ip address, a unique id can be configured
in dspam_ngstats.c (NGSTATS_UID) comprised of alphanumeric characters, periods,
and underscores.  any invalid characters will cause stats to be ignored.
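
The character rule above can be checked with a few lines of C.  This is only a
sketch: valid_stats_uid is a hypothetical name, not a function in
dspam_ngstats.c, and rejecting an empty id is an assumption.

```c
#include <ctype.h>
#include <stddef.h>

/*
 * Hypothetical check: returns 1 if uid consists only of alphanumeric
 * characters, periods and underscores, 0 otherwise.
 * Assumption: an empty or NULL id is treated as invalid.
 */
int valid_stats_uid(const char *uid)
{
    const char *p;

    if (uid == NULL || *uid == '\0')
        return 0;
    for (p = uid; *p != '\0'; p++) {
        if (!isalnum((unsigned char)*p) && *p != '.' && *p != '_')
            return 0;
    }
    return 1;
}
```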

[20031002.0800] jonz: removed broken sanity checks

some sanity checks were firing off erroneous messages in 2.7.3; these have
been removed

[20031001.0800] jonz: fixed --enable-large-scale with mysql_drv

modified all drivers to add support for --enable-large-scale with mysql_drv

[20031001.0800] jonz: added dspam_ngstats

added dspam_ngstats, a reporting tool for global stats tracking for dspam

[20030930.1547] awn: Convenience symlinks for libdb{3,4}_purge

IMHO, `libdb3_purge' and `libdb4_purge' are not very descriptive names.
Therefore, 2 convenience symlinks are added:
  o  dspam_purge.libdb4  (dspam_purge.libdb3 in case of libdb3 driver), and
  o  dspam_purge
both pointing to the appropriate libdb{3,4}_purge.

[20030930.1517] jonz: fixed problem with trailing commas in update command

Version 2.7.3
-------------

[20030929.1450] jonz: fixed problem with groups

group support has been repaired; apparently a line of code was inadvertently
deleted from the source tree, causing it to fail in 2.7.2.

[20030928.0253] awn: New scheme for conditional compilation of storage drivers

All following is for `configure.ac' and resulting `configure' script:

    Now configure doesn't assume that storage driver sources are named
    `${storage_drv}.c' and `${storage_drv}.h'

    You need to list resulting .lo files in the `${storage_drv_objects}'
    variable instead.

    Storage driver specific subdirectories should also be listed in the
    `${storage_drv_subdirs}' variable.

This allows any number (including zero) of driver-specific sources and
subdirectories, automatic building of driver-specific tools in these
directories (like `libdb4_purge'), and should work properly in a VPATH
environment.

[20030928.0248] awn: configure.ac bug fix

Fix CPPFLAGS related bugs in the storage drivers sections of
`configure.ac'.

All three storage sections in configure.ac had code like
    CPPFLAGS="$DB_LIBS $CPPFLAGS"
instead of
    CPPFLAGS="$DB_CPPFLAGS $CPPFLAGS"
(replace DB_ by MYSQL for the mysql case).

This was my bug, I know.

[20030927.1600] jonz: added docs for Courier MTA

added documentation for configuring Courier MTA with DSPAM.  contributed by
Michael Greb.

Version 2.7.2
-------------

[20030925.2231] jonz: added --disable-trusted-user-security

added configure flag --disable-trusted-user-security to disable trusted user
security, rather than trying to maintain two different versions of dspam.

[20030925.1103] jonz: added support for RedHat's built-in libdb4.0

added support for RedHat's built-in libdb-4.0.  This should also provide
compatibility with any other libdb-4.0.  An alias will still be necessary:

ln -s /usr/lib/libdb-4.0.so /usr/lib/libdb-4.so

[20030925.1103] jonz: removed -d $u from default LDA configuration

-d $u coming first in the argument list caused some problems; -d %u should now
be used instead in the MTA configuration.
 
[20030925.1103] jonz: patch to compensate for yahoo broken RFC bug

implemented patch to compensate for a bug in the yahoo client where yahoo
breaks RFC and writes an end boundary prematurely, causing the real boundary
to get corrupted.

[20030925.0855] jonz: changed compile flag --enable-virtual-uids

changed compile flag --enable-virtual-uids to --enable-virtual-users

[20030925.0852] jonz: fixed plain text html signature placement bug

fixed a small bug that caused DSPAM to place the signature inside html code
samples in plain text messages.

[20030924.0000] jonz: added support for virtual users

added support for virtual users in mysql_drv.  this is necessary when the
users don't actually exist on the system.  use --enable-virtual-users to
enable.  only necessary when using the mysql storage driver.

[20030923.2043] jonz: fix for multiple user bug

restored %u and adjusted docs for multiple local user bug with sendmail

Version 2.7.1
-------------

[20030923.0050] jonz: fixes for libdb tools

several small fixes to issues with compiling libdb tools

[20030923.0045] jonz: bug fix for header decoding

fixed a bug causing some headers to decode incorrectly

[20030923.0030] jonz: bug fix for attachments and signature

added code to specifically NOT append a signature to any segments that have
"Content-Disposition" of type attachment.

[20030922.1900] jonz: added more debug output 

added more debug output (on error) to mysql driver and libdspam

[20030920.0840] jonz: mysql_drv to use -lm -lz 

switched mysql_drv to use -lm -lz in place of -lcrypto.  both apparently have
compress/uncompress functions

Version 2.7
-----------

[20030919.0900] jonz: added dspam_merge tool

Version 2.7.beta.3
------------------

[20030915.0000] jonz: added mysql_drv storage driver

mysql_drv storage driver added for MySQL functionality.  please see README
and tools.mysql_drv for more information.

[20030914.1410] jonz: fixed bug in innocent_hits

fixed bug where some tokens received 2 innocent hits instead of 1 (apparently
this is an old bug, but it did not dramatically affect effectiveness)

[20030913.0956] jonz: implemented quarantine locking

implemented quarantine locking mechanism independent of driver locking

[20030913.0900] jonz: internalized API locking

all API locking performed internally (driver-specific).  no external locking
calls exist; part of _ds_init_storage and _ds_shutdown_storage.  reason:
not all drivers will require context locking (and hopefully someday neither
will libdb3/libdb4 drivers).

[20030912.0000] jonz: locks to use USERDIR

for driver compatibility, all .lock file locking takes place in USERDIR, even
for large-scale implementations

[20030911.0000] jonz: driver config script management

implemented driver configure script management and tools.[driver] for
driver-specific tools.

Version 2.7.beta.2
------------------

[20030910.0054] jonz: message header decoding

added message header decoding per RFC 2047

[20030909.1830] jonz: implemented standardized return codes

implemented standardized return codes for the major api functions:
EINVAL, EFAILURE, ELOCK, EFILE, EUNKNOWN

[20030909.1730] jonz: ported all tools to new driver API

ported all tools to new driver API.  dspam_purge has been replaced with
a driver-specific purge mechanism (default: libdb4_purge), due to the fact
that not all drivers will need to purge, and recreating datafiles is a very
specific function...still uses the storage driver api's locking mechanism.

[20030909.0051] jonz: removed dspam_convert

removed dspam_convert tool for 2.5->2.6 upgrades

[20030909.0051] awn: configure script changes

`--enable-gcc-warnings' configure option is added.

[20030908.2000] jonz: implemented storage driver API

implemented storage driver api.  default driver is libdb4_drv

[20030907.1627] awn: dspam_genaliases changes

dspam_genaliases now generates `nospam-USER' aliases (aliases for false
positive reporting) only on explicit request.  The new `--nospam' command
line option is used for this.

Version 2.7.beta.1
------------------

[20030907.1140] jonz: user identification and passthru changes

the method of user identification and passthru has been changed:

  - DSPAM no longer recognizes -d to identify the user, but instead --user
    must be used.  --user will never be passed onto the local delivery agent.

  - In order to pass the -d flag through to the local delivery agent, it
    must be specified either separately on the commandline, or at configure
    time. 

  - To allow the -d flag to be supported at configure time (and when
    overriding untrusted users), the $u variable has been added to dspam.
    any commandline arguments passed through DSPAM matching $u will be
    replaced with the actual destination username (specified with --user
    or automatically forced for untrusted users).

These changes require some modifications to the mailer configuration.  In the
following example for sendmail, you would change the following line in
the Mlocal block:

A=/usr/local/bin/dspam -d $u

to:

A=/usr/local/bin/dspam --user $u -d $u

--user is not passed through to the LDA, but -d is.  Alternatively, you could
remove '-d $u' from sendmail.cf, and configure dspam with:

--with-local-delivery-agent="/path/to/lda -d \$u"

NOTE: be sure to escape the $ in $u ONLY when specifying it on the commandline.
This will prevent $u from being overwritten with the shell's environment
variable 'u'.

Specifying this at configure time is especially useful if you plan on running 
dspam via commandline and do not want to have to specify -d [username] in 
addition to your --user [username] arguments.
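
The pass-through rules described in this entry can be sketched as follows.
This is an illustration only: build_passthru is a hypothetical name, and the
sketch covers just the two behaviors stated above (--user and its value are
never passed on, and any argument equal to $u is replaced by the username).
It does not model the forced username for untrusted users.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: build the argument list handed to the local
 * delivery agent.  out must have room for at least argc pointers.
 * Returns the number of arguments written to out.
 */
int build_passthru(const char *const *in, int argc, const char **out)
{
    const char *user = NULL;
    int i, n = 0;

    /* First pass: find the username given with --user. */
    for (i = 0; i + 1 < argc; i++) {
        if (strcmp(in[i], "--user") == 0) {
            user = in[i + 1];
            break;
        }
    }

    /* Second pass: drop --user and its value, expand $u elsewhere. */
    for (i = 0; i < argc; i++) {
        if (strcmp(in[i], "--user") == 0) {
            i++;                    /* also skip the value after --user */
            continue;
        }
        if (user != NULL && strcmp(in[i], "$u") == 0)
            out[n++] = user;
        else
            out[n++] = in[i];
    }
    return n;
}
```

With the arguments --user bob -d $u, the sketch yields -d bob, which matches
the sendmail example above.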

[20030907.1440] jonz: removed --deliver-cmd and --quarantine-cmd

removed runtime --deliver-cmd and --quarantine-cmd functions; added configure
time --with-quarantine-agent="/path/to/agent" to override default quarantine
function.

[20030906.0000] jonz: fix for boundary definition identification

fix to detect non-lowercase multipart boundary definitions

[20030906.0000] jonz: partial rewrite of internal sorting routines

partial rewrite of tbt sort routines to drop recursion and the potential stack
problems that come with it.  problems were only experienced when using the API
with multithreaded code.  original patch submitted by Stuart Gathman
<stuart@bmsi.com>

[20030906.0000] jonz: --deliver-cmd and --quarantine-cmd require trusted user

forced --deliver-cmd and --quarantine-cmd to require trusted user permissions.
dspam also must be compiled with --enable-insecure-functions for them to be
available.

[20030906.0000] jonz: trusted user implementation

implemented trusted user approach with user and passthru overrides for the
untrusted users.  see README for more information

Version 2.6.5.2
---------------

[20030906.0000] jonz: insecure parameter check

insecure parameter check; checks parameters for insecure characters:
| ; < > ` 
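
A check of this kind can be written with strpbrk() from the C standard
library.  The sketch below is an assumption about the shape of such a check,
not the code in dspam.c; has_insecure_chars is a hypothetical name.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical check: returns 1 if arg contains any of the characters
 * | ; < > ` and 0 otherwise (including when arg is NULL).
 */
int has_insecure_chars(const char *arg)
{
    return arg != NULL && strpbrk(arg, "|;<>`") != NULL;
}
```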

Version 2.6.5.1
---------------

[20030905.1105] jonz: partitioned insecure functions

partitioned potentially insecure functions to require the configure flag 
--enable-insecure-functions to be set to activate.  these include:

--deliver-cmd
--quarantine-cmd

special attention needs to be given to the execution permissions of the dspam
agent when enabling these functions, to avoid users being able to execute
arbitrary commands on the server.  these functions are potentially insecure
and could lead to the execution of arbitrary code if exploited by a malicious
user or CGI.

[20030905.0418] jonz: fixed bug: from header corruption

if MTA is passing in From headers, they were being corrupted by DSPAM's
header parsing.  fixed to specifically parse From headers differently

[20030904.1422] jonz: fixed bug with quoted-printable decoding

fixed a small bug that would fail to decode a quoted character immediately
following a line break

[20030904.1127] awn: c89 compatibility

C89 compatibility patch is applied.  Patch author: Albert Chin-A-Young
<china@thewrittenword.com>

	* configure.ac, base64.c, decode.c, dspam.c, error.c,
	error.h, libdspam.c, localdb.c, lock.c, signature.c,
	tools/dspam_dump.c: Allow building with a C89 compiler
	which does not have ISO varargs.

[20030904.1046] awn: work around Solaris' make

tools/Makefile.am doesn't use the $< automatic variable because Solaris
make (at least some versions) doesn't support it.

[20030904.0700] jonz: segfaulting on _ds_message_destroy

fixed a bug where destroying CTX->message caused a segfault.  fortunately, this
bug would have never been reached by the agent or the api.

[20030904.0700] jonz: nfs locking

modified lock.c to work over nfs mounts, only checking pid when hostname 
matches.  maximum 20-minute stale lock removal.
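
The decision described here might look like the sketch below.  It is one
reading of the entry, not the code in lock.c: lock_is_stale and its parameters
are hypothetical, and treating any lock older than 20 minutes as removable
regardless of host is an assumption.

```c
#include <string.h>

#define STALE_LOCK_SECONDS (20L * 60L)   /* 20-minute maximum lock age */

/*
 * Hypothetical decision function: returns 1 if an existing lock may be
 * removed, 0 if it must be respected.
 *
 *   lock_host   - hostname recorded in the lock file
 *   local_host  - hostname of the machine doing the check
 *   pid_alive   - nonzero if the recorded pid is running locally
 *   age_seconds - time since the lock was created
 */
int lock_is_stale(const char *lock_host, const char *local_host,
                  int pid_alive, long age_seconds)
{
    /* Assumption: locks past the maximum age are always removable. */
    if (age_seconds >= STALE_LOCK_SECONDS)
        return 1;

    /* A pid is only meaningful on the host that created the lock. */
    if (strcmp(lock_host, local_host) == 0 && !pid_alive)
        return 1;

    return 0;
}
```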
 
[20030903.1716] awn: dspam_corpus and dspam_genaliases update

dspam_corpus and dspam_genaliases now use the real path to the dspam binary
instead of assuming the default /usr/local/bin/dspam.

dspam_genaliases now outputs the aliases table to stdout by default.
Use the new `-o filename' or `--output filename' option to redirect it to
a file.

dspam_genaliases generates `nospam-USER' aliases in addition to the
`spam-USER' aliases now.

[20030903.0145] jonz: fixed memory leak in dspam agent

fixed internal memory leak in dspam agent where CTX->message was not destroyed.
only leaked until dspam agent exited, then memory was reclaimed

[20030903.0145] jonz: updated example.c 

updated example.c to show correct CTX->message destruction

[20030903.0115] jonz: fixed bug in false positive reporting

fixed bug where innocent_hits incremented twice on false positive report

Version 2.6.5
-------------

[20030902.0000] jonz: added --version commandline parameter

added --version commandline parameter to display version; -v is not used as
it could be a passthru parameter to an LDA.

[20030902.0000] awn: dspam_purge changes

minor fixes to dspam_purge tool

[20030901.0000] awn: configure changes

- implemented checks (and use of results) for <sys/time.h> <time.h> 
- checks for math.h and fabs() were added; -lm is used where needed
- aesthetic changes

[20030901.0000] awn: removed compiler warnings

removed "no previous prototype" warnings with some compilers

[20030901.0000] awn: compiler warnings

miscellaneous changes to remove some compilation warnings

Version 2.6.5-rc1.1
-------------------

[20030831.0000] jonz: debug output

removed left over debug output

Version 2.6.5-rc1
-----------------

[20030829.0000] jonz: fixed broken rfc attachments

compensated for messages that break RFC with embedded attachments, where the
original message should've been message/rfc822 but was instead attached as
text/plain.  this caused attachments to be processed, consuming large amounts
of time.  decode.c modified to accept a new boundary definition from any header.

[20030829.0000] jonz: --corpus flag foregoes message delivery/quarantine

use of the --corpus flag will now prevent the messages fed in as corpus from
being delivered/quarantined

[20030829.0000] jonz: added commandline delivery override

commandline flags --deliver-cmd and --quarantine-cmd added to override the
default behavior for delivery (MLOCAL) and quarantine (either MLOCAL or
quarantine depending on configuration).  syntax:

dspam --deliver-cmd "/path/to/cmd -flags" 
dspam --quarantine-cmd "/path/to/cmd -flags"

(be sure not to use = sign).

when overridden values are used, the user id is by default NOT passed through to
the called program.  use --with-passthru to pass ARG_USER %USER through to
the called program.  example:

dspam --deliver-cmd "/bin/cat" --with-passthru

actually calls: /bin/cat -d [username]

dspam --deliver-cmd "/bin/cat"

actually calls: /bin/cat

[20030829.0000] jonz: signature insertion moved inside body tag

dspam signature now inserted (wherever possible) inside HTML body tags to
avoid droppage under certain conditions.

[20030829.0000] jonz: changed dspam signature

dspam signature changed to a visible signature to work with clients that
reformat only visible data (Eudora).  new signature:

!DSPAM:[SERIAL]!

Version 2.6.5-beta-2
--------------------

[20030826.1800] jonz: added --enable-delivery-to-stdout option

added --enable-delivery-to-stdout option which causes all delivered messages
to be printed to stdout rather than piped to an LDA.  if you wish to have spams
printed to stdout as well, use the --enable-spam-delivery option in 
conjunction.

[20030825.0031] jonz: signature attachment mode

coded signature-attachments mode, rewriting messages to include a dspam
signature attachment with full data, instead of writing the server-side
attachment.  use --enable-signature-attachments to enable. 

[20030824.2345] jonz: application/dspam-signature media type

added application/dspam-signature media type recognition

Version 2.6.5-beta-1.1
----------------------

[20030823.2010] jonz: fixed bug for empty headers

fixed a bug where segments with empty headers would be dropped in reassembly 
(currently these only seem to appear in mailer-daemon messages)

Version 2.6.5-beta-1
--------------------

[20030823.1804] jonz: groups now share same signature file

groups now share same signature file enabling them to use a single group alias 
for forwarding spams.

[20030823.1339] jonz: added new configure flags

--enable-homedir-dotfiles
When enabled, instead of checking for $USERDIR/$USER[.nodspam|.dspam],
DSPAM will check for a .nodspam|.dspam file in the user's home directory.
 
--enable-opt-in
Causes DSPAM to filter mail only for users with a .dspam dotfile.  The default
is opt-out, which requires a .nodspam file to exist to bypass filtering.

when using --enable-homedir-dotfiles, dspam installs as setuid root.

[20030823.1100] jonz: fixed segfaulting on signature reversal

[only affected alpha-4-internal]
fixed a bug where dspam segfaulted while reversing a signature making it
impossible to train dspam using signatures with alpha-4-internal.

[20030823.1100] jonz: added support for message/rfc822

[only affected alpha-4-internal]
added support for parsing message/rfc822 components; signature was not being
found in forwarded messages using this media type.

[20030822.0929] jonz: added fp alerts to cgi

added customizable false positive alerts to cgi.  the alerts list is
compared to message headers, and all matching messages are highlighted in
yellow.  alerts are stored as $USERDIR/$USER.alerts.

[20030822.0929] jonz: fixed decoding header bug

fixed a bug in the header decoding where the original encoding type was
reassembled into the message, instead of the decoded type.  fix only
affected alpha-4 (internal). 

[20030822.0929] jonz: moved signature append to process

moved appending of signature out of delivery_message and into the process
function, using the new message structures instead of parsing.  this also 
fixes a problem in that on memory failure, the delivery_message function
will no longer need to allocate memory.

[20030822.0016] jonz: adjusted lock timeout

adjusted lock timeout from 10 to 20 seconds.  depending on the load of your
machine, this could be set higher or lower.  the higher the setting, the less
chance of any failover deliveries being made, and the more chance of multiple
processes lined up waiting for a lock on a user's mailbox.

[20030822.0014] jonz: documentation tweaks

a few miscellaneous tweaks

[20030821.2145] jonz: added --enable-spam-delivery

added configure flag --enable-spam-delivery causing all spams to be delivered
instead of quarantined (for use with X-DSPAM header filtering)

[20030821.1935] jonz: rewrite of message post-processing

Message post-processing rewritten; including appending of signature, 
message re-write, etcetera.  

[20030821.1908] jonz: added header information

X-DSPAM-Result: Spam || Innocent
X-DSPAM-Probability: (Actual Probability)

[20030821.1820] jonz: removed CTX->copyback

CTX->copyback is now obsolete.  All base64 decoding is performed on 
CTX->message, which is available from the context, or via calling
_ds_assemble_message() function using the message structure as a parameter.

[20030821.1730] jonz: changes to DSPAM_CTX

+  struct _ds_message *message;          /* Message Components */

for compatibility with existing API, dspam_process still accepts a const char *,
however tools that already perform message actualization (such as the DSPAM
agent) can set CTX->message to the existing struct _ds_message * to avoid
reprocessing the message, and to carry over any encoding changes.

[20030821.1730] jonz: implemented new decode/actualization functions in sig

implemented use of new actualization and decoding functions [decode.c] in
dspam.c's signature scan code. 

[20030821.1729] jonz: finished block decoding functions

/* Public decode function */
char *                  _ds_decode_block(struct _ds_message_block *block);
                                                                                                                                                                   
/* Private decoding functions */
char *                  _ds_decode_base64(const char *body);
char *                  _ds_decode_quoted(const char *body);

[20030820.0015] jonz: finished preliminary message actualization

decode.c: finished preliminary actualization code (code responsible for
actualizing a message into its individual components).  experiments with
plain messages and non-embedded multipart messages succeeded.  next phase of
testing to include embedded multipart messages, including spams that are
designed to frequently break RFC.  once testing/patching is complete,
decoding routines to follow.

[20030819.0000] jonz: signature embedding changes

signatures are now embedded in every text segment of a message to
ensure they are forwarded properly

[20030818.1350] awn: fix for empty messages

(Submitted by Andrew W. Nosenk  <awn@bcs.zp.ua>)

* added check for empty data to prevent segfault

[20030817.1336] awn: configure script changes

(Submitted by Andrew W. Nosenko  <awn@bcs.zp.ua>)

* configure.ac: Work around versioning issues of some versions of
  db-4.  E.g. db_create() may not be a real function but a simple
  forwarding macro to db_create_4001().

* configure.ac: New configure option `--with-db4-libraries' (as
  pair for `--with-db4-includes')

[20030817.1230] jonz: added --disable-bias configure flag

when configure is run with --disable-bias, dspam no longer biases the
statistics in favor of innocent mail.  This may increase the filter's
effectiveness in catching spam, but could also potentially result in less
false positive protection.  some argue that eliminating bias is more
accurate, not less.

[20030815.0300] jonz: added dspam_genaliases script

a small script to create an aliases table from /etc/passwd

[20030814.1928] jonz: added large-scale directory support to tools

ported tools to support large-scale directories (see below).

[20030814.0005] jonz: added large-scale directory support

when configure is run with --enable-large-scale, dspam stores all its user
files in large-scale mode.  for example, user root's files would be stored in
/etc/mail/dspam/r/ro/root.  directories are created automatically as needed. 
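
The layout in the example can be reproduced with a single snprintf() call.
This is a sketch inferred from the one example above, not dspam's own path
code: large_scale_path is a hypothetical name, and the handling of
one-character usernames (the second level simply repeats the single
character) is an assumption.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: write base/<1st char>/<1st two chars>/<user>
 * into out.  Returns 0 on success, -1 on bad input or truncation.
 */
int large_scale_path(const char *base, const char *user,
                     char *out, size_t outlen)
{
    int n;

    if (base == NULL || user == NULL || out == NULL || user[0] == '\0')
        return -1;

    /* "%.2s" prints at most the first two characters of user. */
    n = snprintf(out, outlen, "%s/%c/%.2s/%s", base, user[0], user, user);
    if (n < 0 || (size_t)n >= outlen)
        return -1;
    return 0;
}
```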

Version 2.6.4.1
---------------
                                                                                
[20030816.2352] jonz: parse fix for boundaries with spaces
                                                                                
added fix for multipart emails with spaces in the boundary definition
(e.g. boundary= "blah").  Discovered in some of the newer 'Urgent Response'
type spams.

Version 2.6.4
-------------

[20030809.1115] jonz: corpus spams marked as misses

spams learned through dspam_corpus are now marked as misses instead of 
caught spam.

[20030808.1945] jonz: changes to header processing

Message-ID is now considered for useful information.  Received header is now
considered, but parsed in a different manner preserving IP addresses and
other useful information.

[20030808.1945] jonz: blank signatures will no longer get written

blank signatures are a result of a failover passthrough for a particular
user.  dspam has been changed to not write a signature if the signature
itself is blank, preventing <!DSPAM:> from appearing in an email.

[20030808.1945] jonz: added .nodspam file functionality

in an attempt to conserve disk space, a username.nodspam file may be
touched in the /etc/mail/dspam directory, which will cause all messages
for that user to be passed through dspam and not processed.  this will
prevent a dictionary or signature file from being built and save disk
space.  users wishing not to use dspam can still simply not use it,
but dropping a .nodspam file will prevent any files from being created. 

[20030805.1630] jonz: fixed multiple header destroy calls

fixed bug where the header nodetree was destroyed a second time in some errors
that cleaned up and returned, causing a segmentation fault.

[20030805.1400] jonz: added quoted-printable decoding

added quoted-printable decoding; decodes hex codes into actual characters.
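
For readers unfamiliar with the encoding, a minimal quoted-printable decoder
is sketched below.  It is not the decoder in dspam; qp_decode and hexval are
hypothetical names.  It turns =XX hex escapes into bytes, removes soft line
breaks (an equal sign at the end of a line), and copies a malformed escape
through unchanged.  Note that an escape of =00 would embed a NUL byte in the
returned string.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Value of a hex digit, or -1 if c is not one. */
static int hexval(int c)
{
    if (isdigit(c))
        return c - '0';
    c = toupper(c);
    if (c >= 'A' && c <= 'F')
        return c - 'A' + 10;
    return -1;
}

/*
 * Hypothetical decoder: returns a newly allocated, NUL-terminated copy
 * of in with quoted-printable escapes decoded.  The caller frees the
 * result.  Returns NULL on NULL input or allocation failure.
 */
char *qp_decode(const char *in)
{
    size_t i = 0, n = 0, len;
    char *out;

    if (in == NULL)
        return NULL;
    len = strlen(in);
    out = malloc(len + 1);          /* decoded text is never longer */
    if (out == NULL)
        return NULL;

    while (i < len) {
        if (in[i] != '=') {
            out[n++] = in[i++];
        } else if (in[i + 1] == '\n') {
            i += 2;                 /* soft line break: "=\n" */
        } else if (in[i + 1] == '\r' && in[i + 2] == '\n') {
            i += 3;                 /* soft line break: "=\r\n" */
        } else {
            int hi = hexval((unsigned char)in[i + 1]);
            int lo = hi >= 0 ? hexval((unsigned char)in[i + 2]) : -1;
            if (lo >= 0) {
                out[n++] = (char)(hi * 16 + lo);
                i += 3;
            } else {
                out[n++] = in[i++]; /* not a valid escape: keep '=' */
            }
        }
    }
    out[n] = '\0';
    return out;
}
```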

[20030805.1230] jonz: documentation correction for dspam_corpus

dspam_corpus uses --addspam flag, not -a anymore

[20030805.1200] jonz: added verbose debugging option

added --enable-verbose-debug for verbose debugging information to be written
to /tmp/dspam.debug

[20030805.1200] jonz: new line unbreaking code

new line unbreaking code to unbreak only quoted-printable lines

Version 2.6.3
-------------

[20030801.0930] jonz: debug after context destruction

fixed a bug in dspam.c that reported debug information for a context
after it had been destroyed.

[20030801.0930] jonz: dspam_clean to create new databases

dspam_clean tool rewritten to create new databases when called in the same 
fashion as dspam_purge.  this helps keep the databases in good health and
smaller filesize.
 
[20030801.0900] jonz: fix for PGP signatures

fixed formatting bug causing PGP signatures to be corrupted.  fix required
removing line unbreaking from message which could potentially cause dspam to
lose one or two signatures when messages are being forwarded from Microsoft
Outlook.  does not appear to be a significant issue.

[20030801.0900] jonz: fix for unchecked malloc calls

fixed two unchecked malloc calls
=> struct nt *nt_create(int nodetype)
=> struct nt_node *nt_add(struct nt *nt, void *data)

submitted by Thomas Lussing <lussnig@smcc.net>

[20030731.0852] jonz: added syslog logging 

added syslog logging using mail facility

[20030730.2323] jonz: documentation addition for username case

  added this to the README:

  NOTE: Some authentication mechanisms are case insensitive and will
   authenticate the user regardless of the case they type it in.  DSPAM,
   on the other hand, is case sensitive and the case of the username used
   will need to match the case on the system.  If you suffer from this
   authentication problem, and are certain all of your users' usernames are
   in lowercase, you can add the following line of code to the CGI right
   after the call to &ReadParse...

   $ENV{'REMOTE_USER'} = lc($ENV{'REMOTE_USER'});

[20030730.2311] jonz: fixed bug in dspam_stats

fixed formatting bug in dspam_stats causing problem with usernames > 16 
characters.  submitted by Stuart Gathman <stuart@bmsi.com>

Version 2.6.2.03
----------------

[20030729.2205] jonz: fixed more line parsing bugs

fixed some additional bugs in line parsing which may have caused some emails
to appear blank in Microsoft Outlook

Version 2.6.2.02
----------------

[20030729.0225] jonz: internal cleanup

removed unused variables and added prototypes for some functions lacking them

[20030729.0225] jonz: implemented strsep to fix processing snag

large messages resulted in significant processor consumption due to previous
method of splitting up messages line-by-line.  strsep now implemented to remove
this bottleneck.

Version 2.6.2.01
----------------

[20030710.1000] jonz: fixed bug in dspam_stats

dspam_stats now reports TS (total spams) as total spams minus spam misses.

[20030710.1000] jonz: fixed bug in false positives

fixed a bug where false positives reported without a signature would fail to
decrease the total number of spams.  this should never occur when using dspam
itself; the fix only matters for third party software using the dspam
library.

[20030710.1000] jonz: added support for reusable contexts

added support for reusable contexts, enabling a context to be processed 
multiple times.

[20030704.1827] jonz: fixed condition in chomp

fixed a condition in chomp where it could potentially cause a segmentation
fault if called with a NULL pointer, or a string with zero length.  this
should never occur anyway considering the calling code.
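
A defensive version of such a routine is sketched below.  It is an
illustration of the two guards mentioned above, not dspam's implementation;
the name chomp_safe and the choice to strip both CR and LF are assumptions.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical routine: strip trailing '\r' and '\n' characters in
 * place.  Safe to call with NULL or an empty string.  Returns the
 * number of characters removed.
 */
size_t chomp_safe(char *s)
{
    size_t len, removed = 0;

    if (s == NULL)                  /* guard: NULL pointer */
        return 0;

    len = strlen(s);
    /* guard: len > 0 is tested before s[len - 1] is read */
    while (len > 0 && (s[len - 1] == '\n' || s[len - 1] == '\r')) {
        s[--len] = '\0';
        removed++;
    }
    return removed;
}
```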

Version 2.6.2
-------------

[20030701.0000] jonz: added DSF_CLASSIFY flag

added DSF_CLASSIFY flag to libdspam.  use of this flag causes libdspam _not_ to
record statistics for a specific operation, but only to evaluate and return
the operation's result.
 
[20030701.0000] jonz: fixed bit assignment bug

fixed a bit assignment bug resulting in clearing of all flags when headers
ignored
submitted by Stuart D. Gathman [stuart@bsmred.dmsi.com]
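
The entry does not say what the faulty statement was.  One common way to lose
every other flag is to assign a bit with = where |= was intended; the sketch
below shows operations that leave the other bits alone.  The flag names and
values are hypothetical and are not the DSF_ constants from libdspam.

```c
/* Hypothetical flag bits, for illustration only. */
enum {
    EX_FLAG_CORPUS       = 0x01,
    EX_FLAG_CLASSIFY     = 0x02,
    EX_FLAG_IGNOREHEADER = 0x04
};

/* Set one flag and keep every other bit (flags = flag would drop them). */
unsigned set_flag(unsigned flags, unsigned flag)
{
    return flags | flag;
}

/* Clear one flag and keep every other bit. */
unsigned clear_flag(unsigned flags, unsigned flag)
{
    return flags & ~flag;
}

/* Returns 1 if flag is set in flags, 0 otherwise. */
int has_flag(unsigned flags, unsigned flag)
{
    return (flags & flag) != 0;
}
```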

[20030701.0000] jonz: fixed bugs related to corpus mail

fixed a bug causing corpus mail's headers to be ignored
submitted by Stuart D. Gathman [stuart@bsmred.dmsi.com]

Version 2.6.1.01
----------------

[20030627.1924] jonz: fixed memory free of copyback buffer

copyback buffer is now freed in dspam.c when context is destroyed

Version 2.6.1.00
----------------

[20030622.0000] jonz: added ` as delimiter

[20030620.0000] jonz: added support for group dictionaries

Group dictionaries enable a group of users with similar email behavior to
share the same dictionary while still maintaining a private quarantine box.
Please see README for more information.

[20030620.0000] jonz: added dspam_stats tool

The dspam_stats tool can be used to display the statistics for one or all
users on the system.  Please see README for more information.

Version 2.6.0.69
----------------

[20030618.0000] jonz: line unbreaking correction

correction made to line unbreaking to sanity check for consecutive
equal signs

Version 2.6.0.68
----------------

[20030612.0000] jonz: change to configure tool

changed configure tool to look for db_strerror instead of
db_env_create in the event that libdb was built without
environmental functions

Version 2.6.0.67
----------------

[20030609.0021] jonz: bugfix in line unbreaking

fixed a bug in line unbreaking (where clients use an equal sign
followed by a carriage return to break up long lines) causing
some attachments to be unreadable by some mail clients.  lines
are now only unbroken in text segments.

[20030607.1020] jonz: bugfix in attachment boundaries

fixed a small bug that wrote the boundary twice at the end of
an attachment

Version 2.6.0.66
----------------

[20030603.1900] jonz: bugfix in line unbreaking

fixed a bug in line unbreaking (where clients use an equal sign 
followed by a carriage return to break up long lines) causing 
unquoted signatures ending with an equal sign to be malparsed,
causing the email to become slightly jumbled.

[20030603.1800] jonz: DSF_CORPUS flag

added DSF_CORPUS flag for processing messages that are from corpus; 
prevents innocent totals/hits from being subtracted when spam corpuses
are fed in. 

Version 2.6.0.65 
----------------

[20030601.0000] jonz: bugfix for locking

a bug in the locking mechanism for tools fixed; occasionally could cause
a corrupt dictionary

Version 2.6.0.64
----------------

[20030525.2300] jonz: bugfix for boundaries

fixed a bug causing boundaries ending in == to be parsed incorrectly
fixed a bug in parsing boundaries that used = without quotes

[20030523.2300] jonz: bugfix for attachments

fixed bug causing attachments to be dropped

[20030523.2300] jonz: optimizations for large databases

increased database cache to 4MB and implemented alternative btree
sorting routine to greatly speed up database functions

[20030523.2000] jonz: addition of libtool/shared libs

libtool is now implemented to build a shared libdspam library.

[20030523.1830] jonz: bugfixes

bugfix for multipart messages that caused message to be truncated
bugfixes to signature management causing some segfaults
bugfixes to crc64 calls, some calls returned a different crc every time

[20030523.0100] jonz: partial rewrite

Rewrote dspam engine into libdspam, enabling developers to link in libdspam
to provide "drop-in" spam filtering for their projects.

Migrated to 64-bit tokens; previous 2.6-Beta databases using 32-bit tokens
will not work with this new version.

Server-side-signature presently the only signature storage method; looking
into a different method of incorporating signature in emails.

Implemented tracking of spam misses and false positives.  Reported in the CGI.

[20030521.2315] jonz: url tokens ignored outside of urls

tokens found inside urls are ignored as individual tokens, and only 
represented as Url*token.

[20030520.0200] jonz: bugfix for base64 decoding

fixed a bug that failed to decode non-multipart base64 messages

[20030519.0000] jonz: ignore all html tags without spaces

ignore all html tags without spaces; frequently used to separate tokens

[20030519.0000] jonz: ignored collapsible html tags 

collapsed (rather than overwrote) html tags to join together tokens that
some spammers use such tags to separate.  

[20030518.1500] jonz: addition of dspam_crc tool

dspam_crc tool converts a string into the numeric crc used for storage in
the dspam dictionary; makes it easier to use dspam_dump and grep for a 
particular token

[20030517.1930] jonz: bugfix for as_spam signature

fixed a bug causing the signature not to be displayed
on messages marked as spams

[20030517.1300] jonz: bugfixes 

fixed bugs in signature storage (delete .sig files to fix)
fixed bugs in dspam_purge
fixed bugs causing segfault under some circumstances

[20030516.0052] jonz: exim documentation corrections by Jerome Alet

Exim configuration applies to directors, not routers

[20030516.0020] jonz: massive rewrite and optimizations

addition of tbt and lht dynamic data structures
rewrite of debugging functions
rewrite of database functions
conversion to crc32 long integers for token management
addition of dspam_convert to convert old databases
renamed dbdump to dspam_dump, removed dbset/dbdelete

these rewrites/optimizations convert all tokens to numeric (long)
values, making processing and sorting much faster.  tbt implements
a binary tree sorting mechanism eliminating qsort.  storing tokens
in numeric format also removes the necessity for the zlib compression
library.
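
a rough Python sketch of the idea, not dspam's code: each token string is
reduced to a fixed-size integer and all bookkeeping is keyed by that integer.
binascii.crc32 stands in for dspam's own crc routine, and the names token_key
and count_tokens are hypothetical:

```python
import binascii
from collections import Counter

def token_key(token: str) -> int:
    """Map a token string to an unsigned 32-bit integer key."""
    # Assumption: any stable CRC works for the illustration.
    return binascii.crc32(token.encode("utf-8"))

def count_tokens(tokens):
    """Count occurrences keyed by the numeric value instead of the string."""
    return Counter(token_key(t) for t in tokens)
```

integer keys compare and sort in constant time per comparison and have a
fixed storage size, which is the motivation given above.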

[20030514.1500] jonz: bugfix in content identification

small bugfix in content identification that led some emails to miss a
dspam signature

[20030514.1500] jonz: error message output added to debug

error messages previously only made it to stderr.  when --enable-debug
option is used, errors are also printed to debug

Version 2.5.4 - May 14 2003
---------------------------

[20030514.0240] jonz: added autoconf support contributed by Andrew W. Nosenko

thanks to Andrew W. Nosenko for contributing the files/patches to provide
autoconf support to dspam.  please read the README file for instructions.

[20030514.0200] jonz: changed hash to support ints

hash.c modified to support ints or character pointers.  makes tracking
token frequency much faster.

[20030513.2345] jonz: bug in dspam_clean corrected

corrected a bug in dspam_clean causing it to fail

[20030513.2300] jonz: experimental tokenized rules

playing with a few experimental tokenized rules

[20030513.2300] jonz: freebsd makefile setuid root

modified the freebsd makefile to install as setuid root.  this is due to 
freebsd's mail.local requiring the ability to change its uid.  dspam will
not work correctly on the commandline (for example when reporting false 
positives)

[20030513.0325] jonz: changed probabilities for single-corpus tokens

probabilities of 0.0100 and 0.0101 were previously assigned to tokens
appearing only in the innocent corpus.  this has been changed to
0.0099 and 0.0100 to balance out the 0.9900 and 0.9901 used for tokens
that appear only in the spam corpus.  this very small change corrected
3 false positives that appeared.

[20030513.0250] jonz: added documentation for exim

documentation thanks to David Shirley 

[20030512.1930] jonz: applied changes submitted by Andrew W. Nosenko

(DELIMITERS): Plain `^M' character is replaced by appropriate
	escape sequence `\r' for avoiding gcc-3.2.2 warning "multi-line
	string literals are deprecated"

(MAX_FILENAME_LENGTH, MAX_USERNAME_LENGTH): Use system-defined
	limits when available (for example max. filename length under
	Linux is not 128 as hardcoded, but 4096).

(USERDIR): Define USERDIR only if not defined somewhere else
	(e.g. from command line).  Very convenient for building binary
	package.

Version 2.5.3 - May 12, 2003
----------------------------

[20030512.1430] jonz: bugfix for ignored headers

a bug was fixed that caused all headers to be ignored if a message was stored
as a raw message in the signature database.

[20030512.1400] jonz: embedded boundary recognition

added embedded boundary recognition to recognize emails with embedded boundaries,
such as those sent by Eudora when special formatting is enabled.
 
[20030512.1200] jonz: documentation

added better documentation for the correct permissions of the dspam 
directories and the correct group memberships for the MTA user. 

[20030512.1200] jonz: locking bugfix

fixed bug in locking that caused a loop if a lockfile could not be created 
(due to file permissions).  also increased lock debugging verbosity.

[20030511.2025] jonz: false positives adjustment

false positives reported now hit a token 3 times innocent instead of 2,
for faster re-learning.

[20030511.2010] jonz: header parsing bug

fixed a header parsing bug that did not carry the original header name
across multiple lines, for example the Received header.
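
a small Python sketch of carrying a header name across continuation lines,
offered as an illustration rather than dspam's parser.  the function name
unfold_headers is hypothetical, and joining continuations with a single space
is an assumption:

```python
def unfold_headers(lines):
    """Return (name, value) pairs, attaching continuation lines to the
    header that precedes them."""
    headers = []
    for line in lines:
        if line[:1] in (" ", "\t") and headers:
            # Continuation line: it belongs to the previous header name.
            name, value = headers[-1]
            headers[-1] = (name, value + " " + line.strip())
        elif ":" in line:
            name, _, value = line.partition(":")
            headers.append((name.strip(), value.strip()))
    return headers
```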

[20030511.1945] jonz: dspam_purge complete

dspam_purge completed and expanded to delete old non-qualifying tokens
and defragment/shrink user dictionaries

[20030511.1945] jonz: rewrite of dspam tools

dspam tools rewritten to support new spam_record structure. 

[20030511.1945] jonz: implementation of struct spam_record

new spam_record structure implemented for database storage; include last
hit date for new purge tool.  subroutines backward compatible to work
with old databases.

[20030511.1827] jonz: bugfix for lock sleep

fixed a bug that caused all dspam processes to sleep for 1 second, even
if a lock was successfully acquired on the first try.
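
a Python sketch of the intended behaviour, under stated assumptions: the lock
is a file created exclusively, the process sleeps only after a failed attempt,
and a permission error ends the attempt instead of looping.  acquire_lock,
release_lock, and the retry/delay parameters are hypothetical names, not
dspam's:

```python
import os
import time

def acquire_lock(path: str, retries: int = 5, delay: float = 1.0) -> bool:
    """Create the lockfile exclusively; sleep only after a failed attempt."""
    for attempt in range(retries):
        try:
            fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
        except FileExistsError:
            # Another process holds the lock: wait, then try again.
            if attempt < retries - 1:
                time.sleep(delay)
            continue
        except PermissionError:
            # The lockfile can never be created here, so do not loop.
            return False
        os.close(fd)
        return True
    return False

def release_lock(path: str) -> None:
    """Remove the lockfile so other processes can proceed."""
    os.remove(path)
```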

[20030511.1719] jonz: addition of probability information to spams

messages marked as spam now include the tokens and probabilities used in
the message

[20030511.1600] jonz: body tag filtering

now ignoring body tags.  the only frequently used tags that are being 
considered are font, img, and meta

Version 2.5.2 - May 11, 2003
----------------------------

[20030510.1615] jonz: token word joins with punctuation

token word joins modified to include dollar signs and exclamation points. for
example:

$S A V E$

previously would result in 3 tokens: $S, AV, E$ but now results in one: $SAVE$

[20030510.1500] jonz: bugfix for multipart boundary

fixed a bug causing multipart boundaries not to be detected when defined
without using quotes.  this resulted in the dspam signature (or identifier)
never making it into the message.  for example:

Content-Type: multipart/alternative; 
  boundary='~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~'

is now detected correctly
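
a hedged Python sketch of boundary extraction that accepts both quoted and
unquoted values.  it is not dspam's decode logic; extract_boundary is a
hypothetical name, and treating single quotes as part of an unquoted value is
an assumption:

```python
import re

# Either a double-quoted value, or a bare run up to whitespace or ';'.
_BOUNDARY = re.compile(r'boundary\s*=\s*(?:"([^"]*)"|([^\s;]+))', re.IGNORECASE)

def extract_boundary(content_type: str):
    """Return the boundary parameter of a Content-Type value, or None."""
    m = _BOUNDARY.search(content_type)
    if m is None:
        return None
    quoted, bare = m.groups()
    return quoted if quoted is not None else bare
```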

[20030510.0035] jonz: additional filtering

added additional filtering to ignore words with control characters, 
numbers that are not prefixed with $ or end with %, and any tokens that
do not begin with an alphanumeric character, with the exception of $ and #.
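
one possible reading of these rules as a Python predicate.  this is a sketch,
not dspam's filter: keep_token is a hypothetical name, and the definition of
"number" (digits with optional '.' or ',') is an assumption:

```python
def keep_token(tok: str) -> bool:
    """Return True if the token survives the filtering rules above."""
    if not tok:
        return False
    # Rule 1: drop words containing control characters.
    if any(ord(c) < 32 or ord(c) == 127 for c in tok):
        return False
    # Rule 3: first character must be alphanumeric, '$' or '#'.
    if not (tok[0].isalnum() or tok[0] in "$#"):
        return False
    # Rule 2: plain numbers are dropped unless written as $N or N%.
    core = tok.lstrip("$").rstrip("%")
    is_number = core.replace(".", "").replace(",", "").isdigit()
    if is_number and not (tok.startswith("$") or tok.endswith("%")):
        return False
    return True
```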

[20030510.0020] jonz: bug fix for lock failures

a bug has been fixed that caused dspam to loop, sending multiple emails
in the event of a lock failure

[20030509.2100] jonz: Makefile for FreeBSD

added makefile for freebsd

[20030509.2015] jonz: procmail fix

added small fix to accommodate some procmail implementations 
that require an empty argument after -a

[20030509.0130] jonz: addition of dspam_purge

please see README for more details

[20030509.0130] jonz: tools to output to stderr

dspam tools to output to stderr

[20030509.0130] jonz: removed probability from db storage

removed the 13-character probability from the hash databases; was 
taking up considerable space and wasn't necessary for the calculation.
is backwards compatible, so there is no need to delete any db's.

[20030509.0040] jonz: ! is now treated as a delimiter

the ! character has been added to the delimiter list

[20030508.2330] jonz: added .lock locking mechanism 

added a .lock locking mechanism to prevent database corruption and/or
quarantine mailbox corruption.

[20030508.1915] jonz: filtering of boundaries

multipart boundaries are now filtered

[20030508.1800] jonz: token word joins

if a token is only one character long, and is adjacent to other similar
tokens, each token will be joined to create a single token.  for example

V I A G R A

will be tokenized as "VIAGRA"
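
the join can be sketched in Python as follows.  this only illustrates the
rule stated above; join_single_chars is a hypothetical name and the input is
assumed to be already split into tokens:

```python
def join_single_chars(tokens):
    """Merge each run of adjacent one-character tokens into one token."""
    out, run = [], []
    for tok in tokens:
        if len(tok) == 1:
            run.append(tok)      # extend the current run
            continue
        if run:
            out.append("".join(run))
            run = []
        out.append(tok)
    if run:
        out.append("".join(run))  # flush a run at the end of input
    return out
```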

[20030508.1800] jonz: header array abolished

the array holding each header line has been replaced with a nodetree
(dynamic data storage)

[20030508.0800] jonz: bugfix for dspam_clean

dspam_clean segfaulted after processing the first user signature file.  this
was due to an invalid database handle being closed.  the correct handle is
now used

Version 2.5.1; May 8 2003:
--------------------------

[20030508.0045] jonz: bugfix for inline comments

inline comments normally used to break up guilty spam words such as
S<!1234>E<!1234>X<!1234>

were only partially filtered, leaving gaps between the letters and causing 
DSPAM to miss the whole word.  this has been corrected to eliminate the space
the comments previously used, bringing the words together for calculation.
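
a short Python sketch of the corrected behaviour, assuming the comments take
the <!...> form shown above.  strip_inline_comments is a hypothetical name and
the regular expression is an illustration, not dspam's parser:

```python
import re

def strip_inline_comments(text: str) -> str:
    """Delete <!...> comments outright so the surrounding letters rejoin."""
    # Replacing with "" (not a space) is what brings the letters together.
    return re.sub(r"<![^>]*>", "", text)
```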

[20030508.0025] jonz: strdup() overusage

if only one destination user is specified, strdup() is not used to duplicate 
the original header/body pairs to pass to process_user()

[20030507.1130] jonz: bugfix for multiple users

when multiple users are specified in the local mailer parameters, the first
user process, due to a bug in setting ADD_AS_SPAM, determined whether the
message was spam for all other users.  ADD_AS_SPAM is now reset to its original
value prior to each user's calculation.

[20030507.2200] jonz: increased html filtering

<div and <p html tags are now ignored

Version 2.5; May 7 2003:
------------------------

[20030507.0500] jonz: increased html filtering

td, tr, and table tags are now ignored

[20030507.0500] jonz: increased bare corpus safeguards

the following safeguards have been implemented to prevent false positives
in immature corpuses:

- the minimum number of hits for a token to register at anything above .40
  has been raised from 5 to 20 if the user has fewer than 500 innocent
  messages
- if the user has fewer than 1000 messages, the minimum number of hits
  is equal to 5 + (the spam ratio / 2)

[20030507.0500] jonz: commandline multiple user support 

multiple users on the same commandline (e.g. -d user1 user2 user3) are now 
processed individually.  prior to this, only the first user was processed 
(even though the message was delivered to all users).  this results in each
user having their own unique record of the message in their dictionary and 
signature.

[20030507.0500] jonz: libdb1 -> libdb4 migration

libdb 4 has been implemented after running into some problems with db1 
segmentation faults on large record insertions. as a result, to upgrade to 
this and all newer versions, it will be necessary to delete all existing user 
databases on the system. libdb4 can be found at www.sleepycat.com. it should 
be relatively easy to re-code the db functions for db2 or db3, if the 
administrator doesn't want to use db4. 

[20030506.0400] jonz: buffer.c memcpy implementation

modified buffer.c to use memcpy() instead of strcat() resulting in a 
_significant_ speed increase. the delay caused by strcat() in messages 
with large attachments resulted in message parse times of 20+ seconds. 
using memcpy(), parse time is down to a fraction of a second. 
this fix addresses issues with dspam on low-end machines.

[20030506.0400] jonz: server-side storage options

if a token string is longer than the original message, the original message
is stored on the server instead and re-parsed.

[20030506.0400] jonz: zlib compression library

zlib (-lz) is now used to compress server-side signatures. zlib can be found 
at http://www.gzip.org/zlib/.  if you will not be using server-side
signatures, remove the -lz library flag from the makefile.

[20030504.0400] jonz: server-side signatures 

server-side token signatures (SSTS) have been implemented with an optional 
compile flag (set by default). using SSTS will eliminate long, annoying 
DSPAM signatures at the expense of server disk space. the signature appended 
to each email is replaced with a single comment to include a reference token. 
this also enables the complete set of tokens from a message to be recorded 
(although only the top 15 are used in actual calculation).   

compiling without SSTS mode enabled will only record 15 or 60 tokens from a
message, depending on whether more than 5 tokens are recognized.  SSTS mode
will record all tokens.  in either mode, only the most interesting 15 tokens
are used in the calculation.

[20030504.0400] jonz: chained tokens

chained tokens have been implemented providing several new analysis features. 
for example the text 'FREE FOR ALL' will parse into five tokens: 

FREE
FOR
ALL
FREE FOR
FOR ALL

this parsing is not specific to just words, but any type of valid token. 
please read the white paper at: 

http://www.networkdweebs.com/products/dspam/Chained_Tokens.pdf 

...for more information.
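
the example above can be reproduced with a few lines of Python.  this is a
sketch of the pairing rule only; chained_tokens is a hypothetical name and the
input is assumed to be already tokenized:

```python
def chained_tokens(tokens):
    """Return the single tokens followed by every adjacent pair."""
    pairs = [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]
    return list(tokens) + pairs
```

for n tokens this yields n singles plus n - 1 pairs, which is where the extra
disk usage comes from.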

[20030504.0400] jonz: token precedence

words not appearing in the opposite corpus were previously assigned a 
probability of .99 or .01. now, priority is given to a token that appears 
more than ten times in a single corpus.  

[20030504.0400] jonz: token case

previously, tokens were case insensitive unless they were in all caps. now,
all tokens are case sensitive. 

[20030504.0400] jonz: short html tags

short HTML tags (less than 15 characters) are filtered out. this helps 
prevent false positives that could be caused by a lack of HTML-based email 
in an innocent corpus. it is normally not desirable behavior to assign a 
higher probability of spam to a message simply because it's in HTML, but we 
don't want to filter out all HTML so longer tags will still be tokenized. 

[20030503.0400] jonz: special tokens for urls

URLs are broken down into URL-specific tokens. for example, 
http://www.networkdweebs.com/products/dspam/ will be broken down into: 

Url*www
Url*networkdweebs
Url*com
Url*products
Url*dspam

this should help separate emails with suspicious URLs from emails with the 
same tokens outside of a URL.  
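
the breakdown above can be sketched in Python.  url_tokens is a hypothetical
name, and splitting only on '.' and '/' after dropping the scheme is an
assumption chosen to match the example:

```python
import re

def url_tokens(url: str):
    """Split a URL into Url*-prefixed tokens, dropping the scheme."""
    rest = re.sub(r"^[A-Za-z][A-Za-z0-9+.-]*://", "", url)
    return ["Url*" + part for part in re.split(r"[./]+", rest) if part]
```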

[20030503.0400] jonz: misreported number of messages in quarantine

due to a small bug, the number of messages in a quarantine box can be 
misreported. this has been fixed. 

[20030503.0400] jonz: dspam signature change

the DSPAM signature of previous versions is unfortunately rewritten 
incorrectly by some email clients such as Microsoft Outlook. The signature 
has been modified, and the signature retrieval tool has been coded with more 
of a wildcard approach, to help avoid missing reversal information. 
this only applies to administrators running DSPAM outside of its default 
SSTS mode. 

[20030503.0400] jonz: closing html tags
 
some spams fail to close their /html tag in an attempt to evade some spam 
tools. DSPAM now closes the tag to avoid the dspam signature being ignored.

[20030503.0400] jonz: ignoring of useless header information

the 'Message-ID', 'Received' and 'Date' headers are now ignored; they 
seemed to be filling up more than half the tokens with useless information 

[20030503.0400] jonz: high ascii characters

tokens with high ASCII characters are now ignored 

[20030503.0400] jonz: forwarded message headers

dspam now ignores message headers for messages forwarded by user as spam with 
no identifiable signature.  this prevents irrelevant information from being
recorded, which could lead to any message in reply to be marked as a false
positive.
 
[20030503.0400] jonz: minor code cleanup for linux build

made some minor changes to code to build without warnings on linux

[20030503.0400] jonz: required use of long --addspam flag

the shortened flag for --addspam (-a) has been removed for compatibility 
with procmail (procmail uses -a). in order to use this latest build, 
all spam-box aliases (e.g. spam-bob) must be changed to --addspam. 

[20030503.0400] jonz: flag for chained tokens

added -DCHAINED_TOKENS (enabled by default) switch; those who don't have 
the extra disk space for chained tokens can now turn them off by removing
this compile flag.

[20030503.0400] jonz: debug rework

-DDEBUG now results in debug going to /tmp/dspam.debug 

Version 2.4.1; April 29 2003
----------------------------

[20030429.0000] jonz: dspam_signature tool addition

Added dspam_signature tool for decoding dspam signatures via commandline 

Version 2.4; April 27 2003
--------------------------

[20030427.0000] jonz: signature change

changed the signature to a base64-encoded, BEGIN/END delimited signature. 
people seem to feel more comfortable with it, as it resembles the signatures 
used with PGP, Server Certs, and other encrypted signatures...it's also 
less messy. 

[20030427.0000] jonz: false positive recall mechanism

in the unlikely event of a false positive, a mechanism is now available to 
reverse the information from the false positive and email the message to the 
user. this is made possible via a button while viewing a message in the 
user's quarantine box. 

[20030427.0000] jonz: base64 decoding

new code to Base64 Decode any encoded text segments. some SPAMs being sent 
out today are encoded in an attempt to bypass any filtering.  they are
now decoded prior to analysis and delivery.  this only applies to text 
segments (text/plain, text/html, etc.) and should not affect attachments. 
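
a sketch of the same idea using Python's standard email package, not dspam's
decoder.  decoded_text_parts is a hypothetical name, and falling back to utf-8
when no charset is declared is an assumption:

```python
from email import message_from_string

def decoded_text_parts(raw_message: str):
    """Yield (content_type, text) for each text/* part, decoded."""
    msg = message_from_string(raw_message)
    for part in msg.walk():
        if part.get_content_maintype() != "text":
            continue  # attachments and multipart containers are skipped
        # decode=True applies the Content-Transfer-Encoding (e.g. base64).
        payload = part.get_payload(decode=True)
        charset = part.get_content_charset() or "utf-8"
        yield part.get_content_type(), payload.decode(charset, errors="replace")
```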

Version 2.35; April 24 2003
---------------------------

[20030424.0000] jonz: makefile correction

Makefile.linux: -ldb -> -ldb1

[20030424.0000] jonz: prefixed from line

prefixed messages headed to quarantine with a 'From' header to make mailbox
format compliant.

[20030424.0000] jonz: quarantine box showing no spams

fixed a bug that caused caught spams not to show up in quarantine box

Version 2.3; April 20 2003
--------------------------

[20030420.0000] jonz: token insertion bug

fixed a bug that occurs when inserting token information on some
multipart emails, which inserts it into the text/plain segment instead of
the text/html segment

Version 2.2; April 17 2003
--------------------------

[20030417.0000] jonz: reversal information

reversal information is now used in spams to reverse the original 15 tokens
(unlearn and relearn as spam).

Version 2.1; April 14 2003
--------------------------

[20030414.0000] jonz: production changes

applied 0.40 value to words with less than 5 hits
changed spam threshold from .8 to .9

[20030414.0000] jonz: attachments

repaired minor bug in filtering out attachments and html comments

Version 2.0; April 11 2003
--------------------------

Version 2 Initial release

