Extensive MRTG Based Monitoring for MailScanner

The Future for MailScanner-MRTG

What is this page about?

    See the title! This page is a loose collection of my thoughts about future directions this project may take. This doesn't mean that all (or even any) of the things mentioned will happen, and those that do happen (assuming any do) may not happen any time soon. This is intended as a document for discussion, and feedback is welcome (please use the Open Discussion Forum, so that others can read and contribute). I may revise this document as time progresses; I might even record those changes for all to see!

A little background

    MailScanner-MRTG was originally written by Dale Lovelace in 2002, and Dale produced the versions up to 0.05. Around September 2003 I sent Dale some patches to improve the efficiency of the log parsing routines. Dale wrote back suggesting that he was no longer able to dedicate much time to the project, and asked whether I would be interested in taking over as maintainer and perhaps adding some of the features people were requesting.

    Since October 2003 numerous changes have been made by me and others. The main focus of these changes has been improving OS and MTA support, reducing execution time (mainly by consolidating functions) and improving the accuracy and relevance of the graphs. In the process (and with a view to possible future developments) the code has been reorganised in a more modular way.

Other MailScanner monitoring programs

    There are two other popular packages for monitoring MailScanner: David While's Vispan (formerly called Mailstats) and Steve Fregard's MailWatch. Both overlap with some of the functionality of MailScanner-MRTG (and both have the advantage of shorter names!). There are also substantial differences between the programs. I don't use MailWatch (purely because I don't need its features; I have heard it is very good), but I do use Vispan because of the additional information it provides. I regularly get feature requests along the lines of "Can you add such-and-such feature like MailStats/MailWatch has?" The general answer to this is no, because:
   a) I'm not so rude as to simply copy their implementation
   b) I don't want to write code for something that has already been done
   c) Different users have different needs, and having three packages that duplicate each other's functionality isn't the best way to address that
This is one of the main reasons for this document. I want to set out what I see as MailScanner-MRTG's function, and suggest a direction for its development to provide something that is both useful and distinct.

Mission statement

   That sounds far too pretentious (and smacks of "management speak"); suggestions for a better section name are welcome.

   Anyway, I envisage that the future focus of the project will be on monitoring the health of a MailScanner installation, rather than concentrating on the actual mail, viruses, spam, etc. This will expand MSMRTG's role beyond passive monitoring (where users look at MSMRTG when they think they may have a problem) to include some active monitoring, where MSMRTG responds to events by triggering alerts.

   Any changes that are made should attempt to preserve historic data, except where that historic data is hopelessly broken(!). We will attempt to preserve all current monitoring; new features will be added alongside the existing ones.

Active monitoring

   In the first instance I intend to add functionality to send passive service checks to Nagios, based on configurable thresholds. It's possible (if there is demand) that this could be expanded to send alerts by other means. Most likely this would simply mean passing arguments to an external program that could, for example, send SMS or pager messages, play a WAV file, etc. We may even look at sending SNMP traps.
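   A minimal Python sketch of how a threshold check could be turned into a Nagios passive service check. It assumes results are written to Nagios's external command file using the standard PROCESS_SERVICE_CHECK_RESULT command. The host name, service description, thresholds and command-file path are hypothetical examples, not settings MailScanner-MRTG currently has.

```python
import time

# Standard Nagios plugin return codes.
OK, WARNING, CRITICAL = 0, 1, 2
STATE_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL"}


def classify(value, warn, crit):
    """Map a measured value onto a Nagios state, assuming 'higher is worse' thresholds."""
    if value >= crit:
        return CRITICAL
    if value >= warn:
        return WARNING
    return OK


def passive_check_command(host, service, value, warn, crit, timestamp=None):
    """Build one PROCESS_SERVICE_CHECK_RESULT line for the Nagios external command file."""
    if timestamp is None:
        timestamp = int(time.time())
    state = classify(value, warn, crit)
    output = f"{STATE_NAMES[state]} - {service} is {value} (warn={warn}, crit={crit})"
    return f"[{timestamp}] PROCESS_SERVICE_CHECK_RESULT;{host};{service};{state};{output}"


def submit(line, command_file="/usr/local/nagios/var/rw/nagios.cmd"):
    """Append the command to Nagios's command file (usually a named pipe).

    The default path is only an assumption; the real location is whatever
    the command_file setting in nagios.cfg points to.
    """
    with open(command_file, "a") as fh:
        fh.write(line + "\n")


if __name__ == "__main__":
    # Hypothetical example: a queue length of 850 with warn=500, crit=1000.
    print(passive_check_command("mailhost", "MailScanner queue", 850,
                                warn=500, crit=1000, timestamp=1076000000))
```

   Keeping the threshold logic separate from the submission step would make it easy to reuse the same checks for the "external program" alerting mentioned above.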

   It may also be worth investigating what other (non-graphable) performance data could usefully be monitored, so that alerts can be raised on that as well. In the future we could expand to permit active service checks from Nagios, and possibly even queries via SNMP.

Web interface

   The number of graphs has now grown to 'more than a screenful', even for those with huge screens and very high resolutions. There will probably be more graphs in future, so it seems sensible to do something about this. Adding the concept of state (e.g. Critical, Warning) also potentially creates extra demands on screen space. I would also like to include descriptions of the graphs, giving hints on interpreting them (someone sent me some good ones a while back), and we should endeavour to internationalise any text content. There have also been situations where it would be sensible to suppress graphs that are either blank or duplicated (for example, the disk space graphs for people with their whole installation on one partition). Providing a new web interface, preferably generated by server-side scripts (PHP looks a likely candidate), is now a priority.
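   A minimal Python sketch of one way duplicate disk-space graphs might be detected: directories whose os.stat() results report the same st_dev live on the same filesystem, so one disk-space graph per group would be enough. The function name and the list of directories passed to it are hypothetical; MailScanner-MRTG's actual set of monitored paths may differ.

```python
import os


def distinct_filesystems(paths):
    """Group directories by the device (filesystem) they live on.

    Paths sharing an st_dev value are on the same filesystem, so only one
    disk-space graph per returned group would need to be shown.
    """
    groups = {}
    for path in paths:
        dev = os.stat(path).st_dev
        groups.setdefault(dev, []).append(path)
    # dicts keep insertion order, so groups appear in the order first seen
    return list(groups.values())


if __name__ == "__main__":
    # Hypothetical example: both paths are normally on the root filesystem.
    print(distinct_filesystems(["/", "/etc"]))
```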

   If and when we do improve the web interface, it would also be a good idea to update the web site so that it keeps a consistent look throughout.

MRTG vs RRDtool

   Moving from MRTG to RRDtool requires serious consideration. As we add graphs, we increase the system load caused by regenerating every graph every five minutes, even though most of the time they are not needed. With RRDtool the graphs can be generated only when the web page requests them, and they can be made more flexible (for example, showing custom time periods). Using RRDtool would also allow us to run the MailScanner-MRTG script directly from cron, passing data straight into the RRD database instead of having MRTG repeatedly call us, which in itself would give us more control. RRDtool also handles non-integer data, which MRTG cannot. In general RRDtool is a better tool for the job. However, changing at this stage presents challenges: we would need to ensure that the transition could be achieved smoothly and that historic data could be preserved. For these reasons I have not yet decided whether this should be done.
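   A minimal Python sketch of the "run from cron and feed RRDtool directly" idea. It only builds and runs an `rrdtool update` command, where "N" means "now" and "U" marks an unknown value. It assumes the RRD file already exists (created with `rrdtool create`) with data sources in the same order as the values; the file path and the values are hypothetical.

```python
import subprocess


def rrd_update_args(rrd_file, values, timestamp="N"):
    """Build an 'rrdtool update' argument list.

    'N' asks RRDtool to use the current time; None is written as 'U'
    (unknown). Floats pass through unchanged, since RRDtool accepts
    non-integer values.
    """
    fields = [str(timestamp)] + ["U" if v is None else str(v) for v in values]
    return ["rrdtool", "update", rrd_file, ":".join(fields)]


def update_rrd(rrd_file, values):
    """Run rrdtool directly (e.g. from a cron job) instead of waiting for MRTG to poll."""
    subprocess.run(rrd_update_args(rrd_file, values), check=True)


if __name__ == "__main__":
    # Hypothetical example: load average 0.35, 12 messages, one unknown reading.
    print(rrd_update_args("/var/lib/msmrtg/load.rrd", [0.35, 12, None]))
```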

That's all, folks!

   Well, there's a fair bit of work just in the above, so I think I'll leave it there for now. Please feel free to discuss the issues above in the forum; I'll be very interested to hear your feedback.

Kevin Spicer
February 2004