So you have set up some services, and now you want to make sure the host answers ping and also ensure that the web server is responding. But hang on a minute: you only want to check the web server when the router is pinging. And I don't care whether my router is down. And who lets me know when they come back up? You can see how everything slowly starts becoming complicated.
This is where our friend mon comes in. So why use mon?
Let's just get our hands dirty and install the mon daemon. For this tutorial I will be using Fedora.
yum install mon
The above installs the binaries and configuration files as required. Let's start by looking at the configuration file in /etc/mon/mon.cf. This is the main template that defines what our monitors will be. So let's dig in. There are three main parts to the mon.cf file: the global settings, the hostgroup definitions and the watch definitions. The global settings come first:
cfbasedir = /etc/mon
pidfile = /var/run/mon.pid
statedir = /var/lib/mon/state.d
logdir = /var/lib/mon/log.d
dtlogfile = /var/lib/mon/log.d/downtime.log
alertdir = /usr/lib64/mon/alert.d
mondir = /usr/lib64/mon/mon.d
maxprocs = 20
histlength = 100
randstart = 60s
authtype = pam
userfile = /etc/mon/userfile
The above shows where all the files that the daemon uses are located. You can explore these directories and dig a little deeper to discover more, or use man mon. The two I recommend you investigate are mondir and alertdir.
The mondir stores the monitoring scripts. It comes pre-loaded with scripts for common monitors such as ping and HTTP checks, but you can add your own, in any language. The most interesting part of a monitoring script is that the daemon only cares about its return code. So if you create a custom monitor script, remember that exit 0 is success and any non-zero exit code is a failure.
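As a minimal sketch of such a custom monitor, the script below pings each host it is given and reports through its exit code. The file name example.monitor, the probe and check_hosts functions, and the ping flags (Linux iputils) are my own assumptions for illustration. As far as I read the mon documentation, mon appends the hosts of the hostgroup to the command line and treats the first line of output as a summary, but check man mon before relying on that.

```shell
#!/bin/sh
# Hypothetical custom monitor, e.g. saved as example.monitor in the mondir.
# Assumption: mon passes the hosts of the hostgroup as arguments.

# probe HOST: the actual check. Here, one ICMP echo with a 2 second timeout
# (flags as in Linux iputils ping). Swap in whatever check you need.
probe() {
    ping -c 1 -W 2 "$1" >/dev/null 2>&1
}

# check_hosts HOST...: probe every host and collect the ones that fail.
check_hosts() {
    failed=""
    for host in "$@"; do
        probe "$host" || failed="$failed $host"
    done
    # First line of output: the failing hosts, without the leading space.
    echo "${failed# }"
    # Return 0 (success) only when no host failed.
    [ -z "$failed" ]
}

# The exit status of the script is the status of this last command.
if [ "$#" -gt 0 ]; then
    check_hosts "$@"
fi
```

Keeping the check in a separate probe function means the same skeleton can be reused for a port check or an HTTP request by changing one function.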
The alertdir stores the alert handlers. For example, the alert that sends out the email can be found here. You can customize these or add your own alert handlers here.
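To give an idea of what a custom alert could look like, here is a rough sketch that turns an alert into a single log line. The name format_alert, the log path and the ALERT_LOG variable are hypothetical. The option letters (-s service, -g group, -h hosts, -t time, -l alertevery, -u for an upalert) and the summary arriving on standard input are how I understand the alert interface from the mon man page, so treat them as assumptions and verify them on your system.

```shell
#!/bin/sh
# Hypothetical alert handler, e.g. saved as log.alert in the alertdir.
# Assumption: mon calls alerts with -s SERVICE -g GROUP -h HOSTS (and -u for
# an upalert) and writes the monitor's summary line to standard input.

format_alert() {
    OPTIND=1
    kind="DOWN" service="" group="" hosts=""
    while getopts ":s:g:h:t:l:uOT" opt; do
        case "$opt" in
            s) service=$OPTARG ;;
            g) group=$OPTARG ;;
            h) hosts=$OPTARG ;;
            u) kind="UP" ;;
            *) ;;  # every other option is ignored in this sketch
        esac
    done
    # The first line on standard input is taken as the summary.
    IFS= read -r summary || summary=""
    echo "$kind group=$group service=$service hosts=$hosts summary=$summary"
}

# Only act when called with arguments, as mon would call it.
if [ "$#" -gt 0 ]; then
    format_alert "$@" >> "${ALERT_LOG:-/tmp/mon-example-alert.log}"
fi
```

A handler like this can be tried without the daemon by piping a line into it, for example: echo "web1.shahmirj.com" | sh log.alert -s ping -g webservers -h web1.shahmirj.com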
Let's start setting up some monitoring.
Before you can define monitors you need to set up watch-groups. Watch-groups are just a way of grouping multiple servers together, and you can define them as you wish. They are declared with the hostgroup keyword, for example:
hostgroup router router.localhost
hostgroup webservers web1.shahmirj.com web2.shahmirj.com
In our case we have a server sitting in an isolated network, checking whether web1.shahmirj.com and web2.shahmirj.com are responding. We first want to ensure we have an internet connection, so we set up a watchgroup called router, which monitors our server's access to the internet, and a second, external watchgroup called webservers, which monitors the servers that are dear to us. Let's look at the router watchgroup example:
watch router
    service ping
        interval 1m
        monitor ping.monitor
        period wd {Mon-Sun}
watch router names the hostgroup you would like to monitor.
service ping is the name of the service. You can define multiple services under one watch.
interval 1m specifies how often to run the service, in this case every minute. A bare integer is treated as seconds, whereas an integer followed by m or h denotes minutes or hours.
monitor ping.monitor is the actual monitor script that is run. This script is included with the mon daemon and can be found in /usr/lib64/mon/mon.d/. You can change this directory using the mondir config variable mentioned above. If you browse through the directory you will see a whole host of other scripts that can be handy.
period wd {Mon-Sun} sets when to monitor the router watch-group. In our example it's set to every day of the week.
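To make the interval suffixes concrete, here is a small worked conversion in shell. The to_seconds helper is purely illustrative and not part of mon. It only covers the forms that appear in this tutorial: a bare integer, and the s, m and h suffixes.

```shell
#!/bin/sh
# Illustrative helper (not part of mon): convert an interval to seconds.
to_seconds() {
    case "$1" in
        *s) echo "${1%s}" ;;                 # 60s -> 60
        *m) echo $(( ${1%m} * 60 )) ;;       # minutes to seconds
        *h) echo $(( ${1%h} * 3600 )) ;;     # hours to seconds
        *)  echo "$1" ;;                     # bare integer: already seconds
    esac
}

to_seconds 1m     # prints 60, the router interval used above
to_seconds 1h     # prints 3600
```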
Underneath our watch router we now define our web servers. The definition looks as follows:
watch webservers
    service ping
        interval 15m
        failure_interval 1m
        monitor ping.monitor
        depend router:ping
        period wd {Mon-Sun}
            alertafter 1
            alert mail.alert -f mon@shahmirj.com me@shahmirj.com
            alertevery 1h
            upalert mail.alert -f mon@shahmirj.com -S "Ping is responding" me@shahmirj.com
failure_interval 1m sets the monitoring interval to use while the monitor is failing.
depend router:ping lets you use dependencies in monitoring. The format is depend <watchgroup>:<service>. In our case we make sure that the router is pinging before continuing to monitor the web servers.
alertafter 1 sets how many times in succession a monitor must fail before an alert is sent out.
alert defines how an alert should be raised. In our case we send an email to me@shahmirj.com. The -f option sets the From field of the email. No alert is sent until an alert is specified.
alertevery 1h limits repeated alerts for an ongoing failure to one per hour.
upalert is a helpful trigger that fires when a monitor that previously failed succeeds again. It follows the same rules as alert.
To test that everything is working I usually run the following command and manually break a few things to check that the alerts fire. If you are sending email alerts, make sure the alert email is getting through your spam filters.
$ mon -d
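Since a monitor is just a script judged by its exit code, it can also help to run one by hand before involving the daemon. The run_monitor wrapper below is a hypothetical helper of my own, not something shipped with mon. It runs whatever command you give it and reports the exit status the daemon would see.

```shell
#!/bin/sh
# Hypothetical helper: run a monitor by hand and show its exit status.
run_monitor() {
    if "$@"; then
        status=0
    else
        status=$?
    fi
    if [ "$status" -eq 0 ]; then
        echo "OK (exit 0)"
    else
        echo "FAIL (exit $status)"
    fi
    return "$status"
}

# Example, assuming the default mondir from the configuration above:
#   run_monitor /usr/lib64/mon/mon.d/ping.monitor router.localhost
```

Any output from the monitor itself is printed first, followed by the OK or FAIL line.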
Once all is well start the daemon, and set it to start up on boot.
$ chkconfig mon on
$ service mon start
The mon daemon is one of the most powerful monitoring tools out there. From the above you can see how a simple monitor can be tuned to your exact needs and deliver peace of mind for your systems.
I recommend reading the following to get the full power of the mon daemon
Any questions, just ask.