Update, 28 February 2005: this applies to Nagios 1.2. I will update it when I upgrade to Nagios 2.

An inexpensive way of monitoring SQL Server availability, as well as other Windows services, is to have a Linux box running Nagios. I installed Nagios because I had a strange problem when installing and using SQL Server Reporting Services for the first time: the ReportServer service stopped unexpectedly during the night (one hour before some reports were scheduled to run… Murphy's law). I wanted to monitor this service and, of course, use an open-source tool for that. That is how I discovered Nagios, which also let me ping the machines and check whether SQL Server was up.

What do you need?

Nagios: the monitoring program itself, running on the Linux box
Nagios Plugins: the plugins allowing you to monitor SQL Server and other machines/services
FreeTDS: gives you connectivity to SQL Server from Linux
NSClient: the Nagios client for Windows

And if you want to use NRPE:

NRPE: the plugin allowing you to run commands on remote machines
NRPE_NT: a port of the NRPE daemon as a Windows service

Installation

I am sure that on Red Hat or Mandrake you can install Nagios, the plugins and NRPE from nice RPM packages.

An important thing to do anyway is to read the HTML documentation of Nagios. It is quite well done and precise. You just have to follow the instructions step by step to install Nagios. The documentation repeats in several places that Nagios is difficult to configure. There are indeed a bunch of configuration files, but you should be able to grasp them quite easily, and if you are not afraid of using a text editor (vim of course), there should be no problem.

Before thinking of changing these configuration files, you need to compile and install the plugins. The make install copies most of them into the libexec/ directory, but the one I was interested in, the shell script that connects to SQL Server, was not installed, so you need to copy it yourself from the plugin sources directory (in the contrib/ subdir). Its name is check_mssql.sh.
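As a rough sketch of that copy step, the function below takes a script from contrib/ and puts it next to the other plugins. The PLUGIN_SRC and NAGIOS_HOME locations and the install_contrib_plugin name are assumptions made for illustration; point them at wherever you unpacked the plugin sources and installed Nagios.

```shell
# Assumed locations, not taken from any official layout; adjust to yours.
PLUGIN_SRC="${PLUGIN_SRC:-$HOME/src/nagios-plugins}"
NAGIOS_HOME="${NAGIOS_HOME:-/usr/local/nagios}"

# Copy one script from the contrib/ subdir of the plugin sources
# into libexec/ and make it executable.
# $1 = script name inside contrib/ (hypothetical helper)
install_contrib_plugin() {
    src="$PLUGIN_SRC/contrib/$1"
    dest="$NAGIOS_HOME/libexec/$1"
    if [ ! -f "$src" ]; then
        echo "not found: $src" >&2
        return 1
    fi
    cp "$src" "$dest" && chmod 755 "$dest"
}
```

With the two variables set for your machine, install_contrib_plugin check_mssql.sh does the copy.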

For this plugin, you need to have FreeTDS installed. It allows you to establish a connection to SQL Server from your Linux box. Once that is done, the plugin (the shell script) can be tested directly by passing the hostname, user, password and server version as parameters. It runs a simple select against sysprocesses, but everything is visible in the script source.
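When testing a plugin by hand like this, keep in mind that Nagios acts on the exit code of the plugin rather than on the text it prints: 0 means OK, 1 WARNING, 2 CRITICAL and 3 UNKNOWN. The sketch below makes that visible. The nagios_state and run_plugin names are hypothetical helpers, not part of Nagios, and mapping every other exit code to UNKNOWN is a simplification.

```shell
# Translate a plugin exit code into the matching Nagios state name.
# Simplification: anything other than 0, 1 or 2 is shown as UNKNOWN.
nagios_state() {
    case "$1" in
        0) echo OK ;;
        1) echo WARNING ;;
        2) echo CRITICAL ;;
        *) echo UNKNOWN ;;
    esac
}

# Run any plugin command line, print its output, then print the state
# Nagios would derive from the exit code. Returns that same exit code.
run_plugin() {
    if output=$("$@"); then
        code=0
    else
        code=$?
    fi
    echo "$output"
    echo "exit code $code => $(nagios_state "$code")"
    return "$code"
}
```

For example, run_plugin ./check_mssql.sh sqlhost nagios secret 2000 runs the script with placeholder values in the order given above; the exact form of the version argument is an assumption, so check the script source.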

So, after installing all this, I configured Nagios to ping my SQL Server machines and to run the SQL Server tests. This has to be configured inside services.cfg. The machines have to be defined inside hosts.cfg. All the config files were created by the make install process in the etc/ subdir of Nagios, with the .sample extension. The best way is obviously to copy (cp) them to the same names without .sample and to slim them down to what you want. You can perfectly well start with one host and one service (just a PING) to test Nagios.
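The copy of the sample files can be sketched as a small loop. This is only an illustration: activate_samples is a hypothetical name, and the suffix is a parameter because the exact sample suffix may differ between Nagios versions. Existing files are never overwritten.

```shell
# Copy every sample config in a directory to its live name,
# without overwriting files that already exist.
# $1 = config directory, $2 = sample suffix (default ".sample")
activate_samples() {
    dir="$1"
    suffix="${2:-.sample}"
    for sample in "$dir"/*"$suffix"; do
        # When nothing matches, the glob stays literal: skip it.
        [ -f "$sample" ] || continue
        live="${sample%"$suffix"}"
        if [ -e "$live" ]; then
            echo "keeping existing $live"
        else
            cp "$sample" "$live" && echo "created $live"
        fi
    done
}
```

Calling activate_samples on the etc/ subdir of your Nagios installation leaves you with editable copies, which you can then slim down.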

For the SQL Server plugin, I created a login named nagios in SQL Server, with just datareader rights in tempdb (although, for what the script does, I could have given it no specific database access at all). After that I added the username and password as user variables in the resource.cfg config file.

Monitoring other services on Windows boxes

After that, I needed to check the health of my Reporting Services services. So I first looked at NRPE and the NRPE_NT client. It is a lightweight executable running as a Windows service that simply lets you execute a command remotely, like an RPC call. That is useful if you want to check something specific, for example by writing a WSH script that returns a value which is passed back to Nagios.

But for my needs, I found an easier solution: NSClient, which is also a Windows service, and which provides a fixed set of functions covering what most of us need: checking CPU, RAM, disk, services, processes… Also interesting for me: it is written in Delphi.

After downloading NSClient, you just need to put it somewhere on your server, run pNSClient.exe /install to install the service, and then run net start NSClient. Read the documentation to see how to add an optional password that Nagios has to send to it, for a bit of security.

After that, you need to check that your check_nt plugin is installed. It should be in the libexec/ subdir of your Nagios installation. Then just test it to see if it is working fine:

./check_nt -H 10.1.1.1 -p 1248 -v SERVICESTATE -d SHOWALL -l ReportServer

Of course, replace the IP address with yours.

If everything is OK, add the following lines to your checkcommands.cfg config file:

# NSClient commands
define command{
        command_name    check_nt_disk
        command_line    $USER1$/check_nt -H $HOSTADDRESS$ -p 1248 -v USEDDISKSPACE -l $ARG1$ -w $ARG2$ -c $ARG3$
}

define command{
        command_name    check_nt_cpuload
        command_line    $USER1$/check_nt -H $HOSTADDRESS$ -p 1248 -v CPULOAD -l $ARG1$
}

define command{
        command_name    check_nt_uptime
        command_line    $USER1$/check_nt -H $HOSTADDRESS$ -p 1248 -v UPTIME
}

define command{
        command_name    check_nt_clientversion
        command_line    $USER1$/check_nt -H $HOSTADDRESS$ -p 1248 -v CLIENTVERSION
}

define command{
        command_name    check_nt_process
        command_line    $USER1$/check_nt -H $HOSTADDRESS$ -p 1248 -v PROCSTATE -l $ARG1$
}

define command{
        command_name    check_nt_service
        command_line    $USER1$/check_nt -H $HOSTADDRESS$ -p 1248 -v SERVICESTATE -l $ARG1$
}

define command{
        command_name    check_nt_memuse
        command_line    $USER1$/check_nt -H $HOSTADDRESS$ -p 1248 -v MEMUSE -w $ARG1$ -c $ARG2$
}

define command{
        command_name    check_nt_fileage
        command_line    $USER1$/check_nt -H $HOSTADDRESS$ -p 1248 -v FILEAGE -l $ARG1$ -w $ARG2$ -c $ARG3$
}

You can then add a service in services.cfg like this:

 # Service definition
 define service{
        use                             generic-service
        host_name                       mysqlserver
        service_description             ReportServer
        is_volatile                     0
        check_period                    24x7
        max_check_attempts              3
        normal_check_interval           5
        retry_check_interval            1
        contact_groups                  sql-admins
        notification_interval           240
        notification_period             24x7
        notification_options            c,r
        check_command                   check_nt_service!ReportServer
        }

As it took me some time to find this information, here is an extract of the NSClient documentation about the FILEAGE check used by check_nt_fileage:

Syntax: check_nt -H <hostname> -p <port> -v FILEAGE -l <filename> [-w <warning>] [-c <critical>]
  • <filename> : file to check. Don’t forget to use \\ for each \ (c:\\autoexec.bat)
  • <warning> and <critical> : maximum number of minutes since the last update of the file.

Example:

./check_nt -H 192.168.1.1 -p 1248 -v FILEAGE -l "c:\\program files\\nsclient\\pnsclient.exe" -w 1440 -c 2880
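Since the doubled backslashes are easy to forget, a tiny helper can produce the doubled form of a Windows path for you. This is only a sketch: nsclient_path is a hypothetical name, and it does nothing more than apply the rule quoted above (each \ becomes \\) to the text it is given, ready to be pasted into a command line or a service definition.

```shell
# Print a Windows path with every backslash doubled,
# following the NSClient rule "use \\ for each \".
# $1 = path as you would type it on Windows (hypothetical helper)
nsclient_path() {
    # printf with %s does not interpret backslashes in its argument;
    # sed then replaces each single backslash with two.
    printf '%s\n' "$1" | sed 's/\\/\\\\/g'
}
```

Pass the path in single quotes so that the shell leaves it alone, for example nsclient_path 'c:\program files\nsclient\pnsclient.exe'.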
 
sql_server/outils/nagios.txt · Last modified: 2006/03/05 14:43 (external edit)