Frequently Asked Questions (FAQs)


I'm having trouble compiling NetSaint - What can I do?

Compiling NetSaint on different OSes doesn't really seem to be much of a problem anymore, unless you're missing some string functions...

If you're getting errors about the strncat(), strncpy(), or snprintf() functions, you probably don't have the glibc libraries installed on your system. This tends to happen most often on HP-UX and Solaris boxes. I've tried to prevent potential buffer overflows in NetSaint and the CGIs by using these functions, so they are all over the code. If you don't want to install the glibc libraries for some reason, you'll have to find some other way to get everything compiled. If all you're missing is the snprintf() function, try grabbing the snprintf.c file from http://www.ijs.si/software/snprintf/ and adding it to the Makefiles so that it gets included when you compile.


I can't find or am having trouble compiling the statusmap and/or trends CGIs...

If you compile all the CGIs, but don't find the statusmap CGI or trends CGI, you probably don't have Thomas Boutell's gd library installed correctly on your system. The gd library (and thus the statusmap CGI) also requires that you have the zlib and png libraries installed. Version 1.6.3 or higher of the gd library is required, as the CGI generates a PNG image of your network layout.

If you find that the statusmap CGI has not been compiled, make sure you have the gd library installed on your system, run 'make clean' and rerun the configure script with the following options:

./configure --with-gd-lib=LIBDIR --with-gd-inc=INCDIR [other options...]

Replace LIBDIR with the directory in which the gd library is installed (usually /usr/lib or /usr/local/lib) and replace INCDIR with the directory in which the header files for the gd library are installed (usually /usr/include or /usr/local/include).
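
If you are not sure where gd ended up on your system, a small helper along these lines may save some guessing. This is only a sketch: find_dir_with is a hypothetical function, and the candidate directories and file names (gd.h, libgd.a, libgd.so) are assumptions that you may need to extend for your platform.

```shell
#!/bin/sh
# Sketch: print the first directory from a candidate list that contains a
# given file. The candidate lists below are assumptions; extend them to suit.
find_dir_with() {
    name=$1
    shift
    for dir in "$@"; do
        if [ -e "$dir/$name" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
    done
    return 1
}

INCDIR=$(find_dir_with gd.h /usr/include /usr/local/include) || INCDIR=
LIBDIR=$(find_dir_with libgd.a /usr/lib /usr/local/lib) \
    || LIBDIR=$(find_dir_with libgd.so /usr/lib /usr/local/lib) \
    || LIBDIR=

echo "gd headers: ${INCDIR:-not found}"
echo "gd library: ${LIBDIR:-not found}"
```

Whatever directories it prints are the values to pass to --with-gd-inc and --with-gd-lib.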

After you rerun the configure script, make sure to recompile the CGIs and install them in their proper location.

If you are having a lot of trouble getting the configure script to recognize that you have the gd libraries installed (and you're running RedHat Linux), I would recommend installing the 1.8.1-2 version of both the gd and gd-devel RPMs from www.rpmfind.net - they seem to work without requiring any special arguments to the configure script.


"NetSaint process may not be running" warnings in the CGIs

If you are getting erroneous messages about the NetSaint process not running while viewing the CGIs, it's probably due to one of the following items:

  1. You haven't defined a command to check the status of the NetSaint process. This is done by supplying a value for the process_check_command directive in the CGI configuration file.

  2. If you have defined a command, perhaps it is not returning the proper exit code. The command must follow the same rules as other plugins (see the plugin guidelines for more info): a return code of 0 indicates that NetSaint is running; any other value indicates that NetSaint is either not running or in some degraded state.

  3. If you're using the check_netsaint plugin, check the sanity of the arguments that you're passing it. The first argument is the full path to the status file, the second argument is the number of minutes that the status file should be "fresher" than, and the third argument is a string that matches the NetSaint process command line obtained from the ps command. Try running ps axuw | grep netsaint to see what string you should use - a common example of a matching string is "/usr/local/netsaint/bin/netsaint -d /usr/local/netsaint/etc/netsaint.cfg"

  4. If you have defined a process check command that uses the check_netsaint plugin, make sure that the plugin is functioning as it should. Execute the check_netsaint plugin from the command line and check the results. If the plugin is reporting that the NetSaint process cannot be found or if it returns a "Could not open pipe" message, you may need to edit the PS_RAW_COMMAND definition in the common/config.h file of the plugin distribution to match the syntax for the ps command on your system. For example, under FreeBSD you should use either "/bin/ps -ao 'state user ppid args'" or "/bin/ps -axo 'state user ppid command'" (it seems to vary). Once you've changed the PS_RAW_COMMAND definition, recompile the plugins and test the newly compiled check_netsaint plugin to see if it works.
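
To make the exit code contract from item 2 concrete, here is a rough stand-in for a process check command. It is not the check_netsaint plugin and it only looks at the age of the status file. The function name, the default path, and the 5 minute default are assumptions for illustration, and it relies on a find command that supports -mmin (GNU and BSD find do).

```shell
#!/bin/sh
# Hypothetical stand-in for a process check command; NOT the check_netsaint plugin.
# Returns 0 if the status file exists and was modified within the given number
# of minutes, and 2 in every other case.
# Assumes a find(1) that supports -mmin.
status_file_fresh() {
    file=$1
    minutes=$2
    [ -f "$file" ] || return 2
    if [ -n "$(find "$file" -mmin "-$minutes" 2>/dev/null)" ]; then
        return 0
    fi
    return 2
}

# Example path only; use the status file named in your own main configuration file.
rc=0
status_file_fresh "${1:-/usr/local/netsaint/var/status.log}" "${2:-5}" || rc=$?
echo "exit code: $rc"
# A real check command would finish by exiting with that code.
```

Running it by hand and looking at the printed exit code is the same test you should apply to whatever command you put in process_check_command.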

The CGIs will not allow you to submit any commands while they think the NetSaint process is not running. This is done primarily to prevent people from accidentally submitting multiple shutdown/restart commands that don't get processed until NetSaint is started at some future time.


Hosts are incorrectly listed as being DOWN and/or services have a status of "HOST DOWN"

This seems to be one of the biggest issues for new users. 99.9% of the time this problem is due to an incorrect command definition for the host check command you specified in the host definition.

A major cause of this problem was a syntax change to the command line arguments of the check_ping plugin. You need to make sure that the host check command is using the proper syntax for the version of the check_ping plugin that you have. You can check to see if the command works properly by executing it manually from the command line. Recent versions of the check_ping plugin require that a -p flag be used to specify the number of packets to send. Previous versions of the plugin did not require this flag - that's where the problem lies. Check your host check command definition(s) to make sure they are using the proper syntax. Example:

command[check-host-alive]=/usr/local/netsaint/libexec/check_ping $HOSTADDRESS$ 100 100 1000.0 1000.0 -p 1
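
When you test the command by hand, look at the exit code as well as the text it prints, since the exit code is what NetSaint acts on. A minimal sketch follows; run_check is a hypothetical helper, and the address in the usage line is just an example that stands in for the $HOSTADDRESS$ macro.

```shell
#!/bin/sh
# Sketch: run a check command by hand and report both its output and its
# exit code. run_check is a hypothetical helper, not part of NetSaint.
run_check() {
    output=$("$@" 2>&1)
    rc=$?
    printf 'output:    %s\n' "$output"
    printf 'exit code: %s\n' "$rc"
    return "$rc"
}

# Usage (replace the example address with one of your own hosts):
#   run_check /usr/local/netsaint/libexec/check_ping 192.168.0.1 100 100 1000.0 1000.0 -p 1
```

If the plugin prints a usage message or exits with a non-zero code for a host you know is up, the command definition does not match your plugin version.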

Important! Just because you have a service that is monitoring ping statistics for a host does not mean that the actual host status is being checked. The status of a host is only checked when a service check results in a non-OK state or if the host was previously down and a service check results in an OK state.

Some symptoms of incorrect host check commands include:

  1. Hosts incorrectly being listed as DOWN
  2. Services that have a status of "HOST DOWN", even though the host they reside on is actually UP
  3. Alternating alerts/notifications about host problems and recoveries


When hosts go down, I get notifications about services instead of hosts and the service notifications contain incorrect data

Several people have reported this problem and I spent hours trying to find the problem until I realized it wasn't a bug in the code. If you get service notifications when you should be getting host notifications (and the service notifications you get seem to contain bogus data), check your contact definitions in the host config file. They are most likely incorrect.

Make sure that you are not using the same notification command for service and host notification commands. Service and host notifications are very different and make use of macros which are not transferable between each type. Look at the sample host config file provided with NetSaint to see what the contact definitions look like and how the service and host notification commands differ. If you're wondering what macros can be used in either type of notification, look at this table.


Can I monitor a host without defining any services for it?

No. You must define at least one service for each host you want to monitor. NetSaint is primarily geared towards monitoring services - hosts are really only checked when there are problems or recoveries with services.


How can I change the timeout values for service checks?

First you need to identify where the timeout is occurring. Most plugins time out after 10 seconds of not being able to contact a service (FTP, HTTP, etc). If the plugins are timing out after a short period of time, increase the timeout value for the plugin (use an appropriate command line argument).

In addition to plugins having timeouts, NetSaint enforces its own timeout value on all service checks that run. By default, this is set to 30 seconds. If the plugin executes for more than 30 seconds, NetSaint will automatically kill it off and return a critical error for that service. If you see entries in the log file that say a service check timed out, this may be your problem. You can adjust the maximum timeout value for service checks by using the service_check_timeout directive.

As a side note, there are also directives for setting the maximum timeout for host checks, notifications, event handlers, and the ocsp command.
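
Before changing anything, it can help to see which value is currently set in your main configuration file. The helper below is a sketch only: get_directive is a hypothetical function that assumes plain name=value lines, and if a directive appears more than once it simply prints the last occurrence.

```shell
#!/bin/sh
# Sketch: print the value of a name=value directive from a configuration file.
# get_directive is a hypothetical helper. If the directive is repeated, the
# last occurrence in the file is printed.
get_directive() {
    sed -n "s/^$2=//p" "$1" | tail -n 1
}

# Usage:
#   get_directive /usr/local/netsaint/etc/netsaint.cfg service_check_timeout
```

An empty result means the directive is not set in that file.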


"Return code of x is out of bounds" errors

If the plugin output for a host or service check gives a "(Return code of x is out of bounds)" error, it usually means one of two things:

  1. The plugin you're using to perform the host or service check is not returning the proper return code when it exits (as described in the plugin developer guidelines)

  2. The path to the plugin is invalid (i.e. the binary or script does not exist). This is most likely the case if you get errors about a return code of 127 being out of bounds. If this is the error you're getting, check your command definitions and make sure the path to all executables is correct (and that they're actually installed on your system).
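
Both cases can be reproduced from a shell. The sketch below first shows a toy check that returns the conventional plugin codes (0 for OK, 1 for WARNING, 2 for CRITICAL, as far as I know; the plugin developer guidelines are the authority), and then shows that a path which does not exist makes the shell itself return 127. The check_usage function, its thresholds, and the /no/such/plugin path are made up for illustration.

```shell
#!/bin/sh
# Sketch of the exit code contract for a plugin. The thresholds and the
# "percent used" input are hypothetical; only the 0/1/2 return codes matter.
check_usage() {
    used=$1
    warn=$2
    crit=$3
    if [ "$used" -ge "$crit" ]; then
        echo "CRITICAL - ${used}% used"
        return 2
    elif [ "$used" -ge "$warn" ]; then
        echo "WARNING - ${used}% used"
        return 1
    fi
    echo "OK - ${used}% used"
    return 0
}

rc=0
check_usage 42 80 90 || rc=$?
echo "plugin-style exit code: $rc"

# A command path that does not exist makes the shell return 127, which is
# the value reported in the "out of bounds" message for a bad plugin path.
sh -c '/no/such/plugin' 2>/dev/null || echo "missing binary exit code: $?"
```

A plugin that exits with anything outside the documented set, or a command definition that points at a missing file, will produce the error described above.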


Debugging "unknown variable" errors during configuration file verification or runtime

When trying to run NetSaint or verify your configuration file data using the -v argument, NetSaint may print out a message like "Error in configuration file 'xxxxxxx.cfg' - Line 34 (Unknown variable)". A few simple checks will usually resolve this problem...

  1. Make sure you are passing the path to the main configuration file and not the host configuration file on the command line. Many people have made this mistake. The correct syntax would be as follows (modified for your system, of course):
    ./netsaint -v /usr/local/netsaint/etc/netsaint.cfg

  2. Make sure that you don't have any invalid variables defined in your configuration file. Notice that the error message will contain a reference to the name of the configuration file and the line number on which the error was encountered. Make sure that every comment line has a pound sign (#) as the first character of the line. If you're not sure about what variables are valid, check the documentation for the main and host configuration files.

  3. Make sure all variable identifiers are in lower case. Example:
    "admin_email=someaddress@somedomain.com" instead of "ADMIN_EMAIL=someaddress@somedomain.com"

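Checks 2 and 3 can be roughed out with a few lines of awk. This is a heuristic sketch, not a replacement for running NetSaint with -v: suspect_lines is a hypothetical helper that flags any non-blank line which neither starts with # in the first column nor looks like a lower case identifier followed by = (an optional [name] part before the = is allowed, as used in the host configuration file).

```shell
#!/bin/sh
# Sketch: list configuration lines that are likely to trigger an
# "Unknown variable" error. Heuristic only; NetSaint itself is the authority.
suspect_lines() {
    awk '
        /^[ \t]*$/ { next }          # blank lines are fine
        /^#/       { next }          # comment with # in the first column
        {
            name = $0
            sub(/=.*/, "", name)     # drop everything from the first "="
            sub(/\[.*/, "", name)    # drop a [name] part if there is one
            if (index($0, "=") == 0 || name !~ /^[a-z0-9_]+$/)
                printf "%d: %s\n", NR, $0
        }
    ' "$1"
}

# Usage:
#   suspect_lines /usr/local/netsaint/etc/netsaint.cfg
```

Each line of output is the line number followed by the offending text, which you can compare with the line number in the NetSaint error message.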

How do I run multiple instances of NetSaint on the same machine?

You can run multiple instances of NetSaint on the same machine, if you ensure that the following variables are unique for each instance of NetSaint...

If you are using the web interface, you will have to set up separate directories to hold the CGIs for each instance of NetSaint and create appropriate script aliases in your web server configuration file. This is necessary because the CGI configuration file must be unique for each set of CGIs, as it contains a reference to which main configuration file the CGIs should read.

One last thing you should check is your init script (if you're using one). The init script should start, stop, restart, and reload all copies of NetSaint (if that's what you want it to do).


When I access the CGIs I don't see everything I should or I get authorization errors...

If you believe you are unable to see all the information in the CGIs or if you are getting authorization errors, you probably haven't configured the web server to require authentication or haven't set up authorization correctly. See the documentation on authentication and authorization in the CGIs here.


Where can I find the traceroute and daemonchk CGIs?

The traceroute and daemonchk CGIs are now included in the contrib/ subdirectory of the main NetSaint distribution.


How do I require users to authenticate before accessing the web interface?

See the documentation on authentication and authorization in the CGIs here.


How do I get those pretty host icons to display in my CGIs?

If you want to associate images with particular hosts for use in the status, status map, status world, and extended information CGIs, you must define extended host information entries in your CGI configuration file.


I'm getting errors when attempting to commit commands to NetSaint via the command CGI

If you are getting 'Could not open command file somefile for update' errors when attempting to commit commands to NetSaint via the command CGI, the most likely problem is with directory and/or file permissions. Here is what you can do to fix it...

  1. Make sure you've created the directory to hold the command file as outlined here.

  2. Make sure you restart your web server so that it inherits the new group permissions you just assigned.


NetSaint shuts down with warnings about permissions on the command file

If NetSaint is shutting itself down after it processes external commands and you get warnings in the log file about incorrect permissions on the command file, make sure to read the directions found here.


How do I monitor remote host information?

Several people have asked how to use various plugins that check information on the local host to report information from remote hosts. Various methods for doing this are described below.

If you need to actually execute a plugin on a remote host and get the results back, you can use one of the following methods...

  • Use the check_by_ssh "plugin" to execute a plugin on a remote host. The check_by_ssh plugin is basically a wrapper for executing a plugin on a remote host using SSH. You must have SSH installed and configured properly in order to use this.
  • Use the nrpe addon to accomplish this. The plugins and the nrpe daemon reside on the remote host. The check_nrpe plugin (included with the nrpe package) sends a request to the nrpe daemon to execute the plugin on the remote host and then grabs the results for NetSaint.
  • Use the nrpep addon. This addon works in a similar manner to the nrpe package, but it encrypts the transmitted data, runs as a service from inetd, and makes use of the TCP Wrappers package for access control.
  • Use rsh to execute the plugin remotely, although I wouldn't recommend this.

If all you need is to check disk space, etc. on a remote host, you can use one of the methods below...

  • Use one of the plugins included with the netsaint_statd addon for NetSaint. The addon, written by Charlie Cook, includes a Perl daemon which runs on the remote host and four plugins which are used to gather the remote host information from the daemon. The daemon is designed to run on Linux, IRIX, HP-UX, SunOS, and OSF/1 systems. Modifying the code for other systems should be fairly easy. More on the netsaint_statd plugin can be found here.
  • Use the check_overcr plugin to query information from a remote host. The remote host must be running Eric Molitor's Over-CR collector in order for this to work.
  • Use the check_snmp plugin to check the value of various OIDs on the remote host. You must have SNMP services installed and running on the remote host in order to do this.


How can I monitor Windows NT servers?

You can monitor NT servers with NetSaint in three ways at the moment...

  • By using SNMP
  • By using the NTSTAT addon (service and plugin)
  • By using the NSSERVICER addon (service and plugins)

SNMP

The good news is that NT has a lot of performance data that you can monitor. The bad news is that it's difficult to do. Your best bet is probably going to be to install SNMP services on all your NT boxes. Ian Cass has written a FAQ on how to do this at http://elton.dev.knowledge.com/snmpfaq.html

In order to expose NT performance counters for monitoring, you'll have to run the SNMP service on all servers you want to monitor. You'll also have to install any necessary performance MIBs for the services you want to monitor. I believe these can be found in the NT Resource Kit or in various server admin packages. If you're feeling extra lucky you can try to search the Microsoft site for the terms SNMP and MIB and maybe you'll find something...

You can search the MRTG mailing list archives for more information on configuring NT servers to expose various performance counters via SNMP. I know this has been discussed in the past, as many people are graphing various NT performance statistics using MRTG. In fact, somebody from Microsoft is actually doing it - you can find their web page at http://snmpboy.rte.microsoft.com/.

Once you've actually got the SNMP stuff working, you can use the check_snmp plugin to query your NT servers and generate alarms.

NTSTAT Addon

A while back I wrote an NT service (ntstat) and corresponding plugin (check_ntstat) that can be used to monitor basic information about NT servers. The plugin and service are capable of monitoring CPU utilization (overall or individually on up to four processors), physical memory usage, paged memory usage, and disk usage. The service has been reported to work on NT 4 servers, as well as Windows 2000 servers.

The ntstat service and check_ntstat plugin can both be found at http://www.netsaint.org/download/alpha/. Don't let the directory name scare you. Several people (myself included) have been using the service and plugin for some time without any problems.

NSSERVICER Addon

Jan Christian Kaldestad and Hallstein Lohne have written the nsservicer addon for monitoring NT servers. The addon is similar to the ntstat package and includes a service that runs on your NT servers and several plugins that run from the NetSaint host. The nsservicer addon is capable of monitoring the event log, disk usage, process usage, and other info.

You can find the addon at http://www.netsaint.org/download/contrib/addons/.

I plan on merging the functionality of the ntstat and nsservicer addons into one package late this summer. When an updated package is ready for testing, it will be announced on the netsaint-announce mailing list.


How can I monitor Novell Netware servers?

You can monitor basic stats on your Novell server like disk usage, user connections, LRU sitting time, cache buffers, long term cache hits, and processor load by using the check_nwstat plugin (which is included in the main plugin distribution). In order for the plugin to work, you have to install and run James Drew's MRTGEXT NLM on your Novell server. The NLM can be obtained here.


Can NetSaint send SNMP traps to management hosts?

Yes, but not directly. NetSaint relies on plugins to handle the gathering of service and host information and event handler scripts to handle events that occur with services and hosts. If you want to have NetSaint send an SNMP trap to a management host in the event that a particular service has a problem, you will have to write a service event handler script and add it to the event_handler option of the service definition. If you have the UCD-SNMP package installed on your host, you could have the script call the snmptrap command to actually send a trap message, depending on what type of service event occurred. Look at the example event handler script to get a better idea of how to write a script.
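
A service event handler for this could be sketched as follows. Several things here are assumptions rather than anything NetSaint dictates: that your command definition passes the service state, the state type, the host name, and the service description as the first four arguments, that the state and state type arrive as strings like CRITICAL and HARD, and that TRAP_CMD holds a complete trap-sending command for your SNMP tools. The snmptrap argument syntax differs between versions of the SNMP packages, so the sketch deliberately leaves it out and defaults to a harmless echo.

```shell
#!/bin/sh
# Sketch of a service event handler that sends a trap on hard CRITICAL states.
# TRAP_CMD is a hypothetical setting; point it at a wrapper around the
# snmptrap command once you have worked out the syntax for your version.
TRAP_CMD=${TRAP_CMD:-echo would-send-trap}

handle_event() {
    state=$1
    state_type=$2
    host=$3
    service=$4
    case "$state/$state_type" in
        CRITICAL/HARD)
            # TRAP_CMD is left unquoted on purpose so that it can hold a
            # command plus its fixed arguments.
            $TRAP_CMD "$host" "$service" "$state"
            ;;
        *)
            : # nothing to do for other states
            ;;
    esac
}

if [ $# -gt 0 ]; then
    handle_event "$@"
fi
```

Acting only on hard states is a design choice to avoid sending a trap for every retry; adjust the case patterns if you want traps for other states as well.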


Can NetSaint log host and service events to an external database?

Not directly, but this can be done fairly easily. You'll probably want to define global host and service event handlers to do this. The global event handlers could call a script which inserts the appropriate event information into a database of your choosing. This would allow you to run queries and generate more detailed reports than what are available in the CGIs.
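
One low-effort way to do this is to have the global event handlers append each event to a spool file and load that file into the database separately. The sketch below takes that approach. The spool path, the argument order, and the tab-separated layout are all hypothetical, and it does no escaping, so plugin output that contains tabs would need extra handling.

```shell
#!/bin/sh
# Sketch of a global event handler that spools events for later database import.
# SPOOL_FILE, the argument order, and the tab-separated layout are hypothetical.
SPOOL_FILE=${SPOOL_FILE:-/tmp/netsaint-events.tsv}

log_event() {
    # $1 = host name, $2 = service description (empty for host events),
    # $3 = state, $4 = plugin output
    printf '%s\t%s\t%s\t%s\t%s\n' \
        "$(date '+%Y-%m-%d %H:%M:%S')" "$1" "$2" "$3" "$4" >> "$SPOOL_FILE"
}

if [ $# -gt 0 ]; then
    log_event "$@"
fi
```

Writing to a file first keeps the event handler fast and means a database outage does not hold up NetSaint; the bulk-import tool that comes with your database can then pick up the file on its own schedule.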


Something isn't working properly - How can I track down the problem?

I've worked in tech support for a few years and have spent my share of time on a helpdesk. Most people are vague when they report a problem and have no desire whatsoever to try and track down the problem - they just want you to fix it now. I hope you are not that type of person. NetSaint is relatively new and is probably chock full of bugs, so things will not always work properly. If you suspect that either the service check or notification routines are not working, here are a few things you can do to try and track down the problem...

The first thing you should do is verify your configuration data by running NetSaint with the -v option. Example:

/usr/local/netsaint/bin/netsaint -v /usr/local/netsaint/etc/netsaint.cfg

If no errors are found, proceed to the next steps. If NetSaint reports some error, go back and fix your configuration files.

The next step will take more time, but will give you more information on what is going on inside of NetSaint. When I first developed NetSaint I added a lot of debugging code to help me track down problems. I still use that code when I add new features or track down bugs myself. Here is how to use the debugging code...

Reconfigure NetSaint and enable one or more debug options as follows, replacing the "--enable-DEBUGx" with one or more of the values from the table below:

./configure --prefix=/your/netsaint/directory --enable-DEBUGx

Debugging Options

Debug Option       Description
--enable-DEBUG0    Used to trace function calls. A lot of messages will be printed out if you enable this option, but it is very useful for tracing what functions are being called. Note that not all functions will print an exit message if code within the function causes an early exit (before reaching the end of the function).
--enable-DEBUG1    Used to print out informational messages about variable settings. Most useful when trying to debug the configuration data as it is being read or verified.
--enable-DEBUG2    Used to print out warning messages, usually when configuration data is being read or verified.
--enable-DEBUG3    Used to print out informational messages during host and service checks. Good to use if you suspect problems are occurring during service checks.
--enable-DEBUG4    Used to print out informational messages during host and service notifications. Good to use if you suspect problems are occurring during the notification events.

Recompile NetSaint.

Verify your configuration data again - you'll see a lot more information this time if you have enabled the DEBUG1 option. Try redirecting output to a file so that you can view or print it at a later time.

If you have defined either the DEBUG3 or DEBUG4 options, run NetSaint as a foreground process and start monitoring your services. Example:

/usr/local/netsaint/bin/netsaint /usr/local/netsaint/etc/netsaint.cfg

Kill NetSaint at an appropriate point (e.g. after a service check fails) and look through the output. It should help you track down where the problem is occurring. You may want to redirect the output to a file to make it easier to review. Some code tweaking may be necessary on your part in order to fix things. Let me know if you have to make any such alterations so I can include the fix in future releases.
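
Both the verification step and the foreground run can produce a lot of output, so it is worth keeping a copy. Here is a minimal sketch of one way to do that; run_logged is a hypothetical helper and the log path in the usage line is just an example.

```shell
#!/bin/sh
# Sketch: run a command, show its output on the terminal, and keep a copy of
# both stdout and stderr in a log file. run_logged is a hypothetical helper.
run_logged() {
    logfile=$1
    shift
    "$@" 2>&1 | tee "$logfile"
}

# Usage:
#   run_logged /tmp/netsaint-verify.log /usr/local/netsaint/bin/netsaint -v /usr/local/netsaint/etc/netsaint.cfg
```

The resulting file is also a convenient thing to attach if you end up asking for help.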

If you are unable to determine or fix the problem on your own, email me the following items (give me some warning if you're planning on sending a large attachment):

  1. The version of NetSaint you are running
  2. A description of what is going wrong and what you suspect is the problem
  3. The OS/distribution/version/architecture you're running NetSaint on (e.g. RedHat Linux 6.1 for Intel)
  4. Your configuration files (netsaint.cfg and hosts.cfg)
  5. Output from the program run (with debugging options on)