I'm having trouble compiling NetSaint. What can I do?
Compiling NetSaint on different OSes doesn't seem to be much of a problem anymore, unless you're missing some string functions. If you're getting errors about the strncat(), strncpy(), or snprintf() functions, you probably don't have the glibc libraries installed on your system. This tends to happen most often on HP-UX and Solaris boxes. I use these functions throughout NetSaint and the CGIs to prevent potential buffer overflows, so they are all over the code. If you don't want to install the glibc libraries for some reason, you'll have to find some other way to get everything compiled. If all you're missing is the snprintf() function, try grabbing the snprintf.c file from http://www.ijs.si/software/snprintf/ and adding it to the Makefiles so that it gets compiled and linked along with the rest of the code.
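If you're not sure which of these functions your system is missing, one quick way to find out is to try compiling a tiny test program for each one. The sketch below does that; it is a hypothetical helper, not part of NetSaint, and it assumes a C compiler reachable as $CC (falling back to cc) and a mktemp that supports -d, which some older systems may lack.

```sh
#!/bin/sh
# have_function NAME: try to compile and link a tiny program that calls NAME.
# Succeeds (exit 0) only if the compiler and C library provide it.
have_function() {
    case $1 in
        snprintf) body='char b[8]; return snprintf(b, sizeof b, "%d", 1) < 0;' ;;
        strncpy)  body='char b[8]; strncpy(b, "abc", sizeof b); return b[0];' ;;
        strncat)  body='char b[8] = "a"; strncat(b, "bc", sizeof b - strlen(b) - 1); return 0;' ;;
        *) echo "have_function: no test for $1" >&2; return 2 ;;
    esac
    dir=$(mktemp -d) || return 2
    # Write the test program; the body is passed as a %s argument, so the
    # "%d" inside it is not treated as a printf directive here.
    printf '#include <stdio.h>\n#include <string.h>\nint main(void) { %s }\n' \
        "$body" > "$dir/t.c"
    status=0
    "${CC:-cc}" -o "$dir/t" "$dir/t.c" >/dev/null 2>&1 || status=$?
    rm -rf "$dir"
    return "$status"
}

for f in snprintf strncpy strncat; do
    if have_function "$f"; then echo "$f: found"; else echo "$f: MISSING"; fi
done
```

A link failure for a function that is genuinely absent from the C library is what makes the check fail, so a "MISSING" line points at the function you need to supply (for snprintf(), the snprintf.c file mentioned above).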
I can't find the statusmap and/or trends CGIs, or I'm having trouble compiling them...
If you compile all the CGIs but don't find the statusmap CGI or trends CGI, you probably don't have Thomas Boutell's gd library installed correctly on your system. The gd library (and thus the statusmap CGI) also requires that you have the zlib and png libraries installed. Version 1.6.3 or higher of the gd library is required, as the CGI generates a PNG image of your network layout. If you find that the statusmap CGI has not been compiled, make sure you have the gd library installed on your system, run 'make clean' and rerun the configure script with the following options:
./configure --with-gd-lib=LIBDIR --with-gd-inc=INCDIR [other options...]
Replace LIBDIR with the directory in which the gd library is installed (usually /usr/lib or /usr/local/lib) and replace INCDIR with the directory in which the header files for the gd library are installed (usually /usr/include or /usr/local/include). After you rerun the configure script, make sure to recompile the CGIs and install them in their proper location. If you are having a lot of trouble getting the configure script to recognize that you have the gd libraries installed (and you're running Red Hat Linux), I would recommend installing the 1.8.1-2 version of both the gd and gd-devel RPMs from www.rpmfind.net; they seem to work without requiring any special arguments to the configure script.
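Before rerunning configure, it can help to confirm where the gd library and its header actually live. The sketch below is a hypothetical helper that searches the usual directories mentioned above for a libgd file and gd.h and prints matching --with-gd-lib/--with-gd-inc options. The candidate lists can be overridden with the (made-up) GD_LIB_CANDIDATES and GD_INC_CANDIDATES variables, for example to add a lib64 directory. It does not check the gd version, so you still need to make sure you have 1.6.3 or later.

```sh
#!/bin/sh
# find_gd_dirs: search candidate directories for a gd library file (libgd.*)
# and the gd.h header, and print matching configure options.
# GD_LIB_CANDIDATES / GD_INC_CANDIDATES are hypothetical overrides.
find_gd_dirs() {
    libdirs=${GD_LIB_CANDIDATES:-"/usr/lib /usr/local/lib"}
    incdirs=${GD_INC_CANDIDATES:-"/usr/include /usr/local/include"}
    gdlib="" gdinc=""
    for d in $libdirs; do
        for f in "$d"/libgd.*; do
            # An unmatched glob stays literal, so check that the file exists.
            if [ -e "$f" ]; then gdlib=$d; break 2; fi
        done
    done
    for d in $incdirs; do
        if [ -f "$d/gd.h" ]; then gdinc=$d; break; fi
    done
    if [ -z "$gdlib" ] || [ -z "$gdinc" ]; then
        echo "find_gd_dirs: libgd or gd.h not found in the candidate directories" >&2
        return 1
    fi
    echo "--with-gd-lib=$gdlib --with-gd-inc=$gdinc"
}
```

Once it finds both directories, you could run 'make clean' and pass its output straight to the configure script, for example:
./configure $(find_gd_dirs) --prefix=/your/netsaint/directory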
"NetSaint process may not be running" warnings in the CGIs
If you are getting erroneous messages about the NetSaint process not running while viewing the CGIs, it's probably due to one of the following items:
The CGIs will not allow you to submit any commands while they think the NetSaint process is not running. This is done primarily to prevent people from accidentally submitting multiple shutdown/restart commands that don't get processed until NetSaint is started at some future time.
Hosts are incorrectly listed as being DOWN and/or services have a status of "HOST DOWN"
This seems to be one of the biggest issues for new users. 99.9% of the time this problem is due to an incorrect command definition for the host check command you specified in the host definition. A major cause of this problem was a syntax change in the command line arguments of the check_ping plugin. Recent versions of the check_ping plugin require that a -p flag be used to specify the number of packets to send. Previous versions of the plugin did not require this flag, and that's where the problem lies. Check your host check command definition(s) to make sure they use the proper syntax for the version of the check_ping plugin that you have. You can check whether the command works properly by executing it manually from the command line. Example:
command[check-host-alive]=/usr/local/netsaint/libexec/check_ping $HOSTADDRESS$ 100 100 1000.0 1000.0 -p 1
Important! Just because you have a service that is monitoring ping statistics for a host does not mean that the actual host status is being checked. The status of a host is only checked when a service check results in a non-OK state or if the host was previously down and a service check results in an OK state.
Some symptoms of incorrect host check commands include:
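To run a host check command by hand, substitute the host's real address for $HOSTADDRESS$ and look at both the plugin's output and its exit code. The hypothetical helper below wraps that up. It only names exit codes 0, 1 and 2 (OK, WARNING and CRITICAL) and prints any other code as a bare number, since the code used for other states depends on your plugins.

```sh
#!/bin/sh
# run_check CMD [ARGS...]: run a plugin the way a check command would,
# then print its output followed by its exit code.
# Returns the plugin's own exit code.
run_check() {
    code=0
    output=$("$@" 2>&1) || code=$?
    case $code in
        0) state=OK ;;
        1) state=WARNING ;;
        2) state=CRITICAL ;;
        *) state=other ;;
    esac
    printf '%s\n' "$output"
    printf 'exit code %s (%s)\n' "$code" "$state"
    return "$code"
}
```

For example (192.168.1.1 is a placeholder address):
run_check /usr/local/netsaint/libexec/check_ping 192.168.1.1 100 100 1000.0 1000.0 -p 1
If the plugin prints a usage message instead of ping statistics, the arguments don't match your check_ping version.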
When hosts go down, I get notifications about services instead of hosts, and the service notifications contain incorrect data
Several people have reported this problem, and I spent hours trying to find the cause until I realized it wasn't a bug in the code. If you get service notifications when you should be getting host notifications (and the service notifications you get seem to contain bogus data), check your contact definitions in the host config file; they are most likely incorrect. Make sure that you are not using the same notification command for both service and host notifications. Service and host notifications are very different and make use of macros which are not transferable between the two types. Look at the sample host config file provided with NetSaint to see what the contact definitions look like and how the service and host notification commands differ. If you're wondering what macros can be used in either type of notification, look at this table.
Can I monitor a host without defining any services for it?
No. You must define at least one service for each host you want to monitor. NetSaint is primarily geared towards monitoring services; hosts are really only checked when there are problems or recoveries with services.
How can I change the timeout values for service checks?
First you need to identify where the timeout is occurring. Most plugins time out after 10 seconds of not being able to contact a service (FTP, HTTP, etc.). If the plugins are timing out too quickly, increase the plugin's timeout value using the appropriate command line argument. In addition to the plugins' own timeouts, NetSaint enforces its own timeout on every service check it runs. By default, this is set to 30 seconds: if a plugin runs for more than 30 seconds, NetSaint will kill it off and return a critical error for that service. If you see entries in the log file saying that a service check timed out, this may be your problem. You can adjust the maximum timeout value for service checks by using the service_check_timeout directive. As a side note, there are also directives for setting the maximum timeout for host checks, notifications, event handlers, and the ocsp command.
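If you'd rather script the change than edit the main config file by hand, something like the sketch below works for simple name=value directives such as service_check_timeout. It is a hypothetical helper, not part of NetSaint. It assumes each directive sits on its own line with no spaces around the =, and it rewrites the file through a temporary copy, so check the file's ownership and permissions afterwards and verify the result with the -v option before restarting NetSaint.

```sh
#!/bin/sh
# set_cfg_directive FILE NAME VALUE
# Replace the NAME=... line in FILE with NAME=VALUE, or append it if absent.
# Assumes NAME has no regex metacharacters and VALUE contains no '|' or '&'.
set_cfg_directive() {
    file=$1 name=$2 value=$3
    tmp="$file.tmp.$$"
    if grep -q "^$name=" "$file"; then
        sed "s|^$name=.*|$name=$value|" "$file" > "$tmp" || return 1
    else
        { cat "$file"; echo "$name=$value"; } > "$tmp" || return 1
    fi
    mv "$tmp" "$file"
}
```

For example, to raise the service check timeout to 60 seconds:
set_cfg_directive /usr/local/netsaint/etc/netsaint.cfg service_check_timeout 60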
"Return code of x is out of bounds" errors
If the plugin output for a host or service check gives a "(Return code of x is out of bounds)" error, it usually means one of two things:
Debugging "unknown variable" errors during configuration file verification or runtime
When trying to run NetSaint or verify your configuration file data using the -v argument, NetSaint may print out a message like "Error in configuration file 'xxxxxxx.cfg' - Line 34 (Unknown variable)". A few simple checks will usually resolve this problem...
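When the error names a file and a line number, a useful first step is to look at that line and its neighbors. The sketch below is a hypothetical helper (not part of NetSaint) that prints the reported line with a couple of lines of context on each side, marking the reported line with '>'.

```sh
#!/bin/sh
# show_cfg_line FILE LINE [CONTEXT]: print LINE of FILE with CONTEXT lines
# (default 2) on each side, each prefixed by its line number.
# The reported line is marked with '>'.
show_cfg_line() {
    file=$1 line=$2 ctx=${3:-2}
    awk -v n="$line" -v c="$ctx" '
        NR >= n - c && NR <= n + c {
            printf "%s%5d  %s\n", (NR == n ? ">" : " "), NR, $0
        }
        NR > n + c { exit }
    ' "$file"
}
```

For the example message above, you would run show_cfg_line xxxxxxx.cfg 34 and compare the marked line against the variable names your version of NetSaint expects.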
How do I run multiple instances of NetSaint on the same machine?
You can run multiple instances of NetSaint on the same machine, if you ensure that the following variables are unique for each instance of NetSaint...
If you are using the web interface, you will have to set up separate directories to hold the CGIs for each instance of NetSaint and create appropriate script aliases in your web server configuration file. This is necessary because the CGI configuration file must be unique for each set of CGIs, as it specifies which main configuration file the CGIs should read. One last thing you should check is your init script (if you're using one). The init script should start, stop, restart, and reload all copies of NetSaint (if that's what you want it to do).
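To catch collisions between instances, you can compare a given directive across every instance's main config file. The sketch below is a hypothetical helper: you pass it the directive name and the config files to compare, and it fails if any value appears more than once. It assumes simple name=value lines.

```sh
#!/bin/sh
# check_unique NAME FILE...: fail if two or more of the given config files
# assign the same value to directive NAME.
check_unique() {
    name=$1
    shift
    # Collect every value of NAME from all files and keep only repeated ones.
    dups=$(for f in "$@"; do sed -n "s|^$name=||p" "$f"; done | sort | uniq -d)
    if [ -n "$dups" ]; then
        printf '%s is shared between instances: %s\n' "$name" "$dups" >&2
        return 1
    fi
    return 0
}
```

For example (the two config paths are placeholders for your own instances):
check_unique log_file /usr/local/netsaint1/etc/netsaint.cfg /usr/local/netsaint2/etc/netsaint.cfg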
When I access the CGIs I don't see everything I should, or I get authorization errors...
If you believe you are unable to see all the information in the CGIs or if you are getting authorization errors, you probably haven't configured the web server to require authentication or haven't set up authorization correctly. See the documentation on authentication and authorization in the CGIs here.
Where can I find the traceroute and daemonchk CGIs?
The traceroute and daemonchk CGIs are now included in the contrib/ subdirectory of the main NetSaint distribution.
How do I require users to authenticate before accessing the web interface?
See the documentation on authentication and authorization in the CGIs here.
How do I get those pretty host icons to display in my CGIs?
If you want to associate images with particular hosts for use in the status, status map, status world, and extended information CGIs, you must define extended host information entries in your CGI configuration file.
I'm getting errors when attempting to commit commands to NetSaint via the command CGI
If you are getting 'Could not open command file somefile for update' errors when attempting to commit commands to NetSaint via the command CGI, the most likely problem is with directory and/or file permissions. Here is what you can do to fix it...
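Because the command CGI runs as the web server user, the useful question is whether that user can reach and write to the command file. The sketch below is a hypothetical diagnostic that reports what the current user can do; run it as (or via su/sudo to) the user your web server runs as. Pass it the command file path from your main config file; no particular path is assumed here.

```sh
#!/bin/sh
# check_cmd_file PATH: report whether the current user can search the
# command file's directory and write to the file itself.
# Returns 0 if both are possible, 1 otherwise.
check_cmd_file() {
    path=$1
    dir=$(dirname "$path")
    ok=0
    if [ ! -d "$dir" ]; then
        echo "directory $dir does not exist"
        return 1
    fi
    [ -x "$dir" ] || { echo "cannot search directory $dir"; ok=1; }
    if [ ! -e "$path" ]; then
        echo "$path does not exist"; ok=1
    elif [ ! -w "$path" ]; then
        echo "no write permission on $path"; ok=1
    fi
    # Show owner, group and mode of both so you can see what to change.
    ls -ld "$dir" "$path" 2>/dev/null
    if [ "$ok" -eq 0 ]; then echo "$(id -un) can write to $path"; fi
    return "$ok"
}
```

The ls -ld output shows the owner, group and mode of the directory and the file, which is usually enough to see whether the web server user needs to be added to a group or the permissions need loosening.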
NetSaint shuts down with warnings about permissions on the command file
If NetSaint is shutting itself down after it processes external commands and you get warnings in the log file about incorrect permissions on the command file, make sure to read the directions found here.
How do I monitor remote host information?
Several people have asked how to use various plugins that check information on the local host to report information from remote hosts. Various methods for doing this are described below. If you need to actually execute a plugin on a remote host and get the results back, you can use one of the following methods...
If all you need is to check disk space, etc. on a remote host, you can use one of the methods below...
How can I monitor Windows NT servers?
Yes, you can monitor NT servers with NetSaint. There are basically three ways it can currently be done...
SNMP
The good news is that NT has a lot of performance data that you can monitor. The bad news is that it's difficult to do. Your best bet is probably going to be to install SNMP services on all your NT boxes. Ian Cass has written a FAQ on how to do this at http://elton.dev.knowledge.com/snmpfaq.html. In order to expose NT performance counters for monitoring, you'll have to run the SNMP service on all servers you want to monitor. You'll also have to install any necessary performance MIBs for the services you want to monitor. I believe these can be found in the NT Resource Kit or in various server admin packages. If you're feeling extra lucky, you can try searching the Microsoft site for the terms SNMP and MIB and maybe you'll find something... You can search the MRTG mailing list archives for more information on configuring NT servers to expose various performance counters via SNMP. I know this has been discussed in the past, as many people are graphing various NT performance statistics using MRTG. In fact, somebody from Microsoft is actually doing it; you can find their web page at http://snmpboy.rte.microsoft.com/. Once you've actually got the SNMP stuff working, you can use the check_snmp plugin to query your NT servers and generate alarms.
NTSTAT Addon
A while back I wrote an NT service (ntstat) and corresponding plugin (check_ntstat) that can be used to monitor basic information about NT servers. The plugin and service are capable of monitoring CPU utilization (overall or individually on up to four processors), physical memory usage, paged memory usage, and disk usage. The service has been reported to work on NT 4 servers, as well as Windows 2000 servers. The ntstat service and check_ntstat plugin can both be found at http://www.netsaint.org/download/alpha/. Don't let the directory name scare you. Several people (myself included) have been using the service and plugin for some time without any problems.
NSSERVICER Addon
Jan Christian Kaldestad and Hallstein Lohne have written the nsservicer addon for monitoring NT servers. The addon is similar to the ntstat package and includes a service that runs on your NT servers and several plugins that run from the NetSaint host. The nsservicer addon is capable of monitoring the event log, disk usage, process usage, and other info. You can find the addon at http://www.netsaint.org/download/contrib/addons/. I plan on merging the functionality of the ntstat and nsservicer addons into one package late this summer. When an updated package is ready for testing, it will be announced on the netsaint-announce mailing list.
How can I monitor Novell Netware servers?
You can monitor basic stats on your Novell server like disk usage, user connections, LRU sitting time, cache buffers, long term cache hits, and processor load by using the check_nwstat plugin (which is included in the main plugin distribution). In order for the plugin to work, you have to install and run James Drew's MRTGEXT NLM on your Novell server. The NLM can be obtained here.
Can NetSaint send SNMP traps to management hosts?
Yes, but not directly. NetSaint relies on plugins to handle the gathering of service and host information and event handler scripts to handle events that occur with services and hosts. If you want to have NetSaint send an SNMP trap to a management host in the event that a particular service has a problem, you will have to write a service event handler script and add it to the event_handler option of the service definition. If you have the UCD-SNMP package installed on your host, you could have the script call the snmptrap command to actually send a trap message, depending on what type of service event occurred. Look at the example event handler script to get a better idea of how to write a script.
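A minimal sketch of such an event handler follows. It assumes your command definition passes the service state, the state type (HARD or SOFT), the host name and the service description as its first four arguments. The trap itself is left to the hypothetical TRAP_CMD variable: set it to a command (or a wrapper script of your own) that takes the message as its last argument, because the snmptrap arguments (manager host, community, enterprise OID and so on) depend on your UCD-SNMP version and management station, so check the snmptrap man page. With TRAP_CMD unset, the handler only prints what it would have sent.

```sh
#!/bin/sh
# Hypothetical service event handler that sends a trap on hard state changes.
# Assumed arguments (set up in your command definition):
#   $1 = service state (OK, WARNING, CRITICAL, ...)
#   $2 = state type (HARD or SOFT)
#   $3 = host name, $4 = service description
# TRAP_CMD is a placeholder for your site-specific trap command.
handle_service_event() {
    state=$1 statetype=$2 host=$3 svc=$4
    # Ignore soft states so transient failures don't generate traps.
    [ "$statetype" = "HARD" ] || return 0
    case $state in
        CRITICAL|WARNING) msg="$host/$svc is $state" ;;
        OK)               msg="$host/$svc recovered" ;;
        *)                return 0 ;;
    esac
    if [ -n "${TRAP_CMD:-}" ]; then
        $TRAP_CMD "$msg"
    else
        echo "would send trap: $msg"
    fi
}

if [ "$#" -gt 0 ]; then
    handle_service_event "$@"
fi
```

Filtering on HARD states is a design choice: it keeps the management host from being flooded with traps for problems that clear up on the next retry.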
Can NetSaint log host and service events to an external database?
Not directly, but this can be done fairly easily. You'll probably want to define global host and service event handlers to do this. The global event handlers could call a script which inserts the appropriate event information into a database of your choosing. This would allow you to run queries and generate more detailed reports than what are available in the CGIs.
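As a rough illustration, a global service event handler could turn its arguments into an SQL INSERT statement and pipe it to your database's command-line client. Everything specific here is an assumption: the service_events table and its columns, the argument order (which you would set up with macros in the global handler's command definition), and whichever client you pipe into. The sketch only builds and prints the statement, doubling single quotes so that plugin output can't break out of the SQL string.

```sh
#!/bin/sh
# sql_quote STRING: wrap STRING in single quotes, doubling embedded quotes.
sql_quote() {
    printf "'%s'" "$(printf '%s' "$1" | sed "s/'/''/g")"
}

# build_event_sql HOST SERVICE STATE STATETYPE OUTPUT
# Print an INSERT for a hypothetical service_events table.
build_event_sql() {
    printf '%s (%s, %s, %s, %s, %s, CURRENT_TIMESTAMP);\n' \
        'INSERT INTO service_events (host, service, state, state_type, output, logged_at) VALUES' \
        "$(sql_quote "$1")" "$(sql_quote "$2")" "$(sql_quote "$3")" \
        "$(sql_quote "$4")" "$(sql_quote "$5")"
}

if [ "$#" -gt 0 ]; then
    # Pipe this into your database client instead of printing it.
    build_event_sql "$@"
fi
```

Quoting values in the script is the simplest approach for a shell sketch; if your database client supports parameterized input, that is a more robust way to keep odd plugin output from mangling the query.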
Something isn't working properly. How can I track down the problem?
I've worked in tech support for a few years and have spent my share of time on a helpdesk. Most people are vague when they report a problem and have no desire whatsoever to try and track down the problem; they just want you to fix it now. I hope you are not that type of person. NetSaint is relatively new and is probably chock full of bugs, so things will not always work properly. If you suspect that either the service check or notification routines are not working, here are a few things you can do to try and track down the problem...
The first thing you should do is verify your configuration data by running NetSaint with the -v option. Example:
/usr/local/netsaint/bin/netsaint -v /usr/local/netsaint/etc/netsaint.cfg
If no errors are found, proceed to the next steps. If NetSaint reports an error, go back and fix your configuration files.
The next step will take more time, but will give you more information on what is going on inside of NetSaint. When I first developed NetSaint I added a lot of debugging code to help me track down problems. I still use that code when I add new features or track down bugs myself. Here is how to use the debugging code...
Reconfigure NetSaint and enable one or more debug options as follows, replacing "--enable-DEBUGx" with one or more of the values from the table below:
./configure --prefix=/your/netsaint/directory --enable-DEBUGx
Debugging Options
Recompile NetSaint. Verify your configuration data again; you'll see a lot more information this time if you have enabled the DEBUG1 option. Try redirecting the output to a file so that you can view or print it at a later time.
If you have enabled either the DEBUG3 or DEBUG4 option, run NetSaint as a foreground process and start monitoring your services. Example:
/usr/local/netsaint/bin/netsaint /usr/local/netsaint/etc/netsaint.cfg
Kill NetSaint at an appropriate point (e.g., after a service check fails) and look through the output. It should help you track down where the problem is occurring. You may want to redirect the output to a file to make it easier to review. Some code tweaking may be necessary on your part in order to fix things. Let me know if you have to make any such alterations so I can include the fix in future releases.
If you are unable to determine or fix the problem on your own, email me the following items (give me some warning if you're planning on sending a large attachment):
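The verify-and-save step described above can be wrapped in a small helper so the output always ends up in a file you can review later or attach to a bug report. This is a hypothetical convenience function, not part of NetSaint. It doesn't interpret the output, since what NetSaint prints depends on which DEBUG options you compiled in; it simply saves everything and returns NetSaint's exit status.

```sh
#!/bin/sh
# verify_config NETSAINT_BIN CONFIG LOGFILE
# Run "NETSAINT_BIN -v CONFIG", save stdout and stderr to LOGFILE,
# then show the saved output. Returns NetSaint's exit status.
verify_config() {
    bin=$1 cfg=$2 log=$3
    status=0
    "$bin" -v "$cfg" > "$log" 2>&1 || status=$?
    cat "$log"
    echo "output saved to $log" >&2
    return "$status"
}
```

For example, using the paths from the examples above (the log file name is arbitrary):
verify_config /usr/local/netsaint/bin/netsaint /usr/local/netsaint/etc/netsaint.cfg /tmp/netsaint-verify.log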