Monitoring Services on Down or Unreachable Hosts
The main purpose of NetSaint is to monitor services that run on or are provided by physical hosts or devices on your network. It should be obvious that if a host or device on your network goes down, all services that it offers will also go down with it. Similarly, if a host becomes unreachable, NetSaint will not be able to monitor the services associated with that host.
NetSaint recognizes this fact and checks for such a scenario when there are problems with a service. Whenever a service check results in a non-OK status level, NetSaint will check whether the host that the service is running on is "alive". Typically this is done by pinging the host and seeing if any response is received. If the host check command returns a non-OK state, NetSaint assumes that there is a problem with the host. In this situation NetSaint will "silence" all potential alerts for services running on the host and just notify the appropriate contacts that the host is down or unreachable. If the host check command returns an OK state, NetSaint will recognize that the host is alive and will send out an alert for the service that is misbehaving.
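The decision described above can be sketched in a few lines of Python. This is an illustration only, not NetSaint's actual code: the function name decide_alert, the string results, and the convention that a status of 0 means OK are all assumptions made for the example.

```python
from typing import Callable

OK = 0  # assumption for this sketch: a status of 0 means OK


def decide_alert(service_status: int, host_check: Callable[[], int]) -> str:
    """Return which alert, if any, follows from one service check result.

    host_check is a hypothetical callable that runs the host check
    command and returns its status.
    """
    if service_status == OK:
        return "none"
    # The service is non-OK, so look at the host before alerting.
    if host_check() != OK:
        # Host problem: silence the service alert, report the host instead.
        return "host"
    # The host is alive, so the service itself is misbehaving.
    return "service"
```

Note that the host check only runs when the service check has already failed, which matches the order of events described above.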
Local Hosts
"Local" hosts are hosts that reside on the same network segment as the host running NetSaint - no routers or firewalls lie between them. Figure 1 shows an example network layout. Host A is running NetSaint and monitoring all other hosts and routers depicted in the diagram. Hosts B, C, D, E and F are all considered to be "local" hosts in relation to host A.
The <parent_host> option in the host definition for a "local" host should be left blank, as local hosts have no dependencies or "parents" - that's why they're local.
Monitoring Local Hosts
Checking hosts that are on your local network is fairly simple. Short of someone accidentally (or intentionally) unplugging the network cable from one of your hosts, there isn't too much that can go wrong as far as checking network connectivity is concerned. There are no routers or external networks between the host doing the monitoring and the other hosts on the local network.
If NetSaint needs to check to see if a local host is "alive" it will simply run the host check command for that host. If the command returns an OK state, NetSaint assumes the host is up. If the command returns any other status level, NetSaint will assume the host is down.
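As a rough illustration of that rule, the sketch below runs an arbitrary command and maps its exit code to a host state. It assumes that an exit code of 0 means OK; the function name local_host_state and the stand-in commands are hypothetical and exist only so the example runs anywhere. A real setup would use a ping-style check command.

```python
import subprocess
import sys
from typing import Sequence


def local_host_state(check_command: Sequence[str]) -> str:
    """Run a host check command and map its exit code to a state.

    Assumption for this sketch: exit code 0 means OK, anything else
    is treated as a non-OK status level.
    """
    result = subprocess.run(
        list(check_command),
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return "UP" if result.returncode == 0 else "DOWN"


# Stand-in commands that simply exit with a chosen status.
always_ok = [sys.executable, "-c", "raise SystemExit(0)"]
always_bad = [sys.executable, "-c", "raise SystemExit(2)"]

print(local_host_state(always_ok))   # UP
print(local_host_state(always_bad))  # DOWN
```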
Remote Hosts
"Remote" hosts are hosts that reside on a different network segment than the host running NetSaint. In the figure above, hosts G, H, I, J, K, L and M are all considered to be "remote" hosts in relation to host A.
Notice that some hosts are "farther away" than others. Hosts H, I and J are one hop farther away from host A than host G (the router) is. From this observation we can construct a host dependency tree as shown below in Figure 2. This tree diagram will help us in deciding how to configure each host in NetSaint.
The <parent_host> option in the host definition for a "remote" host should be the short name of the host directly above it in the tree diagram (as shown below). For example, the parent host for host H would be host G. The parent host for host G is host F. Host F has no parent host, since it is on the same network segment as host A - it is a "local" host.
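The parent relationships named in this section can be written down as a simple mapping, which makes the chain from a remote host back to the local segment easy to see. The Python below is only a sketch of the idea, not NetSaint configuration syntax. It includes just the hosts whose parents are stated in the text (H, G and F); the names parent_of and ancestors are made up for the example.

```python
from typing import Dict, List, Optional

# Parent of each host, as stated in the text. None means "no parent",
# i.e. the host is local and <parent_host> is left blank.
parent_of: Dict[str, Optional[str]] = {
    "H": "G",   # remote host behind router G
    "G": "F",   # router G sits behind host F
    "F": None,  # host F is local to host A
}


def ancestors(host: str) -> List[str]:
    """Return the chain of parents from a host up to the top of the tree."""
    chain: List[str] = []
    parent = parent_of.get(host)
    while parent is not None:
        chain.append(parent)
        parent = parent_of.get(parent)
    return chain


print(ancestors("H"))  # ['G', 'F']
print(ancestors("F"))  # []
```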
Monitoring Remote Hosts
Checking the status of remote hosts is a bit more complicated than for local hosts. If NetSaint cannot monitor services on a remote host, it needs to determine whether the remote host is down or whether it is unreachable. Luckily, the <parent_host> option introduced in 0.0.4 allows NetSaint to do this.
If a host check command for a remote host returns a non-OK state, NetSaint will "walk" the dependency tree (as shown in the figure above) until it reaches the top (or until a parent host check results in an OK state). By doing this, NetSaint is able to determine if a service problem is the result of a down host, a down network link, or just a plain old service failure.
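The walk up the tree can be sketched as follows. This is a minimal illustration under stated assumptions, not NetSaint's implementation: the names classify_host and is_ok are hypothetical, and the rule used to tell "down" from "unreachable" (down if no parent also fails, unreachable if at least one parent fails too) is an interpretation of the description above.

```python
from typing import Callable, Dict, List, Optional, Tuple


def classify_host(
    host: str,
    parent_of: Dict[str, Optional[str]],
    is_ok: Callable[[str], bool],
) -> Tuple[str, List[str]]:
    """Classify a host whose own check has already returned non-OK.

    Walks up the parent chain until the top is reached or a parent
    check is OK. Returns the assumed state and the hosts that failed.
    """
    failed = [host]
    parent = parent_of.get(host)
    while parent is not None:
        if is_ok(parent):
            break  # this parent is alive, so the walk stops here
        failed.append(parent)
        parent = parent_of.get(parent)
    # Assumption: if no parent failed, the host itself is down;
    # otherwise something above it is broken and it is unreachable.
    state = "DOWN" if len(failed) == 1 else "UNREACHABLE"
    return state, failed


# Parent relationships stated in the text: H -> G -> F, F is local.
tree = {"H": "G", "G": "F", "F": None}

# Router G does not answer but F does: H is cut off behind G.
print(classify_host("H", tree, lambda h: h == "F"))
# ('UNREACHABLE', ['H', 'G'])
```

The last entry in the returned list is the highest host in the tree that failed its check, which is the most likely place to start looking for the actual fault.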