Status File Format
Introduction
To give external applications (such as the CGIs) access to current host and service status information, NetSaint saves all status information to the file specified by the status_file option in the main config file. External applications can read this file to determine the current status of any monitored host or service, but they should not write anything to it. NetSaint itself does not read the status file to determine current service and host information; the file exists only so that third-party applications can access the internal status information easily.
File Format
The status file contains three types of entries: a program entry, one or more host status entries, and one or more service status entries. The format for each type of entry is described below.
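As a rough illustration of the three entry types, the following Python sketch classifies status file lines by their type keyword. The line shape it assumes ("[<timestamp>] <TYPE>;...") is taken from the entry formats in this document; the names ENTRY_RE, classify_entry, and count_entries are hypothetical helpers, not part of NetSaint.

```python
import re

# Assumed line shape, based on the entry formats in this document:
#   [<timestamp>] <TYPE>;<field>;<field>;...
ENTRY_RE = re.compile(r"^\[(\d+)\]\s+(PROGRAM|HOST|SERVICE);(.*)$")


def classify_entry(line):
    """Return (timestamp, entry_type, rest), or None if the line does not match."""
    match = ENTRY_RE.match(line.strip())
    if match is None:
        return None
    timestamp, entry_type, rest = match.groups()
    return int(timestamp), entry_type, rest


def count_entries(lines):
    """Count how many entries of each type appear in an iterable of lines."""
    counts = {"PROGRAM": 0, "HOST": 0, "SERVICE": 0}
    for line in lines:
        entry = classify_entry(line)
        if entry is not None:
            counts[entry[1]] += 1
    return counts
```

An external application would typically open the status file read-only and pass the file object to count_entries, since the file is meant to be read but never written by third-party programs.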
Program Entry Format:
[<timestamp>] PROGRAM;<start_time>;<daemon_mode>;<program_mode>;<last_mode_change>;<last_command_check>;<last_log_rotation>;<executing_service_checks>;<accept_passive_service_checks>;<enable_event_handlers>;<obsess_over_services>
where...
- timestamp is the time in time_t format (seconds since UNIX epoch) that the program entry was last updated.
- start_time is the time in time_t format (seconds since UNIX epoch) that NetSaint was last (re)started.
- daemon_mode is an integer that indicates whether or not NetSaint is running as a daemon. If this value is 1, NetSaint is running in daemon mode. If this value is 0, NetSaint is running as a normal (foreground or background) process.
- program_mode is a string which identifies what program mode NetSaint is currently in. If this string is "ACTIVE", NetSaint is in active mode. If this string is "STANDBY", NetSaint is in standby mode.
- last_mode_change is the time in time_t format (seconds since UNIX epoch) when the last program mode change occurred.
- last_command_check is the time in time_t format (seconds since UNIX epoch) that NetSaint last checked for external commands. A value of zero means that NetSaint has not checked for external commands since it was last (re)started.
- last_log_rotation is the time in time_t format (seconds since UNIX epoch) that NetSaint last rotated the main log file. A value of zero means that the log file has not been rotated since NetSaint was last (re)started.
- executing_service_checks is an integer that indicates whether or not NetSaint is actively executing service checks. Values: 0=checks are *not* being executed, 1=checks are being executed.
- accept_passive_service_checks is an integer that indicates whether or not NetSaint is accepting passive service checks. Values: 0=passive service checks are *not* being accepted, 1=passive checks are being accepted.
- enable_event_handlers is an integer that indicates whether or not host and service event handlers are enabled. Values: 0=event handlers are *not* enabled, 1=event handlers are enabled.
- obsess_over_services is an integer that indicates whether or not NetSaint is "obsessing" over service check results and running an obsessive service check processor command. Values: 0=NetSaint is *not* obsessing, 1=NetSaint is obsessing.
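The program entry can be split into named fields along the lines of the sketch below. It is a minimal example that assumes exactly the ten semicolon-separated fields shown in the program entry format above, in that order; PROGRAM_FIELDS, parse_program_entry, and to_utc are hypothetical names introduced for illustration.

```python
from datetime import datetime, timezone

# Field order copied from the program entry format line in this document.
PROGRAM_FIELDS = (
    "start_time", "daemon_mode", "program_mode", "last_mode_change",
    "last_command_check", "last_log_rotation", "executing_service_checks",
    "accept_passive_service_checks", "enable_event_handlers",
    "obsess_over_services",
)


def parse_program_entry(line):
    """Parse one PROGRAM line into a dict; raise ValueError if it does not fit."""
    header, sep, body = line.strip().partition("] PROGRAM;")
    if not sep or not header.startswith("["):
        raise ValueError("not a PROGRAM entry: %r" % line)
    values = body.split(";")
    if len(values) != len(PROGRAM_FIELDS):
        raise ValueError("expected %d fields, got %d"
                         % (len(PROGRAM_FIELDS), len(values)))
    entry = {"timestamp": int(header[1:])}
    for name, value in zip(PROGRAM_FIELDS, values):
        # program_mode is the only string field; everything else is numeric.
        entry[name] = value if name == "program_mode" else int(value)
    return entry


def to_utc(epoch_seconds):
    """Convert a time_t value to an aware datetime; zero is treated as 'never'."""
    if not epoch_seconds:
        return None
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
```

Mapping zero to None in to_utc mirrors the convention described above, where a zero last_command_check or last_log_rotation means the event has not happened since the last (re)start.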
Host Status Format:
[<timestamp>] HOST; <host_name>;<state>;<last_state_change>;<problem_has_been_acknowledged>;<time_up>;<time_down>;<time_unreachable>;<last_notification>;<current_notification_number>;<notifications_enabled>;<event_handler_enabled>;<checks_enabled>;<plugin_output>
where...
- timestamp is the time in time_t format (seconds since UNIX epoch) that the host was last checked (or its current state was assumed).
- host_name is the short name of the host (as defined in the host configuration file) that the state information corresponds to.
- state is a string that indicates the current state of the host. Values include "PENDING", "UP", "DOWN", and "UNREACHABLE".
- last_state_change is the time in time_t format (seconds since UNIX epoch) that the host last experienced a hard state change.
- problem_has_been_acknowledged is an integer indicating whether or not this host problem has been acknowledged. If the host is UP, or it is DOWN or UNREACHABLE and has not been acknowledged, this is set to 0. If this host is DOWN or UNREACHABLE and the problem has been acknowledged, this is set to 1.
- time_up is the number of seconds (since monitoring began) that the host has been in an UP state.
- time_down is the number of seconds (since monitoring began) that the host has been in a DOWN state.
- time_unreachable is the number of seconds (since monitoring began) that the host has been in an UNREACHABLE state.
- last_notification is a timestamp in time_t format (number of seconds since UNIX epoch) that indicates when the last notification for this host was sent out. If no notifications have been sent out (or if the host is UP), this value is set to zero.
- current_notification_number is an integer representing the number of notifications that have been sent out about this host problem. If no notifications have been sent out since the host last changed state (or if it is in an UP state), this value is set to zero.
- notifications_enabled is an integer that indicates whether or not notifications for this host are enabled. Values: 0=notifications are *not* enabled, 1=notifications are enabled.
- event_handler_enabled is an integer that indicates whether or not the event handler for this host is enabled. Values: 0=event handler is *not* enabled, 1=event handler is enabled.
- checks_enabled is an integer that indicates whether or not checks of this host are enabled. Values: 0=checks are *not* enabled, 1=checks are enabled.
- plugin_output is the output from the last host check (text).
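A host status entry can be parsed in a similar way. The sketch below assumes the thirteen fields listed in the host status format above. Because plugin_output is free text and comes last, the split is limited so that any semicolons inside the plugin output stay in that field; this is a defensive choice, not documented NetSaint behavior. Whitespace around string fields is stripped because the format line shows a space after "HOST;". HOST_FIELDS, parse_host_entry, and needs_attention are hypothetical names.

```python
# Field order copied from the host status format line in this document.
HOST_FIELDS = (
    "host_name", "state", "last_state_change",
    "problem_has_been_acknowledged", "time_up", "time_down",
    "time_unreachable", "last_notification", "current_notification_number",
    "notifications_enabled", "event_handler_enabled", "checks_enabled",
    "plugin_output",
)
HOST_STRING_FIELDS = {"host_name", "state", "plugin_output"}


def parse_host_entry(line):
    """Parse one HOST line into a dict; raise ValueError if it does not fit."""
    header, sep, body = line.strip().partition("] HOST;")
    if not sep or not header.startswith("["):
        raise ValueError("not a HOST entry: %r" % line)
    # Limit the split so semicolons in plugin_output are preserved.
    values = body.split(";", len(HOST_FIELDS) - 1)
    if len(values) != len(HOST_FIELDS):
        raise ValueError("expected %d fields, got %d"
                         % (len(HOST_FIELDS), len(values)))
    entry = {"timestamp": int(header[1:])}
    for name, value in zip(HOST_FIELDS, values):
        entry[name] = value.strip() if name in HOST_STRING_FIELDS else int(value)
    return entry


def needs_attention(entry):
    """True for a DOWN or UNREACHABLE host whose problem is unacknowledged."""
    return (entry["state"] in ("DOWN", "UNREACHABLE")
            and entry["problem_has_been_acknowledged"] == 0)
```

The needs_attention helper shows one way the state and problem_has_been_acknowledged fields might be combined by a monitoring front end.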
Service Status Format:
[<timestamp>] SERVICE; <host_name>;<svc_description>;<state>;<current_attempt>/<max_attempts>;<state_type>;<next_check>;<check_type>;<checks_enabled>;<passive_checks_accepted>;<last_state_change>;<problem_has_been_acknowledged>;<last_hard_state>;<time_ok>;<time_unknown>;<time_warning>;<time_critical>;<last_notification>;<current_notification_number>;<notifications_enabled>;<check_latency>;<execution_time>;<plugin_output>
where...
- timestamp is the time in time_t format (seconds since UNIX epoch) that the service was last checked.
- host_name is the short name of the host that this service is associated with.
- svc_description is the description of the service (as defined in the host configuration file) that the state information corresponds to. Together, the host_name and svc_description fields uniquely identify a service definition.
- state is a string indicating the current state of the service. Values include "OK", "UNKNOWN", "WARNING", "CRITICAL", "RECOVERY", "UNREACHABLE", and "HOST DOWN". A value of "RECOVERY" indicates that the service is in an OK state, but just recovered from a non-OK state. Values of "UNREACHABLE" and "HOST DOWN" indicate that the host that the service is associated with is either down or unreachable.
- current_attempt is an integer representing the current service check attempt number. This value will be set to 1 if the host that the service is associated with is either down or unreachable.
- max_attempts is an integer representing the maximum number of check attempts for this service.
- state_type is a string indicating what type of state the service is currently in. Values include "SOFT" and "HARD".
- next_check is the time in time_t format (seconds since UNIX epoch) that the service is next scheduled to be checked.
- check_type is a string indicating what type of service check this was. Values include "ACTIVE" and "PASSIVE".
- checks_enabled is an integer representing whether or not checks for this service are enabled. Values: 0=checks are *not* enabled, 1=checks are enabled.
- accept_passive_checks is an integer representing whether or not passive checks are being accepted for this service. Values: 0=passive checks are *not* being accepted, 1=passive checks are being accepted.
- event_handler_enabled is an integer representing whether or not the event handler for this service is enabled. Values: 0=event handler is *not* enabled, 1=event handler is enabled.
- passive_checks_accepted is an integer representing whether or not passive checks are being accepted for this service. If this value is 1, they are being accepted. If this value is 0, passive checks are not being accepted.
- last_state_change is the time in time_t format (seconds since UNIX epoch) that the service last had a hard state change.
- problem_has_been_acknowledged is an integer indicating whether or not this service problem has been acknowledged. If the service is in an OK state, or it is in a non-OK state and has not been acknowledged, this is set to 0. If this service is in a non-OK state and the problem has been acknowledged, this is set to 1.
- last_hard_state is a string that indicates the last hard state of the service. Values include "OK", "UNKNOWN", "WARNING", and "CRITICAL".
- time_ok is the number of seconds that the service has been in an OK state.
- time_unknown is the number of seconds that the service has been in an UNKNOWN state.
- time_warning is the number of seconds that the service has been in a WARNING state.
- time_critical is the number of seconds that the service has been in a CRITICAL state.
- last_notification is a timestamp in time_t format (number of seconds since UNIX epoch) that indicates when the last notification for this service was sent out. If no notifications have been sent out or if the service is currently in an OK state, this value is set to zero.
- current_notification_number is an integer representing the number of notifications that have been sent out about this service problem. If no notifications have been sent out since the service last changed state (or if it is in an OK state), this value is set to zero.
- notifications_enabled is an integer that indicates whether or not notifications for this service have been enabled. Values: 0=notifications are *not* enabled, 1=notifications are enabled.
- check_latency is an integer indicating the number of seconds that the service check lagged behind its scheduled execution time (actual time of execution - scheduled time of execution = latency).
- execution_time is an integer indicating the number of seconds that this service check took to execute.
- plugin_output is the output from the last service check (text).
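The service status entry can be handled the same way. The sketch below follows the service status format line above exactly as written (22 semicolon-separated fields, with current_attempt and max_attempts sharing one field separated by a slash). The field descriptions also mention accept_passive_checks and event_handler_enabled, which do not appear in that format line, so the field order here is an assumption that should be checked against a real status file. SERVICE_FIELDS, parse_service_entry, and service_key are hypothetical names.

```python
# Field order copied from the service status format line in this document.
# "attempts" stands for the combined <current_attempt>/<max_attempts> field.
SERVICE_FIELDS = (
    "host_name", "svc_description", "state", "attempts", "state_type",
    "next_check", "check_type", "checks_enabled", "passive_checks_accepted",
    "last_state_change", "problem_has_been_acknowledged", "last_hard_state",
    "time_ok", "time_unknown", "time_warning", "time_critical",
    "last_notification", "current_notification_number",
    "notifications_enabled", "check_latency", "execution_time",
    "plugin_output",
)
SERVICE_STRING_FIELDS = {
    "host_name", "svc_description", "state", "state_type", "check_type",
    "last_hard_state", "plugin_output",
}


def parse_service_entry(line):
    """Parse one SERVICE line into a dict; raise ValueError if it does not fit."""
    header, sep, body = line.strip().partition("] SERVICE;")
    if not sep or not header.startswith("["):
        raise ValueError("not a SERVICE entry: %r" % line)
    # Limit the split so semicolons in plugin_output are preserved.
    values = body.split(";", len(SERVICE_FIELDS) - 1)
    if len(values) != len(SERVICE_FIELDS):
        raise ValueError("expected %d fields, got %d"
                         % (len(SERVICE_FIELDS), len(values)))
    entry = {"timestamp": int(header[1:])}
    for name, value in zip(SERVICE_FIELDS, values):
        if name == "attempts":
            current, _, maximum = value.partition("/")
            entry["current_attempt"] = int(current)
            entry["max_attempts"] = int(maximum)
        elif name in SERVICE_STRING_FIELDS:
            entry[name] = value.strip()
        else:
            entry[name] = int(value)
    return entry


def service_key(entry):
    """host_name and svc_description together identify a service."""
    return (entry["host_name"], entry["svc_description"])
```

Using service_key as a dictionary key gives a simple way to index parsed service entries, since the host_name and svc_description fields together uniquely identify a service definition.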