State Retention File Format
Introduction
To preserve host and service state information (current status, state time statistics, etc.) between program restarts, you can enable the state retention feature with the retain_state_information option in the main configuration file. If this option is enabled, state retention information is stored in the file specified by the state_retention_file directive in the main configuration file. Immediately before shutting down (or restarting), NetSaint writes all current host and service state information to the retention file. Upon restarting, NetSaint reads the information stored in the retention file, initializes host and service information, and deletes the file.
At any time while NetSaint is running, you can have it save service and host state information by using the SAVE_STATE_INFORMATION external command. You can also force NetSaint to read in previously saved state information by using the READ_STATE_INFORMATION command, although this is not recommended, because the current state information that NetSaint holds will be replaced with whatever is stored in the state retention file.
Note that NetSaint only saves state information for services and hosts that have been checked by the time the file is written. Also, NetSaint only saves the last hard state of each host or service.
File Format
The state retention file contains four types of entries: a creation timestamp, program state information, host state information and service state information. The format for each type of entry is described below.
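As a rough illustration of the four entry types, the sketch below groups the lines of a retention file by their leading keyword. It assumes each entry sits on its own line and starts with the keyword followed by a colon, as in the formats given in this document. The function name group_entries and the sample values are made up for the example.

```python
from collections import defaultdict

# The four entry keywords used in the state retention file.
KNOWN_TYPES = ("CREATED", "PROGRAM", "HOST", "SERVICE")


def group_entries(text: str) -> dict:
    """Group retention file lines by entry type (hypothetical helper)."""
    groups = defaultdict(list)
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # tolerate blank lines (an assumption, not from the spec)
        # Split only at the first colon so colons in the payload survive.
        kind, sep, payload = line.partition(":")
        if not sep or kind not in KNOWN_TYPES:
            raise ValueError(f"unrecognized entry: {line!r}")
        groups[kind].append(payload.strip())
    return dict(groups)


# Sample content with invented values.
sample = "CREATED: 946684800\nPROGRAM: 1;1;1;0;0\n"
entries = group_entries(sample)
print(entries["PROGRAM"])
```

Each payload can then be handed to a parser for its specific entry type.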
Creation Time Format:
CREATED: <timestamp>
where...
- timestamp is the time in time_t format (seconds since UNIX epoch) that the state information was saved.
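For instance, a creation entry could be turned into a readable date along these lines. This is a minimal sketch; parse_created is a hypothetical helper, and the timestamp is an invented sample value.

```python
from datetime import datetime, timezone


def parse_created(line: str) -> datetime:
    """Parse a 'CREATED: <timestamp>' entry into an aware UTC datetime."""
    prefix, sep, value = line.strip().partition(":")
    if prefix != "CREATED" or not sep:
        raise ValueError(f"not a CREATED entry: {line!r}")
    # time_t counts seconds since the UNIX epoch (1970-01-01 00:00:00 UTC).
    return datetime.fromtimestamp(int(value.strip()), tz=timezone.utc)


created = parse_created("CREATED: 946684800")
print(created.isoformat())  # prints 2000-01-01T00:00:00+00:00
```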
Program Information Format:
PROGRAM: <program_mode>;<execute_service_checks>;<accept_passive_service_checks>;<enable_event_handlers>;<obsess_over_services>
where...
- program_mode is an integer that represents the last program mode that NetSaint was in. Values: 0=standby mode, 1=active mode.
- execute_service_checks is an integer indicating whether or not service checks were being executed when NetSaint was running. Values: 0=checks were *not* being executed, 1=checks were being executed.
- accept_passive_service_checks is an integer indicating whether or not passive service checks were being accepted when NetSaint was running. Values: 0=passive checks were *not* being accepted, 1=passive checks were being accepted.
- enable_event_handlers is an integer indicating whether or not host and service event handlers were enabled when NetSaint was running. Values: 0=event handlers were *not* enabled, 1=event handlers were enabled.
- obsess_over_services is an integer indicating whether or not NetSaint was obsessing over service checks when it was running. Values: 0=NetSaint was *not* obsessing, 1=NetSaint was obsessing.
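The five program fields could be read into a small record as sketched below. The ProgramState class and parse_program function are names invented for this example. The sketch assumes exactly five integer fields separated by semicolons, as in the format above.

```python
from dataclasses import dataclass


@dataclass
class ProgramState:
    program_mode: int  # 0=standby mode, 1=active mode
    execute_service_checks: bool
    accept_passive_service_checks: bool
    enable_event_handlers: bool
    obsess_over_services: bool


def parse_program(line: str) -> ProgramState:
    """Parse a 'PROGRAM: ...' entry (hypothetical helper)."""
    prefix, sep, rest = line.strip().partition(":")
    if prefix != "PROGRAM" or not sep:
        raise ValueError(f"not a PROGRAM entry: {line!r}")
    fields = rest.strip().split(";")
    if len(fields) != 5:
        raise ValueError(f"expected 5 fields, got {len(fields)}")
    mode, *flags = (int(f) for f in fields)
    # The four remaining fields are 0/1 switches, so expose them as booleans.
    return ProgramState(mode, *(bool(f) for f in flags))


state = parse_program("PROGRAM: 1;1;1;0;0")
print(state.program_mode, state.enable_event_handlers)
```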
Host Information Format:
HOST: <host_name>;<state>;<last_check>;<checks_enabled>;<time_up>;<time_down>;<time_unreachable>;<last_notification>;<current_notification_number>;<notifications_enabled>;<event_handler_enabled>;<problem_has_been_acknowledged>;<plugin_output>
where...
- host_name is the short name of the host (as defined in the host configuration file) that the state information corresponds to.
- state is an integer corresponding to the state of the host (UP, DOWN, or UNREACHABLE). See the base/netsaint.h file for the integer values of different states.
- last_check is a timestamp in time_t format (number of seconds since UNIX epoch) that indicates when the host status was last checked.
- checks_enabled is an integer indicating whether or not checks of this host have been enabled. Values: 0=checks have been disabled, 1=checks are enabled.
- time_up is the number of seconds that the host has been in an UP state.
- time_down is the number of seconds that the host has been in a DOWN state.
- time_unreachable is the number of seconds that the host has been in an UNREACHABLE state.
- last_notification is a timestamp in time_t format (number of seconds since UNIX epoch) that indicates when the last notification for this host was sent out. If no notifications have been sent out, this value is set to zero.
- current_notification_number is an integer representing the number of notifications that have been sent out about this host problem. If no notifications have been sent out since the host last changed state (or if it is in an UP state), this value is set to zero.
- notifications_enabled is an integer that indicates whether or not notifications for this host have been enabled. Values: 0=notifications have been disabled, 1=notifications are enabled.
- event_handler_enabled is an integer indicating whether or not the event handler for this host has been enabled. Values: 0=event handler has been disabled, 1=event handler is enabled.
- problem_has_been_acknowledged is an integer indicating whether or not this host problem has been acknowledged. If the host is UP, or it is DOWN or UNREACHABLE and has not been acknowledged, this is set to 0. If this host is DOWN or UNREACHABLE and the problem has been acknowledged, this is set to 1.
- plugin_output is the text output from the last host check.
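A host entry could be parsed roughly as follows. The sketch assumes the thirteen fields described in the list above, in that order, and assumes that plugin_output, being the last field, may itself contain semicolons. HostState and parse_host are hypothetical names, and the sample line uses invented values (the state integer in particular is a placeholder; see base/netsaint.h for the real values).

```python
from typing import NamedTuple


class HostState(NamedTuple):
    host_name: str
    state: int
    last_check: int
    checks_enabled: bool
    time_up: int
    time_down: int
    time_unreachable: int
    last_notification: int
    current_notification_number: int
    notifications_enabled: bool
    event_handler_enabled: bool
    problem_has_been_acknowledged: bool
    plugin_output: str


def parse_host(line: str) -> HostState:
    """Parse a 'HOST: ...' entry (hypothetical helper)."""
    prefix, sep, rest = line.rstrip("\n").partition(":")
    if prefix != "HOST" or not sep:
        raise ValueError(f"not a HOST entry: {line!r}")
    # Split at most 12 times so semicolons inside plugin_output are kept.
    parts = rest.lstrip().split(";", 12)
    if len(parts) != 13:
        raise ValueError(f"expected 13 fields, got {len(parts)}")
    n = [int(p) for p in parts[1:12]]  # the eleven numeric fields
    return HostState(
        parts[0], n[0], n[1], bool(n[2]), n[3], n[4], n[5], n[6], n[7],
        bool(n[8]), bool(n[9]), bool(n[10]), parts[12],
    )


host = parse_host("HOST: router1;0;946684800;1;86400;0;0;0;0;1;1;0;PING ok; rta 1.2 ms")
print(host.host_name, host.time_up, host.plugin_output)
```

Limiting the number of splits is a design choice for this sketch: it keeps free-form plugin text intact without requiring any escaping convention.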
Service Information Format:
SERVICE: <host_name>;<svc_description>;<state>;<last_check>;<time_ok>;<time_warning>;<time_unknown>;<time_critical>;<last_notification>;<current_notification_number>;<notifications_enabled>;<checks_enabled>;<accept_passive_checks>;<event_handler_enabled>;<problem_has_been_acknowledged>;<plugin_output>
where...
- host_name is the short name of the host that this service is associated with.
- svc_description is the description of the service (as defined in the host configuration file) that the state information corresponds to. Together, the host_name and svc_description fields uniquely identify a service definition.
- state is an integer corresponding to the state of the service (OK, WARNING, UNKNOWN, or CRITICAL). See the base/netsaint.h file for the exact values of different states.
- last_check is a timestamp in time_t format (number of seconds since UNIX epoch) that indicates when the service status was last checked.
- time_ok is the number of seconds that the service has been in an OK state.
- time_warning is the number of seconds that the service has been in a WARNING state.
- time_unknown is the number of seconds that the service has been in an UNKNOWN state.
- time_critical is the number of seconds that the service has been in a CRITICAL state.
- last_notification is a timestamp in time_t format (number of seconds since UNIX epoch) that indicates when the last notification for this service was sent out. If no notifications have been sent out, this value is set to zero.
- current_notification_number is an integer representing the number of notifications that have been sent out about this service problem. If no notifications have been sent out since the service last changed state (or if it is in an OK state), this value is set to zero.
- notifications_enabled is an integer that indicates whether or not notifications for this service have been enabled. Values: 0=notifications have been disabled, 1=notifications are enabled.
- checks_enabled is an integer that indicates whether or not checks of this service have been enabled. Values: 0=checks have been disabled, 1=checks are enabled.
- accept_passive_checks is an integer representing whether or not passive checks are being accepted for this service. If this value is 1, they are being accepted. If this value is 0, passive checks are not being accepted.
- event_handler_enabled is an integer indicating whether or not the event handler for this service has been enabled. Values: 0=event handler has been disabled, 1=event handler is enabled.
- problem_has_been_acknowledged is an integer indicating whether or not this service problem has been acknowledged. If the service is in an OK state, or it is in a non-OK state and has not been acknowledged, this is set to 0. If this service is in a non-OK state and the problem has been acknowledged, this is set to 1.
- plugin_output is the text output from the last service check.
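Service entries could be handled in much the same way. The sketch below maps the sixteen fields listed above onto a dictionary and indexes the result by the (host_name, svc_description) pair, since those two fields together identify a service. It assumes that svc_description contains no semicolons and that only plugin_output may. The names SERVICE_FIELDS, TEXT_FIELDS and parse_service, and all sample values, are invented for illustration.

```python
# Field names in the order given by the SERVICE format line.
SERVICE_FIELDS = [
    "host_name", "svc_description", "state", "last_check",
    "time_ok", "time_warning", "time_unknown", "time_critical",
    "last_notification", "current_notification_number",
    "notifications_enabled", "checks_enabled", "accept_passive_checks",
    "event_handler_enabled", "problem_has_been_acknowledged",
    "plugin_output",
]
# Fields kept as text; every other field is converted to an integer.
TEXT_FIELDS = {"host_name", "svc_description", "plugin_output"}


def parse_service(line: str) -> dict:
    """Parse a 'SERVICE: ...' entry into a dict (hypothetical helper)."""
    prefix, sep, rest = line.rstrip("\n").partition(":")
    if prefix != "SERVICE" or not sep:
        raise ValueError(f"not a SERVICE entry: {line!r}")
    # Limit the split so semicolons inside plugin_output are preserved.
    parts = rest.lstrip().split(";", len(SERVICE_FIELDS) - 1)
    if len(parts) != len(SERVICE_FIELDS):
        raise ValueError(f"expected {len(SERVICE_FIELDS)} fields, got {len(parts)}")
    return {
        name: (value if name in TEXT_FIELDS else int(value))
        for name, value in zip(SERVICE_FIELDS, parts)
    }


svc = parse_service(
    "SERVICE: router1;PING;0;946684800;3600;0;0;0;0;0;1;1;1;1;0;PING ok; rta 1.2 ms"
)
# Index services by the pair that uniquely identifies them.
services = {(svc["host_name"], svc["svc_description"]): svc}
print(services[("router1", "PING")]["time_ok"])
```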