Building a Self-Healing Network
Configuring NAGIOS
Two articles by Oktay Altunergil cover NAGIOS in depth. The first, Installing Nagios, covers installing NAGIOS from source. The second, Nagios, Part 2, has an in-depth discussion of the configuration files that are at the heart of NAGIOS's behavior.
The configuration files for NAGIOS typically live in /etc/nagios. The hosts.cfg file defines which hosts NAGIOS should monitor. This file simply defines the web server and its IP address.
# Generic host definition template
define host{
    name                            generic-host    ; Host template
    notifications_enabled           1
    event_handler_enabled           1
    flap_detection_enabled          1
    process_perf_data               1
    retain_status_information       1
    retain_nonstatus_information    1
    register                        0               ; DONT REGISTER THIS TEMPLATE
}
# our apache server host definition
define host{
    use                             generic-host    ; template to use
    host_name                       webserver
    alias                           Our apache webserver
    address                         192.168.0.20
    check_command                   check-host-alive
    max_check_attempts              10
    notification_interval           120
    notification_period             24x7
    notification_options            d,u,r
}
services.cfg defines which services to monitor on each host. Here it defines two checks for the web server: reachability (via ping) and the availability of the HTTP server.
# Generic service definition template
define service{
    name                            generic-service ; This is a template.
    active_checks_enabled           1
    passive_checks_enabled          1
    parallelize_check               1
    obsess_over_service             1
    check_freshness                 0
    notifications_enabled           1
    event_handler_enabled           1
    flap_detection_enabled          1
    process_perf_data               1
    retain_status_information       1
    retain_nonstatus_information    1
    register                        0               ; DONT REGISTER TEMPLATE
}
# Service definition
define service{
    use                             generic-service ; Name of template
    host_name                       webserver
    service_description             PING
    is_volatile                     0
    check_period                    24x7
    max_check_attempts              3
    normal_check_interval           2
    retry_check_interval            1
    contact_groups                  admins
    notification_interval           120
    notification_period             24x7
    notification_options            c,r
    check_command                   check_ping!100.0,20%!500.0,60%
}
# Service definition
define service{
    use                             generic-service ; Name of template
    host_name                       webserver
    service_description             HTTP
    is_volatile                     0
    check_period                    24x7
    max_check_attempts              3
    normal_check_interval           2
    retry_check_interval            1
    contact_groups                  admins
    notification_interval           120
    notification_period             24x7
    notification_options            w,u,c,r
    check_command                   check_http
    event_handler_enabled           1
    event_handler                   handle_cfrun
}
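It can help to run the same two checks by hand before NAGIOS schedules them. The sketch below assumes the plugins live in /usr/lib/nagios/plugins (this varies by distribution) and that the stock check_ping command definition passes the two `!`-separated arguments to the plugin as -w and -c. The helper name run_web_checks is hypothetical, not part of NAGIOS.

```shell
# Assumed plugin directory; it varies by distribution and install method.
PLUGIN_DIR="${PLUGIN_DIR:-/usr/lib/nagios/plugins}"

# Run the PING and HTTP checks by hand against one host address.
# run_web_checks is a hypothetical helper, not part of NAGIOS.
run_web_checks() {
    host="$1"
    ping_rc=0
    http_rc=0
    # Warning at 100 ms round trip or 20% loss, critical at 500 ms or 60% loss.
    "$PLUGIN_DIR/check_ping" -H "$host" -w 100.0,20% -c 500.0,60% || ping_rc=$?
    # Same invocation shape as a plain check_http command with no arguments.
    "$PLUGIN_DIR/check_http" -H "$host" || http_rc=$?
    # Plugin exit codes: 0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN.
    echo "check_ping exit=$ping_rc check_http exit=$http_rc"
}
```

Calling `run_web_checks 192.168.0.20` prints each plugin's one-line status followed by the exit codes, which are what NAGIOS uses to decide the service state.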
The configuration file contacts.cfg defines who to contact when a monitoring event occurs and how to make the contact. A basic configuration simply mails root.
define contact{
    contact_name                    nagios
    alias                           Nagios Admin
    service_notification_period     24x7
    host_notification_period        24x7
    service_notification_options    w,u,c,r
    host_notification_options       d,u,r
    service_notification_commands   notify-by-email,notify-by-epager
    host_notification_commands      host-notify-by-email,host-notify-by-epager
    email                           root@localhost.localdomain
    pager                           root@localhost.localdomain
}
contactgroups.cfg defines groupings of contacts.
define contactgroup{
    contactgroup_name               admins
    alias                           Apache Server Administrators
    members                         nagios
}
The hostgroups.cfg file maps hosts to groups. Here there is only one host, in a group of its own, associated with the one contact group.
define hostgroup{
    hostgroup_name                  webserver
    alias                           Apache Web Servers
    contact_groups                  admins
    members                         webserver
}
Zero out the files dependencies.cfg and escalations.cfg (for example, by copying /dev/null over each of them), since this configuration doesn't need them.
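As a small sketch of that step, the helper below empties both files in one pass. The function name zero_unused_cfg is hypothetical, and the default directory of /etc/nagios is only the typical location mentioned earlier; pass a different directory if yours differs.

```shell
# Empty (or create) the two object files this setup does not use.
# zero_unused_cfg is a hypothetical helper; /etc/nagios is an assumed default.
zero_unused_cfg() {
    dir="${1:-/etc/nagios}"
    for f in dependencies.cfg escalations.cfg; do
        # Copying /dev/null over a file leaves it with zero length.
        cp /dev/null "$dir/$f" || return 1
    done
}
```

Run it as root with `zero_unused_cfg` or, for a non-default layout, `zero_unused_cfg /path/to/config`.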
Finally, edit cgi.cfg. If you are in a lab or isolated environment, set use_authentication=0. Otherwise, set up an appropriate .htaccess configuration for your /nagios/ directory with sane values. For more information on how NAGIOS manages CGI security, review the NAGIOS CGI Authentication Documentation.
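If you do leave authentication on, a basic-auth setup might look like the sketch below. It is only an illustration: the function name write_htaccess, the target directory, and the password file path are all assumptions, and Apache must allow AuthConfig overrides for the directory in question.

```shell
# Write a minimal basic-auth .htaccess into the given directory.
# write_htaccess is a hypothetical helper; both arguments are site-specific.
write_htaccess() {
    target_dir="$1"     # directory Apache serves for the NAGIOS CGIs or pages
    passwd_file="$2"    # password file maintained with htpasswd
    cat > "$target_dir/.htaccess" <<EOF
AuthName "Nagios Access"
AuthType Basic
AuthUserFile $passwd_file
Require valid-user
EOF
}
```

The password file itself can be created with `htpasswd -c` followed by the file path and a user name. Using the user name nagios matches the contact defined in contacts.cfg, so that user is treated as a contact for the web server and its services.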
Start up your NAGIOS server: service nagios start.
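Before starting the daemon, it is worth letting NAGIOS verify the configuration, since a typo in any object file will keep it from starting. The sketch below assumes the nagios binary is on your PATH and that the main configuration file is /etc/nagios/nagios.cfg; the wrapper name start_nagios_checked is hypothetical.

```shell
# Verify the configuration first, and start the service only if it is clean.
# start_nagios_checked is a hypothetical wrapper; the default path is assumed.
start_nagios_checked() {
    cfg="${1:-/etc/nagios/nagios.cfg}"
    # "nagios -v" runs the pre-flight check and exits non-zero on errors.
    if nagios -v "$cfg"; then
        service nagios start
    else
        echo "Configuration errors found; not starting NAGIOS." >&2
        return 1
    fi
}
```

The pre-flight output names the file and line of each problem it finds, which is usually faster than hunting through the logs after a failed start.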
Go to http://monitor/nagios/ and click service checks. After a few moments, you should see both the HTTP and PING services in the green. One final note: if you have just installed Apache on your web server, make sure there's a /var/www/html/index.html document so that the server returns OK. Otherwise, it will return 403 Forbidden, which will cause the health check to fail.
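To see the status code the web server actually hands back, you can ask for it directly from the monitoring host. This is a minimal sketch assuming curl is installed there; the function name http_status is hypothetical.

```shell
# Print only the HTTP status code returned for a URL.
# http_status is a hypothetical helper built on curl.
http_status() {
    # -s silences progress output, -o discards the body,
    # -w prints the numeric status code when the transfer ends.
    curl -s -o /dev/null -w '%{http_code}' "$1"
}
```

For example, `http_status http://192.168.0.20/` should print 200 once the index document is in place.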
You've now created a very vanilla NAGIOS and Cfengine environment. This is something you may have already put into place in your network. But hold on to your hat--here's where I make it interesting.