Writing Apache's Logs to MySQL
Pages: 1, 2, 3
Querying the Database
Queries can run on the database immediately. The examples below are in straight SQL syntax. Some use the nested query syntax available only in MySQL 4.1 or above.
If you intend to write automated scripts in Perl or Python or whatever, it
may be easier to run multiple SELECT statements or to compute
derived values inside the scripting language itself. Whenever you request
reports on web server performance, you'll need to specify the date range you
want to cover.
Query: total bytes in and out
You want to find out how much network traffic your web server sends and receives. Keep in mind that this reflects traffic levels only at the application layer of the TCP/IP stack. It won't record the size of TCP or IP packets. That's a question SNMP polling tools can answer better.
select sum(bytesin), sum(bytesout) from blackbox \
where datetime = curdate();
select sum(bytesin),sum(bytesout) from blackbox where datetime >=
date_sub(curdate(),interval 7 day);
The first example gives the total bytes in and bytes out for the given day. The second example gives the total for the past seven days.
Query: what percentage of hits goes to RSS syndication pages compared with HTML pages?
This example checks the number of hits against a specific RSS page against every possible hit against the server.
select (select count(url) from blackbox where url regexp '\.rss$')/
(select count(url) from blackbox where url regexp '\.html$');
RSS hits are typically automated. Because of this, a lot of sites have a higher percentage of pure robot/agent traffic on their site than they may have had two years ago. By checking the RSS hits, you can determine whether the agent traffic is overwhelming your site.
Query: how often do users skip the Flash animation?
If you see a web site with one of those Flash animations on the main page, it usually has a "Skip" link at the bottom. If you're going to include one, throw in a meaningless query string at the end of the link, so you can determine which hits are redirects after the animation has finished and which ones come from people clicking on the link.
select (select count(url) from blackbox where url="/home.html?skipped"),
(select count(url) from blackbox where url="/home.html"),
Verifying that the database is working
Here's a two-part script you can run to verify that the database really is logging hits from the servers. First, add this script to the cgi-bin directory of every web site that uses mod_log_mysql.
#!/usr/local/bin/perl
# returnid.pl
use strict;
use warnings;
print("X-ServerId: $ENV{'SERVERID'}\n");
print("X-UniqueId: $END{'UNIQUE_ID'}\n");
print "Content-type: text/plain\n\n";
print("Server id is $ENV{'SERVERID'}\n");
print("Unique id is $ENV{'UNIQUE_ID'}\n");
exit 0;
The server will return its serverid and unique ID for that particular
hit. The next program will act as an HTTP client. It will retrieve this URL,
then connect to the database and search the logs for the unique ID.
#!/usr/local/bin/perl
# checkid.pl
#
# Pragmas
#
use strict;
use warnings;
#
# Includes
#
use LWP::UserAgent;
use HTTP::Request;
use HTTP::Response;
use DBI;
#
# Global Variables
#
use vars qw/$Url $Agent $Db $ServerId $UniqueId /;
use constant {
DSN => "DBI:mysql:database=apachelog",
DBUSER => "logwriter",
DBPASS => "logpass",
QUERYSTRING =>
"select datetime,uniqueid from blackbox where uniqueid=? and serverid=?",
DEFAULTURL => "http://www.example.com:8080/cgi-bin/returnid.pl",
};
#
# Subroutines
#
sub getId
{
my ($agent,$url) = @_;
my $request = HTTP::Request->new(GET => $url);
my $response = $agent->request($request);
if ($response->is_success)
{
my $serverid=$response->header("X-ServerId");
my $uniqueid=$response->header("X-UniqueId");
print("Unique ID from server is $uniqueid\n");
print("Server ID is $serverid\n");
return ($serverid,$uniqueid);
}
return undef;
}
sub findId
{
my ($db,$serverid,$uniqueid) = @_;
my $query = $db->prepare(QUERYSTRING);
$query->execute($uniqueid,$serverid);
if (my ($dt,$uid)=$query->fetchrow_array())
{
print("Database recorded $uid at $dt\n");
} else {
print("Can't find a database record for unique-id $uniqueid\n");
}
return;
}
#
# Main Block
#
# Initialize the user agent
$Agent = LWP::UserAgent->new();
# Initialize the database client
$Db = DBI->connect(DSN,DBUSER,DBPASS);
# URL
$Url = shift || DEFAULTURL;
if (($ServerId,$UniqueId) = getId($Agent,$Url))
{
findId($Db,$ServerId,$UniqueId);
} else {
print("Unable to get data from webserver");
exit 1;
}
If you run this program at a regular polling interval, it will warn you when the remote database is not responding or if the Blackbox table is not recording hits from the web servers.
Final Thoughts
If you've read the first article, you should already understand why you want to log your server performance data. The core concepts are still the same; I'm just introducing a few variations on improving the process.
The two new logging directives provide more flexibility with virtual hosting environments. It also allows having just one Blackbox log file for each running server.
If you want to take the really big step, consider the option of writing your logs straight to a database. The initial setup process may be complex, but after that there is a huge administrative benefit. It's an ideal solution for dealing with large server farms.
Chris Josephes works as a system administrator for Internet Broadcasting.
Return to the Apache DevCenter.
-
Database logging modules very bad idea
2005-02-21 22:14:31 Coyote-Moon [View]



