O'Reilly Databases

oreilly.comSafari Books Online.Conferences.

We've expanded our coverage and improved our search! Search for all things Database across O'Reilly!

Search Search Tips

advertisement
AddThis Social Bookmark Button

Print Subscribe to Databases Subscribe to Newsletters

Writing Apache's Logs to MySQL
Pages: 1, 2, 3

Querying the Database

Queries can run on the database immediately. The examples below are in straight SQL syntax. Some use the nested query syntax available only in MySQL 4.1 or above.

If you intend to write automated scripts in Perl or Python or whatever, it may be easier to run multiple SELECT statements or to compute derived values inside the scripting language itself. Whenever you request reports on web server performance, you'll need to specify the date range you want to cover.

Query: total bytes in and out

You want to find out how much network traffic your web server sends and receives. Keep in mind that this reflects traffic levels only at the application layer of the TCP/IP stack. It won't record the size of TCP or IP packets. That's a question SNMP polling tools can answer better.

select sum(bytesin), sum(bytesout) from blackbox \
	where datetime = curdate();

select sum(bytesin),sum(bytesout) from blackbox where datetime >= 
        date_sub(curdate(),interval 7 day);

The first example gives the total bytes in and bytes out for the given day. The second example gives the total for the past seven days.

Query: what percentage of hits goes to RSS syndication pages compared with HTML pages?

This example checks the number of hits against a specific RSS page against every possible hit against the server.

select (select count(url) from blackbox where url regexp '\.rss$')/
        (select count(url) from blackbox where url regexp '\.html$');

RSS hits are typically automated. Because of this, a lot of sites have a higher percentage of pure robot/agent traffic on their site than they may have had two years ago. By checking the RSS hits, you can determine whether the agent traffic is overwhelming your site.

Query: how often do users skip the Flash animation?

If you see a web site with one of those Flash animations on the main page, it usually has a "Skip" link at the bottom. If you're going to include one, throw in a meaningless query string at the end of the link, so you can determine which hits are redirects after the animation has finished and which ones come from people clicking on the link.

select (select count(url) from blackbox where url="/home.html?skipped"),
        (select count(url) from blackbox where url="/home.html"),

Verifying that the database is working

Here's a two-part script you can run to verify that the database really is logging hits from the servers. First, add this script to the cgi-bin directory of every web site that uses mod_log_mysql.

#!/usr/local/bin/perl

# returnid.pl

use strict;
use warnings;

print("X-ServerId: $ENV{'SERVERID'}\n");
print("X-UniqueId: $END{'UNIQUE_ID'}\n");
print "Content-type: text/plain\n\n";

print("Server id is $ENV{'SERVERID'}\n");
print("Unique id is $ENV{'UNIQUE_ID'}\n");

exit 0;

The server will return its serverid and unique ID for that particular hit. The next program will act as an HTTP client. It will retrieve this URL, then connect to the database and search the logs for the unique ID.

#!/usr/local/bin/perl

# checkid.pl

#
# Pragmas
#

use strict;
use warnings;

#
# Includes
#

use LWP::UserAgent;
use HTTP::Request;
use HTTP::Response;

use DBI;

#
# Global Variables
#

use vars qw/$Url $Agent $Db $ServerId $UniqueId /;

use constant {
  DSN         => "DBI:mysql:database=apachelog",
  DBUSER      => "logwriter",
  DBPASS      => "logpass",
  QUERYSTRING => 
   "select datetime,uniqueid from blackbox where uniqueid=? and serverid=?",
  DEFAULTURL  => "http://www.example.com:8080/cgi-bin/returnid.pl",
};

#
# Subroutines
#

sub getId
{
    my ($agent,$url) = @_;
    my $request      = HTTP::Request->new(GET => $url);
    my $response     = $agent->request($request);

    if ($response->is_success)
    {
        my $serverid=$response->header("X-ServerId");
        my $uniqueid=$response->header("X-UniqueId");
        print("Unique ID from server is $uniqueid\n");
        print("Server ID is $serverid\n");
        return ($serverid,$uniqueid);
    }

    return undef;
}

sub findId
{
    my ($db,$serverid,$uniqueid) = @_;

    my $query                    = $db->prepare(QUERYSTRING);

    $query->execute($uniqueid,$serverid);

    if (my ($dt,$uid)=$query->fetchrow_array())
    {
        print("Database recorded $uid at $dt\n");
    } else {
        print("Can't find a database record for unique-id $uniqueid\n");
    }

    return;
}

#
# Main Block
#

# Initialize the user agent
$Agent = LWP::UserAgent->new();

# Initialize the database client
$Db = DBI->connect(DSN,DBUSER,DBPASS);

# URL
$Url = shift || DEFAULTURL;

if (($ServerId,$UniqueId) = getId($Agent,$Url))
{
        findId($Db,$ServerId,$UniqueId);
} else {
        print("Unable to get data from webserver");
        exit 1;
}

If you run this program at a regular polling interval, it will warn you when the remote database is not responding or if the Blackbox table is not recording hits from the web servers.

Final Thoughts

If you've read the first article, you should already understand why you want to log your server performance data. The core concepts are still the same; I'm just introducing a few variations on improving the process.

The two new logging directives provide more flexibility with virtual hosting environments. It also allows having just one Blackbox log file for each running server.

If you want to take the really big step, consider the option of writing your logs straight to a database. The initial setup process may be complex, but after that there is a huge administrative benefit. It's an ideal solution for dealing with large server farms.

Chris Josephes works as a system administrator for Internet Broadcasting.


Return to the Apache DevCenter.


Comments on this article

1 to 1 of 1
  1. Database logging modules very bad idea
    2005-02-21 22:14:31  Coyote-Moon [View]

1 to 1 of 1


Tagged Articles

Be the first to post this article to del.icio.us

Related to this Article

Data Jujitsu: The Art of Turning Data into Product Data Jujitsu: The Art of Turning Data into Product
November 2012
$0.00 USD

Designing Great Data Products Designing Great Data Products
March 2012
$0.00 USD

Sponsored Resources

  • Inside Lightroom
Advertisement
O'reilly

© 2013, O’Reilly Media, Inc.

(707) 827-7019 (800) 889-8969

All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners.

About O'Reilly

  • Academic Solutions
  • Jobs
  • Contacts
  • Corporate Information
  • Press Room
  • Privacy Policy
  • Terms of Service
  • Writing for O'Reilly

Community

  • Authors
  • Community & Featured Users
  • Forums
  • Membership
  • Newsletters
  • O'Reilly Answers
  • RSS Feeds
  • User Groups

Partner Sites

  • makezine.com
  • makerfaire.com
  • craftzine.com
  • igniteshow.com
  • PayPal Developer Zone
  • O'Reilly Insights on Forbes.com

Shop O'Reilly

  • Customer Service
  • Contact Us
  • Shipping Information
  • Ordering & Payment
  • The O'Reilly Guarantee