Spidering Hacks

Hack #37: Downloading Comics with dailystrips

Love comics but hate visiting multiple sites for your daily dose? Automate your stripping with some easy-to-use open source Perl software.

It's hard to believe that, across all the cultures of the Internet, there's one common denominator of humor. Can you guess what it is? No, no; it's not the "All Your Base Are Belong to Us" videos. It's the comic strip. Whether you're into geek humor, political humor, or unfortunate youngsters forever failing to kick a football, there's a comic strip for you.

In fact, there may be several comic strips for you. There may be so many that it's a pain to visit all the sites containing said comic strips to view them. But there's a great piece of software available to ease your woes: dailystrips grabs all the strips for you, presenting them in one HTML file. Combine it with cron [Hack #90] and you've got a great daily comic strip supplement right in your mailbox or web site.

The author, Andrew Medico, makes it clear that if you set this up to run on a web site, you must ensure that you've configured your site to restrict access to you alone or risk some legal consequences.

Getting the Code

dailystrips is available at http://dailystrips.sourceforge.net/, and this hack covers Version 1.0.27. There are two components to the program: the program itself and the definitions file, which defines the details of the available comic strips. As of this writing, dailystrips supports over 500 different comic strips.

Once you've downloaded the program, go back to the download page and grab the latest definitions file, which is updated often. Save it over the strips.def file that comes packaged in the ZIP archive with the application.

Running the Hack

After installation (see the INSTALL file or installation instructions online at http://dailystrips.sourceforge.net/1.0.27/install.html), dailystrips runs from the command line with several options. Here are a few of the more important ones:
To grab the latest "Get Fuzzy" comic and save to a local file, run:
While the program is running, you'll get a count of any errors in retrieving the images of the strips. From my experiments, it looked like the nonsyndicated comics were easier to get and more consistent than the syndicated ones. Once the program is finished, it'll either spit some HTML to

Hacking the Hack

In this hack, we're not hacking the hack so much as hacking the defs file. The defs file defines from where the strips are retrieved and the code snippets that are used to retrieve them. The defs file also includes groups, which are shortcuts to retrieving several comics at once. More extensive information on how to define strips is available from the README.DEFS file.

Defining strips by URL

The first way to define new strips is by generating a URL based on the current date. Here's an example for James Sharman's "Badtech" comic:
The first line specifies a unique strip name that you'll use to add the strip to a group or get it from the command line. The second line, The final line,

Finding strips with a search

The other type of URL generation, searching, is as follows:
Notice that the options are similar to the options in the previous example.
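
As a rough illustration of the two strategies described above, and not of dailystrips' own Perl code or its definitions syntax, the following Python sketch builds an image URL from a date and, alternatively, pulls one out of a page with a regular expression. The URL template, the sample HTML, the pattern, and the function names are all hypothetical.

```python
import re
from datetime import date
from typing import Optional
from urllib.parse import urljoin


def url_from_date(template: str, day: date) -> str:
    """Fill a strftime-style template with the given date."""
    return day.strftime(template)


def url_from_search(html: str, pattern: str, base_url: str) -> Optional[str]:
    """Return the first image URL captured by `pattern`, resolved against base_url."""
    match = re.search(pattern, html, flags=re.IGNORECASE)
    if match is None:
        return None  # the page layout changed or the strip is missing
    return urljoin(base_url, match.group(1))


# Hypothetical values, for illustration only.
TEMPLATE = "http://comics.example.com/strips/%Y/%m/%d.jpg"
SAMPLE_HTML = '<p>Today:</p><img alt="strip" src="images/1234.gif">'
PATTERN = r'<img[^>]+src="(images/\d+\.gif)"'

if __name__ == "__main__":
    print(url_from_date(TEMPLATE, date.today()))
    print(url_from_search(SAMPLE_HTML, PATTERN, "http://comics.example.com/today/"))
```

The date approach needs no network request to find the image but breaks if the site's naming scheme is irregular; the search approach tolerates arbitrary filenames but depends on the page markup staying stable.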
Gathering strips into a group

If you want to get a set of the same comic strips every day, it's kind of a pain to type them all in. dailystrips lets you specify a group name that gathers several comic strips at the same time. Groups go at the top of the definitions file and look like this:
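
Conceptually, a group is just a name that expands to a list of strip names before anything is downloaded. The Python sketch below models that expansion with a hypothetical in-memory table; it does not reproduce the strips.def group syntax, and the group and strip identifiers are made up for illustration.

```python
from typing import Dict, List


def expand(names: List[str], groups: Dict[str, List[str]]) -> List[str]:
    """Replace group names with their member strips, keeping order and dropping duplicates."""
    result: List[str] = []
    for name in names:
        # A name that is not a known group is treated as a single strip.
        for strip in groups.get(name, [name]):
            if strip not in result:
                result.append(strip)
    return result


# Hypothetical group table; identifiers are assumptions, not real definitions.
GROUPS = {"favorites": ["getfuzzy", "badtech"]}

if __name__ == "__main__":
    print(expand(["favorites", "badtech", "anotherstrip"], GROUPS))
```

Requesting a group alongside one of its own members yields each strip only once, which is the behavior you would want from a daily batch run.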
Kevin Hemenway is the coauthor of Mac OS X Hacks, author of Spidering Hacks, and the alter ego of the pervasively strange Morbus Iff, creator of disobey.com, which bills itself as "content for the discontented." Tara Calishain is the creator of the site, ResearchBuzz. She is an expert on Internet search engines and how they can be used effectively in business situations.