HTML Tools on the Mac Command Line
by Robert Daeley
In a recent blog entry here (The Tell-Tail Heart), I covered using the tail utility -- one of the many text-editing and manipulation programs available on the Mac OS X command line. And when it comes right down to it, HTML editing is text editing. There will be times -- such as SSHing into your server -- that being able to do things via the CLI will be invaluable. Even if you're working on your local development box, having some powerful utilities in your tool belt can't hurt at all.
I'll be focusing on how a few of these utilities can help while working with HTML on Mac OS X. If you haven't already, you'll need to install the Developer Tools, available on your install disks or from developer.apple.com. Also, the following assumes you are using Tiger (10.4) and are familiar with the Terminal and the bash shell. It may also apply to earlier system versions, but I don't have any of those available to confirm.
Not sure if you've got the utilities? Open up a Terminal window and type:
which foo
and hit return, where foo is the name of the utility in question. If you have foo, the system will respond with the pathname where it lives. In the case of the utilities below, they will probably be located in
textutil is something of a Rosetta stone for different file types. It allows you to convert amongst various text formats more or less instantly and without a fuss: txt, html, rtf, rtfd, doc, wordml, and webarchive. One of my favorite uses is to feed it a bunch of paragraphs of text from somebody and get nice HTML-tagged paragraphs out.
Basic usage is quite easy. Let's say you have a plain text file -- five paragraphs of political ranting called screed.txt -- that you want to post on a website. You're still white-hot angry over whatever it was you were ranting about and would rather not spend any more time HTMLizing the whole thing. Here's the command:
textutil -convert html screed.txt
This produces a file called screed.html in the same directory, leaving the original screed.txt file alone. That new file is a complete HTML document, even down to the DOCTYPE declaration at the beginning. You'll find a bit of CSS code, but otherwise it's fairly clean. Anything you want to get rid of, you can remove quickly in your favorite text editor. One thing I've noticed in this regard is that it will take blank lines between text paragraphs and attempt to replicate the space using a <p><br></p> combo -- you can eliminate the blank lines in the original .txt file, or simply find and replace the combos in the HTML.
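If you'd rather do that find-and-replace from the command line, here is a minimal sketch using sed. The function name strip_spacers is made up for illustration, and the pattern assumes each spacer sits on a line by itself and may carry a class attribute (something like <p class="p2"><br></p>), so check your own output before trusting it.

```shell
# strip_spacers: hypothetical helper, not part of textutil.
# Deletes lines consisting only of an empty-paragraph spacer such as
# <p><br></p> or <p class="p2"><br></p>. Reads the named files, or
# standard input when no file is given, and writes to standard output.
strip_spacers() {
  sed '\|^[[:space:]]*<p[^>]*><br></p>[[:space:]]*$|d' "$@"
}

# Demo on inline sample data (assumed shape of the generated HTML).
printf '%s\n' '<p>First rant.</p>' '<p><br></p>' '<p>Second rant.</p>' | strip_spacers
```

To clean a real file, something like strip_spacers screed.html > screed-clean.html writes the result to a new file and leaves the original alone.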
At this point, you could upload the file to your server, or (if you only need the content portion) copy and paste the <p></p> paragraphs wherever you need them.
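For the copy-and-paste route, a small filter can pull just the paragraph lines out of the generated file. This is a sketch under the assumption that every paragraph sits on its own line and starts with a <p> tag; body_paragraphs is a name I made up.

```shell
# body_paragraphs: hypothetical helper, not part of textutil.
# Prints only the lines that begin with a <p> or <p ...> tag, skipping
# the DOCTYPE, head, style block and body wrapper.
body_paragraphs() {
  grep -E '^[[:space:]]*<p[ >]' "$@"
}

# Demo on inline sample data.
printf '%s\n' '<html>' '<body>' '<p class="p1">Rant.</p>' '</body>' '</html>' | body_paragraphs
```

On the Mac, piping the result to pbcopy puts it on the clipboard, as in body_paragraphs screed.html | pbcopy, ready to paste wherever it needs to go.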
There are quite a number of command-line options available for textutil, too many to go into in this article, but there are a couple of techniques I'd like to highlight. Check out man textutil to see them all.
First off is some metadata handling. Let's say I wanted to make sure my screed gets a proper title and attribution in the HTML head section. Here's one way. Type on one line:
textutil -convert html -title "Death to all extremists" -author "Robert Daeley" screed.txt
<title>Death to all extremists</title> shows up in our HTML file, as well as an author metatag. You also have subject, keywords, comment, editor, and other metadata available to use.
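A quick way to confirm the metadata landed is to grep for it. The sketch below assumes the author tag is written as a meta element whose name attribute is "Author" (matched case-insensitively here); show_head_meta is a hypothetical name, not anything textutil provides.

```shell
# show_head_meta: hypothetical helper, not part of textutil.
# Prints the <title> element and any author meta tag, one match per line.
show_head_meta() {
  grep -i -o -e '<title>[^<]*</title>' -e '<meta name="author"[^>]*>' "$@"
}

# Demo on inline sample data (assumed shape of the generated head section).
printf '%s\n' '<head>' '<title>Death to all extremists</title>' \
  '<meta name="Author" content="Robert Daeley">' '</head>' | show_head_meta
```

Run against a real file, show_head_meta screed.html should print the title line and, if the assumption about the tag's shape holds, the author line beneath it.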
Another one of my favorite textutil uses is the wildcard functionality. If you have not one file but a directory of screed text files that you want to convert into a single HTML page, it's as simple as doing this (on one line):
textutil -cat html -output screed.html -title "Screedorama" -author "Robert Daeley" *.txt
The -cat argument takes the contents of all the *.txt files in the working directory and outputs them to screed.html, with the given title and author.
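Since the shell expands *.txt before textutil ever sees it, the file list arrives in the shell's sorted order. I'm assuming textutil joins the files in the order they are given, so it's worth previewing that order first. A sketch, with list_txt as a made-up helper name:

```shell
# list_txt: hypothetical helper. Prints the .txt file names in a directory
# (default: the current one) in the order the shell's wildcard expands them.
list_txt() {
  dir=${1:-.}
  for f in "$dir"/*.txt; do
    # With no matches the pattern stays literal, so skip non-existent names.
    if [ -e "$f" ]; then
      printf '%s\n' "${f##*/}"
    fi
  done
}

# Demo: preview the current directory.
list_txt .
```

If the order matters, numbering the files (01-intro.txt, 02-middle.txt and so on) is an easy way to control it.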