The curl command-line tool is a one-stop shop for data transfer. It supports HTTP, FTP, LDAP, and other protocols. However, people who use it as just a download tool don't do it justice.
curl's inner workings use the libCURL client library. So can your programs, to make them URL aware. libCURL-enabled tools can perform downloads, replace fragile FTP scripts, and otherwise take advantage of networking without any (explicit) socket programming. The possibilities are endless, especially with libCURL using a MIT/X-style license agreement.
This article explains how to use libCURL's "easy" API, which is simple and should suit most needs. (I plan to cover the more powerful but more complex "share" and "multi" interfaces in a future article.) It uses the following scenarios to demonstrate libCURL programming:
- GET: to fetch content from a URL
- POST: to simulate a web form, such as a search engine call

Stubs though they may be, the samples are working tools that you can use as building blocks for your own libCURL experiments. Feel free to download the example code and join in.
libCURL is a C library. My examples are in C++, but a proficient C programmer should be able to follow along. That said, I've discovered a template technique that should make libCURL a little easier for C++ programmers.
I tested the sample code under Fedora Core 3, libCURL version 7.12.3. As libCURL is under active development, the examples may require slight modifications to work under different library versions.
A typical client/server scenario involves a connection, plus one or many request/response iterations. Consider an HTTP transfer: the client connects to the server, sends its request (a GET or POST operation), receives the response, and closes the connection.
libCURL sits in the middle of this process. To use it, configure a
context object with request data (URL, parameters) and response
handlers (callback functions). Pass this context to the library, which handles
low-level network transport (connection initiation and teardown, data transfer)
and calls your response handler(s).
Notice that libCURL doesn't really do anything with the data; it's more of a data transfer framework that fires your callbacks to do the heavy lifting. This clean separation of transport and handling abstracts your development from low-level networking and protocol concerns so that you can focus on writing your application.
Using libCURL's "easy" interface, then, involves the following sequence of API calls:

- curl_global_init(), to initialize the curl library (once per program)
- curl_easy_init(), to create a context
- curl_easy_setopt(), to configure that context
- curl_easy_perform(), to initiate the request and fire any callbacks
- curl_easy_cleanup(), to clean up the context
- curl_global_cleanup(), to tear down the curl library (once per program)

The function curl_easy_setopt() deserves attention:
curl_easy_setopt(CURL* ctx , CURLoption key , value )
The parameters are the context, the option name, and the option value,
respectively. Think of value as a void* (it's really
not, but bear with me), because it can be any data type. That data must,
however, match the type that key expects.
GET: Fetch a Web Page

The stub program step1 performs a simple HTTP GET operation. It
prints the response headers to standard error and the body to standard
output.
First, step1 calls curl_easy_init() to create a
context object (CURL*):
CURL* ctx = curl_easy_init() ;
It then calls curl_easy_setopt() several times to configure the
context. (CURLOPT_URL is the target URL.)
curl_easy_setopt( ctx , CURLOPT_URL , argv[1] ) ;
CURLOPT_WRITEHEADER is an open FILE* to which
libCURL will write the response headers. step1 sends them to
stderr.
Similarly, CURLOPT_WRITEDATA is a FILE*
destination (here, stdout) for the response body. This is text
data for HTTP requests but may be binary data for FTP or other transfer types.
Note that libCURL defines "read" as "sent data" and "write" as "received data";
some people may find these terms confusing.
CURLOPT_VERBOSE is helpful for debugging. This option tells
libCURL to print low-level diagnostic messages to standard error.
curl_easy_perform() makes the actual URL call. In the event of
an error, curl_easy_strerror() returns a printable error message:
const CURLcode rc = curl_easy_perform( ctx ) ;
// for curl v7.11.x and earlier, look into
// the option CURLOPT_ERRORBUFFER instead
if( CURLE_OK != rc ){
std::cerr << "Error from CURL: "
<< curl_easy_strerror( rc) << std::endl ;
} ...
Otherwise, you can call curl_easy_getinfo() to fetch transfer
statistics. Similar to curl_easy_setopt(), it takes a constant as
a key and a void* in which to store the data:
long statLong ;
curl_easy_getinfo( ctx , CURLINFO_HTTP_CODE , &statLong ) ;
std::cout << "HTTP response code: " << statLong << std::endl ;
You must match the key constant to the pointer you provide: for example,
CURLINFO_HTTP_CODE (the numeric HTTP response code, such as
200 or 404) requires a long variable,
whereas CURLINFO_SIZE_DOWNLOAD (the number of bytes downloaded)
requires a double. Call curl_easy_cleanup() to
clean up the context object. Do this after any calls to
curl_easy_getinfo(), or you risk a segmentation fault.
curl_easy_setopt() doesn't copy any of the pointers you assign
to context values, nor does curl_easy_cleanup() destroy them. You
are responsible for ensuring pointer validity throughout the context's lifetime
and for cleaning up any resources after the context's teardown.
Web services are gaining steam, but plenty of systems still use plain old FTP jobs to transfer data between applications.
Scripts typically feed the ftp command instructions via
standard input. expect offers better error handling, because it
simulates an interactive session; but in my experience high-end expect
skills are fairly rare. Most annoying is that these scripts run outside of the
main application, so they bypass any tracing or error-handling facilities.
step2 addresses these concerns by moving the FTP pull into the native-code application itself. (Pretend that step2 is a code excerpt from a larger, long-running app.) It also demonstrates how to process the data as it downloads, so you don't have to store it in a temporary file.
step2 and step1 share a lot of code, some of which I've already explained.
The CURLOPT_WRITEFUNCTION context option specifies the function
libCURL will call as it downloads the remote file:
size_t showSize( ... ) ;
curl_easy_setopt( ctx , CURLOPT_WRITEFUNCTION , showSize ) ;
The for() loop in lines 202-239 sets up the FTP calls. The
CURLOPT_URL option is a URL created by concatenating the name of
the target file with the name of the server. libCURL will try to use the same
network connection for all of the FTP calls, because they share a context.
The value assigned to CURLOPT_WRITEDATA is available in the
CURLOPT_WRITEFUNCTION callback (here, showSize()).
This can be any data type, either native or user defined. The callback uses the
value as a means to keep state between invocations. In step2, this
is a custom XferInfo* object that stores information about the
downloaded file and the number of times the library has invoked the
callback:
class XferInfo {
public:
    void add( int more ) ;
    int getBytesTransferred() const ;
    int getTimesCalled() const ;
    // ...
} ;
...
XferInfo info ;
curl_easy_setopt( ctx , CURLOPT_WRITEDATA , &info ) ;
In turn, the showSize() callback does all of the work. It
tracks the size of the files downloaded from the FTP server. Note its
signature:
extern "C"
size_t showSize(
void *source ,
size_t size ,
size_t nmemb ,
void *userData
)
All CURLOPT_WRITEFUNCTION callbacks use this signature.
C++ users must expose callback functions with C linkage, hence the
extern "C" declaration. You can't specify an object member
function as a callback, but I've found a template technique to pass the work to
an object indirectly.
source is a buffer of data. I usually cast this to a
char* because I process text data (HTML, XML). This example
doesn't use this parameter because showSize() doesn't do anything
with the data itself.
Because source is not NULL-terminated, you can't
use standard string functions to determine its length. Instead, use the
product of size*nmemb.
userData is the value assigned to the
CURLOPT_WRITEDATA context option. Note that the libCURL manual
calls this parameter stream, likely because it's a
FILE* when using the default (libCURL internal) write function. I
call it userData because that's a little less confusing.
As userData is void*, you must cast it back to its
proper data type. showSize() casts it to an XferInfo
object and calls its add() member function to record the number of
bytes transferred in this call:
extern "C"
size_t showSize( ... ){
    XferInfo* info = static_cast< XferInfo* >( userData ) ;
    const int bufferSize = size * nmemb ;
    info->add( bufferSize ) ;
    return( bufferSize ) ;
}
On success, your callback should return the number of bytes it processed
(size*nmemb). libCURL compares this with the number of
bytes it passed your function and aborts the transfer if they don't match;
returning 0, or any number less than size*nmemb, therefore
signals an error and halts the transfer.
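You can exercise this contract without libCURL at all. The curl-free sketch below (Tally and tallySize are my names, in the spirit of step2's XferInfo and showSize()) invokes a write-style callback by hand, exactly the way the library would through the registered function pointer:

```cpp
#include <cstddef>

// Callback state, in the spirit of step2's XferInfo.
struct Tally {
    size_t bytes ;   // total bytes seen so far
    int calls ;      // number of times the callback fired
} ;

// Same signature libCURL expects for CURLOPT_WRITEFUNCTION.
extern "C"
size_t tallySize( void* /* source */ , size_t size , size_t nmemb , void* userData ){
    Tally* t = static_cast< Tally* >( userData ) ;
    const size_t bufferSize = size * nmemb ;   // source is NOT NUL-terminated
    t->bytes += bufferSize ;
    ++( t->calls ) ;
    return( bufferSize ) ;   // "I processed everything": the transfer continues
}
```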
A callback may fire several times for the same download, because the library
hands you the file data in chunks. This is memory efficient if your code
operates on piecemeal data, such as with low-level text parsing. Otherwise, you
must store the data yourself as it comes in and handle it after the download,
after the call to curl_easy_perform() returns.
Legacy system uploads and downloads go hand in hand. The stub program step3 uses libCURL to log in to a remote FTP host and upload a file. It also describes a way to use a true C++ object as the callback handler (albeit indirectly).
step3 takes the remote FTP host, login, and password as arguments. (A real app would read this data from a config file; for now, pretend it's not a security problem to specify it on the command line.) The example merges the hostname with the target file name to form the URL:
ftp://host/file
It also concatenates the login and password into a string used as the
context option CURLOPT_USERPWD:
login:password
Note that you can also put the login info in the URL:
ftp://login:password@host/file
This URL format is simply another means to pass the login information to the
API. Unlike a browser, libCURL doesn't keep a cache or history that prying
eyes can later discover. (Hopefully, the remote FTP server doesn't log that
kind of information, either.) Nor is the information available via
ps browsing.
For firewall-friendly transfers, the context option
CURLOPT_FTP_USE_EPSV tells the library to use passive FTP.
libCURL doesn't limit you to file transfers alone. You can also send
arbitrary FTP commands, such as mkdir or cwd. Store
the commands in a libCURL linked list (curl_slist*):
struct curl_slist* commands = NULL ;
commands = curl_slist_append( commands , "mkdir /some/path" ) ;
commands = curl_slist_append( commands , "mkdir /another/path" ) ;
...
The CURLOPT_QUOTE context option executes commands after
logging in to the FTP server but before transferring data.
CURLOPT_POSTQUOTE specifies a list of commands to execute after
having transferred data.
curl_easy_setopt( ctx , CURLOPT_QUOTE , commands ) ;
// ... call curl_easy_perform() to run the FTP session ...
curl_slist_free_all( commands ) ;
You can use these context options to curl-ify your old FTP scripts.
step3 uses CURLOPT_QUOTE to call cwd / such
that the file uploads relative to the root directory of the FTP server.
(Without an explicit directory change, uploads go to a path relative to
the user's home directory.)
The context option CURLOPT_UPLOAD tells the library this will
be an upload call.
CURLOPT_FTPAPPEND tells libCURL to append to the target file
instead of overwriting it. It's not necessary in this example, but it's
something you often see in legacy FTP jobs.
Similar to downloading data, when uploading you have a choice between passing the libCURL library a file handle or creating the data yourself in a callback.
To upload data from an existing file handle, set that FILE* as
the context option CURLOPT_READDATA.
To use a callback instead (for example, to generate upload data on the fly),
assign a function to the context option CURLOPT_READFUNCTION. The
function signature is very similar to that of
CURLOPT_WRITEFUNCTION:
size_t function(
char* buffer ,
size_t size ,
size_t nitems ,
void* userData
) ;
The difference is that buffer is where you store data
in this case, and the product size*nitems is the maximum number of
bytes you can put there. (Return the number of bytes you put in the buffer.)
userData is the value assigned to
CURLOPT_READDATA.
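As an illustration of that contract (not step3's actual code), here is a curl-free sketch of a read-style callback that serves upload data from an in-memory string. StringSource and readFromString are my names:

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <string>

// Upload state: the data to send and how much of it has gone out already.
struct StringSource {
    std::string data ;
    size_t offset ;
} ;

// Same signature libCURL expects for CURLOPT_READFUNCTION.
extern "C"
size_t readFromString( char* buffer , size_t size , size_t nitems , void* userData ){
    StringSource* src = static_cast< StringSource* >( userData ) ;
    const size_t room = size * nitems ;                   // most we may write
    const size_t left = src->data.size() - src->offset ;  // bytes not yet sent
    const size_t count = std::min( room , left ) ;
    std::memcpy( buffer , src->data.data() + src->offset , count ) ;
    src->offset += count ;
    return( count ) ;   // 0 means "no more upload data"
}
```

Each call hands libCURL the next chunk; the final call returns 0 to end the upload.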
step3's callback function is rather brief. I employ a C++ template technique to use an object indirectly as a callback handler. If you're not familiar with templates, note that the declaration
template< typename T >
class UploadHandler {
...
means the class UploadHandler is incomplete as written. The
data type T comes from elsewhere in the code, when registering the
function with libCURL:
curl_easy_setopt(
ctx ,
CURLOPT_READFUNCTION ,
UploadHandler< UploadData >::execute
);
Here, UploadData is the type of object the handler function
will use to do the work.
In turn, the static class function UploadHandler::execute() is
a mere pass-through: it casts the userData value to type
T and invokes T::execute() to do the actual work.
static size_t execute(
char* buffer ,
size_t size ,
size_t nitems ,
void* userData
){
T* realHandler = static_cast< T* >( userData ) ;
return( realHandler->execute( buffer , size , nitems ) ) ;
}
UploadHandler will work with any class that implements a fitting
execute member function. I could have used standard inheritance
instead:
// ... inside UploadHandler::execute() ...
Handler* h = static_cast< Handler* >( userData ) ;
return( h->execute( ... ) ) ;
I prefer the flexibility of templates, though. Inheritance would tie all handler
objects to the Handler interface.
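To see the pass-through in isolation, here is the template paired with a hypothetical CountingSource handler (my invention, standing in for step3's UploadData), invoked by hand the way libCURL would call it through the registered function pointer:

```cpp
#include <cstddef>

// The pass-through: a static member function with the callback signature
// that forwards the real work to a T object smuggled in via userData.
template< typename T >
class UploadHandler {
public:
    static size_t execute( char* buffer , size_t size , size_t nitems , void* userData ){
        T* realHandler = static_cast< T* >( userData ) ;
        return( realHandler->execute( buffer , size , nitems ) ) ;
    }
} ;

// A stand-in handler: fills the buffer with 'x' once, then reports
// end-of-data. Any class with a fitting execute() member would do.
class CountingSource {
public:
    CountingSource() : sent_( false ) {}
    size_t execute( char* buffer , size_t size , size_t nitems ){
        if( sent_ ){
            return( 0 ) ;                   // no more upload data
        }
        const size_t count = size * nitems ;
        for( size_t i = 0 ; i < count ; ++i ){
            buffer[ i ] = 'x' ;
        }
        sent_ = true ;
        return( count ) ;
    }
private:
    bool sent_ ;
} ;
```

Registering UploadHandler< CountingSource >::execute as the CURLOPT_READFUNCTION, with a CountingSource* as the CURLOPT_READDATA, routes every library callback to the object.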
Similar to download callbacks, the library may call upload callbacks several times for a single file. Code accordingly.
POST (Populate a Web Form)

HTTP POST operations send form data to a web server; they also carry
code-to-code calls such as those in web services. The request body comprises the
POST data.
This article's final example, step4, demonstrates how to use
libCURL for an HTTP POST. It also explains how to set up custom
HTTP request headers, such as browser identification.
The POST body is just a string with &
characters between key=value pairs:
const char* postData = "param1=value1&param2=value2&..." ;
Pass this string to the library by assigning it to the
CURLOPT_POSTFIELDS option:
curl_easy_setopt( ctx , CURLOPT_POSTFIELDS , postData ) ;
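If you'd rather assemble that string from key/value pairs, a small helper does the job; makePostData below is a hypothetical function of mine, and it assumes the values are already URL-encoded:

```cpp
#include <string>
#include <utility>
#include <vector>

// Join key/value pairs into "key1=value1&key2=value2" form.
// Note: performs no URL encoding; callers must escape values themselves.
std::string makePostData(
    const std::vector< std::pair< std::string , std::string > >& fields
){
    std::string body ;
    for( size_t i = 0 ; i < fields.size() ; ++i ){
        if( !body.empty() ){
            body += "&" ;
        }
        body += fields[ i ].first + "=" + fields[ i ].second ;
    }
    return( body ) ;
}
```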
Assign a curl_slist* to CURLOPT_HTTPHEADER to set
custom HTTP headers:
struct curl_slist* requestHeaders = NULL ;
requestHeaders = curl_slist_append(
requestHeaders ,
"Expect: 100-continue"
) ;
// ... other curl_slist_append() calls ...
curl_easy_setopt(
ctx ,
CURLOPT_HTTPHEADER ,
requestHeaders
) ;
Note that libCURL clients skip the intermediate step of downloading and
processing a form's HTML. As a result, libCURL is unaware of any hidden fields or
client-side technologies used therein (such as JavaScript). Put another way,
you have to know what fields the web server expects before you can use a libCURL
client to POST data.
libCURL provides clean, simple networking for your native-code applications.
With this API in your toolbox, you can incorporate one-off FTP operations into
your main application, automate HTTP POST requests, and more.
There's much more to libCURL than I've presented here. The examples should, however, give you a head start in putting libCURL to use in your own apps.
The article's sample code
includes the source for the stub programs, as well as a JSP and PHP page with
which to test step4. (The JSP requires a servlet spec 2.4
container, such as Tomcat 5, and a proper 2.4 web.xml.)
The pages simply echo the request headers and POST parameters
received from the client.
The curl web site has documentation and tutorials.
The TCPMon utility ships with Apache's Axis web services project. It's a
listening proxy that shows client/server conversations in a GUI window. I've
found it invaluable for debugging problems with my curl code, especially HTTP
POST operations.
TCPMon is a Java application and is thus portable to any Java-enabled platform that meets the JDK version requirements.
libCURL is especially useful for creating REST-based web services
clients. Also known as XML over HTTP, REST web service calls encapsulate HTTP
GET or POST requests instead of wrapping them in a SOAP envelope. Amazon.com, for example, offers
its public web services API via REST as well as SOAP. Yahoo's web services use REST
exclusively.
Q Ethan McCallum grew from curious child to curious adult, turning his passion for technology into a career.
Copyright © 2009 O'Reilly Media, Inc.