Cooking with C#, Part 2
by Stephen Teilhet and Jay Hilyard
Editor's note: Stephen Teilhet and Jay Hilyard, authors of the recently released C# Cookbook, hand-selected these recipes to excerpt on ONDotNet to give you a real glimpse at the kinds of solutions you'll find in the book. Like all the recipes in this latest release in O'Reilly's cookbook series, the solutions here get straight to the heart of the problem, like how to use the GetHTMLFromURL method to grab the HTML you want from a URL. And in case you missed them, check out the first batch of recipes the authors chose for publishing here.
Recipe 13.8: Obtaining the HTML from a URL
Problem
You need to get the HTML returned from a web server in order to examine it for items of interest. For example, you could examine the returned HTML for links to other pages or for headlines from a news site.
Solution
We can use the methods for web communication we set up in Recipe 13.5
and Recipe 13.6 to make the HTTP request and verify the response; then we can
get at the HTML via the GetResponseStream method of the
HttpWebResponse object:
// Requires: using System; using System.IO; using System.Net; using System.Text;
public static string GetHTMLFromURL(string url)
{
    // reject a null URL as well as an empty one
    if(url == null || url.Length == 0)
        throw new ArgumentException("Invalid URL","url");
    string html = "";
    HttpWebRequest request = GenerateGetOrPostRequest(url,"GET",null);
    HttpWebResponse response = (HttpWebResponse)request.GetResponse( );
    try
    {
        if(VerifyResponse(response) == ResponseCategories.Success)
        {
            // get the response stream.
            Stream responseStream = response.GetResponseStream( );
            // use a stream reader that understands UTF8
            StreamReader reader = new StreamReader(responseStream,Encoding.UTF8);
            try
            {
                html = reader.ReadToEnd( );
            }
            finally
            {
                // close the reader
                reader.Close( );
            }
        }
    }
    finally
    {
        // always release the connection, even if reading fails
        response.Close( );
    }
    return html;
}
Discussion
The GetHTMLFromURL method gets a web page using the
GenerateGetOrPostRequest and
GetResponse methods and verifies the response using the VerifyResponse
method. Once we have a valid
response, we read the HTML that was returned.
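Recipes 13.5 and 13.6 are not reproduced in this excerpt, so the following is only a rough sketch of the kind of check a method like VerifyResponse might perform: sorting an HTTP status code into a broad category. Only ResponseCategories.Success appears in the code above; the other enum members, the ResponseSketch class, and the Categorize method are assumptions made for illustration and are not the book's implementation.

```csharp
using System;
using System.Net;

// Hypothetical stand-in for the enum used by VerifyResponse.
// Only "Success" is confirmed by the recipe text; the rest are assumed.
public enum ResponseCategories
{
    Unknown,
    Informational,
    Success,
    Redirected,
    ClientError,
    ServerError
}

public static class ResponseSketch
{
    // Maps an HTTP status code to a broad category by its hundreds range.
    public static ResponseCategories Categorize(HttpStatusCode status)
    {
        int code = (int)status;
        if (code >= 100 && code < 200) return ResponseCategories.Informational;
        if (code >= 200 && code < 300) return ResponseCategories.Success;
        if (code >= 300 && code < 400) return ResponseCategories.Redirected;
        if (code >= 400 && code < 500) return ResponseCategories.ClientError;
        if (code >= 500 && code < 600) return ResponseCategories.ServerError;
        return ResponseCategories.Unknown;
    }

    public static void Main()
    {
        Console.WriteLine(Categorize(HttpStatusCode.OK));        // Success
        Console.WriteLine(Categorize(HttpStatusCode.NotFound));  // ClientError
    }
}
```

With a check along these lines, GetHTMLFromURL would read the body only for 2xx responses and return an empty string for everything else.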
The GetResponseStream method on
the HttpWebResponse provides access to the body of the returned message
as a System.IO.Stream object. To read the
data, we instantiate a StreamReader with the response stream and
the UTF8 property of the Encoding class, so that UTF-8-encoded text
is read correctly from the stream. We then call ReadToEnd on the
StreamReader, which puts all of the content into the string variable
html, and return it. If the response is not verified as successful, the method returns an empty string.
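The reading step can be tried without a network connection. The sketch below substitutes a MemoryStream for the response stream; the StreamReadingSketch class and the ReadAllUtf8 helper are hypothetical names introduced here, and the using statement is simply a compact equivalent of the try/finally with Close shown in the Solution.

```csharp
using System;
using System.IO;
using System.Text;

public static class StreamReadingSketch
{
    // Reads the whole stream as UTF-8 text, mirroring the StreamReader
    // step of GetHTMLFromURL. Disposing the reader also closes the stream.
    public static string ReadAllUtf8(Stream source)
    {
        using (StreamReader reader = new StreamReader(source, Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }

    public static void Main()
    {
        // A MemoryStream stands in for response.GetResponseStream().
        byte[] body = Encoding.UTF8.GetBytes("<html><body>caf\u00e9</body></html>");
        using (MemoryStream stream = new MemoryStream(body))
        {
            string html = ReadAllUtf8(stream);
            Console.WriteLine(html.Length); // character count, not byte count
        }
    }
}
```

Because the accented character occupies two bytes in UTF-8 but is a single character once decoded, the resulting string is one character shorter than the byte array. A reader constructed with a different encoding would decode those bytes differently.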
See Also
See the "HttpWebResponse.GetResponseStream Method," "Stream Class," and "StringBuilder Class" topics in the MSDN documentation.