There are lots of examples of people using text searching and regular expressions to find data in webpages. These examples are doing it wrong.
NSXMLDocument and an XPath query are your friends. They really make finding elements within a webpage, RSS feed or XML documents very easy.
I haven’t used XPath before, but after seeing Matt’s example code, I am convinced he’s right, because I’ve seen the other side of things. (I’ll let you in on a dirty little secret — right now the worst bit of the code-base I’m working on parses XML.)
NSError *error = nil;
NSXMLDocument *document =
    [[NSXMLDocument alloc] initWithData:responseData options:NSXMLDocumentTidyHTML error:&error];
[document autorelease];
// Deliberately ignore the error: with most HTML it will be filled with
// numerous "tidy" warnings.
NSXMLElement *rootNode = [document rootElement];
NSString *xpathQueryString =
    @"//div[@id='newtothestore']/div[@class='modulecontent']/div[@class='list_content']/ul/li/a";
// Clear any leftover tidy warnings so they are not mistaken for an XPath failure.
error = nil;
NSArray *newItemsNodes = [rootNode nodesForXPath:xpathQueryString error:&error];
// Test the return value rather than the error pointer to detect failure.
if (!newItemsNodes)
{
    if (error)
    {
        [[NSAlert alertWithError:error] runModal];
    }
    return;
}
(I added [document autorelease]; to the above code, because you should always immediately balance an alloc/init with autorelease, outside of your own init methods.)
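For anyone who wants to try the approach outside an app, here is a minimal sketch of a command-line program that runs the same XPath query and then reads the text and href of each matching link. The sample markup and the file name xpath_demo.m are made up for illustration, and the sketch assumes macOS (NSXMLDocument is not available on iOS) and a build with ARC, so there is no autorelease call.

```objc
// Hypothetical build line: clang -fobjc-arc -framework Foundation xpath_demo.m -o xpath_demo
#import <Foundation/Foundation.h>

int main(void)
{
    @autoreleasepool
    {
        // Made-up sample markup, shaped to match the XPath query from the post.
        NSString *html =
            @"<html><body><div id='newtothestore'><div class='modulecontent'>"
            @"<div class='list_content'><ul>"
            @"<li><a href='/item/1'>First item</a></li>"
            @"<li><a href='/item/2'>Second item</a></li>"
            @"</ul></div></div></div></body></html>";
        NSData *data = [html dataUsingEncoding:NSUTF8StringEncoding];

        NSError *error = nil;
        NSXMLDocument *document =
            [[NSXMLDocument alloc] initWithData:data
                                        options:NSXMLDocumentTidyHTML
                                          error:&error];
        if (!document)
        {
            NSLog(@"Could not parse the markup: %@", error);
            return 1;
        }

        // Tidy warnings may have been stored in error; clear them before reuse.
        error = nil;
        NSString *xpathQueryString =
            @"//div[@id='newtothestore']/div[@class='modulecontent']"
            @"/div[@class='list_content']/ul/li/a";
        NSArray *links = [[document rootElement] nodesForXPath:xpathQueryString
                                                         error:&error];
        if (!links)
        {
            NSLog(@"XPath query failed: %@", error);
            return 1;
        }

        for (NSXMLNode *node in links)
        {
            // The query selects <a> elements, but check the kind before casting.
            if ([node kind] != NSXMLElementKind)
            {
                continue;
            }
            NSXMLElement *link = (NSXMLElement *)node;
            NSString *href = [[link attributeForName:@"href"] stringValue];
            NSLog(@"%@ -> %@", [link stringValue], href);
        }
    }
    return 0;
}
```

The point of the sketch is that the whole "find the links" step is one query string. If the page layout changes, only that string needs to change, not a pile of string-searching code.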