Vincent Gable’s Blog

September 24, 2008

XML Parsing: You’re Doing it Wrong

Filed under: Cocoa,MacOSX,Objective-C,Programming,Quotes,Sample Code,Tips | ,
― Vincent Gable on September 24, 2008

There are lots of examples of people using text searching and regular expressions to find data in webpages. These examples are doing it wrong.

NSXMLDocument and an XPath query are your friends. They really make finding elements within a webpage, RSS feed or XML documents very easy.

Matt Gallagher

I haven’t used XPath before, but after seeing Matt’s example code, I am convinced he’s right, because I’ve seen the other side of things. (I’ll let you in on a dirty little secret — right now the worst bit of the code-base I’m working on parses XML.)

    NSError *error;
    NSXMLDocument *document =
        [[NSXMLDocument alloc] initWithData:responseData options:NSXMLDocumentTidyHTML error:&error];
    [document autorelease];
    
    // Deliberately ignore the error: with most HTML it will be filled with
    // numerous "tidy" warnings.
    
    NSXMLElement *rootNode = [document rootElement];
    
    NSString *xpathQueryString =
        @"//div[@id='newtothestore']/div[@class='modulecontent']/div[@class='list_content']/ul/li/a";
    NSArray *newItemsNodes = [rootNode nodesForXPath:xpathQueryString error:&error];
    if (error)
    {
        [[NSAlert alertWithError:error] runModal];
        return;
    }

(I added [document autorelease]; to the above code, because you should always immediately balance an alloc/init with autorelease, outside of your own init methods.)

Powered by WordPress