{"id":135,"date":"2008-09-24T22:04:23","date_gmt":"2008-09-25T03:04:23","guid":{"rendered":"http:\/\/vgable.com\/blog\/2008\/09\/24\/xml-parsing-youre-doing-it-wrong\/"},"modified":"2008-09-24T22:04:26","modified_gmt":"2008-09-25T03:04:26","slug":"xml-parsing-youre-doing-it-wrong","status":"publish","type":"post","link":"https:\/\/vgable.com\/blog\/2008\/09\/24\/xml-parsing-youre-doing-it-wrong\/","title":{"rendered":"XML Parsing: You&#8217;re Doing it Wrong"},"content":{"rendered":"<blockquote><p>There are lots of examples of people using text searching and regular expressions to find data in webpages. <strong>These examples are doing it wrong.<\/strong><\/p>\n<p><strong><code>NSXMLDocument<\/code> and an XPath query are your friends.<\/strong> They really make finding elements within a webpage, RSS feed or XML documents very easy.<\/p><\/blockquote>\n<p>&#8212; <a href=\"http:\/\/cocoawithlove.com\/2008\/09\/cocoa-application-driven-by-http-data.html\">Matt Gallagher<\/a><\/p>\n<p>I haven&#8217;t used XPath before, but after seeing Matt&#8217;s example code, I am convinced he&#8217;s right, because I&#8217;ve seen the other side of things.  
(I&#8217;ll let you in on a dirty little secret &#8212; right now the worst bit of the code-base I&#8217;m working on parses XML.)<\/p>\n<p><code>&nbsp;&nbsp;&nbsp;&nbsp;NSError&nbsp;*error&nbsp;=&nbsp;nil;<br \/>\n&nbsp;&nbsp;&nbsp;&nbsp;NSXMLDocument&nbsp;*document&nbsp;=<br \/>\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;[[NSXMLDocument&nbsp;alloc]&nbsp;initWithData:responseData&nbsp;options:NSXMLDocumentTidyHTML&nbsp;error:&amp;error];<br \/>\n&nbsp;&nbsp;&nbsp;&nbsp;[document autorelease];<br \/>\n&nbsp;&nbsp;&nbsp;&nbsp;<br \/>\n&nbsp;&nbsp;&nbsp;&nbsp;\/\/&nbsp;Deliberately&nbsp;ignore&nbsp;the&nbsp;error:&nbsp;with&nbsp;most&nbsp;HTML&nbsp;it&nbsp;will&nbsp;be&nbsp;filled&nbsp;with<br \/>\n&nbsp;&nbsp;&nbsp;&nbsp;\/\/&nbsp;numerous&nbsp;\"tidy\"&nbsp;warnings.<br \/>\n&nbsp;&nbsp;&nbsp;&nbsp;<br \/>\n&nbsp;&nbsp;&nbsp;&nbsp;NSXMLElement&nbsp;*rootNode&nbsp;=&nbsp;[document&nbsp;rootElement];<br \/>\n&nbsp;&nbsp;&nbsp;&nbsp;<br \/>\n&nbsp;&nbsp;&nbsp;&nbsp;NSString&nbsp;*xpathQueryString&nbsp;=<br \/>\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;@\"\/\/div[@id='newtothestore']\/div[@class='modulecontent']\/div[@class='list_content']\/ul\/li\/a\";<br \/>\n&nbsp;&nbsp;&nbsp;&nbsp;NSArray&nbsp;*newItemsNodes&nbsp;=&nbsp;[rootNode&nbsp;nodesForXPath:xpathQueryString&nbsp;error:&amp;error];<br \/>\n&nbsp;&nbsp;&nbsp;&nbsp;if&nbsp;(!newItemsNodes)<br \/>\n&nbsp;&nbsp;&nbsp;&nbsp;{<br \/>\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;[[NSAlert&nbsp;alertWithError:error]&nbsp;runModal];<br \/>\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;return;<br \/>\n&nbsp;&nbsp;&nbsp;&nbsp;}<\/code><br \/>\n(I added <em><code>[document autorelease];<\/code><\/em> to the above code, because you should <em>always<\/em> immediately balance an <code>alloc<\/code>\/<code>init<\/code> with <code>autorelease<\/code>, outside of your own <code>init<\/code> methods. The code also initializes <code>error<\/code> to <code>nil<\/code> and tests the query&#8217;s return value rather than <code>error<\/code>, which may still hold the tidy warnings from the first call.)<\/p>\n","protected":false},"excerpt":{"rendered":"<p>There are lots of examples of people using text searching and regular expressions to 
find data in webpages. These examples are doing it wrong. NSXMLDocument and an XPath query are your friends. They really make finding elements within a webpage, RSS feed or XML documents very easy. &#8212; Matt Gallagher I haven&#8217;t used XPath before, [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[6,3,5,4,24,13,10],"tags":[184,185],"class_list":["post-135","post","type-post","status-publish","format-standard","hentry","category-cocoa","category-macosx","category-objective-c","category-programming","category-quotes","category-sample-code","category-tips","tag-xml","tag-xpath"],"_links":{"self":[{"href":"https:\/\/vgable.com\/blog\/wp-json\/wp\/v2\/posts\/135","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/vgable.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/vgable.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/vgable.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/vgable.com\/blog\/wp-json\/wp\/v2\/comments?post=135"}],"version-history":[{"count":0,"href":"https:\/\/vgable.com\/blog\/wp-json\/wp\/v2\/posts\/135\/revisions"}],"wp:attachment":[{"href":"https:\/\/vgable.com\/blog\/wp-json\/wp\/v2\/media?parent=135"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/vgable.com\/blog\/wp-json\/wp\/v2\/categories?post=135"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/vgable.com\/blog\/wp-json\/wp\/v2\/tags?post=135"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}