{"id":122,"date":"2008-09-05T21:02:06","date_gmt":"2008-09-06T02:02:06","guid":{"rendered":"http:\/\/vgable.com\/blog\/2008\/09\/05\/ascii-is-dangerous\/"},"modified":"2008-09-05T21:02:08","modified_gmt":"2008-09-06T02:02:08","slug":"ascii-is-dangerous","status":"publish","type":"post","link":"https:\/\/vgable.com\/blog\/2008\/09\/05\/ascii-is-dangerous\/","title":{"rendered":"ASCII is Dangerous"},"content":{"rendered":"<p><strong>Never use <code>NSASCIIStringEncoding<\/code><\/strong><br \/>\n<br \/>&#8220;Foreign&#8221; characters, like the &iuml; in &#8220;na&iuml;ve&#8221;, <em>will<\/em> break your code if you use <code>NSASCIIStringEncoding<\/code>.  Such characters are more common than you might expect, even if you do not have an internationalized application.  &#8220;Smart quotes&#8221;, and most well-rendered punctuation marks, are not 7-bit ASCII.  For example, that last sentence can&#8217;t be encoded into ASCII, because my blog uses smart quotes. (Seriously, <code>[thatSentence cStringUsingEncoding:NSASCIIStringEncoding]<\/code> will return <code>nil<\/code>!)<\/p>\n<p>Here are some simple alternatives:<\/p>\n<p><strong>C-String Paths<\/strong><br \/>\nUse <code>- (const char *)fileSystemRepresentation;<\/code> to get a C-string that you can pass to POSIX functions.  The C-string is freed automatically, just as an autoreleased object would be released, so copy it (or use <code>getFileSystemRepresentation:maxLength:<\/code>) if you need to keep it beyond the current autorelease context.<\/p>\n<p><strong>An Alternate Encoding<\/strong><br \/>\n<code>NSUTF8StringEncoding<\/code> is the closest safe alternative to <code>NSASCIIStringEncoding<\/code>.  ASCII characters have the same representation in UTF-8 as in ASCII.  UTF-8 strings will <code>printf<\/code> correctly, but will look wrong (&#8216;fancy&#8217; characters will be garbage) if you use <code>NSLog(%s)<\/code>.<\/p>\n<p><strong>Native Foundation (<code>NSLog<\/code>) Encoding<\/strong><br \/>\nGenerally, Foundation uses UTF-16.  It is my understanding that this is what NSStrings are by default under the hood.  
UTF-16 strings will look right if you print them with <code>NSLog(%s)<\/code>, but will not print correctly using <code>printf<\/code>.  In my experience, <code>printf<\/code> truncates UTF-16 strings in an unpredictable way. <strong>Do not mix UTF-16 and <code>printf<\/code><\/strong>.<\/p>\n<p><strong>Convenience C-Strings<\/strong><br \/>\n<code>[someNSString UTF8String]<\/code> will give you a <code>const char *<\/code> to a null-terminated UTF-8 string.  ASCII characters have the same representation in UTF-8 as in ASCII.<\/p>\n<p><strong>Take a minute to search all your projects for <code>NSASCIIStringEncoding<\/code> and replace it with a more robust option.<\/strong><\/p>\n<p>It never hurts to <a href=\"http:\/\/www.codinghorror.com\/blog\/archives\/001084.html\">brush up on Unicode<\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Never use NSASCIIStringEncoding &#8220;Foreign&#8221; characters, like the &iuml; in &#8220;na&iuml;ve&#8221;, will break your code if you use NSASCIIStringEncoding. Such characters are more common than you might expect, even if you do not have an internationalized application. &#8220;Smart quotes&#8221;, and most well-rendered punctuation marks, are not 7-bit ASCII. 
For example, that last sentence can&#8217;t be encoded [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[16,18,3,5,4],"tags":[165,161,162,78,159,160,163,164],"class_list":["post-122","post","type-post","status-publish","format-standard","hentry","category-accessibility","category-bug-bite","category-macosx","category-objective-c","category-programming","tag-ascii","tag-file-systems","tag-nsasciistringencoding","tag-nsstring","tag-paths","tag-strings","tag-unicode","tag-utf8"],"_links":{"self":[{"href":"https:\/\/vgable.com\/blog\/wp-json\/wp\/v2\/posts\/122","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/vgable.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/vgable.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/vgable.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/vgable.com\/blog\/wp-json\/wp\/v2\/comments?post=122"}],"version-history":[{"count":0,"href":"https:\/\/vgable.com\/blog\/wp-json\/wp\/v2\/posts\/122\/revisions"}],"wp:attachment":[{"href":"https:\/\/vgable.com\/blog\/wp-json\/wp\/v2\/media?parent=122"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/vgable.com\/blog\/wp-json\/wp\/v2\/categories?post=122"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/vgable.com\/blog\/wp-json\/wp\/v2\/tags?post=122"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}