<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Vincent Gable's Blog &#187; UTF8</title>
	<atom:link href="http://vgable.com/blog/tag/utf8/feed/" rel="self" type="application/rss+xml" />
	<link>http://vgable.com/blog</link>
	<description>my weblog.</description>
	<lastBuildDate>Tue, 29 Nov 2011 22:20:23 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>When In Doubt, UTF8</title>
		<link>http://vgable.com/blog/2009/07/03/when-in-doubt-utf8/</link>
		<comments>http://vgable.com/blog/2009/07/03/when-in-doubt-utf8/#comments</comments>
		<pubDate>Fri, 03 Jul 2009 17:16:59 +0000</pubDate>
		<dc:creator>Vincent Gable</dc:creator>
				<category><![CDATA[Accessibility]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[ASCII]]></category>
		<category><![CDATA[i18n]]></category>
		<category><![CDATA[Unicode]]></category>
		<category><![CDATA[UTF8]]></category>
		<category><![CDATA[Web]]></category>

		<guid isPermaLink="false">http://vgable.com/blog/?p=339</guid>
		<description><![CDATA[/* If you are uncertain of the correct encoding, you should use UTF-8, */ /* which is the encoding designated by RFC 2396 as the correct encoding */ /* for use in URLs.… */ &#8211; CFURL.h This echoes my experience, when in doubt, choose UTF8 for the web. UTF8 is backwards compatible with 7-bit ASCII [...]]]></description>
			<content:encoded><![CDATA[<blockquote>
<pre>
/* If you are uncertain of the correct encoding, you should use UTF-8, */
/* which is the encoding designated by <a href="http://www.faqs.org/rfcs/rfc2396.html">RFC 2396</a> as the correct encoding */
/* for use in URLs.… */
</pre>
</blockquote>
<p>&#8211; <a href="http://www.opensource.apple.com/source/CF/CF-476.15/CFURL.h"><code>CFURL.h</code></a></p>
<p>This echoes my experience: <strong>when in doubt, choose <a href="http://en.wikipedia.org/wiki/UTF-8">UTF8</a> for the web</strong>. UTF8 is backwards compatible with 7-bit ASCII (e.g. &#8216;A&#8217; is 0x41 in both ASCII and UTF8).</p>
<p>But know that UTF8 is a variable-length encoding: non-ASCII <strong>characters may be represented by more than one byte</strong>. As a general rule with Unicode, I <strong>do <em>not</em> expect a <code>char</code> or <code>wchar_t</code> to always map to a character in a string</strong>. Encoding details can be messy, e.g. &#8220;É&#8221; might be represented as one precomposed character, or as two characters: &#8220;E&#8221; followed by a combining acute accent. It never hurts to <a href="http://www.codinghorror.com/blog/archives/001084.html">brush up on Unicode</a>.</p>
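<p>Here is a minimal C sketch of that byte-versus-character mismatch. The helper <code>utf8_code_points</code> is a made-up name for illustration, not a library function, and it assumes its input is valid UTF-8.</p>
<pre><code>#include &lt;stdio.h&gt;
#include &lt;string.h&gt;

/* Count Unicode code points in a NUL-terminated UTF-8 string by skipping
   continuation bytes (bit pattern 10xxxxxx). Assumes valid UTF-8 input. */
static size_t utf8_code_points(const char *s)
{
    size_t count = 0;
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s &amp; 0xC0) != 0x80)
            count++;
    }
    return count;
}

int main(void)
{
    const char *ascii       = "A";         /* U+0041, the single byte 0x41 */
    const char *precomposed = "\xC3\x89";  /* U+00C9, one precomposed code point */
    const char *decomposed  = "E\xCC\x81"; /* U+0045 followed by U+0301 (combining acute) */

    printf("ascii:       %zu byte(s), %zu code point(s)\n",
           strlen(ascii), utf8_code_points(ascii));
    printf("precomposed: %zu byte(s), %zu code point(s)\n",
           strlen(precomposed), utf8_code_points(precomposed));
    printf("decomposed:  %zu byte(s), %zu code point(s)\n",
           strlen(decomposed), utf8_code_points(decomposed));
    return 0;
}
</code></pre>
<p>Both spellings of É look the same on screen, yet one is 2 bytes and 1 code point while the other is 3 bytes and 2 code points. Neither count matches what a reader would call &#8220;one character&#8221; in every case.</p>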
]]></content:encoded>
			<wfw:commentRss>http://vgable.com/blog/2009/07/03/when-in-doubt-utf8/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>ASCII is Dangerous</title>
		<link>http://vgable.com/blog/2008/09/05/ascii-is-dangerous/</link>
		<comments>http://vgable.com/blog/2008/09/05/ascii-is-dangerous/#comments</comments>
		<pubDate>Sat, 06 Sep 2008 02:02:06 +0000</pubDate>
		<dc:creator>Vincent Gable</dc:creator>
				<category><![CDATA[Accessibility]]></category>
		<category><![CDATA[Bug Bite]]></category>
		<category><![CDATA[MacOSX]]></category>
		<category><![CDATA[Objective-C]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[ASCII]]></category>
		<category><![CDATA[File Systems]]></category>
		<category><![CDATA[NSASCIIStringEncoding]]></category>
		<category><![CDATA[NSString]]></category>
		<category><![CDATA[Paths]]></category>
		<category><![CDATA[Strings]]></category>
		<category><![CDATA[Unicode]]></category>
		<category><![CDATA[UTF8]]></category>

		<guid isPermaLink="false">http://vgable.com/blog/2008/09/05/ascii-is-dangerous/</guid>
		<description><![CDATA[Never use NSASCIIStringEncoding &#8220;Foreign&#8221; characters, like the &#239; in &#8220;na&#239;ve&#8221;, will break your code, if you use NSASCIIStringEncoding. Such characters are more common than you might expect, even if you do not have an internationalized application. &#8220;Smart quotes&#8221;, and most well-rendered punctuation marks, are not 7-bit ASCII. For example, that last sentence can&#8217;t be encoded [...]]]></description>
			<content:encoded><![CDATA[<p><strong>Never use <code>NSASCIIStringEncoding</code></strong><br />
<br />&#8220;Foreign&#8221; characters, like the &iuml; in &#8220;na&iuml;ve&#8221;, <em>will</em> break your code if you use <code>NSASCIIStringEncoding</code>.  Such characters are more common than you might expect, even if you do not have an internationalized application.  &#8220;Smart quotes&#8221;, and most well-rendered punctuation marks, are not 7-bit ASCII.  For example, that last sentence can&#8217;t be encoded into ASCII, because my blog uses smart-quotes. (Seriously, <code>[thatSentence cStringUsingEncoding:NSASCIIStringEncoding]</code> will return <code>nil</code>!)</p>
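<p>To see why such text cannot survive an ASCII conversion, here is a small plain-C sketch that checks whether a string is pure 7-bit ASCII. The helper <code>is_7bit_ascii</code> is a hypothetical name for illustration, and the sample strings are assumed to be UTF-8 encoded; this is not how Foundation implements its conversion.</p>
<pre><code>#include &lt;stdbool.h&gt;
#include &lt;stdio.h&gt;

/* Return true if every byte of the NUL-terminated string fits in 7 bits. */
static bool is_7bit_ascii(const char *s)
{
    for (; *s != '\0'; s++) {
        if ((unsigned char)*s &gt; 0x7F)
            return false;
    }
    return true;
}

int main(void)
{
    const char *plain  = "naive";
    const char *fancy  = "na\xC3\xAFve";                          /* U+00EF as UTF-8 */
    const char *quoted = "\xE2\x80\x9CSmart quotes\xE2\x80\x9D";  /* U+201C ... U+201D */

    printf("plain  is ASCII: %s\n", is_7bit_ascii(plain)  ? "yes" : "no");
    printf("fancy  is ASCII: %s\n", is_7bit_ascii(fancy)  ? "yes" : "no");
    printf("quoted is ASCII: %s\n", is_7bit_ascii(quoted) ? "yes" : "no");
    return 0;
}
</code></pre>
<p>Only the first string passes. A single accented letter or curly quote is enough to make the whole string unrepresentable in ASCII.</p>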
<p>Here are some simple alternatives:</p>
<p><strong>C-String Paths</strong><br />
Use <code>- (const char *)fileSystemRepresentation;</code> to get a C-string that you can pass to POSIX functions.  The C-string is freed automatically, much as an autoreleased object would be, so copy it if you need to keep it around.</p>
<p><strong>An Alternate Encoding</strong><br />
<code>NSUTF8StringEncoding</code> is the closest safe alternative to <code>NSASCIIStringEncoding</code>.  ASCII characters have the same representation in UTF-8 as in ASCII.  UTF-8 strings will <code>printf</code> correctly, but will look wrong ('fancy' characters will be garbage) if you use <code>NSLog(%s)</code>.</p>
<p><strong>Native Foundation (<code>NSLog</code>) Encoding</strong><br />
Generally, Foundation uses UTF-16.  It is my understanding that this is what NSStrings are by default under the hood.  UTF-16 strings will look right if you print them with <code>NSLog(%S)</code> (note the capital <code>S</code>), but will not print correctly using <code>printf</code>.  In my experience <code>printf</code> truncates UTF-16 strings in an unpredictable way, most likely because UTF-16 text is full of zero bytes, which <code>printf</code> treats as the end of the string. <strong>Do not mix UTF-16 and <code>printf</code></strong>.</p>
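<p>A rough C sketch of what I suspect is going on with <code>printf</code>: byte-oriented functions stop at the first zero byte, and UTF-16 text contains zero bytes wherever an ASCII-range character appears. The byte arrays below are hand-encoded for illustration (the names <code>utf16le</code> and <code>utf16be</code> are mine), and this is an assumption about the cause, not something taken from Apple's documentation.</p>
<pre><code>#include &lt;stdio.h&gt;
#include &lt;string.h&gt;

/* "Hi" hand-encoded as UTF-16 in both byte orders, with a 16-bit terminator. */
static const char utf16le[] = { 'H', 0x00, 'i', 0x00, 0x00, 0x00 };
static const char utf16be[] = { 0x00, 'H', 0x00, 'i', 0x00, 0x00 };

int main(void)
{
    /* strlen() and printf("%s") both stop at the first zero byte they see. */
    printf("little-endian: strlen = %zu, printf shows \"%s\"\n",
           strlen(utf16le), utf16le);
    printf("big-endian:    strlen = %zu, printf shows \"%s\"\n",
           strlen(utf16be), utf16be);
    return 0;
}
</code></pre>
<p>The little-endian buffer prints just &#8220;H&#8221; and the big-endian one prints nothing at all, so where the output gets cut off depends on both the byte order and the text itself.</p>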
<p><strong>Convenience C-Strings</strong><br />
<code>[someNSString UTF8String]</code> will give you a <code>const char *</code> to a <code>NUL</code>-terminated UTF-8 string.  ASCII characters have the same representation in UTF-8 as in ASCII.</p>
<p><strong>Take a minute to search all your projects for <code>NSASCIIStringEncoding</code>, and replace it with a more robust option.</strong></p>
<p>It never hurts to <a href="http://www.codinghorror.com/blog/archives/001084.html">brush up on Unicode</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://vgable.com/blog/2008/09/05/ascii-is-dangerous/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

