Vincent Gable’s Blog

April 10, 2009

Percent Escapes Gotcha

Filed under: Bug Bite, Cocoa, Programming, Sample Code
― Vincent Gable on April 10, 2009

If you call stringByAddingPercentEscapesUsingEncoding: more than once on a string, a single call to stringByReplacingPercentEscapesUsingEncoding: will not recover the original string. In other words, stringByAddingPercentEscapesUsingEncoding: is not idempotent: every call escapes the "%" characters that the previous call added.

NSString *string = @"100%";
NSString *escapedOnce = [string stringByAddingPercentEscapesUsingEncoding:NSUTF8StringEncoding];
NSString *escapedTwice = [escapedOnce stringByAddingPercentEscapesUsingEncoding:NSUTF8StringEncoding];
NSLog(@"%@ escaped once: %@, escaped twice: %@", string, escapedOnce, escapedTwice);

100% escaped once: 100%25, escaped twice: 100%2525
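
The decoding half of the problem can be checked the same way. This is a minimal sketch, assuming a Foundation command-line program built with a compiler that supports @autoreleasepool; the variable names decodedOnce and decodedTwice are only for illustration.

```objc
#import <Foundation/Foundation.h>

int main(void)
{
    @autoreleasepool {
        NSString *string = @"100%";
        NSString *escapedOnce = [string stringByAddingPercentEscapesUsingEncoding:NSUTF8StringEncoding];
        NSString *escapedTwice = [escapedOnce stringByAddingPercentEscapesUsingEncoding:NSUTF8StringEncoding];

        // One decode only undoes the outermost layer of escaping.
        NSString *decodedOnce = [escapedTwice stringByReplacingPercentEscapesUsingEncoding:NSUTF8StringEncoding];
        // A second decode is needed to get back to the original text.
        NSString *decodedTwice = [decodedOnce stringByReplacingPercentEscapesUsingEncoding:NSUTF8StringEncoding];

        NSLog(@"decoded once: %@, decoded twice: %@", decodedOnce, decodedTwice);
    }
    return 0;
}
```

This should log "decoded once: 100%25, decoded twice: 100%", so the number of decodes has to match the number of encodes exactly.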

I thought I was programming defensively by eagerly adding percent-escapes to any string that would become part of a URL. But this caused some annoying bugs resulting from a string being percent-escaped more than once. My solution was to create an idempotent replacement for stringByAddingPercentEscapesUsingEncoding: (I also simplified things a little by removing the encoding parameter, because I never used any encoding other than NSUTF8StringEncoding):

@implementation NSString (IdempotentPercentEscapes)
- (NSString *)stringByReplacingPercentEscapesOnce
{
	NSString *unescaped = [self stringByReplacingPercentEscapesUsingEncoding:NSUTF8StringEncoding];
	// self may look like an invalidly escaped string, e.g. @"100%".
	// In that case decoding returns nil and the string clearly wasn't escaped,
	// so we return it unchanged as our unescaped string.
	return unescaped ? unescaped : self;
}

- (NSString *)stringByAddingPercentEscapesOnce
{
	// Decode first, then encode, so an already-escaped string is not escaped again.
	return [[self stringByReplacingPercentEscapesOnce] stringByAddingPercentEscapesUsingEncoding:NSUTF8StringEncoding];
}
@end

Usage example:

NSString *string = @"100%";
NSString *escapedOnce = [string stringByAddingPercentEscapesOnce];
NSString *escapedTwice = [escapedOnce stringByAddingPercentEscapesOnce];
NSLog(@"%@ escaped once: %@, escaped twice: %@", string, escapedOnce, escapedTwice);

100% escaped once: 100%25, escaped twice: 100%25

The paranoid have probably noticed that [aBadlyEncodedString stringByReplacingPercentEscapesOnce] will return aBadlyEncodedString, not nil. This could make it harder to detect an error.

But it’s not something that I’m worried about for my application. Since I only ever use UTF-8, which can represent any Unicode character, any string that I escaped myself will decode cleanly. But it’s certainly something to be aware of in situations where you might have strings with different encodings.
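
A related caveat is input whose literal text already contains a valid-looking escape. The sketch below repeats the two category methods so that it compiles on its own; the category name PercentEscapesOnceDemo and the sample input are assumptions made for illustration.

```objc
#import <Foundation/Foundation.h>

// The two methods from the post, repeated under a demo category name
// (hypothetical) so this file is self-contained.
@interface NSString (PercentEscapesOnceDemo)
- (NSString *)stringByReplacingPercentEscapesOnce;
- (NSString *)stringByAddingPercentEscapesOnce;
@end

@implementation NSString (PercentEscapesOnceDemo)
- (NSString *)stringByReplacingPercentEscapesOnce
{
    NSString *unescaped = [self stringByReplacingPercentEscapesUsingEncoding:NSUTF8StringEncoding];
    return unescaped ? unescaped : self;
}

- (NSString *)stringByAddingPercentEscapesOnce
{
    return [[self stringByReplacingPercentEscapesOnce] stringByAddingPercentEscapesUsingEncoding:NSUTF8StringEncoding];
}
@end

int main(void)
{
    @autoreleasepool {
        // Hypothetical user input: the person really typed the characters "%20".
        NSString *typed = @"cat%20dog";

        // The plain encoder preserves the literal text by escaping the "%".
        NSString *plain = [typed stringByAddingPercentEscapesUsingEncoding:NSUTF8StringEncoding];
        // The "once" variant decodes first, so the literal "%20" is treated as a space.
        NSString *once = [typed stringByAddingPercentEscapesOnce];

        NSLog(@"plain: %@, once: %@", plain, once);
    }
    return 0;
}
```

This should log "plain: cat%2520dog, once: cat%20dog". The "once" result decodes to "cat dog" on the receiving end, so the literal "%20" is lost. The idempotent version is therefore only safe when raw input is known never to contain a valid escape sequence.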

3 Comments

  1. My inability to read Obj-C syntax prevents me from understanding your method. How exactly can an idempotent version of an encoder work without making some heuristic assumption about what ‘encoded’ looks like? I mean, if someone has a fairly simple string it’s fairly easy, but what if I WANT to encode this string:

    “100% escaped once: 100%25, escaped twice: 100%2525”

    to be properly sendable in a URL. What’s wrong with just using the normal encode/decode functions and being careful to use each once? Outgoing, encoding needs to be the last thing you do. Incoming, decoding is the first.

    Comment by Jason Petersen — April 11, 2009 @ 5:34 pm

  2. Good catch, Jason. If someone types in something that contains a valid percent escape, e.g. “cat%20dog”, then stringByReplacingPercentEscapesOnce will incorrectly decode it into “cat dog”, and I will send a URL with “cat%20dog”, not “cat%2520dog”.

    My application takes text typed into a search box and builds a search URL from it. It also takes the URL of a Wikipedia page, extracts the page name from it, un-mangles it, and puts it in the search box (which conceptually acts like the URL/search bar in Chrome). Wikipedia has a nice property: if your search string matches a page name, you go right to that page.

    So I’m not sure if this is anything but an academic problem, unless there are page names that have percent-escapes in their real name.

    Still, it looks like I probably will not be reusing these functions anytime soon.

    Comment by Vincent Gable — April 11, 2009 @ 7:20 pm

  3. “What’s wrong with just using the normal encode/decode functions and being careful to use each once?”

    I don’t like being careful if I don’t have to :-). But seriously, if I go down that route I will probably make a PercentEscapedNSString object, that just holds an NSString and on creation percent-escapes whatever string it’s given. That way I can let the type-system make sure I’m doing things right.

    Comment by Vincent Gable — April 11, 2009 @ 7:26 pm
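
The wrapper type floated in comment 3 might be sketched roughly as follows. Everything here beyond the class name PercentEscapedNSString is an assumption: the initializer name, the escapedString property, the unescapedString method, and the use of ARC are illustrative choices, not code from the post.

```objc
#import <Foundation/Foundation.h>

// Hypothetical wrapper: holds a string that has been percent-escaped exactly once.
// Assumes ARC and UTF-8 throughout.
@interface PercentEscapedNSString : NSObject
@property (nonatomic, copy, readonly) NSString *escapedString;
- (instancetype)initWithUnescapedString:(NSString *)unescaped;
- (NSString *)unescapedString;
@end

@implementation PercentEscapedNSString
- (instancetype)initWithUnescapedString:(NSString *)unescaped
{
    self = [super init];
    if (self) {
        // Escaping happens only here, so it cannot be applied twice by accident.
        _escapedString = [[unescaped stringByAddingPercentEscapesUsingEncoding:NSUTF8StringEncoding] copy];
    }
    return self;
}

- (NSString *)unescapedString
{
    // Exactly one decode undoes the single encode done at creation.
    return [self.escapedString stringByReplacingPercentEscapesUsingEncoding:NSUTF8StringEncoding];
}
@end

int main(void)
{
    @autoreleasepool {
        PercentEscapedNSString *term = [[PercentEscapedNSString alloc] initWithUnescapedString:@"cat%20dog"];
        NSLog(@"escaped: %@, round trip: %@", term.escapedString, [term unescapedString]);
    }
    return 0;
}
```

This should log "escaped: cat%2520dog, round trip: cat%20dog". Code that builds URLs can then take a PercentEscapedNSString * parameter, so passing a raw NSString draws a compiler warning instead of silently double-escaping or under-escaping.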
