Vincent Gable’s Blog

May 1, 2009

NSXMLParser and HTML/XHTML

Filed under: Bug Bite,iPhone,Objective-C,Programming | , , , ,
― Vincent Gable on May 1, 2009

NSXMLParser converts HTML/XML-entities in the string it gives the delegate callback -(void)parser:(NSXMLParser *)parser foundCharacters:(NSString *)string. So if an XML file contains the string, "&lt; or &gt;", the converted string "< or >" would be reported to the delegate, not the string that you would see if you opened the file with TextEdit.

This is correct behavior for XML files, but it can cause problems if you are trying to use an NSXMLParser to monkey with XHTML/HTML.

I was using an NSXMLParser to modify an XHTML webpage from Simple Wikipedia, and it was turning: “#include &lt;stdio&gt;” into “#include <stdio>“, which then displayed as “#include “, because WebKit thought <stdio> was a tag.

Solution: Better Tools

For scraping/reading a webpage, XPath is the best choice. It is faster and less memory intensive then NSXMLParser, and very concise. My experience with it has been positive.

For modifying a webpage, JavaScript might be a better fit then Objective-C. You can use
- (NSString *)stringByEvaluatingJavaScriptFromString:(NSString *)script to execute JavaScript inside a UIWebView in any Cocoa program. Neat stuff!

My Unsatisfying Solution

Do not use this, see why below:

- (void)parser:(NSXMLParser *)parser foundCharacters:(NSString *)string;
{
	string = [string stringByReplacingOccurrencesOfString:@"<" withString:@"&lt;"];
	string = [string stringByReplacingOccurrencesOfString:@">" withString:@"&gt;"];

	/* ... rest of the method */
}

Frankly that code scares me. I worry I’m not escaping something I should be. Experience has taught me I don’t have the experience of the teams who wrote HTML libraries, so it’s dangerous to try and recreate their work.

(UPDATED 2009-05-26: And indeed, I screwed up. I was replacing & with &amp;, and that was causing trouble. While my “fix” of not converting & seems to work on one website, it will not in general.)

I would like to experiment with using JavaScript instead of an NSXMLParser, but at the moment I have a working (and surprisingly compact) NSXMLParser implementation, and much less familiarity with JavaScript then Objective-C. And compiled Obj-C code should be more performant then JavaScript. So I’m sticking with what I have, at least until I’ve gotten Prometheus 1.0 out the door.

April 22, 2009

-[NSURL isEqual:] Gotcha

Filed under: Bug Bite,Cocoa,iPhone,MacOSX,Programming,Sample Code | , , , , , ,
― Vincent Gable on April 22, 2009

BREAKING UPDATE: Actually comparing the -absoluteURL or -absoluteString of two NSURLs that represent a file is not good enough. One may start file:///, and the other file://localhost/, and they will not be isEqual:! A work around is to compare the path of each NSURL. I’m still looking into the issue, but for now I am using the following method to compare NSURLs.

@implementation NSURL (IsEqualTesting)
- (BOOL) isEqualToURL:(NSURL*)otherURL;
{
	return [[self absoluteURL] isEqual:[otherURL absoluteURL]] || 
	[self isFileURL] && [otherURL isFileURL] &&
	([[self path] isEqual:[otherURL path]]);
}
@end

[a isEqual:b] may report NO for two NSURLs that both resolve to the same resource (website, file, whatever). So compare NSURLs like [[a absoluteString] isEqual:[b absoluteString]]. It’s important to be aware of this gotcha, because URLs are Apple’s preferred way to represent file paths, and APIs are starting to require them. Equality tests that worked for NSString file-paths may fail with NSURL file-paths.

The official documentation says

two NSURLs are considered equal if they both have the same base baseURL and relativeString.

Furthermore,

An NSURL object is composed of two parts—a potentially nil base URL and a string that is resolved relative to the base URL. An NSURL object whose string is fully resolved without a base is considered absolute; all others are considered relative.

In other words, two NSURL objects can resolve to the same absolute URL, but have a different base URL, and be considered !isEqual:.

An example should make this all clear,

NSURL *VGableDotCom = [NSURL URLWithString:@"http://vgable.com"];
NSURL *a = [[NSURL alloc] initWithString:@"blog" relativeToURL:VGableDotCom];
NSURL *b = [[NSURL alloc] initWithString:@"http://vgable.com/blog" relativeToURL:nil];
LOG_INT([a isEqual:b]);
LOG_INT([[a absoluteURL] isEqual:[b absoluteURL]]);
LOG_ID([a absoluteURL]);
LOG_ID([b absoluteURL]);

[a isEqual:b] = 0
[[a absoluteURL] isEqual:[b absoluteURL]] = 1
[a absoluteURL] = http://vgable.com/blog
[b absoluteURL] = http://vgable.com/blog

Remember that collections use isEqual: to determine equality, so you may have to convert an NSURL to an absoluteURL to get the behavior you expect, especially with NSSet and NSDictionary.

Getting the Current URL from a WebView

Filed under: Cocoa,iPhone,MacOSX,Objective-C,Programming,Sample Code | , , ,
― Vincent Gable on April 22, 2009

UPDATED 2009-04-30: WARNING: this method will not always give the correct result. +[NSURL URLWithString:] requires it’s argument to have unicode characters %-escaped UTF8. But stringByAddingPercentEscapesUsingEncoding:NSUTF8StringEncoding will convert # to %23, so http://example.com/index.html#s1 would become http://example.com/index.html%23s1. Unfortunately, the two URLs are not equivalent. The un-%-escaped one refers to section #s1 in the file index.html, and the other tries to fetch the file index.html#s1 (“index dot html#s1”). I have not yet implemented a workaround, although I suspect one is possible, by building the NSURL out of bits of the JavaScript location object, rather then trying to convert the whole string.


UIWebView/WebView does not provide a way to find the URL of the webpage it is showing. But there’s a simple (and neat) way to get it using embedded JavaScript.

- (NSString *)stringByEvaluatingJavaScriptFromString:(NSString *)script

Is a deceptively powerful method that can execute dynamically constructed JavaScript, and lets you embed JavaScript snippets in Cocoa programs. We can use it to embed one line of JavaScript to ask a UIWebView for the URL it’s showing.

@interface UIWebView (CurrentURLInfo)
- (NSURL*) locationURL;
@end
@implementation UIWebView (CurrentURLInfo) - (NSURL*) locationURL; { NSString *rawLocationString = [self stringByEvaluatingJavaScriptFromString:@"location.href;"]; if(!rawLocationString) return nil; //URLWithString: needs percent escapes added or it will fail with, eg. a file:// URL with spaces or any URL with unicode. locationString = [locationString stringByAddingPercentEscapesUsingEncoding:NSUTF8StringEncoding]; return [NSURL URLWithString:locationString] } @end

With the CurrentURLInfo category, you can do aWebView.locationURL to get the URL of the page a WebView is showing.


One last note, the

if(!rawLocationString)
	return nil;

special-case was only necessary, because [NSURL URLWithString:nil] throws an exception (rdar://6810626). But Apple has decided that this is correct behavior.

April 21, 2009

A Scalpel Not a Swiss Army Knife

Filed under: Design,iPhone,Programming,Quotes,Usability | ,
― Vincent Gable on April 21, 2009

Steven Frank summarizing feedback on the direction future of computer interfaces,

The other common theme was a desire to see applications become less general purpose and more specific. A good example was finding out train or bus schedules. One way to do this is to start up your all-purpose web browser, and visit a transit web site that offers a downloadable PDF of the bus schedule pamphlet. Another way is to use an iPhone application that has been built-to-task to interface with a particular city’s transit system. It’s no contest which is the better experience.

…In 2009, it’s still a chore to find out from the internet what time the grocery store down the street closes — we’ve got some work to do.

I would like to see a nice pithy term replace “very specific task-driven apps”. Perhaps “Specialty Applications” or “Focused Programs”. But I’m not enamored with ether. Whatever the term, it should emphasize excelling at something, not being limited. What are your thoughts for a name?

April 19, 2009

Beware rangeOf NSString Operations

Filed under: Bug Bite,iPhone,Objective-C,Programming,Sample Code | , , , ,
― Vincent Gable on April 19, 2009

I have repeatedly had trouble with the rageOfNSString methods, because they return a struct. Going forward I will do more to avoid them, here are some ways I plan to do it.

Sending a message that returns a struct to nil can “return” undefined values. With small structs like NSRange, you are more likely to get {0} on Intel, compared to PowerPC and iPhone/ARM. Unfortunately, this makes nil-messaging bugs hard to detect. In my experience you will miss them when running on the simulator, even if they are 100% reproducible on an actual iPhone.

This category method has helped me avoid using -rangeOfString: dangerously,

@implementation NSString  (HasSubstring)
- (BOOL) hasSubstring:(NSString*)substring;
{
	if(IsEmpty(substring))
		return NO;
	NSRange substringRange = [self rangeOfString:substring];
	return substringRange.location != NSNotFound && substringRange.length > 0;
}
@end

I choose to define [aString hasSubstring:@""] as NO. You might prefer to throw an exception, or differentiate between @"" and nil. But I don’t think a nil string is enough error to throw an exception. And even though technically any string contains the empty string, I generally treat @"" as semantically equivalent to nil.

As I’ve said before,

A few simple guidelines can help you avoid my misfortune:

  • Be especially careful using of any objective-C method that returns a double, struct, or long long
  • Don’t write methods that return a double, struct, orlong long. Return an object instead of a struct; an NSNumber* or float instead of a double or long long. If you must return a dangerous data type, then see if you can avoid it. There really isn’t a good reason to return a struct, except for efficiency. And when micro-optimizations like that matter, it makes more sense to write that procedure in straight-C, which avoids the overhead of Objective-C message-passing, and solves the undefined-return-value problem.
  • But if you absolutely must return a dangerous data type, then return it in a parameter. That way you can give it a default value of your choice, and won’t have undefined values if an object is unexpectedly nil.
    Bad:
    - (struct CStructure) evaluateThings;
    Good:
    - (void) resultOfEvaluatingThings:(struct CStructure*)result;.

It’s not a bad idea to wrap up all the rangeOf methods in functions or categories that play safer with nil.

Thanks to Greg Parker for corrections!

April 10, 2009

Pre-announcing Prometheus

Filed under: Announcement,iPhone | , , ,
― Vincent Gable on April 10, 2009

For the past month I have been working on my first iPhone application, code-named Prometheus. It’s a dedicated editor for Simple English Wikipedia.

Simple Wikipedia

The Simple English Wikipedia is a wikipedia written in simplified English, using only a few common words, simple grammar, and shorter sentences. The goal is for it to be as accessible as possible. That’s something that really resonates with me. There are many reasons why I think it’s an important project, but I’ll only briefly mention one sappy example.

Children can’t understand all of the Grown Up Wikipedia. Simple Wikipedia helps put more knowledge within their reach. I started writing weekly reports in the 5th grade. I’m sure many schools make students start sooner, and certainly I had to write infrequent reports much earlier. An encyclopedia at a 4th grade reading level would have been a fantastic tool to learn more about the world.

Prometheus

In Greek mythology Prometheus was, “the Titan god of forethought and the creator of mankind. He cheated the gods on several occasions on behalf of man, including the theft of fire.”

Similarly, the Prometheus iPhone app brings knowledge from the American-English Wiki Gods the to the majority of the world that does not speak English natively.

Given the platform, I believe enabling small quick edits is the best way to go. I want to make it easy for a native English speaker to spend 30 seconds correcting grammar while waiting in line.

With more constrained language, writing-aids become even more helpful. Surprisingly, I find myself using a thesaurus more when writing Simple English, to be sure I’m using the clearest synonym I can. There’s a lot more an application like Prometheus can do to help, because it’s not targeting a complex language.

Roadmap

I’ll be brutally honest, the first release is not going to have any fancy analytical features I originally envisioned. My goal is to get a 1.0 release out quickly, then iterate on the feedback and what I learn. I also have a selfish motivation here, because I’m looking for a job, and not having something in the App Store yet has been a big barrier.

What I can promise for 1.0 is

  • A greatly streamlined interface, as compared to editing through Safari.
  • A safe, save-free, experience, where nothing you write is lost if you suddenly have to quit and do something else.
  • Shake to load a random page. (Look, I know that’s nothing to brag about. But it is something Safari can’t do.)
  • A $0 price tag.

I’m trying to have 1.0 ready in about a moth, but that’s an aggressive goal.

Beyond that I would like to add a “simple-checker” that can flag complex terms, and a mechanism for as-you-type suggestions of more common terms. But both of these are technically challenging, and my main priority will be building on what I’ve learned by putting something in front of people.

Teaser

Here’s a picture of what I have running today.

prerelease.jpg

I have a lot to say about the evolving design in future posts. But this should an you an idea of what I’m shooting for — as minimal an interface as I can manage, hopefully condensed into one toolbar.

I don’t have an icon yet, if you can help there please let me know. Unfortunately this is a free project all around, so I’m not in a position to hire someone. My current idea for an icon is a hand
holding a fennel stalk with fire inside, much like an olympic torch. I’d love to hear your ideas.

March 31, 2009

How To Write Cocoa Object Getters

Setters are more straightforward than getters, because you don’t need to worry about memory management.

The best practice is to let the compiler write getters for you, by using Declared Properties.

But when I have to implement a getter manually, I prefer the (to my knowledge) safest pattern,

- (TypeOfX*) x;
{
  return [[x retain] autorelease];
}

Note that by convention in Objective-C, a getter for the variable jabberwocky is simply called jabberwocky, not getJabberwocky.

Why retain Then autorelease

Basically return [[x retain] autorelease]; guarantees that what the getter returns will be valid for as long as any local objects in the code that called the the getter.

Consider,

NSString* oldName = [person name];
[person setName:@"Alice"];
NSLog(@"%@ has changed their name to Alice", oldName);

If -setName: immediately releasees the value that -name returned, oldName will be invalid when it’s used in NSLog. But if the implementation of [x name] used retain/autorelease, then oldName would still be valid, because it would not be destroyed until the autorelease pool around the NSLog was drained.

Also, autorelease pools are per thread; different threads have different autorelease pools that are drained at different times. retain/autorelease makes sure the object is on the calling thread’s pool.

If this cursory explanation isn’t enough, read Seth Willitis’ detailed explanation of retain/autorelease. I’m not going to explain it further here, because he’s done such a through job of it.

Ugly

return [[x retain] autorelease]; is more complicated, and harder to understand then a simple return x;. But sometimes that ugliness is necessary, and the best place to hide it is in a one-line getter method. It’s self documenting. And once you understand Cocoa memory management, it’s entirely clear what the method does. For me, the tiny readability cost is worth the safety guarantee.

Big

return [[x retain] autorelease]; increases peak memory pressure, because it can defer dealloc-ing unused objects until a few autorelease pools are drained. Honestly I’ve never measured memory usage, and found this to be a significant problem. It certainly could be, especially if the thing being returned is a large picture or chunk of data. But in my experience, it’s nothing to worry about for getters that return typical objects, unless there are measurements saying otherwise.

Slow

return [[x retain] autorelease]; is obviously slower then just return x;. But I doubt that optimizing an O(1) getter is going to make a significant difference to your application’s performance — especially compared to other things you could spend that time optimizing. So until I have data telling me otherwise, I don’t worry about adding an the extra method calls.

This is a Good Rule to Break

As I mentioned before, getters don’t need to worry about memory management. It could be argued that the return [[x retain] autorelease]; pattern is a premature optimization of theoretical safety at the expense of concrete performance.

Good programmers try to avoid premature optimization; so perhaps I’m wrong to follow this “safer” pattern. But until I have data showing otherwise, I like to do the safest thing.

How do you write getters, and why?

March 29, 2009

How To Write Cocoa Object Setters

There are several ways to write setters for Objective-C/Cocoa objects that work. But here are the practices I follow; to the best of my knowledge they produce the safest code.

Principle 0: Don’t Write a Setter

When possible, it’s best to write immutable objects. Generally they are safer, and easier to optimize, especially when it comes to concurrency.

By definition immutable objects have no setters, so always ask yourself if you really need a setter, before you write one, and whenever revisiting code.

I’ve removed many of my setters by making the thing they set an argument to the class’s -initWith: constructor. For example,

CustomWidget *widget = [[CustomWidget alloc] init];
[widget setController:self];

becomes,

CustomWidget *widget = [[CustomWidget alloc] initWithController:self];

This is less code, and now, widget is never in a partially-ready state with no controller.

It’s not always practical to do without setters. If an object looks like it needs a settable property, it probably does. But in my experience, questioning the assumption that a property needs to be changeable pays off consistently, if infrequently.

Principle 1: Use @synthesize

This should go without saying, but as long as I’m enumerating best-practices: if you are using Objective-C 2.0 (iPhone or Mac OS X 10.5 & up) you should use @synthesize-ed properties to implement your setters.

The obvious benefits are less code, and setters that are guaranteed to work by the compiler. A less obvious benefit is a clean, abstracted way to expose the state values of an object. Also, using properties can simplify you dealloc method.

But watch out for the a gotcha if you are using copy-assignment for an NSMutable object!

(Note: Some Cocoa programmers strongly dislike the dot-syntax that was introduced with properties and lets you say x.foo = 3; instead of [x setFoo:3];. But, you can use properties without using the dot-syntax. For the record, I think the dot syntax is an improvement. But don’t let a hatred of it it keep you from using properties.)

Principle 2: Prefer copy over retain

I covered this in detail here. In summary, use copy over retain whenever possible: copy is safer, and with most basic Foundation objects, copy is just as fast and efficient as retain.

The Preferred Pattern

When properties are unavailable, this is my “go-to” pattern:

- (void) setX:(TypeOfX*)newX;
{
  [memberVariableThatHoldsX autorelease];
  memberVariableThatHoldsX = [newX copy];
}

Sometimes I use use retain, or very rarely mutableCopy, instead of copy. But if autorelease won’t work, then I use a different pattern. I have a few reasons for writing setters this way.

Reason: Less Code

This pattern is only two lines of code, and has no conditionals. There is very little that can I can screw up when writing it. It always does the same thing, which simplifies debugging.

Reason: autorelease Defers Destruction

Using autorelease instead of release is just a little bit safer, because it does not immediately destroy the old value.

If the old value is immediately released in the setter then this code will sometimes crash,

NSString* oldName = [x name];
[x setName:@"Alice"];
NSLog(@"%@ has changed their name to Alice", oldName);

If -setName: immediately releasees the value that -name returned, oldName will be invalid when it’s used in NSLog.

But if If -setName: autorelease-ed the old value instead, this wouldn’t be a problem; oldName would still be valid until the current autorelease pool was drained.

Reason: Precedent

This is the pattern that google recommends.

When assigning a new object to a variable, one must first release the old object to avoid a memory leak. There are several “correct” ways to handle this. We’ve chosen the “autorelease then retain” approach because it’s less prone to error. Be aware in tight loops it can fill up the autorelease pool, and may be slightly less efficient, but we feel the tradeoffs are acceptable.

- (void)setFoo:(GMFoo *)aFoo {
  [foo_ autorelease];  // Won't dealloc if |foo_| == |aFoo|
  foo_ = [aFoo retain]; 
}

Backup Pattern (No autorelease)

When autorelease won’t work, my Plan-B is:

- (void) setX:(TypeOfX*)newX;
{
  id old = memberVariableThatHoldsX;
  memberVariableThatHoldsX = [newX copy];
  [old release];
}

Reason: Simple

Again, there are no conditionals in this pattern. There’s no if(oldX != newX) test for me to screw up. (Yes, I have done this. It wasn’t a hard bug to discover and fix, but it was a bug nonetheless.) When I’m debugging a problem, I know exactly what setX: did to it’s inputs, without having to know what they are.

On id old

I like naming my temporary old-value id old, because that name & type always works, and it’s short. It’s less to type, and less to think about than TypeOfX* oldX.

But I don’t think it’s necessarily the best choice for doing more to old than sending it release.

To be honest I’m still evaluating that naming practice. But so far I’ve been happy with it.

Principle 3: Only Optimize After You Measure

This is an old maxim of Computer Science, but it bears repeating.

The most common pattern for a setter feels like premature optimization:

- (void) setX:(TypeOfX*)newX;
{
  if(newX != memberVariableThatHoldsX){
    [memberVariableThatHoldsX release];
    memberVariableThatHoldsX = [newX copy];
  }
}

Testing if(newX != memberVariableThatHoldsX) can avoid an expensive copy.

But it also slows every call to setX:. if statements are more code, that takes time to execute. When the processor guesses wrong while loading instructions after the branch, if‘s become quite expensive.

To know what way is faster, you have to measure real-world conditions. Even if a copy is very slow, the conditional approach isn’t necessarily faster, unless there is code that sets a property to it’s current value. Which is kind of silly really. How often do you write code like,

[a setX:x1];
[a setX:x1]; //just to be sure!

or

[a setX:[a x]];

Does that look like code you want to optimize? (Trick question! You don’t know until you test.)

Hypocrisy!

I constantly break Principle 3 by declaring properties in iPhone code as nonatomic, since it’s the pattern Apple uses in their libraries. I assume Apple has good reason for it; and since I will need to write synchronization-code to safely use their libraries, I figure it’s not much more work to reuse the same code to protect access to my objects.

I can’t shake the feeling I’m wrong to do this. But it seems more wrong to not follow Apple’s example; they wrote the iPhone OS in the first place.

If you know a better best practice, say so!

There isn’t a way to write a setter that works optimally all the time, but there is a setter-pattern that works optimally more often then other patterns. With your help I can find it.

UPDATE 03-30-2009:

Wil Shiply disagrees. Essentially his argument is that setters are called a lot, so if they aren’t aggressive about freeing memory, you can have thousands of dead objects rotting in an autorelease pool. Plus, setters often do things like registering with the undo manager, and that’s expensive, so it’s a good idea to have conditional code that only does that when necessary.

My rebuttal is that you should optimize big programs by draining autorelease pools early anyway; and that mitigates the dead-object problem.

With complex setters I can see why it makes sense to check if you need to do something before doing it. I still prefer safer, unconditional, code as a simple first implementation. That’s why it’s my go-to pattern. But if most setters you write end up being more complex, it might be the wrong pattern.

Really you should read what Wil says, and decide for yourself. He’s got much more experience with Objective-C development then I do.

May 31, 2008

Messages to Nowhere

Filed under: Bug Bite,Cocoa,iPhone,MacOSX,Objective-C,Programming | , , ,
― Vincent Gable on May 31, 2008

If you send a message to (call a method of) an object that is nil (NULL) in Objective-C, nothing happens, and the result of the message is nil (aka 0, aka 0.0, aka NO aka false). At least most of the time. There is an important exception if sizeof(return_type) > sizeof(void*); then the return-value is undefined under PowerPC/iPhone, and thus all Macs for the next several years. So watch out if you are using a return value that is a struct, double, long double, or long long.

The fully story:

In Objective-C, it is valid to send a message to a nil object. The Objective-C runtime assumes that the return value of a message sent to a nil object is nil, as long as the message returns an object or any integer scalar of size less than or equal to sizeof(void*).
On Intel-based Macintosh computers, messages to a nil object always return 0.0 for methods whose return type is float, double, long double, or long long. Methods whose return value is a struct, as defined by the Mac OS X ABI Function Call Guide to be returned in registers, will return 0.0 for every field in the data structure. Other struct data types will not be filled with zeros. This is also true under Rosetta. On PowerPC Macintosh computers, the behavior is undefined.

I was recently bitten by this exceptional behavior. I was using an NSRange struct describing a substring; but the string was nil, so the substring was garbage. But only on a PPC machine! Even running under Rosetta wouldn’t have reproduced the bug on my MacBook Pro.Undefined values can be a hard bug to detect, because they may be reasonable values when tests are run.

Code running in the iPhone simulator will return all-zero stucts when messaging nil, but the same code running on an actual device will return undefined structs. Be aware that testing in the simulator isn’t enough to catch these bugs.

A few simple guidelines can help you avoid my misfortune:

  • Be especially careful using of any objective-C method that returns a double, struct, or long long
  • Don’t write methods that return a double, struct, orlong long. Return an object instead of a struct; an NSNumber* or float instead of a double or long long. If you must return a dangerous data type, then see if you can avoid it. There really isn’t a good reason to return a struct, except for efficiency. And when micro-optimizations like that matter, it makes more sense to write that procedure in straight-C, which avoids the overhead of Objective-C message-passing, and solves the undefined-return-value problem.
  • But if you absolutely must return a dangerous data type, then return it in a parameter.
    Bad:
    - (double) crazyMath;
    Good:
    - (void) crazyMathResult:(double*)result;.

I love Objective-C’s “nil messaging” behavior, even though it is rough around the edges. It’s usefulness is beyond the scope of this article, but it can simplify your code if you don’t return a data-type that is larger then sizeof(void*). With time, when the intel-style return behavior can be universally relied on, things will be even better.

« Newer Posts

Powered by WordPress