Vincent Gable’s Blog

April 29, 2010

What Am I About To Call?

Filed under: Cocoa,iPhone,MacOSX,Objective-C,Programming,Reverse Engineering,Tips | , ,
― Vincent Gable on April 29, 2010

Say you’re in gdb, and about to execute a call instruction for dyld_stub_objc_msgSend, how do you know what’s about to happen?

On i386

(gdb) x/s *(SEL*)($esp+4)

tells you the message that’s about to be sent.

(gdb) po *(id*)$esp

tells you the target object that’s about to get the message.

November 11, 2009

Just Look at it, Man!

Filed under: Bug Bite,Programming | , , , , , , , , ,
― Vincent Gable on November 11, 2009

You’re looking at Anscombe’s quartet: 4 datasets with identical simple statistical properties (mean, variance, correlation, linear regression); but obvious differences when graphed.

325px-Anscombe.svg.png


(via Best of Wikipedia)

Graphs aren’t a substitute for numerical analysis. Graphs are not a panacea. But they’re excellent for discovering patterns, outliers, and getting intuition about a dataset. If you never graph your data, then you’ve never really looked at it.

War Story

I was working on optimizing color correction, using SSE (high performance x86 instructions). One operation required division — an expensive operation for a computer. The hardware had a divide instruction, but sometimes using the Newton-Raphson method to do the division in software is faster. You never know until you measure.

While doing the measurement, I somehow got the crazy idea to try both: I’d already unrolled the inner loop so instead of repeating the divide or Newton’s Method twice, I’d do a divide and then use Newton’s Method for the next value. Strangely enough, this was faster on the hardware I was benchmarking than either method individually. Modern hardware is a complex and scary beast.

I was fortunate enough to have a suite of very good unit tests to run against my optimized code. But there was a caveat to testing correctness. Because computers don’t have infinitely precise arithmetic, two correct algorithms might give different answers — but if the numbers they gave were close enough to the infinitely precise answer (say a couple ulps apart) it was good enough. (We can only be exact within some Tolerance!) The tests cleared my hybrid divide/Newton-Raphson function: but we couldn’t use it, because it was fundamentally broken.

Even though the error was acceptably small, it had a nasty distribution. Using divide gave color values that were a bit too light. Doing a divide in software gave values that were a bit too dark. Individually these errors were fine. Randomly spread over the image they would have been fine. But processing every other pixel differently had the effect of adding alternating light/dark stripes! We see contrast, not absolute color, so the numerically insignificant error was quite visible. Worse still, bands of 1 pixel stripes combined to form a shimmering Moiré pattern. It was totally busted. Unusable.

This was all immediately obvious when the results of the color correction were “graphed”. Actually looking at the answer caught a subtle error that our suite of unit tests missed.

To be clear, more subjective graphical analysis is not a substitute for numerical analysis and data mining. But I believe in actually looking at your data at least once. A graph is a kind of end-to-end visualization of everything, and that has value. Graphs are a cheap sanity check — does everything look right? And sometimes, they can give you real insight into a problem.

April 19, 2009

Beware rangeOf NSString Operations

Filed under: Bug Bite,iPhone,Objective-C,Programming,Sample Code | , , , ,
― Vincent Gable on April 19, 2009

I have repeatedly had trouble with the rageOfNSString methods, because they return a struct. Going forward I will do more to avoid them, here are some ways I plan to do it.

Sending a message that returns a struct to nil can “return” undefined values. With small structs like NSRange, you are more likely to get {0} on Intel, compared to PowerPC and iPhone/ARM. Unfortunately, this makes nil-messaging bugs hard to detect. In my experience you will miss them when running on the simulator, even if they are 100% reproducible on an actual iPhone.

This category method has helped me avoid using -rangeOfString: dangerously,

@implementation NSString  (HasSubstring)
- (BOOL) hasSubstring:(NSString*)substring;
{
	if(IsEmpty(substring))
		return NO;
	NSRange substringRange = [self rangeOfString:substring];
	return substringRange.location != NSNotFound && substringRange.length > 0;
}
@end

I choose to define [aString hasSubstring:@""] as NO. You might prefer to throw an exception, or differentiate between @"" and nil. But I don’t think a nil string is enough error to throw an exception. And even though technically any string contains the empty string, I generally treat @"" as semantically equivalent to nil.

As I’ve said before,

A few simple guidelines can help you avoid my misfortune:

  • Be especially careful using of any objective-C method that returns a double, struct, or long long
  • Don’t write methods that return a double, struct, orlong long. Return an object instead of a struct; an NSNumber* or float instead of a double or long long. If you must return a dangerous data type, then see if you can avoid it. There really isn’t a good reason to return a struct, except for efficiency. And when micro-optimizations like that matter, it makes more sense to write that procedure in straight-C, which avoids the overhead of Objective-C message-passing, and solves the undefined-return-value problem.
  • But if you absolutely must return a dangerous data type, then return it in a parameter. That way you can give it a default value of your choice, and won’t have undefined values if an object is unexpectedly nil.
    Bad:
    - (struct CStructure) evaluateThings;
    Good:
    - (void) resultOfEvaluatingThings:(struct CStructure*)result;.

It’s not a bad idea to wrap up all the rangeOf methods in functions or categories that play safer with nil.

Thanks to Greg Parker for corrections!

April 23, 2008

Printing a FourCharCode

Filed under: Bug Bite,C++,Cocoa,MacOSX,Programming,Sample Code,Tips | , , , ,
― Vincent Gable on April 23, 2008

A lot of Macintosh APIs take or return a 32-bit value that is supposed to be interpreted as a four character long string. (Anything using one of the types FourCharCode, OSStatus, OSType, ResType, ScriptCode, UInt32, CodecType probably does this.) The idea is that the value 1952999795 tells you less, and is harder to remember, then 'this'.

It’s a good idea, unfortunately there isn’t a simple way to print integers as character strings. So you have to write your own code to print them out correctly, and since it has to print correctly on both big endian and little endian machines this is tricker then it should be. I’ve done it incorrectly before, but more importantly so has Apple’s own sample code!

Here are the best solutions I’ve found. Please let me know if you have a better way, or if you find any bugs. (The code is public-domain, so use it any way you please.)

FourCharCode to NSString:
use NSString *NSFileTypeForHFSTypeCode(OSType code);
Results are developer-readable and look like: @"'this'", but Apple says “The format of the string is a private implementation detail”, so don’t write code that depends on the format.

FourCharCode to a C-string (char*):
use the C-macro

#define FourCC2Str(code) (char[5]){(code >> 24) & 0xFF, (code >> 16) & 0xFF, (code >> 8) & 0xFF, code & 0xFF, 0}
(requires -std=c99; credit to Joachim Bengtsson). Note that the resulting C-string is statically allocated on the stack, not dynamically allocated on the heap; do not return it from a function.
If you want a value on the heap, try:


void fourByteCodeString (UInt32 code, char* str){
  sprintf(str, "'%c%c%c%c'",
    (code >> 24) & 0xFF, (code >> 16) & 0xFF,
    (code >> 8) & 0xFF, code & 0xFF);
}

In GDB, you can print a number as characters like so:

(gdb) print/T 1936746868
$4 = 'spit'

(via Borkware’s GDB tips)

If you do end up rolling your own FourCharCode printer (and I don’t recommend it!), then be sure to use arithmetic/shifting to extract character values from the integer. You can not rely on the layout of the integer in memory, because it will change on different machines. Even if you are certain that you’re code will only run on an x86 machine, it’s still a bad idea. It limits the robustness and reusability of your code. It sets you up for an unexpected bug if you ever do port your code. And things change unexpectedly in this business.

Printing a FourCharCode is harder then it should be. Experienced developers who should know better have been bitten by it (and so have I). The best solution is probably a %format for printf/NSLog that prints integers as character strings. Unfortunately, it doesn’t look like we’ll be seeing that anytime soon.

Powered by WordPress