<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Vincent Gable's Blog &#187; War Story</title>
	<atom:link href="http://vgable.com/blog/tag/war-story/feed/" rel="self" type="application/rss+xml" />
	<link>http://vgable.com/blog</link>
	<description>my weblog.</description>
	<lastBuildDate>Tue, 29 Nov 2011 22:20:23 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Just Look at it, Man!</title>
		<link>http://vgable.com/blog/2009/11/11/just-look-at-it-man/</link>
		<comments>http://vgable.com/blog/2009/11/11/just-look-at-it-man/#comments</comments>
		<pubDate>Wed, 11 Nov 2009 07:46:06 +0000</pubDate>
		<dc:creator>Vincent Gable</dc:creator>
				<category><![CDATA[Bug Bite]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[Analysis]]></category>
		<category><![CDATA[Color]]></category>
		<category><![CDATA[Data]]></category>
		<category><![CDATA[Graphing]]></category>
		<category><![CDATA[Optimization]]></category>
		<category><![CDATA[SSE]]></category>
		<category><![CDATA[Statistics]]></category>
		<category><![CDATA[Visualization]]></category>
		<category><![CDATA[War Story]]></category>
		<category><![CDATA[x86]]></category>

		<guid isPermaLink="false">http://vgable.com/blog/?p=495</guid>
		<description><![CDATA[You&#8217;re looking at Anscombe&#8217;s quartet: 4 datasets with identical simple statistical properties (mean, variance, correlation, linear regression); but obvious differences when graphed. (via Best of Wikipedia) Graphs aren&#8217;t a substitute for numerical analysis. Graphs are not a panacea. But they&#8217;re excellent for discovering patterns, outliers, and getting intuition about a dataset. If you never graph [...]]]></description>
			<content:encoded><![CDATA[<p>You&#8217;re looking at <a href="http://en.wikipedia.org/wiki/Anscombe%27s_quartet">Anscombe&#8217;s quartet</a>: 4 datasets with identical simple statistical properties (mean, variance, correlation, linear regression); but obvious differences when graphed.<br />
<a href="http://en.wikipedia.org/wiki/File:Anscombe.svg" class="no-border">
<div style="text-align:center;"><img src="http://vgable.com/blog/wp-content/uploads/2009/11/325px-Anscombe.svg.png" alt="325px-Anscombe.svg.png" border="0" width="325" height="222" /></div>
</a><br />
(via <a href="http://bestofwikipedia.tumblr.com/">Best of Wikipedia</a>)</p>
<p>Graphs aren&#8217;t a substitute for numerical analysis. Graphs are not a panacea. But they&#8217;re excellent for discovering patterns, outliers, and getting <em>intuition</em> about a dataset. If you never graph your data, then you&#8217;ve never really <em>looked at it</em>.</p>
<h3>War Story</h3>
<p>I was working on optimizing color correction using SSE (the x86 SIMD instruction set). One operation required division, which is an expensive operation for a computer. The hardware had a <code>divide</code> instruction, but sometimes it is faster to <a href="http://en.wikipedia.org/wiki/Division_(digital)#Newton.E2.80.93Raphson_division">do the division in software</a> using the <a href="http://www.sosmath.com/calculus/diff/der07/der07.html">Newton-Raphson method</a>. You never know until you measure.</p>
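<p>For readers who have not seen the software approach, here is a minimal Python sketch of Newton-Raphson division. It is an illustration, not the SSE code from this story: the names <code>nr_reciprocal</code> and <code>nr_divide</code>, the linear starting estimate, and the iteration count are all assumptions. Real SSE code would more likely start from a hardware reciprocal estimate and process several single-precision values at once.</p>

```python
import math


def nr_reciprocal(d, iterations=4):
    """Approximate 1/d for d > 0 with Newton-Raphson iteration (illustrative)."""
    if d <= 0:
        raise ValueError("this sketch only handles positive divisors")
    # Scale the divisor: d = m * 2**e with 0.5 <= m < 1.
    m, e = math.frexp(d)
    # Linear starting estimate for 1/m on [0.5, 1); relative error at most 1/17.
    x = 48.0 / 17.0 - (32.0 / 17.0) * m
    # Each step roughly squares the relative error: x <- x * (2 - m * x).
    for _ in range(iterations):
        x = x * (2.0 - m * x)
    # Undo the scaling: 1/d = (1/m) * 2**(-e).
    return math.ldexp(x, -e)


def nr_divide(n, d, iterations=4):
    """Divide n by d using one multiply and an approximate reciprocal."""
    return n * nr_reciprocal(d, iterations)
```

<p>Because each iteration roughly squares the relative error, the iteration count is the knob that trades speed against accuracy. Fewer iterations are cheaper, but the result lands further from the correctly rounded quotient.</p>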
<p>While doing the measurement, I somehow got the crazy idea to try both. I&#8217;d already unrolled the inner loop, so instead of repeating the <code>divide</code> or Newton&#8217;s method twice, I&#8217;d do a <code>divide</code> for one value and then use Newton&#8217;s method for the next. Strangely enough, this was faster on the hardware I was benchmarking than either method individually. Modern hardware is a complex and scary beast.</p>
<p>I was fortunate enough to have a suite of very good unit tests to run against my optimized code. But there was a caveat to testing correctness. Because <a href="http://docs.sun.com/source/806-3568/ncg_goldberg.html">computers don&#8217;t have infinitely precise arithmetic</a>, two correct algorithms might give different answers. If the numbers they gave were <em>close enough</em> to the infinitely precise answer (say, within a couple of <a href="http://en.wikipedia.org/wiki/Unit_in_the_last_place">ulps</a>), that was good enough. (<a href="http://vgable.com/blog/2009/11/04/tolerance/">We can only be exact within some Tolerance</a>!) The tests cleared my hybrid <code>divide</code>/Newton-Raphson function, <em>but we couldn&#8217;t use it, because it was fundamentally broken</em>.</p>
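<p>A tolerance test of this kind can be sketched in a few lines of Python. This is not the test suite from the story: <code>ulps_apart</code>, <code>close_enough</code>, and the default of 2 ulps are hypothetical, and the sketch compares 64-bit doubles rather than the single-precision values SSE code would usually produce. It also does not handle NaN.</p>

```python
import struct


def _ordered_bits(x):
    """Map a double to an integer whose ordering matches the float ordering."""
    bits = struct.unpack("=q", struct.pack("=d", x))[0]
    # Negative floats have the sign bit set; mirror them below zero so that
    # adjacent floats always map to adjacent integers (and -0.0 maps to 0).
    return bits if bits >= 0 else -(bits & 0x7FFFFFFFFFFFFFFF)


def ulps_apart(a, b):
    """Number of representable doubles between a and b (0 if they are equal)."""
    return abs(_ordered_bits(a) - _ordered_bits(b))


def close_enough(a, b, max_ulps=2):
    """Accept b as a correct answer if it is within max_ulps of a."""
    return ulps_apart(a, b) <= max_ulps
```

<p>Note what a check like this cannot see. It judges each value on its own, so it says nothing about how the errors are arranged across a whole image.</p>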
<p>Even though the error was acceptably small, it had a <em>nasty</em> distribution. Using <code>divide</code> gave color values that were a bit too light. Doing the divide in software gave values that were a bit too dark. Individually these errors were fine. Randomly <a href="http://en.wikipedia.org/wiki/Dither">spread over the image</a> they would have been fine. But processing every other pixel differently had the effect of adding alternating light/dark stripes! <a href="http://www.uiandus.com/2009/06/25/cognitive-science/perception-vs-reality-perception-wins/">We see contrast, not absolute color</a>, so the numerically insignificant error was quite visible. Worse still, bands of 1-pixel stripes combined to form a shimmering <a href="http://en.wikipedia.org/wiki/Moiré_pattern">Moiré pattern</a>. It was totally busted. Unusable.</p>
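<p>The failure is easy to reproduce with made-up numbers. In the Python sketch below, the pixel values, the error size <code>eps</code>, and the helper <code>parity_bias</code> are all invented for illustration. Every pixel is within tolerance of the reference, yet the signed error flips on every other pixel, which is exactly the pattern the eye picks up as stripes.</p>

```python
def parity_bias(reference, result):
    """Return the mean signed error on even-indexed and odd-indexed pixels."""
    errors = [r - ref for ref, r in zip(reference, result)]
    even = errors[0::2]
    odd = errors[1::2]
    return sum(even) / len(even), sum(odd) / len(odd)


# A flat gray row of pixels; the value 100.0 and eps are made up.
reference = [100.0] * 8
eps = 0.25

# Even pixels come out slightly too light, odd pixels slightly too dark.
result = [v + eps if i % 2 == 0 else v - eps for i, v in enumerate(reference)]

# The largest per-pixel error is tiny, so a per-pixel tolerance test passes.
max_abs_error = max(abs(r - ref) for ref, r in zip(reference, result))

# The signed error is perfectly correlated with pixel position.
even_bias, odd_bias = parity_bias(reference, result)
```

<p>Here <code>max_abs_error</code> is just <code>eps</code>, but <code>even_bias</code> and <code>odd_bias</code> have opposite signs. A statistic like this would have caught the bug, but only if someone had thought to compute it.</p>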
<p>This was all immediately obvious when the results of the color correction were &#8220;graphed&#8221;. Actually <em>looking</em> at the answer caught a subtle error that our suite of unit tests missed.</p>
<p>To be clear, more subjective graphical analysis is <em>not</em> a substitute for numerical analysis and data mining. But I believe in actually <em>looking at your data</em> at least once. A graph is a kind of <a href="http://vgable.com/blog/2009/03/03/vincents-notes-end-to-end-arguments-in-system-design/">end-to-end</a> visualization of everything, and that has value. Graphs are a cheap sanity check &#8212; does everything look right? And sometimes, they can give you real insight into a problem.</p>
]]></content:encoded>
			<wfw:commentRss>http://vgable.com/blog/2009/11/11/just-look-at-it-man/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>

