<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Vincent Gable's Blog &#187; Analysis</title>
	<atom:link href="http://vgable.com/blog/tag/analysis/feed/" rel="self" type="application/rss+xml" />
	<link>http://vgable.com/blog</link>
	<description>my weblog.</description>
	<lastBuildDate>Tue, 29 Nov 2011 22:20:23 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Just Look at it, Man!</title>
		<link>http://vgable.com/blog/2009/11/11/just-look-at-it-man/</link>
		<comments>http://vgable.com/blog/2009/11/11/just-look-at-it-man/#comments</comments>
		<pubDate>Wed, 11 Nov 2009 07:46:06 +0000</pubDate>
		<dc:creator>Vincent Gable</dc:creator>
		<category><![CDATA[Bug Bite]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[Analysis]]></category>
		<category><![CDATA[Color]]></category>
		<category><![CDATA[Data]]></category>
		<category><![CDATA[Graphing]]></category>
		<category><![CDATA[Optimization]]></category>
		<category><![CDATA[SSE]]></category>
		<category><![CDATA[Statistics]]></category>
		<category><![CDATA[Visualization]]></category>
		<category><![CDATA[War Story]]></category>
		<category><![CDATA[x86]]></category>

		<guid isPermaLink="false">http://vgable.com/blog/?p=495</guid>
		<description><![CDATA[You&#8217;re looking at Anscombe&#8217;s quartet: four datasets with nearly identical simple statistical properties (mean, variance, correlation, linear regression) but obvious differences when graphed. (via Best of Wikipedia) Graphs aren&#8217;t a substitute for numerical analysis. Graphs are not a panacea. But they&#8217;re excellent for discovering patterns, spotting outliers, and building intuition about a dataset. If you never graph [...]]]></description>
			<content:encoded><![CDATA[<p>You&#8217;re looking at <a href="http://en.wikipedia.org/wiki/Anscombe%27s_quartet">Anscombe&#8217;s quartet</a>: four datasets with nearly identical simple statistical properties (mean, variance, correlation, linear regression) but obvious differences when graphed.</p>
<div style="text-align:center;"><a href="http://en.wikipedia.org/wiki/File:Anscombe.svg" class="no-border"><img src="http://vgable.com/blog/wp-content/uploads/2009/11/325px-Anscombe.svg.png" alt="325px-Anscombe.svg.png" border="0" width="325" height="222" /></a></div>
<p>(via <a href="http://bestofwikipedia.tumblr.com/">Best of Wikipedia</a>)</p>
<p>Graphs aren&#8217;t a substitute for numerical analysis, and they are not a panacea. But they&#8217;re excellent for discovering patterns, spotting outliers, and building <em>intuition</em> about a dataset. If you never graph your data, then you&#8217;ve never really <em>looked at it</em>.</p>
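<p>To see how little the summary statistics say here, the sketch below computes them for the four datasets. The values are Anscombe&#8217;s quartet as it is commonly published, typed in by hand, so check them against the original before relying on them. The helper names <code>pearson</code> and <code>summarize</code> are made up for this example.</p>

```python
from statistics import mean, variance

# Anscombe's quartet as commonly published; verify against the original source.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def summarize(xs, ys):
    """Mean of x, mean of y, variance of x, and correlation, rounded for comparison."""
    return (round(mean(xs), 2), round(mean(ys), 2),
            round(variance(xs), 2), round(pearson(xs, ys), 2))


summaries = {name: summarize(xs, ys) for name, (xs, ys) in QUARTET.items()}
for name, stats in summaries.items():
    print(name, stats)
```

<p>The four printed rows should agree to about two decimal places, even though a scatter plot of each dataset looks completely different.</p>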
<h3>War Story</h3>
<p>I was working on optimizing color correction using SSE (x86 SIMD instructions, used for high-performance code). One operation required division, which is expensive for a computer. The hardware had a <code>divide</code> instruction, but sometimes using the <a href="http://www.sosmath.com/calculus/diff/der07/der07.html">Newton-Raphson method</a> to <a href="http://en.wikipedia.org/wiki/Division_(digital)#Newton.E2.80.93Raphson_division">do the division in software</a> is faster. You never know until you measure.</p>
<p>While doing the measurement, I somehow got the crazy idea to try both. I&#8217;d already unrolled the inner loop, so instead of repeating the <code>divide</code> or the Newton-Raphson step twice, I&#8217;d do a <code>divide</code> for one value and then use Newton-Raphson for the next. Strangely enough, this was faster on the hardware I was benchmarking than either method individually. Modern hardware is a complex and scary beast.</p>
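<p>For readers who haven&#8217;t seen it, here is a minimal sketch of Newton-Raphson division in plain Python rather than SSE. It refines an estimate of the reciprocal <code>1/d</code> and then multiplies. The function names and the starting estimate are assumptions made for illustration, not the code from this project. In SIMD code the starting estimate would typically come from an approximate-reciprocal instruction such as <code>rcpps</code>.</p>

```python
def newton_reciprocal(d, x0, iterations=2):
    """Refine an initial estimate x0 of 1/d with Newton-Raphson steps."""
    x = x0
    for _ in range(iterations):
        # Newton step for f(x) = 1/x - d, which simplifies to x * (2 - d * x).
        # It needs only multiplies and a subtract, no division.
        x = x * (2.0 - d * x)
    return x


def divide(n, d, x0, iterations=2):
    """Approximate n / d as n * (1/d) using the refined reciprocal."""
    return n * newton_reciprocal(d, x0, iterations)


# A deliberately rough, hypothetical starting estimate for 1/3.
approx = divide(1.0, 3.0, x0=0.3, iterations=3)
print(approx, abs(approx - 1.0 / 3.0))
```

<p>Each step roughly squares the relative error of the estimate, so a handful of cheap multiplies can stand in for one expensive divide, provided the starting estimate is reasonably close.</p>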
<p>I was fortunate enough to have a suite of very good unit tests to run against my optimized code. But there was a caveat to testing correctness. Because <a href="http://docs.sun.com/source/806-3568/ncg_goldberg.html">computers don&#8217;t have infinitely precise arithmetic</a>, two correct algorithms might give different answers. If the numbers they gave were <em>close enough</em> to the infinitely precise answer (say, within a couple of <a href="http://en.wikipedia.org/wiki/Unit_in_the_last_place">ulps</a>), that was good enough. (<a href="http://vgable.com/blog/2009/11/04/tolerance/">We can only be exact within some Tolerance</a>!) The tests cleared my hybrid <code>divide</code>/Newton-Raphson function, <em>but we couldn&#8217;t use it, because it was fundamentally broken</em>.</p>
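<p>A tolerance check of this kind can be sketched as follows. This is not the project&#8217;s test code: it assumes single-precision floats, the names <code>ulp_distance</code> and <code>close_enough</code> are invented for the example, and NaN and infinity are deliberately not handled.</p>

```python
import struct


def float32_bits(x):
    """Return the IEEE 754 single-precision bit pattern of x as an int."""
    return struct.unpack("=I", struct.pack("=f", x))[0]


def ordered(bits):
    """Map a bit pattern to an integer that increases with the float's value."""
    # Negative floats have the sign bit set; mirror them below zero so the
    # mapping is monotonic across the whole number line.
    if bits >= 0x80000000:
        return 0x80000000 - bits
    return bits


def ulp_distance(a, b):
    """Count the float32 steps separating a and b."""
    return abs(ordered(float32_bits(a)) - ordered(float32_bits(b)))


def close_enough(result, reference, max_ulps=2):
    """True when result is within max_ulps float32 steps of reference."""
    return max_ulps >= ulp_distance(result, reference)


print(ulp_distance(1.0, 1.0 + 2 ** -23))  # adjacent float32 values
```

<p>Note that a check like this looks at each value in isolation. It says nothing about how the errors are distributed across neighboring values.</p>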
<p>Even though the error was acceptably small, it had a <em>nasty</em> distribution. Using <code>divide</code> gave color values that were a bit too light. Doing the division in software gave values that were a bit too dark. Individually these errors were fine. Randomly <a href="http://en.wikipedia.org/wiki/Dither">spread over the image</a>, they would have been fine. But processing every other pixel differently had the effect of adding alternating light/dark stripes! <a href="http://www.uiandus.com/2009/06/25/cognitive-science/perception-vs-reality-perception-wins/">We see contrast, not absolute color</a>, so the numerically insignificant error was quite visible. Worse still, bands of one-pixel stripes combined to form a shimmering <a href="http://en.wikipedia.org/wiki/Moiré_pattern">Moiré pattern</a>. It was totally busted. Unusable.</p>
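<p>The failure mode can be reproduced with made-up numbers. In the sketch below, the pixel values, the size of the bias, and the tolerance are all hypothetical. Every pixel passes a per-pixel tolerance check, yet the sign of the error flips with pixel parity, which is exactly the structure that shows up as stripes.</p>

```python
def mean(values):
    return sum(values) / len(values)


def parity_bias(results, reference):
    """Mean signed error for even-indexed and odd-indexed pixels."""
    errors = [r - ref for r, ref in zip(results, reference)]
    return mean(errors[0::2]), mean(errors[1::2])


# Hypothetical row of flat gray: even pixels come out slightly light,
# odd pixels slightly dark, mimicking two alternating division methods.
reference = [0.5] * 8
results = [0.5 + (0.001 if i % 2 == 0 else -0.001) for i in range(8)]

# Every pixel is within the (hypothetical) per-pixel tolerance...
tolerance = 0.002
all_within = all(tolerance >= abs(r - ref) for r, ref in zip(results, reference))

# ...but the signed error is systematic rather than random.
even_bias, odd_bias = parity_bias(results, reference)
print(all_within, even_bias, odd_bias)
```

<p>A test that compared signed error between the two groups would have flagged the problem numerically, but someone has to think of that test first. Looking at the image requires no such foresight.</p>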
<p>This was all immediately obvious when the results of the color correction were &#8220;graphed&#8221;, that is, displayed as an image. Actually <em>looking</em> at the answer caught a subtle error that our suite of unit tests missed.</p>
<p>To be clear, more subjective graphical analysis is <em>not</em> a substitute for numerical analysis and data mining. But I believe in actually <em>looking at your data</em> at least once. A graph is a kind of <a href="http://vgable.com/blog/2009/03/03/vincents-notes-end-to-end-arguments-in-system-design/">end-to-end</a> visualization of everything, and that has value. Graphs are a cheap sanity check &#8212; does everything look right? And sometimes, they can give you real insight into a problem.</p>
]]></content:encoded>
			<wfw:commentRss>http://vgable.com/blog/2009/11/11/just-look-at-it-man/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Spurious</title>
		<link>http://vgable.com/blog/2009/11/09/spurious/</link>
		<comments>http://vgable.com/blog/2009/11/09/spurious/#comments</comments>
		<pubDate>Mon, 09 Nov 2009 22:09:29 +0000</pubDate>
		<dc:creator>Vincent Gable</dc:creator>
		<category><![CDATA[Announcement]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[Quotes]]></category>
		<category><![CDATA[Research]]></category>
		<category><![CDATA[Security]]></category>
		<category><![CDATA[Analysis]]></category>
		<category><![CDATA[Batman]]></category>
		<category><![CDATA[Correlation]]></category>
		<category><![CDATA[Data]]></category>
		<category><![CDATA[Drowning]]></category>
		<category><![CDATA[Fallacy]]></category>
		<category><![CDATA[Ice Cream]]></category>
		<category><![CDATA[Lisa Wade]]></category>
		<category><![CDATA[Statistics]]></category>
		<category><![CDATA[Summer]]></category>

		<guid isPermaLink="false">http://vgable.com/blog/?p=490</guid>
		<description><![CDATA[What’s a spurious relationship? Here’s one: People who eat ice cream are more likely to drown. Both incidence of ice cream eating and rates of drowning are related to summertime. The relationship between ice cream and drowning is spurious. That is, there is no relationship. Yet they appear related because they are both related to [...]]]></description>
			<content:encoded><![CDATA[<blockquote><h3>What’s a spurious relationship?</h3>
<p>Here’s one: <strong>People who eat ice cream are more likely to drown</strong>.  Both incidence of ice cream eating and rates of drowning are related to summertime.  The relationship between ice cream and drowning is spurious.  That is, there is no relationship.  Yet they appear related because they are both related to a third variable.
</p></blockquote>
<p>&#8211;<a href="http://contexts.org/socimages/2009/06/06/the-contact-hypothesis-and-spurious-relationships/">Lisa Wade</a></p>
<div style="text-align:center;"><img src="http://vgable.com/blog/wp-content/uploads/2009/11/untitled5sk.jpg" alt="untitled5sk.jpg" border="0" width="400" height="595" /></div>
<p>(Image <a href="http://superdickery.com/index.php?view=article&#038;catid=30%3Aframes-and-panels-index&#038;id=788%3Abatman-hates-ice-cream&#038;option=com_content&#038;Itemid=24">via</a> the amazing <a href="http://superdickery.com/">Superdickery</a>)</p>
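<p>The effect in the quote is easy to reproduce with synthetic data. The sketch below invents a hidden &#8220;temperature&#8221; variable that drives both ice cream eating and drowning. All numbers are simulated and carry no real-world meaning, and the variable names are made up for the example.</p>

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


random.seed(0)  # fixed seed so the sketch is repeatable

# Entirely synthetic data: "temperature" is the hidden third variable.
temperature = [random.random() for _ in range(500)]
ice_cream = [t + random.gauss(0, 0.1) for t in temperature]
drownings = [t + random.gauss(0, 0.1) for t in temperature]

# The two series look strongly related...
raw = pearson(ice_cream, drownings)

# ...until the shared driver is removed, leaving only independent noise.
ice_resid = [i - t for i, t in zip(ice_cream, temperature)]
drown_resid = [d - t for d, t in zip(drownings, temperature)]
controlled = pearson(ice_resid, drown_resid)

print(raw, controlled)
```

<p>The first correlation should be strong and the second close to zero: neither series influences the other, and the apparent link disappears once the third variable is accounted for.</p>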
]]></content:encoded>
			<wfw:commentRss>http://vgable.com/blog/2009/11/09/spurious/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

