{"id":495,"date":"2009-11-11T02:46:06","date_gmt":"2009-11-11T07:46:06","guid":{"rendered":"http:\/\/vgable.com\/blog\/?p=495"},"modified":"2009-11-12T09:47:15","modified_gmt":"2009-11-12T14:47:15","slug":"just-look-at-it-man","status":"publish","type":"post","link":"https:\/\/vgable.com\/blog\/2009\/11\/11\/just-look-at-it-man\/","title":{"rendered":"Just Look at it, Man!"},"content":{"rendered":"<p>You&#8217;re looking at <a href=\"http:\/\/en.wikipedia.org\/wiki\/Anscombe%27s_quartet\">Anscombe&#8217;s quartet<\/a>: 4 datasets with identical simple statistical properties (mean, variance, correlation, linear regression); but obvious differences when graphed.<br \/>\n<a href=\"http:\/\/en.wikipedia.org\/wiki\/File:Anscombe.svg\" class=\"no-border\"><\/p>\n<div style=\"text-align:center;\"><img loading=\"lazy\" decoding=\"async\" src=\"http:\/\/vgable.com\/blog\/wp-content\/uploads\/2009\/11\/325px-Anscombe.svg.png\" alt=\"325px-Anscombe.svg.png\" border=\"0\" width=\"325\" height=\"222\" \/><\/div>\n<p> <\/a><br \/>\n(via <a href=\"http:\/\/bestofwikipedia.tumblr.com\/\">Best of Wikipedia<\/a>)<\/p>\n<p>Graphs aren&#8217;t a substitute for numerical analysis. Graphs are not a panacea. But they&#8217;re excellent for discovering patterns, outliers, and getting <em>intuition<\/em> about a dataset. If you never graph your data, then you&#8217;ve never really <em>looked at it<\/em>.<\/p>\n<h3>War Story<\/h3>\n<p>I was working on optimizing color correction, using SSE (high performance x86 instructions). One operation required division &#8212; an expensive operation for a computer. The hardware had a <code>divide<\/code> instruction, but sometimes using the <a href=\"http:\/\/www.sosmath.com\/calculus\/diff\/der07\/der07.html\">Newton-Raphson method<\/a> to <a href=\"http:\/\/en.wikipedia.org\/wiki\/Division_(digital)#Newton.E2.80.93Raphson_division\">do the division in software<\/a> is faster. 
You never know until you measure.<\/p>\n<p>While doing the measurement, I somehow got the crazy idea to try both: I&#8217;d already unrolled the inner loop, so instead of repeating the <code>divide<\/code> or Newton&#8217;s Method twice, I&#8217;d do a <code>divide<\/code> and then use Newton&#8217;s Method for the next value. Strangely enough, this was faster on the hardware I was benchmarking than either method individually. Modern hardware is a complex and scary beast.<\/p>\n<p>I was fortunate enough to have a suite of very good unit tests to run against my optimized code. But there was a caveat to testing correctness. Because <a href=\"http:\/\/docs.sun.com\/source\/806-3568\/ncg_goldberg.html\">computers don&#8217;t have infinitely precise arithmetic<\/a>, two correct algorithms might give different answers; as long as the numbers they gave were <em>close enough<\/em> to the infinitely precise answer (say, within a couple of <a href=\"http:\/\/en.wikipedia.org\/wiki\/Unit_in_the_last_place\">ulps<\/a>), that was good enough. (<a href=\"http:\/\/vgable.com\/blog\/2009\/11\/04\/tolerance\/\">We can only be exact within some Tolerance<\/a>!) The tests cleared my hybrid <code>divide<\/code>\/Newton-Raphson function: <em>but we couldn&#8217;t use it, because it was fundamentally broken<\/em>.<\/p>\n<p>Even though the error was acceptably small, it had a <em>nasty<\/em> distribution. Using <code>divide<\/code> gave color values that were a bit too light. Doing the division in software with Newton-Raphson gave values that were a bit too dark. Individually, these errors were fine. Randomly <a href=\"http:\/\/en.wikipedia.org\/wiki\/Dither\">spread over the image<\/a>, they would have been fine. But processing every other pixel differently had the effect of adding alternating light\/dark stripes! 
<a href=\"http:\/\/www.uiandus.com\/2009\/06\/25\/cognitive-science\/perception-vs-reality-perception-wins\/\">We see contrast, not absolute color<\/a>, so the numerically insignificant error was quite visible. Worse still, bands of 1 pixel stripes combined to form a shimmering <a href=\"http:\/\/en.wikipedia.org\/wiki\/Moir\u00e9_pattern\">Moir\u00e9 pattern<\/a>. It was totally busted. Unusable.<\/p>\n<p>This was all immediately obvious when the results of the color correction were &#8220;graphed&#8221;. Actually <em>looking<\/em> at the answer caught a subtle error that our suite of unit tests missed.<\/p>\n<p>To be clear, more subjective graphical analysis is <em>not<\/em> a substitute for numerical analysis and data mining. But I believe in actually <em>looking at your data<\/em> at least once. A graph is a kind of <a href=\"http:\/\/vgable.com\/blog\/2009\/03\/03\/vincents-notes-end-to-end-arguments-in-system-design\/\">end-to-end<\/a> visualization of everything, and that has value. Graphs are a cheap sanity check &#8212; does everything look right? And sometimes, they can give you real insight into a problem.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>You&#8217;re looking at Anscombe&#8217;s quartet: 4 datasets with identical simple statistical properties (mean, variance, correlation, linear regression); but obvious differences when graphed. (via Best of Wikipedia) Graphs aren&#8217;t a substitute for numerical analysis. Graphs are not a panacea. But they&#8217;re excellent for discovering patterns, outliers, and getting intuition about a dataset. 
If you never graph [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[18,4],"tags":[530,304,529,377,158,535,416,378,536,23],"class_list":["post-495","post","type-post","status-publish","format-standard","hentry","category-bug-bite","category-programming","tag-analysis","tag-color","tag-data","tag-graphing","tag-optimization","tag-sse","tag-statistics","tag-visualization","tag-war-story","tag-x86"],"_links":{"self":[{"href":"https:\/\/vgable.com\/blog\/wp-json\/wp\/v2\/posts\/495","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/vgable.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/vgable.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/vgable.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/vgable.com\/blog\/wp-json\/wp\/v2\/comments?post=495"}],"version-history":[{"count":5,"href":"https:\/\/vgable.com\/blog\/wp-json\/wp\/v2\/posts\/495\/revisions"}],"predecessor-version":[{"id":500,"href":"https:\/\/vgable.com\/blog\/wp-json\/wp\/v2\/posts\/495\/revisions\/500"}],"wp:attachment":[{"href":"https:\/\/vgable.com\/blog\/wp-json\/wp\/v2\/media?parent=495"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/vgable.com\/blog\/wp-json\/wp\/v2\/categories?post=495"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/vgable.com\/blog\/wp-json\/wp\/v2\/tags?post=495"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}