# Data manipulation

This isn’t a political post; it’s a post about data and ethics. When you read a graph, you start with the assumption that whoever made it followed a set of basic guidelines. These guidelines can be broken by accident, but with computerized tools that is rare. Given how readily available graphing software is, making an X-Y graph misrepresent data takes some effort. Sometimes you can even prove it was intentional.

The case study here is a television graphic presented last week. Here is the original on-air version:

Original air date Dec 12, 2011 (Fox News)

There are two big problems with this graph. I’m not going to address graph design, chartjunk, or any other aspects beyond the fundamentals. Once this graph is technically correct, we could argue those points. Here we’ll focus on graphs 101… the basics.

The first problem: the scale on the left of the chart is clearly wrong. 9% roughly lines up with the data, but the peak at 9.2 almost lines up with 9.5 on the scale, and the valley (8.8) is close to 8.5. On its own, this error doesn’t really matter, because most line graphs are used to indicate a trend. As long as the trend is correct, we can forgive an error on the scale.

The second problem is worse: the trend is not only wrong, it has been manipulated. This can be shown by comparing the data presented above to the actual data. To compare the charts at all, we first have to fix the scale issue. It turns out that the scale was off by a factor of exactly 1/2, so it seems reasonable to assume that an error was made during preparation, where either the data or the scale was reduced in size to make things fit. Perhaps it was more pleasing to see 8 and 10 as round numbers on the screen. The data doesn’t vary by much, so that scale would have made it seem too flat (a legitimate trick for skewing data perception). This kind of mistake happens… it shouldn’t, but it does. With the scale fixed, the data sets overlap very well except for two key areas:

Unemployment data (blue) and TV news graphic (red).

You can spot the difference without my help. Just to be open, honest, and clear, I have added error bars that represent the confidence interval for the values I read off the screenshot presented above. Thanks to GraphClick software, this is an easy thing to do. The blue line is the original data from the source, and the red is the (scale-corrected) TV version.
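For anyone who wants to repeat this kind of check, here is a minimal sketch of the two steps in Python: undo the factor-of-1/2 scale error, then flag any point where the corrected TV value differs from the source by more than the error bar. Everything specific in it is an assumption made for illustration: the function names are hypothetical, the scale is assumed to be stretched around the 9% line (the one place where scale and data agree), the 0.05-point error bar is a placeholder, and the numbers are toy values, not the real unemployment series or the readings taken with GraphClick.

```python
def correct_scale(apparent, anchor=9.0, factor=0.5):
    """Pull an on-screen axis reading back toward the anchor by `factor`.

    Assumes the axis is stretched around `anchor` (here the 9% line).
    """
    return anchor + (apparent - anchor) * factor


def flag_discrepancies(source, digitized, half_width):
    """Return the indices where the two series differ by more than the error bar."""
    return [
        i
        for i, (s, d) in enumerate(zip(source, digitized))
        if abs(s - d) > half_width
    ]


# Toy numbers for illustration only; NOT the real data from the post.
source = [9.0, 9.2, 8.8, 9.0]      # values from the original data source
apparent = [9.0, 9.4, 8.6, 9.4]    # values as read against the on-screen axis
half_width = 0.05                  # assumed digitizing uncertainty, in points

# Step 1: fix the scale. Step 2: compare against the source.
digitized = [correct_scale(a) for a in apparent]
print(flag_discrepancies(source, digitized, half_width))  # prints [3]
```

In this toy series the first three points agree with the source once the scale is fixed, and only the last one falls outside the error bar. That is the pattern to look for: a scale mistake moves every point in a consistent way, so a point that still disagrees after the correction needs a different explanation.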

Consider the process of making a graph: