Show me the data!

While this blog is principally about Image Analysis (turning images into numbers), Data Analysis (turning numbers into something meaningful) is also really important.

In this post I’m going to explain how to display your data in a beeswarm plot and why you might want to do this. Simple statistics are great but show me the data!

Trying a Sample (Mean)

It shouldn’t need saying but for completeness, let’s start with the basics:

I think I’m right in saying that there exists a ground truth for everything (although Werner Heisenberg would disagree, but we’re going to ignore quantum mechanics for the time being). For example, there is a value for the mean length of every dolphin on this planet at any given time. We can never really know that number; all we can do (should we want to) is attempt to estimate it. This is where sampling comes in. If we measure a sub-population of dolphins, we can make a few assumptions and extrapolate to every dolphin on the planet.

The problem is that the mean alone is not that helpful in describing a population. Talk amongst yourselves as I find some reference dolphins to sample.

2016-09-SMTD-001

Above you can see that in two sampling trips, I found that the mean dolphin length both times was 2.7 m. If you look at the individual numbers you can see that the range of values is quite different:

2016-09-SMTD-002

In this case, we can describe the difference using the standard deviation or the variance (both of which come out to about 0.2 versus 1 for samples one and two respectively). But what if those values are also similar?
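As a rough illustration of the point above, here is a minimal Python sketch that computes the mean and sample standard deviation for two samples. The lengths are hypothetical values chosen so that both means come out at 2.7 m; they are not the measurements from the tables in this post.

```python
import statistics

# Hypothetical dolphin lengths in metres (illustrative only, not the post's data).
sample_one = [2.5, 2.6, 2.7, 2.8, 2.9]
sample_two = [1.3, 2.0, 2.7, 3.4, 4.1]


def summarise(lengths):
    """Return (mean, sample standard deviation) for a list of lengths."""
    return statistics.mean(lengths), statistics.stdev(lengths)


for name, lengths in (("Sample 1", sample_one), ("Sample 2", sample_two)):
    mean, sd = summarise(lengths)
    print(f"{name}: mean = {mean:.2f} m, SD = {sd:.2f} m")
```

Both samples report the same mean, so only the standard deviation (or a plot) reveals how differently the values are spread.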

Enter the Anscombe Quartet

Simple statistics are great for describing lots of datasets but there will always be cases where they are insufficient. One of the best examples is the Anscombe Quartet:

2016-09-smtd-003

All four datasets are identical (to at least two decimal places) when compared using simple statistics (mean of X, mean of Y, sample variance, correlation and linear regression). The differences are only really apparent when you graph them (as above).
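If you want to check this for yourself, the sketch below computes the summary statistics for each dataset in Python. The values are the commonly reproduced ones for the quartet, typed in by hand, so it is worth checking them against a trusted copy before relying on them. The helper names (pearson_r, summary) are my own and not from any particular library.

```python
import statistics

# Commonly reproduced values for the Anscombe Quartet (verify before reuse).
X_SHARED = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X_SHARED, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X_SHARED, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X_SHARED, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_x = sum((x - mx) ** 2 for x in xs)
    ss_y = sum((y - my) ** 2 for y in ys)
    return cov / (ss_x * ss_y) ** 0.5


def summary(xs, ys):
    """Collect the simple statistics discussed in the text."""
    return {
        "mean_x": statistics.mean(xs),
        "mean_y": statistics.mean(ys),
        "var_x": statistics.variance(xs),  # sample variance
        "var_y": statistics.variance(ys),
        "r": pearson_r(xs, ys),
    }


SUMMARIES = {name: summary(xs, ys) for name, (xs, ys) in QUARTET.items()}

for name, s in SUMMARIES.items():
    print(f"{name}: mean_x={s['mean_x']:.2f} mean_y={s['mean_y']:.2f} "
          f"var_x={s['var_x']:.2f} var_y={s['var_y']:.2f} r={s['r']:.3f}")
```

The printed rows should be near enough indistinguishable, which is exactly why the numbers alone are not enough.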

Show me the data

I’ve heard over and over again that you should plot the individual data points if you have fewer than 6 but personally, I would set that number much (much!) higher. It’s a bit clumsy to try and plot single data points in Excel or even MATLAB. In these cases we can resort to a little bit of know-how and create our own beeswarm plot:

2016-09-07-smtd_swarm

The thing that makes this non-obvious is getting the points aligned on the x-axis. We can deal with that really easily in Excel using the RAND() function:

2016-09-smtd-004

For Sample 1, every cell in the first column has the same formula. RAND() returns a number between 0 and 1 and we add 1 to that to give us numbers between 1 and 2. For Sample 2 we have done the same thing but added 3 to RAND() to give numbers between 3 and 4. This leaves the whole of the ‘2’ range empty so that there’s some separation between the points. If you need to plot more datasets just make an X column adding 5, 7, 9 &c. There’s your simple beeswarm!
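For anyone working outside Excel, the same jitter trick can be sketched in a few lines of Python. This is only an illustration of the idea: the function name jitter_x and its parameters are my own invention, and random.random() stands in for RAND().

```python
import random


def jitter_x(n_points, offset, rng=random):
    """Return n_points x-values drawn uniformly between offset and offset + 1."""
    return [offset + rng.random() for _ in range(n_points)]


# Offsets 1, 3, 5, ... leave an empty unit-wide gap between neighbouring samples.
sample_one_x = jitter_x(10, offset=1)
sample_two_x = jitter_x(10, offset=3)

print(sample_one_x)
print(sample_two_x)
```

Pair each x-value with a measured y-value and draw a scatter plot to get the same simple beeswarm.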

Admittedly, with this few points, it really doesn’t come into its own but when you have several hundred you can really start to appreciate the distribution of your data.

That’s not normal

One final tip: we’ve used RAND(), which provides a uniform random number (do you remember this from before?). If you want a more traditional beeswarm, use a normally distributed random number, which biases values towards the mean value:

2016-09-smtd-007

Excel doesn’t have a dedicated function for drawing normally distributed random numbers so (as before) we can use the Box-Muller method to turn uniform random numbers into normally distributed ones. In this case my equation for x-values was:

=SQRT(-2*LN(RAND()))*COS(2*PI()*RAND())+3
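
For comparison, here is a minimal Python sketch of the Box-Muller transform. It uses the natural logarithm and the full value of pi, which is how the transform is usually written. The function name box_muller_x and the spread parameter are my own additions; a spread below 1 keeps neighbouring columns from overlapping.

```python
import math
import random


def box_muller_x(centre, spread=1.0, rng=random):
    """Return one normally distributed x-value centred on `centre`."""
    u1 = 1.0 - rng.random()  # shift to (0, 1] so that log(u1) is always defined
    u2 = rng.random()
    z = math.sqrt(-2.0 * math.log(u1)) * math.cos(2.0 * math.pi * u2)
    return centre + spread * z


# x-values for a column centred on 3, as in the spreadsheet formula.
column_x = [box_muller_x(3) for _ in range(10)]
print(column_x)
```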

3 thoughts on “Show me the data!”

    1. Dave Mason Post author

      Whilst I agree that Prism is really easy to use, even for non-linear regression and some stats, I think most people collate data in a spreadsheet first. My aim was to show how this sort of data can be displayed really quickly in a useful way.

      Although I’ve used it before, we don’t currently have an institutional license for Prism at Liverpool, so it tends not to be on my radar these days.

      1. galicolagfb

        I understand, and appreciate, what you tried to do. Just wanted to mention that there are easier ways 🙂 We also don’t have an institutional license. I bought a student’s licence for myself. It makes my life much easier at a not too high cost (for which I’m reimbursed anyway…).
