Much of the posts on this blog deal with converting Images to numbers, which is what I would consider Image Analysis. This is ususally followed by some method of data analysis (converting numbers into results or other meaningful output.
This post will deal with the latter, and chiefly using Pivot tables to easily summarise the sorts of data that will frequently come out of Imaging experiments.
First things first, we’ll need some data. For this example, I’ll be using some shape parameters from Fiji’s Analyse Particles function. If you want to play along you can download the data here, or recreate it using the original data and this script.
What we have ended up with (and a key to success) are tidy data. I won’t go too much into the details of tidy data but the basic concept is that each column represents a variable and no two columns represent the same variable. Each row thus represents a separate measurement.
As usual, things are easier with an example. The following table would be considered messy data as the variable (cups of coffee) is represented in column 2 and column 3.
The tidy way to represent these data are to put all of the measurements of a single variable (cups of coffee) into a single column like so (and in doing so create another variable “Day Of Week”).
The other important thing to note is that each column has a header. This will become important later when we need to reference columns.
For the examples in this post, you’ll also need a spreadsheet application. While Excel is the de facto standard these days, it’s worth pointing out that Libre Office Calc has all of the functionality described, so if you prefer FOSS, go for it, I’ll be using Excel for the rest of the post.
Let’s Pivot: a simple example
Open up the example data or some equivalent tidy data set. Put your cursor somewhere in your data then on the ‘Insert’ tab, hit Pivot Table.
You will be presented with a dialog whos purpose is to define the data source and the location of the pivot table. The Data source is really important so check that it encompasses all of your columns and rows (see image below, although we’ll come back to this at the end of the post). Hit OK.
If you’ve used the default options, you’ll be provided with a new worksheet. I think this is where most people fear-quit but once you get the hang, it’s actually pretty simple.
Let’s take this one step at a time:
What’s most important?
First thing is to figure out what output you want to report. For this example we’ll use Circularity which is identified by the Column header
Circ. Drag this from the Field list at the top, to the VALUES box.
The value that appears in the pivot table is the sum of all the circularity measurements in our original table. Usually we want to report the summary statistcs so let’s go ahead and change that. Click on the entry in the VALUES box and select “Value Field Settings”.
You’ll get the following dialog where you can select the calcution used to summarise. Select Average. Hit OK.
Back in our spreadsheet we now have the average circularity for all of our measurements. We could have done that in the original sheet so let’s try something more useful. In this case including another parameter.
Breaking it down
In this dataset, the Slice column represents different images (from 1 to 12) in the original dataset, each with multiple measured objects. To calculate the mean circularity of the objects in each image, drag the
Slicecolumn into the ROWS box. Now instead of a single value, you’re provided with the average for each slice.
Before moving on, let’s just add some more info to our summary stats. Once again drag the
Circ. entry from the FIELDS to VALUES (so you have it twice). Click on the second entry and now change the calculation parameter to StDev. Repeat but this time set the calculation to Count. You should have something that looks like this:
Super! That’s pretty much what most people want when doing data analysis. Pat yourself on the back if you got this far.
The more values you have in your original datasource, the more useful the pivot tables will be. Let’s say for a project you’ve done 4 experiments (Exp001, Exp002, Exp003, Exp004) and each time you’ve treated with either Drug A or Drug B. You take a bunch of pictures and have measured the Circularity of objects in each picture (there’s a second set of Data here if you want to play).
Instead of intersecting on image number as we did above, what about grouping by Drug? You can create the following Pivot table in almost the same way as before (note that now Drug is in the ROWS box instead of Slice):
This encompases all four of our experiments. There are two cool things you can do with the experiments. Firstly, drag the Experiment field into the ROWS box. If you put it below Drug, you’ll get each Drug subclassed into the four experiments:
Note that if you drag the Experiment above Drug in the ROWS area, the layout will change to have each experiment broken down into Drug A and B.Both useful for different things.
The one box we’ve not really used yet is the FILTER box. This last trick will allow you to dynamically look at the data for a particular case or condition. Drag the Experiment field from ROWS to the box above, FILTERS.
This will create a new box at the top of your pivot table and return the main table to just summarising on the Drugs.
If you pull down the filter box (that says “All”), you can select a particular experiment to highlight. You also have the option to exclude certain items (which makes more sense for categorical data like this than for numerical data).
And that’s about it for a basic intro to Pivot tables. It’s surprising how much you can do with even these basic functions and how useful they can be!
Tips and Gotchas
Once you get the hang of them, Pivot tables are super-helpful for summarising big sets of data. Just remember to include a variable for anything you may want to query on later (drug concentration, experiment date, lot number of antibody etc). Below are a couple of extra pointers for working with Pivot tables.
You can address the entries in a pivot table just like any other spreadsheet cell, so plotting from the pivot table is really easy:
This is fine but remember if the pivot table changes size (eg. a new Drug is introduced in later experiments) the cell references may become wrong.
In many cases though, you can bypass this directly and jump straight into a PivotChart. Follow the same method as we used to create the original pivot table but instead create a chart using [Insert > Pivot Chart] under the Insert Charts box:
Note that the areas now represent SERIES and CATEGORY instead of VALUES and ROWS but the same rules apply. Unlike the example above, this chart will adapt to new entries appearing in the data.
This one definitely falls into the gotcha category. Recall that the first thing you do is to define a range for your data. If you create a pivot table then add more data, the pivot table will not automatically update (which is also why the ‘Count’ column is a useful sanity check). To update to include new data select the pivot table and under PivotTable Tools Analyze tab, pull down “Change Data Source”
You can now redefine the scope of the original dataset and your pivot table will update. Note that if you just want to update the data for example if the original data has changed, you can hit “Refresh All” under the Data tab.
Finally in the realm of “Good to know” if you’re doing loads of forumla work with the results of pivot tables (normalising data based on mean values or whatever) Excel has a function called GETPIVOTDATA that allows you to pull values out of a pivot table. This largely bypasses the problems mentioned above with columns shifting as you use named arguements instead of references. Loads more info on the help page.