One of the biggest challenges in modern imaging research lies in how to handle big datasets. This is a particular issue when undertaking multidimensional acquisition. In this post, I’ll cover some ideas on how to work sensibly with large datasets, as well as some neat tricks for downsizing the biggest ones.
To fully appreciate the magnitude of big datasets, it’s worth considering dimensionality. On a hard disk, a single image takes up the number of bits (binary digits) needed to record the information for every pixel in the image (we’ll assume arbitrary dimensions of 512 x 512 pixels). If the image is 8 bit (remember, that’s 8 bits of data per pixel), it will take up:
x * y * bitdepth = 512 * 512 * 8 = 2,097,152 bits = 262,144 bytes ≈ 0.25 MB
(It will actually take up slightly more than this, as the file also contains header information.)
We can extend this thought-experiment to a time course of, say, 50 frames:

tn * x * y * bitdepth = 50 * 512 * 512 * 8 ≈ 12.5 MB

Add a 20-slice z-stack at each time point and you’re already at around 250 MB.
Multi-channel imaging is very common, even if it’s just recording the transmitted light in addition to your fluorescence. The real fun starts when you have access to spectral detection, which can add up to 32 channels across a slightly-wider-than-visible spectrum. Even a modest four-channel acquisition adds up:
tn * zn * λn * x * y * bitdepth = 50 * 20 * 4 * 512 * 512 * 8 ≈ 1000 MB
Of course this can be extended to multiple positions on an XY stage, higher bitdepths and so on but hopefully the point is made.
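The arithmetic above is simple enough to wrap in a little helper. A minimal Python sketch (the function and parameter names are just illustrative):

```python
def dataset_size_bytes(x, y, bit_depth, t=1, z=1, channels=1, positions=1):
    """Raw pixel-data size in bytes. Headers and metadata add a little more."""
    bits = x * y * bit_depth * t * z * channels * positions
    return bits // 8

# The examples from the text:
single = dataset_size_bytes(512, 512, 8)                          # 262144 bytes ≈ 0.25 MB
movie = dataset_size_bytes(512, 512, 8, t=50, z=20, channels=4)   # ≈ 1000 MB
print(single / 2**20, movie / 2**20)
```

Extra dimensions (stage positions, higher bit depths) just multiply in, which is exactly why these datasets balloon so quickly.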
Not only do you have to store all of these data; even moving the files around and opening them becomes time-consuming.
When this might be an issue
In and of itself, there is no real problem with multidimensional datasets. If that’s what your experiment requires, then you can’t really make do with fewer channels or a smaller z-stack.
This may be more relevant in some specific circumstances:
- If during your time course the sample drifts, perhaps the first half of your movie is useful but the latter half is unusable.
- The same goes for bubbles, debris, dying cells or anything else that obscures your data while still leaving enough time points to be worth keeping.
- We’ve occasionally seen that when you run long, multidimensional time courses and cancel the acquisition before the end, Zen (the Zeiss acquisition software) writes out “Phantom Images” to fill in the un-acquired time points. These images are not empty, but contain very specific (and, within an experiment, identical) noise.
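Because those phantom frames are byte-for-byte identical, they are easy to flag programmatically. A minimal sketch, assuming each time point is available as a raw bytes buffer (adapt the input to however your frames are actually stored):

```python
import hashlib

def find_duplicate_tail(frames):
    """Return the index where a run of byte-identical trailing frames
    begins, or None if the final frame is unique.
    `frames` is a list of bytes objects, one per time point."""
    digests = [hashlib.sha256(f).hexdigest() for f in frames]
    last = digests[-1]
    start = len(frames) - 1
    while start > 0 and digests[start - 1] == last:
        start -= 1
    return start if start < len(frames) - 1 else None
```

Anything from the returned index onwards is a candidate for cropping away before you archive the file.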
What to do about it: An ode to Bio-Formats
As I have said before, one of the most useful things about Fiji is the integration of the Bio-Formats library. Not only does this allow you to open a huge number of proprietary file formats from within the comfort of Fiji, but it also has a bunch of smart Importing options. Let’s take a look:
To open files with the Bio-Formats library in Fiji you have two options:
1) On the menu, run [Plugins > Bio-Formats > Bio-Formats Importer] and you’ll get a normal Open File Dialog.
2) The alternative (and my preference) is to run [Plugins > Bio-Formats > Bio-Formats Shortcut Window] which will (unsurprisingly) open the Shortcut Window:
Here’s where the awesome begins. Once you drag a file in or open a file, you are presented with the Import Window which looks something like this:
Display Metadata / ROIs: Metadata deserve their own post (although a great place to start is the OME blog). Needless to say if you want to find out the details of your acquisition, check these boxes.
Split Channels/Timepoints/Focal planes: really useful if you were going to split the channels anyway.
Specify Range to Open: Check this box and you’ll be presented with a further dialog asking you to specify the ranges you’d like to import. Only need the first 10 frames of a 1000-frame movie? You got it! Only want the one channel on which you’ll do your analysis? No problem.
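The same ranged import can be scripted: Fiji’s macro recorder will emit a run("Bio-Formats Importer", …) call whose option string carries the range parameters. Here’s a hedged Python sketch that just builds such a string (the parameter names follow what the recorder produces for me; do verify against your own recorder output and Bio-Formats version):

```python
def range_import_options(path, t_begin, t_end, c_begin=1, c_end=1):
    """Build a Bio-Formats Importer macro option string that opens only a
    subset of time points and channels. Parameter names are assumed from
    Fiji's macro recorder output -- check yours before relying on this."""
    return (f"open=[{path}] color_mode=Default specify_range "
            f"view=Hyperstack stack_order=XYCZT "
            f"c_begin={c_begin} c_end={c_end} c_step=1 "
            f"t_begin={t_begin} t_end={t_end} t_step=1")

print(range_import_options("/data/movie.czi", 1, 10))
```

In a Fiji (Jython) script you would then pass that string as the second argument to IJ.run alongside "Bio-Formats Importer".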
This last option is the key to subsetting large datasets. If you only want the first half of a massive movie, why bother opening all the frames only to discard half of them? Open the first half and save it as a subset… which neatly leads me onto the last point:
Much as I like to moan about proprietary file formats, they do have some benefits. Each one perfectly saves all of the acquisition metadata for that acquisition system, because they’re written to do that. Using these formats can become a problem when you start using different software and other file formats as they all save different amounts of metadata in different ways and call them different things (again, the OME blog has a great explanation).
Anyway, as you can’t save back into the original file format, your next best option is OME-TIFF, which is a fantastically metadata-aware file format. Furthermore, as of Bio-Formats version 5.1.2, the OME-TIFF specification supports files larger than 4 GB.
To save your file in this format, simply use the Bio-Formats Exporter, available in the same places as the Importer (see above).
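The Exporter can be scripted the same way as the Importer. A small, hedged sketch of building its macro option string (again, the parameter and compression names are assumptions based on recorder output; confirm with your own Fiji installation):

```python
def exporter_options(path, compression="Uncompressed"):
    """Build a Bio-Formats Exporter macro option string. A .ome.tiff
    extension tells the exporter to write OME-TIFF. Parameter names are
    assumed from Fiji's macro recorder -- verify before use."""
    return f"save=[{path}] compression={compression}"

print(exporter_options("/data/movie_subset.ome.tiff"))
```

Combined with the ranged import above, this gives you a repeatable open-subset-then-resave recipe instead of clicking through the dialogs every time.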