Note
This page was generated from a Jupyter notebook. Not all interactive visualizations will work on this web page. Consider downloading the notebook for full Python-backed interactivity.
[ ]:
# holoviews initialization
3. Pre-processing¶
In the pre-processing steps that follow, the pipeline will load and process the videos (downsampling, subsetting, denoising).
All functions are evaluated lazily, which means that initially only a “plan” for the computation is created, without executing it. Actual computations are carried out only when results are saved.
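As a minimal illustration of lazy evaluation (using dask, the library minian builds on; the array and operations here are toy examples, not part of the pipeline):

```python
import dask.array as da

# building a "plan": no computation happens on these two lines
arr = da.ones((1000, 100), chunks=(100, 100))  # lazy array of ones
plan = (arr * 2).mean()                        # lazy task graph

# computation is triggered only here (analogous to saving results)
result = float(plan.compute())
print(result)  # 2.0
```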
3.1. loading videos and visualization¶
Recall the values of param_load_videos:
[6]:
param_load_videos
[6]:
{'pattern': 'msCam[0-9]+\\.avi$',
'dtype': numpy.uint8,
'downsample': {'frame': 1, 'height': 1, 'width': 1},
'downsample_strategy': 'subset'}
The first argument of load_videos should be the path that contains the videos (dpath). We then pass the dictionary param_load_videos, defined earlier, which specifies several relevant arguments. The optional argument pattern is the regular expression used to filter files under the specified folder. The default value r"msCam[0-9]+\.avi$" means that a file is loaded only if its filename contains ‘msCam’, followed by at least one digit, and ends with ‘.avi’. This can be changed to suit the naming convention of your videos. The resulting “video array” varr has three dimensions: height, width, and frame. If you wish to downsample the video, pass a dictionary to downsample, with dimension names as keys and downsampling factors as integer values. For example, downsample={"frame": 2} will temporally downsample the video by a factor of 2. downsample_strategy can take one of two values: either "subset", meaning downsampling is carried out by simply subsetting the data, or "mean", meaning a mean is calculated over each downsampling window (the latter being slower).
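The two strategies can be sketched with plain numpy on a toy array (a hypothetical 6-frame, 2 × 2-pixel “video”; an illustration of the idea, not minian’s implementation):

```python
import numpy as np

varr_toy = np.arange(24).reshape(6, 2, 2)  # 6 frames of 2x2 pixels
factor = 2

# "subset": keep every `factor`-th frame
down_subset = varr_toy[::factor]

# "mean": average non-overlapping windows of `factor` frames (slower)
down_mean = varr_toy.reshape(-1, factor, 2, 2).mean(axis=1)

print(down_subset.shape, down_mean.shape)  # (3, 2, 2) (3, 2, 2)
```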
In addition to the video array varr, the following cell also tries to estimate the best chunk size chk to use for computations. This variable is needed in later steps, since it’s important to keep the chunk size consistent within the pipeline. If for some reason you have to restart the kernel at some point, remember to either note down the content of chk or rerun the following cell.
changing parameters
All minian parameters are dicts and you can freely change them in various ways. You can go back to the initial parameter-setting cell and change things there. Alternatively, you can add a code cell before running the relevant step. For example, the following line will tell the function to load from "/my/data_path":
param_load_videos["vpath"] = "/my/data_path"
While the following line will change the downsample setting (which is itself specified as a dict) when loading the video:
param_load_videos["downsample"] = {"frame": 2}
[7]:
varr = load_videos(dpath, **param_load_videos)
chk, _ = get_optimal_chk(varr, dtype=float)
loading 10 videos in folder /home/runner/work/minian/minian/demo_movies
We then immediately save the array representation to the intermediate folder to avoid repeatedly loading the video in later steps.
[8]:
%%time
varr = save_minian(
varr.chunk({"frame": chk["frame"], "height": -1, "width": -1}).rename("varr"),
intpath,
overwrite=True,
)
CPU times: user 328 ms, sys: 32.7 ms, total: 361 ms
Wall time: 4.8 s
The variable varr is an xarray.DataArray. Now is a perfect time to familiarize yourself with this data structure and the xarray module in general, since we will be using these data structures throughout the analysis. Basically, an xarray.DataArray is an N-dimensional array labeled with additional metadata, with many useful properties that make it easy to manipulate. We can ask the computer to print out some information about varr by calling it (as with any other variable):
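A minimal, self-contained illustration of what a labeled DataArray gives you (a toy array, not minian’s data):

```python
import numpy as np
import xarray as xr

arr = xr.DataArray(
    np.arange(12).reshape(3, 2, 2),
    dims=["frame", "height", "width"],
    coords={"frame": [0, 1, 2], "height": [0, 1], "width": [0, 1]},
    name="toy",
)
# dimensions are addressed by name rather than by position
print(arr.sizes["frame"])                        # 3
print(int(arr.sel(frame=2, height=1, width=1)))  # 11
```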
[9]:
varr
[9]:
<xarray.DataArray 'varr' (frame: 2000, height: 480, width: 752)>
dask.array<from-zarr, shape=(2000, 480, 752), dtype=uint8, chunksize=(50, 480, 752), chunktype=numpy.ndarray>
Coordinates:
  * frame    (frame) int64 0 1 2 3 4 5 6 ... 1993 1994 1995 1996 1997 1998 1999
  * height   (height) int64 0 1 2 3 4 5 6 7 ... 472 473 474 475 476 477 478 479
  * width    (width) int64 0 1 2 3 4 5 6 7 8 ... 744 745 746 747 748 749 750 751
We can see now that varr is an xarray.DataArray with the name "varr" and three dimensions: frame, height and width. Each dimension is labeled with ascending natural numbers. The dtype (data type) of varr is numpy.uint8.
3.2. visualize raw data and optionally set roi for motion correction¶
Once the data is loaded we can visualize the content of varr with the help of VArrayViewer, which shows the array as a movie. You can also plot summary traces, like the mean fluorescence across frames, by passing a list of trace names as input. Currently "mean", "min", "max" and "diff" are supported, where "diff" is the difference of the mean fluorescence across all pixels between consecutive frames.
VArrayViewer also supports a box-drawing tool: you can draw a box in the field of view (FOV) and save it as a mask using the “save mask” button. The mask is saved as vaviewer.mask, and can be retrieved and used at later stages, for example when you want to run motion correction on a sub-region of the FOV. See the API reference for more detail.
[10]:
hv.output(size=output_size)
if interactive:
vaviewer = VArrayViewer(varr, framerate=5, summary=["mean", "max"])
display(vaviewer.show())
computing summary
If you decide to set a mask for motion correction, the following cell shows an example of how to convert the mask into a subset_mc parameter that can later be passed to the motion-correction functions.
[11]:
if interactive:
try:
subset_mc = list(vaviewer.mask.values())[0]
except IndexError:
pass
3.3. subset part of video¶
Before proceeding to pre-processing, it’s good practice to check if there is anything obviously wrong with the video (e.g. the camera suddenly dropped frames, resulting in dark frames). This can usually be spotted by visualizing the video and plotting the timecourse of the mean fluorescence. We can utilize the xarray.DataArray.sel method and slice objects to subset any part of the data we want. By default, subset = None results in no subsetting.
subsetting data
The xarray.DataArray.sel method takes either a dict or keyword arguments. In both cases you specify the dimension names and the coordinates of the subset as key-value pairs. For example, say you want only the first 800 frames of the video; the following two lines will both work and are equivalent:
varr.sel(frame=slice(0, 799)) # slice object is inclusive on both ends
varr.sel({"frame": slice(0, 799)})
This also works on arbitrary dimensions. For example, the following will give you a 100 px × 100 px chunk of your movie at a corner:
varr.sel(height=slice(0, 99), width=slice(0, 99))
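One point worth remembering is that label-based slices in sel are inclusive on both ends, unlike plain Python indexing. A quick check on a toy array (a hypothetical stand-in, not minian’s data):

```python
import numpy as np
import xarray as xr

varr_toy = xr.DataArray(
    np.arange(10), dims=["frame"], coords={"frame": np.arange(10)}
)
subset = {"frame": slice(0, 7)}
out = varr_toy.sel(subset)
print(out.sizes["frame"])  # 8, because both endpoints 0 and 7 are included
```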
[12]:
varr_ref = varr.sel(subset)
3.4. glow removal and visualization¶
Here we remove the general glow background caused by the vignetting effect. We simply calculate a minimum projection across all frames and subtract that projection from every frame. A benefit of doing this is that you can interpret the result as “change of fluorescence from baseline”, while preserving the linear scale of the raw data, which is usually the range of an 8-bit integer (0-255). The result can be visualized again with VArrayViewer.
[13]:
%%time
varr_min = varr_ref.min("frame").compute()
varr_ref = varr_ref - varr_min
CPU times: user 162 ms, sys: 24.4 ms, total: 186 ms
Wall time: 974 ms
[14]:
hv.output(size=int(output_size * 0.7))
if interactive:
vaviewer = VArrayViewer(
[varr.rename("original"), varr_ref.rename("glow_removed")],
framerate=5,
summary=None,
layout=True,
)
display(vaviewer.show())
3.5. denoise¶
Recall that by default we use a median filter for denoising:
[15]:
param_denoise
[15]:
{'method': 'median', 'ksize': 7}
There is only one parameter controlling how the filtering is done: the kernel size (ksize) of the filter. The effect of this parameter can be visualized with the tool below. Alternatively, other methods like the Gaussian filter, anisotropic filter, etc. are implemented as well. See the API reference for denoise for more detail.
Generally ksize=5 works well (approximately half the diameter of the largest cell). Note that if you do want to play with ksize, it must be an odd number.
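To get a feel for what the median filter does, here is a sketch with scipy.ndimage (minian’s denoise wraps an equivalent per-frame filter; the frame below is made up for illustration):

```python
import numpy as np
from scipy.ndimage import median_filter

frame = np.full((16, 16), 100.0)
frame[8, 8] = 255.0                  # a single "salt" noise pixel
ksize = 5                            # kernel size; must be odd

smoothed = median_filter(frame, size=ksize)
print(frame.max(), smoothed.max())   # 255.0 100.0 -> the outlier is gone
```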
[16]:
hv.output(size=int(output_size * 0.6))
if interactive:
display(
visualize_preprocess(
varr_ref.isel(frame=0).compute(),
denoise,
method=["median"],
ksize=[5, 7, 9],
)
)
The following cell carries out the denoising step. Be sure to adjust the parameters based on the visualization results before running it.
[17]:
varr_ref = denoise(varr_ref, **param_denoise)
3.6. background removal¶
Recall the parameters for background removal:
[18]:
param_background_removal
[18]:
{'method': 'tophat', 'wnd': 15}
This step attempts to estimate the background (everything except the fluorescent signal of in-focus cells) and subtract it from each frame. By default we use a morphological tophat operation to estimate the background: First, a disk-shaped structuring element with a radius of wnd is created. Then, a morphological erosion using the disk element is applied to each frame, which eats away any bright “features” that are smaller than the disk element. Subsequently, a morphological dilation is applied to the eroded image, which in theory undoes the erosion, except for the bright “features” that were completely eaten away. The overall effect of this process is to remove any bright feature smaller than a disk of radius wnd. Thus, by setting wnd to the largest expected radius of a cell, this process gives us a good estimate of the background, which is finally subtracted from each frame.
Pragmatically, wnd=15 works well.
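The erosion-then-dilation logic above can be sketched with scipy.ndimage (a toy frame and disk element for illustration; minian’s remove_background is its own implementation):

```python
import numpy as np
from scipy.ndimage import grey_erosion, grey_dilation, white_tophat

def disk(radius):
    # boolean disk-shaped structuring element of the given radius
    y, x = np.ogrid[-radius:radius + 1, -radius:radius + 1]
    return x * x + y * y <= radius * radius

# toy frame: zero background with one small bright "cell"
frame = np.zeros((40, 40))
frame[18:22, 18:22] = 100.0          # feature smaller than the disk below

selem = disk(5)
eroded = grey_erosion(frame, footprint=selem)        # eats small bright features
background = grey_dilation(eroded, footprint=selem)  # undoes erosion elsewhere
result = frame - background

# erosion followed by dilation is a morphological opening, and
# frame minus the opening is exactly the white tophat:
assert np.allclose(result, white_tophat(frame, footprint=selem))
print(result.max())  # 100.0 -> the small bright feature survives
```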
[19]:
hv.output(size=int(output_size * 0.6))
if interactive:
display(
visualize_preprocess(
varr_ref.isel(frame=0).compute(),
remove_background,
method=["tophat"],
wnd=[10, 15, 20],
)
)
The following cell carries out the background-removal step. Be sure to adjust the parameters based on the visualization results before running it.
[20]:
varr_ref = remove_background(varr_ref, **param_background_removal)
3.7. save result¶
Here we save the pre-processed video (varr_ref) into the intermediate folder (intpath). Note that for every saved variable a separate folder is created based on the .name attribute of that variable, and variables with the same .name attribute are saved to the same folder regardless of the variable name, potentially overwriting each other! Here we rename the array to "varr_ref" so that the saved folder will be named “varr_ref.zarr”.
[21]:
%%time
varr_ref = save_minian(varr_ref.rename("varr_ref"), dpath=intpath, overwrite=True)
CPU times: user 915 ms, sys: 103 ms, total: 1.02 s
Wall time: 23.5 s