Note

This page was generated from a Jupyter notebook. Not all interactive visualizations will work on this web page. Consider downloading the notebook for full Python-backed interactivity.

[ ]:
# holoviews initialization

3. Pre-processing

In the pre-processing steps that follow, the pipeline will load and process the videos (downsampling, subsetting, denoising).

All functions are evaluated lazily: initially, only a “plan” for the computation is created, without executing it. The actual computations are carried out only when results are saved.
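Under the hood, this lazy behavior comes from dask, which minian builds on. As a minimal illustration of the idea (using a toy dask array, not the pipeline's own data):

```python
import dask.array as da

# Building the expression only records a task graph; nothing is computed yet.
x = da.arange(10, chunks=5)  # lazy array of 0..9, split into two chunks
total = x.sum()              # still lazy: a "plan", not a number

# Computation happens only when we explicitly ask for the result,
# analogous to how the pipeline computes when results are saved.
result = int(total.compute())
print(result)  # 45
```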

3.1. loading videos and visualization

Recall the values of param_load_videos:

[6]:
param_load_videos
[6]:
{'pattern': 'msCam[0-9]+\\.avi$',
 'dtype': numpy.uint8,
 'downsample': {'frame': 1, 'height': 1, 'width': 1},
 'downsample_strategy': 'subset'}

The first argument of load_videos should be the path that contains the videos (dpath). We then pass in the dictionary param_load_videos defined earlier, which specifies several relevant arguments. The argument pattern is optional: it is the regular expression used to filter files under the specified folder. The default value r"msCam[0-9]+\.avi$" means that a file is loaded only if its filename contains ‘msCam’, followed by at least one digit, with ‘.avi’ at the end of the filename. This can be changed to suit the naming convention of your videos. The resulting “video array” varr has three dimensions: height, width, and frame. If you wish to downsample the video, pass a dictionary to downsample, with dimension names as keys and downsampling factors as integer values. For example, downsample=dict(frame=2) will temporally downsample the video by a factor of 2. downsample_strategy accepts two values: either "subset", meaning downsampling is carried out simply by subsetting the data, or "mean", meaning a mean is calculated over each downsampling window (the latter being slower).
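To check which filenames the default pattern would accept, you can test it with Python's re module directly (a quick sketch; the filenames below are made up for illustration):

```python
import re

# Default filename filter used when loading videos
pattern = r"msCam[0-9]+\.avi$"

candidates = ["msCam1.avi", "msCam12.avi", "behavCam1.avi", "msCam.avi", "msCam1.avi.bak"]
# re.search matches anywhere in the name; the trailing "$" anchors ".avi" to the end
matched = [f for f in candidates if re.search(pattern, f)]
print(matched)  # ['msCam1.avi', 'msCam12.avi']
```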

In addition to the video array varr, the following cell also tries to estimate the best chunk size chk to use for computations. This variable is needed in later steps, since it’s important to keep the chunk size consistent within the pipeline. If for some reason you have to restart the kernel at some point, remember to either note down the contents of chk or rerun the following cell.

changing parameters

All minian parameters are dicts, and you can change them freely in various ways: go back to the initial parameter-setting cell and change things there, or add a code cell before running the relevant step. For example, since the data path dpath is passed to load_videos directly as its first argument, the following line will tell the function to load from "/my/data_path":

dpath = "/my/data_path"

The following line, on the other hand, changes the downsample setting (itself a dict) used when loading the video:

param_load_videos["downsample"] = {"frame": 2}
[7]:
varr = load_videos(dpath, **param_load_videos)
chk, _ = get_optimal_chk(varr, dtype=float)
loading 10 videos in folder /home/runner/work/minian/minian/demo_movies

We then immediately save the array representation to the intermediate folder to avoid repeatedly loading the video in later steps.

[8]:
%%time
varr = save_minian(
    varr.chunk({"frame": chk["frame"], "height": -1, "width": -1}).rename("varr"),
    intpath,
    overwrite=True,
)
CPU times: user 328 ms, sys: 32.7 ms, total: 361 ms
Wall time: 4.8 s

The variable varr is an xarray.DataArray. Now is a perfect time to familiarize yourself with this data structure and the xarray module in general, since we will be using it throughout the analysis. Basically, an xarray.DataArray is an N-dimensional array labeled with additional metadata, with many useful properties that make it easy to manipulate. We can print some information about varr by evaluating it (as with any other variable):

[9]:
varr
[9]:
<xarray.DataArray 'varr' (frame: 2000, height: 480, width: 752)>
dask.array<from-zarr, shape=(2000, 480, 752), dtype=uint8, chunksize=(50, 480, 752), chunktype=numpy.ndarray>
Coordinates:
  * frame    (frame) int64 0 1 2 3 4 5 6 ... 1993 1994 1995 1996 1997 1998 1999
  * height   (height) int64 0 1 2 3 4 5 6 7 ... 472 473 474 475 476 477 478 479
  * width    (width) int64 0 1 2 3 4 5 6 7 8 ... 744 745 746 747 748 749 750 751

We can see now that varr is an xarray.DataArray with the name "varr" (set by the rename above) and three dimensions: frame, height and width. Each dimension is labeled with ascending natural numbers. The dtype (data type) of varr is numpy.uint8.
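As a toy illustration of what a labeled array buys you (a small synthetic example, assuming xarray is installed, which the pipeline requires anyway):

```python
import numpy as np
import xarray as xr

# A tiny 3-frame, 2x2-pixel "video" with labeled dimensions
toy = xr.DataArray(
    np.arange(12, dtype=np.uint8).reshape(3, 2, 2),
    dims=["frame", "height", "width"],
    coords={"frame": [0, 1, 2], "height": [0, 1], "width": [0, 1]},
    name="toy",
)

# Operations refer to dimensions by name instead of axis position
mean_per_frame = toy.mean(["height", "width"])
print(mean_per_frame.values)  # [1.5 5.5 9.5]
```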

3.2. visualize raw data and optionally set roi for motion correction

Once the data is loaded, we can visualize the content of varr with the help of VArrayViewer, which shows the array as a movie. You can also plot summary traces, such as mean fluorescence across frames, by passing a list of trace names as input. Currently "mean", "min", "max" and "diff" are supported, where "diff" is the difference in mean fluorescent value across all pixels in a frame.

VArrayViewer also supports a box drawing tool where you can draw a box in the field of view (FOV) and save it as a mask using the “save mask” button. The mask is saved as vaviewer.mask, and can be retrieved and used at later stages, for example, when you want to run motion correction on a sub-region of the FOV. See the API reference for more detail.

[10]:
hv.output(size=output_size)
if interactive:
    vaviewer = VArrayViewer(varr, framerate=5, summary=["mean", "max"])
    display(vaviewer.show())
computing summary

If you decide to set a mask for motion correction, the following cell is an example of how to convert the mask into a subset_mc parameter that can later be passed into motion correction functions.

[11]:
if interactive:
    try:
        subset_mc = list(vaviewer.mask.values())[0]
    except IndexError:
        pass

3.3. subset part of video

Before proceeding to pre-processing, it’s good practice to check whether there is anything obviously wrong with the video (e.g. the camera dropping out, resulting in dark frames). This can usually be spotted by visualizing the video and plotting the timecourse of the mean fluorescence. We can use the xarray.DataArray.sel method and slice objects to subset any part of the data we want. By default, subset = None results in no subsetting.
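If you suspect dark frames, a quick way to locate them outside the viewer is to compute the per-frame mean and look for outliers. A sketch with a synthetic numpy array standing in for the video:

```python
import numpy as np

# Synthetic "video": 100 frames of noise around 100, with frame 42 nearly dark
rng = np.random.default_rng(0)
video = rng.normal(100, 5, size=(100, 16, 16))
video[42] = 0.0

mean_trace = video.mean(axis=(1, 2))  # mean fluorescence per frame
# Flag frames whose mean falls far below the overall trace
dark = np.where(mean_trace < mean_trace.mean() - 5 * mean_trace.std())[0]
print(dark)  # [42]
```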

subsetting data

The xarray.DataArray.sel method takes either a dict or keyword arguments. In both cases you specify the dimension names and the coordinates of the subset as key-value pairs. For example, if you want only the first 800 frames of the video, the following two lines are equivalent and will both work:

varr.sel(frame=slice(0, 799)) # slice object is inclusive on both ends
varr.sel({"frame": slice(0, 799)})

This also works on arbitrary dimensions. For example, the following gives you a 100px x 100px corner region of your movie:

varr.sel(height=slice(0, 99), width=slice(0, 99))
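To convince yourself that label-based slices are inclusive on both ends, here is a toy check with xarray (synthetic data, purely for illustration):

```python
import numpy as np
import xarray as xr

toy = xr.DataArray(
    np.zeros((10, 4, 4)),
    dims=["frame", "height", "width"],
    coords={"frame": np.arange(10), "height": np.arange(4), "width": np.arange(4)},
)

# Unlike positional indexing, .sel slices by coordinate label and keeps both ends
first = toy.sel(frame=slice(0, 4))
print(first.sizes["frame"])  # 5, i.e. frames 0 through 4 inclusive

# Keyword and dict forms are equivalent
corner = toy.sel(height=slice(0, 1), width=slice(0, 1))
print(corner.shape)  # (10, 2, 2)
```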
[12]:
varr_ref = varr.sel(subset)

3.4. glow removal and visualization

Here we remove the general glow background caused by the vignetting effect. We simply calculate a minimum projection across all frames and subtract that projection from every frame. A benefit of doing this is that the result can be interpreted as the “change of fluorescence from baseline”, while preserving the linear scale of the raw data, which is usually the range of an 8-bit integer (0-255). The result can be visualized again with VArrayViewer.
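The operation itself is simple; a numpy sketch of the same idea on a synthetic array (not minian's actual implementation, which works lazily on the full video):

```python
import numpy as np

# Synthetic video: a fixed "glow" pattern plus frame-specific signal
glow = np.linspace(10, 50, 16).reshape(4, 4)  # static background
signal = np.zeros((3, 4, 4))
signal[1, 2, 2] = 100.0                       # transient in frame 1
video = glow[None, :, :] + signal

# Minimum projection across frames estimates the static glow...
baseline = video.min(axis=0)
# ...and subtracting it leaves only the change from baseline
corrected = video - baseline

print(corrected[0].max())   # 0.0 (frame 0 carries no signal)
print(corrected[1, 2, 2])   # 100.0
```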

[13]:
%%time
varr_min = varr_ref.min("frame").compute()
varr_ref = varr_ref - varr_min
CPU times: user 162 ms, sys: 24.4 ms, total: 186 ms
Wall time: 974 ms
[14]:
hv.output(size=int(output_size * 0.7))
if interactive:
    vaviewer = VArrayViewer(
        [varr.rename("original"), varr_ref.rename("glow_removed")],
        framerate=5,
        summary=None,
        layout=True,
    )
    display(vaviewer.show())

3.5. denoise

Recall that by default we use a median filter for denoising:

[15]:
param_denoise
[15]:
{'method': 'median', 'ksize': 7}

There is only one parameter controlling how the filtering is done: the kernel size (ksize) of the filter. The effect of this parameter can be visualized with the tool below. Alternatively, other methods such as a Gaussian filter or an anisotropic filter are implemented as well. See the API reference for denoise for more detail.

Generally ksize=5 works well (approximately half the diameter of the largest cell). Note that if you do want to play with ksize, it has to be an odd number.
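To get a feel for what a median filter does, here is a standalone sketch using scipy.ndimage.median_filter as a stand-in for minian's denoise backend (synthetic data, purely illustrative):

```python
import numpy as np
from scipy.ndimage import median_filter

# A flat frame with a single-pixel "shot noise" spike
frame = np.full((11, 11), 10.0)
frame[5, 5] = 255.0

# A median filter with an odd kernel size replaces each pixel with the
# median of its ksize x ksize neighborhood, wiping out isolated speckles
denoised = median_filter(frame, size=5)

print(frame[5, 5], denoised[5, 5])  # 255.0 10.0
```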

[16]:
hv.output(size=int(output_size * 0.6))
if interactive:
    display(
        visualize_preprocess(
            varr_ref.isel(frame=0).compute(),
            denoise,
            method=["median"],
            ksize=[5, 7, 9],
        )
    )

The following cell carries out the denoising step. Be sure to adjust the parameters based on the visualization results before running it.

[17]:
varr_ref = denoise(varr_ref, **param_denoise)

3.6. background removal

Recall the parameters for background removal:

[18]:
param_background_removal
[18]:
{'method': 'tophat', 'wnd': 15}

This step attempts to estimate the background (everything except the fluorescent signal of in-focus cells) and subtract it from each frame. By default we use a morphological tophat operation to estimate the background: first, a disk-shaped structuring element with radius wnd is created. Then a morphological erosion with this element is applied to each frame, which eats away any bright “feature” that is smaller than the element. Subsequently, a morphological dilation is applied to the eroded image, which in theory undoes the erosion, except for the bright “features” that were completely eaten away. The overall effect of this process is to remove any bright feature smaller than a disk of radius wnd. Thus, by setting wnd to the largest expected radius of a cell, this process gives us a good estimate of the background. Finally, the estimated background is subtracted from each frame.

Pragmatically wnd=15 works well.
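The same morphology can be sketched with scipy.ndimage as a stand-in for minian's implementation: erosion followed by dilation is a grey opening, which serves as the background estimate, and subtracting it from the frame is exactly a white tophat.

```python
import numpy as np
from scipy import ndimage

def disk(radius):
    """Boolean disk-shaped structuring element of the given radius."""
    y, x = np.ogrid[-radius : radius + 1, -radius : radius + 1]
    return x * x + y * y <= radius * radius

# Flat background at 20 with one small bright "cell" (single pixel)
frame = np.full((21, 21), 20.0)
frame[10, 10] = 120.0

# Opening (erosion then dilation) with a disk larger than the feature
# removes it, leaving an estimate of the background
background = ndimage.grey_opening(frame, footprint=disk(3))
signal = frame - background

print(background[10, 10])  # 20.0  (the feature is gone from the background)
print(signal[10, 10])      # 100.0 (the feature survives in the signal)
```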

[19]:
hv.output(size=int(output_size * 0.6))
if interactive:
    display(
        visualize_preprocess(
            varr_ref.isel(frame=0).compute(),
            remove_background,
            method=["tophat"],
            wnd=[10, 15, 20],
        )
    )

The following cell carries out the background removal step. Be sure to adjust the parameters based on the visualization results before running it.

[20]:
varr_ref = remove_background(varr_ref, **param_background_removal)

3.7. save result

Here we are saving our pre-processed video (varr_ref) into the intermediate folder (intpath). Note that for every saved variable, a separate folder is created based on the .name attribute of that variable. Variables with the same .name attribute will be saved to the same folder regardless of the Python variable name, potentially overwriting each other! Here we rename the array to "varr_ref" so that the saved folder will be named “varr_ref.zarr”.

[21]:
%%time
varr_ref = save_minian(varr_ref.rename("varr_ref"), dpath=intpath, overwrite=True)
CPU times: user 915 ms, sys: 103 ms, total: 1.02 s
Wall time: 23.5 s