minian.cross_registration module

minian.cross_registration.cal_mapping(dist)[source]

Calculate mappings from cell pair distances for a single group.

This function is called by calculate_mapping() for each group defined by metadata.

Parameters

dist (pd.DataFrame) – The distances between cell pairs. Should be in two-level column format.

Returns

mapping (pd.DataFrame) – The mapping of cells across sessions.

minian.cross_registration.calculate_centroid_distance(cents, by='session', index_dim=['animal'], tile=(50, 50))[source]

Calculate pairwise distance between centroids across all pairs of sessions.

To avoid calculating distance between centroids that are very far away, a 2d rolling window is applied to spatial coordinates, and only pairs of centroids within the rolling windows are considered for calculation.

Parameters
  • cents (pd.DataFrame) – Dataframe of centroid locations as returned by calculate_centroids().

  • by (str, optional) – Name of column by which cells from sessions will be grouped together. By default “session”.

  • index_dim (list, optional) – Additional metadata columns by which data should be grouped together. Pairs of sessions within such groups (but not across groups) will be used for calculation. By default [“animal”].

  • tile (tuple, optional) – Size of the rolling window used to constrain the calculation, specified in pixels and in the order (“height”, “width”). By default (50, 50).

Returns

res_df (pd.DataFrame) – Pairwise distance between centroids across all pairs of sessions, where each row represents a specific pair of cells across specific sessions. The dataframe contains a two-level MultiIndex as column names. The top level contains three labels: “session”, “variable” and “meta”. Each session has a column under the “session” label, holding the “unit_id” of the cell from that session if the pair includes one, and NaN otherwise. “variable” contains a single column “distance” giving the distance between the centroids of the cell pair. “meta” contains all additional metadata dimensions specified in index_dim as columns so that cell pairs can be uniquely identified.
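The two-level column layout described above can be illustrated with a small hand-built dataframe. This is only a sketch: the values are made up, and the frame is constructed directly rather than returned by calculate_centroid_distance().

```python
import numpy as np
import pandas as pd

# Hand-built stand-in for a res_df-style result (values are illustrative only).
dist = pd.DataFrame(
    {
        ("session", "session1"): [0.0, 0.0, np.nan],
        ("session", "session2"): [1.0, np.nan, 1.0],
        ("session", "session3"): [np.nan, 4.0, 2.0],
        ("variable", "distance"): [1.5, 3.2, 0.8],
        ("meta", "animal"): ["m1", "m1", "m1"],
    }
)

# Selecting a top-level label returns the columns grouped under it.
sessions = dist["session"]

# Each row names exactly two sessions: the pair the distance refers to.
n_sessions_per_row = sessions.notnull().sum(axis=1)

# Rows describing cell pairs between session1 and session2 only.
pair_12 = dist[sessions[["session1", "session2"]].notnull().all(axis=1)]
```

Selecting by the top-level label keeps downstream code independent of how many sessions are present.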

minian.cross_registration.calculate_centroids(A, window)[source]

Calculate centroids of spatial footprints for cells inside a window.

Parameters
  • A (xr.DataArray) – The input spatial footprints of cells.

  • window (xr.DataArray) – Boolean mask with dimensions “height” and “width”. Only spatial footprints of cells within this window will be included in the result.

Returns

cents (pd.DataFrame) – Resulting centroids dataframe.
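The documentation above does not say how a centroid is defined. A common choice is the intensity-weighted center of mass of a footprint, sketched below with plain NumPy for a single cell. The function name centroid_sketch is hypothetical, and this is an assumption about the definition, not the library's implementation.

```python
import numpy as np


def centroid_sketch(footprint):
    """Intensity-weighted center of one 2D footprint laid out as (height, width)."""
    footprint = np.asarray(footprint, dtype=float)
    total = footprint.sum()
    # Coordinate grids holding the row (height) and column (width) index of each pixel.
    hh, ww = np.indices(footprint.shape)
    return (hh * footprint).sum() / total, (ww * footprint).sum() / total


fp = np.zeros((5, 5))
fp[1:4, 2:5] = 1.0  # a uniform 3x3 blob covering rows 1-3 and columns 2-4
center = centroid_sketch(fp)  # (height, width) of the blob's center
```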

minian.cross_registration.calculate_mapping(dist)[source]

Calculate mappings from cell pair distances with mutual nearest-neighbor criteria.

This function takes in distances between cell pairs and filters them based on a mutual nearest-neighbor criterion: a cell pair is considered a valid mapping only when each cell is the nearest neighbor of the other (among all cell pairs present in the input dist). The result is hence a subset of the rows of the input dist dataframe, and each remaining row is a mapping between cells in a pair of sessions.

Parameters

dist (pd.DataFrame) – The distances between cell pairs. Should be in two-level column format as returned by calculate_centroid_distance(), and should also contain a (“group”, “group”) column as returned by group_by_session().

Returns

mapping (pd.DataFrame) – The mapping of cells across sessions, where each row represents a mapping of cells across specific sessions. The dataframe contains a two-level MultiIndex as column names. The top level contains three labels: “session”, “variable” and “meta”. Each session has a column under the “session” label, with values indicating the “unit_id” of the cell in that session involved in the mapping, or NaN if the mapping does not involve the session. “variable” contains a single column “distance” giving the distance between centroids for the cell pair if the mapping involves only two cells, and NaN otherwise. “meta” contains all additional metadata dimensions specified in index_dim as columns so that cell pairs can be uniquely identified.
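The mutual nearest-neighbor criterion can be sketched for a single pair of sessions as follows. For brevity the sketch uses a flat dataframe with hypothetical columns cell_a, cell_b and distance instead of the two-level format, so it illustrates the idea only and is not the function's actual code.

```python
import pandas as pd

# Candidate pairs between two sessions (hypothetical flat layout, made-up values).
pairs = pd.DataFrame(
    {
        "cell_a": [0, 0, 1, 1, 2],
        "cell_b": [10, 11, 10, 11, 10],
        "distance": [1.0, 3.0, 2.0, 0.5, 1.5],
    }
)

# Row label of the closest candidate for every cell on each side.
best_for_a = pairs.groupby("cell_a")["distance"].idxmin()
best_for_b = pairs.groupby("cell_b")["distance"].idxmin()

# A pair survives only if it is the closest candidate for both of its cells.
mutual_rows = sorted(set(best_for_a) & set(best_for_b))
mutual = pairs.loc[mutual_rows]
```

Here cell 2 prefers cell 10, but cell 10 is closer to cell 0, so the pair (2, 10) is discarded.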

minian.cross_registration.cartesian(*args)[source]

Computes cartesian product of inputs.

Parameters

*args (array_like) – Inputs that can be interpreted as array.

Returns

product (np.ndarray) – k x n array representing cartesian product of inputs, with k number of unique combinations for n inputs.
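A cartesian product with the k x n shape described above can be sketched with NumPy as follows. The name cartesian_sketch is hypothetical, and the row ordering of the real function may differ.

```python
import numpy as np


def cartesian_sketch(*args):
    """Return a k x n array with one row per combination of the n inputs."""
    # One coordinate grid per input; "ij" indexing keeps the inputs in argument order.
    grids = np.meshgrid(*args, indexing="ij")
    return np.stack([g.ravel() for g in grids], axis=1)


product = cartesian_sketch([1, 2], [10, 20, 30])  # 2 * 3 = 6 combinations of 2 inputs
```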

minian.cross_registration.fill_mapping(mappings, cents)[source]

Fill mappings with rows representing unmatched cells.

This function takes all cells in cents and checks whether they appear in any row of mappings. If a cell is not involved in any mapping, a row is appended to mappings with the cell’s “unit_id” in the “session” column containing the cell and NaN in all other “session” columns.

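Parameters
  • mappings (pd.DataFrame) – Input mappings dataframe. Should be in two-level column format.

  • cents (pd.DataFrame) – Dataframe of centroid locations as returned by calculate_centroids().

Returns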

mappings (pd.DataFrame) – Output mappings with unmatched cells.
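The filling step can be sketched on simplified inputs. The flat column layout below (one column per session in mappings, and “session” / “unit_id” columns in cents) is an assumption made for brevity and is not the format the function itself expects.

```python
import numpy as np
import pandas as pd

# One existing mapping: cell 0 in session1 matches cell 1 in session2.
mappings = pd.DataFrame({"session1": [0.0], "session2": [1.0]})
# All known cells; cell 3 in session1 takes part in no mapping.
cents = pd.DataFrame(
    {"session": ["session1", "session1", "session2"], "unit_id": [0, 3, 1]}
)

new_rows = []
for session, unit_id in zip(cents["session"], cents["unit_id"]):
    if not (mappings[session] == unit_id).any():
        # Unmatched cell: its own session gets the id, every other session gets NaN.
        row = {col: np.nan for col in mappings.columns}
        row[session] = float(unit_id)
        new_rows.append(row)

filled = pd.concat([mappings, pd.DataFrame(new_rows)], ignore_index=True)
```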

minian.cross_registration.group_by_session(df)[source]

Add grouping information based on sessions involved in each row/mapping.

Parameters

df (pd.DataFrame) – Input dataframe with rows representing mappings. Should be in two-level column format like those returned by calculate_centroid_distance() or calculate_mapping() etc.

Returns

df (pd.DataFrame) – The input df with an additional (“group”, “group”) column, whose values are tuples indicating which sessions are involved (have non-NaN values) in the mappings represented by each row.
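The grouping value for each row can be sketched as the tuple of session names with non-NaN entries. The example works on a plain dataframe holding only the session columns, which is a simplification of the two-level format.

```python
import numpy as np
import pandas as pd

# Only the per-session columns of two mappings (made-up values).
sessions = pd.DataFrame(
    {
        "session1": [0.0, np.nan],
        "session2": [1.0, 1.0],
        "session3": [np.nan, 2.0],
    }
)

# For every row, keep the names of the sessions that hold a cell.
present = sessions.notnull()
groups = [tuple(present.columns[mask]) for mask in present.to_numpy()]
```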

See also

resolve_mapping – for example usages

minian.cross_registration.pd_dist(A, B)[source]

Compute euclidean distance between two sets of matching centroid locations.

Parameters
  • A (pd.DataFrame) – Input centroid locations. Should have columns “height” and “width”.

  • B (pd.DataFrame) – Input centroid locations. Should have columns “height” and “width” and same row index as A, such that distance between corresponding rows will be calculated.

Returns

dist (pd.Series) – Distance between centroid locations. Has same row index as A and B.
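A row-wise Euclidean distance between two aligned centroid tables can be sketched as follows. The data are made up, and the snippet illustrates the computation rather than reproducing the function's source.

```python
import numpy as np
import pandas as pd

# Two centroid tables sharing the same row index.
A = pd.DataFrame({"height": [0.0, 1.0], "width": [0.0, 1.0]}, index=["p0", "p1"])
B = pd.DataFrame({"height": [3.0, 1.0], "width": [4.0, 3.0]}, index=["p0", "p1"])

# Subtraction aligns on the row index, so matching rows are compared.
diff = A[["height", "width"]] - B[["height", "width"]]
dist = np.sqrt((diff ** 2).sum(axis=1))
```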

minian.cross_registration.resolve(mapping, mode)[source]

Extend and resolve mappings.

This function is called by resolve_mapping() for each group defined by metadata.

Parameters
  • mapping (pd.DataFrame) – Input mappings dataframe. Should be in two-level column format.

  • mode (str) – How to handle conflicted mappings. Should be either “strict” or “majority”.

Returns

mapping (pd.DataFrame) – Output mappings with extended and resolved mappings. Should be in the same two-level column format as input.

See also

resolve_mapping

minian.cross_registration.resolve_mapping(mapping, mode='majority')[source]

Extend and resolve mappings of pairs of sessions into mappings across multiple sessions.

This function tries to transitively extend any mappings that share common cells. It does so by constructing an undirected, unweighted graph with each cell in each session as a unique node. An edge is created for each pair of nodes mapped in the input pairwise mapping. It then walks through all connected components of the graph and examines whether a conflict exists, i.e. whether the component includes multiple cells from the same session. Depending on mode, either all cells in the conflicting session are dropped, or the one mapped most often is kept. Finally, each connected component results in one multi-session mapping.

Parameters
  • mapping (pd.DataFrame) – Input mappings dataframe. Should be in two-level column format as returned by calculate_mapping(), and should also contain a (“group”, “group”) column as returned by group_by_session().

  • mode (str, optional) – Mode used to handle sessions containing conflicting mappings. Should be either “strict” or “majority”. If “strict”, all the cells in the conflicting session are dropped. If “majority”, the cell that was mapped most often is kept, while a tie results in all cells being dropped. By default “majority”.

Returns

mapping (pd.DataFrame) – Output mappings with extended and resolved mappings. Should be in the same two-level column format as input.
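The graph procedure described above can be sketched in pure Python. The sketch is not the library's implementation: the name resolve_sketch and the edge-list input are hypothetical, "mapped most often" is taken to mean the number of distinct neighbors of a cell, and any cleanup after cells are dropped is ignored.

```python
from collections import defaultdict


def resolve_sketch(edges, mode="majority"):
    """edges: iterable of ((session, unit_id), (session, unit_id)) pairs."""
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)

    seen, results = set(), []
    for start in neighbors:
        if start in seen:
            continue
        # Collect one connected component with a depth-first walk.
        component, stack = [], [start]
        seen.add(start)
        while stack:
            node = stack.pop()
            component.append(node)
            for nxt in neighbors[node] - seen:
                seen.add(nxt)
                stack.append(nxt)

        by_session = defaultdict(list)
        for session, unit_id in component:
            by_session[session].append(unit_id)

        resolved = {}
        for session, cells in by_session.items():
            if len(cells) == 1:
                resolved[session] = cells[0]
            elif mode == "majority":
                # Rank conflicting cells by how many mappings they take part in.
                counts = sorted(
                    ((len(neighbors[(session, c)]), c) for c in cells), reverse=True
                )
                if counts[0][0] > counts[1][0]:
                    resolved[session] = counts[0][1]
            # "strict" mode, or a tie in "majority" mode: the session is dropped.
        results.append(resolved)
    return results


# Pairwise mappings with a conflict in session3 (cells 2 and 5).
edges = [
    (("session1", 0), ("session2", 1)),
    (("session2", 1), ("session3", 2)),
    (("session1", 0), ("session3", 5)),
    (("session2", 1), ("session4", 3)),
    (("session3", 2), ("session4", 3)),
]
majority = resolve_sketch(edges, mode="majority")
strict = resolve_sketch(edges, mode="strict")
```

With these edges, cell 2 in session3 takes part in two mappings and cell 5 in one, so “majority” keeps cell 2 while “strict” drops session3 entirely.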

Examples

Suppose we have two mappings sharing a common cell in “session2”:

>>> mapping = pd.DataFrame(
...     {
...         ("meta", "animal"): ["m1", "m1"],
...         ("session", "session1"): [0, None],
...         ("session", "session2"): [1, 1],
...         ("session", "session3"): [None, 2],
...     }
... )
>>> mapping = group_by_session(mapping)
>>> mapping 
    meta  session                                   group
  animal session1 session2 session3                 group
0     m1      0.0        1      NaN  (session1, session2)
1     m1      NaN        1      2.0  (session2, session3)

Then they will be extended and merged as a single mapping:

>>> resolve_mapping(mapping) 
    meta  session                                             group
  animal session1 session2 session3                           group
0     m1      0.0      1.0      2.0  (session1, session2, session3)

However, if our mappings contain an additional entry that conflicts with the extended mapping, like the following:

>>> mapping = pd.DataFrame(
...     {
...         ("meta", "animal"): ["m1", "m1", "m1"],
...         ("session", "session1"): [0, None, 0],
...         ("session", "session2"): [1, 1, None],
...         ("session", "session3"): [None, 2, 5],
...     }
... )
>>> mapping = group_by_session(mapping)
>>> mapping 
    meta  session                                   group
  animal session1 session2 session3                 group
0     m1      0.0      1.0      NaN  (session1, session2)
1     m1      NaN      1.0      2.0  (session2, session3)
2     m1      0.0      NaN      5.0  (session1, session3)

Then mappings on the conflicting session will be dropped:

>>> resolve_mapping(mapping) 
    meta  session                                   group
  animal session1 session2 session3                 group
0     m1      0.0      1.0      NaN  (session1, session2)

Furthermore, suppose we have more mappings such that some cells in the conflicting session are more consistent than others, i.e. they are involved in more mappings overall, like the following:

>>> mapping = pd.DataFrame(
...     {
...         ("meta", "animal"): ["m1", "m1", "m1", "m1", "m1"],
...         ("session", "session1"): [0, None, 0, None, None],
...         ("session", "session2"): [1, 1, None, 1, None],
...         ("session", "session3"): [None, 2, 5, None, 2],
...         ("session", "session4"): [None, None, None, 3, 3],
...     }
... )
>>> mapping = group_by_session(mapping)
>>> mapping 
    meta  session                                            group
  animal session1 session2 session3 session4                 group
0     m1      0.0      1.0      NaN      NaN  (session1, session2)
1     m1      NaN      1.0      2.0      NaN  (session2, session3)
2     m1      0.0      NaN      5.0      NaN  (session1, session3)
3     m1      NaN      1.0      NaN      3.0  (session2, session4)
4     m1      NaN      NaN      2.0      3.0  (session3, session4)

Then the majority mode keeps the cell in the conflicting session that is involved in the greatest number of mappings (in this case, cell 2 in session3):

>>> resolve_mapping(mapping, mode='majority') 
    meta  session                                                                group
  animal session1 session2 session3 session4                                     group
0     m1      0.0      1.0      2.0      3.0  (session1, session2, session3, session4)

While the strict mode would drop any cells in the conflicting session regardless:

>>> resolve_mapping(mapping, mode='strict') 
    meta  session                                                      group
  animal session1 session2 session3 session4                           group
0     m1      0.0      1.0      NaN      3.0  (session1, session2, session4)

minian.cross_registration.subset_pairs(A, B, tile)[source]

Return all pairs of cells within certain window given two sets of centroid locations.

Parameters
  • A (pd.DataFrame) – Input centroid locations. Should have columns “height” and “width”.

  • B (pd.DataFrame) – Input centroid locations. Should have columns “height” and “width”.

  • tile (tuple) – Window size, in the same format as the tile argument of calculate_centroid_distance().

Returns

pairs (set) – Set of all cell pairs represented as tuple.
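Window-based pair selection can be sketched as below. The half-window stride, the inclusive window borders and the name subset_pairs_sketch are assumptions for illustration, and the library's actual window placement may differ.

```python
import numpy as np
import pandas as pd


def subset_pairs_sketch(A, B, tile):
    """Pairs (row label in A, row label in B) that share at least one window."""
    th, tw = tile
    both = pd.concat([A, B])
    h0, h1 = both["height"].min(), both["height"].max()
    w0, w1 = both["width"].min(), both["width"].max()
    pairs = set()
    # Windows overlap by half their size so nearby cells are not split by a border.
    for h in np.arange(h0, h1 + 1, th / 2):
        for w in np.arange(w0, w1 + 1, tw / 2):
            in_a = A[A["height"].between(h, h + th) & A["width"].between(w, w + tw)]
            in_b = B[B["height"].between(h, h + th) & B["width"].between(w, w + tw)]
            pairs.update((a, b) for a in in_a.index for b in in_b.index)
    return pairs


A = pd.DataFrame({"height": [10.0, 200.0], "width": [10.0, 200.0]}, index=["a0", "a1"])
B = pd.DataFrame({"height": [12.0, 210.0], "width": [15.0, 190.0]}, index=["b0", "b1"])
pairs = subset_pairs_sketch(A, B, tile=(50, 50))
```

Only nearby cells end up paired, so distances never need to be computed for cells on opposite sides of the field of view.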