polyTEM.statistics

These functions help with bulk analysis of crystal stacks.

Warning

This module will be deprecated in favor of the spatial module.

Functions

`decay_length`(expected_vals, distances[, ...])
`decay_length_means`(expected_val_list, distances)
`fisher_z`(rho)	Fisher Z-transform: turns a Pearson correlation coefficient rho into an approximately normally distributed value z = arctanh(rho) = 0.5*np.log((1+rho)/(1-rho)), with standard error 1/np.sqrt(N-3), where N is the sample size. INPUTS: rho = np.ndarray of correlation coefficients.
`get_deltath_hist`(filename[, plot, fig, axs])	Processes and plots the AngleOverlap histogram. INPUTS: filename = .csv output of the AngleOverlap.py script. OUTPUTS: delta_th_df = dataframe.
`get_kstest`(stack_list[, reference, plot])	Retrieves the Kolmogorov-Smirnov test D-value against all distances for each stack in stack_list. INPUTS: stack_list = list of crystal stacks. OUTPUTS: kstest_list[sample][d_value_list] = list of all the KS D-values; kstest_thetas[sample,distance] = theta with the largest CDF difference from uniform.
`get_spatial_df`(stack_list)	Uses multiprocessing to run function multiprocessing_df_func, which performs spatial analysis on each stack.
`get_stack_list`(filenames_list[, num_threads])	Uses multiprocessing to load stacks from filenames; helps reduce loading time for large datasets.
`match_stack_lists`(stack_list1[, stack_list2])	Arbitrary CrystalStackLists may not have matching lengths and matching image orders.
`overlap_density`(stack_list1, stack_list2[, ...])	Compares overlap density of CrystalStacks peaks.
`plot_xcorr`(xc_array[, dtheta, resolution, ...])	Plotting function for cross-correlation.
`remove_outliers`(data, col[, method])	Removes outliers from a pandas DataFrame using the chosen method. INPUT: data = pandas dataframe; col = string of column name; method = 'IQR': data more than 1.5*IQR outside the IQR are considered outliers; 'mean': data more than 4 standard deviations from the mean are considered outliers.
`xcorr_linecuts`(arr, max_d)	Outputs the radial lineout of average correlation.
`xcorr_slow`(stack_list1[, stack_list2, ...])	Across many samples, performs cross-correlation of two 3-D datacubes that originate from the same sample. First, checks that the datasets match by sample name. Second, performs cross-correlation (convolution of the datacubes) using scipy.signal.correlate. This method is slow; tqdm will print progress. INPUT: stack_list1 = list of CrystalStack instances; stack_list2 = list of CrystalStack instances; if not provided, autocorrelation is performed instead. OUTPUT: xc_array = 4-D array of shape (len(matches), size[0]*2 - 1, size[1]*2 - 1, size[2]*2 - 1) that represents (sample, lagx, lagy, lag_theta), centered around 0 lag. KNOWN BUGS: Uses a lot of RAM. Takes about 5 seconds per pair of images.
`z_test`(x, y)	Standard Z-statistic test. Critical Z for p=0.01 (one-tailed) is 2.33.
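
Example

The `fisher_z`, `z_test` and `remove_outliers` summaries above describe standard statistical recipes. The sketch below illustrates those recipes with plain NumPy so the formulas can be checked by hand. It does not call polyTEM and is not the module's implementation: the names `fisher_z_sketch`, `compare_correlations_sketch` and `iqr_inlier_mask_sketch`, their signatures, and the sample values are all hypothetical. In particular, how `z_test(x, y)` combines its two arguments is not stated on this page, so comparing two Fisher-transformed correlations is an assumption made only for illustration.

```python
import numpy as np


def fisher_z_sketch(rho):
    """Fisher z-transform: z = arctanh(rho) = 0.5 * ln((1 + rho) / (1 - rho))."""
    rho = np.asarray(rho, dtype=float)
    return np.arctanh(rho)


def compare_correlations_sketch(r1, n1, r2, n2):
    """Z-statistic for the difference of two independent correlations.

    Each transformed value has standard error 1 / sqrt(N - 3), so the
    difference has standard error sqrt(1 / (n1 - 3) + 1 / (n2 - 3)).
    (Assumed use case; not taken from polyTEM.)
    """
    z1, z2 = fisher_z_sketch(r1), fisher_z_sketch(r2)
    stderr = np.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    return float((z1 - z2) / stderr)


def iqr_inlier_mask_sketch(values, k=1.5):
    """Boolean mask: True where a value lies within k * IQR of the quartiles."""
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    return (values >= q1 - k * iqr) & (values <= q3 + k * iqr)


# Hypothetical inputs: two correlations, each measured on 50 samples.
z_stat = compare_correlations_sketch(0.8, 50, 0.5, 50)
# One-tailed check against the critical value quoted for z_test.
significant = z_stat > 2.33

# Hypothetical data: the last value lies far outside the 1.5 * IQR fences.
mask = iqr_inlier_mask_sketch([1.0, 2.0, 3.0, 4.0, 100.0])
```

With these inputs the last value (100.0) falls outside the 1.5*IQR fences and is masked out, while the other four values are kept. For a pandas dataframe, the same mask could be built from one column and used to filter rows.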