Core class¶
Submodule core
Description of the package functionality
The main class of DigitalCellSorter. The class includes tools for:
Preprocessing of single cell RNA sequencing data
Quality control
Batch effects correction
Cells anomaly score evaluation
Dimensionality reduction
Clustering
Annotation of cell types
Visualization
Post-processing
-
class
DigitalCellSorter
(df_expr=None, dataName='dataName', species='Human', geneNamesType='alias', geneListFileName=None, mitochondrialGenes=None, sigmaOverMeanSigma=0.01, nClusters=10, nFineClusters=3, doFineClustering=True, splitFineClusters=False, subSplitSize=100, medianScaleFactor=10000, minSizeForFineClustering=50, clusteringFunction=<class 'sklearn.cluster._agglomerative.AgglomerativeClustering'>, nComponentsPCA=200, nSamples_pDCS=3000, nSamples_Hopfield=200, saveDir='', makeMarkerSubplots=False, availableCPUsCount=1, zScoreCutoff=0.3, subclusteringName=None, doQualityControl=True, doBatchCorrection=False, makePlots=True, useUnderlyingNetwork=True, minimumNumberOfMarkersPerCelltype=10, nameForUnknown='Unassigned', nameForLowQC='Failed QC', matplotlibMode='Agg', countDepthCutoffQC=0.5, numberOfGenesCutoffQC=0.5, mitochondrialGenesCutoffQC=1.5, excludedFromQC=None, countDepthPrecutQC=500, numberOfGenesPrecutQC=250, precutQC=False, minSubclusterSize=25, thresholdForUnknown_pDCS=0.0, thresholdForUnknown_ratio=0.0, thresholdForUnknown_Hopfield=0.0, thresholdForUnknown=0.2, layout='TSNE', safePlotting=True, HopfieldTemperature=0.1, annotationMethod='ratio-pDCS-Hopfield', useNegativeMarkers=True, removeLowQualityScores=True, updateConversionDictFile=True, verbose=1)[source]¶ Bases:
DigitalCellSorter.VisualizationFunctions.VisualizationFunctions
Class of Digital Cell Sorter with methods for processing single cell RNA-seq data. Includes analyses and visualization tools.
- Parameters:
- df_expr: pandas.DataFrame, Default None
Gene expression in a form of a table, where genes are rows, and cells/batches are columns
- dataName: str, Default ‘dataName’
Name used in output files
- geneNamesType: str, Default ‘alias’
Input gene name convention
- geneListFileName: str, Default None
Name of the marker genes file
- mitochondrialGenes: list, Default None
List of mitochondrial genes to use in quality control
- sigmaOverMeanSigma: float, Default 0.01
Threshold to consider a gene constant
- nClusters: int, Default 10
Number of clusters
- nFineClusters: int, Default 3
Number of fine clusters to determine with Spectral Co-clustering routine. This option is ignored if doFineClustering is False.
- doFineClustering: boolean, Default True
Whether to do fine clustering or not
- minSizeForFineClustering: int, Default 50
Minimum number of cells required to do fine clustering of a cluster. This option is ignored if doFineClustering is False.
- clusteringFunction: function, Default AgglomerativeClustering
Clustering function to use. Other options: KMeans, {'k_neighbors': 40}, etc. Note: the function should have a .fit method and the same input and output. For network-based clustering, pass a dictionary {'k_neighbors': 40, 'metric': 'euclidean', 'clusterExpression': True}; this way the best number of clusters is determined automatically.
- nComponentsPCA: int, Default 200
Number of PCA components
- nSamples_pDCS: int, Default 3000
Number of random samples in distribution for pDCS annotation method
- nSamples_Hopfield: int, Default 200
Number of repetitions for Hopfield annotation method
- saveDir: str, Default os.path.join(‘’)
Directory for output files
- makeMarkerSubplots: boolean, Default False
Whether to make subplots on markers
- makePlots: boolean, Default True
Whether to make all major plots
- availableCPUsCount: int, Default min(12, os.cpu_count())
Number of CPUs used in pDCS method
- zScoreCutoff: float, Default 0.3
Z-Score cutoff when setting expression of a cluster as significant
- thresholdForUnknown: float, Default 0.2
Threshold when assigning label “Unknown”. This option is used only with a combination of 2 or more annotation methods
- thresholdForUnknown_pDCS: float, Default 0.0
Threshold when assigning label “Unknown” in pDCS method
- thresholdForUnknown_ratio: float, Default 0.0
Threshold when assigning label “Unknown” in ratio method
- thresholdForUnknown_Hopfield: float, Default 0.0
Threshold when assigning label “Unknown” in Hopfield method
- annotationMethod: str, Default ‘ratio-pDCS-Hopfield’
- Method to use for annotation of cell types to clusters. Options are:
‘pDCS’: main DCS voting scheme with null testing
‘ratio’: simple voting score
‘Hopfield’: Hopfield Network classifier
‘pDCS-ratio’: ‘pDCS’ adjusted with ‘ratio’
‘pDCS-Hopfield’: ‘pDCS’ adjusted with ‘Hopfield’
‘ratio-Hopfield’: ‘ratio’ adjusted with ‘Hopfield’
‘pDCS-ratio-Hopfield’: ‘pDCS’ adjusted with ‘ratio’ and ‘Hopfield’
- subclusteringName: str, Default None
Parameter used for certain labels on plots
- doQualityControl: boolean, Default True
Whether to remove low quality cells
- doBatchCorrection: boolean, Default False
Whether to correct data for batches
- minimumNumberOfMarkersPerCelltype: int, Default 10
Minimum number of markers per cell type to keep that cell type in annotation options
- nameForUnknown: str, Default ‘Unassigned’
Name to use for clusters where label assignment yielded uncertain results
- nameForLowQC: str, Default ‘Failed QC’
Name to use for cells that do not pass quality control
- layout: str, Default ‘TSNE’
- Projection layout used in visualization. Options are:
‘TSNE’: t-SNE layout L.J.P. van der Maaten. Accelerating t-SNE using Tree-Based Algorithms. Journal of Machine Learning Research 15(Oct):3221-3245, 2014.
‘PCA’: use two largest principal components
‘UMAP’: use uniform manifold approximation, McInnes, L., Healy, J., UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction, ArXiv e-prints 1802.03426, 2018
‘PHATE’: use potential of heat diffusion for affinity-based transition embedding, Moon, K.R., van Dijk, D., Wang, Z. et al. Visualizing structure and transitions in high-dimensional biological data. Nat Biotechnol 37, 1482–1492 (2019).
- Usage:
DCS = DigitalCellSorter.DigitalCellSorter()
DCS.clean()
Methods:
KeyInFile
(key, file)Check if a key exists in an HDF file.
alignSeries
(se1, se2, tagForMissing)Align two pandas.Series
annotate
([mapNonexpressedCelltypes])Produce cluster voting results, annotate cell types, and update marker expression with cell type labels
annotateWith_Hopfield_Scheme
(df_markers_expr, …)Produce cluster annotation results
annotateWith_pDCS_Scheme
(df_markers_expr, …)Produce cluster annotation results
annotateWith_ratio_Scheme
(df_markers_expr, …)Produce cluster annotation results
batchEffectCorrection
([method])Batch effect correction.
calculateQCmeasures
()Calculate Quality Control (QC) measures
calculateV
(args)Calculate the voting scores (celltypes by clusters)
clean
()Clean pandas.DataFrame: validate index, remove index duplicates, replace missing with zeros, remove all-zero rows and columns
cluster
()Cluster PCA-reduced data into a desired number of clusters
convert
([nameFrom, nameTo])Convert index to hugo names, if any names in the index are duplicated, remove duplicates
convertColormap
(colormap)Convert colormap from the form (1.,1.,1.,1.) to ‘rgba(255,255,255,1.)’
createReverseDictionary
(inputDictionary)Efficient way to create a reverse dictionary from a dictionary.
getAnomalyScores
(trainingSet, testingSet[, …])Function to get anomaly score of cells based on some reference set
getCells
([celltype, clusterIndex, clusterName])Get cell annotations in a form of pandas.Series
getCountsDataframe
(se1, se2[, tagForMissing])Get a pandas.DataFrame with cross-counts (overlaps) between two pandas.Series
getExprOfCells
(cells)Get expression of a set of cells.
getExprOfGene
(gene[, analyzeBy])Get expression of a gene.
getHugoName
(gene[, printAliases])Get gene hugo name(s).
getIndexOfGoodQualityCells
([QCplotsSubDir])Get index of cells that satisfy the QC criteria
getNewMarkerGenes
([cluster, top, …])Extract new marker genes based on the cluster annotations
getQualityControlCutoff
(se, cutoff[, …])Function to calculate QC quality cutoff
getSubnetworkOfPCN
(subnetworkGenes[, …])Extract subnetwork of PCN network
loadAnnotatedLabels
([detailed, …])Load cell annotations resulting from function ‘annotate’
loadExpressionData
()Load processed expression data from the internal HDF storage.
makeAnomalyScoresPlot
([cells, suffix, noPlot])Make anomaly scores plot
makeHopfieldLandscapePlot
([meshSamplingRate, plot3D, …])Make and plot Hopfield landscape
makeIndividualGeneExpressionPlot
(genes, **kwargs)Produce individual gene expression plot on a 2D layout
makeIndividualGeneTtestPlot
(gene[, analyzeBy])Produce individual gene t-test plot of the two-tailed p-value.
makeMarkerSubplots
(**kwargs)Produce subplots on each marker and its expression on all clusters
makeProjectionPlotAnnotated
(**kwargs)Produce projection plot colored by cell types
makeProjectionPlotByBatches
(**kwargs)Produce projection plot colored by batches
makeProjectionPlotByClusters
(**kwargs)Produce projection plot colored by clusters
makeProjectionPlotsQualityControl
(**kwargs)Produce Quality Control projection plots
mergeIndexDuplicates
(df_expr[, method, …])Merge index duplicates
normalize
([median])Normalize pandas.DataFrame: rescale all cells, log-transform data, remove constant genes, sort index
prepare
(obj)Prepare pandas.DataFrame for input to function process(). If the input is a pandas.DataFrame, validate that it has the correct structure.
prepareMarkers
([expressedGenes, …])Get dictionary of markers for each cell type.
process
([dataIsNormalized, cleanData])Process data before using any annotation or visualization functions
project
([PCAonly, do_fast_tsne])Project pandas.DataFrame to lower dimensions
propagateHopfield
([sigma, xi, T, tmax, …])Function is used internally to propagate Hopfield network over a set number of time steps
qualityControl
(**kwargs)Remove low quality cells
readMarkerFile
([mergeFunction, mergeCutoff])Read markers file, prepare markers
recordAnnotationResults
(df_marker_cell_type, …)Record cell type annotation results to spreadsheets.
recordExpressionData
()Record expression data from the internal HDF storage.
visualize
()Aggregate of visualization tools of this class.
zScoreOfSeries
(se)Calculate z-score of pandas.Series and modify the Series in place
Attributes:
-
property
saveDir
¶
-
property
fileHDFpath
¶
-
property
df_expr
¶
-
property
geneListFileName
¶
-
prepare
(obj)[source]¶ Prepare pandas.DataFrame for input to function process(). If the input is a pandas.DataFrame, validate that it has the correct structure.
- Parameters:
- obj: str, pandas.DataFrame, pandas.Series
Expression data in a form of pandas.DataFrame, pandas.Series, or name and path to a csv file with data
- Returns:
None
- Usage:
DCS = DigitalCellSorter.DigitalCellSorter()
DCS.prepare('data.csv')
-
convert
(nameFrom=None, nameTo=None, **kwargs)[source]¶ Convert index to hugo names, if any names in the index are duplicated, remove duplicates
- Parameters:
- nameFrom: str, Default ‘alias’
Gene name type to convert from
- nameTo: str, Default ‘hugo’
Gene name type to convert to
Any parameters that function ‘mergeIndexDuplicates’ can accept
- Returns:
None
- Usage:
DCS = DigitalCellSorter.DigitalCellSorter()
DCS.convert()
-
clean
()[source]¶ Clean pandas.DataFrame: validate index, remove index duplicates, replace missing with zeros, remove all-zero rows and columns
- Parameters:
None
- Returns:
None
- Usage:
DCS = DigitalCellSorter.DigitalCellSorter()
DCS.clean()
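The cleaning steps named in the description (replace missing values with zeros, then drop all-zero rows and columns) can be illustrated without the library. The following is a minimal pure-Python sketch, not the DigitalCellSorter implementation: `clean_table` and the nested-dictionary layout are assumptions made for illustration, and index validation and duplicate removal are left out.

```python
import math

def clean_table(table):
    """table: {gene: {cell: value}}; returns a cleaned copy (hypothetical helper)."""
    # Replace missing values (None or NaN) with zeros.
    filled = {
        gene: {cell: 0.0 if v is None or (isinstance(v, float) and math.isnan(v)) else float(v)
               for cell, v in row.items()}
        for gene, row in table.items()
    }
    # Remove all-zero rows (genes).
    filled = {g: row for g, row in filled.items() if any(v != 0 for v in row.values())}
    # Remove all-zero columns (cells).
    cells = {c for row in filled.values() for c in row}
    keep = [c for c in sorted(cells) if any(row.get(c, 0.0) != 0 for row in filled.values())]
    return {g: {c: row.get(c, 0.0) for c in keep} for g, row in filled.items()}

raw = {
    "CD4":  {"cell1": 2, "cell2": None, "cell3": 0},
    "CD33": {"cell1": 0, "cell2": 0, "cell3": 0},
    "SDC1": {"cell1": 1, "cell2": 3, "cell3": float("nan")},
}
cleaned = clean_table(raw)
```

In this toy table the gene CD33 and the cell cell3 are dropped because they contain only zeros after the missing values are filled.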
-
normalize
(median=None)[source]¶ Normalize pandas.DataFrame: rescale all cells, log-transform data, remove constant genes, sort index
- Parameters:
- median: float, Default None
Scale factor, if not provided will be computed as median across all cells in data
- Returns:
None
- Usage:
DCS = DigitalCellSorter.DigitalCellSorter()
DCS.normalize()
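The rescaling and log-transform steps can be sketched as follows. This is an illustration under stated assumptions, not the library's code: `normalize_cells` is a hypothetical helper, the data layout is a plain dictionary, the transform is assumed to be log2(1 + x), and the removal of constant genes is omitted.

```python
import math
import statistics

def normalize_cells(counts, median=None):
    """counts: {cell: {gene: raw count}}; returns rescaled, log-transformed values."""
    totals = {cell: sum(genes.values()) for cell, genes in counts.items()}
    # Scale factor: the median of per-cell totals unless one is supplied.
    scale = statistics.median(totals.values()) if median is None else median
    return {
        # Genes are sorted to mimic the "sort index" step.
        cell: {gene: math.log2(1.0 + value * scale / totals[cell])
               for gene, value in sorted(genes.items())}
        for cell, genes in counts.items()
    }

counts = {
    "cell1": {"CD4": 10, "SDC1": 30},
    "cell2": {"CD4": 5, "SDC1": 5},
    "cell3": {"CD4": 0, "SDC1": 20},
}
normalized = normalize_cells(counts)
```

After rescaling, every cell has the same total (the median of the original totals) before the log-transform is applied. Cells with a zero total are not handled in this sketch.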
-
project
(PCAonly=False, do_fast_tsne=True)[source]¶ Project pandas.DataFrame to lower dimensions
- Parameters:
- PCAonly: boolean, Default False
Perform Principal component analysis only
- do_fast_tsne: boolean, Default True
Do FI-tSNE instead of “exact” tSNE. This option is ignored if layout is not ‘TSNE’
- Returns:
- tuple
Processed data
- Usage:
DCS = DigitalCellSorter.DigitalCellSorter()
xPCA, PCs, tSNE = DCS.project()
-
cluster
()[source]¶ Cluster PCA-reduced data into a desired number of clusters
- Parameters:
None
- Returns:
None
- Usage:
DCS = DigitalCellSorter.DigitalCellSorter()
DCS.cluster()
-
annotate
(mapNonexpressedCelltypes=True)[source]¶ Produce cluster voting results, annotate cell types, and update marker expression with cell type labels
- Parameters:
- mapNonexpressedCelltypes: boolean, Default True
If True, cell type coloring will be consistent across all datasets for a given input marker list file, regardless of which cell types are annotated in each dataset.
- Returns:
- dictionary
Voting results, a dictionary in form of: {cluster label: assigned cell type}
- Usage:
DCS = DigitalCellSorter.DigitalCellSorter()
results = DCS.annotate()
-
process
(dataIsNormalized=False, cleanData=True)[source]¶ Process data before using any annotation or visualization functions
- Parameters:
- dataIsNormalized: boolean, Default False
Whether DCS.df_expr is normalized or not
- Returns:
None
- Usage:
DCS = DigitalCellSorter.DigitalCellSorter()
DCS.process()
-
visualize
()[source]¶ Aggregate of visualization tools of this class.
- Parameters:
None
- Returns:
None
- Usage:
DCS = DigitalCellSorter.DigitalCellSorter()
DCS.process()
DCS.visualize()
-
makeProjectionPlotAnnotated
(**kwargs)[source]¶ Produce projection plot colored by cell types
- Parameters:
Any parameters that function ‘makeProjectionPlot’ can accept
- Returns:
None
- Usage:
DCS = DigitalCellSorter.DigitalCellSorter()
DCS.process()
DCS.makeProjectionPlotAnnotated()
-
makeProjectionPlotByBatches
(**kwargs)[source]¶ Produce projection plot colored by batches
- Parameters:
Any parameters that function ‘makeProjectionPlot’ can accept
- Returns:
None
- Usage:
DCS = DigitalCellSorter.DigitalCellSorter()
DCS.process()
DCS.makeProjectionPlotByBatches()
-
makeProjectionPlotByClusters
(**kwargs)[source]¶ Produce projection plot colored by clusters
- Parameters:
Any parameters that function ‘makeProjectionPlot’ can accept
- Returns:
None
- Usage:
DCS = DigitalCellSorter.DigitalCellSorter()
DCS.process()
DCS.makeProjectionPlotByClusters()
-
makeProjectionPlotsQualityControl
(**kwargs)[source]¶ Produce Quality Control projection plots
- Parameters:
Any parameters that function ‘makeProjectionPlot’ can accept
- Returns:
None
- Usage:
DCS = DigitalCellSorter.DigitalCellSorter()
DCS.process()
DCS.makeProjectionPlotsQualityControl()
-
makeMarkerSubplots
(**kwargs)[source]¶ Produce subplots on each marker and its expression on all clusters
- Parameters:
Any parameters that function ‘internalMakeMarkerSubplots’ can accept
- Returns:
None
- Usage:
DCS = DigitalCellSorter.DigitalCellSorter()
DCS.process()
DCS.makeMarkerSubplots()
-
makeAnomalyScoresPlot
(cells='All', suffix='', noPlot=False, **kwargs)[source]¶ Make anomaly scores plot
- Parameters:
- cells: pandas.MultiIndex, Default ‘All’
Index of cells of interest
Any parameters that function ‘makeProjectionPlot’ can accept
- Returns:
None
- Usage:
DCS = DigitalCellSorter.DigitalCellSorter()
DCS.process()
cells = DCS.getCells(celltype='T cell')
DCS.makeAnomalyScoresPlot(cells)
-
makeIndividualGeneTtestPlot
(gene, analyzeBy='label', **kwargs)[source]¶ Produce individual gene t-test plot of the two-tailed p-value.
- Parameters:
- gene: str
Name of gene of interest
- analyzeBy: str, Default ‘label’
What level of labels to include. Other possible options are ‘cluster’ and ‘celltype’
Any parameters that function ‘makeTtestPlot’ can accept
- Returns:
None
- Usage:
DCS = DigitalCellSorter.DigitalCellSorter()
DCS.makeIndividualGeneTtestPlot('SDC1')
-
makeIndividualGeneExpressionPlot
(genes, **kwargs)[source]¶ Produce individual gene expression plot on a 2D layout
- Parameters:
- genes: str, or list-like
Name of gene of interest. E.g. ‘CD4, CD33’, ‘PECAM1’, [‘CD4’, ‘CD33’]
- hideClusterLabels: boolean, Default False
Whether to hide the clusters labels
- outlineClusters: boolean, Default True
Whether to outline the clusters with circles
Any parameters that function ‘internalMakeMarkerSubplots’ can accept
- Returns:
None
- Usage:
DCS = DigitalCellSorter.DigitalCellSorter()
DCS.makeIndividualGeneExpressionPlot('CD4')
-
makeHopfieldLandscapePlot
(meshSamplingRate=1000, plot3D=True, reuseData=False, **kwargs)[source]¶ Make and plot Hopfield landscape
- Parameters:
- meshSamplingRate: int, Default 1000
Defines quality of sampling around attractor states
- plot3D: boolean, Default True
Whether to plot 2D or 3D figure
- reuseData: boolean, Default False
Whether to attempt using precalculated data.
Any parameters that function ‘HopfieldLandscapePlot’ or ‘HopfieldLandscapePlot3D’ can accept
- Returns:
None
- Usage:
DCS = DigitalCellSorter.DigitalCellSorter()
DCS.makeHopfieldLandscapePlot()
-
getAnomalyScores
(trainingSet, testingSet, printResults=False)[source]¶ Function to get anomaly score of cells based on some reference set
- Parameters:
- trainingSet: pandas.DataFrame
With cells to train isolation forest on
- testingSet: pandas.DataFrame
With cells to score
- printResults: boolean, Default False
Whether to print results
- Returns:
- 1d numpy.array
Anomaly score(s) of tested cell(s)
- Usage:
DCS = DigitalCellSorter.DigitalCellSorter()
scores = DCS.getAnomalyScores(df_expr.iloc[:, 5:], df_expr.iloc[:, :5])
-
getHugoName
(gene, printAliases=False)[source]¶ Get gene hugo name(s).
- Parameters:
- gene: str
‘hugo’ or ‘alias’ name of a gene
- Returns:
- str
Hugo name if found, otherwise input name
- Usage:
DCS = DigitalCellSorter.DigitalCellSorter()
DCS.getHugoName('CD138')
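The documented return rule (Hugo name if found, otherwise the input name) amounts to a dictionary lookup with a fallback. A minimal sketch, assuming a tiny hand-written alias table; `ALIAS_TO_HUGO` and `get_hugo_name` are hypothetical names, and the library's own conversion dictionary is not reproduced here.

```python
# Hypothetical alias table for illustration only.
ALIAS_TO_HUGO = {"CD138": "SDC1"}

def get_hugo_name(gene, alias_to_hugo=ALIAS_TO_HUGO):
    """Return the Hugo name if the alias is known, otherwise the input name."""
    return alias_to_hugo.get(gene, gene)

converted = get_hugo_name("CD138")       # known alias
unchanged = get_hugo_name("NOT_A_GENE")  # unknown names pass through
```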
-
getExprOfGene
(gene, analyzeBy='cluster')[source]¶ Get expression of a gene. Run this function only after function process()
- Parameters:
- gene: str
Name of gene of interest
- analyzeBy: str, Default ‘cluster’
What level of labels to include. Other possible options are ‘label’ and ‘celltype’
- Returns:
- pandas.DataFrame
With expression of the gene of interest
- Usage:
DCS = DigitalCellSorter.DigitalCellSorter()
DCS.process()
DCS.getExprOfGene('SDC1')
-
getExprOfCells
(cells)[source]¶ Get expression of a set of cells. Run this function only after function process()
- Parameters:
- cells: pandas.MultiIndex
2-level Index of cells of interest, must include levels ‘batch’ and ‘cell’
- Returns:
- pandas.DataFrame
With expression of the cells of interest
- Usage:
DCS = DigitalCellSorter.DigitalCellSorter()
DCS.process()
DCS.getExprOfCells(cells)
-
getCells
(celltype=None, clusterIndex=None, clusterName=None)[source]¶ Get cell annotations in a form of pandas.Series
- Parameters:
- celltype: str, Default None
Cell type to extract
- clusterIndex: int, Default None
Index of the cluster to extract
- clusterName: str, Default None
Name of the cluster to extract
- Returns:
- pandas.MultiIndex
Index of labelled cells
- Usage:
DCS = DigitalCellSorter.DigitalCellSorter()
DCS.process()
labels = DCS.getCells()
-
getIndexOfGoodQualityCells
(QCplotsSubDir='QC_plots', **kwargs)[source]¶ Get index of cells that satisfy the QC criteria
- Parameters:
- count_depth_cutoff: float, Default 0.5
Fraction of median to take as count depth cutoff
- number_of_genes_cutoff: float, Default 0.5
Fraction of median to take as number of genes cutoff
- mitochondrial_genes_cutoff: float, Default 3.0
The cutoff is median + standard_deviation * this_parameter
Any parameters that function ‘makeQualityControlHistogramPlot’ can accept
- Returns:
- pandas.Index
Index of cells
- Usage:
DCS = DigitalCellSorter.DigitalCellSorter()
index = DCS.getIndexOfGoodQualityCells()
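The two cutoff rules described in the parameters (a fraction of the median for count depth and number of genes, and median plus a multiple of the standard deviation for the mitochondrial fraction) can be sketched in plain Python. This is an illustration, not the library's implementation: the helper names are hypothetical, the population standard deviation is an assumption, and so is the choice of inclusive comparisons.

```python
import statistics

def fraction_of_median_cutoff(values, fraction=0.5):
    """Lower cutoff for count depth or number of genes."""
    return fraction * statistics.median(list(values))

def mitochondrial_cutoff(fractions, n_std=1.5):
    """Upper cutoff: median + standard_deviation * n_std."""
    fractions = list(fractions)
    return statistics.median(fractions) + n_std * statistics.pstdev(fractions)

# Toy per-cell QC measures.
count_depth = {"cell1": 1000, "cell2": 1200, "cell3": 300, "cell4": 900}
mito_fraction = {"cell1": 0.02, "cell2": 0.03, "cell3": 0.02, "cell4": 0.40}

depth_cut = fraction_of_median_cutoff(count_depth.values())
mito_cut = mitochondrial_cutoff(mito_fraction.values())
good_cells = [c for c in count_depth
              if count_depth[c] >= depth_cut and mito_fraction[c] <= mito_cut]
```

Here cell3 fails the count depth cutoff and cell4 fails the mitochondrial cutoff, so only cell1 and cell2 remain.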
-
getQualityControlCutoff
(se, cutoff, precut=1.0, mito=False, MakeHistogramPlot=True, **kwargs)[source]¶ Function to calculate QC quality cutoff
- Parameters:
- se: pandas.Series
With data to analyze
- cutoff: float
Parameter for calculating the quality control cutoff
- mito: boolean, Default False
Whether the analysis is of the mitochondrial genes fraction
- plotPathAndName: str, Default None
Text to include in the figure title and file name
- MakeHistogramPlot: boolean, Default True
Whether to make a histogram plot
Any parameters that function ‘makeQualityControlHistogramPlot’ can accept
- Returns:
- float
Cutoff value
- Usage:
DCS = DigitalCellSorter.DigitalCellSorter()
cutoff = DCS.getQualityControlCutoff(se, 0.5)
-
getCountsDataframe
(se1, se2, tagForMissing='N/A')[source]¶ Get a pandas.DataFrame with cross-counts (overlaps) between two pandas.Series
- Parameters:
- se1: pandas.Series
Series with the first set of items
- se2: pandas.Series
Series with the second set of items
- tagForMissing: str, Default ‘N/A’
Label to assign to non-overlapping items
- Returns:
- pandas.DataFrame
Contains counts
- Usage:
DCS = DigitalCellSorter.DigitalCellSorter()
df = DCS.getCountsDataframe(se1, se2)
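One way to read "cross-counts (overlaps)" is as a contingency count of label pairs over the union of items, with a tag for items present in only one series. The sketch below follows that reading with plain dictionaries and a Counter; `cross_counts` is a hypothetical helper, and the library itself returns a pandas.DataFrame rather than a Counter.

```python
from collections import Counter

def cross_counts(se1, se2, tag_for_missing="N/A"):
    """se1, se2: {item: label}. Count label pairs over the union of items."""
    items = sorted(set(se1) | set(se2))
    return Counter((se1.get(i, tag_for_missing), se2.get(i, tag_for_missing))
                   for i in items)

se1 = {"cell1": "T cell", "cell2": "T cell", "cell3": "B cell"}
se2 = {"cell2": "cluster 0", "cell3": "cluster 1", "cell4": "cluster 1"}
counts = cross_counts(se1, se2)
```

Items missing from one side (cell1 and cell4 here) are counted under the tag 'N/A'.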
-
getNewMarkerGenes
(cluster=None, top=100, zScoreCutoff=None, removeUnknown=False, **kwargs)[source]¶ Extract new marker genes based on the cluster annotations
- Parameters:
- cluster: int, Default None
Cluster #, if provided genes of only this cluster will be returned
- top: int, Default 100
Upper bound for number of new markers per cell type
- zScoreCutoff: float, Default 0.3
Lower bound for a marker z-score to be significant
- removeUnknown: boolean, Default False
Whether to remove type “Unknown”
Any parameters that function ‘makePlotOfNewMarkers’ can accept
- Returns:
None
- Usage:
DCS = DigitalCellSorter.DigitalCellSorter()
DCS.getNewMarkerGenes()
-
classmethod
calculateV
(args)[source]¶ Calculate the voting scores (celltypes by clusters)
- Parameters:
- args: tuple
Tuple of sub-arguments
- df_M: pandas.DataFrame
Marker cell type DataFrame
- df_X: pandas.DataFrame
Markers expression DataFrame
- cluster_index: 1d numpy.array
Clustering index
- cutoff: float
Significance cutoff, i.e. a threshold for a given marker to be significant
- giveSignificant: boolean
Whether to return the significance matrix along with the scores
- removeLowQCscores: boolean
Whether to remove low quality scores, i.e. those supported by less than 10% of markers
- Returns:
- pandas.DataFrame
Contains voting scores per celltype per cluster
- Usage:
Function is used internally.
df = calculateV((df_M, df_X, cluster_index, 0.3, False, True))
-
annotateWith_pDCS_Scheme
(df_markers_expr, df_marker_cell_type)[source]¶ Produce cluster annotation results
- Parameters:
- df_markers_expr: pandas.DataFrame
Data with marker genes by cells expression
- df_marker_cell_type: pandas.DataFrame
Data with marker genes by cell types
- Returns:
tuple
- Usage:
Function should be called internally only
-
annotateWith_ratio_Scheme
(df_markers_expr, df_marker_cell_type)[source]¶ Produce cluster annotation results
- Parameters:
- df_markers_expr: pandas.DataFrame
Data with marker genes by cells expression
- df_marker_cell_type: pandas.DataFrame
Data with marker genes by cell types
- Returns:
tuple
- Usage:
Function should be called internally only
-
annotateWith_Hopfield_Scheme
(df_markers_expr, df_marker_cell_type)[source]¶ Produce cluster annotation results
- Parameters:
- df_markers_expr: pandas.DataFrame
Markers expression DataFrame
- df_marker_cell_type: pandas.DataFrame
Marker cell type DataFrame
- Returns:
tuple
- Usage:
Function should be called internally only
-
recordAnnotationResults
(df_marker_cell_type, df_markers_expr, df_L, df_V, dict_expressed_markers, df_null_distributions=None)[source]¶ Record cell type annotation results to spreadsheets.
- Parameters:
- df_marker_cell_type: pandas.DataFrame
Markers to cell types table
- df_markers_expr: pandas.DataFrame
Markers expression in each cluster
- df_L: pandas.DataFrame
Annotation scores along with other information
- df_V: pandas.DataFrame
Annotation scores along with other information
- dict_expressed_markers: dictionary
Dictionary of markers significantly expressed in each cluster
- df_null_distributions: pandas.DataFrame, Default None
Table with null distributions
- Returns:
None
- Usage:
This function is intended to be used internally only
-
propagateHopfield
(sigma=None, xi=None, T=0.2, tmax=200, fractionToUpdate=0.5, mode=4, meshSamplingRate=200, underlyingNetwork=None, typesNames=None, clustersNames=None, printInfo=False, recordTrajectories=True, id=None, printSwitchingFraction=False, path=None, verbose=0)[source]¶ Function is used internally to propagate Hopfield network over a set number of time steps
- Parameters:
- sigma: pandas.DataFrame, Default None
Markers expression
- xi: pandas.DataFrame, Default None
Marker cell type DataFrame
- T: float, Default 0.2
Noise (Temperature) parameter
- tmax: int, Default 200
Number of steps to iterate through
- fractionToUpdate: float, Default 0.5
Fraction of nodes to randomly update at each iteration
- mode: int, Default 4
- Options are:
1: non-orthogonalized, non-weighted attractors
2: orthogonalized, non-weighted attractors
3: orthogonalized, weighted attractors
4: orthogonalized, weighted attractors, asymmetric and diluted dynamics
- meshSamplingRate: int, Default 200
Visualization parameter to control the quality of the color mesh near the attractors
- underlyingNetwork: 2d numpy.array, Default None
Network of underlying connections between genes
- typesNames: list-like, Default None
Names of cell types
- clustersNames: list-like, Default None
Names or identifiers of the clusters
- printInfo: boolean, Default False
Whether to print details
- recordTrajectories: boolean, Default True
Whether to record trajectories data to files
- id: int, Default None
Identifier of this function call
- printSwitchingFraction: boolean, Default False
Whether to print fraction of clusters that switch their maximum overlapping attractor
- path: str, Default None
Path for saving trajectories data
- Returns:
- 2d numpy.array
Overlaps
- Usage:
result = propagateHopfield(sigma=sigma, xi=df_attrs)
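The general idea of propagating a Hopfield network toward its attractors and reporting overlaps can be shown with a textbook example. This is a deliberately simplified sketch, not any of the modes listed above: it uses plain Hebbian weights, synchronous updates of all nodes, and zero temperature, and every function name in it is hypothetical.

```python
def hebbian_weights(patterns):
    """Hebbian coupling matrix with zero diagonal for +/-1 patterns."""
    n = len(patterns[0])
    return [[0.0 if i == j else sum(p[i] * p[j] for p in patterns) / n
             for j in range(n)] for i in range(n)]

def hopfield_step(state, weights):
    """One synchronous, zero-temperature update of a +/-1 state vector."""
    new_state = []
    for i, row in enumerate(weights):
        field = sum(w * s for w, s in zip(row, state))
        # Keep the current value when the local field is exactly zero.
        new_state.append(state[i] if field == 0 else (1 if field > 0 else -1))
    return new_state

def overlaps(state, patterns):
    """Normalized overlap of the state with each attractor."""
    return [sum(s * x for s, x in zip(state, p)) / len(state) for p in patterns]

attractors = [[1, 1, 1, -1, -1, -1], [1, -1, 1, -1, 1, -1]]
weights = hebbian_weights(attractors)
state = [-1, 1, 1, -1, -1, -1]   # first attractor with one flipped node
for _ in range(5):               # plays the role of tmax
    state = hopfield_step(state, weights)
final_overlaps = overlaps(state, attractors)
```

The perturbed state relaxes to the first attractor, so its overlap with that attractor becomes 1. The documented function additionally supports noise (T), partial random updates (fractionToUpdate) and an underlying network, none of which are modeled here.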
-
classmethod
convertColormap
(colormap)[source]¶ Convert colormap from the form (1.,1.,1.,1.) to ‘rgba(255,255,255,1.)’
- Parameters:
- colormap: dictionary
Colormap to convert
- Returns:
- dictionary
Converted colormap
- Usage:
DCS = DigitalCellSorter.DigitalCellSorter()
colormap = DCS.convertColormap(colormap)
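The conversion described above, from float tuples to 'rgba(...)' strings, can be sketched as below. The function name `convert_colormap`, the dictionary layout (label to RGBA tuple) and the exact string formatting of the alpha channel are assumptions for illustration; the library's output format may differ in detail.

```python
def convert_colormap(colormap):
    """Convert {name: (r, g, b, a)} with floats in [0, 1] to {name: 'rgba(R,G,B,a)'}."""
    converted = {}
    for name, (r, g, b, a) in colormap.items():
        # Scale the color channels to 0..255; the alpha channel stays a float.
        red, green, blue = (int(round(255 * channel)) for channel in (r, g, b))
        converted[name] = f"rgba({red},{green},{blue},{a})"
    return converted

colors = convert_colormap({"T cell": (1.0, 1.0, 1.0, 1.0),
                           "B cell": (0.0, 0.0, 1.0, 0.5)})
```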
-
classmethod
zScoreOfSeries
(se)[source]¶ Calculate z-score of pandas.Series and modify the Series in place
- Parameters:
- se: pandas.Series
Series to process
- Returns:
- pandas.Series
Processed series
- Usage:
DCS = DigitalCellSorter.DigitalCellSorter()
se = DCS.zScoreOfSeries(se)
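For reference, a z-score subtracts the mean and divides by the standard deviation. The sketch below works on a plain list and returns a new list; it assumes the population standard deviation and maps a constant input to zeros, both of which are choices made here rather than documented behavior, and it does not modify its input in place.

```python
import statistics

def z_score(values):
    """Return (x - mean) / std for each value; a constant input maps to zeros."""
    values = list(values)
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]

scores = z_score([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
```

The example values have mean 5 and population standard deviation 2, so the smallest value maps to -1.5 and the largest to 2.0.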
-
classmethod
KeyInFile
(key, file)[source]¶ Check if a key exists in an HDF file.
- Parameters:
- key: str
Key name to check
- file: str
HDF file name to check
- Returns:
- boolean
True if the key is found, False otherwise
- Usage:
DCS = DigitalCellSorter.DigitalCellSorter()
DCS.KeyInFile('df_expr', 'data/file.h5')
-
getSubnetworkOfPCN
(subnetworkGenes, min_shared_first_targets=30)[source]¶ Extract subnetwork of PCN network
- Parameters:
- subnetworkGenes: list-like
Set of genes that the subnetwork should contain
- min_shared_first_targets: int, Default 30
Number of minimum first shared targets to connect two nodes
- Returns:
- pandas.DataFrame
Adjacency matrix
- Usage:
DCS = DigitalCellSorter.DigitalCellSorter()
df_subnetwork = DCS.getSubnetworkOfPCN(genes)
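The role of min_shared_first_targets can be illustrated with sets: two genes are connected when they share at least that many first targets. This is one plausible reading of the parameter description, not the library's algorithm; `shared_target_subnetwork` and the `targets` mapping are hypothetical, and the result is a nested dictionary instead of a pandas.DataFrame.

```python
from itertools import combinations

def shared_target_subnetwork(targets, genes, min_shared_first_targets=30):
    """targets: {gene: set of first targets}. Returns a symmetric adjacency dict."""
    genes = [g for g in genes if g in targets]
    adjacency = {g: {h: 0 for h in genes} for g in genes}
    for g, h in combinations(genes, 2):
        # Connect two nodes if they share enough first targets.
        if len(targets[g] & targets[h]) >= min_shared_first_targets:
            adjacency[g][h] = adjacency[h][g] = 1
    return adjacency

toy_targets = {"A": {1, 2, 3, 4}, "B": {2, 3, 4, 5}, "C": {4, 9}}
adj = shared_target_subnetwork(toy_targets, ["A", "B", "C"],
                               min_shared_first_targets=3)
```

With a threshold of 3, only A and B are connected, since they share three targets while C shares one with each.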
-
alignSeries
(se1, se2, tagForMissing)[source]¶ Align two pandas.Series
- Parameters:
- se1: pandas.Series
Series with the first set of items
- se2: pandas.Series
Series with the second set of items
- tagForMissing: str, Default ‘Missing’
Label to assign to non-overlapping items
- Returns:
- pandas.DataFrame
Contains two aligned pandas.Series
- Usage:
DCS = DigitalCellSorter.DigitalCellSorter()
df = DCS.alignSeries(pd.Index(['A', 'B', 'C', 'D']).to_series(), pd.Index(['B', 'C', 'D', 'E', 'F']).to_series(), 'Missing')
-
createReverseDictionary
(inputDictionary)[source]¶ Efficient way to create a reverse dictionary from a dictionary. Utilizes Pandas.Dataframe.groupby and Numpy arrays indexing.
- Parameters:
- inputDictionary: dictionary
Dictionary to reverse
- Returns:
- dictionary
Reversed dictionary
- Usage:
DCS = DigitalCellSorter.DigitalCellSorter()
revDict = DCS.createReverseDictionary(Dict)
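Reversing a dictionary whose values are lists means mapping each list element back to the keys that contain it. The sketch below does this with a defaultdict instead of the pandas groupby approach mentioned above; the input shape (key to list of values) and the name `reverse_dictionary` are assumptions for illustration.

```python
from collections import defaultdict

def reverse_dictionary(input_dictionary):
    """Map each value back to the sorted list of keys that contain it."""
    reverse = defaultdict(list)
    for key, values in input_dictionary.items():
        for value in values:
            reverse[value].append(key)
    return {value: sorted(keys) for value, keys in reverse.items()}

markers = {"T cell": ["CD3E", "CD4"], "Monocyte": ["CD14", "CD4"]}
rev = reverse_dictionary(markers)
```

A value that appears under several keys, such as CD4 here, ends up with all of those keys in its list.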
-
readMarkerFile
(mergeFunction='mean', mergeCutoff=0.25)[source]¶ Read markers file, prepare markers
- Parameters:
- mergeFunction: str, Default ‘mean’
- Function used for grouping of the cell sub-types. Options are:
‘mean’: average of the values
‘max’: maximum of the values, effectively a logical OR function
- mergeCutoff: float, Default 0.25
Values below cutoff are set to zero. This option is used if mergeFunction is ‘mean’
- Returns:
- pandas.DataFrame
Celltype/markers matrix
- Usage:
DCS = DigitalCellSorter.DigitalCellSorter()
df_marker_cell_type = DCS.readMarkerFile()
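The difference between the two merge options can be seen on a toy example. The sketch assumes that each cell sub-type is a 0/1 marker vector and that sub-types of one cell type are merged column by column; `merge_subtypes` is a hypothetical helper, not part of the library.

```python
def merge_subtypes(rows, merge_function="mean", merge_cutoff=0.25):
    """rows: marker vectors (lists of 0/1) of the sub-types merged into one type."""
    merged = []
    for values in zip(*rows):
        if merge_function == "max":
            # Maximum of 0/1 values acts as a logical OR.
            merged.append(float(max(values)))
        else:
            mean = sum(values) / len(values)
            # Values below the cutoff are set to zero.
            merged.append(mean if mean >= merge_cutoff else 0.0)
    return merged

subtypes = [[1, 1, 0, 1], [1, 1, 0, 0], [1, 1, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0]]
by_mean = merge_subtypes(subtypes, "mean")
by_max = merge_subtypes(subtypes, "max")
```

With 'mean', a marker found in only one of five sub-types (0.2) falls below the cutoff and is zeroed, while 'max' keeps it.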
-
mergeIndexDuplicates
(df_expr, method='average', printDuplicates=False, verbose=1)[source]¶ Merge index duplicates
- Parameters:
- df_expr: pandas.DataFrame
Gene expression table
- method: str, Default ‘average’
- How to deal with index duplicates. Options are:
‘average’: average values of duplicates
‘first’: keep only first of duplicates, discard rest
- Returns:
- pandas.DataFrame
Gene expression table
- Usage:
DCS = DigitalCellSorter.DigitalCellSorter()
df_expr = DCS.mergeIndexDuplicates(df_expr)
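The two documented options, 'average' and 'first', can be sketched on a list of (gene, values) rows. This is a plain-Python illustration under the assumption that duplicates are rows sharing the same gene name; `merge_index_duplicates` is a hypothetical stand-in that returns a dictionary rather than a pandas.DataFrame.

```python
def merge_index_duplicates(rows, method="average"):
    """rows: list of (gene, values) pairs; duplicate genes are merged."""
    grouped = {}
    for gene, values in rows:
        grouped.setdefault(gene, []).append(values)
    merged = {}
    for gene, group in grouped.items():
        if method == "first":
            # Keep only the first of the duplicates, discard the rest.
            merged[gene] = list(group[0])
        elif method == "average":
            # Average the values of the duplicates column by column.
            merged[gene] = [sum(column) / len(group) for column in zip(*group)]
        else:
            raise ValueError(f"unknown method: {method}")
    return merged

rows = [("CD4", [1.0, 3.0]), ("SDC1", [2.0, 2.0]), ("CD4", [3.0, 5.0])]
averaged = merge_index_duplicates(rows)
first = merge_index_duplicates(rows, method="first")
```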
-
recordExpressionData
()[source]¶ Record expression data from the internal HDF storage.
- Parameters:
None
- Returns:
None
- Usage:
DCS = DigitalCellSorter.DigitalCellSorter()
DCS.recordExpressionData()
-
loadAnnotatedLabels
(detailed=False, includeLowQC=True, infoType='label')[source]¶ Load cell annotations resulting from function ‘annotate’
- Parameters:
- detailed: boolean, Default False
Whether to give cluster- or celltype- resolution data
- includeLowQC: boolean, Default True
Whether to include low quality cells in the output
- Returns:
pandas.Series
- Usage:
DCS = DigitalCellSorter.DigitalCellSorter()
DCS.loadAnnotatedLabels()
-
loadExpressionData
()[source]¶ Load processed expression data from the internal HDF storage.
- Parameters:
None
- Returns:
None
- Usage:
DCS = DigitalCellSorter.DigitalCellSorter()
DCS.loadExpressionData()
-
prepareMarkers
(expressedGenes=None, createColormapForCelltypes=True)[source]¶ Get dictionary of markers for each cell type.
- Parameters:
- expressedGenes: pandas.Index, Default None
If not None then the marker DataFrame will be intersected with this index, i.e. all non-expressed genes will be filtered from the marker file
- createColormapForCelltypes: boolean, Default True
Create (or update) a colormap for cell types based on a marker-celltype matrix. This will make coloring of cell clusters consistent across all plots.
- Usage:
DCS = DigitalCellSorter.DigitalCellSorter()
DCS.prepareMarkers()
-
calculateQCmeasures
()[source]¶ Calculate Quality Control (QC) measures
- Parameters:
None
- Returns:
None
- Usage:
DCS = DigitalCellSorter.DigitalCellSorter()
DCS.calculateQCmeasures()
-
qualityControl
(**kwargs)[source]¶ Remove low quality cells
- Parameters:
Any parameters that function ‘getIndexOfGoodQualityCells’ can accept
- Returns:
None
- Usage:
DCS = DigitalCellSorter.DigitalCellSorter()
DCS.qualityControl()
-
batchEffectCorrection
(method='COMBAT')[source]¶ Batch effect correction.
- Parameters:
- method: str, Default ‘COMBAT’
Stein, C.K., Qu, P., Epstein, J. et al. Removing batch effects from purified plasma cell gene expression microarrays with modified ComBat. BMC Bioinformatics 16, 63 (2015)
- Returns:
None
- Usage:
DCS = DigitalCellSorter.DigitalCellSorter()
DCS.batchEffectCorrection()