How to handle hierarchical / nested data in machine learning
I will explain my problem with an example. Suppose you want to predict a person's income given some attributes: {Age, Gender, Country, Region, City}. You have a training dataset like this:

```r
# Toy training set: cities are nested in regions, regions in countries
train <- data.frame(
  CountryID = c(1,1,1,1, 2,2,2,2, 3,3,3,3),
  RegionID  = c(1,1,1,2, 3,3,4,4, 5,5,5,5),
  CityID    = c(1,1,2,3, 4,5,6,6, 7,7,7,8),
  Age       = c(23,48,62,63, 25,41,45,19, 37,41,31,50),
  Gender    = factor(c("M","F","M","F", "M","F","M","F", "F","F","F","M")),
  Income    = c(31,42,71,65, 50,51,101,38, 47,50,55,23)
)
```
…
MERINGUE: characterizing spatial gene expression heterogeneity in spatially resolved single-cell transcriptomics data with nonuniform cellular densities (GitHub: JEFworks-Lab/MERINGUE)

In order to cluster genes that mark similar spatial patterns, as well as to infer evidence of cellular communication between spatially co-localized cell-types, MERINGUE computes a spatial cross-correlation statistic. In this tutorial, we will use simulations to explore the distinction between this spatial cross-correlation statistic and a general (spatially-unaware) cross-correlation.
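To make the two statistics concrete, here is a minimal sketch in plain Python. It is not MERINGUE's implementation: the function names `pearson` and `spatial_cross_cor` are hypothetical, and the spatial statistic is written in a bivariate Moran's I style form, which is an assumption about the general shape of such a statistic. The toy data is a 4x4 checkerboard in which gene X is on in "black" cells and gene Y is on in "white" cells.

```python
import math

def pearson(x, y):
    """General (spatially unaware) cross-correlation between two genes."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    dx = [v - mx for v in x]
    dy = [v - my for v in y]
    cov = sum(a * b for a, b in zip(dx, dy))
    return cov / math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))

def spatial_cross_cor(x, y, w):
    """Bivariate Moran-style statistic (assumed form, for illustration).

    w[i][j] is 1 if cells i and j are spatial neighbors, else 0.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    dx = [v - mx for v in x]
    dy = [v - my for v in y]
    total_w = sum(sum(row) for row in w)
    # Symmetrised cross-product, summed over neighboring pairs only
    cross = sum(
        w[i][j] * (dx[i] * dy[j] + dy[i] * dx[j])
        for i in range(n)
        for j in range(n)
    )
    scale = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return (n / total_w) * (cross / scale) / 2.0

# Toy layout: cells on a 4x4 grid; neighbors share an edge
side = 4
cells = [(r, c) for r in range(side) for c in range(side)]
w = [
    [1 if abs(r1 - r2) + abs(c1 - c2) == 1 else 0 for (r2, c2) in cells]
    for (r1, c1) in cells
]

# Gene X is on in "black" squares, gene Y in "white" squares
gene_x = [float((r + c) % 2 == 0) for r, c in cells]
gene_y = [1.0 - v for v in gene_x]

general = pearson(gene_x, gene_y)
spatial = spatial_cross_cor(gene_x, gene_y, w)
print(general, spatial)
```

On this toy layout the two genes are never expressed in the same cell, so the general cross-correlation is strongly negative, while every neighbor of an X-expressing cell expresses Y, so the spatial statistic is strongly positive.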
Simulate cells in space
First, let’s simulate some cells in space. Each point here is a cell. Their location in the plot can be interpreted as their physical location in space.
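As a rough illustration of this step, the sketch below places cells at random positions and derives a binary neighbor matrix from them. The uniform layout, the cell count, and the k-nearest-neighbor rule in the hypothetical helper `knn_weights` are all assumptions made for the example; the tutorial's own simulation and neighbor definition may differ.

```python
import random

rng = random.Random(0)  # fixed seed so the layout is reproducible
n_cells = 50

# Each cell gets an (x, y) position drawn uniformly from the unit square
positions = [(rng.random(), rng.random()) for _ in range(n_cells)]

def knn_weights(points, k=3):
    """Binary weights: w[i][j] = 1 if j is one of i's k nearest cells, or vice versa."""
    n = len(points)
    w = [[0] * n for _ in range(n)]
    for i, (xi, yi) in enumerate(points):
        # Squared Euclidean distance from cell i to every other cell
        dists = sorted(
            ((xi - xj) ** 2 + (yi - yj) ** 2, j)
            for j, (xj, yj) in enumerate(points)
            if j != i
        )
        for _, j in dists[:k]:
            w[i][j] = 1
            w[j][i] = 1  # symmetrise so the neighbor relation is mutual
    return w

weights = knn_weights(positions, k=3)
```

Symmetrising the matrix means a cell can end up with more than `k` neighbors, which is a deliberate simplification here so that "i neighbors j" always implies "j neighbors i".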
Next, let’s simulate various gene expression patterns that highlight the distinction between spatial cross-correlation and general (spatially-unaware) cross-correlation.

Scenario 1: General cross-correlation and spatial cross-correlation suggest similar trends
First, let’s consider two genes,
If we plot the expression of
Likewise, if we compute a spatial cross-correlation statistic between
In this case, both the general and spatial cross-correlation statistics are positive.

Scenario 2: General cross-correlation and spatial cross-correlation suggest different trends
Now, let’s consider two different genes,
Now, if we plot the expression of
However, if we compute a spatial cross-correlation statistic between
In this case, even though the general cross-correlation statistic is negative, the spatial cross-correlation statistic is positive. This distinction is particularly important when we consider how transcriptionally distinct cell-types and subtypes may be interacting with each other in space. For example, consider if

Computing an inter-cell-type spatial cross-correlation
Now, let’s call cells that express
Now, instead of considering all neighbors, because we can see two transcriptionally distinct but spatially intertwined cell-types in our data, let’s only consider neighbor-relationships between cells of cell-type A and cells of cell-type B. We can achieve this by modifying the binary weight matrix used in the spatial cross-correlation statistic calculation to include only neighbor-relationships between the two cell-types (as opposed to within each cell-type). And indeed, we see a very high inter-cell-type spatial cross-correlation; that is, cells of cell-type A that express higher levels of
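The weight-matrix modification described above can be sketched as a simple mask. This is an illustrative assumption about how such a restriction might be coded, not MERINGUE's own routine; the helper name `inter_type_weights` and the six-cell toy layout are hypothetical.

```python
def inter_type_weights(w, cell_types, type_a="A", type_b="B"):
    """Keep only neighbor links that join one cell of type_a to one cell of type_b."""
    n = len(w)
    wanted = {type_a, type_b}
    return [
        [w[i][j] if {cell_types[i], cell_types[j]} == wanted else 0 for j in range(n)]
        for i in range(n)
    ]

# Toy example: six cells on a line, each neighboring the cells next to it
cell_types = ["A", "A", "B", "B", "A", "B"]
n = len(cell_types)
w = [[1 if abs(i - j) == 1 else 0 for j in range(n)] for i in range(n)]

w_ab = inter_type_weights(w, cell_types)
# Links 0-1 (A-A) and 2-3 (B-B) are dropped; 1-2, 3-4 and 4-5 are kept
```

Passing `w_ab` instead of `w` to a spatial cross-correlation function then measures only how expression in cells of one type relates to expression in neighboring cells of the other type.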
Indeed, if

Scenario 3: Neither general cross-correlation nor spatial cross-correlation
Lastly, let’s consider two genes,
We observe no significant cross-correlation relationships between the two genes.
And also no significant spatial cross-correlation in this case.
Although the two genes show no spatial or general cross-correlation with each other, both can and do still exhibit high spatial autocorrelation in this example.

In summary, as these various simulated gene expression patterns highlight, spatial cross-correlation and autocorrelation can provide complementary information to general correlation analyses, enabling the identification of potentially interesting spatial patterns indicative of cellular communication.
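For reference, spatial autocorrelation of a single gene is commonly measured with Moran's I. The sketch below is a minimal plain-Python version under assumed toy data (eight cells on a line, adjacent cells as neighbors); the function name `morans_i` is hypothetical and this is not taken from MERINGUE's code.

```python
def morans_i(x, w):
    """Moran's I for one gene; w[i][j] = 1 if cells i and j are neighbors."""
    n = len(x)
    mean = sum(x) / n
    dx = [v - mean for v in x]
    total_w = sum(sum(row) for row in w)
    # Cross-products of deviations, summed over neighboring pairs only
    num = sum(w[i][j] * dx[i] * dx[j] for i in range(n) for j in range(n))
    den = sum(d * d for d in dx)
    return (n / total_w) * (num / den)

# Eight cells on a line; each cell neighbors the cells directly beside it
n = 8
w = [[1 if abs(i - j) == 1 else 0 for j in range(n)] for i in range(n)]

clustered = [1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0]    # expression in one patch
alternating = [1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0]  # expression in every other cell

i_clustered = morans_i(clustered, w)
i_alternating = morans_i(alternating, w)
```

The patchy pattern gives a positive value and the alternating pattern a negative one, which is the sense in which a gene can be spatially autocorrelated regardless of how it relates to any other gene.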