This exercise asks us to interpret and validate the consistency within our clusters of data. To do this, we will employ the silhouette index, which gives us a silhouette value measuring how similar an object is to its own cluster compared to other clusters.
The silhouette index is as follows:
\[\displaystyle S(i) = \frac{B(i) - A(i)}{\max\bigl(A(i), B(i)\bigr)} \]
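The book explains the equation by first defining the average dissimilarity of a point \(x_i\) to a cluster \(C_k\) as the average of the distances from \(x_i\) to all of the points in \(C_k\). \(A(i)\) is then the average dissimilarity of \(x_i\) to the cluster it belongs to, and \(B(i)\) is the lowest average dissimilarity of \(x_i\) to any cluster it does not belong to. \(S(i)\) therefore lies between \(-1\) and \(1\): values near \(1\) indicate a point that sits well inside its own cluster, while negative values indicate a point that is, on average, closer to another cluster.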
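To make the definition concrete, here is a minimal Python sketch of the silhouette value computed directly from the formula. It is an illustration under stated assumptions, not the book's own code: the function name `silhouette_values` is hypothetical, distances are assumed to be Euclidean, a point is excluded from the average over its own cluster (the usual convention), a point that is alone in its cluster gets \(S(i) = 0\), and at least two clusters are assumed to exist.

```python
import numpy as np

def silhouette_values(points, labels):
    """Return S(i) for every point (hypothetical helper, Euclidean distance assumed)."""
    points = np.asarray(points, dtype=float)
    labels = np.asarray(labels)

    # Pairwise Euclidean distances between all points.
    diffs = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diffs ** 2).sum(axis=-1))

    clusters = np.unique(labels)
    s = np.zeros(len(points))
    for i in range(len(points)):
        # Members of the point's own cluster, excluding the point itself (assumption).
        own = labels == labels[i]
        own[i] = False
        if not own.any():
            continue  # singleton cluster: leave S(i) = 0 by convention

        # A(i): average dissimilarity to the point's own cluster.
        a = dist[i, own].mean()
        # B(i): lowest average dissimilarity to any other cluster.
        b = min(dist[i, labels == k].mean() for k in clusters if k != labels[i])

        denom = max(a, b)
        s[i] = 0.0 if denom == 0 else (b - a) / denom
    return s

# Two tight, well-separated groups should give values close to 1.
example = silhouette_values([[0, 0], [0, 1], [5, 5], [5, 6]], [0, 0, 1, 1])
print(example)
```

Averaging the returned values over all points gives a single summary of how well separated the clusters are, which can be compared across different numbers of clusters.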
The vocabulary quiz will be available here at the start of the course.
Chapter 5 covers clustering analysis for large-scale data analysis, such as DNA/RNA sequencing outputs. These methods produce so much data that less biased, data-driven approaches are needed when looking for structure and correlations.
unsupervised method: A learning method where all variables are treated with the same status, rather than one variable being considered as an outcome or target.
status: A variable’s classification as an outcome/predictor (e.