This exercise asks us to interpret and validate the consistency within our clusters of data. To do this, we will employ the silhouette index, which gives us a silhouette value measuring how similar an object is to its own cluster compared to other clusters.
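The silhouette index of a point i is as follows:

$$ s(i) = \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}} $$

where a(i) is the average dissimilarity of i to the other points in its own cluster, and b(i) is the lowest average dissimilarity of i to any cluster that does not contain i.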
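As a quick illustration with made-up numbers (not taken from the book), suppose a point has a(i) = 1 (its mean distance to its own cluster) and b(i) = 4 (its mean distance to the nearest other cluster). Then

$$ s(i) = \frac{4 - 1}{\max\{1,\, 4\}} = \frac{3}{4} = 0.75 $$

Because the denominator is the larger of the two averages, s(i) always lies between -1 and 1. Values near 1 mean the point sits much closer to its own cluster than to its nearest neighbor cluster, values near 0 place it on the boundary between two clusters, and negative values suggest it may fit the neighboring cluster better.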
The book explains the equation by first defining the average dissimilarity d(i, C) of a point i to a cluster C as the average of the distances from i to all of the points in C.
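The silhouette definitions above can be sketched in plain Python. This is a minimal illustration, assuming Euclidean distance as the dissimilarity (the book may use a different measure) and the common convention that a point alone in its cluster gets s(i) = 0. The function names `average_dissimilarity` and `silhouette_values` are hypothetical, not taken from the book.

```python
import math
from collections import defaultdict


def average_dissimilarity(point, members):
    """d(i, C): mean Euclidean distance from `point` to every point in `members`."""
    return sum(math.dist(point, m) for m in members) / len(members)


def silhouette_values(points, labels):
    """Return the silhouette value s(i) of every point, given one cluster label per point."""
    # Group point indices by their cluster label.
    clusters = defaultdict(list)
    for idx, label in enumerate(labels):
        clusters[label].append(idx)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")

    scores = []
    for i, (point, label) in enumerate(zip(points, labels)):
        # a(i): average dissimilarity to the other members of i's own cluster.
        own = [points[j] for j in clusters[label] if j != i]
        if not own:
            # Assumed convention: a point alone in its cluster gets s(i) = 0.
            scores.append(0.0)
            continue
        a = average_dissimilarity(point, own)
        # b(i): smallest average dissimilarity to any cluster that does not contain i.
        b = min(
            average_dissimilarity(point, [points[j] for j in members])
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        # Guard against 0/0 when all relevant points coincide.
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return scores


# Two well-separated pairs of points.
pts = [(0, 0), (0, 1), (5, 0), (5, 1)]
s = silhouette_values(pts, [0, 0, 1, 1])
print([round(v, 3) for v in s])  # prints [0.802, 0.802, 0.802, 0.802]
```

Each point here is 1 unit from its partner and about 5.05 units (on average) from the other pair, so every s(i) is close to 0.8. Averaging s(i) over all points gives a single summary of how well the whole clustering fits the data. For real data, scikit-learn's `sklearn.metrics.silhouette_samples` and `sklearn.metrics.silhouette_score` compute the per-point values and their mean, respectively.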
Chapter 5 covers Clustering Analysis, a set of methods for analyzing large-scale data such as the output of DNA/RNA sequencing. These sequencing methods produce so much data that more unbiased approaches are required when looking for correlations.
unsupervised method: A learning method where all variables are treated with the same status, rather than one variable being considered as an outcome or target.
status: A variable’s classification as an outcome/predictor (e.