Chapter 5

Exercise Solution for 5.1

This exercise asks us to interpret and validate the consistency of our clusters. To do this, we will use the silhouette index. It assigns each observation a silhouette value that measures how similar the observation is to its own cluster compared with the other clusters. The silhouette index is defined as \[\displaystyle S(i) = \frac{B(i) - A(i)}{\max(A(i), B(i))} \] where \(A(i)\) is the average dissimilarity of \(x_i\) to the other members of its own cluster and \(B(i)\) is the smallest average dissimilarity of \(x_i\) to any other cluster. The book builds up this equation by first defining the average dissimilarity of a point \(x_i\) to a cluster \(C_k\) as the average of the distances from \(x_i\) to all of the points in \(C_k\).
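The formula can be computed directly, point by point. The Python sketch below is a minimal illustration, not the book's code. It assumes Euclidean distance as the dissimilarity and uses the standard definitions of \(A(i)\) and \(B(i)\). It also assumes the common convention that a point alone in its cluster gets \(S(i) = 0\). The function names `avg_dissimilarity` and `silhouette` are hypothetical.

```python
import math


def avg_dissimilarity(i, members, points):
    """Average Euclidean distance from points[i] to the points indexed by
    `members`, leaving out point i itself (relevant for its own cluster)."""
    others = [j for j in members if j != i]
    return sum(math.dist(points[i], points[j]) for j in others) / len(others)


def silhouette(points, labels):
    """Return S(i) = (B(i) - A(i)) / max(A(i), B(i)) for every point.

    Assumptions (hypothetical sketch): Euclidean distance, at least two
    clusters, and S(i) = 0 for points that are alone in their cluster.
    """
    # Group point indices by cluster label.
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # singleton convention
            continue
        # A(i): average dissimilarity to the rest of its own cluster.
        a = avg_dissimilarity(i, own, points)
        # B(i): smallest average dissimilarity to any other cluster.
        b = min(
            avg_dissimilarity(i, members, points)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        scores.append(0.0 if denom == 0 else (b - a) / denom)
    return scores


# Two well-separated pairs of points: every S(i) is about 0.80.
print(silhouette([(0, 0), (0, 1), (5, 0), (5, 1)], [0, 0, 1, 1]))
```

Values close to 1 indicate a point that sits well inside its cluster. Negative values, such as those produced by relabelling the same points as `[0, 1, 0, 1]`, suggest the point would fit better in a neighbouring cluster. For real analyses, a library routine such as scikit-learn's `sklearn.metrics.silhouette_samples` computes the same per-point quantity.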

Chapter 5 vocabulary quiz

The vocabulary quiz will be live here at the start of the course.

Vocabulary for Chapter 5

Chapter 5 covers clustering methods for analyzing large-scale data such as DNA/RNA sequencing output. Sequencing technologies produce so much data that unbiased approaches are needed when looking for correlations.
unsupervised method: A learning method in which all variables are treated with the same status, rather than one variable being considered as an outcome or target.
status: A variable’s classification as an outcome/predictor (e.