JOURNAL OF COSMETIC SCIENCE 354

was to group the 286 subjects into specific partitions based on their unprotected MED and SPF values. Only similar subjects would be present within each partition (also called a cluster), and subjects from each partition would have statistically significantly different MED and SPF values versus subjects from different partitions.

To produce the clusters, the following steps were performed:
Step 1: Specify the number of clusters k. Randomly choose k subjects and declare them as the centers of each of the k clusters. These centers are also known as “centroids,” and the value of each centroid is the average of all subjects (based on their unprotected MED and SPF values) within that specific cluster.
Step 2: Calculate the distance metric between each cluster centroid and all other data points (i.e., all other subjects) within the data set. The distance metric is the Euclidean distance between two vectors in a Euclidean space, with each vector representing a unique subject.
Step 3: Assign each subject to the cluster whose centroid is nearest, that is, the centroid with the smallest distance metric. Each subject should be assigned to exactly one of the k clusters, and no subject should belong to multiple clusters.
Step 4: With all subjects now inside the k clusters, recalculate the cluster centroids by calculating the average of all subjects within each cluster.
Step 5: Repeat steps 2–4, now with a newly calculated cluster centroid for each cluster. Repeat this process until the cluster centroids remain unchanged despite further iterations. When this occurs, no more subjects should be reassigned to a new cluster.

Step 1 of the k-means cluster analysis required a prespecified value for the number of clusters k. Rather than selecting an arbitrary value, an optimal value for k was determined using the silhouette method (7). This method examined a range of possible values for k.
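The five k-means steps listed above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the authors' code (the analysis in the paper was run in R). The function names `euclidean`, `mean_point`, and `k_means`, the `seed` and `max_iter` parameters, and the six (unprotected MED, SPF) pairs are all hypothetical.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two subjects, each a (MED, SPF) tuple."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean_point(points):
    """Coordinate-wise average of a non-empty list of points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def k_means(points, k, seed=0, max_iter=100):
    """Return (labels, centroids) following Steps 1-5 in the text."""
    rng = random.Random(seed)
    # Step 1: pick k subjects at random as the initial centroids.
    centroids = rng.sample(points, k)
    for _ in range(max_iter):
        # Steps 2-3: assign every subject to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: euclidean(p, centroids[c]))
            for p in points
        ]
        # Step 4: recompute each centroid as the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # Assumption: an empty cluster keeps its previous centroid.
            new_centroids.append(mean_point(members) if members else centroids[c])
        # Step 5: stop once the centroids no longer change.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids


# Hypothetical (unprotected MED, SPF) pairs, invented for illustration only.
subjects = [(18.0, 16.5), (19.0, 16.0), (17.5, 16.8),
            (26.0, 14.5), (25.0, 14.9), (25.5, 14.6)]
labels, centroids = k_means(subjects, k=2)
print(labels, centroids)
```

Because the starting centroids are chosen at random, different seeds can number the clusters differently; only the grouping itself is meaningful.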
For each of these possible values, an average silhouette width was calculated. The value of k with the largest average silhouette width was selected as the optimal number of clusters for the analysis.

To calculate the average silhouette width, the following steps were performed:
Step 1: Perform the k-means cluster analysis for each of the possible values of k. A range of 1–10 clusters was tested.
Step 2: Select a subject within one of the k clusters. Any subject within the cluster can be used as a starting point.
Step 3: Calculate the average distance metric between the selected subject and all other subjects within the cluster. This will be the within-cluster distance.
Step 4: Calculate the average distance metric between the selected subject and all subjects in a neighboring cluster. This will be the between-cluster distance. If there is more than one neighboring cluster, calculate the between-cluster distance for the remaining clusters.
Step 5: Compare the within-cluster distance (Step 3) with the smallest between-cluster distance (Step 4). Subtract the within-cluster distance from the smallest between-cluster distance, and then divide the difference by the larger of the two values. This will produce the silhouette width, with a value ranging from –1 to 1.
Step 6: Repeat steps 2–5 by selecting another subject within the same cluster mentioned in step 2. This process will repeat until all subjects within the cluster are selected.
Step 7: Calculate the average of all silhouette widths that were calculated in step 6. This is the average silhouette width for the cluster value k.

If the clustering algorithm performed well, the within-cluster distance will be small and the between-cluster distance will be large. An average silhouette width of 1 will indicate
the most appropriate clustering, and an average silhouette width of –1 will indicate the poorest clustering performance.

After the k-means cluster analysis partitioned each subject into specific groups, descriptive statistics were calculated within each cluster. In addition, hypothesis testing using Welch’s unequal variance t-test was performed to test for any statistically significant differences between the clusters. Statistical significance was achieved at the 95% confidence level (p < 0.050). Finally, subjects with extreme SPF values were evaluated within each cluster and reported.

STATISTICAL SOFTWARE

Statistical software R (version 3.6.1 for Microsoft Windows; R Foundation for Statistical Computing, Vienna, Austria) was used for all data analyses (8). In addition to the base package preinstalled with the software, the packages “tidyverse,” “cluster,” “factoextra,” and “ggplot2” were also used for the cluster analysis and for graphical plots.

RESULTS

SELECTION BIAS

Alejandria et al. (5) reported a Pearson’s product-moment correlation coefficient of –0.409 when evaluating the relationship between an observation’s unprotected MED value and the resulting SPF value (n = 2,503) (Figure 1). In addition, the trend line from the regression analysis had an intercept of 18.579 and a slope of –0.155. By comparison, the subject-specific data (n = 286 subjects) yielded a correlation coefficient of –0.478, and the trend line from the regression analysis had an intercept of 18.098 and a slope of –0.116 (Figure 3).

Before performing the k-means cluster analysis, the silhouette method calculated 10 average silhouette widths, one for each of the possible values of k. The average silhouette widths ranged from 0.000 to 0.395. The largest average width of 0.395 was associated with two clusters (k = 2), and the second largest average width of 0.341 was associated with six clusters (k = 6).
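As a rough illustration of how an average silhouette width like those reported above is obtained, the sketch below computes the per-subject width (b − a) / max(a, b), where a is the within-cluster distance and b is the smallest between-cluster distance, and then averages over subjects. It is a plain-Python sketch, not the authors' R code. The function names, the example data, and the two labelings are hypothetical. Two conventions in it are assumptions rather than statements from the paper: a subject alone in its cluster gets a width of 0, and a single-cluster labeling is rejected because the width is undefined without a neighboring cluster.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two subjects, each a (MED, SPF) tuple."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def silhouette_widths(points, labels):
    """Silhouette width of every subject, given its cluster label."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("silhouette width needs at least two clusters")

    widths = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            widths.append(0.0)  # assumed convention for singleton clusters
            continue
        # Within-cluster distance: mean distance to the other members
        # (the subject's distance to itself contributes 0 to the sum).
        a = sum(euclidean(p, q) for q in own) / (len(own) - 1)
        # Smallest between-cluster distance over all neighboring clusters.
        b = min(
            sum(euclidean(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        denom = max(a, b)
        widths.append((b - a) / denom if denom > 0 else 0.0)
    return widths


def average_silhouette_width(points, labels):
    """Mean silhouette width over all subjects."""
    widths = silhouette_widths(points, labels)
    return sum(widths) / len(widths)


# Hypothetical (unprotected MED, SPF) pairs and labelings, for illustration.
subjects = [(18.0, 16.5), (19.0, 16.0), (17.5, 16.8),
            (26.0, 14.5), (25.0, 14.9), (25.5, 14.6)]
well_separated = [0, 0, 0, 1, 1, 1]
mixed_up = [0, 1, 0, 1, 0, 1]
print(average_silhouette_width(subjects, well_separated))
print(average_silhouette_width(subjects, mixed_up))
```

Running the same function on the labels produced for each candidate k and keeping the k with the largest average mirrors the selection rule described in the text.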
Using the optimal number of clusters suggested by the silhouette method, the k-means cluster analysis revealed two groups of subjects sharing similar unprotected MED and SPF values. These two clusters, labeled “high SPF” and “low SPF,” had sample sizes of 153 and 133, respectively (Figure 4). When comparing the two clusters, the “high SPF” cluster had a statistically significantly greater average SPF value (as well as a lower average unprotected MED value) than the “low SPF” cluster (p < 0.001). For the “high SPF” cluster, the average SPF value was 16.314 and the average unprotected MED value was 18.366. For the “low SPF” cluster, the average SPF value was 14.708 and the average unprotected MED value was 25.671.

SUBVERSION BIAS

To further characterize the potential impact of a subject’s MED response on SPF results, subjects with extreme SPF values were evaluated within each cluster. In the original