SENSORY TESTING -- A STATISTICIAN'S APPROACH 215

which a total set of n+m samples, n being of one kind and m being of the other, is given to the panel members to sort into the two categories. Table I presents the probabilities for such tests. Table I records only half of the probabilities, but it is to be noted that (n+m) = (m+n). The diagonal entries printed in brackets give the probabilities for those cases where the total set of samples is divided into two equal groups; they also refer to the case when the panel member is not required to identify either of those groups in any way. Should the panel member be asked to identify one set as being (for example) the stronger perfume, then those probabilities should be halved.

Efficiency of designs for assessing whether detectable differences exist may be assessed by finding the lowest probability for a particular total number of samples. The most efficient designs for discrimination are to be found when n-m = 1; the reader may check this for himself from Table I. It should also be noted that the probability associated with a test in which n=m is the same as that for n+m where m=n-1, e.g. the probability of discrimination on the null hypothesis for a 2+2 test is 1/3, which is exactly the same as for a 2+1 test.

However, it is to be noted that the conduct of a test in which n≠m is complicated by the fact that the experimenter has to make a decision about which of the two preparations is allocated to n and which to m. This is not a trivial decision, because experience shows that in these two cases the probability of discrimination may not be the same. In triangle (2+1) tests of flavour, for example, in which peppermint oils are being compared, it has been shown that it is easier to detect a stronger flavour if it is being compared with two weaker flavours than vice-versa.
For this reason all possible orderings should be tested; by far the simplest way of achieving this is to give the panel member equal numbers of each sample to test in whatever order he or she finds most simple to discriminate. Moreover, as we shall see, the testing procedure is frequently extended to involve questions of preference; under such circumstances it can also be observed that the unbalanced designs (where n≠m) appear to bias, or at least to have an effect upon, the preference judgements: preferences in some cases appear to go in favour of the larger number, and in other cases in favour of the smaller number, by margins that cannot be due to chance. It appears desirable to eliminate this complication by making n=m.

Let us therefore examine more closely those balanced test designs. The 1+1 test is clearly a nonsense: we can always correctly divide up A from B without this telling us anything, hence the probability of correctly dividing
216 JOURNAL OF THE SOCIETY OF COSMETIC CHEMISTS

by chance is 1.0, i.e. complete certainty. However, with 2+2, we have one chance in three of correctly putting the two A's together and the two B's together. It will, of course, be noted that with the symmetrical designs it does not matter which of the two sets we nominate as A and which we nominate as B: if there is one A amongst the B's, then there will clearly be one B amongst the A's at the end of the sorting procedure. Table II demonstrates the procedure.

Table II. Balanced sorting designs (n=m)
Probabilities of errors on random sorting hypothesis

         Number of errors (e)
  n      0        1        2        3
  1    1.000     (0)
  2    0.333    0.667
  3    0.100    0.900
  4    0.029    0.457    0.514
  5    0.008    0.198    0.794
  6    0.002    0.078    0.487    0.433

From this table it can be seen that a 4+4 design is the smallest sized design which provides for an individual judge to show significant discrimination at a probability level of 1 in 20 (p=0.029). However, if one goes up as far as the 6+6 design, not only can one demonstrate that complete correct division with no errors is most unlikely on the hypothesis of random selection (p=0.002), but it is also possible to consider a second grade of response in which there is only one error (i.e. five identical samples and one misplaced sample in a set of six), and this still has a low probability of occurrence (p=0.078) on the assumption of pure random choice.

As we shall go on to show, the 6+6 form of test has a number of convenient features for the making of odour comparisons, and these are largely concerned with the evidence about the detectability of differences by individual panel members. However, the statistician has to examine not only the probability of the panel member making a correct selection entirely by chance (known to the statistician as a Type 1 error) but also the probability of the panel member making an incorrect selection even though he is able to discriminate (known as a Type 2 error).
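The entries of Table II, and the equality of the 2+2 and 2+1 probabilities noted earlier, may be checked by direct counting. The following is a minimal sketch only, written on the assumption that under random sorting every division of the samples into groups of the required sizes is equally likely, and that in a balanced design the two groups are not identified, so that e errors and n-e errors describe the same outcome. The function names p_perfect_sort and p_errors_balanced are illustrative and do not come from the paper.

```python
from math import comb


def p_perfect_sort(n: int, m: int, identified: bool = False) -> float:
    """Chance probability of sorting n + m samples with no errors.

    Assumption: every split into a group of n and a group of m is
    equally likely under the null hypothesis of random sorting.
    """
    splits = comb(n + m, n)
    # With equal, unidentified groups, two of the labelled splits
    # (A|B and B|A) count as a correct division.
    if n == m and not identified:
        return 2 / splits
    return 1 / splits


def p_errors_balanced(n: int, e: int) -> float:
    """Chance probability of exactly e misplaced samples per group
    in an n + n sort with unidentified groups (0 <= e <= n // 2)."""
    if not 0 <= e <= n // 2:
        raise ValueError("e must lie between 0 and n // 2")
    p = comb(n, e) ** 2 / comb(2 * n, n)
    # e errors and n - e errors are indistinguishable when the groups
    # are not identified, so the two cases are pooled unless they coincide.
    if 2 * e != n:
        p *= 2
    return p


if __name__ == "__main__":
    # One row per design n + n, one column per number of errors e.
    for n in range(1, 7):
        row = [f"{p_errors_balanced(n, e):.3f}" for e in range(n // 2 + 1)]
        print(n, *row)
```

Under these assumptions the printed rows agree with the tabulated values to three decimal places. The bracketed (0) entry for n=1 is not produced, since one error in a 1+1 sort is the same outcome as no errors. Calling p_perfect_sort with identified=True halves the balanced probability, as described for the case in which the panel member must name one of the groups.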
However, in order to do this, it is necessary to consider the mechanism by which the panel member is thought to make his selection. The most reasonable basic concept is one proposed by Gridgeman (8)