Information on the Soybean data

Soybeans are of great commercial value all over the world. The data set available here has been analysed widely, in particular as a three-way data set by

Kroonenberg, P. M., & Basford, K. E. (1989). An investigation of multi-attribute genotype response across environments using three-mode principal component analysis. Euphytica, 44, 109-123.

The most comprehensive analyses of these data as well as experimental details can be found in

Basford, K. E., & Tukey, J. W. (1999). Graphical analysis of multiresponse data illustrated with a plant breeding trial. Boca Raton, FL: Chapman & Hall/CRC.

Individual: Soybeans

The 58 lines (genotypes) of soybeans consisted of 43 lines (40 local Australian selections from a cross, the two parents of that cross, and one other line that had been used as a parent in earlier trials) and 15 other lines, 12 of which were from the US.

Lines 1-40 were local Australian selections from the cross between Mamloxi (CPI 172) and Avoyelles (CPI 15939). The remaining lines 41-58 were:
41. Avoyelles (CPI 15939)   Tanzania
42. Hernon 49 (CPI 15948)   Tanzania
43. Mamloxi (CPI 172)       Nigeria
44. Dorman                  USA
45. Hampton                 USA
46. Hill                    USA
47. Jackson                 USA
48. Leslie                  USA
49. Semstar                 Australia
50. Wills                   USA
51. C26673                  Morocco
52. C26671                  Morocco
53. Bragg                   USA
54. Delmar                  USA
55. Lee                     USA
56. Hood                    USA
57. Ogden                   USA
58. Wayne                   USA

Variables or attributes

4. Seed size
5. Protein percentage
6. Oil percentage


Measurements are available from 4 locations in Queensland, Australia, in two consecutive years, giving 8 environments:
1. Lawes 1970
2. Brookstead 1970
3. Nambour 1970
4. Redland Bay 1970
5. Lawes 1971
6. Brookstead 1971
7. Nambour 1971
8. Redland Bay 1971

Data arrangement

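A 3-way data array X = (x(i,j,k)), with lines i = 1, ..., I = 58, attributes j = 1, ..., J = 6 and environments k = 1, ..., K = 8, consists of K frontal slices X1, ..., XK, one per environment. Each frontal slice Xk is an I x J matrix of lines by attributes.

In the data set itself, these frontal slices are stacked one below the other:

$$
\mathbf{X} =
\begin{bmatrix}
\mathbf{X}_1 \\ \mathbf{X}_2 \\ \vdots \\ \mathbf{X}_8
\end{bmatrix},
\qquad
\mathbf{X}_k = \big( x(i,j,k) \big), \quad i = 1, \dots, 58, \; j = 1, \dots, 6.
$$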


Thus the first mode (i) is nested in the third mode (k): there are 58 (genotypes/lines) x 8 (environments) = 464 rows and 6 (variables/attributes) columns.
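As an illustration only, the following Python/NumPy sketch converts between the stacked layout (lines nested within environments, so that each consecutive block of 58 rows holds one environment, with the attributes as columns) and a three-way array indexed as [line, attribute, environment]. The function names are hypothetical, and the placeholder matrix stands in for the real data; the name and exact format of the data file are not given here, so reading it (for example with `np.loadtxt`) may need options for headers or label columns.

```python
import numpy as np

# Dimensions from the description: I lines, J attributes, K environments.
I, J, K = 58, 6, 8


def stacked_to_array(stacked: np.ndarray, n_lines: int = I, n_envs: int = K) -> np.ndarray:
    """Turn an (n_lines * n_envs) x J matrix, with lines nested within
    environments, into an array x indexed as x[i, j, k] (0-based)."""
    n_rows, n_attrs = stacked.shape
    if n_rows != n_lines * n_envs:
        raise ValueError(f"expected {n_lines * n_envs} rows, got {n_rows}")
    # Each block of n_lines consecutive rows is one frontal slice: shape (K, I, J).
    slices = stacked.reshape(n_envs, n_lines, n_attrs)
    # Move the environment axis to the end: shape (I, J, K).
    return slices.transpose(1, 2, 0)


def array_to_stacked(x: np.ndarray) -> np.ndarray:
    """Inverse of stacked_to_array: stack the frontal slices vertically."""
    n_lines, n_attrs, n_envs = x.shape
    return x.transpose(2, 0, 1).reshape(n_envs * n_lines, n_attrs)


# Placeholder values, NOT the soybean measurements.
stacked = np.arange(I * K * J, dtype=float).reshape(I * K, J)
x = stacked_to_array(stacked)
print(x.shape)  # (58, 6, 8)
```

The round trip through `array_to_stacked` returns the original matrix, which is a quick check that the assumed nesting matches the file.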

Data handling

In the Kroonenberg & Basford analyses, the data were first centred per column x(j,k), that is, per attribute within each environment, so that each column consists of deviations from its mean over the lines. The lateral slices Xj (attributes) were then normalised to a sum of squares of 1, so that the total sum of squares of the preprocessed data equals the number of attributes, 6.
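A minimal NumPy sketch of this preprocessing, assuming the data are held in an array x of shape (I, J, K) = (lines, attributes, environments). It reflects one reading of the description, not Kroonenberg and Basford's own code, and the function name `preprocess` is hypothetical: each column x(j,k) is centred over the lines, then each lateral slice Xj is scaled to unit sum of squares. Scaling does not undo the centring, and after normalisation the total sum of squares equals the number of lateral slices J.

```python
import numpy as np


def preprocess(x: np.ndarray) -> np.ndarray:
    """Centre per column x(j,k) and normalise per lateral slice Xj.

    x has shape (I, J, K): lines x attributes x environments.
    """
    # Centre each column (fixed j and k) over the lines i.
    centred = x - x.mean(axis=0, keepdims=True)
    # Sum of squares of each lateral slice Xj, taken over i and k.
    slice_ss = (centred ** 2).sum(axis=(0, 2), keepdims=True)
    # Scale each lateral slice to unit sum of squares.
    return centred / np.sqrt(slice_ss)


# Random placeholder data with the soybean dimensions, NOT the real measurements.
rng = np.random.default_rng(0)
x = rng.normal(loc=20.0, scale=3.0, size=(58, 6, 8))
z = preprocess(x)
print(round(float((z ** 2).sum()), 6))  # 6.0, the number of attributes J
```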

Note on the data in Basford and Tukey's book.
Appendix 2: The values for line 58 for Nambour 1970 and Redland Bay 1971 are incorrectly listed on page 477 as 20.490 and 15.070. They should be 17.350 and 13.000, respectively. In the data set made available here, these values have been corrected.

All other data sets have been checked and are correct as listed.

