The downloaded raw microarray data were processed with R (version 2.6.1) and Bioconductor to obtain seed-preferential probesets, which were then translated into gene loci. The details are as follows:
  1. Normalization of raw microarray data
  2. The RMA (Robust Multi-array Average) method was used to normalize the microarray data for Arabidopsis and rice. RMA returns log2-scale values, so we used 2^(RMA value) as the intensity.
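The conversion from log2-scale RMA values back to linear intensities can be sketched as follows. This is a minimal illustration only; the function name `rma_to_intensity` and the input values are made up for the example and are not taken from the original analysis, which was run in R.

```python
def rma_to_intensity(rma_log2_values):
    """Convert log2-scale RMA values to linear-scale intensities (2^value)."""
    return [2.0 ** v for v in rma_log2_values]


# Made-up RMA values for one probeset in three samples.
intensities = rma_to_intensity([5.0, 8.0, 10.0])
print(intensities)
```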

  3. Handling of duplicate samples
  4. We used the median rather than the mean intensity of duplicate samples, because the median is more robust to an aberrant value in a single sample and therefore better represents the duplicates.
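A small example shows why the median is less sensitive than the mean to one aberrant sample. The three intensities below are invented for illustration; the third one plays the role of the abnormal sample.

```python
from statistics import mean, median

# Made-up intensities of one probeset in three duplicate samples;
# the last value is an outlier.
duplicates = [120.0, 130.0, 900.0]

dup_median = median(duplicates)  # stays close to the two consistent values
dup_mean = mean(duplicates)      # pulled upwards by the single outlier

print(dup_median, dup_mean)
```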

  5. Background subtraction
  6. The intensity of every probeset is greater than zero in all samples, but intensities below 50 do not reliably indicate expression. For each probeset, we therefore subtracted the median of its intensities below 50 from its intensities in all samples. For probesets whose intensities exceed 50 in all samples, we instead subtracted the mean of the medians obtained for the probesets that do have intensities below 50.
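One possible reading of this background subtraction is sketched below. The function name `subtract_background`, the data layout (a dict mapping probeset IDs to lists of intensities) and the example values are assumptions made for illustration. The sketch also leaves negative results untouched, since the text does not say how they were handled.

```python
from statistics import mean, median

THRESHOLD = 50.0  # intensities below this are treated as background


def subtract_background(expr, threshold=THRESHOLD):
    """expr maps probeset ID -> list of intensities across samples."""
    # Per-probeset background: median of that probeset's intensities
    # below the threshold (only defined if it has such intensities).
    backgrounds = {}
    for probeset, values in expr.items():
        low = [v for v in values if v < threshold]
        if low:
            backgrounds[probeset] = median(low)

    # Fallback for probesets with no intensity below the threshold:
    # the mean of all per-probeset medians computed above.
    fallback = mean(list(backgrounds.values())) if backgrounds else 0.0

    return {
        probeset: [v - backgrounds.get(probeset, fallback) for v in values]
        for probeset, values in expr.items()
    }


# Hypothetical probesets p1..p3 with made-up intensities.
expr = {
    "p1": [10.0, 30.0, 20.0, 200.0],    # background = median(10, 30, 20) = 20
    "p2": [40.0, 60.0, 500.0, 45.0],    # background = median(40, 45) = 42.5
    "p3": [100.0, 300.0, 80.0, 1000.0], # no low values, uses the fallback
}
corrected = subtract_background(expr)
print(corrected)
```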

  7. Selecting the sample with the maximum expression value within each tissue type
  8. We divided all 79 Arabidopsis samples into eight types according to Schmid et al. (2005) [For more details, see Arabidopsis sample description]:

    	root (7 samples)
    	stem (3 samples)
    	leaf (17 samples)
    	whole plant (11 samples)
    	apex (11 samples)
    	flower (12 samples)
    	floral organ (10 samples)
    	seed (8 samples)

    We divided the 15 rice samples (Jain et al., 2007) into five types [For more details, see Rice Sample Description]:

    	root (1 sample)
    	leaf (2 samples)
    	SAM (1 sample)
    	inflorescence (6 samples)
    	seed (5 samples)
    For each tissue type, we selected the sample with the maximum expression intensity, which was used in the next step of the analysis.
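The per-tissue maximum described above can be sketched as a simple grouping step for one probeset. The sample names, the intensities and the function name `group_maxima` are hypothetical; only the tissue types come from the rice grouping listed above.

```python
def group_maxima(intensities, sample_groups):
    """Return the maximum intensity per tissue type for one probeset.

    intensities:   sample name -> intensity of the probeset in that sample
    sample_groups: sample name -> tissue type
    """
    maxima = {}
    for sample, value in intensities.items():
        tissue = sample_groups[sample]
        if tissue not in maxima or value > maxima[tissue]:
            maxima[tissue] = value
    return maxima


# Hypothetical sample names and made-up intensities.
sample_groups = {
    "root_1": "root",
    "leaf_1": "leaf",
    "leaf_2": "leaf",
    "seed_1": "seed",
    "seed_2": "seed",
}
intensities = {
    "root_1": 12.0,
    "leaf_1": 8.0,
    "leaf_2": 15.0,
    "seed_1": 240.0,
    "seed_2": 310.0,
}
maxima = group_maxima(intensities, sample_groups)
print(maxima)
```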

  9. Screening probesets dominantly present in seeds
  10. We used a fold-change method to find the probesets that are dominantly present in seeds. The criteria are:

    	max (seed group) / max (each nonseed group) >= 4     [A]
    	max (seed group) / max (all nonseed groups) >= 1/4   [B]

    Each probeset has seven max (seed group) / max (nonseed group) ratios (seed/root, seed/stem, ..., seed/floral organ). We selected the probesets that meet criterion B and for which at least five of the seven seed/nonseed ratios meet criterion A.

    The selected probesets therefore have an intensity in at least one seed sample that is at least four times the maximum intensity in each of at least five nonseed groups. In the remaining nonseed groups the intensity may exceed that in seed samples, but the seed maximum is guaranteed to be at least one quarter of the maximum over all nonseed samples.
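Criteria A and B can be sketched as a single check on the per-tissue maxima of one probeset. The function name `is_seed_preferential`, its parameters and the example intensities are assumptions for illustration. The ratios are written as multiplications (seed maximum >= 4 x nonseed maximum), which is equivalent for positive intensities and avoids division by zero.

```python
def is_seed_preferential(group_max, seed="seed", fold=4.0, min_groups=5):
    """group_max maps tissue type -> maximum intensity of one probeset."""
    seed_max = group_max[seed]
    nonseed = [v for tissue, v in group_max.items() if tissue != seed]

    # Criterion A, checked per nonseed group:
    # max(seed) / max(group) >= 4
    n_meeting_a = sum(1 for v in nonseed if seed_max >= fold * v)

    # Criterion B, checked against the overall nonseed maximum:
    # max(seed) / max(all nonseed) >= 1/4
    meets_b = seed_max >= max(nonseed) / fold

    return meets_b and n_meeting_a >= min_groups


# Made-up maxima for the eight Arabidopsis tissue types.
example = {
    "root": 10.0, "stem": 12.0, "leaf": 8.0, "whole plant": 15.0,
    "apex": 20.0, "flower": 150.0, "floral organ": 300.0, "seed": 100.0,
}
selected = is_seed_preferential(example)  # A holds for 5 groups, B holds

# Same probeset, but a nonseed maximum more than 4x the seed maximum.
fails_b = is_seed_preferential({**example, "floral organ": 500.0})

# Same probeset, but only four groups satisfy criterion A.
fails_a = is_seed_preferential({**example, "apex": 30.0})

print(selected, fails_b, fails_a)
```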

  11. Presence/absence detection
  12. We performed MAS 5.0 absolute detection using mas5calls, implemented in the affy package, to detect the presence or absence of each probeset. A probeset is considered present if it is called present in at least two of the three duplicate samples. Probesets detected as absent in all seed samples were excluded from further analysis.
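The majority rule and the final filter can be sketched as follows, starting from detection calls that have already been computed (for example, exported from mas5calls, which labels each probeset as "P", "M" or "A"). The data layout, the probeset and sample names, and the choice to count only "P" as present are assumptions of this sketch; the text does not say how marginal calls were treated.

```python
def is_present(duplicate_calls, min_present=2):
    """True if at least `min_present` of the duplicate calls are 'P'."""
    return sum(1 for call in duplicate_calls if call == "P") >= min_present


def keep_probesets(seed_calls):
    """Keep probesets that are present in at least one seed sample.

    seed_calls: probeset ID -> {seed sample name -> list of three calls}
    """
    return [
        probeset
        for probeset, samples in seed_calls.items()
        if any(is_present(calls) for calls in samples.values())
    ]


# Hypothetical probesets and seed samples with made-up calls.
seed_calls = {
    "p1": {"seed_a": ["P", "P", "A"], "seed_b": ["A", "A", "A"]},
    "p2": {"seed_a": ["P", "A", "A"], "seed_b": ["A", "M", "A"]},
}
kept = keep_probesets(seed_calls)
print(kept)
```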


    Schmid M, Davison TS, Henz SR, Pape UJ, Demar M, Vingron M, Schölkopf B, Weigel D, Lohmann J. (2005) A gene expression map of Arabidopsis development. Nature Genetics. 37: 501-506. [PMID: 15806101]

    Jain M, Nijhawan A, Arora R, Agarwal P, Ray S, Sharma P, Kapoor S, Tyagi AK, Khurana JP. (2007) F-box proteins in rice. Genome-wide analysis, classification, temporal and spatial gene expression during panicle and seed development, and regulation by light and abiotic stress. Plant Physiol. 143(4): 1467-83. [PMID: 17293439]