Cloning of the entire set of an organism's protein-coding open reading frames (ORFs), or 'ORFeome', is a means of connecting the genome to downstream 'omics' applications. Here we report a proteome-scale study of the fission yeast Schizosaccharomyces pombe based on cloning of the ORFeome. Taking advantage of a recombination-based cloning system, we obtained 4,910 ORFs in a form that is readily usable in various analyses. First, we evaluated ORF prediction in the fission yeast genome project by expressing each ORF tagged at the 3' terminus. Next, we determined the localization of 4,431 proteins, corresponding to approximately 90% of the fission yeast proteome, by tagging each ORF with yellow fluorescent protein (YFP). Furthermore, using leptomycin B, an inhibitor of the nuclear export protein Crm1, we identified 285 proteins whose localization is regulated by Crm1.
In this review, we present an overview of the Gene Ontology (GO) structure and describe how the GO is implemented for Sz. pombe and made available via Sz. pombe GeneDB (http://www.genedb.org/genedb/pombe/). We give a detailed progress report of Sz. pombe GO annotation, providing the current status of both manual and automatic annotations. Fission yeast has at least one GO annotation for 98.3% of its genes (excluding annotations to 'unknown' terms), greater than the current percentage coverage for any other organism. Approximately 65% (3225 gene products) have at least one annotation to each of the three ontologies (biological process, cellular component and molecular function). Approximately 30% (1443 gene products) have GO terms derived directly from small-scale experiments in fission yeast, supporting the validity of fission yeast as a model eukaryote and a reference organism.
We have sequenced and annotated the genome of fission yeast (Schizosaccharomyces pombe), which contains the smallest number of protein-coding genes yet recorded for a eukaryote: 4,824. The centromeres are between 35 and 110 kilobases (kb) and contain related repeats including a highly conserved 1.8-kb element. Regions upstream of genes are longer than in budding yeast (Saccharomyces cerevisiae), possibly reflecting more-extended control regions. Some 43% of the genes contain introns, and the genome has 4,730 introns in total. Fifty genes have significant similarity with human disease genes; half of these are cancer related. We identify highly conserved genes important for eukaryotic cell organization including those required for the cytoskeleton, compartmentation, cell-cycle control, proteolysis, protein phosphorylation and RNA splicing. These genes may have originated with the appearance of eukaryotic life. Few similarly conserved genes that are important for multicellular organization were identified, suggesting that the transition from prokaryotes to eukaryotes required more new genes than did the transition from unicellular to multicellular organization.