Release 4a.53: Summary of Intentions
Projected release date: April 29, 2009
Below is a list of features that we hope to make available on MaizeSequence.org on April 29 as part of Release 4a.53. While we will make a best effort to deliver these features, some may be postponed or cancelled. If the release is delayed significantly, we will post an update on the homepage.
Sequences
Below are the sequence data sets that will be available as part of the release and will be integrated in various components of the site:
- New and updated BAC clones. The freeze will include approximately 500 BAC clones that are either updated versions of previously available clones or new clones that have not been previously published on our site. As the sequencing project winds down, these clones are mostly gap-filling sequences within FPC contigs.
- Maize Golden Path (AGP) v1. The site will provide a first attempt at a sequence-based maize assembly. The assembly is defined by a golden path of BAC clone accessions and provides a non-overlapping representation of the maize genome.
- Mo17 vs. B73 sequence variation, Release 1. Approximately 1 million single feature polymorphisms were identified by the DOE Joint Genome Institute by comparing Mo17 shotgun (Roche 454) reads to the B73 accessioned BAC clones. We provide a mirror of the data, which is hosted primarily on Phytozome/Maize.
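The non-overlapping property of a golden path can be checked mechanically. The following is a minimal sketch, assuming the golden path is distributed as a tab-delimited file in the standard AGP column layout (object, object start, object end, part number, component type, component ID or gap length, ...). The file layout used for this release, the function names, and the clone identifiers in the usage example are assumptions made for illustration.

```python
from typing import Iterable, List, NamedTuple


class Part(NamedTuple):
    obj: str        # assembled object, e.g. a pseudochromosome
    obj_beg: int    # 1-based start on the object
    obj_end: int    # 1-based inclusive end on the object
    comp_type: str  # component type code; N or U marks a gap
    comp_id: str    # clone accession, or the gap length on gap lines


def parse_agp(lines: Iterable[str]) -> List[Part]:
    """Parse AGP-style lines, skipping blank lines and '#' comments."""
    parts = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        parts.append(Part(cols[0], int(cols[1]), int(cols[2]), cols[4], cols[5]))
    return parts


def find_overlaps(parts: Iterable[Part]) -> List[Part]:
    """Return every part that starts at or before the end of an earlier part
    on the same object. An empty list means the path is non-overlapping."""
    overlaps = []
    last_end = {}
    for p in sorted(parts, key=lambda p: (p.obj, p.obj_beg)):
        if p.obj in last_end and p.obj_beg <= last_end[p.obj]:
            overlaps.append(p)
        last_end[p.obj] = max(last_end.get(p.obj, 0), p.obj_end)
    return overlaps
```

For example, with made-up identifiers:

```python
lines = [
    "chr1\t1\t1000\t1\tF\tCLONE_A\t1\t1000\t+",
    "chr1\t1001\t1100\t2\tN\t100\tclone\tyes",
    "chr1\t1101\t2000\t3\tF\tCLONE_B\t1\t900\t+",
]
print(find_overlaps(parse_agp(lines)))
```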
Annotations
Features computed by the standard annotation pipeline, such as MIPS repeats and cereal alignments, will be available for old as well as new BAC clones that define the minimal tiling path of the maize genome. The following are new annotation data sets that are being generated for the release:
- Evidence-based gene predictions. The Gramene GeneBuilder pipeline uses same- and cross-species full-length cDNAs, ESTs, and proteins. Supporting evidence, such as cDNA alignments, will be displayed. The evidence-based gene build will not be run on the new clones.
- Ab initio gene predictions. Gene models computed with Fgenesh using a monocot training set over masked sequence.
- Working Gene Set. Set of genes used for protein-based annotations. Includes best representatives among Fgenesh and GeneBuilder genes.
- Filtered Gene Set. Excludes genes classified as transposon-encoded, pseudogenes, and genes without homology to other species.
- Functional annotations. All genes made available with the release will be annotated with protein domains. Using the Ensembl Xref system, InterPro as well as GO and SLIM terms will also be associated with the genes.
- Canonical locus names. Gene predictions in the working gene set will be mapped to known locus names (e.g., bz2, sh1, ra3).
- Repeat Annotations. The maize genome will be analyzed with RepeatMasker using the MIPS/REcat library. In addition, low-complexity filters such as Dust and TRF (Tandem Repeats Finder) will be used to compute low-complexity features.
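The relationship between the Working Gene Set and the Filtered Gene Set described above can be sketched as a simple filter. This is only an illustration of the three exclusion criteria. The `Gene` record and its field names are hypothetical and do not reflect the pipeline's actual data model or classification method.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Gene:
    # Hypothetical record; field names are assumptions for illustration.
    gene_id: str
    is_transposon_encoded: bool
    is_pseudogene: bool
    has_cross_species_homology: bool


def filtered_gene_set(working_set: Iterable[Gene]) -> List[Gene]:
    """Keep genes that are neither transposon-encoded nor pseudogenes
    and that have homology to at least one other species."""
    return [
        g for g in working_set
        if not g.is_transposon_encoded
        and not g.is_pseudogene
        and g.has_cross_species_homology
    ]
```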
Web Enhancements
Below are design changes we intend to make to the site itself, and to how the maize genome is visualized and accessed.
- Ensembl version 53. The upgrade from Ensembl version 50 to version 53 brings several fundamental changes to genome views, more intuitive navigation, menus, and tabs, and increased performance. More info about the design changes here.
- Accessioned Golden Path Browser. With the integration of the maize accessioned golden path (AGP), genome navigation will become unified. We have been careful to this point to separate the physical map from the sequence-based BAC clone maps, primarily to prevent unintended inference of order and orientation of annotations between neighboring sequence contigs. With the first version of a pseudochromosome assembly, larger regions (millions of bases) can be viewed in the same context along with their anchored annotations. We also hope to visually differentiate assembled contigs that are still unordered from ordered scaffolds.
- Clone and gene archive. While we make a best effort to map identifiers of genes and markers from one build to the next, doing so is difficult and often misleading or inappropriate. We are working on providing mirrors of previous archived releases of the data. One feature is a Where's My Gene? widget that will try to find your gene of interest in the current build and, if it cannot, give you full information about the gene so that you can try to find it yourself, for example via BLAST. The FTP site retains full dumps of sequences and gene sets from the previous few releases.
- New DAS tracks. Remotely-hosted data tracks will be visualized on standard genome views via the DAS (Distributed Annotation System) protocol. One example is PlantGDB Unique Transcripts (PUTs).
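For readers unfamiliar with DAS, a remote track is fetched with a plain HTTP request. The following is a minimal sketch, assuming a DAS/1-style `features` command of the form `/das/<source>/features?segment=<id>:<start>,<stop>`. The server address, source name, and segment name in the usage example are hypothetical and are not the endpoints used by this site.

```python
from urllib.parse import urlencode


def das_features_url(server: str, source: str, segment_id: str,
                     start: int, stop: int) -> str:
    """Build a DAS/1-style features request URL for one genomic segment."""
    # Keep ':' and ',' unescaped so the segment reads id:start,stop.
    query = urlencode({"segment": f"{segment_id}:{start},{stop}"}, safe=":,")
    return f"{server.rstrip('/')}/das/{source}/features?{query}"
```

For example, with a placeholder server and source name:

```python
print(das_features_url("http://das.example.org", "puts", "chr1", 1, 1000))
```

The response to such a request is an XML document listing the features in the segment, which a genome browser can then draw as a track.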
If you have any questions, concerns, or general feedback about any of the above, do not hesitate to contact us.

