  AGI CSHL Purdue University AGCoL NCGR
NSF Proposal Details
Project Summary
Project Description
  - Results from prior funding
  - Introduction
  - Objectives
  - Experimental Plan
  - Agarose Fingerprinting
  - BAC End Sequencing
  - Informatics
  - Detailed Reconstruction of Rice Chr 1, 3, 10
  - Roles of Project Personnel
  - Establish and Maintain an Experimental Advisory Committee
  - References
Phylogeny of the Genus Oryza
Oryza BAC Library projects
Education and Outreach opportunities with OMAP

Project Summary

Scientific Objectives:
The long-term goal of our collaboration is to develop an experimentally tractable and closed model system to globally unravel and understand the evolution, physiology and biochemistry of the genus Oryza. The specific objectives of this proposal are to: 1) construct DNA fingerprint/BAC-end sequence physical maps from 11 deep-coverage BAC libraries that represent the 11 wild genomes of Oryza (830,000 fingerprints; 1,659,000 BAC ends);
2) align the 11 physical maps with the sequenced reference subspecies japonica and indica;
3) construct high-resolution physical maps of rice chromosomes 1, 3 and 10 across the 11 wild genomes using a combination of hybridization and in silico anchoring strategies; and
4) provide convenient bioinformatics research and educational tools (FPC and web-based) to rapidly access and understand the collective Oryza genome.

Broader Impacts:
The research proposed will provide the first ever closed experimental system to understand the evolution, physiology and biochemistry of a single genus in plants or animals. We will align representatives of eleven wild genomes of rice, including both diploids and tetraploids, to the sequenced and finished O. sativa ssp. japonica AA diploid genome. Such a system will empower the scientific community to address complex scientific questions on a whole genome scale. For example, one would be able to determine the majority of genome rearrangements leading to the present day wild species as compared with the sequenced cultivated rice. Such data can be used to study the dynamics of the evolution of a genus and the impacts of domestication. Another example is that one could move vertically across genomes to explore the diversity and evolution of disease resistance gene clusters and their cis regulatory elements. Such data could be used to rapidly identify new and useful disease resistance genes as well as to define conserved regulatory sequences. This research will not only impact rice genomics but will be useful for understanding monocot biology in general and can serve as a model to establish similar systems in both plants and animals.

Project Description
  - Results from prior funding

The PIs have received continuous NSF support since 1994. Wing is PI/Co-PI on 11 active NSF Plant Genome grants (3 of which will expire in September 2003, and 2 of which will have completed all experimental objectives); Jackson and Stein are each PI/Co-PI on 1. Here we describe results from prior support on grants directly related to the proposed work.

C.1.1) Title: “Sequencing Rice Chromosomes 3 and 10”: PI R. Wing, Co-PIs D. McCombie, R. Wilson, C. Soderlund, J. Jiang. NSF Award# DBI-9982594 (09/15/99-09/14/2002); NSF Supplemental # (09/15/02-09/14/03); USDA-CSREES award# 99-3517-8505 (09/15/99-09/14/2002); USDA Supplemental award# (09/15/02-09/14/03); DOE Supplemental award# (09/15/02-09/14/03). Award amount: $5,100,000 ($2,550,000 NSF & $2,550,000 USDA) Supplemental Award amount $1,100,000 ($500k NSF, $500k USDA, $100k DOE).
     The overall objective of this 4-year project was to finish and annotate rice chromosomes 10 and 3. Our consortium, ACWW (Arizona Genomics Institute [Wing, Yu, Soderlund], Cold Spring Harbor Laboratory [McCombie, Stein], Washington University Genome Sequencing Center [Wilson] and the University of Wisconsin [Jiang]), was assigned the top half of chromosome 10 (Mb) and the short arm of chromosome 3 (Mb), while TIGR (Buell) was assigned the lower half of ch10 and the long arm of ch3. Chromosome 10 has been finished and was published in Science in June 2003 (The Rice Chromosome 10 Sequencing Consortium: Yu et al. 2003). Although chromosome 10 is the smallest of the rice chromosomes (22.4 Mb), it is also one of the most heterochromatic, especially in the top 12 Mb. It was therefore a difficult chromosome to finish; in the end, however, ACWW finished all but 1 of its BACs without any sequencing gaps, to a standard of 1 error in 10,000 bases or better.
     We are now focusing our efforts on finishing our region of chromosome 3. As an interim goal, ACWW and TIGR completed a 10X BAC-by-BAC Phase 2 draft of ch3 in December 2002 (IRGSP 2002). In the ACWW region, which represents 16.7 Mb of the rice genome, all sequence is contiguous except for 3 physical gaps between FPC contigs. To date, 112 BACs are finished and in annotation, 5 BACs are at the Phase 2 stage (3 of them in 1 contig) and 4 BACs are at the Phase 1 stage. We therefore do not anticipate any delays in finishing and annotating rice chromosome 3 by September 15, 2003, if not earlier, especially because the remaining genomic sequence lies in the more euchromatic region of ch3. ACWW sequencing status for all chromosomes can be viewed online, with links to each sequence's location in the WebFPC map.
     In addition to the scientific contributions made by this project, it also helped us to develop the talent, infrastructure, methodology and management skills to tackle large scale sequencing and physical mapping projects like the one being proposed here.

Publications resulting from the award (9): 1) Zhao et al. 2002; 2) Chen et al. 2002; 3) Cheng et al. 2001b; 4) Cheng et al. 2001a; 5) Yuan et al. 2000; 6) Wing et al. 2001; 7) Bastide et al. 2001; 8) Soderlund et al. 2002; 9) Yu et al. 2003

C.1.2) Title: “The Oryza BAC Library Project”: PI R. Wing, Co-PIs C. Soderlund, Tomkins (Clemson) and S. Jackson (Purdue). NSF Award# DBI-0208329 (09/15/02-09/14/04). Award amount: $600k.
     The long-term objective of this project is to construct 11 deep-coverage large-insert BAC libraries from representatives of all the known wild genomes of rice (see Table 1 for species and genome sizes) and to provide affordable access to these resources (clones, filters and libraries) through the Arizona and Clemson BAC/EST Resource Centers. The grant was awarded this October and will provide the raw material for our proposal to align the wild genomes with the cultivated and sequenced A genome diploids.
     Progress: In mid-November 2002, PIs Wing and Jackson traveled to the International Rice Research Institute (IRRI, Philippines) to prepare high molecular weight DNA for BAC library construction. The 4-week trip was very productive, and we were able to produce 1-2 DNA preparations from 8 of the 11 species. Unfortunately, the DNA samples could not be digested with restriction enzymes, most likely because of the poor condition of the plant material. In May 2003, Dr. Meizhong Luo returned to IRRI to prepare megabase-size DNA from much younger tissue and brought these samples back to Arizona. This time the majority of the DNA samples were digestible with restriction enzymes and can therefore be used for BAC library construction (see figure 1 below). These samples are now being used to build BAC libraries in Arizona and Clemson. The first library, from O. rufipogon, has been cloned, with an average insert size of 134 kb (see figure 2 below). We plan to construct the 11 BAC libraries over the next 15 months. The Wing laboratory has over 10 years of experience in BAC library construction, including over 10 rice libraries, and now that we have suitable DNA preparations we do not anticipate any problems in completing our objectives within the time frame of the project.

Figure 1. Enzyme digestions of wild species HMW DNA.

Figure 2. NotI digestion of Oryza rufipogon BAC clones

C.1.3) Title: "Maize Mapping Project": Lead PI: E. Coe (Missouri); Arizona PI: R. Wing, Co-PI C. Soderlund. NSF Award# DBI-9872655 (09/15/98-09/14/03). Total award amount: $1,629,246.
     The primary role of our group, in collaboration with Missouri, was to construct a genetically anchored phase I BAC physical map of the maize genome. Our objectives were to: 1) construct a deep-coverage BAC library of the inbred B73 using HindIII; 2) fingerprint the HindIII library and an additional EcoRI BAC library; 3) assemble the fingerprints into contigs using FPC; 4) order and merge as many contigs as possible along the maize genetic map to create a phase I physical map.
     Objectives 1-3 are complete, and we are now in the final year of the project. We generated about 300,000 successful fingerprints from the two libraries, which assembled into about 4,500 contigs using FPC. All fingerprint and anchor data can be downloaded and viewed using WebFPC and WebChrom at the site, where WebFPC displays the FPC contigs and WebChrom shows the location of genetic markers and FPC contigs, with links to external web-based databases. Integration of the genetic map with the physical map can be viewed using iMAP at Missouri. The contigs cover approximately 2,036 Mb of the 2,500 Mb genome, and the longest contig is approximately 4 Mb. The libraries have been hybridized with about 14,715 probes, 798 of which are genetically mapped.
     Our final-year goal is to order and merge as many contigs as possible, with a realistic target of about 3,000 contigs, 1,000 of which will be genetically anchored. Although this will not be a complete physical map, we have developed an important resource for maize genetics that is widely used by the community.

Publications from the award: 1) Tomkins et al. 2002; 2) Coe et al. 2002; 3) Cone et al. 2002

C.1.4) Title: "Comparative genomics of rice: reconstructing rice chromosome 1 in related species.": PI S. Jackson, Co-PI P. SanMiguel (Purdue). NSF# DBI-0227414 (10/01/02-09/30/07). Award amount: $1,630,537.
     The long-term objective of this proposal is to use BAC libraries and Overgo technology to reconstruct rice chromosome 1 in 6 related Oryza species, in order to examine chromosome evolution in a group of closely related species and to develop tools for comparative mapping in plant genomes.
     Progress: We have developed a computational pipeline that sifts through genomic sequences to find overgos for comparative mapping. We have designed the first 96 of several thousand overgos and begun testing them on a rice and a wild rice BAC library. A description of these resources and an online database are available on the web. Although this proposal has just been funded, we have already made substantial progress, and one postdoc, one technician and two graduate students are on board working on this project.
     We anticipate no problems attaining the goals of this project within the timeline and are collaborating with R. Wing to expand the scope of this project to include the entire rice genome instead of just rice chromosome 1.

C.1.5) Title: "Gramene: A resource for comparative grass genomics": PI: L. Stein; Co-PI: S. McCouch (Cornell). USDA-IFAFS# 2000-04538 (09/01/00-08/31/04). Award amount: $2,098,000.
     Gramene is a comparative mapping resource for monocot genomes. It combines the extensive colinearity between the genetic maps of the cereals with the draft and finished genomes of rice to create an environment in which researchers can move easily from a genetically defined region in one species to a physical map in another species, and ultimately to an annotated region of the rice genome.
     Gramene then builds on top of this comparative mapping framework by adding a knowledgebase of rice functional genomics. From the Gramene web site, researchers can browse and download an extensive collection of annotated rice mutants and their phenotypes, the gene ontology annotations of rice gene products, protein orthology relationships, and an extensively annotated bibliography of rice biology. Researchers also have access to essential resources for comparative genomics, including an ontology of monocot developmental and phenotypic terms (developed jointly with MaizeDB), assay information for the markers used to develop monocot genetic maps, and outgoing links to stock centers and genomics sites.

Fig. 1: Comparing maize map to genetic and physical maps of rice

     In a typical use case, a researcher interested in identifying candidate genes in a genetically defined region of maize can use the comparative map display to find the corresponding region in the genetic and physical maps of rice. From there, the researcher can navigate to a display of the rice genome in the selected area (Figure 1) and find annotated candidate genes.
     Gramene currently provides the following data sets to the research community. Typically these sets are available both as browsable web pages and bulk downloads from the Gramene FTP site. We produce a major new web site build every three months, but some data sets are updated more rapidly as circumstances warrant.

Fig. 2: Contig view of selected rice region (overview and detailed view), showing alignment of expressed sequences from rice, maize, and other monocots

1. Rice genomic sequence data: Draft and finished rice genomic sequence, including both japonica and indica subspecies.
     Fully attributed annotations from the rice sequencing groups. These vary in type and protocol from region to region.
     Uniform annotations that Gramene generates internally. Currently these consist of FGENESH gene predictions (Salamov and Solovyev 2000), aligned ESTs and EST clusters, BAC end sequences from rice and other monocots, and genetic markers from rice and other monocots. In the near future we will be adding BLASTX (Altschul et al. 1990) protein-to-nucleotide alignments, followed by Genewise (Birney and Durbin 2000) alignments and Ensembl pipeline consensus gene models, as described in the Experimental Plan.

2. Rice protein/gene product data: Gramene provides protein records for all published rice protein sequences and hypothetical gene products that have been submitted by the genomic sequencing groups to GenBank/EMBL.
     To provide researchers with the most up-to-date information on the putative functions of rice gene products, we cross-reference all confirmed and hypothetical products with InterPro (Mulder et al. 2003) to provide electronic Gene Ontology annotations. In addition, we manually inspect all confirmed protein entries in order to produce hand-curated GO annotations.
     Phylogenetic relationship information is provided by a table of precomputed BLAST scores, as well as a link to the BLINK service (Wheeler et al. 2003).

3. Monocot genetic and physical map data: We currently provide access to 22 genetic and physical maps for rice, maize, barley, sorghum, oat and the Triticeae. Included in this set are the Oryza sativa japonica BAC physical map developed by R. Wing (Chen et al. 2002) and 11 widely used genetic maps of rice. We select which maps to publish in consultation with our Scientific Advisory Board, collaborators and other representatives of the academic and commercial research communities.
     For any map displayed on Gramene, researchers can find the molecular data required to reproduce the component marker assays. For rice maps, the marker molecular data (e.g., primer pairs) are retrieved directly from the Gramene database. For other species, the molecular data are retrieved indirectly via links to affiliated databases such as MaizeDB (Coe et al. 2002) and GrainGenes.

4. Ontology annotations of rice mutants and strains: Researchers can access a set of ontologies from the Gramene web site. In addition to the standard GO, we have developed several plant-specific ontologies in collaboration with MaizeDB and TAIR. The Plant Ontology (PO) is a collection of concepts to describe plant anatomic locations and developmental stages. The Trait Ontology (TO) is a collection of concepts to describe abnormal traits and phenotypes. All of these ontologies are preliminary, and a major effort for the coming years is to generalize and refine them as we apply them to monocot genome annotation.
     We provide researchers with access to 470 published rice mutants, each of which has been annotated with PO and TO terms. Mutant records are often accompanied by illustrations of the phenotype and have links to the genome (when known), to the OryzaBase strain database, and to literature references.
     The Gramene architecture is based around the Oracle 9i database, accessed via a large open source middleware layer. We use the Ensembl data model (Clamp et al. 2003) for managing and displaying the rice genome and its annotations, the Gene Ontology Consortium schema for ontologies, and custom software for display of protein annotations, comparative maps, rice mutants, and bibliographic references. With the exception of the Oracle DBMS itself, all software used or produced by Gramene is open source, allowing any group to adapt and use our system.
     The bulk of Gramene's analytic work is producing alignments to the rice genome. We use a version of the Ensembl pipeline called Biopipe, which was recently released by Elia Stupka's group in the Singapore Fugu Sequencing Project. Biopipe is open-source cluster management software that allows us to distribute tasks among multiple machines. The primary workhorse for our alignments is Blat (Clamp et al. 2003), a fast sequence alignment algorithm developed by Jim Kent for use in human genome assembly. The combination of Biopipe and Blat allows us to align 300,000 single reads (ESTs or BAC ends) to the rice genome per hour. We apply stringent acceptance criteria so that only high-quality alignments are retained. Under these criteria the overall alignment success rate is 77% for rice genomic sequence sources, such as BAC ends, and 80% for rice ESTs. ESTs derived from non-rice monocots are aligned unambiguously roughly 70% of the time. These numbers should be viewed in the context of the incomplete state of the rice genome: extrapolating from its current state to a 99% complete state, we expect success rates of 88% and 91% for rice genomic and EST sequences, respectively, by the time the IRGSP sequencing project is complete in December 2004.
     Since going on line in January 2002, the community usage of Gramene has increased rapidly and now stands at over 50,000 meaningful hits per month, where a meaningful hit is defined as one that causes a database access rather than the retrieval of a static page or image.

Publications from the award: 1) Ware et al. 2002a; 2) Ware et al. 2002b; 3) McCouch et al. 2002; 4) Jaiswal et al. 2002

  - Introduction
  The Poaceae family of grasses is one of the most intensely studied families in plant science and is thought to have originated 55-70 million years ago. The grass family includes about 10,000 species, which together cover approximately 20% of the earth's land surface (Shantz 1954). The Poaceae include all the major cereal species, such as corn, sorghum, sugarcane, millet, wheat, barley, rye, oats and rice.
     Conservation of gene order across large sections of grass genomes has been documented for maize and sorghum (Hulbert et al. 1990), rice and maize (Ahn and Tanksley 1993), wheat and rice (Kurata et al. 1994) and maize, wheat and rice (Ahn et al. 1993) over evolutionary periods as long as 65 million years. Moore et al. (1995) devised an ingenious representation of the Poaceae known as "The Circle Diagram", which shows the relationships among the genomes of several members of the Poaceae as concentric circles, with rice (Oryza sativa) forming the smallest circle. These and other studies have been a major driving force in describing the grasses as a "Collective Model Genetics System" for studying plant evolution, development and genetics.
     Rice is the most important food crop in the world. Its compact genome, evolutionary relationship with other cereals and sophisticated molecular genetic tools have made sequencing the rice genome a top priority for plant science (Sasaki and Burr 2000). To meet this priority, the International Rice Genome Sequencing Project (IRGSP) was formed in 1998 with the goal of completely sequencing the rice genome by the end of 2008. The IRGSP relied heavily on the BAC physical map/BAC end sequence framework that the Wing lab constructed to sequence the rice genome (Chen et al. 2002; Mao et al. 2000; Wing et al. 2001). The 2008 timeline was accelerated by announcements from Monsanto (April 2000) (Barry 2001) and Syngenta (January 2001) (Davenport 2001; Goff et al. 2002), and with their help a 10X draft of the rice genome was publicly released on December 18, 2002 (IRGSP 2002). The IRGSP has since set December 2004 as its new target for a complete, finished rice genome. As part of the IRGSP, the ACWW Rice Genome Sequencing Consortium [Arizona Genomics Institute (AGI), Cold Spring Harbor Laboratory, Washington University GSC, University of Wisconsin] was funded in October 1999 to sequence the short arms of rice chromosomes 10 and 3. Together with The Institute for Genomic Research (TIGR) and the Plant Genome Initiative at Rutgers (PGIR, ~3 Mb of ch10), we finished chromosome 10 (Wing et al. 2003) and are scheduled to finish and annotate chromosome 3 by October 2003. Japan and China recently published finished sequences for chromosomes 1 (Sasaki et al. 2002) and 4 (Feng et al. 2002), respectively. Knowing that the rice genome will soon be finished, it is critical that we have the tools in place to properly annotate and functionally characterize it and to apply this information to other grass genomes.
To this end, we were recently awarded an NSF grant to construct large-insert deep-coverage BAC libraries from representatives of the 11 wild genomes of rice shown in Table 1 and described in detail in the results from prior NSF Support section.
Table 1. Oryza BAC libraries under construction
Species            Genome type(1)   Genome size (Mb)(2)   Clones     Genome coverage
O. rufipogon       AA               760                   29,231     5X
O. glaberrima      AA               809                   31,115     5X
O. punctata        BB               539                   41,462     10X
O. officinalis     CC               1201                  92,385     10X
O. minuta          BBCC             1691                  130,077    10X
O. australiensis   DD (EE?)         1054                  81,077     10X
O. latifolia       CCDD             1127                  86,692     10X
O. schlechteri     HHKK             1568                  120,615    10X
O. ridleyi         HHJJ             1568                  120,615    10X
O. brachyantha     FF               343                   26,385     10X
O. granulata       GG               907                   69,769     10X
(1) Genome designations from (Vaughan 1994) and (Ge et al. 1999)
(2) Genome sizes from

     These BAC libraries will be the tools necessary to explore how plants evolve and adapt to variability in genome size and structure, ecological habitats and changes in development and physiology. Within the genus Oryza, genome size varies 5-fold (Table 1), polyploidy exists, there are structural chromosome changes between species (Hass et al. 2003; Shishido et al. 2001), and habitats range from forests to swamps and from the Himalayan foothills to Caribbean islands (Vaughan 1994). Developmentally, these species differ in many aspects of plant growth and response to environmental stimuli. Because the central species, O. sativa, is a crop, the ability to examine crop evolution and domestication at the gene and genome level represents an unprecedented opportunity. Several genes important to rice domestication have been placed on the high-density O. sativa linkage map (Cai and Morishima 2000; Zhou et al. 2001), so specific genes involved in crop domestication will also be open to study. Because of the depth of genetic information available for O. sativa, this Oryza BAC resource will form a closed system for studying the evolution of specific physiological/developmental genes and large-scale genome events.
     Such work will also lead to rapid isolation of developmentally, physiologically and agriculturally important loci and regulatory regions for comparative functional studies.
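The Clones column of Table 1 can be reproduced from the genome size and coverage columns. A minimal Python sketch, assuming genome sizes are in Mb and that each library was planned around an average insert of about 130 kb. The 130 kb figure is not stated in the table; it is inferred because it reproduces every row.

```python
# Planned library sizes for Table 1, reconstructed under an ASSUMED
# average insert size of 130 kb (inferred from the table, not stated in it).
LIBRARIES = {
    # species: (genome size in Mb, planned genome coverage)
    "O. rufipogon": (760, 5),
    "O. glaberrima": (809, 5),
    "O. punctata": (539, 10),
    "O. officinalis": (1201, 10),
    "O. minuta": (1691, 10),
    "O. australiensis": (1054, 10),
    "O. latifolia": (1127, 10),
    "O. schlechteri": (1568, 10),
    "O. ridleyi": (1568, 10),
    "O. brachyantha": (343, 10),
    "O. granulata": (907, 10),
}


def clones_needed(genome_mb: float, coverage: float, insert_kb: float = 130.0) -> int:
    """Clones required so that clones * insert size = coverage * genome size."""
    return round(genome_mb * 1000 * coverage / insert_kb)


if __name__ == "__main__":
    total = 0
    for species, (size_mb, cov) in LIBRARIES.items():
        n = clones_needed(size_mb, cov)
        total += n
        print(f"{species:18s} {n:>8,d}")
    print(f"{'Total clones':18s} {total:>8,d}")
    print(f"{'Total BAC ends':18s} {2 * total:>8,d}")
```

Under this assumption the totals are 829,423 clones and 1,658,846 BAC ends. These round to the 830,000 fingerprints and 1,659,000 BAC ends quoted in the objectives.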

  - Objectives
    The long-term focus of our research is to develop and exploit the tools of genomics to make the Oryza genus the most advanced and tractable model system in the world for studying plant evolution, development, physiology and crop science. The specific objectives of this proposal are to: 1) construct DNA fingerprint/BAC-end sequence physical maps from 11 deep-coverage BAC libraries that represent the 11 wild genomes of Oryza (830,000 fingerprints; 1,659,000 BAC ends) (Table 1); 2) align the 11 physical maps with the sequenced reference subspecies japonica and indica; 3) construct high-resolution physical maps of rice chromosomes 10 and 3 across the 11 wild genomes using a combination of hybridization and in silico anchoring strategies; and 4) provide convenient bioinformatics research and educational tools (FPC and web-based) to rapidly access and understand the collective Oryza genome.
  - Experimental Plan
    - Agarose Fingerprinting
    We will fingerprint all BAC clones from each wild genome library, a total of 830,000 clones, using robust techniques established in the AGI Physical Mapping Center (Chen et al. 2002; Marra et al. 1997). Briefly, DNA from BAC clones will be prepared from 1.2 ml cultures in 96-well format by a modified alkaline lysis method. Our lab employs a Tomtec Quadra 320 liquid-handling robot to reduce manual pipetting errors. Typical yields from 1.2 ml cultures provide sufficient DNA for both a fingerprinting digest and two end-sequencing reactions, so DNA from each prep will be divided into one fingerprinting sample and two end-sequencing samples. The fingerprinting sample will be digested with HindIII and electrophoresed on high-resolution agarose gels. Fingerprint data will be scored using Image3 software from the Sanger Center. The AGI fingerprinting lab has an established throughput of twenty-four 96-well plates per day, allowing production of 11,520 fingerprints per week. Bench work should be completed in approximately 18 months, and band calling within 24 months of project initiation. We plan to fingerprint one library at a time to avoid possible contamination between libraries. Band data will be uploaded to the assembly program FPC (Soderlund et al. 1997), and all available marker data will be included in the assembly. Merging of contigs and additional finishing of the physical maps will be done in collaboration with CSHL and Purdue as the data set for each library is completed.
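The throughput figures above can be checked against the 18-month bench-work estimate. The following is a small Python sketch. It assumes a 5-day working week, which is the value that reconciles 24 plates per day with the quoted 11,520 fingerprints per week. The month conversion (52/12 weeks per month) is also an assumption.

```python
import math

# Throughput figures quoted in the fingerprinting plan.
PLATES_PER_DAY = 24
WELLS_PER_PLATE = 96
TOTAL_CLONES = 830_000
# ASSUMPTION: a 5-day working week, which reconciles 24 plates/day
# with the quoted 11,520 fingerprints/week.
WORKING_DAYS_PER_WEEK = 5
WEEKS_PER_MONTH = 52 / 12  # ASSUMPTION: average month length


def fingerprint_schedule(total_clones: int = TOTAL_CLONES):
    """Return (per day, per week, whole weeks needed, months needed)."""
    per_day = PLATES_PER_DAY * WELLS_PER_PLATE
    per_week = per_day * WORKING_DAYS_PER_WEEK
    weeks = total_clones / per_week
    return per_day, per_week, math.ceil(weeks), weeks / WEEKS_PER_MONTH


if __name__ == "__main__":
    per_day, per_week, weeks, months = fingerprint_schedule()
    print(f"{per_day} fingerprints/day, {per_week} per week")
    print(f"{weeks} weeks of bench work, about {months:.1f} months")
```

This gives about 16.6 months of uninterrupted bench work. That fits inside the 18-month estimate, leaving some margin for repeats and downtime.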
     Although other methods are available for BAC fingerprinting using capillary electrophoresis (e.g., high information content fingerprinting, HICF; Ding et al. 2001), we selected the agarose method because it is much less expensive (~$2,000,000 less than HICF for this project) and is robust in our laboratory. Further, because all BACs will be end-sequenced (below) and the rice genome will be finished, we believe the contigs produced using the agarose method will be sufficient for aligning them to the reference rice genome. Ed Butler will run and coordinate the day-to-day operations of this component.
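FPC (Soderlund et al. 1997) decides whether two agarose fingerprints overlap using the Sulston coincidence score. This score is the probability that two clones share at least the observed number of bands purely by chance. Below is a simplified Python sketch of that calculation. The tolerance and gel-length values are illustrative placeholders, and FPC's own implementation may differ in detail.

```python
from math import comb


def sulston_score(n1: int, n2: int, shared: int, tolerance: float, gel_length: float) -> float:
    """Probability that fingerprints with n1 and n2 bands share at least
    `shared` bands by coincidence (Sulston-style coincidence score).

    tolerance and gel_length must use the same units; the values in the
    example below are placeholders, not FPC settings.
    """
    n_small, n_large = sorted((n1, n2))
    # Chance that a band falls within +/- tolerance of one given band.
    b = 2 * tolerance / gel_length
    # Chance that a band of the smaller clone matches at least one band
    # of the larger clone purely by chance.
    p = 1 - (1 - b) ** n_large
    # Binomial tail: P(at least `shared` of the n_small bands match).
    return sum(
        comb(n_small, j) * p**j * (1 - p) ** (n_small - j)
        for j in range(shared, n_small + 1)
    )


if __name__ == "__main__":
    # Two hypothetical clones with 30 and 32 bands on a hypothetical gel.
    for shared in (10, 20, 25):
        score = sulston_score(30, 32, shared, tolerance=7, gel_length=3300)
        print(f"{shared} shared bands -> coincidence probability {score:.2e}")
```

Lower scores mean that the shared bands are less likely to be coincidental. FPC joins clones whose score falls below a user-chosen cutoff.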
    - BAC End Sequencing
    We will sequence both ends of the 830,000 BAC clones (1,659,000 ends: 3/4 at AGI, 1/4 at Purdue) from the same template preparations used for fingerprinting, using routine, standard techniques established in the AGI DNA Sequencing Center. Briefly, 5 µl of DNA isolated as described above is reacted with 4 µl of sequencing chemistry (ABI BigDye v.3.x) and either the T7 or the M13 reverse primer in a total reaction volume of 15 µl. Cycle sequencing is performed on a Tetrad thermocycler (MJ Research) in 96-well format with the following parameters: 96 °C for 4 min, followed by 100 cycles of 95 °C for 15 sec, 50 °C for 10 sec and 60 °C for 3 min. Ethanol precipitation is used to remove excess dye terminators, and purified reactions are resuspended in 5 µl of Hi-Di formamide (Applied Biosystems). Reactions are separated on an ABI 3730xl DNA sequencer with a 90 min run and a 45 sec injection time. Phred (Ewing and Green 1998; Ewing et al. 1998) and Cross_match (Green 1999) are used for base calling and vector trimming, respectively. Successful reads (those with more than 100 bp at Phred quality 20 or higher) are collected and submitted to GenBank, together with their trace files, by our automated AGI sequence pipeline. All BAC end sequences will also be displayed on the web at AGI. Under these standard conditions, we routinely achieve a 90% success rate, with more than 650 bp of high-quality bases per BAC end read. We are considering using a liquid-handling robot to set up and clean up reactions, to reduce human error and increase consistency between plates. The AGI DNA Sequencing Center routinely generates over 3,000 reads/day (maximum 3,456 reads) with two 3730xl DNA sequencers. A third ABI 3730xl at AGI and a fourth at Purdue will increase throughput to over 5,000 reads/day, and we expect to complete the 1.6M reactions within two years of project initiation without conflicting with currently funded projects.
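The success criterion above is based on Phred quality. A Phred score Q corresponds to an error probability of 10^(-Q/10), so Q20 means a 1% chance that the base is wrong. Here is a minimal Python sketch of the criterion. It assumes that high-quality bases are counted anywhere in the vector-trimmed read, since the text does not say whether they must be contiguous.

```python
def phred_error_probability(q: int) -> float:
    """Phred quality Q corresponds to an error probability of 10^(-Q/10)."""
    return 10 ** (-q / 10)


def high_quality_bases(quals, min_q=20):
    """Count bases whose Phred quality is at least min_q."""
    return sum(1 for q in quals if q >= min_q)


def is_successful_read(quals, min_q=20, min_bases=100):
    """More than `min_bases` bases at Phred >= min_q.

    ASSUMPTION: bases are counted anywhere in the trimmed read; the
    criterion in the text does not say whether they must be contiguous.
    """
    return high_quality_bases(quals, min_q) > min_bases


if __name__ == "__main__":
    good_read = [35] * 650 + [12] * 100   # hypothetical quality trace
    short_read = [30] * 80 + [8] * 400    # hypothetical failed read
    print(is_successful_read(good_read), is_successful_read(short_read))
```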
     Yeisoo Yu will run and coordinate the day-to-day operations of this component at AGI. AGI will perform DNA isolations and sequencing reactions for all samples. One quarter of the reacted samples will be shipped to Purdue, where they will be loaded onto automated DNA sequencers, base-called and submitted to GenBank under the direction of Phillip SanMiguel.
    - Informatics
    Fingerprint maps and end sequences produced by this project will be transferred by electronic means to Cold Spring Harbor Laboratory, where they will be aligned to the Oryza sativa japonica and indica genomes, analyzed for SNPs and repeats and integrated into the resources available through Gramene and made available to the research community.

Data Transfer between CSHL and AGI/Purdue: We will set up a secure incoming FTP site at CSHL that allows AGI and Purdue to deposit fingerprint maps and BAC end sequences. BAC end sequence data will include the quality scores produced by Phred (Ewing and Green 1998; Ewing et al. 1998) in order to facilitate SNP calling. We will transfer BAC contig data using the .ace format output by the FPC program. Uploads will be checksummed to ensure completeness and acknowledged by the project data manager.
    Analysis results, such as mapping positions and SNPs, will be returned to AGI via an outgoing FTP directory. Both the incoming and outgoing FTP sites will be password protected to avoid casual abuse.
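The text says that uploads will be checksummed, but it does not name the checksum algorithm or the manifest format. Here is a minimal Python sketch that assumes SHA-256 digests. It also assumes the uploading site lists them in a `sha256sum`-style manifest (`<digest>  <file name>` per line) alongside the data. File names used with it are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_bytes(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def sha256_file(path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large map or sequence files need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            h.update(block)
    return h.hexdigest()


def parse_manifest(text: str) -> dict:
    """Parse sha256sum-style lines: '<hex digest>  <file name>'."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        digest, name = line.split(None, 1)
        entries[name.lstrip("*")] = digest.lower()  # '*' marks binary mode
    return entries


def verify_upload(directory, manifest_text: str) -> dict:
    """Return {file name: True/False}; missing files count as failures."""
    results = {}
    for name, expected in parse_manifest(manifest_text).items():
        path = Path(directory) / name
        results[name] = path.is_file() and sha256_file(path) == expected
    return results
```

In this sketch, any `False` entry marks a file that should be re-sent before the data manager acknowledges the upload.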

Mapping of BAC End Sequences to the Rice Genome: We will use the Biopipe/Blat mapping pipeline to align BAC end sequences from the 11 wild rice species to the reference japonica and indica genomes. Since these genomes are estimated to differ from each other at one position every 200 bp, the polymorphism rate is well below the single-pass sequencing error rate, and the presence of SNPs will not present a significant obstacle to alignment.
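A rough per-read comparison makes this argument concrete. It assumes a 650 bp high-quality read, the length quoted for BAC end sequencing. It also takes Phred 20 as a worst-case per-base error rate for accepted bases. The expected numbers of mismatches from each source are then:

```latex
E[\text{SNP mismatches}] \approx \frac{650}{200} \approx 3.3,
\qquad
E[\text{sequencing errors}] \le 650 \times 10^{-20/10} = 6.5
```

Both counts are small relative to the read length. Therefore an aligner tuned to tolerate single-pass sequencing errors should also tolerate the expected divergence. Most accepted bases score well above Q20, so the error figure is an upper bound.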
     Under our current mapping protocols, 80% of the BAC end sequences are expected to align uniquely to the genome. Because each contig contains many BACs, each contributing two end sequences, it is very unlikely that no end in a contig aligns uniquely. Essentially all BAC contigs should therefore be anchorable to the reference genome, except at locations where the genome sequence contains gaps due to centromeric repeats or unclonable regions.
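The exact acceptance rules of the mapping protocol are not given here. As an illustration only, a BAC end can be treated as uniquely placed when exactly one alignment passes an identity threshold. It can also be treated that way when the best passing alignment clearly outscores the runner-up. The `Hit` record and both thresholds in this Python sketch are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Hit:
    """One alignment of a BAC end to the reference (hypothetical record)."""
    chrom: str
    pos: int
    identity: float  # fraction of aligned bases that match
    score: float     # aligner score; higher is better


def classify_end(
    hits: List[Hit], min_identity: float = 0.95, min_score_ratio: float = 1.1
) -> Tuple[str, Optional[Hit]]:
    """Return ('unique', best_hit), ('multiple', None) or ('unmapped', None).

    Both thresholds are illustrative placeholders, not the protocol's values.
    """
    passing = sorted(
        (h for h in hits if h.identity >= min_identity),
        key=lambda h: h.score,
        reverse=True,
    )
    if not passing:
        return "unmapped", None
    if len(passing) == 1 or passing[0].score >= min_score_ratio * passing[1].score:
        return "unique", passing[0]
    return "multiple", None  # e.g. a BAC end falling in a repeat
```

At the expected 80% rate of unique placement, roughly 1,327,200 of the 1,659,000 BAC ends would receive a single genomic position.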
     An important internal quality control metric will be contiguity. Ideally, all BAC end sequences in a contig will map to a contiguous region. In practice, we will have to deal with three confounding factors:

1. Incorrect mapping of one or more BAC ends. An error in mapping a BAC end sequence will lead to outliers in the distribution of map positions in the contig. This can be recognized easily because the mismapped ends will be in the minority and their map positions will be uncorrelated with their positions within the contig.

2. An FPC contig misassembly. Typically this will appear as a "chimeric" contig in which the BAC ends in two or more segments of the contig map to different areas in the genome. Within a segment, the map positions will be correlated. Such cases will be reported back to AGI for manual inspection.

3. A genomic rearrangement. The contig is correct, but there has been a genomic rearrangement between a wild species and the reference species since divergence from their common ancestor. For large rearrangements that are not spanned by the BAC contig, this case will often be indistinguishable from (2). For small inversions and rearrangements in which both endpoints of the rearrangement are captured by the BAC contig, we will be able to distinguish and flag this event. In any event, these cases will be reported back to AGI for manual inspection. Cases that cannot be confirmed to be the result of a BAC contig misassembly will be returned to CSHL for entry into the database as a putative rearrangement event.
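The three cases above suggest a simple first-pass triage of each contig's BAC-end map positions. The sketch below clusters uniquely mapped ends by reference position and treats small clusters as probable mismapped ends (case 1). It flags contigs with two or more well-supported clusters for manual review, because at this level a misassembly (case 2) and a large rearrangement (case 3) look the same. The `max_gap` and `min_support` thresholds and the input layout are hypothetical choices for illustration, not the project's pipeline. A fuller check would also confirm that map positions within each block are correlated with clone order in the contig.

```python
from typing import List, Tuple

# One uniquely mapped BAC end: (clone order in the FPC contig, reference chromosome, position in bp)
Hit = Tuple[int, str, int]

def cluster_hits(hits: List[Hit], max_gap: int) -> List[List[Hit]]:
    """Group hits on the same chromosome that lie within max_gap bp of the previous hit."""
    clusters: List[List[Hit]] = []
    for hit in sorted(hits, key=lambda h: (h[1], h[2])):
        last = clusters[-1][-1] if clusters else None
        if last and last[1] == hit[1] and hit[2] - last[2] <= max_gap:
            clusters[-1].append(hit)
        else:
            clusters.append([hit])
    return clusters

def classify_contig(hits: List[Hit], max_gap: int = 300_000, min_support: int = 3):
    """Return (status, blocks, outliers) where status is 'unanchored', 'contiguous' or 'review'."""
    clusters = cluster_hits(hits, max_gap)
    blocks = [c for c in clusters if len(c) >= min_support]
    # Ends in small clusters are treated as probable mismappings (case 1).
    outliers = [h for c in clusters if len(c) < min_support for h in c]
    if not blocks:
        status = "unanchored"
    elif len(blocks) == 1:
        status = "contiguous"
    else:
        status = "review"  # misassembly (case 2) or rearrangement (case 3)
    return status, blocks, outliers
```

Contigs returned as `review` correspond to the cases the text sends back to AGI for manual inspection.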

Identification of SNPs (Stein): After aligning BAC ends to the genome, we will call SNPs using the SSAHA-SNP program (Mullikin and Ning 2003). This software uses neighborhood quality scores to identify high likelihood SNPs in single genomic reads aligned to a reference genome. SSAHA-SNP was used by The SNP Consortium (TSC) to call over 2 million SNPs in the human genome, and the Cold Spring Harbor group has experience in using the software from its participation in this project.
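A simplified filter illustrates the neighborhood quality idea. A mismatch counts as a candidate SNP only if the base itself has high quality and its flanking bases are both high quality and identical to the reference. The thresholds here are illustrative assumptions, not SSAHA-SNP's documented parameters: Q20 for the candidate base and Q15 for five flanking bases on each side. The flank-match requirement is also an assumption, and the read is assumed to be already aligned to the reference without gaps.

```python
from typing import List, Tuple

def nqs_snp_candidates(read: str, ref: str, quals: List[int], min_q: int = 20,
                       flank_q: int = 15, flank: int = 5) -> List[Tuple[int, str, str, int]]:
    """Return (offset, ref_base, read_base, quality) for mismatches that pass a
    neighborhood-quality filter. read, ref and quals must be gap-free and equal length."""
    if not (len(read) == len(ref) == len(quals)):
        raise ValueError("read, reference and qualities must be the same length")
    calls = []
    for i, (read_base, ref_base) in enumerate(zip(read, ref)):
        if read_base == ref_base or quals[i] < min_q:
            continue
        lo, hi = i - flank, i + flank + 1
        if lo < 0 or hi > len(read):
            continue  # not enough flanking sequence to judge the neighborhood
        flanks_ok = all(quals[j] >= flank_q and read[j] == ref[j]
                        for j in range(lo, hi) if j != i)
        if flanks_ok:
            calls.append((i, ref_base, read_base, quals[i]))
    return calls
```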
     For the purposes of OMAP, we will restrict SNP calling to regions of finished genome sequence, because we anticipate difficulties in obtaining Phrap quality scores for phase II rice sequence data. We know that, at the start of the project, finished sequence will be available for at least rice chromosomes 1, 3, 4 and 10. SNP calling on other chromosomes will be performed as the IRGSP releases their finished sequences.
     In addition to calling SNPs between the wild strains and the reference genomes, we will call SNPs between the wild strains themselves. Putative SNPs will be correlated with genomic annotations and characterized according to whether they lie within genes or exons, introduce a non-synonymous amino acid change, or affect a splice site. This characterization will use software previously developed for the TSC project.
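For SNPs in coding exons, the synonymous/non-synonymous distinction comes down to translating the reference and variant codons. The sketch below is a stand-in for illustration, not the TSC software mentioned above. It uses the standard genetic code and assumes the codon is given on the coding strand, with a 0-based position within the codon. Splice-site checks, such as variants in the first or last two bases of an intron, would be a separate test and are not shown.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order ('*' = stop codon).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def classify_coding_snp(codon: str, position: int, alt: str) -> str:
    """Classify a SNP at position 0-2 of a coding-strand codon."""
    if position not in (0, 1, 2):
        raise ValueError("position must be 0, 1 or 2")
    codon = codon.upper()
    mutant = codon[:position] + alt.upper() + codon[position + 1:]
    ref_aa, alt_aa = CODON_TABLE[codon], CODON_TABLE[mutant]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"
    if ref_aa == "*":
        return "stop_lost"
    return "nonsynonymous"

print(classify_coding_snp("GAA", 2, "G"))  # GAA -> GAG, both glutamate
```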

Evolution of repetitive DNA sequences in the wild Oryza genomes (SanMiguel and Jackson): Repetitive DNA sequences constitute the majority of most plant genomes. In rice, repetitive elements comprise nearly 40% of the sequenced genome. Previous studies have shown that the copy numbers of certain repeats vary among the Oryza genomes, and that this variation may have contributed to differences in genome size (Uozu et al. 1997). However, compared with other cereals (e.g. wheat or maize), genome size varies little among Oryza species, apart from the variation due to polyploidy. Despite this, the repeats that make up a major portion of each Oryza genome do differ from genome to genome (reviewed in Uozu et al. 1997). Furthermore, we know that transposable elements insert and then mutate beyond recognition in less than 10 million years (SanMiguel et al. 2002), and that approximately 15 million years of evolution are represented in the genus Oryza (E. Kellogg, personal communication). The BAC end sequence data, representing nearly 10% of each genome, present an opportunity to sample the repetitive fraction of each genome.
     Studies of the transposable element fraction of orthologous segments of grain genomes, for example in maize and sorghum (Tikhonov et al. 1999) and in wheat and barley (SanMiguel et al. 2002), found no conserved transposable elements that could convincingly be shown to have inserted before the divergence of the respective species. The insertion time of one class of transposable elements, the LTR retrotransposons, can be dated from the sequence divergence between their 5' and 3' long terminal repeats (LTRs) (SanMiguel et al. 1998; SanMiguel et al. 2002). The maize and wheat sequences compared were composed largely of this type of element. Most of the retrotransposons that could be dated had inserted within the last 3 million years, and none had inserted more than 6 million years ago.
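LTR dating works because the two LTRs of a retrotransposon are identical at insertion and then diverge independently. The time since insertion is therefore T = K / (2r), where K is the number of substitutions per site between the LTRs and r is the substitution rate. The sketch applies a Jukes-Cantor correction to the raw divergence. The default rate of 6.5 x 10^-9 substitutions per site per year is the grass rate used in earlier LTR-dating work; it is treated here as an assumption, not as a value established for Oryza. If transposable elements mutate faster than this, as discussed below, the estimated ages are too old.

```python
import math

def p_distance(ltr5: str, ltr3: str) -> float:
    """Proportion of differing sites between two aligned, gap-free LTR sequences."""
    if len(ltr5) != len(ltr3) or not ltr5:
        raise ValueError("LTRs must be aligned to the same non-zero length")
    return sum(a != b for a, b in zip(ltr5.upper(), ltr3.upper())) / len(ltr5)

def jukes_cantor(p: float) -> float:
    """Jukes-Cantor corrected substitutions per site: K = -3/4 * ln(1 - 4p/3)."""
    if p >= 0.75:
        raise ValueError("sequences too divergent for the Jukes-Cantor correction")
    return -0.75 * math.log(1 - 4 * p / 3)

def insertion_age_years(ltr5: str, ltr3: str, rate: float = 6.5e-9) -> float:
    """Estimated years since insertion, T = K / (2 * rate). The factor of 2 reflects
    that both LTRs accumulate substitutions after insertion. The rate is an assumption."""
    return jukes_cantor(p_distance(ltr5, ltr3)) / (2 * rate)
```

For a 100 bp LTR pair with a single difference (p = 0.01), this gives an age of roughly 770,000 years.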
     Evidence of insertions that occurred more than 6 million years ago appears to be lost through deletion by illegitimate recombination (Devos et al. 2002). Hence, comparing orthologous sequences in species that diverged more than 10 million years ago does not allow one to study orthologous transposable elements. Given the more limited temporal scope of the species studied in this proposal, orthologous transposable elements should be found.
     A BAC-end sequence that is composed entirely of a repetitive sequence will confound the anchoring of that BAC-end on the Nipponbare genomic sequence on the basis of sequence similarity. However, it might be anchored as a member of a fingerprinted BAC contig. Further, a sequence that is repeated many times in the entire genome may be present only once in a segment of the Nipponbare genome to which a contig is anchored. In this limited segment of the genome, the sequence read would no longer be repetitive and in many places could be placed unambiguously. The majority of such repeats would likely be transposable elements.
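This kind of local disambiguation can be expressed as a simple filter. Take every genome-wide alignment of a repetitive read, and keep it only if exactly one hit falls inside the reference interval to which the read's fingerprint contig has been anchored. The function name and input layout below are hypothetical. The hit list is assumed to come from an aligner that reports all hits, not only the best one.

```python
from typing import List, Optional, Tuple

def place_within_anchor(hits: List[Tuple[str, int]], chrom: str,
                        start: int, end: int) -> Optional[int]:
    """Return the read's position if exactly one of its genome-wide hits
    (chromosome, position) falls inside the anchored window [start, end];
    otherwise return None because the read is still ambiguous locally."""
    local = [pos for c, pos in hits if c == chrom and start <= pos <= end]
    return local[0] if len(local) == 1 else None
```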
    The insertion times of numerous retrotransposons have been estimated. However, these estimates rest on the assumption that retrotransposon sequences mutate at approximately the same rate as other sequences not under selection, such as synonymous codon positions or introns. The mutation rate of retrotransposons (and other transposable elements) may well be much higher. Where BAC end sequences from several Oryza species can be shown both to overlap a retrotransposon and to be positioned orthologously, the mutation rate of this type of DNA can be estimated. The BAC libraries will be constructed with only a few restriction enzymes, so BAC ends from different species will tend to begin at the same restriction sites wherever those sites are conserved. This substantially increases the chance that a read from the ~0.1x BAC-end coverage of one species will line up with BAC-end reads from other non-Nipponbare species.
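For comparison, consider what would happen if BAC ends fell at random positions. The Lander-Waterman expectation is that a fraction 1 - e^(-c) of a genome is covered at coverage c. The sketch below uses that random-placement model, which the restriction-site bias is expected to improve on. It shows how rarely the 0.1x read sets of two or more species would overlap the same orthologous base by chance. It also assumes read placement is independent between species.

```python
import math

def fraction_covered(coverage: float) -> float:
    """Lander-Waterman expected fraction of bases covered at least once by randomly placed reads."""
    return 1 - math.exp(-coverage)

def fraction_shared(coverage: float, n_species: int) -> float:
    """Expected fraction of orthologous bases covered in all n_species,
    assuming independent random read placement in each species."""
    return fraction_covered(coverage) ** n_species

for k in (1, 2, 3):
    print(k, round(fraction_shared(0.1, k), 5))
```

Under random placement, only about 0.9% of orthologous bases would be covered in both of two species. This is why ends that start at shared restriction sites matter for the rate estimates.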
     A repeat database will be constructed for each Oryza genome, structured to map each repeat class onto a complete transposable element where possible. For each repeat discovered, we will also record whether it has been reliably mapped to a position in the Nipponbare genomic sequence. The repeats present in each genome will be classified by presumed transposition mechanism (RNA or DNA) and further sub-classified where appropriate. The repeat databases will then be compared pairwise, focusing especially on sequences present at orthologous locations. In addition, conserved classes of repeats will be used to infer genome phylogeny.
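One possible shape for a repeat database record, together with the pairwise comparison at orthologous locations, is sketched below. All field names, the family labels in the example and the 10 kb binning of Nipponbare positions are hypothetical. Binning is only a crude stand-in for a real orthology test, and features that straddle a bin boundary would be missed.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional, Set, Tuple

class Mechanism(Enum):
    RNA = "retrotransposon (Class I, RNA intermediate)"
    DNA = "DNA transposon (Class II)"
    UNKNOWN = "unclassified"

@dataclass
class RepeatRecord:
    read_id: str
    species: str
    family: str                                          # repeat family name
    mechanism: Mechanism
    subclass: Optional[str] = None                       # e.g. "LTR/gypsy"
    nipponbare_locus: Optional[Tuple[str, int]] = None   # (chromosome, position) if reliably placed

def shared_orthologous_families(db_a: List[RepeatRecord], db_b: List[RepeatRecord],
                                bin_size: int = 10_000) -> Set[Tuple[str, str, int]]:
    """(family, chromosome, bin) keys present in both databases, using only reliably placed repeats."""
    def keys(db: List[RepeatRecord]) -> Set[Tuple[str, str, int]]:
        return {(r.family, r.nipponbare_locus[0], r.nipponbare_locus[1] // bin_size)
                for r in db if r.nipponbare_locus is not None}
    return keys(db_a) & keys(db_b)
```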
     To further understand how these repetitive elements evolve and mold genomes, select repetitive elements will be mapped to chromosomes and extended DNA fibers to get a picture of their long-range (chromosomal) organization. Together with the sequence data this will present a picture of the copy numbers of repetitive DNA sequences, their sequence arrangement and conservation as well as the chromosomal distribution and evolution of these repeats. This approach will provide a unique, almost complete, snapshot of how repetitive DNA sequences evolve in a family structure and how they mold genomes/chromosomes.

Databasing of Results (Stein): Data from FPC contig mapping and SNP calling activities will be entered into the Gramene project database. Our schema does not currently handle deep sets of variations among a collection of genomes.
     To accommodate this, we will modify our schema so that it follows a data model originally developed by the Neomorphic Corporation for use in the Celera annotation of Drosophila, and since adapted and used for the Chado modular sequence database schema []. In this data model, a region of the genome is represented as a set of triplets S=[[r1,s1,e1],[r2,s2,e2]...] where r is the ID of the reference coordinate system (e.g. the contig), and s and e are the start and end of the sequence using interbase coordinates. An additional pointer provides access to the variant nucleotide sequence itself. This data model provides sufficient flexibility to represent a deep set of variations including single nucleotide polymorphisms, indels, inversions, and rearrangements.
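A minimal sketch of the triplet model is shown below. It assumes 0-based interbase coordinates, in which positions fall between bases. Under that convention a SNP spans one unit, an insertion is a zero-length location, and `seq[start:end]` extracts a feature directly. The class and field names are hypothetical and are not the Chado schema. Orientation is omitted; Chado's feature location table records it in a separate strand column.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass(frozen=True)
class Location:
    """One [r, s, e] triplet: a reference coordinate system ID plus interbase start and end."""
    ref: str      # e.g. a contig ID
    start: int    # interbase coordinate; 0 is the point before the first base
    end: int

    def length(self) -> int:
        return self.end - self.start

@dataclass
class Variation:
    kind: str                                                 # e.g. "SNP", "indel", "inversion"
    locations: List[Location] = field(default_factory=list)   # S = [[r1,s1,e1],[r2,s2,e2],...]
    variant_seq: Optional[str] = None                         # stands in for the variant-sequence pointer

def extract(reference_seq: str, loc: Location) -> str:
    """With interbase coordinates, the bases covered by a location are a plain slice."""
    return reference_seq[loc.start:loc.end]

contig = "ACGTACGTAC"
snp = Variation("SNP", [Location("ctg1", 4, 5)], variant_seq="G")          # A -> G at the 5th base
insertion = Variation("indel", [Location("ctg1", 4, 4)], variant_seq="TT")  # zero-length site
```

Zero-length locations let insertions be stored in the same structure as SNPs and deletions. This is one reason interbase coordinates suit a deep set of variations.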

Analysis of Conserved Regions (Stein): The alignment of multiple wild rice species to the reference genome will give us an opportunity to proofread and improve the rice genome annotations. For example, an Oryza sativa pseudogene misannotated as a protein-coding gene in the reference sequence can be caught if it is entirely missing from the wild rice species, or if it has accumulated one or more frameshift or nonsense mutations in them. Conversely, an unannotated region of the genome in which the polymorphism rate drops significantly may indicate a missed functionally important element, such as a protein-coding or RNA gene. We will systematically examine the genome for such regions.
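A first pass at finding unusually conserved stretches could slide a window along a reference region and flag windows whose SNP count falls well below the regional mean. The window size, step and the 20%-of-mean cutoff below are hypothetical parameters. An important caveat applies: BAC ends sample only a fraction of each genome, so a real scan would have to divide SNP counts by the number of wild-species bases actually aligned in each window. Otherwise, sparse coverage alone would produce SNP-free windows.

```python
import bisect
from typing import List, Tuple

def low_polymorphism_windows(snp_positions: List[int], region_length: int,
                             window: int = 1000, step: int = 500,
                             fraction_of_mean: float = 0.2) -> List[Tuple[int, int, int]]:
    """Report (start, end, snp_count) for windows [start, end) whose SNP count is at most
    fraction_of_mean times the mean count per window across the region.
    Positions are 0-based offsets into the region."""
    positions = sorted(snp_positions)
    mean_per_window = len(positions) / region_length * window
    flagged = []
    for start in range(0, region_length - window + 1, step):
        end = start + window
        # number of SNPs with start <= position < end
        count = bisect.bisect_left(positions, end) - bisect.bisect_left(positions, start)
        if count <= fraction_of_mean * mean_per_window:
            flagged.append((start, end, count))
    return flagged
```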
     Although the evolutionary closeness of the wild and domestic strains diminishes the information available to such tools, we will experiment with recently published software that uses sequence conservation to improve predictions of protein-coding and RNA genes. We will scan for protein-coding genes using the TwinScan package (Korf et al. 2001), which was recently used to improve gene predictions in regions of human/mouse orthology (Flicek et al. 2003). TwinScan extends ab initio gene prediction in one genome with alignments to a second, informant genome. It takes into account such factors as the increased likelihood of seeing polymorphisms at the wobble base of coding sequence. We will also scan the genome for putative functional RNA genes using QRNA (Rivas and Eddy 2001). QRNA identifies candidate RNA genes by searching for coordinated mutations across the axes of symmetry introduced by RNA stem-loop structures.
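QRNA itself compares probabilistic models (coding, structural RNA and null) of a pairwise alignment, and the sketch below does not reimplement it. It only illustrates the signal that a structural RNA model rewards: compensatory substitutions in which both sides of a stem base pair change while pairing is preserved. The stem pairing list and sequences are hypothetical inputs, and the two sequences are assumed to be already aligned without gaps.

```python
from typing import List, Tuple

# Watson-Crick pairs plus G-U wobble pairs.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

def _as_rna(seq: str) -> str:
    return seq.upper().replace("T", "U")

def compensatory_changes(seq_a: str, seq_b: str, stem_pairs: List[Tuple[int, int]]) -> int:
    """Count stem base pairs (i, j) that differ at both positions between two aligned
    sequences while still forming a canonical or G-U pair in both."""
    a, b = _as_rna(seq_a), _as_rna(seq_b)
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    count = 0
    for i, j in stem_pairs:
        both_sides_changed = a[i] != b[i] and a[j] != b[j]
        pairing_kept = (a[i], a[j]) in PAIRS and (b[i], b[j]) in PAIRS
        if both_sides_changed and pairing_kept:
            count += 1
    return count
```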

Data Presentation (Stein): The results of these analyses will be presented on the Gramene web site. The physical maps of the wild rice strains will be displayed side-by-side with each other and with the reference maps using the comparative map viewer CMap. This will provide an intuitive visualization of rearrangements among the various wild genomes of rice. The contig maps displayed on the Gramene viewer will be linked to the WebFPC viewer at AGI for the convenience of researchers wishing to examine the underlying fingerprint data.
     BAC end alignments that support the alignment of FPC contigs to the reference genomes will be displayed using the Gramene sequence viewer. Researchers will be able to "drill down" into the data to the individual nucleotide level, where they will be shown a multiple alignment of the reference genome and one or more aligning wild strains.
     We will provide specialized query tools in addition to the standard search tools already on the Gramene web site. One query tool will allow researchers to search for and download sets of SNPs selected by map position and type, such as coding region SNPs. Another tool will allow researchers to retrieve data on all wild strain contigs that overlap a particular region of the reference genome.
     For the convenience of researchers wishing to design assays for selected SNPs, we will provide a primer-picking interface to the PRIMER3 (Rozen and Skaletsky 2000) program.
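As an illustration of what such an interface might send to Primer3, the sketch builds a Boulder-IO input record that asks for primers flanking a SNP. The tag names follow the Primer3 2.x input conventions (SEQUENCE_ID, SEQUENCE_TEMPLATE, SEQUENCE_TARGET, PRIMER_PRODUCT_SIZE_RANGE); older releases used different tag names. The function name, the flank lengths and the default product size range are assumptions.

```python
def primer3_record(snp_id: str, flank_5: str, snp_allele: str, flank_3: str,
                   product_size: str = "100-300") -> str:
    """Build a Boulder-IO record asking Primer3 for primers that flank the SNP base.
    SEQUENCE_TARGET is 'start,length' with 0-based positions (Primer3's default
    PRIMER_FIRST_BASE_INDEX), so here the target is the single SNP base."""
    template = flank_5 + snp_allele + flank_3
    lines = [
        f"SEQUENCE_ID={snp_id}",
        f"SEQUENCE_TEMPLATE={template}",
        f"SEQUENCE_TARGET={len(flank_5)},1",
        f"PRIMER_PRODUCT_SIZE_RANGE={product_size}",
        "=",  # end-of-record marker
    ]
    return "\n".join(lines) + "\n"
```

The resulting text can be piped to the `primer3_core` executable on standard input. Several records can be concatenated to design primers for a batch of selected SNPs.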
     All data analysis results will be downloadable from the Gramene web site as flat files and in XML format. The data will also be available for remote query and browsing through the Distributed Annotation System (Dowell et al. 2001), an XML-based protocol for sharing genomic annotations. Further, all fingerprint data will be downloadable from the AGI web sites so that researchers can construct and manipulate FPC maps locally.
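A remote client could retrieve features for one reference segment with a DAS "features" request and read the DASGFF XML response. The sketch builds such a request URL and parses a minimal response. The server address and data source name are hypothetical. The response layout (FEATURE elements with TYPE, START and END children) follows the DAS 1.x format as we understand it and should be checked against the specification.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple
from urllib.parse import quote

def das_features_url(server: str, dsn: str, ref: str, start: int, stop: int) -> str:
    """DAS 1.x features request for one reference segment (1-based, inclusive coordinates)."""
    return f"{server.rstrip('/')}/das/{quote(dsn)}/features?segment={quote(ref)}:{start},{stop}"

def parse_das_features(xml_text: str) -> List[Tuple[str, str, int, int]]:
    """Return (feature id, type, start, end) for each FEATURE element in a DASGFF document."""
    root = ET.fromstring(xml_text)
    return [(feat.get("id"), feat.findtext("TYPE"),
             int(feat.findtext("START")), int(feat.findtext("END")))
            for feat in root.iter("FEATURE")]
```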

  - Detailed Reconstruction of Rice Chr 1, 3, 10
     In silico anchoring: Approximately 10% of each of the 11 genomes will be represented at the sequence level as a result of the BAC end sequencing included in this project. This will be a rich resource with which to anchor these genomes to the sequenced rice genome. Co-PI Stein will map BAC end sequences to the rice genome (see the Informatics section). If the wild rice genomes have a repetitive DNA content similar to that of rice, approximately 50% of the BAC end sequences will be repetitive (this can vary depending on the restriction sites used to construct the BAC libraries). Ten percent of the genome (or of a chromosome) will be represented in ~500 bp sequence fragments, and half of these sequences will not anchor unambiguously to the rice genome because of repetitive elements. We therefore predict a sequence link about every 10 kb.
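The link-density estimate above is a short chain of arithmetic. The sketch reproduces it using the paragraph's figures as defaults: 10% of the genome sequenced as ~500 bp reads, half of them anchorable. Changing the defaults shows the effect of a different repeat content or read length. The calculation assumes reads are spread evenly, which BAC ends starting at restriction sites will not be exactly.

```python
def mean_link_spacing_bp(fraction_sequenced: float = 0.10, read_length_bp: float = 500,
                         fraction_anchorable: float = 0.5) -> float:
    """Average distance between uniquely anchorable BAC-end reads.
    Reads covering fraction_sequenced of the genome in read_length_bp pieces occur once
    every read_length_bp / fraction_sequenced bp; only fraction_anchorable of them anchor."""
    read_spacing = read_length_bp / fraction_sequenced
    return read_spacing / fraction_anchorable

print(mean_link_spacing_bp())  # about 10 kb with the figures quoted in the text
```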
     As noted previously (Informatics section), several issues will confound proper alignment of orthologous BAC contigs to the rice genome. From a computational perspective, BAC end sequences may be incorrectly mapped to the rice genome, or the contigs may be misassembled. From a genome evolution perspective, there are issues such as local and/or long-range chromosomal rearrangements. All of these can lead to chimeric contigs or to BAC end sequences that do not collapse properly onto the rice genome. Gene and chromosome duplications will also introduce errors into the alignment of the genomes, because BAC ends derived from different genomic regions will collapse onto the same location in the rice genome. BAC end sequences derived from elements that are repetitive in their own genomes but not in rice will also cause errors in map alignments. Conversely, elements that are repetitive in rice but not in the related species will introduce errors in aligning contigs to the rice genome. Another problem is that some genomic regions, in both rice and the related genomes, will likely be under-represented in BAC end sequences and therefore not easily aligned.
     The Jackson lab will work with the integrated FPC and sequence data at Gramene. It will also assemble, in house, the BAC end sequences and contigs onto the rice sequence map using the Gramene CMap viewer and the GMOD project's Generic Genome Browser (GBrowse). The alignments will be scanned both computationally and visually for assembly errors resulting from the problems described above. Our goal is to reconstruct rice chromosomes 1, 3 and 10 in the wild rice genomes as contiguously as possible. In chromosomal regions where the BAC contigs from the wild rice genomes cannot be collapsed completely onto the rice framework, we will use overgo technology to build a scaffold. That scaffold will be used either to collapse the contigs or to better understand which evolutionary events have perturbed the region so much that the chromosomes cannot be aligned. We will also use fluorescence in situ hybridization to probe rearrangements at the chromosome level.
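For readers unfamiliar with overgos, the sketch shows the usual probe layout. A 40 bp unique sequence is split into two 24-mers that overlap by 8 bp at their 3' ends. After annealing, the single-stranded overhangs are filled in with labelled nucleotides to make the hybridization probe. Choosing the 40 bp source sequence (uniqueness, GC content) is not shown, and the example sequence is hypothetical.

```python
from typing import Tuple

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]

def design_overgo_pair(probe_40mer: str) -> Tuple[str, str]:
    """Return (forward, reverse) 24-mers for a 40 bp probe. The forward oligo covers
    bases 0-23 and the reverse oligo is the reverse complement of bases 16-39, so the
    two oligos anneal over bases 16-23 (an 8 bp overlap)."""
    if len(probe_40mer) != 40:
        raise ValueError("overgo probes are designed from a 40 bp sequence")
    forward = probe_40mer[:24]
    reverse = reverse_complement(probe_40mer[16:])
    return forward, reverse
```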
  - Roles of Project Personnel


Rod Wing: Wing will serve as overall project director for OMAP. Members of his laboratory will serve as Project Manager (Dave Kudrna), direct fingerprinting and assembly (Ed Butler) and direct BAC-end sequencing (Yeisoo Yu). In years 3-4, the Wing lab will hire a postdoc and a graduate student to work on the global alignment and chromosome reconstruction experiments in collaboration with Lincoln Stein and Scott Jackson. AGI will also mentor 2 high school teachers each summer and several UBRP students throughout the year.

Lincoln Stein: Stein will oversee all aspects of integrating OMAP data into Gramene and will coordinate with the other project PIs and Senior Personnel to update the global alignment and chromosome reconstruction experiments during the course of the project. The Stein lab will also conduct the SNP data mining analysis and present the results through Gramene and in publications. In addition, the Stein lab will mentor 2 high school teachers and 1 UBRP intern each summer.

Scott Jackson: Jackson will oversee all aspects, bioinformatic and experimental, of the chromosome reconstruction experiments in collaboration with the Wing and Stein labs. The Jackson and SanMiguel labs will also mentor 2 high school teachers, several MARC/AIM students and 1 UBRP intern each summer.

Phillip SanMiguel: SanMiguel will oversee all aspects of BAC-end sequencing at the Purdue sequencing center. SanMiguel will also lead the repeat analysis efforts in collaboration with the Jackson lab.

  - Establish & Maintain an Exp. Advisory Committee
    An active and experienced Advisory Committee (3-4 members) will be essential for the success of OMAP. We have already received emails from Drs. Ronald Phillips (University of Minnesota), Susan Wessler (University of Georgia) and Jonathan Wendel (Iowa State University) agreeing to serve on the committee if the proposal is awarded.
     We envision that the committee will meet once a year for a 1-day meeting to review our progress and give advice, travelling to AGI, CSHL, Purdue and AGI in years 1 through 4 of the project, respectively. The committee will be expected to write a formal report that we will use as a guide during the course of the year. This report and our response will become part of our NSF annual progress report, due in June of each year.
  - References


Ahn, S., J.A. Anderson, M.E. Sorrells, and S. Tanksley. 1993. Homoeologous relationships of rice, wheat, and maize chromosomes. Mol. Gen. Genet. 241: 483-490.

Ahn, S. and S. Tanksley. 1993. Comparative linkage maps of the rice and maize genomes. Proc. Natl. Acad. Sci. (USA) 90:7980-7984.

Altschul, S.F., W. Gish, W. Miller, E.W. Myers, and D.J. Lipman. 1990. Basic local alignment search tool. J. Mol. Biol 215:403-410.

Barry, G.F. 2001. The use of the Monsanto draft rice genome sequence in research. Plant Physiology 125:1164-1165.

Bastide, M.d.l., D. Johnson, V. Balija, and W.R. McCombie. 2001. Strategies and techniques for finishing genomic sequence. In Rice Genetics IV (eds. G.S. Khush, D.S. Brar, and B. Hardy). The International Rice Research Institute Press, Manila.

Birney, E. and R. Durbin. 2000. Using GeneWise in the Drosophila annotation experiment. Genome Research 10: 547-548.

Cai, H.W. and H. Morishima. 2000. Genomic regions affecting seed shattering and seed dormancy in rice. Theoretical and Applied Genetics 100: 840-846.

Chen, M., G. Presting, W. Barbazuk, J.L. Goicoechea, B. Blackmon, G. Fang, H. Kim, D. Frisch, Y. Yu, S. Higingbottom, K. Phimphilai, S. Phimphilai, S. Thurmond, B. Gaudette, P. Li, J. Liu, J. Hatfield, D. Main, S. Sun, K. Farrar, C. Henderson, L. Barnett, R. Costa, B. Williams, S. Walser, M. Atkins, C. Hall, I. Bancroft, J. Salse, F. Regad, T. Mohapatra, N. Singh, A. Tyagi, C. Soderlund, R. Dean, and R. Wing. 2002. An Integrated Physical and Genetic Map of the Rice Genome. Plant Cell 14: 537-545.

Cheng, Z., C. Buell, R. Wing, M. Gu, and J. Jiang. 2001a. Toward a Cytological Characterization of the Rice Genome. Genome Research 11: 2133-2141.

Cheng, Z., G. Presting, C.R. Buell, and R.A. Wing. 2001b. High resolution pachytene chromosome mapping of bacterial artificial chromosomes anchored by genetic markers reveals the centromere location and the distribution of genetic recombination along chromosome 10 of rice. Genetics 157: 1749-1757.

Clamp, M., D. Andrews, D. Barker, P. Bevan, G. Cameron, Y. Chen, L. Clark, T. Cox, J. Cuff, V. Curwen, T. Down, R. Durbin, E. Eyras, J. Gilbert, M. Hammond, T. Hubbard, A. Kasprzyk, D. Keefe, H. Lehvaslaiho, V. Iyer, C. Melsopp, E. Mongin, R. Pettett, S. Potter, A. Rust, E. Schmidt, S. Searle, G. Slater, J. Smith, W. Spooner, A. Stabenau, J. Stalker, E. Stupka, A. Ureta-Vidal, I. Vastrik, and E. Birney. 2003. Ensembl 2002: accommodating comparative genomics. Nucleic Acids Research 31: 38-42.

Coe, E., K. Cone, M. McMullen, S. Chen, G. Davis, J. Gardiner, E. Liscum, M. Polacco, A. Paterson, H. Sanchez-Villeda, C. Soderlund, and R.A. Wing. 2002. Access to the maize genome: an integrated physical and genetic map. Plant Physiology 128: 9-12.

Cone, K., M. McMullen, I.V. Bi, G. Davis, Y. Yim, J. Gardiner, M. Polacco, H. Sanchez-Villeda, Z. Fang, S. Schroeder, S. Havermann, J. Bowers, A. Paterson, C. Soderlund, F. Engler, R.A. Wing, and E. Coe. 2002. Genetic, Physical, and Informatics Resources for Maize. On the Road to an Integrated Map. Plant Physiology 130: 1598-1605.

Davenport, R.J. 2001. Syngenta finishes; consortium goes on. Science 291: 807.

Devos, K.M., J.K.M. Brown, and J.L. Bennetzen. 2002. Genome size reduction through illegitimate recombination counteracts genome expansion in Arabidopsis. Genome Research 12: 1075-1079.

Ding, Y., M.D. Johnson, W.Q. Chen, D. Wong, Y.J. Chen, S.C. Benson, J.Y. Lam, Y.M. Kim, and H. Shizuya. 2001. Five-color-based high-information-content fingerprinting of bacterial artificial chromosome clones using type IIS restriction endonucleases. Genomics 74: 142-154.

Dowell, R.D., R.M. Jokerst, A. Day, S.R. Eddy, and L. Stein. 2001. The Distributed Annotation System. BMC Bioinformatics 2: 7.

Ewing, B. and P. Green. 1998. Base-calling of automated sequencer traces using phred. II. Error probabilities. Genome Res. 8: 186-194.

Ewing, B., L. Hillier, M. Wendl, and P. Green. 1998. Base-Calling of Automated Sequencer Traces Using Phred. I. Accuracy Assessment. Genome Res. 8: 175-185.

Feng, Q., Y. Zhang, P. Hao, S. Wang, G. Fu, Y. Huang, Y. Li, J. Zhu, Y. Liu, X. Hu, P. Jia, Q. Zhao, K. Ying, S. Yu, Y. Tang, Q. Weng, L. Zhang, Y. Lu, J. Mu, L.S. Zhang, Z. Yu, D. Fan, X. Liu, T. Lu, C. Li, Y. Wu, T. Sun, H. Lei, T. Li, H. Hu, J. Guan, M. Wu, R. Zhang, B. Zhou, Z. Chen, L. Chen, Z. Jin, R. Wang, H. Yin, Z. Cai, S. Ren, G. Lv, W. Gu, G. Zhu, Y. Tu, J. Jia, J. Chen, H. Kang, X. Chen, C. Shao, Y. Sun, Q. Hu, X. Zhang, W. Zhang, L. Wang, C. Ding, H. Sheng, J. Gu, S. Chen, L. Ni, F. Zhu, W. Chen, L. Lan, Y. Lai, Z. Cheng, M. Gu, J. Jiang, J. Li, G. Hong, Y. Xue, and B. Han. 2002. Sequence and analysis of rice chromosome 4. Nature 420: 316-320.

Flicek, P., E. Keibler, P. Hu, I. Korf, and M.R. Brent. 2003. Leveraging the mouse genome for gene prediction in human: from whole- genome shotgun reads to a global synteny map. Genome Res 13: 46-54.

Ge, S., T. Sang, B.-R. Lu, and D.-Y. Hong. 1999. Phylogeny of rice genomes with emphasis on origins of allotetraploid species. Proc Natl Acad Sci U S A 96: 14400-14405.

Goff, S.A., D. Ricke, T.H. Lan, G. Presting, R. Wang, M. Dunn, J. Glazebrook, A. Sessions, P. Oeller, H. Varma, D. Hadley, D. Hutchison, C. Martin, F. Katagiri, B.M. Lange, T. Moughamer, Y. Xia, P. Budworth, J. Zhong, T. Miguel, U. Paszkowski, S. Zhang, M. Colbert, W.L. Sun, L. Chen, B. Cooper, S. Park, T.C. Wood, L. Mao, P. Quail, R. Wing, R. Dean, Y. Yu, A. Zharkikh, R. Shen, S. Sahasrabudhe, A. Thomas, R. Cannings, A. Gutin, D. Pruss, J. Reid, S. Tavtigian, J. Mitchell, G. Eldredge, T. Scholl, R.M. Miller, S. Bhatnagar, N. Adey, T. Rubano, N. Tusneem, R. Robinson, J. Feldhaus, T. Macalma, A. Oliphant, and S. Briggs. 2002. A draft sequence of the rice genome (Oryza sativa L. ssp. japonica). Science 296: 92-100.

Green, P. 1999. swat/cross_match/phrap package. P. Green.

Hass, B., J.W. Lilly, J.C. Pires, R. Porter, P.L. Philips, and S.A. Jackson. 2003. Comparative genetics at the gene and chromosome levels between rice (Oryza sativa) and wild rice (Zizania palustris). Theoretical and Applied Genetics (in press).

Hulbert, S.H., T.E. Richter, J.B. Axtell, and J.L. Bennetzen. 1990. Genetic mapping and characterization of sorghum and related crops by means of maize DNA probes. Proc. Natl. Acad. Sci. (USA) 87:4251-4255.

IRGSP. 2002. Completion of the rice genome sequence announced.

Jackson, S.A., M.L. Wang, H.M. Goodman, and J. Jiang. 1998. Application of fiber-FISH in physical mapping of Arabidopsis thaliana. Genome 41: 566-572.

Jaiswal, P., D. Ware, J. Ni, K. Chang, W. Zhao, S. Schmidt, X. Pan, K. Clark, L. Teytelman, S. Cartinhour, and L. Stein. 2002. Gramene: development and integration of trait and gene ontologies for rice. Comparative and Functional Genomics 3: 2.

Jamison, D.C., J.W. Thomas, and E.D. Green. 2000. ComboScreen facilitates the multiplex hybridization-based screening of high-density clone arrays. Bioinformatics 16: 678-684.

Korf, I., P. Flicek, D. Duan, and M.R. Brent. 2001. Integrating genomic homology into gene structure prediction. Bioinformatics 17: S140-148.

Kurata, N., G. Moore, Y. Nagamura, T. Foote, M. Yano, Y. Minobe, and M.D. Gale. 1994. Conservation of Genome Structure between Rice and Wheat. Bio/Technology 12: 276-278.

Mao, L., T.C. Wood, Y. Yu, M.H. Budiman, S.S. Woo, M. Sasinowski, S.A. Goff, R.A. Dean, and R.A. Wing. 2000. Rice Transposable Elements: A Survey of 73,000 Sequence-Tagged-Connectors (STCs). Genome Research 10: 982-990.

Marra, M., T. Kucaba, N. Dietrich, E. Green, B. Brownstein, R. Wilson, K. McDonald, L. Hillier, J. McPherson, and R. Waterston. 1997. High throughput fingerprint analysis of large-insert clones. Genome Research 7: 1072-1084.

McCouch, S.R., L. Teytelman, Y. Xu, K.B. Lobos, K. Clare, M. Walton, B. Fu, R. Magnirang, Z. Li, Y. Xing, Q. Zhang, I. Kono, M. Yano, R. Fjellstrom, G. DeClerck, D. Schneider, S. Cartinhour, D. Ware, and L. Stein. 2002. Development of 2,243 new SSR markers for rice (Oryza sativa L.). DNA Research (in press).

Moore, G., K.M. Devos, Z. Wang, and M.D. Gale. 1995. Grasses, line up and form a circle. Curr. Biol. 5: 737-739.

Mulder, N., R. Apweiler, T. Attwood, A. Bairoch, D. Barrell, A. Bateman, D. Binns, M. Biswas, P. Bradley, P. Bork, P. Bucher, R. Copley, E. Courcelle, U. Das, R. Durbin, L. Falquet, W. Fleischmann, S. Griffiths-Jones, D. Haft, N. Harte, N. Hulo, D. Kahn, A. Kanapin, M. Krestyaninova, R. Lopez, I. Letunic, D. Lonsdale, V. Silventoinen, S. Orchard, M. Pagni, D. Peyruc, C. Ponting, J. Selengut, F. Servant, C. Sigrist, R. Vaughan, and E. Zdobnov. 2003. The InterPro Database, 2003 brings increased coverage and new features. Nucleic Acids Research 31: 315-318.

Mullikin, J. and Z. Ning. 2003. The phusion assembler. Genome Research 13: 81-90.

O'Brien, S.J., J. Wienberg, and L.A. Lyons. 1997. Comparative genomics: lessons from cats. Trends Genet 13: 393-399.

Rivas, E. and S.R. Eddy. 2001. Noncoding RNA gene detection using comparative sequence analysis. BMC Bioinformatics 2: 8.

Rozen, S. and H. Skaletsky. 2000. Primer3 on the WWW for general users and for biologist programmers. Methods Mol Biol. 132: 365-386.

Salamov, A. and V. Solovyev. 2000. Ab initio gene finding in Drosophila genomic DNA. Genome Research 10: 516-522.

SanMiguel, P., B.S. Gaut, A. Tikhonov, Y. Nakajima, and J.L. Bennetzen. 1998. The paleontology of intergene retrotransposons of maize. Nat Genet 20: 43-45.

SanMiguel, P.J., W. Ramakrishna, J.L. Bennetzen, C. Busso, and J. Dubcovsky. 2002. Transposable elements, genes and recombination in a 215-kb contig from wheat chromosome 5Am. Functional and Integrative Genomics 2: 70-80.

Sasaki, T. and B. Burr. 2000. International Rice Genome Sequencing Project: the effort to completely sequence the rice genome. Curr Opin Plant Biol 3: 138-141.

Sasaki, T., T. Matsumoto, K. Yamamoto, K. Sakata, T. Baba, Y. Katayose, J. Wu, Y. Niimura, Z. Cheng, Y. Nagamura, B.A. Antonio, H. Kanamori, S. Hosokawa, M. Masukawa, K. Arikawa, Y. Chiden, M. Hayashi, M. Okamoto, T. Ando, H. Aoki, K. Arita, M. Hamada, C. Harada, S. Hijishita, M. Honda, Y. Ichikawa, A. Idonuma, M. Iijima, M. Ikeda, M. Ikeno, S. Ito, T. Ito, Y. Ito, A. Iwabuchi, K. Kamiya, W. Karasawa, S. Katagiri, A. Kikuta, N. Kobayashi, I. Kono, K. Machita, T. Maehara, H. Mizuno, T. Mizubayashi, Y. Mukai, H. Nagasaki, M. Nakashima, Y. Nakama, Y. Nakamichi, M. Nakamura, N. Namiki, M. Negishi, I. Ohta, N. Ono, S. Saji, K. Sakai, M. Shibata, T. Shimokawa, A. Shomura, J. Song, Y. Takazaki, K. Terasawa, K. Tsuji, K. Waki, H. Yamagata, H. Yamane, S. Yoshiki, R. Yoshihara, K. Yukawa, H. Zhong, H. Iwama, T. Endo, H. Ito, J.H. Hahn, H.I. Kim, M.Y. Eun, M. Yano, J. Jiang, and T. Gojobori. 2002. The genome sequence and structure of rice chromosome 1. Nature 420: 312-316.

Shantz. 1954. The place of grasslands in the earth's cover of vegetation. Ecology 35: 143-145.

Shishido, R., Y. Sano, and K. Fukui. 2001. Ribosomal DNAs: an exception to the conservation of gene order in rice genomes. Mol. Gen. Genet. 263: 586-591.

Soderlund, C., F. Engler, J. Hatfield, S. Blundy, M. Chen, Y. Yu, and R.A. Wing. 2002. Mapping Sequence to FPC Rice Map. In Computational Biology and Genome Informatics (eds. C. Wu, P. Wang, and J. Wang). World Scientific Publishing. In press.

Soderlund, C., I. Longden, and R. Mott. 1997. FPC: A system for building contigs from restriction fingerprinted clones. CABIOS 13: 523-535.

Thomas, J.W., A.B. Prasad, T.J. Summers, S.Q. Lee-Lin, V.V. Maduro, J.R. Idol, J.F. Ryan, P.J. Thomas, J.C. McDowell, and E.D. Green. 2002. Parallel construction of orthologous sequence-ready clone contig maps in multiple species. Genome Res 12: 1277-1285.

Tikhonov, A.P., P.J. SanMiguel, Y. Nakajima, N.M. Gorenstein, J.L. Bennetzen, and Z. Avramova. 1999. Colinearity and its exceptions in orthologous adh regions of maize and sorghum. Proc Natl Acad Sci U S A 96: 7409-7414.

Tomkins, J.P., G. Davis, D. Main, Y. Yim, N. Duru, T. Musket, J.L. Goicoechea, D.A. Frisch, E.H. Coe, Jr., and R.A. Wing. 2002. Construction and Characterization of a Deep-Coverage Bacterial Artificial Chromosome Library for Maize. Crop Sci 42: 928-933.

Uozu, S., H. Ikehashi, N. Ohmido, H. Ohtsubo, E. Ohtsubo, and K. Fukui. 1997. Repetitive sequences: cause for variation in genome size and chromosome morphology in the genus Oryza. Plant Mol Biol 35: 791-799.

Vaughan, D.A. 1994. The wild relatives of rice. International Rice Research Institute, Manila.

Ware, D., P. Jaiswal, J. Ni, X. Pan, K. Chang, K. Clark, L. Teytelman, S. Schmidt, W. Zhao, S. Cartinhour, S. McCouch, and L. Stein. 2002a. Gramene: a resource for comparative grass genomics. Nucleic Acids Res 30: 103-105.

Ware, D.H., P. Jaiswal, J. Ni, I.V. Yap, X. Pan, K.Y. Clark, L. Teytelman, S.C. Schmidt, W. Zhao, K. Chang, S. Cartinhour, L.D. Stein, and S.R. McCouch. 2002b. Gramene, a tool for grass genomics. Plant Physiol 130: 1606-1613.

Wheeler, D., D.M. Church, S. Federhen, A. Lash, T. Madden, J. Pontius, G. Schuler, L. Schriml, E. Sequeira, T. Tatusova, and L. Wagner. 2003. Database resources of the National Center for Biotechnology Information. Nucleic Acids Research 31: 28-33.

Wing, R.A., Y. Yu, G. Presting, D.A. Frisch, T.C. Wood, S.S. Woo, M.H. Budiman, L. Mao, H.R. Kim, T. Rambo, E. Fang, B. Blackmon, J.L. Goicoechea, S. Higingbottom, M. Sasinowski, J.P. Tomkins, R.A. Dean, and C.A. Soderlund. 2001. Sequence-tagged connector/DNA fingerprint framework for rice genome sequencing. In Rice Genetics IV (eds. G.S. Khush, D.S. Brar, and B. Hardy), pp. 215-225. The International Rice Research Institute Press, Manila.

Wing, R.A., Y. Yu, T. Rambo, J. Currie, C. Saski, H.R. Kim, K. Collura, S. Thompson, J. Simmons, T.J. Yang, G.N. Park, A.J. Patel, S. Thurmond, D. Henry, R. Oates, M. Palmer, G. Pries, J. Gibson, H. Anderson, M. Paradkar, L. Crane, J. Dale, M. Carver, T. Wood, D. Frisch, F. Engler, C. Soderlund, W.R.M.e. al., P. Minx, D. Johnson, H. Cordum, E. Mardis, R. Wilson, J. Messing, R. Song, G. Fuks, V. Llaca, S. Kovchak, S. Young, Z. Cheng, J. Jiang, M.A. Johns, L. Mao, H. Pan, R.A. Dean, J.E. Bowers, A.H. Paterson, Q. Yuan, S. Ouyang, J. Liu, K.M. Jones, K. Gansberger, K. Moffat, J. Hill, T. Tsitrin, L. Overton, J. Bera, M. Kim, S. Jin, L. Tallon, A. Ciecko, G. Pai, S.V. Aken, T. Utterback, S. Reidmuller, J. Bormann, T. Feldblyum, J. Hsiao, V. Zismann, S. Blunt, A.d. Vazeilles, T. Shaffer, H. Koo, B. Suh, Q. Yang, B. Haas, J. Peterson, M. Pertea, N. Volfovsky, J. Wortman, O. White, S.L. Salzberg, C.M. Fraser, and C.R. Buell. 2003. Sequence, annotation, and comparative analyses of rice chromosome 10. Submitted.

Yuan, Q., F. Liang, J. Hsiao, V. Zismann, M.I. Benito, J. Quackenbush, R. Wing, and R. Buell. 2000. Anchoring of rice BAC clones to the rice genetic map in silico. Nucleic Acids Res 28: 3636-3641.

Zhao, Q., Y. Zhang, Z. Cheng, M. Chen, S. Wang, Q. Feng, Y. Huang, Y. Li, Y. Tang, B. Zhou, Z. Chen, S. Yu, J. Zhu, X. Hu, J. Mu, K. Ying, P. Hao, L. Zhang, Y. Lu, L.S. Zhang, Y. Liu, Z. Yu, D. Fan, Q. Weng, L. Chen, T. Lu, X. Liu, P. Jia, T. Sun, Y. Wu, Y. Zhang, Y. Lu, C. Li, R. Wang, H. Lei, T. Li, H. Hu, M. Wu, R. Zhang, J. Guan, J. Zhu, G. Fu, M. Gu, G. Hong, Y. Xue, R. Wing, J. Jiang, and B. Han. 2002. A Fine Physical Map of the Rice Chromosome 4. Genome Research 12: 817-823.

Zhou, Y., W. Li, W. Wu, Q. Chen, D. Mao, and A.J. Worland. 2001. Genetic dissection of heading time and its components in rice. Theoretical and Applied Genetics 102: 1236-1242.

Phylogeny of the Genus Oryza
Education and Outreach opportunities with OMAP
  Teacher interns will gain practical experience in physical mapping, DNA sequencing, bioinformatics and comparative genomics, giving them a better understanding of the importance of rice and of how wild genomes can be used to improve it. We will host 6 Teacher Interns per summer, 2 at each site (AGI, Purdue, CSHL). Weekly conference calls will be organized so that the OMAP interns can discuss their work, share their experiences and coordinate activities as needed. Each intern will be required to develop a (web-based) lesson plan incorporating the principles learned during the internship.
Underrepresented undergraduates at the University of Arizona will participate in plant genomics research through the UBRP and MARC/AIM programs. Students will work with faculty, postdoctoral and graduate student mentors on specific projects in physical mapping, DNA sequencing, bioinformatics and comparative genomics.
UBRP students will work throughout the year at AGI, and two to four per year will be given the opportunity to carry out short (2-week) summer research projects in our collaborators' labs at Purdue and CSHL. The UBRP and MARC/AIM students will gain valuable experience in how research is done, which will help prepare them for careers at all levels of science.
At Purdue University, Co-PIs Jackson and SanMiguel participate in the Purdue MARC/AIM Summer Research Program, which brings minority undergraduate students to Purdue University for 8 weeks during the summer session to participate in laboratory research. The students are expected to be actively involved in research and must write a research summary at the end of the session. The program aims to expose minority students to research, ideally sparking an interest in biological research, and to recruit minority students to pursue graduate study at Purdue University. The program has been extremely successful: in its 20-year history, more than 700 students have matriculated, of whom at least 375 have pursued or are currently pursuing postgraduate education. Nearly 100 of these students have been awarded a Ph.D., M.D., D.D.S. or D.V.M. at various universities.

Teacher Internships in Plant Genomics, University of Arizona
Outreach to underrepresented students
Undergraduate Biology Research Program (UBRP), University of Arizona.
Minority Access to Research Careers (MARC) Program, University of Arizona
The Purdue MARC/AIM Summer Research Program, Purdue University

Teacher Internships in Plant Genomics
  The University of Arizona (UA) has a strong track record in education and outreach. One of the most recognized programs is "The Teacher Internships in Plant Genomics Program" (TIPG) which is designed to provide pre-service and in-service biology teachers with university-based lab experience in plant genomics at UA.
Teacher Interns are placed in UA plant genomics labs for eight-week sessions of summer research, and are generally invited to return for a second summer. The interns are paired with an experienced faculty member, post-doctoral fellow, or graduate student who serves as the intern's mentor. This program is designed to provide Teacher Interns with opportunities to understand the nature of science, gain first-hand experience in scientific inquiry, and to better understand and share their ideas about genetics, genomics, and plant biology.
In addition, this program provides scientist mentors with an opportunity to learn how to communicate and share their work with science teachers, pre-college students, and the general public. Scientist mentors learn how to present content, concepts, and methodologies to a non-scientist audience. The TIPG program was organized by Plant Science Professor Rich Jorgensen and has been in place for the last 2 years, funded through an NSF RUE grant to Jorgensen. The program presently has 12 faculty mentors and has trained 11 teachers. Dr. Nadja Wehmeyer, who has over 6 years of experience in education and outreach in biology, coordinates the program.
The structure of each Teacher Intern's research experience is tailored by the host PI and research mentor, but may include conducting a small, independent research project with a finite endpoint, or collecting and analyzing data for a larger project that will continue beyond the end of the internship.
During the last week of the internship, the Teacher Interns are expected to produce a poster summarizing the research conducted during this program and present it at a poster conference.
During weekly meetings, Teacher Interns have the opportunity to develop and share teaching materials related to genetics, genomics, and plant biology, as well as to discuss their summer research. Equipment needed to teach these newly developed activities will be loaned to the teachers throughout the school year, and up to $1000 is made available to each teacher for materials and supplies. Providing this classroom support allows the Plant Genomics outreach to target middle and high school students, potentially reaching thousands of students, a large number of whom are underrepresented minorities (especially Hispanic and Native American).
The Plant Genomics Internship helps teachers enrich their own understanding of plant biology and the nature of science through research partnerships with university scientists. Pre-service and in-service teachers will gain increased access to university-based resources, which they can bring back to the classroom. By partnering with pre-service and in-service teachers to shape the future, we strive to provide the best possible context for pre-college students to understand biology and the nature of science.
For OMAP, interns will gain practical experience in physical mapping, DNA sequencing, bioinformatics and comparative genomics, giving them a better understanding of the importance of rice and of how wild genomes can be used to improve it. We will host 6 Teacher Interns per summer, 2 at each site (AGI, Purdue, CSHL). Weekly conference calls will be organized so that the OMAP interns can discuss their work, share their experiences and coordinate activities as needed. Each intern will be required to develop a (web-based) lesson plan incorporating the principles learned during the internship.

Contact Dr. Nadja Wehmeyer for information on this opportunity.
Outreach to underrepresented students
  At the University of Arizona, 19% of the undergraduates are underrepresented minorities, predominantly Hispanic. Moreover, a significant number of Native American undergraduates are enrolled on our campus.
Underrepresented students are recruited for research training through the Undergraduate Biology Research Program (UBRP), which has links to statewide community colleges and targeted funds for minorities.
Of the 1,219 students accepted to UBRP since 1988, 57% are women and 37% are minority students (of these, 19% are from ethnic groups underrepresented in the sciences). Additional outreach programs for minority students include the Minority Access to Research Careers Program (MARC), which recruits students from UBRP, and the McNair Program for disadvantaged and underrepresented minority students.
In addition, the campus has a very active AISES (American Indian Science and Engineering Society) chapter, which was ranked 2nd in the nation in 2000. A large contingent of UA undergraduates attends the annual meeting of the Society for Advancement of Chicanos and Native Americans in Science (SACNAS), and many UA students have presented and won awards at these meetings. Moreover, research-experienced undergraduates can participate in the BRAVO! (Biomedical Research Abroad: Vistas Open!) Program to work with their UBRP faculty sponsor's foreign collaborator(s). Since 1992, 118 students have had a BRAVO! experience lasting anywhere from 3 months to a year.
For OMAP, we will actively recruit underrepresented undergraduates to participate in plant genomics research through the UBRP and MARC/AIM programs. Students will work with faculty, postdoctoral and graduate student mentors on specific projects in physical mapping, DNA sequencing, bioinformatics and comparative genomics.
UBRP students will work throughout the year at AGI, and two to four per year will be given the opportunity to carry out short (2-week) summer research projects in our collaborators' labs at Purdue and CSHL. The UBRP and MARC/AIM students will gain valuable experience in how research is done, which will help prepare them for careers at all levels of science.
Undergraduate Biology Research Program (UBRP)
  The Undergraduate Biology Research Program (UBRP), University of Arizona, is an educational program designed to teach students science by involving them in biologically related research.
Students are paid for their time in the lab, where they develop an understanding of the scientific method and gain a realistic view of biological research. They also acquire the tools necessary to succeed in postgraduate studies should they choose careers in biology or biomedical research.
UBRP demonstrates how the resources of a major research university can be brought to bear on undergraduate education.

Contact the Director, Carol Bender, for more information.
Minority Access to Research Careers (MARC) Program
  The Minority Access to Research Careers (MARC) Program, University of Arizona, offers:

a unique research, mentoring, financial and academic opportunity for underrepresented minority students who have the interest and potential to pursue careers in biomedical research
training and financial support for the last two years of the student's enrollment in the University
financial benefits including: tuition and fees support; health insurance; monthly stipend
funding to attend national scientific meetings and to seek a summer research experience outside the UA
outstanding faculty from Colleges of Science, Agriculture and Medicine with active and well-funded research programs to provide research guidance and intensive mentoring to participants
overall mentoring provided by Professor Marc Tischler, Biochemistry, and by Professor William Velez, Mathematics
assistance with preparation for the Graduate Record Exam, and applying to graduate schools and for graduate fellowships by Associate Dean Maria Teresa Velez, Graduate College
a seminar series for trainees to meet outstanding minority scientists from other institutions and interact amongst themselves and with other UA faculty mentors

Contact Dr. Tischler to apply.
The Purdue MARC/AIM Summer Research Program
  The Purdue MARC/AIM Summer Research Program, Purdue University, offers talented African-American, Pacific Islander, Hispanic, and Native-American college students who are U.S. citizens:

8 weeks of research under the direction of a Purdue faculty mentor
GRE Preparation Workshop
Graduate school information: how to apply and how to obtain funding
An opportunity to gain viewpoints from faculty and from graduate students
Oral and written presentations of research results
Recreational activities and access to the university gym
An opportunity to form friendships with other participants from all over the U.S.
Double occupancy university housing
A stipend of $3,600, from which you pay for meals and other living expenses
Round-trip travel for external students

Contact information: Prof. Ron Coolbaugh
