General information

Transcription factors (TFs) are an important class of regulatory molecules whose functions range from normal development to responses to various biotic and abiotic stresses. Despite their importance, only a few TF genes have been characterized in rice, a staple food crop and the monocot model for genetic studies of crop improvement. Genetic redundancy resulting from evolutionary duplication events has contributed to this slow rate of characterization and makes choosing candidate genes for in-depth functional studies a challenging step. Gene functional validation is a key requirement for developing agronomically superior plants. The Rice TF Database was created to integrate and host functional genomic information for all putative rice TFs and other transcriptional regulators. The database contains information on 2,384 putative rice TFs, including 58 TF families and 22 epigenetic regulators. TFs were retrieved from the Plant Transcription Factor Database. Our goal is to integrate disparate data sets into a logical, user-friendly format. To accomplish this, we have developed a platform that displays user-selected functional genomic data, such as sequence information, mutant line information and expression data, on a phylogenetic tree. The main feature, Treeview, can display transcriptome data from multiple platforms, including Affymetrix and Agilent microarrays and RNA-Seq. The database also includes an interactive chromosomal map showing the positions of all rice TFs, and links are provided to the MSU/TIGR and RAP-DB rice genome annotation databases. We hope this format will make it easier to compare closely related TFs within a family as well as to perform global comparisons between sets of related families. We would greatly appreciate any feedback that would help us continually improve this database.

Data description

For each family with more than three members, the full-length protein sequences or domain sequences from RGAP V7 were aligned using ClustalW Version 2.1 with default options. Maximum likelihood trees were then built using PhyML 3.0 with the JTT model. The tabular view lists each gene's chromosome and location, its RAP-DB ID, an NCBI BLAST search link and available literature information. Orthologs in sequenced plant genomes were identified by InParanoid Version 4.1 (Remm et al., 2001) and the OMA browser (Schneider et al., 2007). Altogether, orthologs from 15 plant species, including 7 monocots, were included. TM indicates the presence of one or more transmembrane domains predicted by the TMHMM Server Version 2.0. The presence and location of signal peptide cleavage sites in amino acid sequences (N-terminal signal peptides) were predicted by SignalP Version 3.0. ChloroP Version 1.1 was used to predict the presence of chloroplast transit peptides. ngLOC was used to predict the subcellular localization of each protein. OryGenesDB was used to map flanking sequence tags (FSTs) from different mutant libraries to the TIGR Version 7 rice pseudomolecules by keeping the best hit at an E-value cut-off of 1e-10. The mapped insertions were then assigned to rice genes based on the location of each insertion relative to the genome annotation. In the OryGenesDB database, a gene was defined as extending from 800 bp 5' of the initiation codon to the end of the 3'-UTR, where known. The Postech activation lines were obtained from the Postech Rice T-DNA Insertion Sequence Database.
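The FST-to-gene assignment described above can be sketched in Python. This is a minimal illustration, not the OryGenesDB pipeline: it assumes alignment hits have already been parsed into simple records, reads the "e-10" cut-off as an E-value threshold of 1e-10, takes "best hit" to mean the highest bit score, and makes the gene interval strand-aware (800 bp upstream of the initiation codon on the gene's own strand through the annotated 3' end). The `Hit` and `Gene` records, their field names and the example coordinates are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple

EVALUE_CUTOFF = 1e-10  # assumed reading of the "e-10" cut-off
UPSTREAM_BP = 800      # extension 5' of the initiation codon


@dataclass
class Hit:
    """Hypothetical record: one alignment of an FST to a pseudomolecule."""
    fst_id: str
    chrom: str
    position: int      # insertion coordinate on the chromosome
    evalue: float
    bitscore: float


@dataclass
class Gene:
    """Hypothetical record: start codon and 3' end (3'-UTR end where known)."""
    locus: str
    chrom: str
    strand: str        # "+" or "-"
    start_codon: int
    three_prime_end: int

    def interval(self) -> Tuple[int, int]:
        """(low, high) from 800 bp 5' of the start codon to the 3' end."""
        if self.strand == "+":
            return self.start_codon - UPSTREAM_BP, self.three_prime_end
        # On the minus strand, 5' lies at higher coordinates.
        return self.three_prime_end, self.start_codon + UPSTREAM_BP


def best_hits(hits: Iterable[Hit]) -> Dict[str, Hit]:
    """Keep, per FST, the highest-scoring hit that passes the E-value cut-off."""
    best: Dict[str, Hit] = {}
    for hit in hits:
        if hit.evalue > EVALUE_CUTOFF:
            continue
        current = best.get(hit.fst_id)
        if current is None or hit.bitscore > current.bitscore:
            best[hit.fst_id] = hit
    return best


def assign_to_genes(hits: Dict[str, Hit], genes: List[Gene]) -> Dict[str, List[str]]:
    """List every gene whose extended interval contains each FST insertion site."""
    assignment: Dict[str, List[str]] = {}
    for fst_id, hit in hits.items():
        loci = []
        for gene in genes:
            low, high = gene.interval()
            if gene.chrom == hit.chrom and low <= hit.position <= high:
                loci.append(gene.locus)
        assignment[fst_id] = loci
    return assignment


# Toy example with made-up coordinates.
genes = [
    Gene("LOC_A", "Chr1", "+", start_codon=10_000, three_prime_end=12_000),
    Gene("LOC_B", "Chr1", "-", start_codon=20_000, three_prime_end=18_000),
]
hits = [
    Hit("fst1", "Chr1", 9_500, 1e-30, 200.0),   # 500 bp 5' of LOC_A
    Hit("fst1", "Chr3", 5_000, 1e-12, 90.0),    # weaker hit for the same FST
    Hit("fst2", "Chr1", 20_500, 1e-20, 150.0),  # 500 bp 5' of LOC_B (minus strand)
    Hit("fst3", "Chr1", 15_000, 1e-5, 300.0),   # fails the E-value cut-off
]
print(assign_to_genes(best_hits(hits), genes))
# → {'fst1': ['LOC_A'], 'fst2': ['LOC_B']}
```

Making the interval strand-aware matters because "5' of the initiation codon" points to lower coordinates for plus-strand genes but to higher coordinates for minus-strand genes.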
We gathered mutant line information from the National Institute of Agrobiological Sciences (NIAS) Tos17 Insertion Mutant Database, the UCD Rice Transposon Flanking Sequence Tag Database with Ds knockout (KO) lines, the Oryza Tag Line (OTL) Database with Tos17 and T-DNA KO lines, the Rice Mutant Database (RMD) with T-DNA KO lines, the Taiwan Rice Insertional Mutants Database (TRIM) with T-DNA KO lines and the Postech Rice T-DNA Insertion Sequence Database with T-DNA KO and activation (AC) lines. The Affymetrix raw data were downloaded from NCBI GEO and EBI ArrayExpress. We used the MAS 5.0 method provided by the affy R package to convert probe-level data to expression values. The trimmed mean target intensity of each array was arbitrarily set to 500, and the data in this database were log-transformed. This MAS 5.0 normalization differs slightly from the MAS 5.0 implementation provided by Affymetrix Inc.: Affymetrix normalization is usually performed after summarization, whereas the normalization we used was carried out before summarization. The Rice Multiple-platform Microarray Element Search tool was used to obtain the Affymetrix probe sets corresponding to rice genes, and only unique probe sets that match unique rice loci were included in this database. If several unique probe sets were available for the same rice gene, only the probe set with the highest expression level was selected. Currently, the database includes anatomy RNA-Seq data provided by the CARMO platform and spatio-temporal RNA-Seq transcriptome data for rice roots and shoots during phosphate starvation and recovery, generated by Secco et al. (2013).
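The array scaling, log transformation and probe set selection described above can be sketched in plain Python. This is a simplified illustration, not the affy implementation: for simplicity it scales already-summarized signal values (the Affymetrix ordering, not the pre-summarization variant described above), assumes the 2% trimmed mean that MAS 5.0 scaling uses by default, assumes a log2 transform (the base is not stated above), and ranks competing probe sets by their mean log2 signal across arrays. The probe set and locus identifiers are hypothetical.

```python
import math
from statistics import mean
from typing import Dict, List

TARGET_INTENSITY = 500.0  # target trimmed mean for every array
TRIM = 0.02               # fraction trimmed from each end (assumed MAS 5.0 default)


def trimmed_mean(values: List[float], trim: float = TRIM) -> float:
    """Mean after dropping floor(n * trim) values from each end of the sorted data."""
    ordered = sorted(values)
    k = int(len(ordered) * trim)
    return mean(ordered[k:len(ordered) - k])


def scale_array(signals: Dict[str, float]) -> Dict[str, float]:
    """Rescale one array so that its trimmed mean signal equals TARGET_INTENSITY."""
    factor = TARGET_INTENSITY / trimmed_mean(list(signals.values()))
    return {probe: value * factor for probe, value in signals.items()}


def log2_values(signals: Dict[str, float]) -> Dict[str, float]:
    """Log-transform scaled signals (base 2 is an assumption)."""
    return {probe: math.log2(value) for probe, value in signals.items()}


def select_probe_sets(arrays: List[Dict[str, float]],
                      probe_to_locus: Dict[str, str]) -> Dict[str, str]:
    """For each locus, keep the probe set with the highest mean expression."""
    best: Dict[str, str] = {}
    best_level: Dict[str, float] = {}
    for probe, locus in probe_to_locus.items():
        level = mean(array[probe] for array in arrays)
        if locus not in best or level > best_level[locus]:
            best[locus] = probe
            best_level[locus] = level
    return best


# Toy example: two arrays, two probe sets competing for the same locus.
raw_arrays = [
    {"ps_1": 120.0, "ps_2": 800.0, "ps_3": 80.0},
    {"ps_1": 300.0, "ps_2": 1500.0, "ps_3": 200.0},
]
processed = [log2_values(scale_array(array)) for array in raw_arrays]
mapping = {"ps_1": "LOC_X", "ps_2": "LOC_X", "ps_3": "LOC_Y"}
print(select_probe_sets(processed, mapping))
# → {'LOC_X': 'ps_2', 'LOC_Y': 'ps_3'}
```

Scaling every array to the same trimmed mean makes signals comparable across arrays; trimming the extremes keeps a handful of saturated or near-zero probe sets from dominating the scale factor.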