Finding “Small” Proteins and Discovering How They Affect Plant Biomass Growth
Proteins fewer than 200 amino acids in length are commonly called “small proteins.” They have recently been found to play important roles in regulating biological processes in plants, such as stress response, flowering, and cell-to-cell communication. However, identifying short open reading frames (sORFs), the genes that encode small proteins, has been difficult because their small size makes accurate prediction unreliable. Researchers at Oak Ridge National Laboratory, working with scientists at the DOE BioEnergy Research Center, have applied computational biology to gene expression and protein data to discover sORFs encoding small proteins in the promising bioenergy feedstock Populus deltoides (poplar). Using the deep RNA sequencing capacity of the DOE Joint Genome Institute, they reconstructed high-quality, full-length genes directly from the transcriptome (the set of genes expressed in poplar), thus avoiding the uncertainty of predicting genes from genome sequence. The team then applied three computational filters to enrich for protein-encoding sORFs: prediction based on known protein sequences, evolutionary conservation between poplar and other plants, and protein family clustering. The results demonstrated the efficacy of this strategy for discovering candidate sORFs in both sequenced and as-yet unannotated genomes. This method is expected to greatly enhance understanding of the regulatory mechanisms underlying processes such as growth and stress response, features important to the development of high-yielding, sustainable bioenergy feedstocks.
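To make the idea of an sORF concrete, the following is a minimal Python sketch of scanning one assembled transcript for candidate open reading frames that would encode fewer than 200 amino acids. It is an illustration only, not the pipeline used in the study: the function name `find_sorfs`, the forward-strand-only scan, and the 10-residue lower bound are assumptions made for this example, and none of the three enrichment filters described above is implemented here.

```python
# Illustrative sketch only; not the authors' published method.
STOP_CODONS = {"TAA", "TAG", "TGA"}
MAX_SMALL_PROTEIN_AA = 200  # "small protein" cutoff stated in the text


def find_sorfs(transcript, min_aa=10, max_aa=MAX_SMALL_PROTEIN_AA):
    """Return (start, end, n_residues) for each candidate sORF.

    Scans the three forward reading frames of a transcript. A candidate
    runs from the first ATG after the previous stop codon to the next
    in-frame stop codon. `min_aa` is a hypothetical lower bound chosen
    for this example; `end` is exclusive and includes the stop codon.
    """
    seq = transcript.upper().replace("U", "T")  # accept RNA or DNA letters
    hits = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # open a candidate ORF at the start codon
            elif start is not None and codon in STOP_CODONS:
                n_residues = (i - start) // 3  # includes Met, excludes stop
                if min_aa <= n_residues < max_aa:
                    hits.append((start, i + 3, n_residues))
                start = None  # close the ORF and look for the next ATG
    return hits


# Toy transcript: 2 nt leader, ATG, 11 alanine codons, stop, 2 nt trailer.
toy_transcript = "GG" + "ATG" + "GCT" * 11 + "TAA" + "CC"
candidates = find_sorfs(toy_transcript)
```

In practice, candidates produced by a scan like this would still need to pass evidence-based filters such as those described above, because many short ORFs in a transcript arise by chance and do not encode proteins.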
Yang, X., T. J. Tschaplinski, G. B. Hurst, S. Jawdy, P. E. Abraham, P. K. Lankford, R. M. Adams, M. B. Shah, R. L. Hettich, E. Lindquist, U. C. Kalluri, L. E. Gunter, C. Pennacchio, and G. A. Tuskan. 2011. “Discovery and Annotation of Small Proteins Using Genomics, Proteomics, and Computational Approaches,” Genome Research doi:10.1101/gr.109280.110. Published online March 2, 2011.