Amino acid difference formula to help explain protein evolution. Grantham R Science (New York, N.Y.) A formula for difference between amino acids combines properties that correlate best with protein residue substitution frequencies: composition, polarity, and molecular volume. Substitution frequencies agree much better with overall chemical difference between exchanging residues than with minimum base changes between their codons. Correlation coefficients show that fixation of mutations between dissimilar amino acids is generally rare. 10.1126/science.185.4154.862
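As a rough illustration of the kind of formula this abstract describes, the sketch below combines three per-residue properties (composition, polarity, molecular volume) into a single difference score, assuming a weighted Euclidean form. The function name `property_distance`, the default weights, and the example property vectors are placeholders for illustration only; they are not the published weights or property values.

```python
import math

def property_distance(a, b, weights=(1.0, 1.0, 1.0)):
    """Weighted Euclidean distance between two residues.

    a, b    : (composition, polarity, volume) tuples for each residue.
    weights : hypothetical per-property weights (assumption; the paper
              derives its own scaling so the properties are comparable).
    """
    return math.sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)))

# Placeholder property vectors, NOT published values.
residue_x = (0.0, 5.0, 30.0)
residue_y = (0.0, 8.0, 34.0)

d = property_distance(residue_x, residue_y)
print(d)  # sqrt(0 + 9 + 16) = 5.0
```

With real property tables, a larger score would mark a chemically more dissimilar pair, which the abstract reports as being fixed less often.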
Amino acid preferences of small proteins. Implications for protein stability and evolution. White S H Journal of molecular biology The dependence of amino acid frequency on sequence length has been examined for the 20 natural amino acids using a set of 2275 protein sequences with little sequence identity. As expected, the frequency of cysteine increases dramatically for sequences shorter than 100 amino acids with a length-dependence that corresponds to an average of two Cys per sequence independent of length. Surprisingly dramatic changes were also observed for the frequencies of arginine, lysine, aspartic acid, and glutamic acid: Arg and Lys frequencies increase for short sequences whereas Asp and Glu frequencies decrease. These changes do not appear to be due to an over-abundance of DNA- and membrane-binding proteins in the database and may, therefore, be related to protein stability. Possible stabilizing mechanisms include increased hydrogen bonding by Arg and increased hydrophobic stabilization due to the amphiphilic character of Arg and Lys. These observations suggest that amino acid composition played an important role in the evolution of small proteins. 10.1016/0022-2836(92)90515-l
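The length dependence described in this abstract can be explored with a short script along the following lines. This is a minimal sketch, not the author's procedure: the helper names, the bin width of 50 residues, and the toy sequences are all assumptions. It computes the per-sequence frequency of one residue and averages it within length bins; a fixed count of two Cys per sequence would show up as a frequency of roughly 2/L.

```python
from collections import Counter

def residue_frequency(seq, residue):
    """Fraction of positions in seq occupied by the given residue."""
    return Counter(seq)[residue] / len(seq) if seq else 0.0

def mean_frequency_by_length(seqs, residue, bin_size=50):
    """Average per-sequence frequency of residue, grouped by length bin.

    Keys are the lower edge of each bin (0, 50, 100, ... by default).
    """
    bins = {}
    for s in seqs:
        if not s:
            continue  # skip empty sequences
        key = (len(s) // bin_size) * bin_size
        bins.setdefault(key, []).append(residue_frequency(s, residue))
    return {k: sum(v) / len(v) for k, v in sorted(bins.items())}

# Made-up sequences, each containing exactly two Cys.
toy = ["ACAC" + "A" * 16,        # length 20
       "C" + "G" * 98 + "C"]     # length 100

cys_by_length = mean_frequency_by_length(toy, "C")
print(cys_by_length)
```

In this toy input the shorter sequence has a Cys frequency of 2/20 and the longer one 2/100, mirroring the trend the abstract reports for short proteins.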
Impact of C-terminal amino acid composition on protein expression in bacteria. Molecular systems biology The C-terminal sequence of a protein is involved in processes such as efficiency of translation termination and protein degradation. However, the general relationship between features of this C-terminal sequence and levels of protein expression remains unknown. Here, we identified C-terminal amino acid biases that are ubiquitous across the bacterial taxonomy (1,582 genomes). We showed that the frequency is higher for positively charged amino acids (lysine, arginine), while hydrophobic amino acids and threonine are lower. We then studied the impact of C-terminal composition on protein levels in a library of Mycoplasma pneumoniae mutants, covering all possible combinations of the two last codons. We found that charged and polar residues, in particular lysine, led to higher expression, while hydrophobic and aromatic residues led to lower expression, with a difference in protein levels up to fourfold. We further showed that modulation of protein degradation rate could be one of the main mechanisms driving these differences. Our results demonstrate that the identity of the last amino acids has a strong influence on protein expression levels. 10.15252/msb.20199208
Fundamental amino acid mass distributions and entropy costs in proteomes. Lehmann Jean, Libchaber Albert, Greenbaum Benjamin D Journal of theoretical biology We examine whether the frequency of amino acids across an organism's proteome is primarily determined by optimization to function or other factors, such as the structure of the genetic code. Considering all available proteins together, we first point out that the frequency of an amino acid in a proteome negatively correlates with its mass, suggesting that the genome preserves a fundamental distribution ruled by simple energetics. Given the universality of such distributions, one can use outliers, cysteine and leucine, to identify amino acids that deviate from this simple rule for functional purposes and examine those functions. We quantify the strength of such selection as the entropic cost outliers pay to defy the mass-frequency relation. Codon degeneracy of an amino acid partially explains the correlation between mass and frequency: light amino acids being typically encoded by highly degenerate codon families, with the exception of arginine. While degeneracy may be a factor in hard wiring the relationship between mass and frequency in proteomes, it does not provide a complete explanation. By examining extremophiles, we are able to show that this law weakens with temperature, likely due to protein stability considerations, thus the environment is essential. 10.1016/j.jtbi.2016.08.011
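The mass-frequency relation mentioned in this abstract could be checked on a set of sequences with something like the sketch below. It is only an illustration under stated assumptions: the residue masses are approximate monoisotopic values rounded to two decimals, a plain Pearson coefficient stands in for whatever statistic the authors used, and the function names and toy sequences are hypothetical.

```python
from collections import Counter
import math

# Approximate monoisotopic residue masses in Da (rounded; assumption).
RESIDUE_MASS = {
    "G": 57.02, "A": 71.04, "S": 87.03, "P": 97.05, "V": 99.07,
    "T": 101.05, "C": 103.01, "L": 113.08, "I": 113.08, "N": 114.04,
    "D": 115.03, "Q": 128.06, "K": 128.09, "E": 129.04, "M": 131.04,
    "H": 137.06, "F": 147.07, "R": 156.10, "Y": 163.06, "W": 186.08,
}

def composition(seqs):
    """Pooled frequency of each of the 20 residues across all sequences."""
    counts = Counter(aa for s in seqs for aa in s if aa in RESIDUE_MASS)
    total = sum(counts.values())
    return {aa: counts[aa] / total for aa in RESIDUE_MASS}

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def mass_frequency_correlation(seqs):
    """Correlation between residue mass and pooled residue frequency."""
    freq = composition(seqs)
    aas = sorted(RESIDUE_MASS)
    return pearson([RESIDUE_MASS[a] for a in aas], [freq[a] for a in aas])

# Made-up sequences, deliberately rich in light residues.
toy_proteome = ["GGGGAAASSV", "GGAALKW"]
r = mass_frequency_correlation(toy_proteome)
print(r)
```

A negative `r` on a real proteome would be consistent with the trend the abstract reports; residues far from the trend line would be candidates for the outlier analysis it describes.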