This web page was produced as an assignment for an undergraduate course at Davidson College.

Paper Review One

"The complete genome sequence of Lactobacillus bulgaricus reveals extensive and ongoing reductive evolution"

by M. van de Guchte et al.

What is the bacterium being studied?

The researchers studied a lactic acid-producing bacterium, Lactobacillus delbrueckii ssp. bulgaricus (L. bulgaricus). L. bulgaricus is important because of its application in yogurt production. The environment that L. bulgaricus currently lives in is milk, which is stable and rich in lactose and protein. L. bulgaricus has been living in this environment for at least 5000 years because "the first records of yogurt (kisim) were 3200 before Christ". The benefits of consuming yogurt containing L. bulgaricus include attenuation of lactose intolerance, immune modulation and alleviation of diarrhea. The assembled genomic sequence of L. bulgaricus is 1,864,998 bp long.

What is the researchers' claim?

The researchers claimed that the complete genome sequence of L. bulgaricus reveals "extensive and ongoing reductive evolution".

What are the main supporting arguments?

First, the authors argued that the GC content at codon position 3 is much higher than the overall GC content, indicating that the composition of the genome is evolving toward a higher GC content. Second, the authors suggested that the genome has undergone a recent phase of size reduction, because of the high number of pseudogenes and the high numbers of rRNA and tRNA genes. Third, the researchers suggested that the 47.5 kbp inverted repeat in the replication termination region might be interpreted as a transient stage in genome evolution. In addition, the researchers used BLAST to compare L. bulgaricus with L. acidophilus and L. johnsonii and found that, although there is a global synteny among these bacteria, L. bulgaricus has lost many genes involved in sugar transport and metabolism that are still present in L. acidophilus and L. johnsonii.

What are the data that support the arguments?

Table 1 indicates that the GC content at codon position 3 (65.0%) is much higher than the overall GC content of the coding regions (51.6%). Figure 1 shows the relationship between the GC content at position 3 of coding sequences (GC3, y axis) and the genomic GC content in 232 eubacterial genomes. A best-fit line is drawn and the L. bulgaricus value is circled. Visually, the data do seem to support the claim that L. bulgaricus has a higher GC3 than other bacteria with a similar genomic GC content. However, the researchers could have provided more information. For instance, it is not clear why the pseudogenes were excluded for L. bulgaricus. Were the pseudogenes also excluded in all of the other bacterial genomes compared? I wonder what the graph would look like if the researchers had included the pseudogenes. At about 40 and about 57 on the GC genome (%) axis in Figure 1, there also seem to be some outliers whose GC3 is higher than that of other bacteria with a similar genomic GC content. It would be helpful to know how significant the difference is (for example, through error bars). Is L. bulgaricus the only outlier? If not, I would also like the researchers to compare L. bulgaricus to the other outliers (for example, have they also undergone reductive evolution, and what are their environments?).
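To make the two quantities compared in Table 1 concrete, here is a minimal Python sketch of how overall GC content and GC3 could be computed for a single coding sequence. The function names and the toy sequence are my own assumptions for illustration; this is not the authors' pipeline, and a real analysis would run over every annotated coding sequence in the genome.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G or C bases in a DNA sequence (case-insensitive)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(base in "GC" for base in seq) / len(seq)


def gc3_fraction(cds: str) -> float:
    """Fraction of G or C at the third position of each complete codon."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    third_positions = cds[2:usable:3]
    return gc_fraction(third_positions)


# Hypothetical toy coding sequence, NOT taken from the L. bulgaricus genome.
toy_cds = "ATGGCCAAGCTGTAA"
print(f"overall GC = {gc_fraction(toy_cds):.3f}")
print(f"GC3        = {gc3_fraction(toy_cds):.3f}")
```

In this toy sequence the third codon positions are much more GC-rich than the sequence as a whole, which is the same kind of gap that the paper reports for L. bulgaricus.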

Figure 2 was generated using genewiz software. There are seven circles. Circle 1 (the outermost circle) shows pseudogenes, with red for the positive strand and blue for the negative strand. Circle 2 shows transposases, with elements present in fewer than four copies in gray and elements present in more than five copies in red (ISL7), purple (ISL4), blue (ISL5), and green (ISL4-5). Circle 3 shows the coding regions excluding the pseudogenes. Circle 5 shows GC (window size 2000) on a scale from -0.1 (cyan) to 0.1 (red). Circle 6 shows AT (window size 500) on a scale from 0.3 (cyan) to 0.7 (red). Circle 7 shows the position on the genome. From Figure 2, it is clear that there is a large number of pseudogenes in L. bulgaricus (270 pseudogenes), regularly distributed across the genome. The high number of pseudogenes does seem to suggest an active state of gene elimination and size reduction in the L. bulgaricus genome. However, it would be very helpful if the researchers had also plotted the number of pseudogenes as a function of genome size (as in Figure 1 and Figure 3). In this way, other bacteria with a high number of pseudogenes might be found, and if the researchers knew that these bacteria had also changed their environment (as L. bulgaricus did, from a plant-associated habitat to a lactose-rich milk environment), they might obtain better evidence.
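As a rough illustration of the normalization suggested above, the sketch below expresses the pseudogene count per megabase, using only the two numbers quoted in this review (270 pseudogenes and a 1,864,998 bp genome). The helper name per_mbp is an assumption of mine, and no values for other genomes are included because I do not have them.

```python
GENOME_SIZE_BP = 1_864_998  # assembled genome size quoted in this review
PSEUDOGENE_COUNT = 270      # pseudogene count quoted in this review


def per_mbp(count: int, genome_size_bp: int) -> float:
    """Number of features per megabase of genome."""
    if genome_size_bp <= 0:
        raise ValueError("genome size must be positive")
    return count / (genome_size_bp / 1_000_000)


density = per_mbp(PSEUDOGENE_COUNT, GENOME_SIZE_BP)
print(f"{density:.1f} pseudogenes per Mbp")
```

This works out to roughly 145 pseudogenes per Mbp. The same calculation applied to other genomes would put them on a common scale with L. bulgaricus.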

Figure 3 shows the relationship between the numbers of rRNA and tRNA genes and the genome size among 54 Firmicutes genomes. Panel A plots the number of 16S rRNA genes (y axis) against the genome size (x axis) for the 54 Firmicutes genomes. Panel B plots the number of tRNA genes (y axis) against the genome size (x axis) for the same genomes. The data point for L. bulgaricus is circled in both A and B. It is obvious that L. bulgaricus is an outlier in both A and B. The researchers indicated in the paper that the numbers of tRNA and rRNA genes are about 50% higher than the average and about 25% higher than the highest values observed in genomes of similar size. The researchers also mentioned that variation in rRNA and tRNA gene copy numbers may be related to the capacity to respond to changing environments, and that the high numbers observed in L. bulgaricus might indicate that the genome has gone through a recent phase of size reduction. However, I also observed other outliers in the figure. For example, the outlier at about 3 Mbp on the genome size axis in both A and B seems to have a high number of rRNA and tRNA genes compared to other genomes of similar size. It is worth comparing L. bulgaricus with the other outliers: if none of the other outliers with high numbers of rRNA and tRNA genes had undergone a recent phase of size reduction, would the rRNA/tRNA evidence still be strong enough? Conversely, if all (or most) of the outliers were shown to have undergone a recent phase of size reduction, then the rRNA/tRNA argument would be strengthened.
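One simple way to decide which points count as outliers, rather than judging by eye, would be to fit a straight line to gene count against genome size and flag points whose residual is large relative to the spread of all residuals. The Python sketch below illustrates that idea. It is only my assumption about how such a check could be done, not something the paper reports, and every data point in it is hypothetical.

```python
from statistics import mean, pstdev


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def flag_outliers(points, threshold=2.0):
    """Return names whose residual exceeds `threshold` residual standard deviations.

    points maps a genome name to (genome_size_mbp, gene_count).
    """
    names = list(points)
    xs = [points[name][0] for name in names]
    ys = [points[name][1] for name in names]
    slope, intercept = fit_line(xs, ys)
    residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
    spread = pstdev(residuals)
    if spread == 0:
        return []  # all points lie exactly on the fitted line
    return [n for n, r in zip(names, residuals) if abs(r) / spread > threshold]


# Hypothetical (genome size in Mbp, rRNA gene count) pairs, NOT data from Figure 3.
toy_points = {
    "genome_A": (1.0, 2), "genome_B": (1.5, 3), "genome_C": (2.0, 4),
    "genome_D": (2.5, 5), "genome_E": (3.0, 6), "genome_F": (3.5, 7),
    "genome_G": (4.0, 8), "genome_X": (1.9, 9),
}
print(flag_outliers(toy_points))
```

With the real values from Figure 3, a list like this would show whether L. bulgaricus is the only genome that stands out or one of several, which is the comparison I would like to see.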

Figure 4 compares the synteny between the L. bulgaricus (x axis) and L. acidophilus (y axis) genomes. Position 0 is the replication origin. Colors show protein similarity as a blast score ratio, according to the scale on the right. The plot is based on the results of a blastp reciprocal best-hit analysis with a cutoff value of 0.4. The discontinuous diagonal lines running from left to right in the figure show that there is a clear global synteny. The researchers said in the paper that most unconserved regions are sequences with unknown functions. They also mentioned that proteins of known function for the biosynthesis of folate, saturated fatty acids, purines and pyrimidines found in L. bulgaricus are only partially present in L. acidophilus or L. johnsonii. Proteins that are present in L. acidophilus or L. johnsonii but not in L. bulgaricus are mainly involved in sugar transport and metabolism.
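For readers unfamiliar with the terms used in Figure 4, the sketch below shows one common way a blast score ratio and a reciprocal best-hit filter can be computed: the score of a protein against its best match in the other genome is divided by its score against itself, and a pair is kept only if each protein is the other's best hit. I am assuming here that the 0.4 cutoff applies to this ratio; the function names and all bit scores are hypothetical, and this is not the authors' code.

```python
def blast_score_ratio(hit_score: float, self_score: float) -> float:
    """Score against the hit divided by the score of the query against itself."""
    if self_score <= 0:
        raise ValueError("self score must be positive")
    return hit_score / self_score


def best_hits(scores):
    """scores maps (query, subject) to a bit score; return query -> best subject."""
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > scores[(query, best[query])]:
            best[query] = subject
    return best


def reciprocal_best_hits(a_vs_b, b_vs_a, self_scores_a, cutoff=0.4):
    """Pairs (a, b, ratio) that are mutual best hits with ratio >= cutoff."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    pairs = []
    for a, b in best_ab.items():
        if best_ba.get(b) != a:
            continue  # not reciprocal
        ratio = blast_score_ratio(a_vs_b[(a, b)], self_scores_a[a])
        if ratio >= cutoff:
            pairs.append((a, b, ratio))
    return pairs


# Hypothetical bit scores, NOT taken from the paper.
a_vs_b = {("A1", "B1"): 400, ("A1", "B2"): 120, ("A2", "B2"): 90, ("A3", "B3"): 304}
b_vs_a = {("B1", "A1"): 390, ("B2", "A2"): 95, ("B2", "A1"): 80, ("B3", "A3"): 310}
self_scores_a = {"A1": 500, "A2": 450, "A3": 320}

for a, b, ratio in reciprocal_best_hits(a_vs_b, b_vs_a, self_scores_a):
    print(a, b, f"{ratio:.2f}")
```

In this toy example the pair A2/B2 is a reciprocal best hit but falls below the cutoff, so it would not appear as a colored point in a plot like Figure 4.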

The researchers found an inverted repeat of 47.5 kbp at positions 918952-966484 of the ATCC11842 genome. They argued that it is very rare to observe inverted repeats of this size in bacterial genomes. The authors asserted that the results of their one-primer PCR amplification in 30 different L. bulgaricus strains showed that an inverted repeat is conserved in most strains. The data for these results were not shown, and the authors also acknowledged that the size of the inverted repeat might vary considerably between strains (these data were not shown either). It would be more convincing if the researchers had published their data.

The authors also discussed protocooperation between L. bulgaricus and S. thermophilus during milk fermentation. For example, L. bulgaricus has an extracellular cell-wall-bound protease and thus can supply peptides and amino acids to S. thermophilus by degrading milk proteins. In return, S. thermophilus produces formate and CO2. Interestingly, although L. bulgaricus has lost a large number of enzymes for the biosynthesis of amino acids, S. thermophilus has retained almost all of its enzymes for synthesizing amino acids. The researchers speculated that S. thermophilus retained its ability to synthesize amino acids because it does not possess an extracellular protease to break down the milk proteins. However, it would be interesting to see some more rigorous alternative explanations.




Van de Guchte, M, Penaud, S, Grimaldi, C, Barbe, V, Bryson, K, Nicolas, P, Robert, C, Oztas, S, Mangenot, S, Couloux, A, Loux, V, Dervyn, R, Bossy, R, Bolotin, A, Batto, JM, Walunas, T, Gibrat, JF, Bessières, P, Weissenbach, J, Ehrlich, SD and Maguin, E. 2006 Jun 13. The complete genome sequence of Lactobacillus bulgaricus reveals extensive and ongoing reductive evolution. Proceedings of the National Academy of Sciences of the United States of America 103: 9274-9279. Accessed Oct 9, 2008.





© Copyright 2008 Department of Biology, Davidson College, Davidson, NC 28035