Genome sequence and analysis
The P. chrysogenum genome was sequenced by the whole-genome sequencing method. The nuclear genome of 32.19 Mb was covered by 49 supercontigs, including 21 supercontigs larger than 5 kb and 14 supercontigs larger than 100 kb. Annotation based on a minimum open reading frame (ORF) size of 100 amino acids revealed 13,653 ORFs (Table 1), including 592 probable pseudogenes and 116 ORFs whose sequences were truncated because their coding regions spanned contig borders. Two ORFs were considered unlikely to encode proteins because of their small size, absence of detectable protein motifs and low codon adaptation index. The sequenced mitochondrial genome comprised 31,790 bp and 17 identified ORFs.
BLASTP matches were found for 11,472 ORFs (P < 0.001) to a nonredundant protein database, whereas the remaining 2,198 ORFs showed no significant similarities. Predicted protein-coding sequences account for 56.6% of the P. chrysogenum genome, with an average gene length of 1,515 bp. The GC content was 48.9% (52.8% for exons, 45.3% for introns and 44.4% for intergenic regions). On average, each gene contained 3.0 exons, with 83.5% of the genes containing introns. Using the FunCat classification system7, 5,329 of the 12,943 predicted nuclear-encoded proteins could be assigned to the functional protein classes metabolism, energy, cellular transport and protein fate (Fig. 1).
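The ORF bookkeeping and the GC percentages quoted above rest on simple counting. The following is a minimal sketch in Python, not the annotation pipeline used for this genome: the function name `gc_content` and the toy sequence are illustrative assumptions, while the ORF counts are taken from the text.

```python
# Counts quoted in the text above.
TOTAL_ORFS = 13_653
PSEUDOGENES = 592
TRUNCATED_ORFS = 116
UNLIKELY_ORFS = 2

# ORFs left after removing pseudogenes, truncated ORFs and the two
# ORFs considered unlikely to encode proteins.
nuclear_proteins = TOTAL_ORFS - PSEUDOGENES - TRUNCATED_ORFS - UNLIKELY_ORFS


def gc_content(seq: str) -> float:
    """Return the percentage of G and C bases in a DNA sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


if __name__ == "__main__":
    print(nuclear_proteins)        # matches the 12,943 proteins cited above
    print(gc_content("GGCCAATT"))  # toy sequence (hypothetical)
```

Applied separately to exon, intron and intergenic sequences, the same count would give the per-feature GC values reported above.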
Comparison with other fungal genomes
The sequenced P. chrysogenum genome is comparable in size to that of other filamentous fungi (Supplementary Table 1 online). FunCat classification revealed a conserved orthologous core fungal proteome (Supplementary Fig. 1 online) involved in energy production, protein fate and cell fate. Phylogenetic analysis based on the concatenated protein set (Fig. 2a) confirmed a close relationship to Aspergillus species. The tree topology indicated that P. chrysogenum is only distantly related to the other two sequenced Penicillium species, Penicillium marneffei and Talaromyces stipitatus (teleomorph of Penicillium stipitatum). This contradicts a previously published phylogeny8 but is consistent with morphological observations9.
The 14 largest supercontigs (between 167 kb and 6,387 kb) presumably correspond to chromosome arms or even entire chromosomes. Alignment against various Aspergilli chromosomes suggests that extensive reshuffling has occurred since the divergence of the Aspergillus and Penicillium lineages (Fig. 2b). Several supercontigs are bounded by areas with multiple synteny breaks, which may correspond to subtelomeric regions (Fig. 2b). Indications of subtelomeric instability have also been observed in Aspergilli10 and Magnaporthe oryzae11. Five supercontigs (nos. 12, 16, 20, 21 and 22) contain gaps surrounded by larger syntenic blocks, which appear to be recombination cold spots. These gaps resemble putative centromeres in the eight A. fumigatus chromosomes12. Owing to their high repeat content, centromeres and ribosomal DNA (rDNA) repeat regions are typically not assembled into supercontigs in fungal genomes12,13. By contrast, the upper gap on supercontig no. 16 is not surrounded by large syntenic blocks and is likely to contain the rDNA region (Fig. 2b).
Genome alignment revealed four supercontigs (nos. 17, 19, 23 and 24), representing 4% of the genome, that show little similarity to Aspergilli. These regions contain P. chrysogenum-specific genes, which are typically smaller and contain fewer introns than other genes. Their biological roles are mostly unknown, although some seem to function in transport, metabolism or transcriptional regulation (Supplementary Fig. 2c online). These four nonsyntenic regions also contain numerous repeat elements and 23% of the genome's transposable elements (Supplementary Table 2 online). Similar genomic islands have been found in other fungal genomes11,13,14.
Almost 30% of the predicted P. chrysogenum proteins lack orthologs in other sequenced fungi. In the closely related genus Aspergillus, the origin of lineage-specific genes has been largely attributed either to gene acquisition through horizontal gene transfer10,15 or to gene duplication followed by accelerated diversification and differential gene loss13. These genes tend to function in secondary metabolism and other accessory roles (Supplementary Fig. 2a) and may have a recent evolutionary origin. Phylogenetic analysis was applied to a subset of putative secondary metabolism genes. Thirty-three such genes were identified using SMURF software (http://www.tigr.org/software/); they encode 20 polyketide synthases (PKS), 10 nonribosomal peptide synthetases (NRPS), 2 hybrid NRPS-PKS enzymes and 1 dimethylallyltryptophan synthase (Supplementary Table 3 online). These numbers are similar to those found in Aspergilli10,12,15,16. The penicillin cluster is well known17, and the siderophore synthetases for ferrichrome (Pc13g05250) and triacetylfusarinine (Pc16g03850, Pc22g20400) were readily assigned by homology (Supplementary Table 4 online). None of the remaining six NRPS could be confidently identified. Pc21g15480 may encode roquefortine synthetase18 and is clustered with tryptophan dimethylallyl transferase (Pc21g15430). The putative tetrapeptide synthetases Pc13g14330 and Pc16g04690 have architectures similar to those in Aspergilli and may form cyclopeptides with two adjacent D-amino acids, presumably related to malformin19. Pc21g10790 may form a cyclohexapeptide containing a fatty acid-derived component and is orthologous to a similar NRPS found in A. oryzae.
Penicillin biosynthetic genes
Several prokaryotic features of two penicillin biosynthetic genes, pcbAB and pcbC, encoding α-aminoadipoyl-L-cysteinyl-D-valine synthetase and isopenicillin N (IPN) synthase, respectively, suggested that the penicillin gene cluster emerged through horizontal gene transfer from bacteria to fungi20. Both genes lack introns (which is unique for large NRPS genes like pcbAB), are highly homologous to their bacterial counterparts and are physically linked. Other features to consider are GC content (which is above 60% in prokaryotic penicillin producers) and specific codon usage. In P. chrysogenum, the GC content of the penicillin biosynthetic genes is only slightly higher than the overall genome average (Supplementary Data online). In the clavulanic acid producer Streptomyces clavuligerus, the phenylalanine codon UUU is extremely rare compared with UUC; UUU accounts for only ∼2.3% of all phenylalanine codons. In P. chrysogenum, by contrast, UUU accounts for about one-third of phenylalanine codons genome-wide, and for 26.2% and 17.6% of the phenylalanine codons in pcbAB and pcbC, respectively. This can be interpreted as near-complete codon adaptation since the hypothesized horizontal acquisition event.
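The codon-usage comparison above reduces to counting how often one codon is used among its synonymous alternatives in an in-frame coding sequence. The sketch below illustrates that calculation for the two phenylalanine codons; the function name `codon_fraction` and the toy sequence are assumptions made for illustration and do not reproduce the analysis in the Supplementary Data.

```python
from collections import Counter
from typing import Sequence


def codon_fraction(cds: str, codon: str, synonymous: Sequence[str]) -> float:
    """Fraction of `codon` among all synonymous codons in an in-frame CDS.

    RNA (U) and DNA (T) spellings are both accepted.
    """
    def to_dna(s: str) -> str:
        return s.upper().replace("U", "T")

    cds = to_dna(cds)
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    counts = Counter(cds[i:i + 3] for i in range(0, usable, 3))
    total = sum(counts[to_dna(c)] for c in synonymous)
    if total == 0:
        raise ValueError("no synonymous codons found in sequence")
    return counts[to_dna(codon)] / total


# Phenylalanine is encoded by UUU and UUC.
PHE_CODONS = ("UUU", "UUC")

if __name__ == "__main__":
    toy_cds = "ATGTTTTTCTTCTTCTAA"  # hypothetical: Met, 4 x Phe, stop
    print(codon_fraction(toy_cds, "UUU", PHE_CODONS))
```

Running the same function over a single gene and over all coding sequences of a genome gives the two kinds of percentages being compared in the text.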
Three other examples of possible horizontal gene transfer were identified in the P. chrysogenum genome: the arsenate-resistance cluster and two 6-methylsalicylic acid clusters (Supplementary Data and Supplementary Table 4). These gene clusters contain highly conserved bacterial-like genes with a GC content well above that of the surrounding genes (exons with 55–58% GC).
The penicillin biosynthetic genes are clustered on supercontig 21 in the middle of a 120-kb region that is amplified in industrial P. chrysogenum strains4. Thirty-nine additional ORFs were identified in this region (Supplementary Table 5 online), including genes encoding transporters and transcriptional regulators. However, the predicted annotation of these ORFs does not suggest clear functions in penicillin biosynthesis, in agreement with recent reports21,22.
The third penicillin biosynthetic gene, penDE, encoding acyl-CoA:isopenicillin N acyltransferase, has a paralog, Pc13g09140. This paralog was not transcribed under the conditions studied (Supplementary Table 6 online), and detailed analyses are needed to reveal its actual function. Several orthologs of β-lactam biosynthesis genes were identified throughout the genome (Supplementary Table 6). As deletion of phl, which encodes phenylacetyl-CoA ligase, resulted in only a partial loss of penicillin G production23, other phenylacetyl-CoA ligases must be present24. These may include the identified orthologs of 4-coumarate-CoA ligase. Surprisingly, several orthologs of bacterial25 and fungal26 isopenicillin N epimerase were identified. The predicted protein sequence of Pc12g11540 shares 40% homology with S. clavuligerus isopenicillin N epimerase, although it probably functions as an aminotransferase. In addition, orthologs of Acremonium chrysogenum cefD1 and cefD2 were identified. The presence of these ORFs is remarkable, as P. chrysogenum can produce penicillin N only after introduction of both A. chrysogenum genes27. The P. chrysogenum ORFs may be remnants of an ancestral cephalosporin pathway.
In P. chrysogenum, microbodies (peroxisomes) are essential for penicillin biosynthesis because the two final enzymatic steps, catalyzed by acyl-CoA:isopenicillin N acyltransferase28 and phenylacetyl-CoA ligase29, are located in these organelles. Moreover, high-producing strains have enhanced microbody volume fractions (Fig. 3), and a further increase in microbody abundance by overexpression of the proliferation gene pex11 leads to a significant increase in penicillin production30. Genome 2D-searches31 with known consensus sequences for microbody targeting signals (PTS)32 identified 214 putative matrix proteins (196 with a putative PTS1, 17 with a putative PTS2 and 1 with both signals) (Supplementary Table 7 online). Remarkably, the putative isopenicillin N-CoA epimerase (Pc22g13680) has a predicted PTS1. Many of the proteins are β-oxidation homologs, including multiple acyl-CoA synthetases and putative 3-ketoacyl-CoA thiolases (Pc13g12930, Pc15g00410 and Pc22g06820). Indeed, P. chrysogenum readily consumes oleate as the sole source of carbon and energy. Other PTS-containing proteins, such as D-amino acid oxidases, may play a role in the metabolism of various carbon or nitrogen sources.
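A motif search of this kind can be sketched with regular expressions. The patterns below are commonly cited textbook consensus forms (a C-terminal tripeptide for PTS1 and an N-terminal nonapeptide for PTS2) and are assumptions; the consensus sequences actually used in the search described above (ref. 32) may differ, as may the 40-residue N-terminal window chosen here.

```python
import re

# Assumed consensus patterns, not necessarily those used in the study.
PTS1 = re.compile(r"[SAC][KRH][LM]$")        # C-terminal tripeptide, e.g. -SKL
PTS2 = re.compile(r"[RK][LVI].{5}[HQ][LA]")  # nonapeptide near the N terminus


def classify_pts(protein: str, n_window: int = 40) -> set:
    """Return the set of putative targeting signals found in a protein.

    `n_window` (hypothetical default) limits the PTS2 search to the
    first residues of the sequence.
    """
    protein = protein.upper().rstrip("*")  # drop a trailing stop symbol
    found = set()
    if PTS1.search(protein):
        found.add("PTS1")
    if PTS2.search(protein[:n_window]):
        found.add("PTS2")
    return found


if __name__ == "__main__":
    # Toy sequences (hypothetical), not P. chrysogenum proteins.
    print(classify_pts("MAAAAAAAAASKL"))
    print(classify_pts("MSRLQVVLGHLAAAAAAAAAAG"))
```

Tallying the returned sets over a whole proteome would yield counts of PTS1-only, PTS2-only and dual-signal candidates, analogous to the breakdown given in the text.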
When genomic DNA of P. chrysogenum was hybridized to microarrays, 99.4% of the probe sets hybridized (Supplementary Data). Once validated (Supplementary Data), these microarrays were used to investigate, at the transcriptome level, the molecular basis of the improved penicillin productivity achieved via classical strain improvement. Transcriptome analysis was performed on aerobic, glucose-limited chemostat cultures of Wisconsin 54-1255 and the derived industrial, high-producing strain DS17690 (ref. 33). Some intermediates of β-lactam biosynthesis are produced in the absence of the side-chain precursor phenylacetic acid (PAA)33, but penicillin G biosynthesis is strictly dependent on PAA34 (Supplementary Data). In DS17690 grown in the absence of PAA, 67% of the genome (∼9,200 genes) yielded a detectable transcript (Supplementary Table 8 online). Under comparable conditions, 86% (∼5,500 genes) of the smaller S. cerevisiae genome was transcribed35. To discriminate between PAA-responsive transcripts and transcripts potentially related to improved penicillin G productivity, the two strains were grown under penicillin G-producing and nonproducing conditions. In at least one of the four comparisons, 2,470 genes were differentially transcribed (Supplementary Table 8). By K-means clustering36, these genes were assigned to eight clusters (Fig. 4 and Supplementary Tables 9–16 online).
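K-means clustering groups genes whose expression profiles across the comparisons are similar. The sketch below is a plain implementation of Lloyd's algorithm using only the Python standard library; it is not the software used for the analysis above (ref. 36), and the toy profiles, the random initialization and the choice of `k` in the usage example are assumptions.

```python
import random


def kmeans(points, k, iterations=100, seed=0):
    """Cluster points with Lloyd's algorithm; returns one label per point."""
    rng = random.Random(seed)
    # Initialize centroids from k distinct input points.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_labels = [
            min(
                range(k),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centroids[c])),
            )
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


if __name__ == "__main__":
    # Hypothetical log-ratio profiles for four genes over four comparisons.
    profiles = [
        (2.0, 2.1, 0.0, 0.0),
        (1.9, 2.0, 0.1, 0.0),
        (-2.0, -2.1, 0.0, 0.0),
        (-1.9, -2.0, 0.0, 0.1),
    ]
    print(kmeans(profiles, k=2))
```

In the analysis described above the number of clusters was eight; here `k` is left as a parameter, since the result of K-means depends on both `k` and the initialization.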
Transcription of the penicillin G biosynthesis genes pcbAB, pcbC and phl was independent of PAA, but two- to fourfold higher in the high-producing strain (Fig. 4, cluster 5). penDE showed a similar trend (1.9- and 1.5-fold difference in the presence and absence of PAA, respectively; P < 0.05 in a t-test). Genes encoding enzymes involved in the biosynthesis of the amino-acid precursors of penicillin (cysteine, valine and α-aminoadipic acid) were also transcribed at higher levels in the high-producing strain independent of the presence of PAA (Fig. 4, cluster 5 and Fig. 5). This included sulfur reduction and early stages of serine (and cysteine) biosynthesis (Pc20g03220; Pc12g02680 and Pc12g04370), as well as a homolog of O-acetylhomoserine (thiol)-lyase (Pc12g05420), a key enzyme in the trans-sulfuration pathway toward cysteine. Several genes encoding enzymes related to α-aminoadipic acid (lysine; Pc18g01330, Pc14g00150) and valine (Pc22g22510, Pc22g23110) metabolism showed a similar trend.
Of the genes predicted to encode microbody proteins, 27 showed higher transcript levels in DS17690, irrespective of PAA addition (Fig. 4, cluster 5). This class was also overrepresented among genes that were upregulated by addition of PAA (Fig. 4, cluster 2).
The homogentisate pathway for PAA degradation has been reported to be largely inactivated in Wisconsin 54-1255 and, presumably, also in derived strains, owing to point mutations in the pahA gene encoding phenylacetate hydroxylase37. Nevertheless, both strains showed very low but significant rates of PAA consumption that could not be attributed to penicillin G production (Supplementary Data). Despite the low in vivo activity of the homogentisate pathway, its transcriptional regulation has been retained throughout the strain improvement program, as its structural genes showed increased transcript levels in the presence of PAA in both strains (Fig. 4, clusters 1 and 2).
Several transcriptional regulators have been implicated in the transcription of pcbAB, pcbC and penDE in P. chrysogenum38. No penicillin-specific transcriptional regulator has been identified, although strong effects were reported from an enhancer sequence in the upstream region of pcbAB39 and chromatin modulation40. The laeA gene responsible for the latter effect strongly affects secondary metabolism in Aspergillus fumigatus40. Although the P. chrysogenum ortholog was transcribed, its transcript levels were not substantially influenced by strain or cultivation conditions (Pc16g14010, Supplementary Table 17 online). Several other putative transcription factors, whose functions remain to be elucidated, were found to be associated with secondary metabolite clusters (Supplementary Data).
Transport mechanisms for β-lactam antibiotics and intermediates across the fungal plasma membrane and between intracellular compartments are poorly understood. Industrial fermentations yield high amounts of penicillin G in the external broth, whereas intracellular concentrations are typically tenfold lower. Moreover, penicillin G secretion is sensitive to verapamil41, an antagonist of multidrug transporters. This implies that secretion is an active process, possibly mediated by one or more ABC transporters whose identity has remained elusive. P. chrysogenum contains 830 genes that specify transporter proteins. Secondary transporters (688) are numerous, with the majority belonging to the major facilitator superfamily (416), whereas 51 ABC transporters were identified. The functional categories metabolism, transport and detoxification were among the most strongly overrepresented in the gene clusters that were transcriptionally upregulated in the presence of PAA in both strains (Fig. 4, clusters 1 and 2 and Supplementary Table 18 online). Several of these genes showed sequence similarity to known multidrug transporter genes. Transporters were significantly overrepresented within the class of genes expressed more highly in DS17690 than in Wisconsin 54-1255, irrespective of PAA (P = 3.94 × 10⁻⁶; see Fig. 4, cluster 5), identifying 68 potential active transporters (Supplementary Table 19 online). Interestingly, none of the previously suggested penicillin transporters41,42 is among this group. Although some transporter genes may be involved in transport of PAA rather than of β-lactams or intermediates, the strong enrichment of transporter genes suggests that penicillin secretion might result from the simultaneous activity of multiple transporters.
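Overrepresentation of a functional class within a gene cluster is often assessed with a one-sided hypergeometric test. Which test produced the P value quoted above is not stated here, so the following is only a sketch of that common approach; the function name and all counts in the usage example are hypothetical and are not meant to reproduce the reported value.

```python
from math import comb


def hypergeom_enrichment_p(population: int, successes: int,
                           draws: int, observed: int) -> float:
    """One-sided P value P(X >= observed) for a hypergeometric variable.

    population: all genes considered
    successes:  genes in the class of interest (e.g. transporters)
    draws:      genes in the cluster being tested
    observed:   class members found in that cluster
    """
    upper = min(successes, draws)
    total = comb(population, draws)
    # math.comb returns 0 when k > n, so impossible terms vanish.
    tail = sum(
        comb(successes, k) * comb(population - successes, draws - k)
        for k in range(observed, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Hypothetical counts: 1,000 genes, 100 in the class, a cluster of
    # 50 genes of which 15 belong to the class.
    print(hypergeom_enrichment_p(1000, 100, 50, 15))
```

A small P value indicates that the class occurs in the cluster more often than expected from random sampling of the genome; with many classes tested, a multiple-testing correction would normally be applied as well.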
A. A. Amorim,1,2 M. V. Tognetti,3 P. Oliveira,1 J. L. Silva,1 L. M. Bernardo,1 F. X. Kärtner,4 and H. M. Crespo1,*
1IFIMUP and IN—Instituto de Física dos Materiais da Universidade do Porto, Instituto de Nanociências e Nanotecnologias, Departamento de Física, Faculdade de Ciências, Universidade do Porto, R. do Campo Alegre 687, 4169-007 Porto, Portugal
2Departamento de Física, Instituto Superior de Engenharia do Porto, R. de S. Tomé, 4200 Porto, Portugal
3Consorzio Nazionale Interuniversitario per le Scienze Fisiche della Materia, Unità di Siena, Dipartimento di Fisica, Università degli Studi di Siena, Via Roma 56, 53100 Siena, Italia
4Department of Electrical Engineering and Computer Science and Research Laboratory of Electronics, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
*Corresponding author: email@example.com