Homology Modelling, Bioinformatics Analysis and In Silico Functional Annotation of an Antitoxin Protein from Streptomyces coelicolor A3(2)

Streptomyces coelicolor A3(2) is a soil-dwelling, filamentous bacterium and a reservoir of a wide range of natural antibiotics. A hypothetical protein of this bacterium, SCO2235, comprising 87 residues, was selected for in silico analysis. Several bioinformatics tools were used to predict the structure and function of this protein, and its subcellular localization was also predicted. Multiple sequence alignment (MSA) was used to locate conserved residues and to analyze the secondary structure. Sequence homology was assessed against the Protein Data Bank and the non-redundant database using the BLASTP program of NCBI, which revealed that the targeted protein is similar to several antitoxin proteins. PDB ID: 3D55: A, which shares 71% sequence identity with SCO2235, served as the template for homology modelling. The three-dimensional structure was predicted with MODELLER and validated with the PROCHECK and QMEAN6 programs. The query and template structures were superimposed and their Root Mean Squared Deviation (RMSD) was calculated, and the Z scores of both structures were assessed. To understand the physical and functional associations of the targeted protein with other proteins, STRING network analysis was performed. Finally, the CASTp server was used to predict the active site of the protein, which in antitoxins is usually specific for toxin binding and DNA binding. Taken together, the results suggest that the biological function of the target protein is that of an antitoxin.


Introduction
Streptomyces are soil-dwelling Gram-positive bacteria and members of the order Actinomycetales [1]. Different strains of Streptomyces coelicolor produce numerous antibiotics, such as actinorhodin, methylenomycin, undecylprodigiosin and perimycin, and the species is also used for heterologous protein expression [2][3][4].
Streptomyces coelicolor A3(2) is so far the genetically best-studied Streptomyces strain and has become a model organism for Streptomyces species [5]. The genome of S. coelicolor A3(2) was sequenced in 2002 and consists of 8,667,507 bp encoding 7,825 predicted genes, including over 20 gene clusters for the synthesis of known or predicted natural products [6]. The genome sequence has further expanded the knowledge of this organism and enabled large-scale analysis of its transcriptome and proteome [7,8].
Toxin-antitoxin (TA) systems are widespread in the genomes of bacteria and archaea and are usually regarded as maintenance or stability mediators [8,9]. Although the exact role of these systems is not clear, they act as sentinels against DNA loss and take part in stress-management processes such as programmed cell death and antibiotic resistance [10]. According to their mode of action, TA systems have been classified into three broad classes: Class I, Class II and Class III. Among them, Class II is predominant in many organisms [11].
The Class II TA system consists of two proteins, a toxin and an antitoxin. The antitoxin neutralizes the toxin through direct protein-protein interaction and/or interacts with palindromic sequences within the promoter to suppress transcription of the TA system [12][13][14].
Sequencing technology has become increasingly sophisticated and now generates massive amounts of data. Unfortunately, many of the sequenced genomes are still not fully annotated and contain genes or proteins of unknown function and structure. This is due to several limitations, such as the cost and time required by experimental methodologies. Hence bioinformatics, an alternative to wet-lab procedures, is now well established; it uses algorithms and logic derived from wet-lab research to annotate genomes [15]. Such approaches have recently gained much popularity [16][17][18][19].
Because protein structure is more conserved than sequence, we sought insights into the function of the protein SCO2235 by predicting its secondary and three-dimensional structure and by analyzing its comparative proteomics and catalytic sites.
The hypothetical protein SCO2235 (gi|21220706|) of Streptomyces coelicolor A3(2), consisting of 87 amino acid residues, was selected for the study. The sequence was stored in FASTA format for further analysis.
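As a minimal sketch of how a sequence stored in FASTA format can be read back for downstream analysis, the following Python function parses header and sequence pairs from a text stream. The function name `read_fasta` and the toy record are illustrative assumptions; the residues shown are placeholders and not the SCO2235 sequence.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text stream."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence may span several lines
    if header is not None:
        yield header, "".join(chunks)


# Hypothetical toy record; placeholder residues, not SCO2235.
toy_fasta = ">toy_protein example record\nMKTAYIAK\nQRQISFVK\n"
records = list(read_fasta(io.StringIO(toy_fasta)))
```

Each record is returned with its sequence lines joined, so the length of the second element gives the residue count directly.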

Analysis of physicochemical properties
The ProtParam tool (http://web.expasy.org/protparam/) of ExPASy was used to analyze the physicochemical properties of the targeted protein sequence [20]. The properties analyzed included the aliphatic index (AI), GRAVY (grand average of hydropathy), extinction coefficients, isoelectric point (pI) and molecular weight.
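Two of these properties can be illustrated with a short Python sketch. It assumes the commonly used definitions: GRAVY as the mean Kyte-Doolittle hydropathy value per residue, and the aliphatic index as X(Ala) + 2.9·X(Val) + 3.9·(X(Ile) + X(Leu)), where X is the mole percent of each residue. The function names and the toy sequence are illustrative and are not part of ProtParam; the toy sequence is not SCO2235.

```python
from collections import Counter

# Kyte-Doolittle hydropathy scale, used here for GRAVY.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq: str) -> float:
    """Grand average of hydropathy: mean hydropathy value per residue."""
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def aliphatic_index(seq: str) -> float:
    """Aliphatic index from the mole percent of Ala, Val, Ile and Leu."""
    counts = Counter(seq)
    n = len(seq)

    def mole_pct(aa: str) -> float:
        return 100.0 * counts[aa] / n

    return mole_pct("A") + 2.9 * mole_pct("V") + 3.9 * (mole_pct("I") + mole_pct("L"))


# Hypothetical toy sequence; NOT the SCO2235 sequence.
toy_sequence = "MAVLIAGK"
toy_gravy = gravy(toy_sequence)
toy_ai = aliphatic_index(toy_sequence)
```

A positive GRAVY value indicates an overall hydrophobic sequence and a negative value a hydrophilic one, while a higher aliphatic index reflects a larger relative volume occupied by aliphatic side chains.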

Subcellular localization prediction
Determining subcellular localization is crucial for understanding protein function and is also vital for genome analysis. The subcellular localization of the Streptomyces coelicolor A3(2) protein SCO2235 was predicted with CELLO v.2.5, a multiclass support vector machine classification system [21,22].

Comparative proteomics
The BLASTP program of the NCBI database (http://www.ncbi.nlm.nih.gov/) was used with default parameters to search the non-redundant database for proteins similar to our protein [23].
The protein SCO2235 was then analyzed for the presence of conserved domains based on a sequence similarity search against closely related orthologous family members. For this purpose, three bioinformatics tools and databases were used: the Protein Families Database (Pfam), the NCBI Conserved Domains Database (NCBI-CDD) and SUPERFAMILY [24][25][26]. Pfam is a database of protein families that includes their annotations and multiple sequence alignments generated using hidden Markov models. NCBI-CDD is a protein annotation resource that consists of a collection of well-annotated multiple sequence alignment models for ancient domains and full-length proteins. The SUPERFAMILY annotation is based on a collection of hidden Markov models that represent structural protein domains at the SCOP superfamily level; it is produced by scanning protein sequences from completely sequenced genomes against these hidden Markov models.

Multiple sequence alignment and secondary structure analysis
To gain structural and functional insights through sequence comparison, a combined approach was used. We retrieved several annotated antitoxin protein sequences of Streptomyces species from the NCBI Protein database and aligned them with the targeted protein using the BioEdit biological sequence alignment editor [27]. The aligned sequences were then used to predict the secondary structure with ESPript 3.0 [28], using PDB ID: 3D55 as the template source.
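The idea of locating conserved residues in an MSA can be sketched in Python as follows. This is an illustrative simplification that reports only columns in which every sequence carries the same residue and no gap; it is not the procedure used by BioEdit or ESPript, and the toy alignment is hypothetical.

```python
def conserved_columns(alignment: list[str]) -> list[int]:
    """Return 0-based indices of columns that are identical in all sequences.

    Columns containing a gap character ('-') are never reported as conserved.
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have equal length")
    conserved = []
    for i in range(length):
        column = {seq[i] for seq in alignment}
        if len(column) == 1 and "-" not in column:
            conserved.append(i)
    return conserved


# Hypothetical toy alignment; not the antitoxin sequences of this study.
toy_alignment = ["MKT-LV", "MRTALV", "MKTGLI"]
toy_conserved = conserved_columns(toy_alignment)  # [0, 2, 4]
```

Alignment viewers typically also highlight columns of similar, rather than strictly identical, residues; that would require a substitution-group table and is left out of this sketch.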

Homology modelling
Homology modelling was used to determine the three-dimensional structure of the S. coelicolor A3(2) protein SCO2235. A BLASTP search with default parameters was performed against the Brookhaven Protein Data Bank (PDB) to find suitable templates for homology modelling [23]. PDB ID: 3D55: A was identified as the best template based on the sequence identity (71%) between the query and template protein sequences. The tertiary structure was predicted by MODELLER through the HHpred tools of the Max Planck Institute for Developmental Biology [29][30][31].
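To make the notion of sequence identity between query and template concrete, the following Python sketch computes percent identity from a pairwise alignment. The choice of denominator is an assumption: here only columns without gaps in either sequence are counted, whereas other tools may divide by the full alignment length, so values can differ slightly between programs. The function name and inputs are illustrative.

```python
def percent_identity(aligned_query: str, aligned_template: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns in which both sequences carry a residue.
    pairs = [
        (q, t)
        for q, t in zip(aligned_query, aligned_template)
        if q != "-" and t != "-"
    ]
    if not pairs:
        return 0.0
    matches = sum(q == t for q, t in pairs)
    return 100.0 * matches / len(pairs)


# Hypothetical toy alignment: 3 of 4 aligned residues match.
toy_identity = percent_identity("MKTA", "MKSA")  # 75.0
```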

Model quality assessment
The quality of the predicted structure was determined with the PROCHECK and QMEAN6 programs of the ExPASy server of the SWISS-MODEL Workspace [32][33][34]. Furthermore, the query and template structures were superimposed and the Root Mean Squared Deviation (RMSD) was calculated using UCSF Chimera 1.5.3 [35]. The Z scores of the template and the query were also assessed with the ProSA-web server [36]. Finally, the proposed model and the superimposed structures were visualized using PyMOL (The PyMOL Molecular Graphics System, Version 1.5.0.4, Schrödinger, LLC) [37].

Protein-protein interaction analysis
To carry out their functions, proteins interact with one another in biological systems. We used the STRING database (http://stringdb.org/) to analyze the protein-protein interactions (PPI) of our targeted protein. This database predicts PPIs from physical and functional associations derived from genomic context, high-throughput experiments, co-expression and previous knowledge. Currently, this database covers 5,214,234 proteins from 1,133 organisms [38].

Active site determination
The active site of the protein was determined with the Computed Atlas of Surface Topography of proteins (CASTp) server, which provides an online resource for locating, delineating and measuring concave surface regions on three-dimensional structures of proteins [39].

Results and Discussion
Various physicochemical properties of the hypothetical protein SCO2235 were assessed with the ProtParam tool (Table 1). These include the aliphatic index (AI), instability index (II), pI, extinction coefficient and average hydropathicity. All of these values relate to the stability of the protein, which correlates with proper function [40].
Subcellular localization is an important feature of a protein. Cellular functions are usually localized in specific compartments, so predicting the subcellular localization of an unknown protein can provide useful information about its function. This information is also valuable for drug design and for further study of the protein [41]. Here, CELLO predicted the subcellular localization of our targeted protein (SCO2235) to be the cytoplasm.
The BLASTP search against the non-redundant database showed high homology with antitoxin proteins from different Streptomyces species, with up to 98% similarity to the target protein (Table 2). A phylogenetic analysis based on the same data is depicted in Figure 1. The resulting tree, drawn with true distances, indicates the evolutionary similarity of the different antitoxin genes and proteins.
Several web tools were used to search for conserved domains and the potential function of SCO2235. The consensus of the predictions made by Pfam, NCBI-CDD and SUPERFAMILY suggests that SCO2235 contains a PhdYeFM antitox superfamily domain, and the protein is currently classified as antitoxin PhdYefM, type II toxin-antitoxin system. The Pfam server predicted the antitoxin PhdYefM, type II toxin-antitoxin system domain at amino acid residues 2-72 with an e-value of 1.6e-18. The PhdYeFM antitox superfamily was also found by the NCBI-CDD server at residues 2-81 with an e-value of 2.71e-22, and by the SUPERFAMILY server at residues 3-79 with an e-value of 2.88e-22. In this system, once antitoxin proteins are bound to their toxin partners, they can bind DNA via the N-terminus and inhibit expression of the operons that contain the genes encoding the TA system [42,43].
The MSA of different Streptomyces antitoxin proteins and our targeted protein (gi|21220706) is depicted in Figure 2. The secondary structure of the proteins is also included in this figure and is mostly conserved throughout the alignment, including the template.
Homology modelling has recently become an indispensable part of structural genomics, with numerous tools available for the comparative modelling of proteins of unknown structure [44,45]. No solved crystal structure exists for our targeted protein SCO2235, so we predicted a comparative three-dimensional model through homology modelling (Figure 3). The template, 3D55: A, is the M. tuberculosis YefM antitoxin and shows high similarity to our target.
The quality of the predicted three-dimensional model was assessed with the PROCHECK Ramachandran plot, in which 96.2% of the amino acid residues fell within the most favored region (Figure 4 and Table S1). The model was further checked with the QMEAN6 server, where it was placed inside the dark grey zone and considered a good model, with a QMEAN6 score of 0.608 (Figure S1). The superimposition of the model and the template is shown in Figure 5A. The RMSD value indicates how similar the template and query structures are; a lower value indicates greater structural similarity.
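A Ramachandran plot places each residue according to its backbone dihedral angles phi and psi. As a rough illustration of the underlying geometry, the following Python sketch computes the dihedral angle defined by four atoms. For residue i, phi would be obtained from the atoms C(i-1), N(i), CA(i), C(i) and psi from N(i), CA(i), C(i), N(i+1). The helper names and the toy coordinates are assumptions for illustration; this is not PROCHECK code.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def _scale(a, s):
    return (a[0] * s, a[1] * s, a[2] * s)


def dihedral(p0, p1, p2, p3):
    """Dihedral angle in degrees (-180 to 180) about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    # Unit vector along the central bond.
    b1 = _scale(b1, 1.0 / math.sqrt(_dot(b1, b1)))
    # Components of the outer bonds perpendicular to the central bond.
    v = _sub(b0, _scale(b1, _dot(b0, b1)))
    w = _sub(b2, _scale(b1, _dot(b2, b1)))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


# Hypothetical toy coordinates giving a quarter turn about the z axis.
toy_angle = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
```

Applying the function to every residue of a model yields the (phi, psi) pairs that a validation program then compares against the favored regions of the plot.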
The RMSD value obtained from the superimposition of SCO2235 and the template (3D55: A) in UCSF Chimera was 0.343 Å, suggesting a reliable three-dimensional model. The Z score evaluates the global model quality and is used to check whether the input structure is within the range of scores usually found for native proteins of similar size. The Z score obtained from ProSA was -3 for the model (Figure 5B) and -3.44 for the template (Figure 5C), suggesting that the model and the template are of comparable quality.
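For reference, the RMSD between two sets of N paired atoms is the square root of the mean squared distance between corresponding atoms. The Python sketch below computes this quantity for coordinates that are assumed to be already superimposed and paired one to one; it does not perform the structural fitting that a program such as UCSF Chimera carries out beforehand. The function name and the toy coordinates are illustrative.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equally long lists of (x, y, z) coordinates.

    The structures are assumed to be superimposed already, with atom i of
    coords_a corresponding to atom i of coords_b.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equally long")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Hypothetical toy coordinates in angstroms; not taken from the model.
toy_rmsd = rmsd([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
                [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)])
```

Because the value depends on which atoms are paired (for example, only alpha carbons or all backbone atoms), RMSD values are comparable only when the same atom selection is used.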
The protein-protein interaction analysis from the STRING database for our targeted protein is shown in Figure 6. The analysis indicates that our targeted protein might be an antitoxin. The predicted functional partners were yoeB, SCO2237 and glnE, all of which are associated with TA systems.
The active site of the protein was analyzed with the CASTp server. The identification and characterization of functional sites on proteins have increasingly become an area of interest, because analysis of the active-site residues involved in ligand binding provides insight into the design of enzyme inhibitors. In this study, we analyzed the best active-site area of the modelled protein as well as the number of amino acids involved in it (Figure 7). In most cases, class II antitoxins have two domains: a DNA-binding domain located in the N-terminal region and a toxin-binding domain located at the C-terminal end [46][47][48][49]. Our analysis found similar domain-based active sites in the modelled protein, which are depicted as spheres in Figure 7.

Conclusion
We used homology modelling and a comparative proteomics approach to predict the three-dimensional structure and possible functions of the Streptomyces coelicolor A3(2) hypothetical protein SCO2235. With a well-defined structure and annotations, the functional and binding sites of a protein can be predicted, which helps in understanding the biological role it fulfills. All of the above findings suggest that the target protein functions as an antitoxin of a type II TA system. We hope that this study will provide useful leads for future research.

Figure 1: Phylogenetic analysis of different antitoxin proteins of Streptomyces sp. with the target protein SCO2235 (gi|212207067|), shown with true distances.

Figure 2: Multiple sequence alignment of different antitoxin proteins with secondary structure analysis. Here, gi|212207067| denotes the protein SCO2235, and the secondary structure elements (α helices and β sheets) are shown above the alignment.

Figure 3: Predicted three-dimensional model of the hypothetical protein SCO2235. The N-terminal end starts with a β sheet (blue) and the C-terminal end is a coiled structure (red).

Figure 4: Ramachandran plot of modelled structure validated by PROCHECK program.

Figure 5: Three-dimensional structure superposition of the template and the predicted model. Figure 5A shows the template 3D55: A (red) and the hypothetical protein SCO2235 (cyan). The RMSD value for this superposition is 0.343 Å. Figure 5B shows the Z score of the model and Figure 5C shows the Z score of the template 3D55: A.

Table 1: ProtParam tool analysis result for the targeted protein SCO2235.

Table 2: Similar proteins obtained from the non-redundant database.