Applying In Silico Approaches for Designing a Chimeric InaV/N-DFPase Protein and Evaluating its Binding with Diisopropyl-Fluorophosphate

The N-terminal domain of the ice-nucleation protein InaV (InaV-N) of Pseudomonas syringae was applied to display DFPase on the cell surface. In silico techniques were used to generate a model in order to examine the feasibility of displaying DFPase on the cell surface. The secondary and tertiary structures of the chimeric protein were predicted, and the predicted model was then subjected to several repeated cycles of stereochemical evaluation and energy minimization. The homology-modeled structure of the InaV/N-DFPase protein was docked to DFP. The optimized inaV/N-dfpase gene was translated to 519 amino acids. The minimum free energy of the best-predicted mRNA secondary structure was -215.45 kcal/mol. SOPMA analysis showed that the main helix peak corresponded to the anchor fragment. Validation of the 3D model indicated that 86.1% of amino acid residues fell in the favored regions. The MolDock score was 360.22 for DFP. According to the in silico analysis, these findings support the use of InaV-N for targeting DFPase to the cell surface.


Introduction
Pollution by organophosphorus (OP) pesticides and chemical nerve agents is a major problem for animal and human health. The toxicity results from covalent binding within the active site of the acetylcholinesterase enzyme [1] and irreversible inhibition of esterase activity, which leads to at least 70% of neuropathies (organophosphate-induced delayed polyneuropathy) [2][3][4]. Biological treatment using microbial biotechnology is considered an effective and safe method to remediate contaminated environments [5][6][7].
Among the various enzymes able to degrade organophosphorus compounds, several are well characterized and sequenced, such as Organophosphorus Acid Anhydrolase (OPAA), Organophosphorus Hydrolase [8], Diisopropyl-Fluorophosphatase (DFPase) and Serum paraoxonase/arylesterase 1 (PON1) [9,10]. DFPase, also known as squid-type DFPase (Loligo vulgaris), is capable of cleaving the P-F bond of Diisopropyl-Fluorophosphate (DFP), Sarin and Soman. However, DFPase shows only a weak ability to hydrolyze P-O and P-CN bonds and has no capacity to hydrolyze P-S bonds [11,12]. DFP is a structural analogue of G-type chemical warfare agents and is used as a simulant of G-type agents in research because of its reduced human neurotoxicity. The products of the enzymatic reaction do not possess the toxicity of the OP compound [13].
Applying native strains that produce intracellular DFPase to degrade organophosphorus compounds is of interest, but the inability of the substrate to pass across the membrane reduces the overall catalytic effect [14,15]. The problem can be solved by displaying DFPase on the cell surface using anchors including the Lpp-OmpA chimera, Ice Nucleation Protein [16] and auto-transporters [17][18][19][20][21].
Bioinformatics-based processing tools and advanced algorithms were used before any experiments in order to study in silico whether a truncated ice nucleation protein InaV fragment covering only the N-terminal domain (InaV-N) could be sufficient to display DFPase [22]. Therefore, the current in silico study was conducted to investigate the properties of DFPase displayed by the InaV-N anchor and its binding efficacy after docking with DFP.

Sequence retrieval and primary analysis of chimeric construct
The protein sequences for DFPase (Accession no. Q7SIG4) and InaV/N (Accession no. EPF65740) were obtained from the National Center for Biotechnology Information (NCBI) at http://www.ncbi.nlm.nih.gov in FASTA format and then subjected to PSI-BLAST (Position-Specific Iterated BLAST). A chimeric sequence of InaV/N-DFPase that contains the N-terminal domain of the Ice Nucleation Protein [16,23] and DFPase was constructed. The sequences were fused together using an EcoRI restriction site. The chimeric gene was optimized via in silico analysis using an online optimization tool (http://www.genscript.com/index.html) and the Kazusa codon usage database (http://www.kazusa.or.jp/codon). In addition, the program mfold (http://www.bioinfo.rpi.edu/applications/mfold) was used to predict the mRNA secondary structure of the constructed gene [24].

The physico-chemical parameters and protein secondary structure prediction
The amino acid composition, molecular weight, theoretical pI, atomic composition, extinction coefficient, estimated half-life, instability index, aliphatic index and grand average of hydropathicity (GRAVY) were predicted via ProtParam at http://web.expasy.org/protparam. The distribution of amino acid residues was investigated with the pepwheel online tool (http://emboss.bioinformatics.nl/cgi-bin/emboss/pepwheel). The SOPMA method (Self-Optimized Prediction Method with Alignment) was used for the prediction of secondary structure. Solubility was predicted by the SCRATCH Protein Predictor at http://scratch.proteomics.ics.uci.edu/ and the Recombinant Protein Solubility Prediction website at http://www.biotech.ou.edu/.
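Two of the parameters listed above, GRAVY and the aliphatic index, follow simple closed-form definitions and can be reproduced by hand. The sketch below is an illustration only and is not the ProtParam implementation: it assumes the standard Kyte-Doolittle hydropathy scale for GRAVY and the usual aliphatic index formula (mole percent Ala + 2.9 × Val + 3.9 × (Ile + Leu)). The sequence `toy` is a hypothetical placeholder, not the chimeric protein.

```python
# Kyte-Doolittle hydropathy values, the scale conventionally used for GRAVY.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq):
    """Grand average of hydropathicity: mean hydropathy over all residues."""
    seq = seq.upper()
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def aliphatic_index(seq):
    """Aliphatic index from the mole percent of Ala, Val, Ile and Leu."""
    seq = seq.upper()
    n = len(seq)

    def mole_pct(aa):
        return 100.0 * seq.count(aa) / n

    return mole_pct("A") + 2.9 * mole_pct("V") + 3.9 * (mole_pct("I") + mole_pct("L"))


# Hypothetical 8-residue peptide used only to exercise the functions.
toy = "MAVILKDE"
toy_gravy = gravy(toy)
toy_aliphatic = aliphatic_index(toy)
```

A negative GRAVY value indicates an overall hydrophilic sequence, and a higher aliphatic index is usually read as a sign of greater thermostability.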

Prediction of tertiary structure
The tertiary structure of the chimeric construct was predicted using the I-TASSER online server, which generates three-dimensional models with a confidence score (C-score). The 3D models of the chimeric protein produced by ab initio modeling were loaded into Swiss-PdbViewer to illustrate the tertiary structure. The Discovery Studio Visualizer 1.7 tool was used to visualize the modeled 3D structures. Several repeated cycles of energy minimization were applied to the predicted model using SPDBV software. In addition, model stability was evaluated by Ramachandran plot analysis on the server at http://mordred.bioc.cam.ac.uk/~rapper/rampage.php.

Docking of DFP onto InaV/N-DFPase
The 3D structure of InaV/N-DFPase from I-TASSER and the 3D structure of DFP from the ChemSpider database (http://www.chemspider.com/) were used for docking. Molegro Virtual Docker V4.2 was applied to detect the active sites, and docking was then completed with the MolDock function. Docking was performed against the potential active sites detected on the InaV/N-DFPase molecule. MolDock is based on a hybrid search algorithm called guided differential evolution.
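To give a feel for the search strategy named above, the following is a minimal sketch of plain differential evolution (the DE/rand/1/bin variant) minimizing a toy score. It is not the MolDock implementation: the cavity-guided part of the algorithm is not reproduced, and `toy_score`, the bounds and all parameter values are hypothetical stand-ins for a real docking score and ligand pose encoding.

```python
import random


def differential_evolution(score, bounds, pop_size=20, generations=200,
                           f=0.5, cr=0.9, seed=0):
    """Minimize score(vector) within bounds using DE/rand/1/bin."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Random initial population inside the search box.
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    fitness = [score(p) for p in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Pick three distinct individuals other than the current one.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            forced = rng.randrange(dim)  # at least one coordinate is mutated
            trial = []
            for d in range(dim):
                if rng.random() < cr or d == forced:
                    value = pop[a][d] + f * (pop[b][d] - pop[c][d])
                    lo, hi = bounds[d]
                    value = min(max(value, lo), hi)  # clamp to the bounds
                else:
                    value = pop[i][d]
                trial.append(value)
            trial_score = score(trial)
            # Greedy selection: keep the trial only if it is not worse.
            if trial_score <= fitness[i]:
                pop[i], fitness[i] = trial, trial_score
    best = min(range(pop_size), key=fitness.__getitem__)
    return pop[best], fitness[best]


def toy_score(pose):
    """Hypothetical stand-in for a docking score: squared distance to a target."""
    target = (1.0, -2.0, 0.5)
    return sum((x - t) ** 2 for x, t in zip(pose, target))


best_pose, best_score = differential_evolution(toy_score, [(-5.0, 5.0)] * 3)
```

In a docking setting the vector would encode ligand position, orientation and torsions, and the score would be an interaction energy, so lower values correspond to better poses.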

Construct design and optimization of chimeric gene
To design the chimeric construct, only the 203 amino acids of InaV-N that have been reported to be involved in targeting and anchoring the INP protein on the Pseudomonas syringae cell surface were selected. Therefore, InaV/N (GenBank No. O33479) and the gene sequence encoding DFPase were selected for the first and second fragment (GenBank No. O33479), respectively. These two fragments were selected in order to design a chimeric construct in which DFPase is expected to be exposed on the bacterial surface via the InaV/N fragment. As shown in Fig. 1A, the two domains were separated by an EcoRI restriction site.
The likelihood of high protein expression is correlated with the value of the Codon Adaptation Index (CAI), so the CAI was calculated for the chimeric gene and was equal to 0.83 (Fig. 1B). A CAI of >0.8 is rated as good for expression in the desired expression organism. The overall GC content of the optimized sequence was 51.45%, which should contribute to the overall stability of the mRNA from the synthetic gene (Fig. 1C). The percentage distribution of codons is shown in Fig. 1D. In the optimized InaV/N-dfpase gene, just 5% of codons had values of 21-30 and more than 50% had values of 91-100. Additionally, appropriate restriction enzyme sites (BamHI and HindIII) were introduced at the 5′ and 3′ ends of the sequence for cloning purposes.
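The layout of the fusion can be sketched in a few lines. The example below assumes that the EcoRI site (GAATTC) is inserted in frame, so that it contributes the two residues Glu-Phe between the domains. The placeholder sequences are hypothetical, and the passenger length of 314 residues is an assumption chosen so that the total matches the 519 amino acids reported for the construct.

```python
ECORI_SITE = "GAATTC"

# Only the two codons present in the EcoRI site are needed here.
CODON_TABLE_SUBSET = {"GAA": "E", "TTC": "F"}


def translate_linker(site):
    """Translate an in-frame linker made of codons from the small table above."""
    return "".join(CODON_TABLE_SUBSET[site[i:i + 3]] for i in range(0, len(site), 3))


def fuse(anchor, passenger, linker_dna=ECORI_SITE):
    """Return anchor + translated linker + passenger as one protein sequence."""
    return anchor + translate_linker(linker_dna) + passenger


# Hypothetical placeholders: only the lengths matter for this illustration.
anchor = "A" * 203      # stands in for the InaV-N anchor (203 residues)
passenger = "G" * 314   # stands in for DFPase (length assumed, see above)

chimera = fuse(anchor, passenger)
linker = chimera[len(anchor):len(anchor) + 2]
```

With these assumptions the chimera is 203 + 2 + 314 = 519 residues long, with the Glu-Phe pair marking the junction between anchor and enzyme.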
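For readers who want to see how the two headline numbers are defined, the sketch below computes GC content and a CAI value as the geometric mean of the relative adaptiveness of each codon (its usage divided by the usage of the most frequent synonymous codon). It is an illustration, not the optimization tool used in this study: the `usage` counts are hypothetical and cover only two amino acids, and codons missing from the table are simply skipped.

```python
import math


def gc_content(dna):
    """Percentage of G and C bases in a DNA sequence."""
    dna = dna.upper()
    return 100.0 * (dna.count("G") + dna.count("C")) / len(dna)


def relative_adaptiveness(usage, synonyms):
    """w(codon) = usage of codon / usage of the best synonymous codon."""
    weights = {}
    for codons in synonyms:
        top = max(usage[c] for c in codons)
        for c in codons:
            weights[c] = usage[c] / top
    return weights


def cai(dna, weights):
    """Geometric mean of w over all codons that appear in the weight table."""
    dna = dna.upper()
    codons = [dna[i:i + 3] for i in range(0, len(dna) - len(dna) % 3, 3)]
    logs = [math.log(weights[c]) for c in codons if c in weights]
    return math.exp(sum(logs) / len(logs))


# Hypothetical usage counts for two amino acids (Leu and Lys) only.
usage = {"CTG": 50, "CTA": 5, "AAA": 30, "AAG": 10}
synonyms = [("CTG", "CTA"), ("AAA", "AAG")]
weights = relative_adaptiveness(usage, synonyms)

optimal_gene = "CTGAAA"        # only the most frequent codons
mixed_gene = "CTGAAACTAAAG"    # includes two less frequent codons

cai_optimal = cai(optimal_gene, weights)
cai_mixed = cai(mixed_gene, weights)
gc_mixed = gc_content(mixed_gene)
```

A gene built only from the most frequent codons scores 1.0, and each less frequent codon pulls the geometric mean down, which is why replacing rare codons raises the CAI.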

mRNA structure predictions
The minimum free energy of the best-predicted mRNA secondary structure was -215.45 kcal/mol. The first nucleotides did not form a pseudoknot or a long stable hairpin at the 5′ end (Fig. 2A). The structural elements obtained in our analysis indicated that folding of the RNA construct was adequately stable for efficient translation in the new host.

Protein secondary structure prediction
The total amino acid sequence length of the chimeric protein was 519. The alpha helix (Hh) motif accounted for 59 amino acids, or about 11.37% of the protein. The extended strand motif was 165 amino acids in length (31.79%), the beta turn (Tt) motif comprised 52 amino acids (10.02%) and the random coil (Cc) motif comprised 243 amino acids (46.82%). There were no 3₁₀ helix (Gg), Pi helix, beta bridge, bend region (Ss), ambiguous or other states. SOPMA analysis showed that the main helix peak corresponded to the anchor fragment. Other sequence analysis and protein structure and function prediction algorithms, such as those for low-complexity regions, transmembrane helices and coiled-coil regions, and the Garnier-Osguthorpe-Robson (GOR) algorithm, showed that the sequence does not contain long regions with any regular secondary structure (Fig. 2B).
The Recombinant Protein Solubility Prediction tool indicated that the protein sequence has a 58.8% chance of insolubility when overexpressed in E. coli. The SCRATCH Protein Predictor predicted solubility upon overexpression with a probability of 0.5.
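The reported percentages follow directly from the residue counts and the sequence length, as this short check shows. Only the counts stated in the text are used; the dictionary keys are descriptive labels chosen for this example.

```python
# Residue counts per SOPMA state, as reported in the text.
counts = {
    "alpha helix": 59,
    "extended strand": 165,
    "beta turn": 52,
    "random coil": 243,
}

total = sum(counts.values())

# Percentage of the sequence in each state, rounded to two decimals.
percentages = {state: round(100.0 * n / total, 2) for state, n in counts.items()}

for state, pct in percentages.items():
    print(f"{state}: {counts[state]} residues, {pct}%")
```

The four states add up to the full length of 519 residues, so no residue is left unassigned.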
International Letters of Natural Sciences Vol. 75

ProtParam analysis
According to ProtParam, the InaV/N-DFPase construct contained 519 amino acids, with a molecular weight of 56703 Da and a theoretical pI of 5.27. The other results are shown in Table 1.

PepWheel
The PepWheel program was used to observe the periodic distribution of amino acid residues in the protein sequence. The periodic distribution of residues on an ideal alpha-helix can be appreciated when the helix is viewed along the helical axis. As shown in Fig. 3, aliphatic residues, hydrophilic residues and positively charged residues are marked with squares, diamonds and octagons, respectively.
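The geometry behind a helical wheel is simple enough to sketch: in an ideal alpha-helix with 3.6 residues per turn, each residue is rotated 100° from the previous one when viewed along the axis. The example below is an illustration rather than the pepwheel code. The residue sets assigned to each marker are assumptions that follow the three classes named above, and the peptide `MKLAV` is a hypothetical placeholder.

```python
# Marker shapes per residue class; the residue sets are assumptions that
# follow the classes in the text (aliphatic, hydrophilic, positively charged).
MARKERS = {
    "square": set("ILVM"),
    "diamond": set("DENQST"),
    "octagon": set("HKR"),
}


def marker_for(aa):
    """Return the marker shape for a residue, or 'none' if it is unmarked."""
    for shape, residues in MARKERS.items():
        if aa in residues:
            return shape
    return "none"


def helical_wheel(seq, degrees_per_residue=100.0):
    """List (position, residue, angle in degrees, marker) for each residue."""
    return [
        (i + 1, aa, (i * degrees_per_residue) % 360.0, marker_for(aa))
        for i, aa in enumerate(seq.upper())
    ]


# Hypothetical peptide used only to show the output format.
wheel = helical_wheel("MKLAV")
for position, residue, angle, marker in wheel:
    print(position, residue, angle, marker)
```

Residues whose angles cluster on one side of the wheel share a face of the helix, which is how an amphipathic helix shows up in such a plot.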

Tertiary structural prediction for the chimeric protein
The C-score of the predicted model (I-TASSER) was 3.31. Generally, a higher C-score indicates higher confidence in the model. The model with the maximum C-score was used to draw the tertiary structure illustrations with DS Visualizer and Swiss-PdbViewer in order to determine the final structure of the protein. The results showed several α-helices, most of which were observed in the InaV/N fragment, consistent with the secondary structure results. The model also showed the formation of two separate domains in the chimeric protein (Fig. 4A). In addition, as shown in Fig. 4B, validation of the protein 3D model via the Ramachandran plot indicated that 86.1% of amino acid residues were in the favored regions of the plot and 7.2% of residues were in the allowed regions. The remaining 6.8% of residues, such as Pro88, Asp111, Pro209, and Thr400, were in the unfavored regions of the plot. The energy after minimization, calculated by spdbv (Swiss-PdbViewer), was -5923.671 kcal/mol.
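The three Ramachandran fractions can be combined into the single "favored plus allowed" figure often quoted for model quality. The check below uses only the percentages reported above.

```python
# Ramachandran region percentages as reported for the chimeric model.
favored = 86.1
allowed = 7.2
unfavored = 6.8

# Share of residues in acceptable backbone conformations.
favored_plus_allowed = round(favored + allowed, 1)

# The three values are each rounded to one decimal, so the sum can
# differ from 100 by a rounding error of about 0.1.
reported_total = round(favored + allowed + unfavored, 1)

print(favored_plus_allowed, reported_total)
```

The favored and allowed regions together account for 93.3% of residues. The reported values sum to 100.1%, which is consistent with rounding of the individual percentages.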

Docking of DFP onto InaV/N-DFPase
Computational validation and exploration of the binding affinity of the recombinant InaV/N-DFPase protein for chlorpyrifos were carried out using Molegro Virtual Docker. Only one cavity was detected in the InaV/N-DFPase model, and it showed good interaction with DFP (Fig. 4C). The MolDock score was 360.22 for DFP. The active site residues are shown in Fig. 4D. Chlorpyrifos shows very high affinity for Leu 478, Ser 476, Asn 477, Ile 454, Trp 502, Val 489, Pro 475 and Ile 487. All of these residues are involved in binding within the detected cavity of the recombinant InaV/N-DFPase.

Discussion
One of the most important problems in the biodegradation of OPs is mass transfer through the cell membrane. To overcome this substrate transport barrier, different strategies have been used, such as cell surface display and protein secretion [25]. Other researchers have shown that anchoring domains can display a foreign protein on different host cells [26][27][28]. Among the main domains used to display foreign proteins are the truncated forms of InaV (N-terminal domain of INP, InaV/N) and InaQ (N-terminal domain of INP, InaQ/N), which are known to provide an applicable and functional display strategy [29,30]. Qianqian and co-workers demonstrated that the N-terminal domain of InaQ is responsible for the transmembrane conveyance and activity of InaQ/N in the membrane [31]. Although no distinct signal sequences were recognized, the first 18 amino acid residues of InaQ-N were found to be essential for protein secretion. Molecular modeling and bioinformatics tools were employed here to determine whether InaV-N could be used as an applicable anchor to display DFPase. Alto and co-workers used bioinformatics approaches to design a potent and selective anchoring peptide antagonist of protein kinase A [32]. Luo and co-workers used in silico screening methods such as molecular docking to analyze potential inhibitors of the sortase A enzyme of Streptococcus mutans [33]. Here we arranged a new InaV/N-dfpase construct that is theoretically suited for the expression of DFPase on the bacterial cell surface. Several issues can disturb the expression of foreign genes in bacterial systems, including messenger RNA instability [34], premature polyadenylation [35], abnormal splicing [36] and improper codon usage. The efficacy of heterologous protein expression can be diminished by biased codon usage.
To overcome this problem, several approaches have been suggested, including site-directed mutagenesis to remove rare codons or the addition of rare-codon tRNAs to the host expression system. Recent developments in synthetic gene technology can resolve codon usage bias, and the desired genes can be produced cost-effectively [8]. Codons that are rarely used in E. coli, such as AGG/AGA (arginine), AUA (isoleucine), CGG (arginine), CCC (proline), CUA (leucine) and GGA (glycine), were avoided in the synthetic gene [37]. The likelihood of high protein expression is correlated with the value of the Codon Adaptation Index [18]. A CAI of >0.8 is rated as good for expression in the desired expression system. The codon optimization tool improved the sequence to a CAI greater than 0.8 (CAI of 0.83), increasing the chance of high-level protein expression. Peaks of GC content outside the ideal range of 30% to 70% adversely affect transcriptional and translational efficiency. The GC content of the optimized InaV/N-dfpase gene sequence is 51.45%, which should contribute to the stability of the mRNA. The GC distribution was balanced in the optimized gene sequence, and this has been reported to be associated with high mRNA stability and expression in E. coli. A value of 100 is assigned to the codon with the highest usage frequency for a given amino acid in the desired expression system. Expression efficiency can be disrupted by codons with values lower than 30.
To increase mRNA stability, DNA motifs that might contribute to mRNA instability in E. coli were excluded from the synthetic gene. The synthetic DNA encoded the mature chimeric gene and was designed based on the codon usage of highly expressed genes in E. coli. To verify the potential folding of the chimeric mRNA, comparative sequence analysis was combined with genetic algorithm-based RNA secondary structure prediction. The 5' end of the gene was folded in the way typical of bacterial gene structures. The prediction of RNA secondary structure and the graphical depiction of the predicted minimum free energy for the synthetic gene showed that the minimum free energy was -215.45 kcal/mol, which supports mRNA stability adequate for efficient translation.
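The two sequence checks described above, avoiding rare codons and keeping GC content within 30% to 70%, are straightforward to automate. The sketch below is illustrative and is not the optimization tool used in this study. The rare codon set is taken from the list in the text (with U written as T for DNA), while the window size and the example sequences are hypothetical.

```python
# Rare E. coli codons named in the text, written in the DNA alphabet.
RARE_CODONS = {"AGG", "AGA", "ATA", "CGG", "CCC", "CTA", "GGA"}


def find_rare_codons(dna):
    """Return (codon position, codon) for each in-frame rare codon."""
    dna = dna.upper().replace("U", "T")
    hits = []
    for i in range(0, len(dna) - len(dna) % 3, 3):
        codon = dna[i:i + 3]
        if codon in RARE_CODONS:
            hits.append((i // 3 + 1, codon))
    return hits


def gc_windows_out_of_range(dna, window=60, low=30.0, high=70.0):
    """Return (start index, GC%) for sliding windows outside [low, high]."""
    dna = dna.upper().replace("U", "T")
    flagged = []
    for start in range(0, len(dna) - window + 1):
        chunk = dna[start:start + window]
        gc = 100.0 * (chunk.count("G") + chunk.count("C")) / window
        if gc < low or gc > high:
            flagged.append((start, gc))
    return flagged


# Hypothetical sequences used only to exercise the two checks.
rare_hits = find_rare_codons("ATGAGGCTGATAAAA")
balanced = gc_windows_out_of_range("ATGCATGCATGC", window=4)
skewed = gc_windows_out_of_range("GCGCGCGCGCATATATATAT", window=10)
```

An optimized gene would be expected to return an empty list from both functions: no in-frame rare codons and no local GC peaks outside the 30% to 70% range.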
As SOPMA analysis showed, nearly one-third of the chimeric protein length has extended secondary structure. These structures form in proteins by the accumulation of secondary structure elements in direct interaction with the atoms of the main chain. The groups of amino acids established in this manner are held together by a broad array of hydrogen bonds, which can build assemblies much larger than separate secondary structures with equivalent stability. Because they form very stable aggregates, extended secondary structures significantly increase the conformational stability of proteins. The presence of extended secondary structures in a significant proportion of the final composition of the predicted structure indicates good protein stability and can help to achieve the final fold. The investigation of the secondary structure of the chimeric protein showed that the alpha helix motif is formed only in the anchor portion of the recombinant protein. In this regard, it may be important that there is no alpha helix in the natural structure of wild-type DFPase, so the creation of an alpha helix could make significant changes in protein folding and function.
The Ramachandran plot results (>90% of residues in the favored and allowed regions) supported the structural stability of the chimeric protein. The energy after minimization was -5923.671 kcal/mol, which indicated that the chimeric protein had acceptable stability. Comparison of the desired synthetic gene with the original one displayed no major difference between the two molecules, and their structures were well matched with each other.
Chlorpyrifos has a bulkier side chain than DFP and helps to characterize the active site cavity better than a small molecule such as DFP. For this reason, chlorpyrifos was docked to the chimeric InaV/N-DFPase before DFP. Almost all of the cavity residues interacting with chlorpyrifos are also involved in binding DFP.

Conclusion
In this research, bioinformatics tools were applied to predict the efficient expression of the InaV/N-dfpase chimeric gene and the production of functional DFPase enzyme on the cell surface of E. coli. Comparative modelling of the chimeric protein was used to understand the docking sites and the structural conformation of the domains. The in silico model indicated that the construct is suitable for targeting DFPase to the cell surface. This model can serve as a useful preview of enzyme expression at the cell surface before entering the laboratory phase, which saves time and money. However, these results need to be verified experimentally to determine whether this construct can be used to degrade OP compounds accumulated in soil.