Prediction of Indels and SNPs in coding regions of glutathione peroxidases – an important enzyme in redox homeostasis of plants

Plant glutathione peroxidases are an important class of enzymes that play key roles in the stress adaptability of plants in both biotic and abiotic stress pathways. Glutathione peroxidases have been studied extensively in animals over the years, since their catalytic residues include selenocysteine, a variant amino acid that is ribosomally encoded with the help of an RNA structural element known as SECIS. Various workers have shown that plant glutathione peroxidases play active roles in ROS sequestration and lipid hydroperoxidation and also regulate glutathione levels. However, each plant has its own pattern of glutathione peroxidase expression and action, and in some plants certain isoforms have not been detected at all. This work focuses on the prediction and identification of single nucleotide polymorphisms (SNPs) and indels in the coding regions of plant glutathione peroxidases using a Bayesian algorithm, with the predictions subsequently validated. A large number of informative sites were detected, 279 of which had a variant frequency of ≥ 50 %. These data should be beneficial for future studies involving genetic manipulation and population-based breeding experiments.


INTRODUCTION
Single nucleotide polymorphisms (SNPs) and segmental insertions and deletions (indels) represent two major classes of molecular markers that have attained considerable importance in plant population genetics studies. These two classes of markers, along with differences in tandem repeats at a particular locus (microsatellites, SSRs/ISSRs), comprise the three major groups of allelic variation within a genome. Among these three groups, SNPs have attracted considerable attention because they are stable and are the most frequent type of genetic polymorphism (Syvanen 2001). Various studies have used SNP data to analyse genetic diversity (Varshney 2008), to decipher substructure in populations (Garris et al. 2003, Rakshit 2007, Caicedo 2007), to identify linkage disequilibrium in genomes (Mather 2007, Agrama 2008), and in various other screening efforts.
Glutathione peroxidases in plants have been identified as part of an abiotic stress responsive pathway that maintains redox homeostasis. These enzymes (E.C. 1.11.1.9) have a broad substrate specificity; however, their main affinity is towards H2O2.
The main reactions that they catalyze are the reduction of H2O2 to water and of lipid hydroperoxides to the corresponding alcohols. Chen et al. (2004) have also reported the role of GpXs in controlling oxidative burst and programmed cell death in Arabidopsis. Phospholipid hydroperoxide glutathione peroxidase (PHGPx) is a unique member of this family of enzymes, as it can catalyze the reduction of phospholipid hydroperoxides and other complex hydroperoxy lipids that are components of the lipid bilayer.
Computational identification of SNPs in glutathione peroxidases was attempted in view of the importance of this enzyme family as a key abiotic stress regulator, which has the potential to serve as a biomarker for differentiating levels of ROS homeostasis in plants. In addition, plants of agronomic value could be bred so that the GpX load in their genomes is maintained, allowing them to serve as stress tolerant genotypes (STGs).

MATERIALS AND METHODS
Sequences were retrieved from the NCBI GenBank collection and subsequently curated to retain only complete sequences. Partial, hypothetical and incomplete sequences were not considered. An extensive Bayesian algorithm was then applied, taking into account the depth of the alignment, the associated base composition in the region and a standardized a priori polymorphism rate. Once the predictions were made, the results were validated using the Geneious Pro suite.
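The kind of Bayesian reasoning described above can be illustrated with a minimal sketch. This is not the algorithm used in this study, whose details are not given here; it is a simplified two-hypothesis model in which the function name `snp_posterior`, the default prior of 0.001 and the error rate of 0.01 are all illustrative assumptions. Base composition is not modelled. The sketch shows only how alignment depth and an a priori polymorphism rate combine into a posterior probability for one alignment column.

```python
from math import comb


def snp_posterior(column, prior=0.001, error_rate=0.01):
    """Posterior probability that one alignment column is polymorphic.

    column     : string of bases observed at one alignment position
    prior      : assumed a priori polymorphism rate (illustrative value)
    error_rate : assumed per-base error rate (illustrative value)
    """
    bases = [b for b in column.upper() if b in "ACGT"]
    n = len(bases)  # depth of the alignment at this position
    if n == 0:
        return 0.0
    majority = max(bases.count(b) for b in "ACGT")
    k = n - majority  # bases that disagree with the majority base

    # H0: the site is monomorphic and every disagreement is an error.
    like_h0 = comb(n, k) * error_rate**k * (1 - error_rate) ** (n - k)
    # H1: the site is polymorphic with unknown variant frequency;
    # a uniform prior on that frequency gives a marginal of 1 / (n + 1).
    like_h1 = 1.0 / (n + 1)

    numerator = prior * like_h1
    return numerator / (numerator + (1 - prior) * like_h0)


if __name__ == "__main__":
    for col in ("A" * 20, "AG", "A" * 10 + "G" * 10):
        print(col, round(snp_posterior(col), 4))
```

Under these assumptions a deep column with a balanced split of two bases receives a posterior close to 1, whereas the same split at a depth of two remains weakly supported, which is the reason depth enters the calculation.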

RESULTS AND DISCUSSION
The final curated set comprised 400 sequences, each with a complete coding sequence. The results indicated 2210 informative sites, of which 1186 were attributable to SNPs and 1024 were classified as indels. A total of 279 sites had a variant frequency of ≥ 50 %, of which 129 were SNPs and 150 were indels. Plant SNP data for glutathione peroxidases are very limited in standard archives such as dbSNP of NCBI, and are at present restricted to Arabidopsis thaliana, with only 94 entries.
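How informative sites might be tallied by type and filtered by variant frequency can be sketched as follows. The sketch assumes that variant frequency is measured against a reference sequence (here simply the first row of the alignment) and that any site involving a gap character is counted as an indel; both conventions, the function name `classify_sites` and the toy alignment are illustrative assumptions and not a description of the software used in this study.

```python
def classify_sites(alignment, min_freq=0.5):
    """Classify variable alignment columns as SNP or indel sites.

    alignment : list of equal-length aligned sequences; the first one is
                treated as the reference (an assumption of this sketch)
    min_freq  : variant frequency threshold for the filtered list
    Returns (all_sites, frequent_sites) as lists of (position, kind, freq).
    """
    reference, others = alignment[0], alignment[1:]
    sites = []
    for pos, ref_char in enumerate(reference):
        column = [seq[pos] for seq in others]
        variants = [c for c in column if c != ref_char]
        if not variants:
            continue  # column is identical to the reference
        freq = len(variants) / len(column)
        # A gap in the reference or in any variant marks the site as an indel.
        kind = "indel" if ref_char == "-" or "-" in variants else "SNP"
        sites.append((pos, kind, freq))
    frequent = [site for site in sites if site[2] >= min_freq]
    return sites, frequent


if __name__ == "__main__":
    toy = ["ATGCA", "ATGCA", "ACG-A", "ACGCA", "ATG-T"]
    all_sites, frequent_sites = classify_sites(toy)
    print(all_sites)
    print(frequent_sites)
```

Applied to a real alignment, the lengths of the two returned lists, split by `kind`, correspond to the sort of totals reported above.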
Different plants are enriched in different subsets of genes and, more importantly, the same plant may exhibit different responses under two different stress conditions, whether biotic or abiotic. Reverse genetics strategies such as post-transcriptional gene silencing, insertional mutagenesis and TILLING have been used successfully for identifying single nucleotide polymorphisms and producing adaptable cultivars (Henikoff 2003). Thus, identifying SNPs and correlating that variation with an important agronomic or stress-adaptive trait is important for producing better crop species as well as for understanding the genetic strategies of different plant genomes.
The SNPs and indels identified here should be validated in the wet lab through sequencing and subsequent in silico analyses. Molecular modelling and subsequent in silico mutagenesis (Ganguli et al. 2013) should also be useful for detecting whether a SNP creates any structural or functional anomaly in the 3D structure of the protein.
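Before any structural modelling, a predicted coding SNP can first be screened for whether it changes the encoded amino acid at all. The following is a minimal sketch of such a screen using the standard genetic code; the function name `snp_effect` and the example coding sequence are hypothetical, positions are 0-based, and the sequence is assumed to be in frame. It does not account for recoding events such as selenocysteine insertion.

```python
# Standard genetic code, built in the conventional TCAG codon order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(
    zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO_ACIDS)
)


def snp_effect(cds, pos, alt):
    """Classify a single-base substitution in an in-frame coding sequence.

    cds : coding sequence (DNA, length a multiple of three)
    pos : 0-based position of the substituted base
    alt : the alternative base at that position
    Returns 'synonymous', 'missense' or 'nonsense'.
    """
    cds = cds.upper()
    start = pos - pos % 3  # first base of the affected codon
    offset = pos % 3       # position of the SNP within that codon
    ref_codon = cds[start:start + 3]
    alt_codon = ref_codon[:offset] + alt.upper() + ref_codon[offset + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"  # substitution introduces a stop codon
    return "missense"


if __name__ == "__main__":
    example_cds = "ATGGCTTGGTAA"  # hypothetical sequence: Met-Ala-Trp-stop
    for position, base in ((5, "C"), (3, "C"), (8, "A")):
        print(position, base, snp_effect(example_cds, position, base))
```

Only substitutions classified as missense or nonsense would then be natural candidates for the modelling and in silico mutagenesis step.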

International Letters of Natural Sciences Vol. 7

CONCLUSIONS
A large number of informative sites, corresponding to SNP positions as well as indels, were identified in the present study. These predicted sites all lie in the coding regions of the genes and thus have the potential to alter the function of the encoded protein. They should therefore be validated and reported using NGS methods and subsequent computational analyses.

Table 1. Data showing those informative sites which have variant frequencies of ≥ 50 %.