CysDuF: A database to decipher Cysteine post-translational modifications in Domain of Unknown Function (DUF) proteins.
CysDuF Database
Experimental characterization of amino acid functions in Domain of Unknown Function (DUF) proteins is expensive, tedious, and time-consuming, and can be complemented by computational methods. Cysteine (Cys), the second most reactive amino acid at enzyme catalytic sites, merits functional annotation in DUF proteins, which are abundant in the electron transport chain (ETC). We previously reported functional annotation of Cys in DUF proteins belonging to the Cytochrome C Oxidase Subunit II (COX II) family. However, to the best of our knowledge, Cys functions in DUF proteins have not yet been characterized holistically. Here, we annotate and characterize Cys functions and protein microenvironments across different biochemical pathways, diseases, and taxonomic kingdoms. Uncharacterized DUF proteins were initially identified from the literature, and the final protein list was retrieved from the SCOPe database using those DUF IDs. Sequence, structure, pathway, taxonomy, and disease information was then retrieved using the corresponding SCOPe and DUF IDs. Further structural analysis yielded the protein microenvironments (MENV) around Cys residues involved in different pathways, and Cys functional annotation was performed with our in-house Cys function prediction server, DeepCys. The consolidated information is stored in the database, which accepts a DUF ID, PFAM ID, or PDB ID as input. The output, downloadable in CSV, Excel, JSON, or text format, includes the following parameters: PFAM ID, DUF ID, DUF name, SCOPe family, SCOPe superfamily, species, pathway, chain ID, PDB ID, MENV, and Cys post-translational modifications. The biochemical pathways covered are the electron transport chain, glutathione metabolism, Fe-S cluster biogenesis, fatty acid synthesis, photosynthesis, the Krebs cycle, and the pentose phosphate pathway. The taxonomic kingdoms covered are bacteria, archaebacteria, eukaryotes, and viruses.
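The query-by-identifier and multi-format download behaviour described above can be sketched in Python. This is a minimal illustration, not the CysDuF implementation: the `CysRecord` class, its field names, and the `query`, `to_csv`, and `to_json` helpers are hypothetical, with fields chosen only to mirror the output columns listed in the abstract. Excel export is left out so the sketch needs nothing beyond the standard library.

```python
import csv
import io
import json
from dataclasses import asdict, dataclass, fields


# Hypothetical record type; field names mirror the output columns in the abstract.
@dataclass
class CysRecord:
    pfam_id: str
    duf_id: str
    duf_name: str
    scope_family: str
    scope_superfamily: str
    species: str
    pathway: str
    chain_id: str
    pdb_id: str
    menv: str
    cys_ptm: str


def query(records, duf_id=None, pfam_id=None, pdb_id=None):
    """Return records matching every supplied identifier (case-insensitive)."""
    criteria = {"duf_id": duf_id, "pfam_id": pfam_id, "pdb_id": pdb_id}
    criteria = {k: v.upper() for k, v in criteria.items() if v is not None}
    if not criteria:
        raise ValueError("Supply a DUF ID, PFAM ID, or PDB ID")
    return [
        r for r in records
        if all(getattr(r, k).upper() == v for k, v in criteria.items())
    ]


def to_csv(records):
    """Serialise records to CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=[f.name for f in fields(CysRecord)])
    writer.writeheader()
    for r in records:
        writer.writerow(asdict(r))
    return buf.getvalue()


def to_json(records):
    """Serialise records to a JSON array of objects."""
    return json.dumps([asdict(r) for r in records], indent=2)
```

In this sketch, supplying more than one identifier narrows the result to records matching all of them, and calling `query` with no identifier is rejected, reflecting that each lookup starts from a DUF, PFAM, or PDB ID.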
This is the first database reporting the protein microenvironment (MENV) around Cys residues, together with predicted post-translational modifications, in DUF proteins.
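The abstract does not state how the MENV around each Cys is defined. One common convention, assumed here purely for illustration, treats the microenvironment as every residue with at least one atom within a fixed distance of the Cys sulfur (SG) atom. The sketch below works on plain coordinate tuples rather than a structure-parsing library; the `Atom` type, the `cys_microenvironment` function, and the 6.0 Å default cutoff are hypothetical and should not be read as the CysDuF method.

```python
import math
from typing import NamedTuple


class Atom(NamedTuple):
    chain: str
    res_num: int
    res_name: str
    atom_name: str
    x: float
    y: float
    z: float


def cys_microenvironment(atoms, chain, res_num, cutoff=6.0):
    """Residues with any atom within `cutoff` angstroms of the Cys SG atom.

    Using SG as the centre and 6.0 A as the default cutoff are illustrative
    assumptions, not parameters taken from the database.
    """
    sg = next(
        (a for a in atoms
         if a.chain == chain and a.res_num == res_num
         and a.res_name == "CYS" and a.atom_name == "SG"),
        None,
    )
    if sg is None:
        raise ValueError(f"No CYS SG atom at {chain}:{res_num}")
    neighbours = set()
    for a in atoms:
        if (a.chain, a.res_num) == (chain, res_num):
            continue  # skip atoms of the query Cys residue itself
        if math.dist((a.x, a.y, a.z), (sg.x, sg.y, sg.z)) <= cutoff:
            neighbours.add((a.chain, a.res_num, a.res_name))
    # Sorted (chain, residue number, residue name) tuples for stable output
    return sorted(neighbours)
```

Counting a residue as soon as any of its atoms falls inside the cutoff keeps the sketch simple; a side-chain-only or centroid-based criterion would give a tighter neighbourhood.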