Prediction of new Nε-acetylation sites in the human proteome based on molecular multilevel neighborhoods of atoms descriptors

Abstract

The Nε-acetylation of lysine residues is one of the most common post-translational modifications of proteins. The reaction between the ε-amino group of the Lys side chain and an activated acetyl group forms an amide bond, which changes the charge of the protein in the region of the modification. Interest in such sites is growing because Nε-acetylation of Lys residues regulates cellular activity, and its disruption can lead to pathological conditions. Furthermore, prediction of Nε-acetylation sites of Lys residues serves as a tool for planning experiment design in modern proteomics, since a prediction simplifies the choice of proteolysis strategy, the interpretation of ambiguous mass spectra, and the selection of proteotypic peptides. Here, we propose a new approach for predicting Nε-acetylation sites of Lys residues in human proteins using machine learning techniques. A distinctive feature of the approach is the use of structural formulas of peptides containing a potential Nε-acetylation site, described in the form of Multilevel Neighborhoods of Atoms (MNA) descriptors. Such descriptors are generated recursively for each atom of the molecule: a level-zero descriptor represents the atom itself, the first-level descriptor includes the atom and all atoms one bond away from it, and so on. Classification models for predicting Nε-acetylation sites of Lys residues were built with the previously developed MultiPASS program from an analysis of more than 23,000 sites from the PhosphoSitePlus database. The best model was obtained with a peptide length of 35 amino acid residues and level 9 MNA descriptors. In fivefold cross-validation, the sensitivity, specificity, and ROC-AUC of the developed model were 0.71, 0.74, and 0.82, respectively.
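
The recursive construction of MNA descriptors described above can be illustrated with a minimal Python sketch. It is a simplified illustration, not the MultiPASS implementation: the function name `mna_descriptors`, the bracketed string notation, the lexicographic ordering of neighbors, and the toy four-atom fragment are all assumptions made for the example, and hydrogens and bond types are ignored.

```python
def mna_descriptors(atoms, bonds, level):
    """Return one MNA-like descriptor string per atom.

    atoms: list of element symbols, e.g. ["C", "C", "O", "N"]
    bonds: list of (i, j) index pairs into atoms
    level: recursion depth (0 = the atom label itself)

    Simplified sketch: bond types and hydrogens are ignored.
    """
    # Build an adjacency list from the bond list.
    neighbors = {i: [] for i in range(len(atoms))}
    for i, j in bonds:
        neighbors[i].append(j)
        neighbors[j].append(i)

    # Level 0: each descriptor is just the atom label.
    current = list(atoms)

    # Each higher level wraps the sorted previous-level
    # descriptors of the immediate neighbors.
    for _ in range(level):
        current = [
            atoms[i] + "(" + "".join(sorted(current[j] for j in neighbors[i])) + ")"
            for i in range(len(atoms))
        ]
    return current


# Hypothetical toy fragment: heavy-atom skeleton C-C(=O)-N of an amide.
atoms = ["C", "C", "O", "N"]
bonds = [(0, 1), (1, 2), (1, 3)]

for lvl in range(3):
    print(lvl, mna_descriptors(atoms, bonds, lvl))
```

Sorting the neighbor descriptors makes the result independent of the order in which atoms are listed, so identical local environments produce identical strings that can be counted as features.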
The model identified 1,136 previously unknown potential sites in 418 proteins of the human reference proteome at a classification threshold defined as the difference in the probabilities of site assignment to positive (Pa) and negative (Pi) classes, (Pa – Pi) ≥ 0.7. The obtained data can serve as a basis for further proteomic studies aimed at identifying and functionally annotating new Nε-acetylation sites of Lys in human proteins.
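
As a hedged illustration of the (Pa – Pi) ≥ 0.7 selection rule and of the sensitivity and specificity metrics mentioned above, the following Python sketch filters candidate sites and computes both metrics from confusion-matrix counts. The function names, site identifiers, probabilities, and counts are hypothetical and are not taken from the study.

```python
def select_sites(predictions, threshold=0.7):
    """Keep sites whose Pa - Pi is at least the threshold.

    predictions: iterable of (site_id, pa, pi) tuples, where pa and pi
    are the estimated probabilities of the positive and negative class.
    """
    return [site for site, pa, pi in predictions if pa - pi >= threshold]


def sensitivity_specificity(tp, fn, tn, fp):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical predictions: (site identifier, Pa, Pi).
predictions = [
    ("PROT1_K10", 0.95, 0.05),  # difference 0.90, kept
    ("PROT1_K42", 0.60, 0.30),  # difference 0.30, rejected
    ("PROT2_K7", 0.88, 0.10),   # difference 0.78, kept
]
print(select_sites(predictions))

# Hypothetical confusion-matrix counts, for illustration only.
print(sensitivity_specificity(tp=8, fn=2, tn=6, fp=4))
```

A high threshold on the difference trades recall for confidence: fewer sites pass, but each selected site is one the classifier assigns to the positive class by a wide margin.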

About the authors

N. Lebedev

Pirogov Russian National Research Medical University

Email: lebedev_nv@rsmu.ru
Moscow, 117997 Russia

D. Filimonov

Institute of Biomedical Chemistry

Moscow, 119121 Russia

V. Poroikov

Institute of Biomedical Chemistry

Moscow, 119121 Russia

A. Lagunin

Pirogov Russian National Research Medical University; Institute of Biomedical Chemistry

Moscow, 117997 Russia; Moscow, 119121 Russia

Copyright © Russian Academy of Sciences, 2025