GeneLens: A Python Package Implementing Monte Carlo Machine Learning and Network Analysis Methods for Biomarker Discovery and Gene Functional Annotation
- Authors: Osmak G.Z.1,2, Pisklova M.V.1,2
- Affiliations:
  1. Chazov National Medical Research Center for Cardiology
  2. Pirogov Russian National Research Medical University
- Issue: Volume 59, No. 5 (2025)
- Pages: 845-854
- Section: Bioinformatics
- URL: https://genescells.com/0026-8984/article/view/696392
- DOI: https://doi.org/10.31857/S0026898425050096
- ID: 696392
Abstract
We present GeneLens, a Python package for comprehensive analysis of differentially expressed genes and biomarker discovery. The package consists of two core modules: FSelector, which identifies biomarkers using Monte Carlo simulations of L1-regularized models, and NetAnalyzer, which predicts the function of selected gene sets from the topology of their protein-protein interaction networks. FSelector provides: (1) automated gene selection through iterative bootstrap sampling; (2) calculation of gene significance weights that account for both the ROC-AUC performance of the models and the number of simulations in which each gene is selected; (3) adaptive thresholding for feature space reduction. NetAnalyzer performs pathway enrichment analysis while integrating the significance weights from FSelector. Distributed as a pip-installable package, GeneLens provides standardized algorithms for applying machine learning and network analysis methods in differential gene expression studies, along with automated model hyperparameter tuning and visualization tools.
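The FSelector idea described above can be illustrated with a minimal sketch. This is not the GeneLens API: every function name, the weight formula (sum of out-of-bag ROC-AUC over the runs that selected a feature, divided by the number of valid runs), and the fixed cutoff standing in for the adaptive threshold are assumptions made for illustration. The L1-regularized logistic model is fitted with a small hand-written proximal-gradient loop so that the example runs with the standard library only; in practice an established implementation would normally be used instead.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_l1_logistic(X, y, lam=0.1, lr=0.1, n_steps=200):
    """L1-penalized logistic regression via proximal gradient descent (ISTA)."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(n_steps):
        # Residuals (predicted probability - label) drive the log-loss gradient.
        resid = [sigmoid(b + sum(wj * xj for wj, xj in zip(w, row))) - yi
                 for row, yi in zip(X, y)]
        b -= lr * sum(resid) / n  # the intercept is not penalized
        for j in range(p):
            grad = sum(r * row[j] for r, row in zip(resid, X)) / n
            step = w[j] - lr * grad
            # Soft-thresholding sets small coefficients exactly to zero.
            w[j] = math.copysign(max(abs(step) - lr * lam, 0.0), step)
    return w, b


def roc_auc(scores, labels):
    """ROC-AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((sp > sn) + 0.5 * (sp == sn) for sp in pos for sn in neg)
    return wins / (len(pos) * len(neg))


def monte_carlo_weights(X, y, n_runs=20, seed=0):
    """Hypothetical weight: summed out-of-bag AUC of runs selecting a feature,
    divided by the number of valid runs."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    totals, valid = [0.0] * p, 0
    for _ in range(n_runs):
        boot = [rng.randrange(n) for _ in range(n)]  # bootstrap: with replacement
        oob = sorted(set(range(n)) - set(boot))      # out-of-bag samples for scoring
        if len({y[i] for i in boot}) < 2 or len({y[i] for i in oob}) < 2:
            continue                                 # both classes are required
        w, b = fit_l1_logistic([X[i] for i in boot], [y[i] for i in boot])
        scores = [b + sum(wj * xj for wj, xj in zip(w, X[i])) for i in oob]
        auc = roc_auc(scores, [y[i] for i in oob])
        valid += 1
        for j in range(p):
            if w[j] != 0:                            # feature survived the L1 penalty
                totals[j] += auc
    return [t / valid for t in totals] if valid else totals


def make_toy_data(n=40, p=12, seed=1):
    """Synthetic 'expression matrix': only feature 0 differs between classes."""
    rng = random.Random(seed)
    y = [i % 2 for i in range(n)]
    X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    for row, yi in zip(X, y):
        row[0] += 1.5 if yi else -1.5
    return X, y


X, y = make_toy_data()
weights = monte_carlo_weights(X, y)
# Placeholder cutoff (assumption): keep features with at least half the top weight.
selected = [j for j, wt in enumerate(weights) if wt >= 0.5 * max(weights)]
print(selected)
```

A feature that survives the L1 penalty in many resampled models, and does so in models that classify held-out samples well, accumulates a high weight; features picked up by chance in a few runs stay near zero. How the real package combines ROC-AUC with selection counts, and how its adaptive threshold is chosen, is not specified in the abstract.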
About the authors
G. Osmak
Chazov National Medical Research Center for Cardiology; Pirogov Russian National Research Medical University
Email: german.osmak@gmail.com
Moscow, 121552 Russia; Moscow, 117997 Russia
M. Pisklova
Chazov National Medical Research Center for Cardiology; Pirogov Russian National Research Medical University
Moscow, 121552 Russia; Moscow, 117997 Russia