Maria Sergeevna Poptsova
Faculty of Computer Science
Professional interests
Positions
- Centre Director: Faculty of Computer Science, Institute of Artificial Intelligence and Digital Sciences, Centre for Biomedical Research and Technology
- Associate Professor: Faculty of Computer Science, Department of Big Data and Information Retrieval
- Academic Supervisor of the educational programme: Data Analysis in Biology and Medicine
Bio
- Joined HSE University in 2016.
- Research and teaching experience: 17 years.
Education
- 2004 · Candidate of Sciences (Physics and Mathematics): Lomonosov Moscow State University, specialties 01.00.00 "Physics and Mathematics" and 03.01.02 "Biophysics"; thesis title: Transformation of autowaves in locally inhomogeneous active media
- 1995 · Specialist degree: Lomonosov Moscow State University, specialty "Physics", qualification "Physicist"
Work experience
- 09/16-09/17 · Associate Professor · Faculty of Business and Management, Higher School of Economics
- 10/12-present · Senior Research Fellow · Department of Biophysics, Faculty of Physics, Moscow State University
- 01/10-05/11 · Research Fellow · Department of Pathology and Laboratory Medicine, Institute for Computational Biomedicine, Weill Cornell Medical College, Cornell University
  Summary: worked in a laboratory studying prostate cancer. Developed an algorithm and wrote a program to compute the degree to which CNVs affect biological pathways (publication in progress). Analyzed second-generation sequencing data to identify endogenous causes of genome breakage in aggressive forms of tumors.
- 4/09-12/09, 2/05-1/08 · Research Fellow · Molecular and Cell Biology Department, University of Connecticut
  Summary: worked on a NASA grant under the Applied Information Systems Research (AISR) program (http://aisrp.nasa.gov/). Took part in developing algorithms for processing large data sets (applied to biological systems) and in implementing these algorithms with parallel computing on Unix-based cluster systems (parallel supercomputers).
- Founder and co-owner · Janussys, Ltd. (www.janussys.ru) · Computational linguistics software development company
  Summary: a company working in mathematical linguistics, in particular developing machine translation algorithms and building multilingual dictionaries. Publisher of the multimedia English-Russian illustrated dictionary "Janus" (2002). The company is currently seeking investors for a project to build a next-generation machine translation system.
Awards and honours
- Acknowledgement from the Vice Rector of HSE University (November 2025)
- Certificate of Merit, HSE University (May 2025)
- Acknowledgement from the Vice Rector of HSE University (October 2024)
- Acknowledgement from the Vice Rector of HSE University (December 2023)
- Acknowledgement from the First Vice Rector of HSE University (December 2023)
- Letter of Acknowledgement from the First Vice Rector of HSE University (February 2023)
- Acknowledgement from HSE University (May 2022)
- Acknowledgement from the Faculty of Computer Science, HSE University (September 2019)
- Acknowledgement from the Vice Rector of HSE University (May 2019)
- Bonus for a publication in a List A journal (or an equivalent scholarly publication) (2025–2026, 2024–2025, 2023–2024)
- Bonus for a publication in an international peer-reviewed scholarly publication (2022–2023, 2021–2022, 2020–2022, 2017–2019)
- Best Teacher: 2021
- Best Academic Supervisor, category "Students' Digital Skills": 2024–2025
- Best Academic Supervisor, category "Student Satisfaction with the Quality of the Educational Programme": 2025
- Best Academic Supervisor, category "Interfaculty Cooperation": 2023–2024
- Best Academic Supervisor, category "Student Work with External Clients": 2023
- Best Academic Supervisor, category "Student Recruitment": 2023
Grants and projects
- For the academic degree of Candidate of Sciences
Researcher identifiers
- ORCID: 0000-0002-7198-8234
- ResearcherID: G-6985-2014
- SPIN RSCI: 1361-1087
- Google Scholar: https://scholar.google.com/citations?hl=en&user=9MoA58MAAAAJ
- Scopus AuthorID: 16177766600
Publications (57)
Z-DNA formation in promoters conserved between human and mouse are associated with increased transcription reinitiation rates.
2024 · ARTICLE · en
A long-standing question concerns the role of Z-DNA in transcription. Here we use a deep learning approach, DeepZ, that predicts Z-flipons based on DNA sequence, structural properties of nucleotides, and omics data. We examined Z-flipons that are conserved between human and mouse genomes after generating whole-genome Z-flipon maps and then validated them by orthogonal approaches based on high-resolution chemical mapping of Z-DNA and the transformer algorithm Z-DNABERT. For human and mouse, we revealed a similar pattern of transcription factors, chromatin remodelers, and histone marks associated with conserved Z-flipons. We found significant enrichment of Z-flipons in alternative and bidirectional promoters associated with neurogenesis genes. We show that conserved Z-flipons are associated with increased experimentally determined transcription reinitiation rates compared to promoters without Z-flipons, but without affecting elongation or pausing. Our findings support a model where Z-flipons engage Transcription Factor E and impact phenotype by enabling the reset of preinitiation complexes when active, and the suppression of gene expression when engaged by repressive chromatin complexes.
Machine learning models for predicting risks of MACEs for myocardial infarction patients with different VEGFR2 genotypes
2024 · ARTICLE · en
Background: The development of prognostic models for the identification of high-risk myocardial infarction (MI) patients is a crucial step toward personalized medicine. Genetic factors are known to be associated with an increased risk of cardiovascular diseases; however, little is known about whether they can be used to predict major adverse cardiac events (MACEs) for MI patients. This study aimed to build a machine learning (ML) model to predict MACEs in MI patients based on clinical, imaging, laboratory, and genetic features and to assess the influence of genetics on the prognostic power of the model. Methods: We analyzed the data from 218 MI patients admitted to the emergency department at the Surgut District Center for Diagnostics and Cardiovascular Surgery, Russia. Upon admission, standard clinical measurements and imaging data were collected for each patient. Additionally, patients were genotyped for VEGFR-2 variation rs2305948 (C/C, C/T, T/T genotypes with T being the minor risk allele). The study included a 9-year follow-up period during which major ischemic events were recorded. We trained and evaluated various ML models, including Gradient Boosting, Random Forest, Logistic Regression, and AutoML. For feature importance analysis, we applied the sequential feature selection (SFS) and SHapley Additive exPlanations (SHAP) methods. Results: The CatBoost algorithm, with features selected using the SFS method, showed the best performance on the test cohort, achieving a ROC AUC of 0.813. Feature importance analysis identified the dose of statins as the most important factor, with the VEGFR-2 genotype among the top 5. The other important features are coronary artery lesions (coronary artery stenoses ≥70%), left ventricular (LV) parameters such as lateral LV wall and LV mass, diabetes, type of revascularization (CABG or PCI), and age. We also showed that contributions are additive and that high risk can be determined by cumulative negative effects from different prognostic factors. Conclusion: Our ML-based approach demonstrated that the VEGFR-2 genotype is associated with an increased risk of MACEs in MI patients. However, the risk can be significantly reduced by high-dose statins and positive factors such as the absence of coronary artery lesions, absence of diabetes, and younger age.
Conserved microRNAs and Flipons Shape Gene Expression during Development by Altering Promoter Conformations
2023 · ARTICLE · en
The classical view of gene regulation draws from prokaryotic models, where responses to environmental changes involve operons regulated by sequence-specific protein interactions with DNA, although it is now known that operons are also modulated by small RNAs. In eukaryotes, pathways based on microRNAs (miR) regulate the readout of genomic information from transcripts, while alternative nucleic acid structures encoded by flipons influence the readout of genetic programs from DNA. Here, we provide evidence that miR- and flipon-based mechanisms are deeply connected. We analyze the connection between flipon conformation and the 211 highly conserved human miR that are shared with other placental and other bilateral species. The direct interaction between conserved miR (c-miR) and flipons is supported by sequence alignments and the engagement of argonaute proteins by experimentally validated flipons as well as their enrichment in promoters of coding transcripts important in multicellular development, cell surface glycosylation and glutamatergic synapse specification with significant enrichments at false discovery rates as low as 10^-116. We also identify a second subset of c-miR that targets flipons essential for retrotransposon replication, exploiting that vulnerability to limit their spread. We propose that miR can act in a combinatorial manner to regulate the readout of genetic information by specifying when and where flipons form non-B DNA (NoB) conformations, providing the interactions of the conserved hsa-miR-324-3p with RELA and the conserved hsa-miR-744 with ARHGAP5 genes as examples.
Unsupervised Domain Adaptation Methods for Cross-Species Transfer of Regulatory Code Signals
2023 · ARTICLE · en
Owing to advances in NGS technologies, whole-genome maps of various functional genomic elements have been generated for about a dozen species; however, experiments are still expensive and are not available for many species of interest. Deep learning methods have become the state-of-the-art computational methods for analyzing the available data, but the focus is often only on the species studied. Here we take advantage of progress in transfer learning in the area of Unsupervised Domain Adaptation (UDA) and test nine UDA methods for predicting regulatory code signals in the genomes of other species. We tested each deep learning implementation by training the model on experimental data from one species and then refining the model using the genome sequence of the target species for which we wanted to make predictions. Among the nine tested domain adaptation architectures, the non-adversarial methods Minimum Class Confusion (MCC) and Deep Adaptation Network (DAN) significantly outperformed the others. Conditional Domain Adversarial Network (CDAN) was the third-best architecture. Here we provide an empirical assessment of each approach using real-world data. The approaches were tested on ChIP-seq data for transcription factor binding sites and histone marks in the human and mouse genomes, but they are generalizable to any cross-species transfer of interest. We tested the efficiency of each method using species for which experimental data were available for both. The results allow us to assess how well each implementation will work for species for which only limited experimental data are available and will inform the design of future experiments in these understudied organisms. Overall, our results proved the validity of UDA methods for generating missing experimental data for histone marks and transcription factor binding sites in various genomes and highlight how robust the various approaches are to data that are incomplete, noisy, and susceptible to analytic bias.
DeepZ: A Deep Learning Approach for Z-DNA Prediction
2023, in press · CHAPTER · en
Here we describe an approach that uses deep learning neural networks such as CNN and RNN to aggregate information from DNA sequence; physical, chemical, and structural properties of nucleotides; and omics data on histone modifications, methylation, chromatin accessibility, and transcription factor binding sites and data from other available NGS experiments. We explain how with the trained model one can perform whole-genome annotation of Z-DNA regions and feature importance analysis in order to define key determinants for functional Z-DNA regions.
Z-flipon variants reveal the many roles of Z-DNA and Z-RNA in health and disease
2023 · ARTICLE · en
Identifying roles for Z-DNA remains challenging given its dynamic nature. Here, we perform genome-wide interrogation with the DNABERT transformer algorithm trained on experimentally identified Z-DNA forming sequences (Z-flipons). The algorithm yields large performance enhancements (F1 = 0.83) over existing approaches and implements computational mutagenesis to assess the effects of base substitution on Z-DNA formation. We show Z-flipons are enriched in promoters and telomeres, overlapping quantitative trait loci for RNA expression, RNA editing, splicing, and disease-associated variants. We cross-validate across a number of orthogonal databases and define BZ junction motifs. Surprisingly, many effects we delineate are likely mediated through Z-RNA formation. A shared Z-RNA motif is identified in SCARF2, SMAD1, and CACNA1 transcripts, whereas other motifs are present in noncoding RNAs. We provide evidence for a Z-RNA fold that promotes adaptive immunity through alternative splicing of KRAB domain zinc finger proteins. An analysis of OMIM and presumptive gnomAD loss-of-function datasets reveals an overlap of Z-flipons with disease-causing variants in 8.6% and 2.9% of Mendelian disease genes, respectively, greatly extending the range of phenotypes mapped to Z-flipons.
Z-flipon variants reveal the many roles of Z-DNA and Z-RNA in health and disease
2023, in press · ARTICLE · en
Regulatory potential of flipons revealed by deep learning.
2023 · CHAPTER · en
Flipons – non-B DNA conformations – have been shown to play an important role in various genomic processes. Identifying and localizing flipons is difficult because of their dynamic nature. We developed deep learning approaches to identify non-B DNA secondary structures using available information from thousands of omics data sets. We created DeepZ models based on CNN and RNN, and a Z-DNABERT model based on the transformer algorithm, to predict Z-flipons at the genome-wide scale. We showed that Z-flipons are enriched in promoters and telomeres and overlap quantitative trait loci for RNA expression, RNA editing, splicing, and disease-associated variants. We applied the same approach to quadruplexes and triplexes and generated whole-genome predictions. We found that miR- and flipon-based mechanisms are deeply connected. We found direct interaction of conserved miR and engagement of argonaute proteins with experimentally validated flipons. Case studies provide evidence that flipon variants affect phenotype.
Conserved Z-flipons and associated omics factors in the mouse and human genomes.
2023 · CHAPTER · ru
Our previously developed DeepZ approach [1], which is based on deep neural networks and uses both sequence data and omics data, was used to generate whole-genome annotations of Z-DNA regions in the mouse and human genomes. In this work we used DeepZ to study conserved Z-flipons and the conserved transcription factors and histone marks that are enriched at Z-flipons in both genomes. We selected more than 500 shared features (more than 400 transcription factors and more than 50 histone marks) and trained the DeepZ model on ChIP-seq data for the mouse and human genomes using the same set of omics data. Association analysis between omics features and Z-flipons showed significant enrichment, in both human and mouse, of acetylation marks associated with regions of active transcription, such as H3K14ac, H3K36ac, H4K12ac, and H4K8ac, as well as of the bimodal chromatin marks H2A.Z and H3.3, which are associated with developmental processes. Among the orthologous transcription factors enriched at Z-flipons in both genomes, the top 10 include MYC, MAX, BRD4, ESR1, EP300, RUNX1, ERG, FOXA1, SMARCA4, and KDM2B. Most transcription factors that colocalize with Z-flipons are involved in chromatin organization and remodeling, reprogramming of histone marks, and regulation of transcription. Analysis of the distribution of flipons across genomic regions showed enrichment in promoter regions, in particular in alternative and bidirectional promoters. Thus, the characteristic patterns of conserved omics features enriched at Z-flipons from conserved regions of the human and mouse genomes support the regulatory potential of Z-DNA.
Unsupervised domain adaptation methods for cross-species transfer of regulatory code signals
2023 · CHAPTER · en
Courses (9)
- Bioinformatics of DNA, RNA and Proteins · 4 times
  2025/2026, 2024/2025, 2023/2024, 2022/2023 · Minor · Rus
- Medical Bioinformatics · 4 times
  2025/2026, 2024/2025, 2023/2024, 2022/2023 · Minor · Rus
- Machine Learning in Bioinformatics · 4 times
  2025/2026, 2024/2025, 2023/2024, 2022/2023 · Master's / Mago-Lego · Eng
- Machine Learning Methods in Bioinformatics
  2024/2025 · Mago-Lego · Rus
- Comparative Genomics
  2024/2025 · Master's / Mago-Lego · Rus
- Bioinformatics · 2 times
  2022/2023, 2021/2022 · Bachelor's · Rus
- Research Seminar "Data Analysis in the Natural Sciences"
  2022/2023 · Bachelor's · Eng
- Molecular Evolution
  2021/2022 · Master's · Rus
- Modern Methods of Data Analysis
  2021/2022 · Master's · Eng