Alexey Konstantinovich Shaytan

Faculty of Computer Science

Tel.: 27338

Professional Interests

epigenetics, nucleosomes, structural bioinformatics

Positions

Leading Research Fellow, Faculty of Computer Science, Institute of Artificial Intelligence and Digital Sciences, Center for Biomedical Research and Technology
Professor, Faculty of Computer Science, Department of Big Data and Information Retrieval

Bio

· Joined HSE University in 2021.
· Research and teaching experience: 17 years.

Education

2022 · Academic title: Professor of the Russian Academy of Sciences (RAS)
2022 · Corresponding Member of the RAS
2021 · Doctor of Sciences in Physics and Mathematics: Lomonosov Moscow State University
2010 · Candidate of Sciences in Physics and Mathematics
2007 · Specialist degree: Lomonosov Moscow State University, specialty "Condensed Matter Physics", qualification "Physicist"

Work Experience

· Research Fellow, International Laboratory of Bioinformatics, Higher School of Economics (02.2021 to present)
· Visiting Fellow, National Center for Biotechnology Information, National Institutes of Health, USA
· Leading Research Fellow, Lomonosov Moscow State University, Faculty of Biology, Department of Bioengineering
· Research Fellow, Lomonosov Moscow State University, Faculty of Biology, Department of Bioengineering
· Scientific Secretary, Technology Platform "Strategic Information Technologies"
· Lead Engineer, Lomonosov Moscow State University, Faculty of Biology, Department of Bioengineering
· Research Fellow, Ulm University, Germany, Institute of Polymer Science
· Teacher, State Secondary School 1326 / Lyceum 1586, subjects: computer modeling (grades 10-11), computer methods in biology and physics (grades 7-9)

Awards and Honors

· Rector's personal bonus (2021–2022)
· Bonus for a publication in a List A journal (or an equivalent scholarly publication) (2025–2026)

Researcher Identifiers

ORCID: 0000-0003-0312-938X
ResearcherID: D-7306-2012
RSCI SPIN: 6275-5481
Google Scholar: https://scholar.google.com/citations?user=Ug0JSXEAAAAJ
Scopus AuthorID: 26635808600

Publications (19)

(Re)defining the human chromatome: an integrated meta-analysis of localization, function, abundance, physical properties, and domain composition of chromatin proteins

2026 · ARTICLE · en

The full complement of chromatin-associated proteins—collectively referred to as the chromatome—enables genome functioning in eukaryotes by participating in a wide range of physico-chemical processes. These include mediating diverse specific and nonspecific intermolecular interactions, catalyzing in situ synthesis and modification of macromolecules, facilitating ATP-dependent chromatin remodeling, etc. Despite considerable progress in epigenomics and the structural characterization of many nuclear proteins and their complexes, our understanding of chromatin organization at the proteome scale remains incomplete. This gap hinders the development of a holistic view of genome regulation. In this study, we present a state-of-the-art characterization of the human chromatome based on an integrative meta-analysis of diverse data sources describing the composition, abundance, and sub-nuclear localization of chromatin proteins. This effort is complemented by original analyses of their physico-chemical properties, domain architectures, and interaction patterns. To support and streamline these analyses, we developed a reference dataset of chromatin proteins, integrated with an empirical, function-based classification ontology and an associated interactive web resource—SimChrom—accessible at https://simchrom.intbio.org/. The reference dataset was carefully curated by reconciling data among protein databases, localization, and mass spectrometry-based experimental studies. Sequence-based and AI-assisted structural analyses revealed previously unannotated domains within chromatin proteins that warrant experimental validation, as well as the widespread use of multivalent interaction strategies that underpin chromatin organization. Together, our findings establish a robust framework for future studies aimed at elucidating genome function through detailed analysis of protein–protein and protein–nucleic acid interactions within chromatin.

The need to implement FAIR principles in biomolecular simulations

2025 · ARTICLE · en

The communities that embraced data archiving efforts decades ago are now, in the era of data-driven biology, gaining the most from the AI revolution. The structural biology community was a pioneer in this regard, establishing the Protein Data Bank in 1971 and making data accessible using the FAIR principles even before these were articulated (1, 2). The genomics and bioinformatics community has followed the example, establishing many widely used databases (3, 4). By contrast, molecular simulation has been anchored in usage paradigms dating back to the seventies, when molecular dynamics (MD) simulation was first applied to study biomacromolecules (5). At that time, MD was used by theoretical physicists and chemists in proof-of-concept simulations, but 50 years later, MD has evolved into a cornerstone molecular biology technique that can provide accurate, quantitative analysis and property prediction. MD is now employed by tens of thousands of researchers worldwide, accounting for roughly 15% of global supercomputer usage. Unfortunately, these rich and costly data are not systematically maintained, and when further analyses are required, simulations have to be rerun, an unacceptable situation from scientific, environmental and sustainability standpoints. In this letter, we argue for a collaborative endeavor to archive MD simulation data and describe ongoing efforts to establish cost-effective and sustainable data archiving strategies.

iPiDA-LGE: a local and global graph ensemble learning framework for identifying piRNA-disease associations

2025 · ARTICLE · en

Exploring piRNA-disease associations can help discover candidate diagnostic or prognostic biomarkers and therapeutic targets. Several computational methods have been presented for identifying associations between piRNAs and diseases. However, the existing methods encounter challenges such as over-smoothing in feature learning and overlooking specific local proximity relationships, resulting in limited representation of piRNA-disease pairs and insufficient detection of association patterns. In this study, we propose a novel computational method called iPiDA-LGE for piRNA-disease association identification. iPiDA-LGE comprises two graph convolutional neural network modules based on local and global piRNA-disease graphs, aimed at capturing specific and general features of piRNA-disease pairs. Additionally, it integrates their refined and macroscopic inferences to derive the final prediction result. The experimental results show that iPiDA-LGE effectively leverages the advantages of both local and global graph learning, thereby achieving more discriminative pair representation and superior predictive performance.

AI in drug development: advances in response, combination therapy, repositioning, and molecular design

2025 · ARTICLE · en

Artificial intelligence (AI) is revolutionizing the field of drug development, particularly in addressing key challenges such as drug response prediction, drug combination design, drug repositioning, and drug molecule generation. Traditional drug discovery is hindered by long timelines, high costs, and low success rates, necessitating innovative technologies to accelerate the process. AI technologies, such as deep learning, graph neural networks, and generative models, have demonstrated significant potential in enhancing the accuracy of drug response predictions, optimizing drug combination strategies, identifying opportunities for drug repositioning, and generating drug molecules with specific biological activities. These advancements not only accelerate the drug development process but also open up new possibilities for precision medicine. This review discusses the latest applications and developments of AI in drug discovery, highlighting the breakthroughs and challenges AI addresses in drug development. By summarizing the current research progress, this study provides theoretical support and practical guidance for further applications of AI in drug development.

Transcriptome Analysis of Bone Marrow Plasma Cells in Multiple Myeloma Patients before Treatment

2025 · ARTICLE · en

Multiple myeloma (MM) is a malignant lymphoproliferative disorder associated with accumulation of terminally differentiated B lymphocytes (plasma cells) in the bone marrow, monoclonal expression of pathologic immunoglobulin, anemia, renal damage, hypercalcemia, and bone lesions. Despite considerable attention to the study of MM pathogenesis and the development of new drugs, this disease remains incurable. Omics technologies are contributing significantly to the understanding of the molecular mechanisms of plasma cell neoplastic transformation in MM and may lead to the identification of novel therapeutic targets. In this work, the authors performed comparative gene expression analysis in CD138+ cell samples obtained from bone marrow aspirates of 46 MM patients and seven healthy donors using high-throughput RNA sequencing technology. Differential expression analysis identified 1230 genes with statistically significant expression changes in MM patient samples compared to donor samples. Functional analysis of the transcriptome revealed that pathogenetic changes in MM were associated with groups related to growth factors and intracellular signaling (DKK1, BMP4, HGF, TGFB2, FGF), extracellular matrix modification and regulation of cell adhesion (VCAM1, MMP16, LAMP5), ion channel activity (GRIA3, CLCNKA, GABRB2), regulation of immune functions, chromatin organization, cytoskeleton, and Ca2+ signaling. A significant proportion of genes from the ion channel category were associated with the regulation of neuronal transmission. The last category is poorly characterized, which could provide a new direction for MM therapy. The presented functional analysis of differentially expressed genes helps to elucidate the molecular mechanisms of MM, which will contribute to the development of new treatment approaches.

Analysis of protein-protein interactions of human pioneer transcription factors based on predicted structures of their complexes

2025 · ARTICLE · ru

Pioneer transcription factors from the "Yamanaka cocktail" reprogram cells. These proteins bind to heterochromatin, promote its opening, and recruit proteins that alter the epigenetic landscape and activate transcription. However, the literature contains no information on which chromatin proteins are recruited through direct protein-protein interactions with pioneer transcription factors. In this work, we used neural network algorithms to predict potential protein-protein complexes of three proteins from the "Yamanaka cocktail" (SOX2, OCT4, and KLF4) and carried out a structural and functional analysis of the resulting complexes.

The Role of Changes in Structure and Dynamics of Chromatin due to COVID-19

2024, in press · ARTICLE · en

The COVID-19 pandemic has been a serious challenge for the healthcare systems and economies of many countries, and understanding the molecular mechanisms of the pathogenesis of this disease has been a major challenge for modern science. At the same time, scientists had access for the first time to a range of high-precision, high-throughput methods for analyzing molecular processes, including technologies for studying genome-wide changes in chromatin. In this review, we discuss the modern methods that have been or could be applied to study changes in chromatin structure and dynamics upon SARS-CoV-2 infection, present the results of the studies available to date on the role of these changes in COVID-19 pathogenesis, and conclude with an overview of the currently known molecular mechanisms by which chromatin function is modulated upon SARS-CoV-2 infection.

Molecular Dynamics Simulations of Nucleosomes Are Coming of Age

2024 · ARTICLE · en

Understanding the function of eukaryotic genomes, including the human genome, is undoubtedly one of the major scientific challenges of the 21st century. The cornerstone of eukaryotic genome organization is nucleosomes—elementary building blocks of chromatin about 10 nm in size that wrap DNA around an octamer of histone proteins. Nucleosomes are integral players in all genomic processes, including transcription, DNA replication and repair. They mediate genome regulation at the epigenetic level, bridging the discrete nature of the genetic information encoded in DNA with the analog physical nature of the intermolecular interactions required to access that information. Due to their relatively large size and dynamic nature, nucleosomes are difficult objects for experimental characterization. Molecular dynamics (MD) simulations have emerged over the years as a useful tool to complement experimental studies. Particularly in recent years, advances in computing power, refinement of MD force fields and codes have opened up new frontiers in terms of simulation timescales and quality for nucleosomes and related systems. It has become possible to elucidate in atomistic detail their functional dynamics modes such as DNA unwrapping and sliding, to characterize the effects of epigenetic modifications, DNA and protein sequence variation on nucleosome structure and stability, to describe the mechanisms governing nucleosome interactions with chromatin-associated proteins and the formation of supranucleosome structures. In this review, we systematically analyzed all-atom MD simulation studies of nucleosomes and related structures published since 2018 and discussed their relevance in the context of older studies, experimental data, and related coarse-grained and multiscale studies.

The Methylation Level of a CpG Site in the Human Interleukin-1β Promoter Reflects Both Current and Past Inflammation

2024 · ARTICLE · en

Viral infections, including SARS-CoV-2, are accompanied by signs of systemic inflammation, which can cause long-term sequelae for the patient. Time-stable changes in the organism may be caused by epigenetic shifts inherited in a series of cell divisions, in particular, by changes in the DNA methylation profile in cells of various organs and tissues in response to proinflammatory cytokines. IL1B is a key inflammatory factor, and it was shown that the CpG methylation level in its promoter can change upon proinflammatory stimuli, and that it was associated with a significant increase in IL1B expression. In particular, a specific CpG site in the promoter of the IL1B gene located 299 bp upstream from the transcription start site (CpG3) was previously shown to be an important player in these processes. In this study, we examined methylation/demethylation levels of this CpG3 in publicly available genome-wide methylation studies. A total of 15 datasets were analyzed that comprised data from stromal cells in normal and inflammation-associated states, immune cells of healthy young and aging donors, patients during COVID-19 and after recovery. The level of CpG3 demethylation was found to be higher in osteoarthritis samples of cartilage as compared to healthy donors in one dataset. In blood samples of patients with rheumatoid arthritis, CpG3 demethylation was also found to be statistically higher than in healthy donors. In COVID-19 studies, blood samples obtained from patients with severe symptoms had higher CpG3 demethylation levels compared to samples obtained from patients with mild symptoms and controls. The level of CpG3 demethylation increased with age in healthy people as judged by whole blood samples. The same dependency was seen for in vitro cultures of mesenchymal cells obtained from healthy donors.
Taken together, we showed that the demethylation level of a single CpG site in the IL1B promoter increases in several cell types due to conditions associated with local and systemic inflammation, including SARS-CoV-2 infection, and in aging. These data suggest a possibility that a history of conditions associated with inflammation within an organism may be recorded, preserved, and encoded in its DNA methylation pattern. While the specificity of these “records of inflammation” is an open question, decoding the history of pathological events associated with inflammation that had been faced by the organism is an intriguing possibility.

High-throughput computational design of protein binders for complex targets using deep learning models

2023 · CHAPTER · en

Computational protein design methods have transformed structural bioinformatics, overcoming many experimental limitations. Previously, protein binders were created with experimental methods such as directed evolution. Many advances in computational protein design have made it possible to create binders de novo based solely on information about the target structure and sequence. However, despite recent progress, de novo design of protein binders remains challenging, as the average success rate in experimental testing is still relatively low (1). Deep learning approaches have shown promise in addressing this problem, especially after the success of the AlphaFold model in protein structure prediction (2). The aim of this study is to combine a variety of geometric and generative neural network approaches into a single semi-automated pipeline for protein binder design. The proposed pipeline includes structural analysis and binding interface prediction, generation of the binder backbone and sequence, and the AlphaFold 2 model as the main validation tool. In many studies, similar methods have been used to create binders for well-known target proteins, some of which may have limited geometric complexity. In this case, however, the pipeline is applied to the more complex landscapes of large protein complexes. We generate several hundred designs, describe the pros and cons of different binder design approaches, and evaluate their performance and computational cost. The developed approach can serve as a basis for high-throughput in silico binder design, as well as a benchmark for similar protein design tools.

Courses (2)

Molecular Modeling · taught 3 times

2025/2026, 2024/2025, 2023/2024 · Master's programme / Mago-Lego · in Russian

Bioinformatics

2021/2022 · Bachelor's programme · in Russian