Sergey Alexandrovich Spirin (Спирин Сергей Александрович)

Faculty of Computer Science

Profile on hse.ru · Tel.: 27267

Professional interests

bioinformatics; data analysis in biology and medicine

Positions

Associate Professor: Faculty of Computer Science, Big Data and Information Retrieval School, Joint Department with the Kharkevich Institute for Information Transmission Problems, RAS

Bio

· Started working at HSE University in 2018.
· Research and teaching experience: 39 years.

Education

2001 · Candidate of Sciences in Physics and Mathematics
1985 · Specialist degree: Lomonosov Moscow State University, specialty "Mathematics", qualification "Specialist"

Work experience

· Leading Researcher, A.N. Belozersky Institute of Physico-Chemical Biology, Lomonosov Moscow State University

Awards and honors

· Bonus for a publication in an international peer-reviewed academic journal (2022–2023, 2021–2022)
· Best Teacher: 2022–2025

Grants and projects

· For the academic degree of Candidate of Sciences

Researcher identifiers

ORCID: 0000-0002-8041-1808
ResearcherID: C-8414-2012
Google Scholar: https://scholar.google.com/citations?hl=ru&user=ownFE6MAAAAJ
Scopus AuthorID: 6602297394

Publications (8)

Calibrated weighted permutation test detects ancient language connections in the Circumpolar area (Chukotian-Nivkh and Yukaghir-Samoyedic)

2025 · ARTICLE · en

Relationships between universally recognized language families represent a hotly debated topic in historical linguistics, and the same is true for correlation between signals of genetic and linguistic relatedness. We developed a weighted permutation test which represents the classical permutation tests with weights introduced for individual Swadesh concepts according to their typological stability. Further, the obtained values were calibrated on a negative control group to override non-uniform distribution of phonemes within the Swadesh wordlist. We applied the calibrated permutation test to the basic vocabularies of nine languages and reconstructed proto-languages to show that three groups of circumpolar language families in the Northern Hemisphere show evidence of relationship through common descent or borrowing in the basic vocabulary: [Chukotko-Kamchatkan, Nivkh]; [Yukaghir, Samoyedic]; and [Yeniseian, Na-Dene, Burushaski]. The former two pairs showed the most significant signals of language relationship. Our findings further support some hypotheses on long-distance language relationships previously put forward based on linguistic methods but lacking universal acceptance.
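
The weighted permutation idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the matching criterion (identical sound-class symbols), the per-concept weights, and the function names `weighted_score` and `weighted_permutation_test` are all assumptions made for illustration, and the calibration on a negative control group is not covered.

```python
import random


def weighted_score(list_a, list_b, weights):
    """Sum the concept weights at positions where the two symbols match.

    Assumption: each word is reduced to one sound-class symbol, and
    position i in both lists refers to the same concept.
    """
    return sum(w for a, b, w in zip(list_a, list_b, weights) if a == b)


def weighted_permutation_test(list_a, list_b, weights, n_perm=10_000, seed=0):
    """Return (observed score, permutation p-value).

    The second list is shuffled so that its words are paired with random
    concepts; the weights stay attached to the concept positions.
    """
    rng = random.Random(seed)
    observed = weighted_score(list_a, list_b, weights)
    shuffled = list(list_b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if weighted_score(list_a, shuffled, weights) >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return observed, (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Hypothetical toy data: ten concepts, one sound-class symbol each.
    lang_a = list("PTKSMNRLWY")
    lang_b = list("PTKSMWRYNL")
    stability = [1.0, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1]
    print(weighted_permutation_test(lang_a, lang_b, stability, n_perm=2000))
```

With all weights equal, this reduces to a classical unweighted permutation test on the number of matches.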

PhyloBench: A Benchmark for Evaluating Phylogenetic Programs

2024 · ARTICLE · en

Phylogenetic inference based on protein sequence alignment is a widely used procedure. Numerous phylogenetic algorithms have been developed, most of which have many parameters and options. Choosing a program, options, and parameters can be a nontrivial task. No benchmark for comparison of phylogenetic programs on real protein sequences was publicly available. We have developed PhyloBench, a benchmark for evaluating the quality of phylogenetic inference, and used it to test a number of popular phylogenetic programs. PhyloBench is based on natural, not simulated, protein sequences of orthologous evolutionary domains. The measure of accuracy of an inferred tree is its distance to the corresponding species tree. A number of tree-to-tree distance measures were tested. The most reliable results were obtained using the Robinson–Foulds distance. Our results confirmed recent findings that distance methods are more accurate than maximum likelihood (ML) and maximum parsimony. We tested the Bayesian program MrBayes on natural protein sequences and found that, on our datasets, it performs better than ML, but worse than distance methods. Of the methods we tested, the Balanced Minimum Evolution method implemented in FastME yielded the best results on our material. Alignments and reference species trees are available at https://mouse.belozersky.msu.ru/tools/phylobench/ together with a web-interface that allows for a semi-automatic comparison of a user’s method with a number of popular programs.
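
As a rough illustration of the Robinson–Foulds distance mentioned in this abstract, the sketch below counts the non-trivial leaf bipartitions found in exactly one of two trees. The nested-tuple tree representation and the function names are assumptions made for this example; this is the plain, unnormalized distance and is not taken from the PhyloBench code.

```python
def leaf_set(tree):
    """Return the set of leaf names of a tree given as nested tuples of strings."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def splits(tree):
    """Collect the non-trivial bipartitions (splits) induced by internal edges.

    Each split is stored as the side that does NOT contain a fixed anchor
    leaf, so the same bipartition always gets the same representation and
    the position of the root does not matter.
    """
    all_leaves = leaf_set(tree)
    anchor = min(all_leaves)
    found = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        clade = frozenset().union(*(visit(child) for child in node))
        side = all_leaves - clade if anchor in clade else clade
        # Skip trivial splits that separate zero or one leaf from the rest.
        if 1 < len(side) < len(all_leaves) - 1:
            found.add(side)
        return clade

    visit(tree)
    return found


def robinson_foulds(tree_a, tree_b):
    """Number of splits present in exactly one of the two trees."""
    if leaf_set(tree_a) != leaf_set(tree_b):
        raise ValueError("trees must have the same leaf set")
    return len(splits(tree_a) ^ splits(tree_b))


if __name__ == "__main__":
    # Hypothetical five-leaf trees standing in for a species tree and an inferred tree.
    species_tree = ((("A", "B"), "C"), ("D", "E"))
    inferred_tree = ((("A", "C"), "B"), ("D", "E"))
    print(robinson_foulds(species_tree, inferred_tree))
```

A distance of zero means the two trees have identical unrooted topologies; larger values mean more splits disagree.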

Evolution of restriction-modification systems containing one restriction endonuclease and two DNA methyltransferases

2023 · ARTICLE · ru

Some restriction-modification systems contain two DNA methyltransferases. In this work, we classify such systems by the catalytic domains present in their proteins that are characteristic of restriction endonucleases and DNA methyltransferases. We studied in detail the evolution of proteins from restriction-modification systems that contain an endonuclease domain of the NOV_C family and two DNA methyltransferases, both with domains of the DNA_methylase family. The DNA methyltransferases of such systems fall into two clades on the phylogenetic tree such that the two enzymes of one system belong to different clades, which indicates independent evolution of the two methyltransferases. We found evidence of multiple interspecies horizontal transfers of whole systems, as well as cases of gene transfer between systems.

How often does filtering of alignment columns improve phylogeny reconstruction for two-domain proteins?

2022 · ARTICLE · ru

Protein phylogeny is usually reconstructed from a multiple alignment of amino acid sequences. One of the problems is that such alignments contain regions of varying conservation, including regions where the alignment quality is doubtful. To address this problem, alignment columns are often filtered, and dedicated software has been developed for this purpose. In this work, we examine different approaches to phylogeny reconstruction using proteins with two evolutionary domains as an example. The sequences of such proteins are inherently heterogeneous in conservation because they contain both evolutionary domains and interdomain linkers, as well as N- and C-terminal regions. We show that filtering alignment columns improves reconstruction quality on average only when full-length sequences are used and only for eukaryotic proteins. We also show that restricting the alignment to the evolutionary domains, discarding the less conserved linkers and terminal sequences, on average reduces the quality of phylogenetic reconstruction.

Is there any intron sliding in mammals?

2020 · ARTICLE · en

Background. Eukaryotic protein-coding genes consist of exons and introns. Exon–intron borders are conserved between species and thus their changes might be observed only on quite long evolutionary distances. One of the rarest types of change, in which intron relocates over a short distance, is called "intron sliding", but the reality of this event has been debated for a long time. The main idea of a search for intron sliding is to use the most accurate genome annotation and genome sequence, as well as high-quality transcriptome data. We applied them in a search for sliding introns in mammals in order to widen knowledge about the presence or absence of such phenomena in this group. Results. We didn’t find any significant evidence of intron sliding in the primate group (human, chimpanzee, rhesus macaque, crab-eating macaque, green monkey, marmoset). Only one possible intron sliding event supported by a set of high quality transcriptomes was observed between EIF1AX human and sheep gene orthologs. Also, we checked a list of previously observed intron sliding events in mammals and showed that most likely they are artifacts of genome annotations and are not shown in subsequent annotation versions as well as are not supported by transcriptomic data. Conclusions. We assume that intron sliding is indeed a very rare evolutionary event if it exists at all. Every case of intron sliding needs a lot of supportive data for detection and confirmation.

PQ, a new program for phylogeny reconstruction

2018 · ARTICLE · en

Background. Many algorithms and programs are available for phylogenetic reconstruction of families of proteins. Methods used widely at present use either a number of distance-based principles or character-based principles of maximum parsimony or maximum likelihood. Results. We developed a novel program, named PQ, for reconstructing protein and nucleic acid phylogenies following a new character-based principle. Being tested on natural sequences PQ improves upon the results of maximum parsimony and maximum likelihood. Working with alignments of 10 and 15 sequences, it also outperforms the FastME program, which is based on one of the distance-based principles. Among all tested programs PQ is proved to be the least susceptible to long branch attraction. FastME outperforms PQ when processing alignments of 45 sequences, however. We confirm a recent result that on natural sequences FastME outperforms maximum parsimony and maximum likelihood. At the same time, both PQ and FastME are inferior to maximum parsimony and maximum likelihood on simulated sequences. PQ is open source and available to the public via an online interface. Conclusions. The software we developed offers an open-source alternative for phylogenetic reconstruction for relatively small sets of proteins and nucleic acids, with up to a few tens of sequences.

Avoidance of recognition sites of restriction-modification systems is a widespread but not universal anti-restriction strategy of prokaryotic viruses

2018 · ARTICLE · en

Background. Restriction-modification (R-M) systems protect bacteria and archaea from attacks by bacteriophages and archaeal viruses. An R-M system specifically recognizes short sites in foreign DNA and cleaves it, while such sites in the host DNA are protected by methylation. Prokaryotic viruses have developed a number of strategies to overcome this host defense. The simplest anti-restriction strategy is the elimination of recognition sites in the viral genome: no sites, no DNA cleavage. Even a decrease of the number of recognition sites can help a virus to overcome this type of host defense. Recognition site avoidance has been a known anti-restriction strategy of prokaryotic viruses for decades. However, recognition site avoidance has not been systematically studied with the currently available sequence data. We analyzed the complete genomes of almost 4000 prokaryotic viruses with known host species and more than 17,000 restriction endonucleases with known specificities in terms of recognition site avoidance. Results. We observed considerable limitations of recognition site avoidance as an anti-restriction strategy. Namely, the avoidance of recognition sites is specific for dsDNA and ssDNA prokaryotic viruses. Avoidance is much more pronounced in the genomes of non-temperate bacteriophages than in the genomes of temperate ones. Avoidance is not observed for the sites of Type I and Type IIG systems and is very rarely observed for the sites of Type III systems. The vast majority of avoidance cases concern recognition sites of orthodox Type II restriction-modification systems. Even under these constraints, complete or almost complete elimination of sites is observed for approximately one-tenth of viral genomes and a significant under-representation for approximately one-fourth of them. Conclusions. Avoidance of recognition sites of restriction-modification systems is a widespread but not universal anti-restriction strategy of prokaryotic viruses.

Comparison of methods for detecting exceptional sequences in prokaryotic genomes

2018 · ARTICLE · ru

Many proteins recognize specific DNA sequences in order to perform their functions. The number and distribution of such recognition sites in a genome may be biologically significant. For example, the number of restriction sites is often reduced in the genomes of bacteriophages and prokaryotes, which lowers the probability of DNA hydrolysis by restriction endonucleases. A sequence is called exceptional if its frequency in the genome differs significantly from the frequency predicted by some mathematical model. An exceptional sequence may be either over-represented or under-represented, depending on whether its observed frequency is higher or lower than the predicted one. Exceptional sequences can be presumed to be biologically significant; for example, they may correspond to protein binding sites or be part of frequently occurring repeats. Several methods exist for predicting the frequency of a short sequence in a genome from the observed frequencies of its subsequences. The most popular methods are based on Markov models. However, no comprehensive comparison of the methods on a large dataset had been carried out before. We compared the three most widely used methods for predicting the frequency of a short sequence: a method based on the maximal-order Markov model; a method that uses the geometric mean of generalized Markov estimates; and a method that takes into account the frequencies of all subsequences, including discontinuous ones. Using restriction sites in the complete genomes of 2500 prokaryotic species as an example, we show that the prediction depends strongly on the method used: the lists of the 5% most under-represented sites differ in up to 50% of predictions. The method that accounts for the frequencies of all subsequences, proposed by Burge et al. in 1992, was more accurate than the other two methods, both on prokaryotic genomes and in a computer simulation of selection in random sequences. Based on this comparison, we recommend this method for detecting exceptional sequences in prokaryotic genomes.
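
For orientation, the first of the three approaches named in this abstract, the maximal-order Markov estimate, can be sketched as follows. The standard estimator predicts the count of a word w of length m as N(w[1..m-1]) · N(w[2..m]) / N(w[2..m-1]), where N is the observed count. The function names, the single-strand linear-sequence treatment, and the toy sequence are assumptions for illustration only; the recommended method of Burge et al. is not implemented here.

```python
def count_overlapping(text, word):
    """Count occurrences of word in text, allowing overlaps."""
    count = 0
    start = text.find(word)
    while start != -1:
        count += 1
        start = text.find(word, start + 1)
    return count


def expected_count_max_order(genome, word):
    """Maximal-order Markov estimate of the expected count of word.

    Uses the observed counts of the prefix (word without its last letter),
    the suffix (word without its first letter) and the core (word without
    both). Assumption: one strand of a linear sequence is scanned.
    """
    if len(word) < 3:
        raise ValueError("word must have at least 3 letters")
    core_count = count_overlapping(genome, word[1:-1])
    if core_count == 0:
        return 0.0
    prefix_count = count_overlapping(genome, word[:-1])
    suffix_count = count_overlapping(genome, word[1:])
    return prefix_count * suffix_count / core_count


def observed_to_expected(genome, word):
    """Ratio below 1 suggests under-representation, above 1 over-representation."""
    expected = expected_count_max_order(genome, word)
    if expected == 0:
        return None  # ratio undefined when the model expects no occurrences
    return count_overlapping(genome, word) / expected


if __name__ == "__main__":
    toy_genome = "ACGACGTACG"  # hypothetical toy sequence
    print(expected_count_max_order(toy_genome, "ACG"))
    print(observed_to_expected(toy_genome, "ACG"))
```

A real analysis would also need a significance criterion for the ratio, which this sketch leaves out.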

Courses (6)

Algorithms in Bioinformatics · 4 times

2025/2026, 2024/2025, 2023/2024, 2022/2023 · Master's programme / Mago-Lego · RUS

Applied Statistics · 5 times

2025/2026, 2024/2025, 2023/2024, 2022/2023, 2021/2022 · Master's programme / Mago-Lego · RUS

Mentor's Seminar "Research on Biomedical Data"

2023/2024 · Master's programme · RUS

Scientific and Research Seminar "Computational Phylogenetics"

2022/2023 · Master's programme · ENG

Research Seminar "Computational Phylogenetics"

2021/2022 · Master's programme · RUS

Modern Methods of Decision Making: Algorithms in Bioinformatics

2021/2022 · Master's programme · RUS