Dmitry Igorevich Ignatov

Faculty of Computer Science

Tel.: +7(495) 772-9590 | 27276 | +7(926) 381-8033

Professional Interests

· Machine learning
· Machine learning and data mining
· Formal concept analysis

Positions

Associate Professor: Faculty of Computer Science, Department of Data Analysis and Artificial Intelligence
Head of Laboratory: Faculty of Computer Science, Research and Educational Laboratory for Models and Methods of Computational Pragmatics

Bio

· Joined HSE University in 2006.
· Research and teaching experience: 28 years.

Education

2010 · Candidate of Sciences: State University – Higher School of Economics, specialty 05.13.18 "Mathematical Modeling, Numerical Methods, and Software Systems", dissertation topic: Models, Algorithms, and Software Tools for Biclustering Based on Closed Sets
2008 · Master's degree: State University – Higher School of Economics, Faculty of Business Informatics, Department of Applied Mathematics and Informatics, specialty "Applied Mathematics and Informatics", qualification "Master"
2004 · Specialist degree: Kolomna State Pedagogical Institute, Faculty of Physics and Mathematics, specialty "Physics", qualification "Teacher of Physics and Mathematics"

Work Experience

· Over the years I have worked in a variety of roles. A brief, unofficial list follows: youth counselor (MBOU Secondary School No. 15, Kolomna), senior counselor (Meteor children's health camp), computer operator (KGPI Computing Center), physics and mathematics teacher (MBOU Secondary School No. 12, Kolomna), programmer of computer games and office automation applications (FMC Computers), system administrator (International Dialysis Centers), database administrator (Novye Tekhnologii), web designer (Kontrast photo studio), engineer, senior lecturer, associate professor, research fellow, deputy head of the Department of Data Analysis and Artificial Intelligence, head of the Laboratory for Models and Methods of Computational Pragmatics (SU-HSE, later HSE University), research fellow at PDMI RAS, lecturer in the joint program of SAP and the Innovation Laboratory of the MSU Faculty of Economics, freelance data scientist.

Awards and Honors

· Ilya Segalovich Scientific Award (2019): Laureate in the "Scientific Supervisors" category for contribution to the development of the scientific community and the training of young scientists.
· HSE University Certificate of Honor (January 2026)
· Acknowledgement from the Higher School of Economics (December 2022)
· Acknowledgement from the HSE University Vice Rector (May 2019)
· Acknowledgement from the HSE University Vice Rector (April 2019)
· Acknowledgement from the HSE University First Vice Rector (January 2018)
· Acknowledgement from the HSE University Faculty of Computer Science (August 2017)
· Bonus for academic work (2010–2011)
· Bonus for a publication in a List A journal (or an equivalent scientific publication) (2025–2026, 2024–2025)
· Bonus for a publication in an international peer-reviewed scientific publication (2020–2021, 2019–2020, 2017–2019)
· Bonus for regular publications in international peer-reviewed scientific publications (2021–2026)
· Bonus for an article in a foreign peer-reviewed journal (2015–2017, 2013–2015, 2011–2013)
· Best Teacher: 2021, 2018, 2016, 2011
· High Professional Potential Group (HSE University talent pool): "Future Professors" category (2013–2014); "New Lecturers" category (2009–2010)

Grants and Projects

- · For the degree of Candidate of Sciences

Conferences (3)

· 2016: The 13th International Conference on Concept Lattices and Their Applications (CLA2016) (Moscow). Talk: A Lattice-Based Consensus Clustering Algorithm.
· 2015: 6th International Conference on Knowledge Engineering and Semantic Web, KESW 2015 (Moscow). Talk: Pattern Mining and Machine Learning for Demographic Sequences.
· 2015: 24th International Joint Conference on Artificial Intelligence, IJCAI 2015 (Buenos Aires). Talk: RAPS: A Recommender Algorithm Based on Pattern Structures.

Researcher Identifiers

ORCID: 0000-0002-6584-8534
ResearcherID: D-9558-2014
RSCI SPIN: 3622-5421
Google Scholar: https://scholar.google.com/citations?user=iExWnWsAAAAJ&hl=en
Scopus AuthorID: 56926915100

Publications (31)

What Drives Multi-Chain Crypto Forecasting: Model Choice, Feature Selection, and Transferability

2026 · ARTICLE · en

Cryptocurrency markets are increasingly shaped by heterogeneous on-chain activity rather than a single shared market process. This study investigates 7-day-ahead forecasting using 147 market and on-chain indicators across eight major blockchain ecosystems from October 2023 to April 2025. We benchmark statistical, deep-learning, and foundation-model baselines under multiple feature-selection pipelines using both error metrics and Diebold–Mariano tests. TiRex achieves the best average MAPE (0.0428) in a univariate setting without additional optimized covariates, while TFT remains slightly weaker even under its best feature-input configuration (MAPE: 0.0435; p = 0.9359 versus TiRex), suggesting a persistent practical advantage for TiRex. Importantly, TiRex’s zero-shot nature confers a substantial efficiency edge: by bypassing feature selection, it delivers comparable accuracy at a fraction of the computational cost. At the same time, feature selection materially affects many model families, with Boruta chosen in roughly 71.7% of best configurations. Taken together, the evidence supports a selective-feature principle: robust forecasting depends on validated, chain-specific features rather than larger feature sets. Feature-importance and overlap analyses further indicate a mixed structure of transferability, where broad market proxies provide baseline context while chain-specific variables drive marginal gains. Overall, this study highlights that effective multi-chain forecasting is primarily a feature selection problem under statistical uncertainty, while also showing that zero-shot designs like TiRex can achieve state-of-the-art accuracy with unmatched efficiency, offering practical implications for building leaner, more robust trading systems.

Identifying Top-Performing Students via VKontakte Social Media Communities Using Advanced NLP Techniques

2025 · ARTICLE · en

Identifying potentially high-performing students is crucial for universities aiming to enhance educational outcomes, for companies seeking to recruit top talents early, and for advertising platforms looking to optimize targeted marketing. This paper introduces an algorithm designed to identify students with exceptional academic performance by analyzing their subscriptions to communities on the social network VKontakte. The study examines a sample of 4445 students from Tomsk State University with publicly accessible VK profiles. The research methodology involves generating vector representations for each community based on embeddings, topic modeling, sentiment and emotion analysis, as well as text complexity metrics. To generate the embeddings, a separate model was trained and made publicly available on HuggingFace. The integration of diverse features was achieved using attention mechanisms, allowing the model to dynamically weigh their importance and capture intricate interrelations. These representations are then used to construct a digital user profile, capturing the students’ interests as reflected in their community subscriptions. Additionally, the machine learning pipeline incorporated stacking to combine predictions from multiple models, enhancing robustness and classification performance. Through a series of experiments, we developed a machine learning algorithm that effectively distinguishes between high- and low-performing students based on these profiles. This approach also enabled the identification and interpretation of key factors differentiating high-performing students from their lower-performing peers. Additionally, we investigated the factors positively and negatively associated with academic performance.

Bitcoin Ordinals: Bitcoin Price and Transaction Fee Rate Predictions

2025 · ARTICLE · en

Ordinals, a method for creating unique digital assets on the Bitcoin blockchain, has significantly impacted the blockchain over the past year, yet there is a notable lack of research on it. This study is the first to demonstrate that Bitcoin Ordinals-related data are crucial features for predicting Bitcoin transaction fee rates and prices. Our main contributions are threefold. 1) Dataset Construction: We construct a dataset that includes Bitcoin chain data, Ordinals index data, and Ordinals market data, as well as a dataset excluding Ordinals-related data. Our findings reveal that the fluctuation in the number of Ordinals inscriptions tends to correlate with market activity. When the Ordinals market is active, the share of Ordinals inscribed fees and the average Bitcoin transaction fee rate remain high. We argue that the upgrades of SegWit and Taproot drove the creation and development of Bitcoin Ordinals. Combined with users’ interest in Ordinals, this in turn affected the Bitcoin blockchain and its price; 2) Prediction: Using three metrics (MAE, RMSE, and MAPE) and the TemporalFusionTransformer model as a baseline, our comparative experiments show that Bitcoin Ordinals-related data is essential for predicting Bitcoin transaction fee rates and prices. This finding aids investors and participants in the Bitcoin Ordinals market in avoiding losses and leveraging congestion-related arbitrage opportunities, thus enabling more accurate decision-making in the cryptocurrency market; 3) Chronos Model: Additionally, the fine-tuned Chronos model achieves metrics comparable to or better than those of the TemporalFusionTransformer for shorter time intervals, especially in low-noise environments. With its outstanding zero-shot prediction performance, fast execution, and easy cloud deployment, the Chronos model allows investors and market participants to quickly obtain high-quality predictions without requiring complex data features.

Predicting Student Dropout Through Text and Media Content Analysis of VKontakte Profiles

2025 · ARTICLE · en

This paper presents a novel approach to predicting student dropout by analyzing publicly available data from VKontakte social network profiles. Unlike traditional methods that primarily rely on academic and institutional data, our method leverages publicly available content, including photos, videos, music preferences, and textual posts. Image and video content were analyzed using scene recognition models (Places365) and contextual alignment models (CLIP), while text data were processed using the BERTopic model for topic modeling and a pre-trained model for emotion analysis. Music tracks were categorized by genre. Additionally, we developed and implemented the Weighted Neighborhood Feature Aggregation method, which incorporates weighted features from a student’s friends to make classification models graph-aware, leveraging information from the friendship graph to enrich feature representation. Using this data, a comprehensive digital student profile was constructed and utilized to predict dropout risk through machine learning techniques. The CatBoost model achieved the highest prediction accuracy, with a ROC-AUC of 0.802 and a balanced accuracy of 0.749. Key predictors included indirect indicators of academic and extracurricular engagement, the emotional tone of posts, and leisure activity preferences. The findings demonstrate that social media data can effectively aid in the early identification of students at risk of dropping out, allowing for timely interventions. While this approach depends on publicly available VK data and may be limited by privacy settings, it still provides useful insights. This approach also holds promise for applications such as student loan risk assessment.
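
The abstract does not spell out how Weighted Neighborhood Feature Aggregation is computed. As a rough illustration of the general idea only (enriching a student's feature vector with a weighted average of friends' features), here is a minimal Python sketch. The function name, the dictionary-based data layout, the default weight of 1.0, and the zero-fill for students without friends are all assumptions made for this example, not the authors' formulation.

```python
def aggregate_neighbor_features(features, friends, weights=None):
    """Append a weighted mean of friends' features to each user's own features.

    features: dict mapping user id -> list of floats (all the same length)
    friends:  dict mapping user id -> list of friend ids
    weights:  optional dict mapping (user, friend) -> float; missing pairs
              default to 1.0 (an assumption of this sketch)
    """
    weights = weights or {}
    dim = len(next(iter(features.values())))
    enriched = {}
    for user, own in features.items():
        total_weight = 0.0
        summed = [0.0] * dim
        for friend in friends.get(user, []):
            if friend not in features:
                continue  # skip friends with no known features
            w = weights.get((user, friend), 1.0)
            total_weight += w
            for i, value in enumerate(features[friend]):
                summed[i] += w * value
        if total_weight > 0:
            neighbor_part = [s / total_weight for s in summed]
        else:
            neighbor_part = [0.0] * dim  # no usable friends: zero-fill
        enriched[user] = list(own) + neighbor_part
    return enriched


# Hypothetical toy data: three users with two features each.
features = {"a": [1.0, 0.0], "b": [0.0, 2.0], "c": [4.0, 4.0]}
friends = {"a": ["b", "c"], "b": ["a"], "c": []}
weights = {("a", "b"): 1.0, ("a", "c"): 3.0}
enriched = aggregate_neighbor_features(features, friends, weights)
```

The enriched vectors can then be passed to any tabular classifier, which is one simple way to make a non-graph model aware of the friendship graph.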

TimeGPT’s Potential in Cryptocurrency Forecasting: Efficiency, Accuracy, and Economic Value

2025 · ARTICLE · en

Accurate and efficient cryptocurrency price prediction is vital for investors in the volatile crypto market. This study comprehensively evaluates nine models (including baseline, zero-shot, and deep learning architectures) on 21 major cryptocurrencies using daily and hourly data. Our multi-dimensional evaluation assesses models based on prediction accuracy (MAE, RMSE, MAPE), speed, statistical significance (Diebold–Mariano test), and economic value (Sharpe Ratio). Our research found that the optimally fine-tuned TimeGPT model (without variables) demonstrated superior performance across both Daily and Hourly datasets, with its statistical leadership confirmed by the Diebold–Mariano test. Fine-tuned Chronos excelled in daily predictions, while TFT was a close second to TimeGPT for hourly forecasts. Crucially, zero-shot models like TimeGPT and Chronos were tens of times faster than traditional deep learning models, offering high accuracy with superior computational efficiency. A key finding from our economic analysis is that a model’s effectiveness is highly dependent on market characteristics. For instance, TimeGPT with variables showed exceptional profitability in the volatile ETH market, whereas the zero-shot Chronos model was the top performer for the cyclical BTC market. This also highlights that variables have asset-specific effects with TimeGPT: improving predictions for ICP, LTC, OP, and DOT, but hindering UNI, ATOM, BCH, and ARB. Recognizing that prior research has overemphasized prediction accuracy, this study provides a more holistic and practical standard for model evaluation by integrating speed, statistical significance, and economic value. Our findings collectively underscore TimeGPT’s immense potential as a leading solution for cryptocurrency forecasting, offering a top-tier balance of accuracy and efficiency. This multi-dimensional approach provides critical, theoretical, and practical guidance for investment decisions and risk management, proving especially valuable in real-time trading scenarios.
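
For readers unfamiliar with the three accuracy metrics named in this abstract, the following is a minimal Python sketch of their standard textbook definitions. The function names and the toy price series are illustrative assumptions; MAPE is returned as a fraction (not a percentage) and assumes no actual value is zero.

```python
import math


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


def mape(actual, predicted):
    """Mean absolute percentage error, as a fraction; actual values must be non-zero."""
    return sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical toy series, not data from the paper.
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 190.0, 400.0]

print(mae(actual, predicted), rmse(actual, predicted), mape(actual, predicted))
```

MAE and RMSE are in the units of the series (RMSE penalizes large errors more heavily), while MAPE is scale-free, which is why it is convenient for comparing assets with very different price levels.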

Diagnosis of the Severity of Depression Using Speech Recording Analysis

2024 · CHAPTER · en

More than 3% of people worldwide experience depression. This diagnosis is established through interviews and clinical observations, which is a time- and money-demanding process. Additionally, there are a variety of symptoms associated with depression that are difficult to capture due to the limited capabilities of a human being. Many studies propose methods of automatic mental disorder recognition (MDR) using machine learning methods that are based on acoustic or linguistic feature extraction followed by a complex process of selection of the most suitable characteristics. Nevertheless, the data-collecting process is difficult; thus, the solution for MDR must be able to handle limited data and avoid complicated and uninterpretable feature engineering processes. Hereby, we propose four methods based on the fine-tuned Wav2Vec-2.0 model. These approaches overcome the mentioned limitations since this transformer model is able to capture information from both acoustic and linguistic modalities and does not require a big collection of labelled data. Moreover, three of the proposed methods are novel approaches to long audio classification problems and allow us to evaluate the capabilities of acoustic transformer models to deal with long speech recordings.

Applying Transformers to Identify the Appropriate Medical Specialist from User Queries

2024 · ARTICLE · ru

We present a new approach that uses the RuBERT model to classify user queries in the field of medical consultations according to expert specialization. In the course of the study, a large dataset was collected and used to fine-tune the RuBERT model. The resulting model achieved an F1-score above 91.8% both under blocked cross-validation and under a split of the dataset into training and test sets. The approach demonstrates strong generalization across medical subfields such as cardiology, neurology, and dermatology. The proposed approach reduces the time needed to identify the most suitable specialist and thereby improves the quality of consultation and medical care.

Introductory Remarks to the Special Issue Devoted to DAMDID/RCDL-2023

2024 · ARTICLE · en

This special issue of Automation and Remote Control contains some scientific results presented at the 25th International Conference on Data Analytics and Management in Data-Intensive Domains (DAMDID/RCDL-2023). The conference was held on October 24–27, 2023, at National Research University Higher School of Economics (HSE) in Moscow, Russia. The reader is offered the full texts of selected DAMDID/RCDL-2023 papers relevant to the journal’s topics.

Transformer-Based Classification of User Queries for Medical Consultancy

2024 · ARTICLE · en

The need for skilled medical support is growing in the era of digital healthcare. This research presents an innovative strategy, utilizing the RuBERT model, for categorizing user inquiries in the field of medical consultation with a focus on expert specialization. By harnessing the capabilities of transformers, we fine-tuned the pretrained RuBERT model on a varied dataset, which facilitates precise correspondence between queries and particular medical specialisms. Using a comprehensive dataset, we have demonstrated our approach’s superior performance with an F1-score of over 91.8%, calculated through both cross-validation and the traditional split of test and train datasets. Our approach has shown excellent generalization across medical domains such as cardiology, neurology and dermatology. This methodology provides practical benefits by directing users to appropriate specialists for prompt and targeted medical advice. It also enhances healthcare system efficiency, reduces practitioner burden, and improves patient care quality. In summary, our suggested strategy facilitates the attainment of specific medical knowledge, offering prompt and precise advice within the digital healthcare field.

Time-Dependent Next-Basket Recommendations

2023 · CHAPTER · en

There are various real-world applications for next-basket recommender systems. One of them is guiding a website user who wants to buy anything toward a collection of items. Recent works demonstrate that methods based on the frequency of prior purchases outperform other deep learning algorithms in terms of performance. These techniques, however, do not consider timestamps and time intervals between interactions. Additionally, they often miss the time period that passes between the last known basket and the prediction time. In this study, we explore whether such knowledge could improve current state-of-the-art next-basket recommender systems. Our results on three real-world datasets show how such enhancement may increase prediction quality. These findings might pave the way for important research directions in the field of next-basket recommendations.

Courses (6)

Recommender Systems · 3 times

2025/2026, 2024/2025, 2023/2024 · Master's / Mago-Lego · Eng

Modern Methods of Data Analysis · 5 times

2025/2026, 2024/2025, 2023/2024, 2022/2023, 2021/2022 · Master's / Mago-Lego · Eng

Introduction to Hadoop and MapReduce

2024/2025 · Bachelor's · Rus

01.04.02. Applied Mathematics and Informatics

2022/2023 · Master's · Eng

Introduction to Data Science

2021/2022 · Bachelor's · Rus

Research Seminar

2021/2022 · Master's · Rus