DSA Faculty

Pavel Isaakovich Braslavski

Faculty of Computer Science

Publications: 50
Languages: 3
Awards: 2
Conferences: 0

Positions

  • Senior Research Fellow, Faculty of Computer Science, Laboratory for Models and Methods of Computational Pragmatics
  • Associate Professor, Faculty of Computer Science, Big Data and Information Retrieval School

Bio

  • Joined HSE University in 2020.

Education

  • 2000 · Candidate of Sciences
  • 1997 · Specialist degree: Ural State Technical University, Yekaterinburg; specialty "Computers, Complexes, Systems and Networks"; qualification "Systems Engineer"

Work experience

  • Yandex, SKB Kontur, JetBrains Research

Awards and honors

  • Bonus for publication in a List A journal (or an equivalent scholarly publication) (2025–2026, 2023–2024)
  • Bonus for publication in an international peer-reviewed scholarly publication (2022–2023, 2021–2022, 2020–2021)

Grants and projects

  • Toward the degree of Candidate of Sciences

Researcher identifiers

Publications (50)

What Drives Multi-Chain Crypto Forecasting: Model Choice, Feature Selection, and Transferability

2026 · ARTICLE · en

As crypto markets are increasingly shaped by heterogeneous on-chain activity rather than a single shared market process, this study investigates 7-day-ahead forecasting using 147 market and on-chain indicators across eight major blockchain ecosystems from October 2023 to April 2025. We benchmark statistical, deep-learning, and foundation-model baselines under multiple feature-selection pipelines using both error metrics and Diebold–Mariano tests. TiRex achieves the best average MAPE (0.0428) in a univariate setting without additional optimized covariates, while TFT remains slightly weaker even under its best feature-input configuration (MAPE: 0.0435; p = 0.9359 versus TiRex), suggesting a persistent practical advantage for TiRex. Importantly, TiRex's zero-shot nature confers a substantial efficiency edge: by bypassing feature selection, it delivers comparable accuracy at a fraction of the computational cost. At the same time, feature selection materially affects many model families, with Boruta chosen in roughly 71.7% of best configurations. Taken together, the evidence supports a selective-feature principle: robust forecasting depends on validated, chain-specific features rather than larger feature sets. Feature-importance and overlap analyses further indicate a mixed structure of transferability, where broad market proxies provide baseline context while chain-specific variables drive marginal gains. Overall, this study highlights that effective multi-chain forecasting is primarily a feature-selection problem under statistical uncertainty, while also showing that zero-shot designs like TiRex can achieve state-of-the-art accuracy with unmatched efficiency, offering practical implications for building leaner, more robust trading systems.
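The abstract compares models with the Diebold–Mariano (DM) test. The sketch below shows roughly how a DM statistic compares two forecasts of the same series. It implements the classic test with the asymptotic normal approximation. Several choices are assumptions and may not match the paper's configuration: the squared-error loss (`power=2`), the horizon parameter `h`, and the absence of small-sample corrections.

```python
import math
from typing import Sequence, Tuple


def diebold_mariano(
    actual: Sequence[float],
    forecast_1: Sequence[float],
    forecast_2: Sequence[float],
    h: int = 1,
    power: int = 2,
) -> Tuple[float, float]:
    """Two-sided Diebold–Mariano test of equal predictive accuracy.

    A positive statistic means forecast_1 has the larger average loss.
    Returns (statistic, p_value) under the asymptotic N(0, 1) approximation.
    """
    if not (len(actual) == len(forecast_1) == len(forecast_2)):
        raise ValueError("all series must have the same length")
    # Loss differential d_t = |e1_t|^power - |e2_t|^power (power=2: squared error).
    d = [abs(a - f1) ** power - abs(a - f2) ** power
         for a, f1, f2 in zip(actual, forecast_1, forecast_2)]
    n = len(d)
    mean_d = sum(d) / n

    def autocov(k: int) -> float:
        # Sample autocovariance of the loss differential at lag k.
        return sum((d[t] - mean_d) * (d[t - k] - mean_d) for t in range(k, n)) / n

    # Long-run variance: h-step-ahead errors are at most MA(h-1),
    # so autocovariances up to lag h-1 are included.
    long_run_var = autocov(0) + 2 * sum(autocov(k) for k in range(1, h))
    if long_run_var <= 0:
        raise ValueError("non-positive long-run variance; test is undefined")
    stat = mean_d / math.sqrt(long_run_var / n)
    # Two-sided p-value: 2 * (1 - Phi(|stat|)) == erfc(|stat| / sqrt(2)).
    p_value = math.erfc(abs(stat) / math.sqrt(2))
    return stat, p_value
```

A call such as `diebold_mariano(actual, tft_forecast, tirex_forecast, h=7)` would approximate the 7-day-ahead comparison. The variable names are hypothetical. A p-value as large as the reported 0.9359 means the test cannot reject equal accuracy of the two models.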

Bitcoin Ordinals: Bitcoin Price and Transaction Fee Rate Predictions

2025 · ARTICLE · en

Ordinals, a method for creating unique digital assets on the Bitcoin blockchain, has significantly affected the blockchain over the past year, yet there is a notable lack of research on it. This study is the first to demonstrate that Bitcoin Ordinals-related data are crucial features for predicting Bitcoin transaction fee rates and prices. Our main contributions are threefold. 1) Dataset construction: we construct a dataset that includes Bitcoin chain data, Ordinals index data, and Ordinals market data, as well as a dataset that excludes Ordinals-related data. Our findings reveal that fluctuations in the number of Ordinals inscriptions tend to correlate with market activity: when the Ordinals market is active, the share of fees paid for Ordinals inscriptions and the average Bitcoin transaction fee rate remain high. We argue that the SegWit and Taproot upgrades drove the creation and development of Bitcoin Ordinals, which, combined with users' interest in Ordinals, in turn affected the Bitcoin blockchain and its price. 2) Prediction: using three metrics (MAE, RMSE, and MAPE) and the TemporalFusionTransformer model as a baseline, our comparative experiments show that Bitcoin Ordinals-related data are essential for predicting Bitcoin transaction fee rates and prices. This finding helps investors and participants in the Bitcoin Ordinals market avoid losses and exploit congestion-related arbitrage opportunities, enabling more accurate decision-making in the cryptocurrency market. 3) Chronos model: the fine-tuned Chronos model achieves metrics comparable to or better than those of the TemporalFusionTransformer over shorter time intervals, especially in low-noise environments. With its outstanding zero-shot prediction performance, fast execution, and easy cloud deployment, Chronos allows investors and market participants to quickly obtain high-quality predictions without requiring complex data features.
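The comparative experiments rest on three error metrics. The sketch below gives minimal, dependency-free Python versions of MAE, RMSE, and MAPE. It assumes MAPE is reported as a fraction and that no actual value is zero. The `evaluate` helper is hypothetical and simply bundles the three metrics.

```python
import math
from typing import Dict, Sequence


def _check(actual: Sequence[float], forecast: Sequence[float]) -> None:
    if not actual or len(actual) != len(forecast):
        raise ValueError("series must be non-empty and of equal length")


def mae(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Mean absolute error, in the units of the series."""
    _check(actual, forecast)
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def rmse(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Root mean squared error; penalizes large misses more than MAE."""
    _check(actual, forecast)
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))


def mape(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Mean absolute percentage error as a fraction (0.05 means 5%)."""
    _check(actual, forecast)
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


def evaluate(actual: Sequence[float], forecast: Sequence[float]) -> Dict[str, float]:
    """Hypothetical helper returning all three metrics for one forecast."""
    return {"MAE": mae(actual, forecast),
            "RMSE": rmse(actual, forecast),
            "MAPE": mape(actual, forecast)}
```

To test whether a feature group helps, one would call `evaluate` once on forecasts from a model trained with Ordinals features and once on forecasts from a model trained without them. Comparing the two dictionaries then shows the effect.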

TimeGPT’s Potential in Cryptocurrency Forecasting: Efficiency, Accuracy, and Economic Value

2025 · ARTICLE · en

Accurate and efficient cryptocurrency price prediction is vital for investors in the volatile crypto market. This study comprehensively evaluates nine models, including baseline, zero-shot, and deep learning architectures, on 21 major cryptocurrencies using daily and hourly data. Our multi-dimensional evaluation assesses models on prediction accuracy (MAE, RMSE, MAPE), speed, statistical significance (Diebold–Mariano test), and economic value (Sharpe ratio). We found that the optimally fine-tuned TimeGPT model (without exogenous variables) performed best on both the daily and hourly datasets, with its statistical leadership confirmed by the Diebold–Mariano test. Fine-tuned Chronos excelled in daily predictions, while TFT was a close second to TimeGPT for hourly forecasts. Crucially, zero-shot models such as TimeGPT and Chronos were tens of times faster than traditional deep learning models, offering high accuracy with superior computational efficiency. A key finding of our economic analysis is that a model's effectiveness depends strongly on market characteristics. For instance, TimeGPT with exogenous variables showed exceptional profitability in the volatile ETH market, whereas the zero-shot Chronos model was the top performer in the cyclical BTC market. The analysis also shows that variables have asset-specific effects with TimeGPT: they improve predictions for ICP, LTC, OP, and DOT but hinder them for UNI, ATOM, BCH, and ARB. Recognizing that prior research has overemphasized prediction accuracy, this study provides a more holistic and practical standard for model evaluation by integrating speed, statistical significance, and economic value. Our findings collectively underscore TimeGPT's potential as a leading solution for cryptocurrency forecasting, offering a top-tier balance of accuracy and efficiency. This multi-dimensional approach provides theoretical and practical guidance for investment decisions and risk management and is especially valuable in real-time trading scenarios.
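Economic value is measured with the Sharpe ratio. The sketch below shows how forecasts can be turned into returns and then into an annualized Sharpe ratio. The trading rule is hypothetical: go long for one period when the forecast exceeds the current price, otherwise hold cash, with no costs or shorting. The risk-free rate defaults to zero and annualization uses 365 periods. These are assumptions, not the paper's stated setup.

```python
import math
import statistics
from typing import List, Sequence


def strategy_returns(prices: Sequence[float], forecasts: Sequence[float]) -> List[float]:
    """Hypothetical long/flat rule.

    forecasts[t] is the model's prediction of prices[t + 1], made at time t.
    The strategy holds the asset from t to t + 1 if that forecast exceeds
    prices[t], and stays in cash otherwise (no costs, no shorting).
    """
    returns = []
    for t in range(len(prices) - 1):
        asset_return = prices[t + 1] / prices[t] - 1.0
        position = 1.0 if forecasts[t] > prices[t] else 0.0
        returns.append(position * asset_return)
    return returns


def sharpe_ratio(returns: Sequence[float], risk_free: float = 0.0,
                 periods_per_year: int = 365) -> float:
    """Annualized Sharpe ratio; risk_free is a per-period rate.

    365 assumes daily data for a market that trades every day;
    hourly data would use 365 * 24.
    """
    excess = [r - risk_free for r in returns]
    sd = statistics.stdev(excess)  # sample standard deviation; needs >= 2 values
    if sd == 0:
        raise ValueError("zero volatility; Sharpe ratio is undefined")
    return statistics.mean(excess) / sd * math.sqrt(periods_per_year)
```

This simple rule ties profitability to the direction of the forecast rather than its size. That helps explain why a model's economic ranking can differ from its MAE or MAPE ranking on the same asset.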

KoWit-24: A Richly Annotated Dataset of Wordplay in News Headlines

2025 · CHAPTER · en

We present KoWit-24, a dataset with fine-grained annotation of wordplay in 2,700 Russian news headlines. KoWit-24 annotations include the presence of wordplay, its type, wordplay anchors, and the words or phrases the wordplay refers to. Unlike most existing humor collections, which consist of canned jokes, KoWit-24 provides wordplay contexts: each headline is accompanied by the news lead and summary. The most common type of wordplay in the dataset is the transformation of collocations, idioms, and named entities, a mechanism that has been underrepresented in previous humor datasets. Our experiments with five LLMs show that there is ample room for improvement in wordplay detection and interpretation tasks. The dataset and evaluation scripts are available at https://github.com/Humor-Research/KoWit-24

NEREL: a Russian information extraction dataset with rich annotation for nested entities, relations, and wikidata entity links

2024 · ARTICLE · en

This paper describes NEREL, a Russian news dataset suited for three tasks: nested named entity recognition, relation extraction, and entity linking. Compared to flat entities, nested named entities provide richer and more complete annotation while also increasing the coverage of relation annotation and entity linking. Relations between nested named entities may cross entity boundaries to connect to shorter entities nested within longer ones, which makes such relations harder to detect. NEREL is currently the largest Russian dataset annotated with entities and relations: it comprises 29 named entity types and 49 relation types. At the time of writing, the dataset contains 56K named entities and 39K relations annotated in 933 person-oriented news articles. NEREL is annotated with relations at three levels: (1) within nested named entities, (2) within sentences, and (3) across sentence boundaries. We provide a benchmark evaluation of current state-of-the-art methods on all three tasks. The dataset is freely available at https://github.com/nerel-ds/NEREL.
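The abstract names three levels at which relations are annotated. The sketch below shows how character-offset spans encode nesting and how a relation can be assigned to one of those levels. The `Entity` class, the sentence-span input, and the example type labels (`PROFESSION`, `CITY`, `PERSON`) are illustrative assumptions. This is not NEREL's file format or tooling.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Entity:
    eid: str    # hypothetical identifier, e.g. "T1"
    etype: str  # entity type label, e.g. "CITY" (illustrative)
    start: int  # character offset, inclusive
    end: int    # character offset, exclusive


def contains(outer: Entity, inner: Entity) -> bool:
    """True if `inner` lies within the span of `outer` and is a different entity."""
    return outer != inner and outer.start <= inner.start and inner.end <= outer.end


def nested_pairs(entities: Sequence[Entity]) -> List[Tuple[str, str]]:
    """All (outer_id, inner_id) pairs where one entity is nested in another."""
    return [(o.eid, i.eid) for o in entities for i in entities if contains(o, i)]


def sentence_index(entity: Entity, sentences: Sequence[Tuple[int, int]]) -> int:
    """Index of the (start, end) sentence span containing the entity's start."""
    for idx, (s_start, s_end) in enumerate(sentences):
        if s_start <= entity.start < s_end:
            return idx
    raise ValueError(f"entity {entity.eid} lies outside all sentences")


def relation_level(head: Entity, tail: Entity,
                   sentences: Sequence[Tuple[int, int]]) -> str:
    """Assign a relation to one of the three annotation levels."""
    if contains(head, tail) or contains(tail, head):
        return "within nested entity"
    if sentence_index(head, sentences) == sentence_index(tail, sentences):
        return "within sentence"
    return "cross-sentence"
```

Consider the hypothetical text "the mayor of Moscow spoke. Ivanov left Kazan.". Here "Moscow" (offsets 13 to 19) is nested inside "mayor of Moscow" (offsets 4 to 19), so a relation between the two falls at the first level. A relation between "Ivanov" and "mayor of Moscow" would be cross-sentence.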

Calculating Similarity of Javadoc Comments

2024 · ARTICLE · en

Code comments are an essential part of software documentation. Many software projects suffer from low-quality comments, often produced by copy-paste. For similar methods, classes, and the like, copy-pasted comments with minor modifications are justified. In many cases, however, this practice degrades documentation quality and subsequently complicates maintenance and development of the project. In this study, we address the problem of detecting near-duplicate code comments, which can potentially improve software documentation. We conducted a thorough evaluation of traditional string similarity metrics and modern machine learning methods on a collection of Javadoc comments from four industrial open-source Java projects. We found that LCS (Longest Common Subsequence) is the best similarity algorithm in terms of both quality (Precision 94%, Recall 74%) and performance.
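The sketch below shows how an LCS-based similarity can flag near-duplicate Javadoc comments. Several details are hypothetical and not taken from the paper: the whitespace tokenization, the normalization by the longer comment, and the 0.8 threshold. The LCS computation itself is the standard dynamic program.

```python
import re
from itertools import combinations
from typing import List, Sequence, Tuple


def comment_tokens(javadoc: str) -> List[str]:
    """Strip /** */ delimiters and leading asterisks, lowercase, split on whitespace."""
    body = re.sub(r"^\s*/\*\*|\*/\s*$", "", javadoc.strip())
    lines = [re.sub(r"^\s*\*\s?", "", line) for line in body.splitlines()]
    return " ".join(lines).lower().split()


def lcs_length(a: Sequence, b: Sequence) -> int:
    """Longest common subsequence length via O(len(a) * len(b)) DP with two rows."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcs_similarity(a: Sequence, b: Sequence) -> float:
    """LCS length normalized by the longer sequence (1.0 means identical)."""
    if not a and not b:
        return 1.0
    return lcs_length(a, b) / max(len(a), len(b))


def near_duplicates(comments: Sequence[str],
                    threshold: float = 0.8) -> List[Tuple[int, int, float]]:
    """Index pairs of comments whose token-level LCS similarity >= threshold."""
    tokens = [comment_tokens(c) for c in comments]
    result = []
    for i, j in combinations(range(len(comments)), 2):
        score = lcs_similarity(tokens[i], tokens[j])
        if score >= threshold:
            result.append((i, j, score))
    return result
```

Take "Returns the user name." and "Returns the user login name.". They share four of five tokens in order, so they score 0.8 and are flagged. An unrelated comment such as "Closes the stream." shares only "the" and falls far below the threshold.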

Needle in a Haystack: Finding Suitable Idioms Based on Text Descriptions

2024 · CHAPTER · en

Idioms are an important part of natural languages and are often used to enhance the expressiveness and fluency of speech. However, it can be difficult to find a contextually appropriate idiom when writing an essay or crafting a headline for a news article, especially for non-native speakers. This motivates an automated system that can recommend an idiom for an input sentence. The goal of this study is to develop and compare methods that would make such a system possible. We used an existing collection of idioms and evaluated several configurations based on models from the Sentence-BERT family. We also automatically expanded the initial dataset and fine-tuned a pre-trained Sentence-BERT model on the idiom/context matching task. This approach achieved the highest MRR of 0.507. The data and the trained model are publicly available.
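The sketch below shows how an embedding-based recommender ranks idioms and how MRR scores those rankings. It assumes the sentence and idiom vectors are already computed, for example by a Sentence-BERT encoder. Only the ranking by cosine similarity and the MRR computation are shown. The toy vectors and names are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_idioms(query_vec: Sequence[float],
                idiom_vecs: Dict[str, Sequence[float]]) -> List[str]:
    """Idioms sorted from most to least similar to the input sentence."""
    return sorted(idiom_vecs, key=lambda k: cosine(query_vec, idiom_vecs[k]), reverse=True)


def mean_reciprocal_rank(rankings: Sequence[List[str]], gold: Sequence[str]) -> float:
    """Average of 1/rank of the correct idiom; a miss contributes 0."""
    total = 0.0
    for ranking, answer in zip(rankings, gold):
        if answer in ranking:
            total += 1.0 / (ranking.index(answer) + 1)
    return total / len(rankings)
```

An MRR of 0.507 corresponds, on average, to the correct idiom appearing at about rank 2. For instance, a correct idiom always at rank 2 gives exactly 0.5.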

The Impact of Cross-Lingual Adjustment of Contextual Word Representations on Zero-Shot Transfer

2023 · CHAPTER · en

Large multilingual language models such as mBERT or XLM-R enable zero-shot cross-lingual transfer in various IR and NLP tasks. Cao et al. [8] proposed a data- and compute-efficient method for cross-lingual adjustment of mBERT that uses a small parallel corpus to make embeddings of related words across languages similar to each other. They showed it to be effective in NLI for five European languages. In contrast, we experiment with a typologically diverse set of languages (Spanish, Russian, Vietnamese, and Hindi) and extend their original implementation to new tasks (XSR, NER, and QA) and an additional training regime (continual learning). Our study reproduced the gains in NLI for four languages and showed improved NER, XSR, and cross-lingual QA results in three languages (though some cross-lingual QA gains were not statistically significant), while monolingual QA performance never improved and sometimes degraded. Analysis of distances between contextualized embeddings of related and unrelated words (across languages) showed that fine-tuning leads to "forgetting" some of the cross-lingual alignment information. Based on this observation, we further improved NLI performance using continual learning. Our software is publicly available at https://github.com/pefimov/cross-lingual-adjustment.
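The adjustment objective can be sketched as two terms, based on our reading of Cao et al. [8]. Consult [8] for the exact formulation. The first term is a squared-distance term that pulls together embeddings of words aligned in the parallel corpus. The second term penalizes drift of source-language embeddings from the original model. In the real setup the vectors come from mBERT and the loss is minimized by gradient descent. The version below only evaluates the objective on fixed vectors. The function names and the `reg_weight` parameter are hypothetical. A helper for the distance analysis mentioned in the abstract is also included.

```python
import math
from typing import Sequence, Tuple

Vector = Sequence[float]


def squared_distance(u: Vector, v: Vector) -> float:
    return sum((a - b) ** 2 for a, b in zip(u, v))


def alignment_loss(aligned_pairs: Sequence[Tuple[Vector, Vector]],
                   source_now: Sequence[Vector],
                   source_before: Sequence[Vector],
                   reg_weight: float = 1.0) -> float:
    """Alignment term plus weighted drift penalty (sketch, fixed vectors)."""
    # Pull together embeddings of word pairs aligned in the parallel corpus.
    align = sum(squared_distance(s, t) for s, t in aligned_pairs)
    # Keep source-language embeddings close to their pre-adjustment values.
    drift = sum(squared_distance(now, before)
                for now, before in zip(source_now, source_before))
    return align + reg_weight * drift


def mean_pair_distance(pairs: Sequence[Tuple[Vector, Vector]]) -> float:
    """Average Euclidean distance over word pairs (e.g. related vs unrelated)."""
    return sum(math.sqrt(squared_distance(u, v)) for u, v in pairs) / len(pairs)
```

Comparing `mean_pair_distance` on related and unrelated pairs before and after task fine-tuning is one way to observe the "forgetting" effect. The gap between the two averages shrinking would indicate lost alignment.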

Consumer Health Question Answering Using Off-the-Shelf Components

2023 · CHAPTER · en

In this paper, we address the task of open-domain health question answering (QA). The quality of existing QA systems depends heavily on annotated data, which is often difficult to obtain, especially in the medical domain. To tackle this issue, we use PubMed and Wikipedia as trustworthy document collections from which to retrieve evidence. The questions and retrieved passages are passed to off-the-shelf question answering models, whose predictions are then aggregated into a final score. Our approach is therefore highly data-efficient. Evaluation on 113 health-related yes/no question-answer pairs shows good performance, with an AUC of 0.82.
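The sketch below shows how per-passage predictions can be aggregated into one score per question and how the resulting scores yield an AUC. The mean-of-probabilities aggregation is a hypothetical stand-in; the abstract does not specify the aggregation rule. The AUC is computed as the probability that a random "yes" question outscores a random "no" question.

```python
from typing import Sequence


def aggregate_yes_score(probabilities: Sequence[float]) -> float:
    """Hypothetical aggregation: mean "yes" probability over all
    (model, retrieved passage) predictions for one question."""
    if not probabilities:
        raise ValueError("no predictions to aggregate")
    return sum(probabilities) / len(probabilities)


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via pairwise comparison (Mann-Whitney form); ties count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

The pairwise form is quadratic in the number of questions, which is negligible for a 113-question evaluation set. AUC also needs no decision threshold, which suits scores produced by aggregating heterogeneous models.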

NEREL-BIO: A Dataset of Biomedical Abstracts Annotated with Nested Named Entities

2023 · ARTICLE · en

Motivation: This paper describes NEREL-BIO, an annotation scheme and corpus of PubMed abstracts in Russian and a smaller number of abstracts in English. NEREL-BIO extends the general-domain dataset NEREL (Loukachevitch et al., 2021) by introducing domain-specific entity types. The NEREL-BIO annotation scheme covers both general and biomedical domains, making it suitable for domain transfer experiments. NEREL-BIO provides annotation for nested named entities as an extension of the scheme employed for NEREL. Nested named entities may cross entity boundaries to connect to shorter entities nested within longer entities, making them harder to detect. Results: NEREL-BIO contains annotations for 700+ Russian and 100+ English abstracts. All English PubMed annotations have corresponding Russian counterparts. NEREL-BIO thus has the following specific features: it provides annotation of nested named entities, and it can be used as a benchmark for cross-domain (NEREL → NEREL-BIO) and cross-language (English → Russian) transfer. We experiment with both transformer-based sequence models and machine reading comprehension (MRC) models and report their results. Availability: The dataset and annotation guidelines are freely available at https://github.com/nerel-ds/NEREL-BIO.

Courses (2)