Рябинин Максим Константинович

Faculty of Computer Science

Tel.: +7 (495) 531-00-00 | 28841

Professional interests

deep learning, natural language processing

Positions

Senior Lecturer: Faculty of Computer Science, Big Data and Information Retrieval School, Joint Department with Yandex

Bio

· Started working at HSE University in 2017.

Education

2021 · Master's: National Research University Higher School of Economics, field of study "Applied Mathematics and Physics", qualification "Master"
2019 · Bachelor's: National Research University Higher School of Economics, field of study "Applied Mathematics and Informatics", qualification "Bachelor"

Work experience

· Yandex, researcher

Awards and honors

· Rector's personal bonus (2021–2022)
· Bonus for a publication in a List A journal (or an equivalent scholarly publication) (2024–2025, 2023–2024)
· Bonus for a publication in an international peer-reviewed scholarly publication (2022–2023)

Researcher identifiers

ORCID: 0000-0001-9600-3692
ResearcherID: ABF-1742-2021
Google Scholar: https://scholar.google.com/citations?user=930PERsAAAAJ&hl=en

Publications (14)

AutoJudge: Judge Decoding Without Manual Annotation

2025 · CHAPTER · en

We introduce AutoJudge, a method that accelerates large language model (LLM) inference with task-specific lossy speculative decoding. Instead of matching the original model output distribution token-by-token, we identify the generated tokens that affect the downstream quality of the response, relaxing the distribution match guarantee so that the "unimportant" tokens can be generated faster. Our approach relies on a semi-greedy search algorithm to test which of the mismatches between target and draft models should be corrected to preserve quality and which ones may be skipped. We then train a lightweight classifier based on existing LLM embeddings to predict, at inference time, which mismatching tokens can be safely accepted without compromising the final answer quality. We evaluate AutoJudge with multiple draft/target model pairs on mathematical reasoning and programming benchmarks, achieving significant speedups at the cost of a minor accuracy reduction. Notably, on GSM8K with the Llama 3.1 70B target model, our approach achieves up to speedup over speculative decoding at the cost of a drop in accuracy. When applied to the LiveCodeBench benchmark, AutoJudge automatically detects programming-specific important tokens, accepting tokens per speculation cycle at a drop in Pass@1. Our approach requires no human annotation and is easy to integrate with modern LLM inference frameworks.

Mind Your Format: Towards Consistent Evaluation of In-Context Learning Improvements

2024 · CHAPTER · en

Large language models demonstrate a remarkable capability for learning to solve new tasks from a few examples. The prompt template, or the way the input examples are formatted to obtain the prompt, is an important yet often overlooked aspect of in-context learning. In this work, we conduct a comprehensive study of the template format’s influence on the in-context learning performance. We evaluate the impact of the prompt template across 21 models (from 770M to 70B parameters) and 4 standard classification datasets. We show that a poor choice of the template can reduce the performance of the strongest models and inference methods to a random guess level. More importantly, the best templates do not transfer between different setups and even between models of the same family. Our findings show that the currently prevalent approach to evaluation, which ignores template selection, may give misleading results due to different templates in different works. As a first step towards mitigating this issue, we propose Template Ensembles that aggregate model predictions across several templates. This simple test-time augmentation boosts average performance while being robust to the choice of a random set of templates.
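
The idea of aggregating predictions across several templates can be illustrated with a minimal sketch. It assumes the aggregation is a plain average of per-class probabilities, which the abstract does not specify; the function name `template_ensemble`, the `predict_proba` callback, and the `{input}` placeholder are hypothetical names chosen for illustration, not the authors' code.

```python
from typing import Callable, List, Sequence


def template_ensemble(
    example: str,
    templates: Sequence[str],
    predict_proba: Callable[[str], Sequence[float]],
) -> List[float]:
    """Average class probabilities over several prompt templates.

    Assumption: each template contains an `{input}` placeholder, and
    `predict_proba` maps a formatted prompt to one probability per class.
    """
    if not templates:
        raise ValueError("at least one template is required")

    totals: List[float] = []
    for template in templates:
        prompt = template.format(input=example)
        probs = list(predict_proba(prompt))
        if not totals:
            totals = [0.0] * len(probs)
        if len(probs) != len(totals):
            raise ValueError("all templates must yield the same number of classes")
        # Accumulate the probability mass assigned to each class.
        for i, p in enumerate(probs):
            totals[i] += p

    # Uniform average over templates (an assumed aggregation rule).
    return [t / len(templates) for t in totals]


def toy_predict_proba(prompt: str) -> List[float]:
    # Stand-in for a real model: the output depends only on the template prefix.
    return [0.9, 0.1] if prompt.startswith("Review") else [0.4, 0.6]


averaged = template_ensemble(
    "A wonderful film.",
    ["Review: {input}\nSentiment:", "Input: {input}\nLabel:"],
    toy_predict_proba,
)
predicted_class = max(range(len(averaged)), key=averaged.__getitem__)
```

In this toy setup the two templates disagree on the predicted class, and the ensemble resolves the disagreement by averaging; a real study would replace `toy_predict_proba` with calls to a language model.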

SWARM Parallelism: Training Large Models Can Be Surprisingly Communication-Efficient

2023 · CHAPTER · en

Many deep learning applications benefit from using large models with billions of parameters. Training these models is notoriously expensive due to the need for specialized HPC clusters. In this work, we consider alternative setups for training large models: using cheap "preemptible" instances or pooling existing resources from multiple regions. We analyze the performance of existing model-parallel algorithms in these conditions and find configurations where training larger models becomes less communication-intensive. Based on these findings, we propose SWARM parallelism, a model-parallel training algorithm designed for poorly connected, heterogeneous and unreliable devices. SWARM creates temporary randomized pipelines between nodes that are rebalanced in case of failure. We empirically validate our findings and compare SWARM parallelism with existing large-scale training approaches. Finally, we combine our insights with compression strategies to train a large Transformer language model with 1B shared parameters (approximately 13B before sharing) on preemptible T4 GPUs with less than 200Mb/s network.

Is This Loss Informative? Faster Text-to-Image Customization by Tracking Objective Dynamics

2023 · CHAPTER · en

Training Transformers Together

2022 · CHAPTER · en

Secure Distributed Training at Scale

2022 · CHAPTER · en

RuCoLA: Russian Corpus of Linguistic Acceptability

2022 · CHAPTER · en

Linguistic acceptability (LA) attracts the attention of the research community due to its many uses, such as testing the grammatical knowledge of language models and filtering implausible texts with acceptability classifiers. However, the application scope of LA in languages other than English is limited due to the lack of high-quality resources. To this end, we introduce the Russian Corpus of Linguistic Acceptability (RuCoLA), built from the ground up under the well-established binary LA approach. RuCoLA consists of k in-domain sentences from linguistic publications and k out-of-domain sentences produced by generative models. The out-of-domain set is created to facilitate the practical use of acceptability for improving language generation. Our paper describes the data collection protocol and presents a fine-grained analysis of acceptability classification experiments with a range of baseline approaches. In particular, we demonstrate that the most widely used language models still fall behind humans by a large margin, especially when detecting morphological and semantic errors. We release RuCoLA, the code of experiments, and a public leaderboard (rucola-benchmark.com) to assess the linguistic competence of language models for Russian.

Distributed Methods with Compressed Communication for Solving Variational Inequalities, with Theoretical Guarantees

2022 · CHAPTER · en

It’s All in the Heads: Using Attention Heads as a Baseline for Cross-Lingual Transfer in Commonsense Reasoning

2021 · CHAPTER · en

Scaling Ensemble Distribution Distillation to Many Classes with Proxy Targets

2021 · CHAPTER · en

Courses (4)

Efficient Deep Learning Systems · 5 times

2025/2026, 2024/2025, 2023/2024, 2022/2023, 2021/2022 · Bachelor's / Mago-Lego · Russian

Deep Learning · 2 times

2022/2023, 2021/2022 · Bachelor's · Russian

Research seminar "Internet Data Analysis"

2022/2023 · Master's · Russian

Research seminar "Applied Problems of Data Analysis"

2022/2023 · Master's · Russian