Dmitry Petrovich Vetrov

Faculty of Computer Science

Profile on hse.ru ↗ tel.: +7 (495) 772-95-90 | 27252


Positions

Academic Supervisor: Faculty of Computer Science, Institute of Artificial Intelligence and Digital Sciences
Research Professor: Faculty of Computer Science, Department of Big Data and Information Retrieval

Bio

· Joined HSE University in 2014.
· Research and teaching experience: 15 years.

Education

2007 · Candidate of Sciences in Physics and Mathematics: Lomonosov Moscow State University, specialty 01.01.09 "Discrete Mathematics and Mathematical Cybernetics", dissertation topic: The Influence of the Stability of Classification Algorithms on Their Accuracy
2003 · Specialist degree: Lomonosov Moscow State University, specialty "Applied Mathematics and Computer Science", qualification "Mathematician. Systems Programmer"

Work Experience

· 2017-present: Head of the Centre of Deep Learning and Bayesian Methods (HSE University, Faculty of Computer Science)
· 2018-2020: Head of the Samsung laboratory (HSE University, Faculty of Computer Science)
· 2016-present: Research Professor (HSE University, Faculty of Computer Science)
· 2016-2018: Yandex, Lead Researcher (half-time)
· 2015-2016: Skoltech, Associate Professor
· 2014-2016: HSE University, Faculty of Computer Science, Associate Professor (part-time)
· 2014-2015: Moscow State University (MSU), Faculty of Computational Mathematics and Cybernetics, Associate Professor
· 2011-2014: MSU, Faculty of Computational Mathematics and Cybernetics, Assistant Lecturer
· 2010-2012: Kurchatov Institute, NBIC Centre, Head of Laboratory (half-time)
· 2007-2011: MSU, Faculty of Computational Mathematics and Cybernetics, Research Fellow
· Summers of 2005 and 2006: University of Wales, Bangor, intern
· 2000-2007: Dorodnicyn Computing Centre of the Russian Academy of Sciences, Mathematician (half-time)

Awards and Honours

· Commendation from HSE University (March 2024)
· Letter of Gratitude from the First Vice Rector of HSE University (February 2023)
· Honorary title "Honorary Worker of Education of the Russian Federation" (November 2022)
· Certificate of Merit from HSE University (February 2022)
· Certificate of Merit from HSE University (December 2015)
· Gold Medal of the Russian branch of the European Academy for a series of research papers on Bayesian regularization and inference in graphical models (December 2012)
· Scholarship of the President of the Russian Federation for leading young scientists (June 2012)
· Bonus for publications making a special contribution to the international academic reputation of HSE University (2022–2025, 2021–2024)
· Bonus for a publication in an international peer-reviewed academic publication (2019–2021, 2017–2019)
· Bonus for an article in a foreign peer-reviewed journal (2015–2017)
· Best Teacher: 2019–2020, 2019

Grants and Projects

· for the degree of Candidate of Sciences

Conferences (1)

· 2016: Advances in Neural Information Processing Systems 2016 (Barcelona). Talk: PerforatedCNNs: Acceleration through Elimination of Redundant Convolutions

Researcher Identifiers

ORCID: 0000-0001-6863-9028
ResearcherID: H-4870-2015
RSCI SPIN: 4339-7570
Google Scholar: https://scholar.google.ru/citations?user=7HU0UoUAAAAJ&hl=ru
Scopus AuthorID: 8382687000

Publications (86)

Uncertainty Estimation via Stochastic Batch Normalization

2018 · CHAPTER · en

In this work, we investigate the Batch Normalization technique and propose its probabilistic interpretation. We propose a probabilistic model and show that Batch Normalization maximizes the lower bound of its marginalized log-likelihood. Then, according to the new probabilistic model, we design an algorithm which acts consistently during train and test. However, inference becomes computationally inefficient. To reduce memory and computational cost, we propose Stochastic Batch Normalization -- an efficient approximation of proper inference procedure. This method provides us with a scalable uncertainty estimation technique. We demonstrate the performance of Stochastic Batch Normalization on popular architectures (including deep convolutional architectures: VGG-like and ResNets) for MNIST and CIFAR-10 datasets.

PDF ↗

Variational Dropout via Empirical Bayes

2018 · PREPRINT · en

We study the Automatic Relevance Determination procedure applied to deep neural networks. We show that ARD applied to Bayesian DNNs with Gaussian approximate posterior distributions leads to a variational bound similar to that of variational dropout, and in the case of a fixed dropout rate, objectives are exactly the same. Experimental results show that the two approaches yield comparable results in practice even when the dropout rates are trained. This leads to an alternative Bayesian interpretation of dropout and mitigates some of the theoretical issues that arise with the use of improper priors in the variational dropout model. Additionally, we explore the use of hierarchical priors in ARD and show that it helps achieve higher sparsity for the same accuracy.

PDF ↗

Bayesian Sparsification of Gated Recurrent Neural Networks

2018 · CHAPTER · en

Bayesian methods have been successfully applied to sparsify weights of neural networks and to remove structure units from the networks, e.g. neurons. We apply and further develop this approach for gated recurrent architectures. Specifically, in addition to sparsification of individual weights and neurons, we propose to sparsify preactivations of gates and information flow in LSTM. It makes some gates and information flow components constant, speeds up forward pass and improves compression. Moreover, the resulting structure of gate sparsity is interpretable and depends on the task.

PDF ↗

Loss Surfaces, Mode Connectivity, and Fast Ensembling of DNNs

2018 · CHAPTER · en

The loss functions of deep neural networks are complex and their geometric properties are not well understood. We show that the optima of these complex loss functions are in fact connected by simple curves over which training and test accuracy are nearly constant. We introduce a training procedure to discover these high-accuracy pathways between modes. Inspired by this new geometric insight, we also propose a new ensembling method entitled Fast Geometric Ensembling (FGE). Using FGE we can train high-performing ensembles in the time required to train a single model. We achieve improved performance compared to the recent state-of-the-art Snapshot Ensembles, on CIFAR-10, CIFAR-100, and ImageNet.

PDF ↗

Averaging Weights Leads to Wider Optima and Better Generalization

2018 · CHAPTER · en

Deep neural networks are typically trained by optimizing a loss function with an SGD variant, in conjunction with a decaying learning rate, until convergence. We show that simple averaging of multiple points along the trajectory of SGD, with a cyclical or constant learning rate, leads to better generalization than conventional training. We also show that this Stochastic Weight Averaging (SWA) procedure finds much broader optima than SGD, and approximates the recent Fast Geometric Ensembling (FGE) approach with a single model. Using SWA we achieve notable improvement in test accuracy over conventional SGD training on a range of state-of-the-art residual networks, PyramidNets, DenseNets, and Shake-Shake networks on CIFAR-10, CIFAR-100, and ImageNet. In short, SWA is extremely easy to implement, improves generalization, and has almost no computational overhead.

PDF ↗
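
The weight-averaging idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the toy quadratic loss, the noise model, and the names `update_running_average`, `noisy_sgd_with_swa`, and `swa_start` are all assumptions made for illustration. The only part taken from the abstract is that weights visited by SGD with a constant learning rate are averaged into a single model.

```python
import random
from typing import List, Sequence, Tuple


def update_running_average(avg: List[float], weights: Sequence[float], n_seen: int) -> int:
    """Fold one weight snapshot into the running mean in place; return the new count."""
    for i, w in enumerate(weights):
        avg[i] += (w - avg[i]) / (n_seen + 1)
    return n_seen + 1


def noisy_sgd_with_swa(
    steps: int = 200, lr: float = 0.1, swa_start: int = 100, seed: int = 0
) -> Tuple[List[float], List[float], int]:
    """Toy run (assumed setup): minimize f(w) = 0.5 * ||w||^2 with noisy gradients."""
    rng = random.Random(seed)
    w = [1.0, -2.0]      # current SGD iterate
    swa = [0.0, 0.0]     # running average of iterates, filled in after swa_start
    n_seen = 0
    for step in range(steps):
        # Gradient of the toy loss is w itself; Gaussian noise mimics minibatch noise.
        grad = [wi + rng.gauss(0.0, 1.0) for wi in w]
        w = [wi - lr * gi for wi, gi in zip(w, grad)]   # constant learning rate
        if step >= swa_start:
            n_seen = update_running_average(swa, w, n_seen)
    return w, swa, n_seen


if __name__ == "__main__":
    last, averaged, count = noisy_sgd_with_swa()
    print("last iterate:", last)
    print("average of", count, "iterates:", averaged)
```

The running mean needs only one extra copy of the weights, which is consistent with the abstract's remark about low overhead. A real network would also need its batch-normalization statistics recomputed for the averaged weights, which this sketch omits.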

Entangled Conditional Adversarial Autoencoder for de Novo Drug Discovery

2018 · ARTICLE · en

Modern computational approaches and machine learning techniques accelerate the invention of new drugs. Generative models can discover novel molecular structures within hours, while conventional drug discovery pipelines require months of work. In this article, we propose a new generative architecture, entangled conditional adversarial autoencoder, that generates molecular structures based on various properties, such as activity against a specific protein, solubility, or ease of synthesis. We apply the proposed model to generate a novel inhibitor of Janus kinase 3, implicated in rheumatoid arthritis, psoriasis, and vitiligo. The discovered molecule was tested in vitro and showed good activity and selectivity.

DOI ↗ PDF ↗

Probabilistic adaptive computation time

2018 · ARTICLE · en

We present a probabilistic model with discrete latent variables that control the computation time in deep learning models such as ResNets and LSTMs. A prior on the latent variables expresses the preference for faster computation. The amount of computation for an input is determined via amortized maximum a posteriori (MAP) inference. MAP inference is performed using a novel stochastic variational optimization method. The recently proposed adaptive computation time mechanism can be seen as an ad-hoc relaxation of this model. We demonstrate training using the general-purpose concrete relaxation of discrete variables. Evaluation on ResNet shows that our method matches the speed-accuracy trade-off of adaptive computation time, while allowing for evaluation with a simple deterministic procedure that has a lower memory footprint.

DOI ↗ PDF ↗
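
The abstract above mentions training with the concrete relaxation of discrete variables. As a hedged illustration, the sketch below draws one relaxed Bernoulli ("binary concrete") sample by adding logistic noise to a logit and applying a tempered sigmoid. The function names, the clamping constant, and the idea of reading the variable as a "halt or continue" decision are assumptions for this example, not details from the paper.

```python
import math
import random


def stable_sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def sample_binary_concrete(logit: float, temperature: float, rng: random.Random) -> float:
    """Draw a relaxed Bernoulli sample in [0, 1].

    logit       -- log-odds of the underlying binary variable
    temperature -- positive; lower values push samples towards 0 or 1
    """
    # Clamp the uniform draw away from 0 and 1 so both logarithms are finite.
    u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
    logistic_noise = math.log(u) - math.log(1.0 - u)
    return stable_sigmoid((logit + logistic_noise) / temperature)


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical "halt" variable with log-odds 0.5, sampled at two temperatures.
    print([round(sample_binary_concrete(0.5, 1.0, rng), 3) for _ in range(5)])
    print([round(sample_binary_concrete(0.5, 0.05, rng), 3) for _ in range(5)])
```

Because the sample is a smooth function of the logit, gradients can flow through it during training. Thresholding at 0.5 recovers a hard binary decision whose probability of being 1 equals the sigmoid of the logit.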

Conditional Generators of Words Definitions

2018 · CHAPTER · en

We explore the recently introduced definition modeling technique, which provides a tool for evaluating different distributed vector representations of words by modeling their dictionary definitions. In this work, we study the problem of word ambiguities in definition modeling and propose a possible solution by employing latent variable modeling and soft attention mechanisms. Our quantitative and qualitative evaluation and analysis of the model show that taking into account word ambiguity and polysemy leads to performance improvement.

PDF ↗

Spatially Adaptive Computation Time for Residual Networks

2017 · CHAPTER · en

This paper proposes a deep learning architecture based on Residual Network that dynamically adjusts the number of executed layers for the regions of the image. This architecture is end-to-end trainable, deterministic and problem-agnostic. It is therefore applicable without any modifications to a wide range of computer vision problems such as image classification, object detection and image segmentation. We present experimental results showing that this model improves the computational efficiency of Residual Networks on the challenging ImageNet classification and COCO object detection datasets. Additionally, we evaluate the computation time maps on the visual saliency dataset cat2000 and find that they correlate surprisingly well with human eye fixation positions.

DOI ↗ PDF ↗
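
The abstract above does not spell out how the number of executed layers is chosen for each image region. The sketch below assumes the cumulative halting-score rule known from adaptive computation time: a position stops once its accumulated halting scores reach 1 - eps. The names `layers_executed` and `computation_map`, the `eps` default, and the nested-list layout are hypothetical.

```python
from typing import List, Sequence


def layers_executed(halting_scores: Sequence[float], eps: float = 0.01) -> int:
    """Number of residual units run before the cumulative score reaches 1 - eps.

    halting_scores -- per-unit scores in [0, 1] for one spatial position
    """
    total = 0.0
    for n, score in enumerate(halting_scores, start=1):
        total += score
        if total >= 1.0 - eps:
            return n
    # The threshold was never reached, so every available unit is executed.
    return len(halting_scores)


def computation_map(scores: List[List[Sequence[float]]], eps: float = 0.01) -> List[List[int]]:
    """Apply the halting rule independently at each spatial position of a grid."""
    return [[layers_executed(cell, eps) for cell in row] for row in scores]


if __name__ == "__main__":
    # A 1x2 grid: the first position halts early, the second needs more units.
    grid = [[[0.995, 0.2, 0.1], [0.1, 0.2, 0.8]]]
    print(computation_map(grid))
```

A map of this kind is what the abstract compares against human eye fixation positions. The scores themselves would come from the trained network, which the sketch does not model.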

Устойчивый к шуму метод обучения вариационного автокодировщика (A Noise-Robust Method for Training a Variational Autoencoder)

2017 · ARTICLE · ru

The variational autoencoder (VAE) is a probabilistic unsupervised learning method based on deep learning. The article proposes a noise-robust method for training VAEs based on a modification of the likelihood function. Two lower bounds are proposed and analyzed as objective functions for the VAE. The effectiveness of the method is demonstrated in experiments with artificially added noise objects.

PDF ↗

Courses (2)

Bayesian Methods in Machine Learning · 5 times

2025/2026, 2024/2025, 2023/2024, 2022/2023, 2021/2022 · Bachelor's / Bachelor's, field of study: 38.03.01 Economics / Faculty-wide pool course / Master's / Mago-Lego · Russian

Neuro-Bayesian Methods in Machine Learning

2021/2022 · Master's · Russian