DSA Faculty

Dmitry Petrovich Vetrov

Faculty of Computer Science

Profile on hse.ru · Tel.: +7 (495) 772-95-90, ext. 27252
Publications: 86 · Languages: 1 · Awards: 11 · Conferences: 1

Positions

  • Academic Supervisor, Faculty of Computer Science, Institute of Artificial Intelligence and Digital Sciences
  • Research Professor, Faculty of Computer Science, Department of Big Data and Information Retrieval

Bio

  • Joined HSE University in 2014.
  • Research and teaching experience: 15 years.

Education

  • 2007 · Candidate of Physical and Mathematical Sciences: Lomonosov Moscow State University, specialty 01.01.09 "Discrete Mathematics and Mathematical Cybernetics"; dissertation topic: "The Influence of the Stability of Classification Algorithms on Their Accuracy"
  • 2003 · Specialist degree: Lomonosov Moscow State University, specialty "Applied Mathematics and Computer Science", qualification "Mathematician. Systems Programmer"

Work Experience

  • 2017-present: Head of the Centre of Deep Learning and Bayesian Methods (HSE University, Faculty of Computer Science)
  • 2018-2020: Head of the Samsung laboratory (HSE University, Faculty of Computer Science)
  • 2016-present: Research Professor (HSE University, Faculty of Computer Science)
  • 2016-2018: Yandex, Lead Researcher (half-time)
  • 2015-2016: Skoltech, Associate Professor
  • 2014-2016: HSE University, Faculty of Computer Science, Associate Professor (part-time)
  • 2014-2015: Moscow State University, Faculty of Computational Mathematics and Cybernetics, Associate Professor
  • 2011-2014: Moscow State University, Faculty of Computational Mathematics and Cybernetics, Assistant Professor
  • 2010-2012: Kurchatov Institute, NBIC Centre, Head of Laboratory (half-time)
  • 2007-2011: Moscow State University, Faculty of Computational Mathematics and Cybernetics, Researcher
  • 2005 (summer), 2006: University of Wales, Bangor, intern
  • 2000-2007: Dorodnicyn Computing Centre of the Russian Academy of Sciences, Mathematician (half-time)

Awards and Honours

  • Commendation from HSE University (March 2024)
  • Letter of appreciation from the First Vice Rector of HSE University (February 2023)
  • Honorary title "Honorary Worker of the Education Sector of the Russian Federation" (November 2022)
  • HSE University Certificate of Honour (February 2022)
  • HSE University Certificate of Honour (December 2015)
  • Gold Medal of the Russian Section of the European Academy for a series of research works on Bayesian regularization and inference in graphical models (December 2012)
  • Scholarship of the President of the Russian Federation for leading young scientists (June 2012)
  • Salary bonus for publications making a special contribution to the international academic reputation of HSE University (2022–2025, 2021–2024)
  • Salary bonus for a publication in an international peer-reviewed scientific journal (2019–2021, 2017–2019)
  • Salary bonus for an article in a foreign peer-reviewed journal (2015–2017)
  • Best Teacher: 2019–2020, 2019

Grants and Projects

  • For the degree of Candidate of Sciences

Conferences (1)

  • 2016: Advances in Neural Information Processing Systems 2016 (Barcelona). Talk: PerforatedCNNs: Acceleration through Elimination of Redundant Convolutions


Publications (86)

Deterministic Decoding for Discrete Data in Variational Autoencoders

2020 · CHAPTER · en

MARS: Masked Automatic Ranks Selection in Tensor Decompositions

2020 · PREPRINT · en

Tensor decomposition methods have recently proven to be efficient for compressing and accelerating neural networks. However, the problem of determining the optimal decomposition structure, while important, is still not well studied. Specifically, decomposition ranks are the crucial parameter controlling the compression-accuracy trade-off. In this paper, we introduce MARS, a new efficient method for the automatic selection of ranks in general tensor decompositions. During training, the procedure learns binary masks over decomposition cores that "select" the optimal tensor structure. The learning is performed via relaxed maximum a posteriori (MAP) estimation in a specific Bayesian model. The proposed method achieves better results than previous works on various tasks.
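The idea of a binary mask that "selects" ranks can be illustrated on the simplest case. The sketch below assumes a plain two-factor matrix decomposition W = U diag(z) V with a hand-set mask z over the R rank components (all names are hypothetical); it shows only how a mask fixes the effective rank and the parameter count, not the MARS training procedure (relaxed MAP estimation) or general tensor formats.

```python
# Illustrative sketch only: a matrix (not general tensor) decomposition with a
# hypothetical hand-set binary mask z over rank components. MARS learns such
# masks via relaxed MAP estimation; that learning step is NOT implemented here.

def masked_low_rank(U, V, z):
    """Return W = U @ diag(z) @ V for U (m x R), V (R x n) and mask z (length R)."""
    m, R, n = len(U), len(z), len(V[0])
    return [
        [sum(U[i][r] * z[r] * V[r][j] for r in range(R)) for j in range(n)]
        for i in range(m)
    ]

def effective_rank(z):
    """Number of rank components kept by the mask."""
    return sum(1 for zr in z if zr)

def num_params(m, n, z):
    """Factor parameters left after dropping masked-out components."""
    return effective_rank(z) * (m + n)

U = [[1.0, 2.0, 0.5], [0.0, 1.0, 3.0]]    # 2 x 3
V = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]  # 3 x 2
z = [1, 1, 0]                             # mask out the third component

W = masked_low_rank(U, V, z)
print(W)                                      # [[1.0, 2.0], [0.0, 1.0]]
print(effective_rank(z), num_params(2, 2, z))  # 2 8
```

Smaller masks trade accuracy of the reconstruction for fewer stored parameters, which is the compression-accuracy trade-off the abstract refers to.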

Semi-Conditional Normalizing Flows for Semi-Supervised Learning

2019 · ARTICLE · en

This paper proposes a semi-conditional normalizing flow model for semi-supervised learning. The model uses both labeled and unlabeled data to learn an explicit model of the joint distribution over objects and labels. The semi-conditional architecture of the model allows us to efficiently compute the value and gradients of the marginal likelihood for unlabeled objects. The conditional part of the model is based on a proposed conditional coupling layer. We demonstrate the performance of the model on semi-supervised classification problems on different datasets. The model outperforms the baseline approach based on variational auto-encoders on the MNIST dataset.
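To make "conditional coupling layer" and "marginal likelihood for unlabeled objects" concrete, here is a minimal sketch assuming a generic RealNVP-style affine coupling layer whose scale and shift functions also receive the label y. The toy `scale_net` and `shift_net` are hypothetical stand-ins for learned networks, and the paper's own layer may differ. The key properties are an exact inverse, a cheap log-determinant, and a label-marginal computed by log-sum-exp.

```python
import math

# Hypothetical label-conditioned networks; a real model would learn these.
def scale_net(x1, y):
    return [0.5 * math.tanh(v + y) for v in x1]

def shift_net(x1, y):
    return [0.3 * v + y for v in x1]

def coupling_forward(x, y):
    """Affine coupling conditioned on label y: x1 passes through unchanged,
    x2 is scaled and shifted by functions of (x1, y). Returns (z, log|det J|)."""
    assert len(x) % 2 == 0, "this sketch assumes an even dimension"
    d = len(x) // 2
    x1, x2 = x[:d], x[d:]
    s, t = scale_net(x1, y), shift_net(x1, y)
    z2 = [xi * math.exp(si) + ti for xi, si, ti in zip(x2, s, t)]
    return x1 + z2, sum(s)  # triangular Jacobian, so log|det| = sum(s)

def coupling_inverse(z, y):
    """Exact inverse: x1 = z1, x2 = (z2 - t) * exp(-s)."""
    d = len(z) // 2
    z1, z2 = z[:d], z[d:]
    s, t = scale_net(z1, y), shift_net(z1, y)
    return z1 + [(zi - ti) * math.exp(-si) for zi, si, ti in zip(z2, s, t)]

def log_marginal(log_px_given_y, log_py):
    """log p(x) = logsumexp_y [log p(x | y) + log p(y)] for an unlabeled object."""
    terms = [a + b for a, b in zip(log_px_given_y, log_py)]
    m = max(terms)
    return m + math.log(sum(math.exp(v - m) for v in terms))

x = [0.2, -1.0, 0.7, 1.5]
z, log_det = coupling_forward(x, y=1)
x_rec = coupling_inverse(z, y=1)
print(max(abs(a - b) for a, b in zip(x, x_rec)))
```

Because the inverse and log-determinant are exact, log p(x | y) is tractable for every label, and summing over labels gives the unlabeled likelihood directly.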

The Deep Weight Prior

2019 · CHAPTER · en

Bayesian inference is known to provide a general framework for incorporating prior knowledge or specific properties into machine learning models via a carefully chosen prior distribution. In this work, we propose a new type of prior distribution for convolutional neural networks, the deep weight prior (DWP), which exploits generative models to encourage a specific structure of trained convolutional filters, e.g., spatial correlations of weights. We define DWP in the form of an implicit distribution and propose a method for variational inference with this type of implicit prior. In experiments, we show that DWP improves the performance of Bayesian neural networks when training data are limited, and that initializing weights with samples from DWP accelerates training of conventional convolutional neural networks.

Efficient Language Modeling with Automatic Relevance Determination in Recurrent Neural Networks

2019 · CHAPTER · en

Reducing the number of parameters is one of the most important goals in deep learning. In this article, we propose an adaptation of Doubly Stochastic Variational Inference for Automatic Relevance Determination (DSVI-ARD) for neural network compression. We find this method to be especially useful in language modeling tasks, where the large number of parameters in the input and output layers is often excessive. We also show that DSVI-ARD can be applied together with encoder-decoder weight tying, achieving even better sparsity and performance. Our experiments demonstrate that more than 90% of the weights in both encoder and decoder layers can be removed with minimal quality loss.
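How weights get "removed" after ARD-style training can be sketched as follows. With a per-weight prior N(0, τ²) and posterior N(μ, σ²), optimizing τ² gives τ² = μ² + σ², and the KL term becomes ½·log(1 + μ²/σ²), so relevance depends only on the ratio μ²/σ². The sketch assumes the pruning rule common in sparse variational dropout: drop weights with log α = log(σ²/μ²) above a threshold. The threshold value 3.0 and this exact rule are assumptions, not confirmed details of DSVI-ARD.

```python
import math

def log_alpha(mu, sigma):
    """log(sigma^2 / mu^2) for a weight with approximate posterior N(mu, sigma^2).
    Large values mean the weight's noise dominates its mean."""
    return math.log(sigma ** 2) - math.log(mu ** 2 + 1e-16)  # epsilon guards mu == 0

def prune(mus, sigmas, threshold=3.0):
    """Zero out weights whose log-alpha exceeds the threshold (3.0 is an assumed value).
    Returns the pruned means and the fraction of weights removed."""
    pruned, n_removed = [], 0
    for mu, sigma in zip(mus, sigmas):
        if log_alpha(mu, sigma) > threshold:
            pruned.append(0.0)
            n_removed += 1
        else:
            pruned.append(mu)
    return pruned, n_removed / len(pruned)

weights, frac = prune([1.0, 0.01, -0.5, 0.001], [0.1, 0.1, 0.05, 0.2])
print(weights, frac)  # [1.0, 0.0, -0.5, 0.0] 0.5
```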

Structured Sparsification of Gated Recurrent Neural Networks

2019 · CHAPTER · en

Recently, many techniques have been developed to sparsify the weights of neural networks and to remove structural units, e.g., neurons, from the networks. We further develop this approach for gated recurrent architectures. Specifically, in addition to the sparsification of weights and neurons, we propose sparsifying the preactivations of gates. This makes some gates constant and simplifies the LSTM structure. We test our approach on text classification and language modeling tasks. We observe that the resulting structure of gate sparsity depends on the task, and we connect the learned structure to the specifics of the particular tasks. Our method also improves neuron-wise compression of the model in most of the tasks.
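Why sparsifying a gate's preactivation makes the gate constant can be checked directly. The sketch assumes "sparsified preactivation" means that every input and recurrent weight feeding one gate unit is zero, so only the bias remains. The weight values are hypothetical toy numbers.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def gate(W_x, W_h, b, x, h):
    """LSTM-style gate values sigmoid(W_x x + W_h h + b), one per hidden unit."""
    values = []
    for wx_row, wh_row, b_i in zip(W_x, W_h, b):
        pre = (sum(w * v for w, v in zip(wx_row, x))
               + sum(w * v for w, v in zip(wh_row, h)) + b_i)
        values.append(sigmoid(pre))
    return values

def constant_units(W_x, W_h):
    """Units whose incoming weights are all zero: their gate equals sigmoid(bias)
    for every input, so it can be replaced by a constant."""
    return [i for i, (wx, wh) in enumerate(zip(W_x, W_h))
            if all(w == 0.0 for w in wx) and all(w == 0.0 for w in wh)]

# Hypothetical weights: unit 0 has a fully sparsified preactivation, unit 1 does not.
W_x = [[0.0, 0.0], [0.5, -1.0]]
W_h = [[0.0], [0.2]]
b = [2.0, 0.0]
print(constant_units(W_x, W_h))  # [0]
print(gate(W_x, W_h, b, [1.0, 2.0], [3.0])[0] == gate(W_x, W_h, b, [-5.0, 7.0], [0.0])[0])  # True
```

A constant gate no longer needs to be computed at each time step, which is the structural simplification the abstract describes.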

Doubly Semi-Implicit Variational Inference

2019 · CHAPTER · en

We extend the existing framework of semi-implicit variational inference (SIVI) and introduce doubly semi-implicit variational inference (DSIVI), a way to perform variational inference and learning when both the approximate posterior and the prior distribution are semi-implicit. In other words, DSIVI performs inference in models where the prior and the posterior can be expressed as an intractable infinite mixture of some analytic density with a highly flexible implicit mixing distribution. We provide a sandwich bound on the evidence lower bound (ELBO) objective that can be made arbitrarily tight. Unlike discriminator-based and kernel-based approaches to implicit variational inference, DSIVI optimizes a proper lower bound on ELBO that is asymptotically exact. We evaluate DSIVI on a set of problems that benefit from implicit priors. In particular, we show that DSIVI gives rise to a simple modification of VampPrior, the current state-of-the-art prior for variational autoencoders, which improves its performance.
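"Semi-implicit" means an analytic conditional density mixed over an implicit distribution that can only be sampled. The sketch below assumes a 1-D Gaussian conditional N(z; ψ, σ²) and a hypothetical mixing distribution ψ = sin(3ε) + 0.5ε with ε ~ N(0, 1). It shows ancestral sampling and the finite-mixture estimate of the intractable density that SIVI-style bounds are built on; it does not implement the DSIVI sandwich bound itself.

```python
import math
import random

def implicit_mixing_sample(rng):
    """Hypothetical implicit mixing distribution q(psi): a nonlinear transform
    of Gaussian noise that is easy to sample but has no closed-form density."""
    eps = rng.gauss(0.0, 1.0)
    return math.sin(3.0 * eps) + 0.5 * eps

def sample_semi_implicit(n, sigma, rng):
    """Draw z ~ q(z) = E_{q(psi)} N(z; psi, sigma^2) by ancestral sampling."""
    return [rng.gauss(implicit_mixing_sample(rng), sigma) for _ in range(n)]

def gaussian_pdf(z, mean, sigma):
    return math.exp(-0.5 * ((z - mean) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def mixture_density_estimate(z, psis, sigma):
    """Monte Carlo estimate (1/K) sum_k N(z; psi_k, sigma^2) of the intractable q(z)."""
    return sum(gaussian_pdf(z, p, sigma) for p in psis) / len(psis)

rng = random.Random(0)
zs = sample_semi_implicit(5, 0.3, rng)
psis = [implicit_mixing_sample(rng) for _ in range(1000)]
print(len(zs), mixture_density_estimate(0.0, psis, 0.3))
```

In DSIVI both the prior and the posterior have this form, so both log-densities in the ELBO have to be handled through estimates like this one.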

Variance Networks: When Expectation Does Not Meet Your Expectations

2019 · CHAPTER · en

Ordinary stochastic neural networks mostly rely on the expected values of their weights to make predictions, whereas the induced noise is mostly used to capture the uncertainty, prevent overfitting and slightly boost the performance through test-time averaging. In this paper, we introduce variance layers, a different kind of stochastic layers. Each weight of a variance layer follows a zero-mean distribution and is only parameterized by its variance. We show that such layers can learn surprisingly well, can serve as an efficient exploration tool in reinforcement learning tasks and provide a decent defense against adversarial attacks. We also show that a number of conventional Bayesian neural networks naturally converge to such zero-mean posteriors. We observe that in these cases such zero-mean parameterization leads to a much better training objective than conventional parameterizations where the mean is being learned.
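A variance layer can be sketched with the local reparameterization trick. Assuming weights w_ij ~ N(0, σ²_ij), the pre-activation b_j = Σ_i x_i w_ij is Gaussian with mean zero and variance Σ_i x_i² σ²_ij. The mean carries no information, but the variance depends on the input, and a following nonlinearity turns that variance into a usable signal. The function names and numbers below are hypothetical.

```python
import math
import random

def variance_layer_stats(x, sigma2):
    """Zero-mean weights w_ij ~ N(0, sigma2[i][j]) give pre-activations
    b_j ~ N(0, sum_i x_i^2 sigma2[i][j])."""
    n_out = len(sigma2[0])
    means = [0.0] * n_out
    variances = [sum(x[i] ** 2 * sigma2[i][j] for i in range(len(x))) for j in range(n_out)]
    return means, variances

def variance_layer_sample(x, sigma2, rng):
    """Local reparameterization: b_j = sqrt(var_j) * eps_j with eps_j ~ N(0, 1)."""
    _, variances = variance_layer_stats(x, sigma2)
    return [math.sqrt(v) * rng.gauss(0.0, 1.0) for v in variances]

def expected_abs(variances):
    """E|b_j| = sqrt(2 var_j / pi): after a nonlinearity such as |.|,
    the input-dependent variance becomes an input-dependent signal."""
    return [math.sqrt(2.0 * v / math.pi) for v in variances]

x = [1.0, 2.0]
sigma2 = [[0.5, 1.0], [0.25, 0.0]]
means, variances = variance_layer_stats(x, sigma2)
print(means, variances)  # [0.0, 0.0] [1.5, 1.0]
print(variance_layer_sample(x, sigma2, random.Random(0)))
```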

Semi-Conditional Normalizing Flows for Semi-Supervised Learning

2019 · CHAPTER · en

Low-variance Gradient Estimates for the Plackett-Luce Distribution

2019 · PREPRINT · en

Learning models with discrete latent variables using stochastic gradient descent remains a challenge due to the high variance of gradients. Modern variance reduction techniques mostly consider categorical distributions and have limited applicability when the number of possible outcomes becomes large. In this work, we consider models with latent permutations and propose control variates for the Plackett-Luce distribution. Our proof-of-concept experiment recasts optimization over permutations as a variational optimization w.r.t. the Plackett-Luce distribution and solves it using stochastic gradient descent.
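For reference, the Plackett-Luce distribution itself can be sketched in a few lines. A permutation is built by repeatedly picking an item with probability proportional to exp(θ_i) among the remaining items. Exact samples follow from the Gumbel-max trick: sort θ_i plus Gumbel noise in decreasing order. This covers only the distribution and its sampler that a gradient estimator relies on. The control variates proposed in the paper are not reproduced, and the names are hypothetical.

```python
import math
import random
from itertools import permutations

def pl_log_prob(theta, perm):
    """log P(perm | theta): items are chosen one by one, each with probability
    proportional to exp(theta) among the items not yet chosen."""
    logp = 0.0
    remaining = list(perm)
    for item in perm:
        m = max(theta[j] for j in remaining)
        log_norm = m + math.log(sum(math.exp(theta[j] - m) for j in remaining))
        logp += theta[item] - log_norm
        remaining.remove(item)
    return logp

def gumbel(rng):
    """Standard Gumbel sample -log(-log u) with u in (0, 1)."""
    u = rng.random()
    while u == 0.0:  # random() is in [0, 1); exclude the endpoint 0
        u = rng.random()
    return -math.log(-math.log(u))

def pl_sample(theta, rng):
    """Gumbel-max trick: sorting theta_i + Gumbel noise in decreasing order
    yields an exact Plackett-Luce sample."""
    keys = [t + gumbel(rng) for t in theta]
    return sorted(range(len(theta)), key=lambda i: keys[i], reverse=True)

theta = [0.5, -1.0, 2.0]
total = sum(math.exp(pl_log_prob(theta, p)) for p in permutations(range(3)))
print(round(total, 12))  # 1.0
print(pl_sample(theta, random.Random(0)))
```

With n items there are n! permutations, which is why variance reduction methods designed for categorical distributions with few outcomes scale poorly here.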

Courses (2)