Dmitry Petrovich Vetrov

Faculty of Computer Science

Tel.: +7 (495) 772-95-90 | 27252

Positions

Academic Supervisor: Faculty of Computer Science, Institute of Artificial Intelligence and Digital Sciences
Research Professor: Faculty of Computer Science, Department of Big Data and Information Retrieval

Bio

· Began working at HSE University in 2014.
· Academic and teaching experience: 15 years.

Education

2007 · Candidate of Sciences in Physics and Mathematics: Lomonosov Moscow State University, specialty 01.01.09 "Discrete Mathematics and Mathematical Cybernetics", thesis title: The Influence of the Stability of Classification Algorithms on Their Accuracy
2003 · Specialist degree: Lomonosov Moscow State University, specialty "Applied Mathematics and Computer Science", qualification "Mathematician. Systems Programmer"

Work experience

· 2017-present: head of the Centre of Deep Learning and Bayesian Methods (HSE University, Faculty of Computer Science)
· 2018-2020: head of the Samsung laboratory (HSE University, Faculty of Computer Science)
· 2016-present: research professor (HSE University, Faculty of Computer Science)
· 2016-2018: Yandex, lead researcher (half-time)
· 2015-2016: Skoltech, associate professor
· 2014-2016: HSE University, Faculty of Computer Science, associate professor (part-time)
· 2014-2015: MSU, Faculty of Computational Mathematics and Cybernetics, associate professor
· 2011-2014: MSU, Faculty of Computational Mathematics and Cybernetics, assistant lecturer
· 2010-2012: Kurchatov Institute, NBIC Centre, head of laboratory (half-time)
· 2007-2011: MSU, Faculty of Computational Mathematics and Cybernetics, research fellow
· Summer 2005, 2006: University of Wales, Bangor, intern
· 2000-2007: Dorodnitsyn Computing Centre of the Russian Academy of Sciences, mathematician (half-time)

Awards and commendations

· Commendation from HSE University (March 2024)
· Letter of appreciation from the First Vice Rector of HSE University (February 2023)
· Honorary title "Honoured Worker of Education of the Russian Federation" (November 2022)
· Certificate of Honour from HSE University (February 2022)
· Certificate of Honour from HSE University (December 2015)
· Gold Medal of the Russian branch of the European Academy for a series of research papers on Bayesian regularization and inference in graphical models (December 2012)
· Scholarship of the President of the Russian Federation for leading young scientists (June 2012)
· Bonus for publications making a special contribution to the international academic reputation of HSE University (2022–2025, 2021–2024)
· Bonus for a publication in an international peer-reviewed academic publication (2019–2021, 2017–2019)
· Bonus for an article in a foreign peer-reviewed journal (2015–2017)
· Best Teacher: 2019–2020, 2019

Grants and projects

· For the degree of Candidate of Sciences

Conferences (1)

· 2016: Advances in Neural Information Processing Systems 2016 (Barcelona). Talk: PerforatedCNNs: Acceleration through Elimination of Redundant Convolutions

Researcher identifiers

ORCID: 0000-0001-6863-9028
ResearcherID: H-4870-2015
RSCI SPIN: 4339-7570
Google Scholar: https://scholar.google.ru/citations?user=7HU0UoUAAAAJ&hl=ru
Scopus AuthorID: 8382687000

Publications (86)

Subspace Inference for Bayesian Deep Learning

2019 · CHAPTER · en

Bayesian inference was once a gold standard for learning with neural networks, providing accurate full predictive distributions and well calibrated uncertainty. However, scaling Bayesian inference techniques to deep neural networks is challenging due to the high dimensionality of the parameter space. In this paper, we construct low-dimensional subspaces of parameter space, such as the first principal components of the stochastic gradient descent (SGD) trajectory, which contain diverse sets of high performing models. In these subspaces, we are able to apply elliptical slice sampling and variational inference, which struggle in the full parameter space. We show that Bayesian model averaging over the induced posterior in these subspaces produces accurate predictions and well-calibrated predictive uncertainty for both regression and image classification.

A Prior of a Googol Gaussians: a Tensor Ring Induced Prior for Generative Models

2019 · CHAPTER · en

Generative models produce realistic objects in many domains, including text, image, video, and audio synthesis. Most popular models—Generative Adversarial Networks (GANs) and Variational Autoencoders (VAEs)—usually employ a standard Gaussian distribution as a prior. Previous works show that the richer family of prior distributions may help to avoid the mode collapse problem in GANs and to improve the evidence lower bound in VAEs. We propose a new family of prior distributions—Tensor Ring Induced Prior (TRIP)—that packs an exponential number of Gaussians into a high-dimensional lattice with a relatively small number of parameters. We show that these priors improve Fréchet Inception Distance for GANs and Evidence Lower Bound for VAEs. We also study generative models with TRIP in the conditional generation setup with missing conditions. Altogether, we propose a novel plug-and-play framework for generative models that can be utilized in any GAN and VAE-like architectures.

A Simple Baseline for Bayesian Uncertainty in Deep Learning

2019 · CHAPTER · en

We propose SWA-Gaussian (SWAG), a simple, scalable, and general purpose approach for uncertainty representation and calibration in deep learning. Stochastic Weight Averaging (SWA), which computes the first moment of stochastic gradient descent (SGD) iterates with a modified learning rate schedule, has recently been shown to improve generalization in deep learning. With SWAG, we fit a Gaussian using the SWA solution as the first moment and a low rank plus diagonal covariance also derived from the SGD iterates, forming an approximate posterior distribution over neural network weights; we then sample from this Gaussian distribution to perform Bayesian model averaging. We empirically find that SWAG approximates the shape of the true posterior, in accordance with results describing the stationary distribution of SGD iterates. Moreover, we demonstrate that SWAG performs well on a wide variety of tasks, including out of sample detection, calibration, and transfer learning, in comparison to many popular alternatives including variational inference, MC dropout, KFAC Laplace, and temperature scaling.

The Implicit Metropolis-Hastings Algorithm

2019 · CHAPTER · en

Recent works propose using the discriminator of a GAN to filter out unrealistic samples of the generator. We generalize these ideas by introducing the implicit Metropolis-Hastings algorithm. For any implicit probabilistic model and a target distribution represented by a set of samples, implicit Metropolis-Hastings operates by learning a discriminator to estimate the density ratio and then generating a chain of samples. Since the approximation of the density ratio introduces an error on every step of the chain, it is crucial to analyze the stationary distribution of such a chain. For that purpose, we present a theoretical result stating that the discriminator loss upper bounds the total variation distance between the target distribution and the stationary distribution. Finally, we validate the proposed algorithm both for independent and Markov proposals on the CIFAR-10, CelebA, and ImageNet datasets.

Importance Weighted Hierarchical Variational Inference

2019 · CHAPTER · en

Variational Inference is a powerful tool in the Bayesian modeling toolkit, however, its effectiveness is determined by the expressivity of the utilized variational distributions in terms of their ability to match the true posterior distribution. In turn, the expressivity of the variational family is largely limited by the requirement of having a tractable density function. To overcome this roadblock, we introduce a new family of variational upper bounds on a marginal log-density in the case of hierarchical models (also known as latent variable models). We then derive a family of increasingly tighter variational lower bounds on the otherwise intractable standard evidence lower bound for hierarchical variational distributions, enabling the use of more expressive approximate posteriors. We show that previously known methods, such as Hierarchical Variational Models, Semi-Implicit Variational Inference and Doubly Semi-Implicit Variational Inference can be seen as special cases of the proposed approach, and empirically demonstrate superior performance of the proposed method in a set of experiments.

Variational Autoencoder with Arbitrary Conditioning

2019 · CHAPTER · en

We propose a single neural probabilistic model based on variational autoencoder that can be conditioned on an arbitrary subset of observed features and then sample the remaining features in "one shot". The features may be both real-valued and categorical. Training of the model is performed by stochastic variational Bayes. The experimental evaluation on synthetic data, as well as feature imputation and image inpainting problems, shows the effectiveness of the proposed approach and diversity of the generated samples.

A Simple Method to Evaluate Support Size and Non-uniformity of a Decoder-Based Generative Model

2019 · ARTICLE · en

Theoretical analysis in [1] suggested that adversarially trained generative models are naturally inclined to learn distribution with low support. In particular, this effect is caused by the limited capacity of the discriminator network. To verify this claim, [2] proposed a statistical test based on the birthday paradox that partially confirmed the analysis. In this paper, we continue this line of work and develop a parameter-free and straightforward method to estimate the support size of an arbitrary decoder-based generative model. Our approach considers the decoder network from a geometric viewpoint and evaluates the support size as the volume of the manifold containing the generative model samples. Additionally, we propose a method to measure non-uniformity of a generative model that can provide additional insight into the model’s behavior. We then apply these tools to perform a quantitative comparison of common generative models.

Predictive Model for the Bottomhole Pressure based on Machine Learning

2018 · ARTICLE · en

The objective of this work is to develop a predictive model for multiphase wellbore flows using the machine learning approach. The artificial neural network is developed and then trained on the dataset generated using the numerical simulator of the full-scale transient wellbore flows. After the training is completed, the neural network is used to predict one of the key parameters of the wellbore flow, namely, the bottomhole pressure. The novelty of this work is related to the application of the neural network to analyze highly transient processes taking place in wellbores. In such processes, most of the parameters of interest can be represented by interdependent time series of variables linked through complex physical phenomena pertinent to the nature of multiphase flows. The proposed neural network with two hidden layers demonstrated the capability to predict the bottomhole pressure within 5% of the normalized root mean squared error for many complex wellbore configurations and flows. It is also shown that relatively higher prediction errors are mainly observed in the case of slug flows where the transient nature of flows is pronounced the most. Finally, the developed model is tested on data affected by noise. It is demonstrated that although the error of prediction slightly increases in contrast to the data without noise, the model captures essential features of the studied transient process. Description of the developed models, analysis of various test use cases, and possible future research directions are outlined.

Bayesian Compression for Natural Language Processing

2018 · CHAPTER · en

In natural language processing, many tasks are successfully solved with recurrent neural networks, but such models have a huge number of parameters. The majority of these parameters are often concentrated in the embedding layer, whose size grows proportionally to the vocabulary length. We propose a Bayesian sparsification technique for RNNs which allows compressing the RNN dozens or hundreds of times without time-consuming hyperparameter tuning. We also generalize the model for vocabulary sparsification to filter out unnecessary words and compress the RNN even further. We show that the choice of the kept words is interpretable.

Doubly Semi-Implicit Variational Inference

2018 · PREPRINT · en

We extend the existing framework of semi-implicit variational inference (SIVI) and introduce doubly semi-implicit variational inference (DSIVI), a way to perform variational inference and learning when both the approximate posterior and the prior distribution are semi-implicit. In other words, DSIVI performs inference in models where the prior and the posterior can be expressed as an intractable infinite mixture of some analytic density with a highly flexible implicit mixing distribution. We provide a sandwich bound on the evidence lower bound (ELBO) objective that can be made arbitrarily tight. Unlike discriminator-based and kernel-based approaches to implicit variational inference, DSIVI optimizes a proper lower bound on ELBO that is asymptotically exact. We evaluate DSIVI on a set of problems that benefit from implicit priors. In particular, we show that DSIVI gives rise to a simple modification of VampPrior, the current state-of-the-art prior for variational autoencoders, which improves its performance.

Courses (2)

Bayesian Methods in Machine Learning · taught 5 times

2025/2026, 2024/2025, 2023/2024, 2022/2023, 2021/2022 · Bachelor's / Bachelor's, field of study: 38.03.01 Economics / Faculty-wide elective pool course / Master's / Mago-Lego · Russian

Neurobayesian Methods in Machine Learning

2021/2022 · Master's · Russian