DSA Faculty

Akim Konstantinovich Kotelnikov (Котельников Аким Константинович)

Faculty of Computer Science

Publications: 3
Languages: 1
Awards: 1
Conferences: 0
Professional interests

data analysis

Positions

  • Lecturer, Faculty of Computer Science, Department of Big Data and Information Retrieval, Joint Department with Yandex
  • Junior Research Fellow, Faculty of Computer Science, Department of Big Data and Information Retrieval, Yandex Research and Education Laboratory
  • Doctoral Student, Faculty of Computer Science, Department of Big Data and Information Retrieval, Yandex Research and Education Laboratory

Bio

  • Started working at HSE University in 2022.
  • Research and teaching experience: 2 years.

Education

  • 2024 · Master's degree: National Research University Higher School of Economics, field of study "Applied Mathematics and Informatics", qualification "Master"
  • 2022 · Bachelor's degree: National Research University Higher School of Economics, field of study "Applied Mathematics and Informatics", qualification "Bachelor"

Work experience

  • ML engineer intern at Yandex (July to October 2021)
  • ML research scientist at Yandex.Research and the Yandex Research and Education Laboratory (February to present)

Awards and honors

  • Bonus for publications that make a special contribution to the international academic reputation of HSE University (2024–2027)

Publications (3)

TabM: Advancing tabular deep learning with parameter-efficient ensembling

2025 · CHAPTER · en

Deep learning architectures for supervised learning on tabular data range from simple multilayer perceptrons (MLP) to sophisticated Transformers and retrieval-augmented methods. This study highlights a major, yet so far overlooked opportunity for substantially improving tabular MLPs: namely, parameter-efficient ensembling -- a paradigm for implementing an ensemble of models as one model producing multiple predictions. We start by developing TabM -- a simple model based on MLP and our variations of BatchEnsemble (an existing technique). Then, we perform a large-scale evaluation of tabular DL architectures on public benchmarks in terms of both task performance and efficiency, which renders the landscape of tabular DL in a new light. Generally, we show that MLPs, including TabM, form a line of stronger and more practical models compared to attention- and retrieval-based architectures. In particular, we find that TabM demonstrates the best performance among tabular DL models. Lastly, we conduct an empirical analysis on the ensemble-like nature of TabM. For example, we observe that the multiple predictions of TabM are weak individually, but powerful collectively. Overall, our work brings an impactful technique to tabular DL, analyses its behaviour, and advances the performance-efficiency trade-off with TabM -- a simple and powerful baseline for researchers and practitioners.
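
The idea of "one model producing multiple predictions" can be illustrated with a minimal BatchEnsemble-style linear layer. This is only a sketch under stated assumptions, not TabM's actual implementation: the function names (`batch_ensemble_linear`, `ensemble_mean`) and the plain-list representation are hypothetical, and the per-member scalings `R`, `S` and biases `B` follow the general BatchEnsemble recipe of sharing one weight matrix across all members.

```python
def batch_ensemble_linear(x, W, R, S, B):
    """Apply one shared weight matrix to an input for k ensemble members.

    x: list of d_in floats (features of one object)
    W: shared weights, d_in rows of d_out floats
    R: k per-member input scalings, each of length d_in
    S: k per-member output scalings, each of length d_out
    B: k per-member biases, each of length d_out
    Returns a list of k output vectors, one per member.
    """
    d_in, d_out = len(W), len(W[0])
    outputs = []
    for r, s, b in zip(R, S, B):
        # Member-specific elementwise scaling of the input.
        scaled = [xi * ri for xi, ri in zip(x, r)]
        # Shared matrix multiplication: the only "large" parameter block.
        hidden = [sum(scaled[i] * W[i][j] for i in range(d_in))
                  for j in range(d_out)]
        # Member-specific output scaling and bias.
        outputs.append([h * sj + bj for h, sj, bj in zip(hidden, s, b)])
    return outputs


def ensemble_mean(outputs):
    """Average the k member predictions coordinate-wise."""
    k = len(outputs)
    return [sum(column) / k for column in zip(*outputs)]
```

For example, with `x = [1.0, 2.0]`, an identity `W`, `R = [[1.0, 1.0], [2.0, 2.0]]`, `S = [[1.0, 1.0], [1.0, 1.0]]` and `B = [[0.0, 0.0], [1.0, 1.0]]`, the two members return `[1.0, 2.0]` and `[3.0, 5.0]`, and `ensemble_mean` gives `[2.0, 3.5]`. Each extra member costs only `d_in + 2 * d_out` parameters on top of the shared `d_in * d_out` weights, which is what makes this style of ensembling parameter-efficient.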

TabR: Tabular Deep Learning Meets Nearest Neighbors

2024 · CHAPTER · en

Deep learning (DL) models for tabular data problems (e.g. classification, regression) are currently receiving increasingly more attention from researchers. However, despite the recent efforts, the non-DL algorithms based on gradient-boosted decision trees (GBDT) remain a strong go-to solution for these problems. One of the research directions aimed at improving the position of tabular DL involves designing so-called retrieval-augmented models. For a target object, such models retrieve other objects (e.g. the nearest neighbors) from the available training data and use their features and labels to make a better prediction. In this work, we present TabR -- essentially, a feed-forward network with a custom k-Nearest-Neighbors-like component in the middle. On a set of public benchmarks with datasets up to several million objects, TabR marks a big step forward for tabular DL: it demonstrates the best average performance among tabular DL models, becomes the new state-of-the-art on several datasets, and even outperforms GBDT models on the recently proposed "GBDT-friendly" benchmark (see Figure 1). Among the important findings and technical details powering TabR, the main ones lie in the attention-like mechanism that is responsible for retrieving the nearest neighbors and extracting valuable signal from them. In addition to the higher performance, TabR is simple and significantly more efficient compared to prior retrieval-based tabular DL models.
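
The retrieval step described in the abstract (find neighbors of a target object in the training data, then use their labels) can be sketched as follows. This is a deliberately simplified, non-learned stand-in, not TabR itself: it uses squared Euclidean distance on raw features and a softmax over negative distances as attention-like weights, whereas the paper's component is trained end to end. The function name `retrieve_and_predict` and the `temperature` parameter are hypothetical.

```python
import math


def retrieve_and_predict(query, train_X, train_y, k=3, temperature=1.0):
    """Predict a target value from the k nearest training objects.

    Returns (prediction, neighbor indices, neighbor weights).
    """
    # Squared Euclidean distance from the query to every training object.
    dists = [sum((q - t) ** 2 for q, t in zip(query, row)) for row in train_X]
    # Indices of the k closest objects (stable sort keeps ties in order).
    nearest = sorted(range(len(train_X)), key=lambda i: dists[i])[:k]
    # Softmax over negative distances: closer neighbors get larger weights.
    logits = [-dists[i] / temperature for i in nearest]
    top = max(logits)
    exps = [math.exp(value - top) for value in logits]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted average of the retrieved labels.
    prediction = sum(w * train_y[i] for w, i in zip(weights, nearest))
    return prediction, nearest, weights
```

With `train_X = [[0.0], [1.0], [10.0]]`, `train_y = [0.0, 2.0, 100.0]`, `query = [0.5]` and `k=2`, the two nearest objects are equidistant, so each receives weight 0.5 and the prediction is 1.0; the distant third object is ignored.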

TabDDPM: Modelling Tabular Data with Diffusion Models

2023 · CHAPTER · en

Denoising diffusion probabilistic models are becoming the leading generative modeling paradigm for many important data modalities. Being the most prevalent in the computer vision community, diffusion models have recently gained some attention in other domains, including speech, NLP, and graph-like data. In this work, we investigate if the framework of diffusion models can be advantageous for general tabular problems, where data points are typically represented by vectors of heterogeneous features. The inherent heterogeneity of tabular data makes it quite challenging for accurate modeling since the individual features can be of a completely different nature, i.e., some of them can be continuous and some can be discrete. To address such data types, we introduce TabDDPM — a diffusion model that can be universally applied to any tabular dataset and handles any feature types. We extensively evaluate TabDDPM on a wide set of benchmarks and demonstrate its superiority over existing GAN/VAE alternatives, which is consistent with the advantage of diffusion models in other fields.
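
To make the heterogeneity problem concrete, the sketch below shows one way a forward (noising) process can treat continuous and discrete features differently. The abstract does not spell out the mechanism, so this is an assumption for illustration: Gaussian noise for a numerical feature and uniform mixing over categories for a one-hot categorical feature, both driven by the same noise schedule `betas`. All function names are hypothetical.

```python
import math
import random


def alpha_bar(t, betas):
    """Cumulative product of (1 - beta) over the first t steps."""
    product = 1.0
    for beta in betas[:t]:
        product *= 1.0 - beta
    return product


def noise_numerical(x0, t, betas, rng):
    """Sample x_t for a continuous feature: scaled signal plus Gaussian noise."""
    ab = alpha_bar(t, betas)
    return math.sqrt(ab) * x0 + math.sqrt(1.0 - ab) * rng.gauss(0.0, 1.0)


def categorical_probs(one_hot, t, betas):
    """Category probabilities at step t: the original one-hot vector
    mixed with the uniform distribution over K categories."""
    ab = alpha_bar(t, betas)
    num_categories = len(one_hot)
    return [ab * value + (1.0 - ab) / num_categories for value in one_hot]
```

With `betas = [0.5, 0.5]`, `alpha_bar(2, betas)` is 0.25, so `categorical_probs([1.0, 0.0], 2, betas)` returns `[0.625, 0.375]`: the original category is still the most likely, but the distribution has moved toward uniform. At `t = 0` nothing has been corrupted, and `noise_numerical` returns `x0` unchanged.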

Courses (3)