Boris Grigorievich Mirkin (Миркин Борис Григорьевич)

Faculty of Economic Sciences

Tel.: +7(499) 152-0151 | 27776 | +7(963) 723-4021

Professional interests

mathematical methods of analysis

Positions

Leading Research Fellow: Faculty of Economic Sciences, International Centre of Decision Choice and Analysis
Professor: Faculty of Computer Science, Department of Data Analysis and Artificial Intelligence

Bio

· Joined HSE University in 2008.
· Research and teaching experience: 51 years.

Education

1990 · Doctor of Sciences: Institute for Systems Studies of the USSR Academy of Sciences and the State Committee for Science and Technology (GKNT), specialty 05.13.01 "Systems Analysis, Control, and Information Processing", dissertation: "A Matrix Approximation Approach to the Analysis of Mixed-Type Information"
1975 · Senior Research Fellow
1975 · Academic title: Associate Professor (Docent)
1966 · Candidate of Sciences: specialty 01.01.06 "Mathematical Logic, Algebra, and Number Theory", dissertation: "Some Constructions in the Algebra of Events"
1964 · Specialist degree: Saratov State University named after N.G. Chernyshevsky, specialty "Mathematics", qualification "Mathematician"

Work experience

· 1967-1982: Junior Research Fellow, Senior Research Fellow (1968), Head of Sector (1975), Institute of Economics and Industrial Engineering, Siberian Branch of the USSR Academy of Sciences, Novosibirsk
· 1982-2000: Senior Research Fellow, Leading Research Fellow (1989), Central Economics and Mathematics Institute of the Russian Academy of Sciences, Moscow
· 2000-2008: Professor, Birkbeck College, University of London, London, UK
· 2008-present: Professor, National Research University Higher School of Economics, Moscow, Russia
· Part-time appointments:
· 1970-1982: Associate Professor, Department of Mathematical Methods in Economics, Novosibirsk State University (MVSSO RF), Novosibirsk
· 1984-1991: Associate Professor, Department of Higher Mathematics and Statistics, Higher School of the Trade Union Movement of the All-Union Central Council of Trade Unions (VTsSPS), USSR, Moscow
· 2008-2010: Professor, Birkbeck College, University of London, London, UK
· 2009-2024: Leading Research Fellow, International Laboratory of Decision Choice and Analysis, HSE University, Moscow, Russia

Awards and honours

· Letter of Gratitude, HSE University (April 2025)
· Certificate of Honour, Ministry of Science and Higher Education of the Russian Federation (July 2023)
· Medal "Recognition - 15 Years of Successful Work", HSE University (December 2022)
· Badge of Honour, 2nd Class, Higher School of Economics (March 2018)
· Certificate of Honour, Faculty of Computer Science, HSE University (December 2017)
· Certificate of Honour, Higher School of Economics (December 2016)
· Higher School of Economics "Golden HSE" Award (Золотая Вышка) (December 2012)
· Letter of Gratitude, Higher School of Economics (November 2012)
· Bonus for a publication in an international peer-reviewed academic publication (2020–2021, 2018–2020)
· Bonus for regular publications in international peer-reviewed academic publications (2021–2026)
· Bonus for an article in a foreign peer-reviewed journal (2015–2017, 2013–2015, 2011–2013)
· Bonus for an article in a foreign peer-reviewed academic publication (2016–2018)
· Best Teacher: 2013, 2011
· Winner of the 2012 "Golden HSE" Award in the category "Achievements in Science"

Grants and projects

· for the academic degree of Candidate of Sciences

Conferences (3)

· 2021: 2021 IEEE International Conference on Artificial Intelligence and Big Data Analytics (Bandung). Talk: Method for Taxonomic Content-Analysis of Text Collections
· 2020: Research and Practice Conference "Data Science" (Saint Petersburg). Talk: Data Science and Statistics: History, Content, Prospects
· 2015: Systems Analysis 2015 (Vienna). Talk: Data Summarization at Clustering and Ranking

Researcher identifiers

ORCID: 0000-0001-5470-8635
ResearcherID: B-6591-2013
RSCI SPIN: 3067-5534
Google Scholar: https://scholar.google.com/citations?user=MpNtL58AAAAJ
Scopus AuthorID: 35428028600

Publications (147)

Agglomerative consensus cluster analysis with automatic selection of the number of clusters (Агломеративный консенсусный кластер-анализ с автоматическим выбором числа кластеров)

2024 · ARTICLE · ru

Theoretical and computational results are presented for an original model of consensus clustering based on the so-called projective distance between partitions. This distance is defined as the sum of squared entries of the difference between the binary incidence matrix of one partition and its orthogonal projection onto the subspace spanned by the columns of the incidence matrix of the other partition. It turns out that, given a sufficient number of partitions, the proposed agglomerative clustering method correctly recovers not only the consensus partition but also the number of clusters in it.
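
The projective distance defined in this abstract can be illustrated with a minimal sketch. It is only a direct reading of the definition (incidence matrix, orthogonal projection, sum of squared residuals), not the article's implementation; the function names `incidence_matrix` and `projective_distance` and the toy label vectors are assumptions made for illustration.

```python
import numpy as np

def incidence_matrix(labels):
    """Binary n x k matrix: entry (i, c) is 1 if object i belongs to cluster c."""
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    return (labels[:, None] == clusters[None, :]).astype(float)

def projective_distance(labels_a, labels_b):
    """Sum of squared entries of X - P_Z X, where X and Z are the incidence
    matrices of partitions a and b, and P_Z is the orthogonal projector onto
    the column space of Z. (Hypothetical helper, not from the article.)"""
    X = incidence_matrix(labels_a)
    Z = incidence_matrix(labels_b)
    # Z^T Z is diagonal with the cluster sizes on the diagonal, so it is
    # invertible whenever every cluster of partition b is non-empty.
    P = Z @ np.linalg.inv(Z.T @ Z) @ Z.T
    residual = X - P @ X
    return float(np.sum(residual ** 2))

a = [0, 0, 1, 1]
relabelled = [1, 1, 0, 0]   # the same partition with swapped labels
one_cluster = [0, 0, 0, 0]  # all objects in a single cluster

print(projective_distance(a, relabelled))
print(projective_distance(a, one_cluster))
print(projective_distance(one_cluster, a))
```

On these toy labels the first and third values are zero and the second is 2, up to floating-point rounding, which also shows that the quantity, as computed here, depends on the order of its arguments.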

Inertia-Based Indices to Determine the Number of Clusters in K-Means: An Experimental Evaluation

2024 · ARTICLE · en

This paper gives an experimentally supported review and comparison of several indices based on the conventional K-means inertia criterion for determining the number of clusters, K, in datasets, using the popular Silhouette width index as a benchmark. Our experiments involve a novel version of the Elbow index, defined using values of K two or three steps apart. We also discuss alternative ways of computing the inertia and summarizing its values. Even though there are no overall winners in our experiments, some of our results are very conclusive and can be used as a guide for indices determining the number of clusters in K-means.

Community Detection in Feature-Rich Networks Using Gradient Descent Approach

2024 · CHAPTER · en

Basic Methods of Data Analysis: Textbook and Workbook for Universities (Базовые методы анализа данных : учебник и практикум для вузов)

2024 · BOOK · ru

Data analysis is a discipline born of the computer revolution, which has led to the accumulation of huge amounts of data on all kinds of collections of objects, such as countries and regions, websites and tennis tournaments, employers and employees, goods and their producers. Unlike classical mathematical statistics, data analysis does not try to derive properties of the surrounding world directly from specially collected data; instead, it is oriented toward finding patterns, structures, and regularities in whatever data are available. The main goal of data analysis is to enrich theoretical understanding in the field of science or practice to which the data belong (knowledge extraction and generation). Since theoretical knowledge is expressed primarily through concepts and statements about their relations, and concepts are expressed through features, the focus is on two basic problems of data analysis: summarization (aggregating or generating features) and correlation (studying relations between features). The exposition contains many examples of applying the concepts under consideration to the analysis of real data. The textbook is intended primarily for teaching bachelor's and master's students in engineering and technical fields, but it can also be used for self-study.

Versions of least-squares k-means algorithm for interval data

2024 · PREPRINT · en

Recently, k-means clustering has been extended to so-called interval data. In contrast to the conventional data case, interval data feature values are intervals rather than single reals. This paper further explores the least-squares criterion for k-means clustering to tackle the issue of initialization, that is, finding a proper set of initial cluster centers in interval data clustering. Specifically, we extend, for interval data, a Pythagorean decomposition of the data scatter into the sum of two items, one being a genuine k-means least-squares criterion, the other, a complementary criterion, requiring the clusters to be numerous and anomalous. Therefore we propose a method for obtaining anomalous clusters one by one. After a run of the method, we start k-means iterations from the centers of the most numerous of the found anomalous clusters. We test and validate our proposed BIKM algorithm on versions of two newly introduced interval datasets.
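
For orientation, the Pythagorean decomposition mentioned in this abstract has a well-known form in the conventional (non-interval) case, sketched below. The notation is an assumption made here: $y_i$ are the $N$ data points taken relative to a chosen reference point (the origin), $S_k$ are the $K$ clusters with sizes $N_k$, and $c_k$ are their means. The preprint's extension to interval data is not reproduced.

```latex
\sum_{i=1}^{N} \|y_i\|^2
  = \sum_{k=1}^{K} \sum_{i \in S_k} \|y_i - c_k\|^2
  + \sum_{k=1}^{K} N_k \|c_k\|^2 ,
\qquad
c_k = \frac{1}{N_k} \sum_{i \in S_k} y_i .
```

The first term on the right is the usual k-means least-squares criterion. The second is the complementary criterion: it is large when clusters have many members (large $N_k$) and centers far from the reference point (large $\|c_k\|$), which matches the abstract's wording "numerous and anomalous".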

Core–shell clustering approach for detection and analysis of coastal upwelling

2023 · ARTICLE · en

A comprehensive approach is presented to analyze a season’s coastal upwelling represented by weekly sea surface temperature (SST) image grids. The proposed model, core–shell clustering, assumes that the season’s upwelling can be divided into shorter periods of stability, time ranges, consisting of constant core and variable shell parts. A one-by-one core–shell clustering algorithm is provided. The algorithm parameters are automatically derived from the least-squares clustering criterion. The approach is applied to SST gridded data for sixteen successive years (2004–2019) of coastal upwelling on the western Iberian coast, the northernmost branch of the Canary Current Upwelling System. Our results show that in each season there are 3 to 5 time intervals, the ranges, in which the upwelling presents stable core patterns of relatively cold water surrounded by somewhat larger shell areas of warmer waters. Based on other experimental computations performed by our team, we conclude that this pattern is not just a purely local phenomenon but has a more global meaning. Inter-annual time series analyses are consistent among themselves and with existing expert domain knowledge.

Piece‐wise constant cluster modelling of dynamics of upwelling patterns

2023 · ARTICLE · en

A comprehensive approach is presented to analyse a season's coastal upwelling represented by weekly sea surface temperature (SST) image grids. Our three-stage data recovery clustering method assumes that the season's upwelling can be divided into shorter periods of stability, ranges, each to be represented by a constant core and variable shell parts. The parameters of the corresponding clustering algorithms are automatically derived by using the least-squares clustering criterion. The approach has been successfully applied to real-world SST data covering two distinct regions, the Portuguese coast and the Moroccan coast, for 16 years each.

A Straightforward Approach to Chi-Squared Analysis of Associations in Contingency Tables

2023 · CHAPTER · en

In contrast to the conventional wisdom that Pearson’s chi-squared on a contingency table is a criterion of statistical independence rather than a measure of association, this paper establishes an operational meaning of Pearson’s chi-squared as a measure of association. Its normalized version, phi-squared, is the average change of the probability of a category of a feature when a category of the other feature becomes known. Associations between individual categories are captured with Quetelet indexes between them. This allows for an operational interpretation of associations between individual categories, which is illustrated with a number of examples from the literature.
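
A small numerical sketch can make this reading of phi-squared concrete. It assumes the relative Quetelet index q(l|k) = (P(l|k) - P(l)) / P(l), which equals p_kl / (p_k p_l) - 1, and takes phi-squared as the p_kl-weighted average of these values; whether the chapter uses exactly this form is an assumption here, and the function name and the toy table are hypothetical.

```python
import numpy as np

def quetelet_and_phi2(counts):
    """Return the matrix of relative Quetelet indexes and phi-squared for a
    contingency table of counts (all row and column totals must be non-zero)."""
    counts = np.asarray(counts, dtype=float)
    p = counts / counts.sum()            # joint relative frequencies p_kl
    row = p.sum(axis=1, keepdims=True)   # marginals of the row feature
    col = p.sum(axis=0, keepdims=True)   # marginals of the column feature
    q = p / (row * col) - 1.0            # relative change of P(l) once k is known
    phi2 = float(np.sum(p * q))          # average Quetelet index, weighted by p_kl
    return q, phi2

table = [[20, 10],
         [10, 60]]
q, phi2 = quetelet_and_phi2(table)
n = np.sum(table)
print(q)
print(phi2, n * phi2)  # n * phi2 is Pearson's chi-squared statistic for the table
```

Multiplying phi-squared by the total count gives the usual Pearson chi-squared statistic, so the averaged Quetelet indexes and the classical statistic carry the same information in this sketch.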

Community Partitioning over Feature-Rich Networks Using an Extended K-Means Method

2022 · ARTICLE · en

This paper proposes a meaningful and effective extension of the celebrated K-means algorithm to detect communities in feature-rich networks, based on our assumption of a non-summability mode. We approximate the given matrices of inter-node links and feature values in the least-squares sense, which leads to a straightforward extension of the conventional K-means clustering method as an alternating minimization strategy for the criterion. This works in a two-fold space, embracing both the network nodes and features. The metric used is a weighted sum of the squared Euclidean distances in the feature and network spaces. To tackle the so-called curse of dimensionality, we extend this to a version that uses the cosine distances between entities and centers. One more version of our method is based on the Manhattan distance metric. We conduct computational experiments to test our method and compare its performance with that of popular competing algorithms on synthetic and real-world datasets. The cosine-based version of the extended K-means typically wins on the high-dimensional real-world datasets. In contrast, the Manhattan-based version wins on most synthetic datasets.
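
The combined metric described in this abstract can be sketched as a single assignment step of a K-means-style iteration. This is only an illustration under stated assumptions, not the paper's algorithm: the weight names `rho` and `xi`, the function name `assign_nodes`, and the toy data are all hypothetical, and the center-update step and the cosine and Manhattan variants are omitted.

```python
import numpy as np

def assign_nodes(Y, P, feat_centers, net_centers, rho=1.0, xi=1.0):
    """Assign each node to the cluster minimizing
    rho * ||y_i - c_k||^2 + xi * ||p_i - l_k||^2,
    where y_i is the node's feature row, p_i its row of link weights,
    and (c_k, l_k) the cluster center in the feature and network spaces."""
    d_feat = ((Y[:, None, :] - feat_centers[None, :, :]) ** 2).sum(axis=2)
    d_net = ((P[:, None, :] - net_centers[None, :, :]) ** 2).sum(axis=2)
    return np.argmin(rho * d_feat + xi * d_net, axis=1)

# Toy network: two linked pairs of nodes with one feature each.
Y = np.array([[0.0], [0.0], [10.0], [10.0]])
P = np.array([[0, 1, 0, 0],
              [1, 0, 0, 0],
              [0, 0, 0, 1],
              [0, 0, 1, 0]], dtype=float)
feat_centers = np.array([[0.0], [10.0]])
net_centers = np.array([[0.5, 0.5, 0.0, 0.0],
                        [0.0, 0.0, 0.5, 0.5]])

print(assign_nodes(Y, P, feat_centers, net_centers))
```

Setting `xi=0` reduces the step to ordinary K-means assignment on the features alone, which is one way to see the method as an extension of conventional K-means.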

Community Detection in Feature-Rich Networks Using Data Recovery Approach

2022 · ARTICLE · en

The problem of community detection in a network with features at its nodes takes into account both the graph structure and node features. The goal is to find relatively dense groups of interconnected entities sharing some features in common. There have been several approaches proposed for that. We apply the so-called data recovery approach to the problem by combining the least-squares recovery criteria for both the graph structure and node features. In this way, we obtain a new clustering criterion and a corresponding algorithm for finding clusters one-by-one. We show that our proposed method is effective on real-world data, as well as on synthetic data involving either only quantitative features or only categorical attributes or both. In the cases at which attributes are categorical, state-of-the-art algorithms are available. Our algorithm appears competitive against them.

Courses (6)

Basics of Data Analysis · 3 times

2025/2026, 2024/2025, 2023/2024 · Master's / Mago-Lego · Eng

Basic Methods of Data Analysis · 5 times

2025/2026, 2024/2025, 2023/2024, 2022/2023, 2021/2022 · Bachelor's / Mago-Lego · Rus

Mentor's Seminar · 3 times

2025/2026, 2024/2025, 2023/2024 · Master's · Eng

Data Analysis

2023/2024 · Master's / Mago-Lego · Eng

01.04.02. Applied Mathematics and Informatics

2022/2023 · Master's · Eng

Modern Methods of Data Analysis · 2 times

2022/2023, 2021/2022 · Master's / Mago-Lego · Eng