Boris Grigorievich Mirkin
Faculty of Economic Sciences
Professional Interests
Positions
- Leading Research Fellow: Faculty of Economic Sciences, International Centre of Decision Choice and Analysis
- Professor: Faculty of Computer Science, Department of Data Analysis and Artificial Intelligence
Bio
- · Began working at HSE University in 2008.
- · Research and teaching experience: 51 years.
Education
- 1990 · Doctor of Sciences: Institute for Systems Studies of the USSR Academy of Sciences and the State Committee for Science and Technology (GKNT), specialty 05.13.01 "System Analysis, Control and Information Processing", dissertation topic: A matrix approximation approach to the analysis of mixed-type data
- 1975 · Senior Research Fellow
- 1975 · Academic title: Associate Professor (Docent)
- 1966 · Candidate of Sciences: specialty 01.01.06 "Mathematical Logic, Algebra and Number Theory", dissertation topic: Some constructions in the algebra of events
- 1964 · Specialist degree: Saratov State University named after N.G. Chernyshevsky, specialty "Mathematics", qualification "Mathematician"
Work Experience
- · 1967-1982: Junior Research Fellow, Senior Research Fellow (1968), Head of Sector (1975), Institute of Economics and Industrial Engineering, Siberian Branch of the USSR Academy of Sciences, Novosibirsk
- · 1982-2000: Senior Research Fellow, Leading Research Fellow (1989), Central Economics and Mathematics Institute of the Russian Academy of Sciences, Moscow
- · 2000-2008: Professor, Birkbeck College, University of London, London, United Kingdom
- · 2008-present: Professor, National Research University Higher School of Economics, Moscow, Russia
- · Part-time positions
- · 1970-1982: Associate Professor, Department of Mathematical Methods in Economics, Novosibirsk State University, MVSSO RF, Novosibirsk
- · 1984-1991: Associate Professor, Department of Higher Mathematics and Statistics, Higher School of the Trade Union Movement of the All-Union Central Council of Trade Unions (VTsSPS) of the USSR, Moscow
- · 2008-2010: Professor, Birkbeck College, University of London, London, United Kingdom
- · 2009-2024: Leading Research Fellow, International Laboratory of Decision Choice and Analysis, HSE University, Moscow, Russia
Awards and Honours
- · Letter of Gratitude from HSE University (April 2025)
- · Certificate of Honour from the Ministry of Science and Higher Education of the Russian Federation (July 2023)
- · Medal "Recognition: 15 Years of Successful Work", HSE University (December 2022)
- · Badge of Honour, 2nd Class, Higher School of Economics (March 2018)
- · Certificate of Honour from the Faculty of Computer Science, HSE University (December 2017)
- · Certificate of Honour from the Higher School of Economics (December 2016)
- · Higher School of Economics "Golden HSE" Award (December 2012)
- · Letter of Gratitude from the Higher School of Economics (November 2012)
- · Bonus for a publication in an international peer-reviewed scholarly publication (2020–2021, 2018–2020)
- · Bonus for regular publications in international peer-reviewed scholarly publications (2021–2026)
- · Bonus for an article in a foreign peer-reviewed journal (2015–2017, 2013–2015, 2011–2013)
- · Bonus for an article in a foreign peer-reviewed scholarly publication (2016–2018)
- · Best Teacher: 2013, 2011
- · Winner of the 2012 "Golden HSE" Award in the category Achievements in Science
Grants and Projects
- · For the degree of Candidate of Sciences
Conferences (3)
- · 2021: 2021 IEEE International Conference on Artificial Intelligence and Big Data Analytics (Bandung). Presentation: Method for Taxonomic Content-Analysis of Text Collections
- · 2020: Research and Practice Conference "Data Science" (Saint Petersburg). Presentation: Data science and statistics: history, content, prospects
- · 2015: Systems Analysis 2015 (Vienna). Presentation: Data Summarization at Clustering and Ranking
Researcher IDs
- ORCID: 0000-0001-5470-8635
- ResearcherID: B-6591-2013
- SPIN RSCI: 3067-5534
- Google Scholar: https://scholar.google.com/citations?user=MpNtL58AAAAJ
- Scopus AuthorID: 35428028600
Publications (147)
Finding an appropriate generalization for a fuzzy thematic set in taxonomy
2018 · PREPRINT · en
This paper proposes a novel method, referred to as ParGenFS, for finding a most specific generalization of a query set, represented by a fuzzy set of topics assigned to leaves of the rooted tree of a taxonomy. This generalization lifts the query set to one or several head subjects in the higher ranks of the taxonomy. The head subject is supposed to tightly cover the query set, however dispersed that can be over branches of the tree, possibly bringing in some gaps, that is, taxonomy nodes covered by the head subject but irrelevant to the set. To balance that, we admit some offshoots, that is, nodes belonging to the query set but not covered by the head subject. The method globally minimizes the total number of head subjects, gaps, and offshoots, differently weighted. Our algorithm is applied to the structural analysis and description of a collection of 17685 abstracts of research papers published in 17 Springer journals on data science for the 20-year period 1998–2017. Our taxonomy of Data Science (DST) is extracted from the international Association for Computing Machinery Computing Classification System 2012 (ACM-CCS), a six-layer hierarchical taxonomy manually developed by a team of ACM experts. The DST also involves a number of additions detailing the leaves of the ACM-CCS taxonomy and added by ourselves. We find fuzzy clusters of leaf topics over the text collection, with a specially developed machinery. Three of the clusters are thematic indeed, relating to Data Science sub-areas: (a) learning, (b) information retrieval, and (c) clustering. These three clusters are lifted with ParGenFS in the DST, which allows us to draw some conclusions about the development trends in these areas.
Applying Anomalous Cluster Approach to Spatial Clustering
2017 · CHAPTER · en
The concept of anomalous clustering applies to finding individual clusters on a digital geography map supplied with a single feature such as brightness or temperature. An algorithm derived within the individual anomalous cluster framework extends the so-called region growing algorithms. Yet our approach differs in that the algorithm parameter values are not expert-driven but rather derived from the anomalous clustering model. This novel framework successfully applies to the issue of automatically delineating coastal upwelling from Sea Surface Temperature (SST) maps, a natural phenomenon seasonally occurring in coastal waters.
The Minkowski central partition as a pointer to a suitable distance exponent and consensus partitioning
2017 · ARTICLE · en
The Minkowski weighted K-means (MWK-means) is a recently developed clustering algorithm capable of computing feature weights. The cluster-specific weights in MWK-means follow the intuitive idea that a feature with low variance should have a greater weight than a feature with high variance. The final clustering found by this algorithm depends on the selection of the Minkowski distance exponent. This paper explores the possibility of using the central Minkowski partition in the ensemble of all Minkowski partitions for selecting an optimal value of the Minkowski exponent. The central Minkowski partition also appears to be a good consensus partition. Furthermore, we discovered some striking correlation results between the Minkowski profile, defined as a mapping of the Minkowski exponent values into the average similarity values of the optimal Minkowski partitions, and the Adjusted Rand Index vectors resulting from the comparison of the obtained partitions to the ground truth. Our findings were confirmed by a series of computational experiments involving synthetic Gaussian clusters and real-world data.
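The weighting idea in this abstract (low within-cluster dispersion means a larger weight) can be illustrated with a short sketch. The exact formula below is an assumption modelled on the dispersion-based rule usually associated with MWK-means; the abstract itself does not state it. The function names, the small stabilising constant, and the toy data are hypothetical.

```python
def feature_weights(cluster, center, p):
    """Return one weight per feature for a single cluster (assumes p > 1)."""
    n_features = len(center)
    # Minkowski dispersion of each feature around the cluster center.
    # The tiny constant guards against division by zero for constant
    # features; it is an assumption of this sketch.
    disp = [sum(abs(x[v] - center[v]) ** p for x in cluster) + 1e-12
            for v in range(n_features)]
    exponent = 1.0 / (p - 1)
    # Smaller dispersion yields a larger weight; the weights sum to 1.
    return [1.0 / sum((disp[v] / disp[u]) ** exponent for u in range(n_features))
            for v in range(n_features)]


def weighted_minkowski(x, center, weights, p):
    """Weighted Minkowski distance (raised to the power p) from a point to a center."""
    return sum((w ** p) * abs(a - c) ** p for a, c, w in zip(x, center, weights))


# Toy cluster: the second feature is spread out more than the first.
cluster = [(0.0, 0.0), (2.0, 4.0)]
center = (1.0, 2.0)
weights = feature_weights(cluster, center, p=2)
distance = weighted_minkowski(cluster[0], center, weights, p=2)
```

In this toy cluster the first feature has the smaller dispersion, so it receives the larger weight. Changing `p` changes both the weights and the distances, which is why the final clustering depends on the exponent.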
International Joint Conference on Rough Sets, Springer, Cham.
2017 · BOOK · en
We present a new method for cluster analysis that finds a composite "supercluster" consisting of two non-overlapping parts: a tight core and a less connected shell. We expand this approach to data that changes over time by assuming that the core is unchangeable, while the shell depends on the time period. We define a data recovery approximation model of a dynamic supercluster, and present a suboptimal algorithm for finding superclusters.
Ideal type model and an associated method for relational fuzzy clustering
2017 · CHAPTER · en
The ideal type model by Mirkin and Satarov (1990) expresses data points as convex combinations of some 'ideal type' points. However, this model cannot prevent the ideal type points from being far away from the observations and, in fact, requires that. Archetypal analysis by Cutler and Breiman (1994) and proportional membership fuzzy clustering by Nascimento et al. (2003) propose different ways of avoiding this entrapment. We propose one more way out: assuming the ideal types to be mutually orthogonal and transforming the model by multiplying it by its transpose. The obtained additive fuzzy clustering model for relational data is akin to that more recently analysed by Mirkin and Nascimento (2012) in a different context. The one-by-one clustering approach to the ideal type model is reformulated here as that naturally leading to a spectral clustering algorithm for finding fuzzy membership vectors. The algorithm is proven to be computationally valid and competitive against popular relational fuzzy clustering algorithms.
Visualization of Dynamic Reference Graphs
2016 · CHAPTER · en
We present a tool for dynamic reference graph visualization. A reference graph is a graph based on key phrases retrieved from a time-indexed natural language text corpus. This tool may be useful for the analysis of connected pairs of latent topics, changes in the significance of these topics, and changes in the relationship between them over various time periods.
A-Wardpβ: Effective hierarchical clustering using the Minkowski metric and a fast k-means initialisation
2016 · ARTICLE · en
In this paper we make two novel contributions to hierarchical clustering. First, we introduce an anomalous pattern initialisation method for hierarchical clustering algorithms, called A-Ward, capable of substantially reducing the time they take to converge. This method generates an initial partition with a sufficiently large number of clusters. This allows the cluster merging process to start from this partition rather than from a trivial partition composed solely of singletons. Our second contribution is an extension of the Ward and Wardp algorithms to the situation where the feature weight exponent can differ from the exponent of the Minkowski distance. This new method, called A-Wardpβ, is able to generate a much wider variety of clustering solutions. We also demonstrate that its parameters can be estimated reasonably well by using a cluster validity index. We perform numerous experiments using data sets with two types of noise, insertion of noise features and blurring within-cluster values of some features. These experiments allow us to conclude: (i) our anomalous pattern initialisation method does indeed reduce the time a hierarchical clustering algorithm takes to complete, without negatively impacting its cluster recovery ability; (ii) A-Wardpβ provides better cluster recovery than both Ward and Wardp.
A Lattice-based Consensus Clustering Algorithm
2016 · CHAPTER · en
We propose a new algorithm for consensus clustering, FCA-Consensus, based on Formal Concept Analysis. As the input, the algorithm takes T partitions of a certain set of objects obtained by k-means algorithm after T runs from different initialisations. The resulting consensus partition is extracted from an antichain of the concept lattice built on a formal context objects×classes, where the classes are the set of all cluster labels from each initial k-means partition. We compare the results of the proposed algorithm in terms of ARI measure with the state-of-the-art algorithms on synthetic datasets. Under certain conditions, the best ARI values are demonstrated by FCA-Consensus.
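The comparison in this abstract is expressed through the ARI (Adjusted Rand Index) measure. As a hedged illustration of the measure only, not of FCA-Consensus itself, the sketch below computes the ARI between two partitions from their contingency counts using the standard pair-counting formula. The function name and the example labels are hypothetical, and at least two objects are assumed.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """ARI between two partitions given as equal-length label sequences (n >= 2)."""
    n = len(labels_a)
    # Pairs of objects placed together in both partitions.
    pairs_joint = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    # Pairs of objects placed together in each partition separately.
    pairs_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    pairs_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = pairs_a * pairs_b / comb(n, 2)   # agreement expected by chance
    max_index = (pairs_a + pairs_b) / 2         # largest attainable agreement
    if max_index == expected:
        # Degenerate case, e.g. both partitions are trivial.
        return 1.0
    return (pairs_joint - expected) / (max_index - expected)


same = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])      # identical up to relabelling
crossed = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])   # clusters cut across each other
```

The index ignores the cluster label names, so a consensus partition can be scored against a ground-truth partition even when the two use different labels.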
Approximate clusters, biclusters and n-clusters in the analysis of binary and general data matrices
2016 · CHAPTER · en
Approximate cluster structures are those of formal concepts and n-concepts with added numerical intensity weights. The talk presents theoretical results and computational methods for approximate clustering and n-clustering as extensions of the algebraic-geometrical properties of numerical matrices (SVD and the like) to the situations where one or most of the elements of the solutions to be found are expressed by binary vectors. The theory embraces such methods as k-means, consensus clustering, network clustering, biclusters and triclusters and provides natural data analysis criteria, effective algorithms and interpretation tools.
TIED RANKINGS, ORDERED PARTITIONS, AND WEAK ORDERS: DISTANCE AND CONSENSUS
2016 · PREPRINT · en
Ranking is an important part of several areas of contemporary research, including social sciences, decision theory, data analysis and information retrieval. The goal of this project is to align developments in quantitative social sciences and decision theory with the current thought in computer science, including a few novel results. Specifically, we consider binary preference relations, the so-called weak orders that are in one-to-one correspondence with rankings. We show that the conventional symmetric difference distance between weak orders, considered as sets of ordered pairs, coincides with the celebrated Kemeny distance between the corresponding rankings, despite the seemingly much simpler structure of the former. Based on this, we review several properties of the geometric space of weak orders involving the ternary relation “between”, and contingency tables for cross-partitions. Next we reformulate the consensus ranking problem as a variant of finding an optimal linear ordering, given a correspondingly defined consensus matrix. The difference is in a subtracted term, the partition concentration, that depends only on the distribution of the objects in the individual parts. We apply our results to the conventional Likert scale to show that the Kemeny consensus rule is rather insensitive to the data under consideration and, therefore, should be supplemented with more sensitive consensus schemes.
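The stated coincidence between the symmetric difference distance and the Kemeny distance can be checked on a small example. The sketch below makes several assumptions not spelled out in the abstract: a ranking is a dictionary from item to rank position (smaller is better, equal values are ties), a weak order contains the pair (a, b) whenever a is ranked no worse than b, and the Kemeny distance uses pair scores of +1, 0, -1 with a factor of 1/2 over ordered pairs. All names are hypothetical.

```python
from itertools import permutations


def weak_order(rank):
    """Set of ordered pairs (a, b) such that a is ranked no worse than b."""
    return {(a, b) for a, b in permutations(rank, 2) if rank[a] <= rank[b]}


def symmetric_difference_distance(rank1, rank2):
    """Number of ordered pairs belonging to exactly one of the two weak orders."""
    return len(weak_order(rank1) ^ weak_order(rank2))


def kemeny_distance(rank1, rank2):
    """Half the sum of absolute differences of pair scores (+1, 0, -1)."""
    def score(rank, a, b):
        # +1 if a is strictly better than b, -1 if strictly worse, 0 if tied.
        return (rank[a] < rank[b]) - (rank[a] > rank[b])
    total = sum(abs(score(rank1, a, b) - score(rank2, a, b))
                for a, b in permutations(rank1, 2))
    return total // 2


# Two tied rankings of three items.
r = {"a": 1, "b": 2, "c": 2}   # a first, b and c tied
s = {"a": 2, "b": 1, "c": 2}   # b first, a and c tied
d_sets = symmetric_difference_distance(r, s)
d_kemeny = kemeny_distance(r, s)
```

Under these conventions, a reversed pair contributes 2 to both distances and a pair tied in only one ranking contributes 1, which is why the two values agree on this example.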
Courses (6)
- Basics of Data Analysis · 3 times
  2025/2026, 2024/2025, 2023/2024 · Master's / Mago-Lego · English
- Basic Methods of Data Analysis · 5 times
  2025/2026, 2024/2025, 2023/2024, 2022/2023, 2021/2022 · Bachelor's / Mago-Lego · Russian
- Mentor's Seminar · 3 times
  2025/2026, 2024/2025, 2023/2024 · Master's · English
- Data Analysis
  2023/2024 · Master's / Mago-Lego · English
- 01.04.02. Applied Mathematics and Informatics
  2022/2023 · Master's · English
- Modern Methods of Data Analysis · 2 times
  2022/2023, 2021/2022 · Master's / Mago-Lego · English