Boris Grigorievich Mirkin (Миркин Борис Григорьевич)
Faculty of Economic Sciences
Professional interests
Positions
- Leading Research Fellow: Faculty of Economic Sciences, International Centre of Decision Choice and Analysis
- Professor: Faculty of Computer Science, Department of Data Analysis and Artificial Intelligence
Bio
- Joined HSE University in 2008.
- Academic and teaching experience: 51 years.
Education
- 1990 · Doctor of Sciences: Institute for Systems Studies of the USSR Academy of Sciences and the State Committee for Science and Technology (GKNT), specialty 05.13.01 "System Analysis, Control and Information Processing", dissertation: Matrix Approximation Approach to the Analysis of Mixed-Type Information
- 1975 · Senior Research Fellow
- 1975 · Academic title: Associate Professor (Docent)
- 1966 · Candidate of Sciences: specialty 01.01.06 "Mathematical Logic, Algebra and Number Theory", dissertation: Some Constructions in the Algebra of Events
- 1964 · Specialist degree: Saratov State University named after N.G. Chernyshevsky, specialty "Mathematics", qualification "Mathematician"
Work experience
- 1967-1982: Junior Research Fellow, Senior Research Fellow (1968), Head of Sector (1975), Institute of Economics and Industrial Engineering, Siberian Branch of the USSR Academy of Sciences, Novosibirsk
- 1982-2000: Senior Research Fellow, Leading Research Fellow (1989), Central Economics and Mathematics Institute of the Russian Academy of Sciences, Moscow
- 2000-2008: Professor, Birkbeck College, University of London, London, UK
- 2008-present: Professor, National Research University Higher School of Economics, Moscow, Russia
- Part-time appointments:
- 1970-1982: Associate Professor, Department of Mathematical Methods in Economics, Novosibirsk State University (MVSSO RF), Novosibirsk
- 1984-1991: Associate Professor, Department of Higher Mathematics and Statistics, Higher School of the Trade Union Movement of the All-Union Central Council of Trade Unions (VTsSPS), Moscow
- 2008-2010: Professor, Birkbeck College, University of London, London, UK
- 2009-2024: Leading Research Fellow, International Laboratory of Decision Choice and Analysis, HSE University, Moscow, Russia
Awards and honours
- Commendation from HSE University (April 2025)
- Certificate of Honour from the Ministry of Science and Higher Education of the Russian Federation (July 2023)
- Medal "Recognition: 15 Years of Successful Work", HSE University (December 2022)
- Badge of Honour, 2nd class, Higher School of Economics (March 2018)
- Certificate of Honour from the Faculty of Computer Science, HSE University (December 2017)
- Certificate of Honour from the Higher School of Economics (December 2016)
- Higher School of Economics "Golden HSE" Award (December 2012)
- Commendation from the Higher School of Economics (November 2012)
- Bonus for a publication in an international peer-reviewed scholarly publication (2020–2021, 2018–2020)
- Bonus for regular publications in international peer-reviewed scholarly publications (2021–2026)
- Bonus for an article in a foreign peer-reviewed journal (2015–2017, 2013–2015, 2011–2013)
- Bonus for an article in a foreign peer-reviewed scholarly publication (2016–2018)
- Best Teacher: 2013, 2011
- Winner of the "Golden HSE" Award 2012 in the category Achievements in Science
Grants and projects
- for the degree of Candidate of Sciences
Conferences (3)
- 2021: 2021 IEEE International Conference on Artificial Intelligence and Big Data Analytics (Bandung). Talk: Method for Taxonomic Content-Analysis of Text Collections
- 2020: Research and Practice Conference "Data Science" (Saint Petersburg). Talk: Data Science and Statistics: History, Content, Prospects
- 2015: Systems Analysis 2015 (Vienna). Talk: Data Summarization at Clustering and Ranking
Researcher identifiers
- ORCID: 0000-0001-5470-8635
- ResearcherID: B-6591-2013
- SPIN (RSCI): 3067-5534
- Google Scholar: https://scholar.google.com/citations?user=MpNtL58AAAAJ
- Scopus AuthorID: 35428028600
Publications (147)
Detection of an unspecified number of communities in feature-rich networks
2020 · CHAPTER · en
The problem of community detection in a network with features at its nodes takes into account both the graph structure and node features. The goal is to find relatively dense groups of interconnected entities sharing some features in common. Existing approaches require the number of communities to be pre-specified. We apply the so-called data recovery approach to allow a relaxation of the criterion for finding communities one-by-one. We show that our proposed method is effective on real-world data, as well as on synthetic data involving either only quantitative features or only categorical attributes or both. In cases where the attributes are categorical, state-of-the-art algorithms are available. Our algorithm appears competitive against them. © 2020 CEUR-WS. All rights reserved.
Method for Generalization of Fuzzy Sets
2019 · CHAPTER · en
We define and find a most specific generalization of a fuzzy set of topics assigned to leaves of the rooted tree of a taxonomy. This generalization lifts the set to a “head subject” in the higher ranks of the taxonomy, which is supposed to “tightly” cover the query set, possibly bringing in some errors, both “gaps” and “offshoots”. The method globally minimizes a penalty combining head subjects and gaps and offshoots. We apply this to extract research tendencies from a collection of about 18000 research papers published in Springer journals on data science. We consider a taxonomy of Data Science based on the Association for Computing Machinery Computing Classification System 2012 (ACM-CCS). We find fuzzy clusters of leaf topics over the text collection and use thematic clusters’ head subjects to make some comments on the tendencies of research.
Using Domain Taxonomy to Model Generalization of Thematic Fuzzy Clusters
2019 · CHAPTER · en
We define a most specific generalization of a fuzzy set of topics assigned to leaves of the rooted tree of a domain taxonomy. This generalization lifts the set to its 'head subject' in the higher ranks of the taxonomy tree. The head subject is supposed to 'tightly' cover the query set, possibly bringing in some errors, both 'gaps' and 'offshoots'. Our method globally minimizes a penalty function combining the numbers of head subjects and gaps and offshoots, differently weighted. We apply this to a collection of about 18000 research papers published in Springer journals on Data Science for the past 20 years. We extract a taxonomy of Data Science from the international Association for Computing Machinery Computing Classification System 2012 (ACM-CCS). We find fuzzy clusters of leaf topics over the text collection and use lifted head subjects of the thematic clusters to comment on the tendencies of current research in the corresponding aspects of the domain.
Core Data Analysis: Summarization, Correlation, and Visualization
2019 · BOOK · en
This book concentrates on in-depth explanation of a few methods to address core issues, rather than presentation of a multitude of methods that are popular among the scientists. An added value of this edition is that I am trying to address two features of the brave new world that materialized after the first edition was written in 2010. These features are the emergence of “Data science” and changes in student cognitive skills in the process of global digitalization. The birth of Data science gives me more opportunities in delineating the field of data analysis. An overwhelming majority of both theoreticians and practitioners are inclined to consider the notions of “data analysis” (DA) and “machine learning” (ML) as synonymous. There are, however, at least two differences between the two. First comes the difference in perspectives. ML is to equip computers with methods and rules to see through regularities of the environment - and behave accordingly. DA is to enhance conceptual understanding. These goals are not inconsistent indeed, which explains a huge overlap between DA and ML. However, there are situations in which these perspectives are not consistent. Regarding the current students’ cognitive habits, I came to the conclusion that they prefer to immediately get into the “thick of it”. Therefore, I streamlined the presentation of multidimensional methods. These methods are now organized in four Chapters, one of which presents correlation learning (Chapter 3). Three other Chapters present summarization methods both quantitative (Chapter 2) and categorical (Chapters 4 and 5). Chapter 4 relates to finding and characterizing partitions by using K-means clustering and its extensions. Chapter 5 relates to hierarchical and separative cluster structures.
Using encoder-decoder data recovery approach brings forth a number of mathematically proven interrelations between methods that are used for addressing such practical issues as the analysis of mixed scale data, data standardization, the number of clusters, cluster interpretation, etc. An obvious bias towards summarization against correlation can be explained, first, by the fact that most texts in the field are biased in the opposite direction, and, second, by my personal preferences. Categorical summarization, that is, clustering is considered not just a method of DA but rather a model of classification as a concept in knowledge engineering. Also, in this edition, I somewhat relaxed the “presentation/formulation/computation” narrative structure, which was omnipresent in the first edition, to be able to do things in one go. Chapter 1 presents the author’s view on the DA mainstream, or core, as well as on a few Data science issues in general. Specifically, I bring forward novel material on the role of DA, including its successes and pitfalls (Section 1.4), and classification as a special form of knowledge (Section 1.5). Overall, my goal is to show the reader that Data science is not a well-formed part of knowledge yet but rather a piece of science-in-the-making.
Using Taxonomy Tree to Generalize a Fuzzy Thematic Cluster
2019 · CHAPTER · en
This paper presents an algorithm, ParGenFS, for generalizing, or “lifting”, a fuzzy set of topics to higher ranks of a hierarchical taxonomy of a research domain. The algorithm ParGenFS finds a globally optimal generalization of the topic set to minimize a penalty function, by balancing the number of introduced “head subjects” and related errors, the “gaps” and “offshoots”, differently weighted. This leads to a generalization of the topic set in the taxonomy. The usefulness of the method is illustrated on a set of 17685 abstracts of research papers on Data Science published in Springer journals for the past 20 years. We extracted a taxonomy of Data Science from the international Association for Computing Machinery Computing Classification System 2012 (ACM-CCS). We find fuzzy clusters of leaf topics over the text collection, lift them in the taxonomy, and interpret found head subjects to comment on the tendencies of current research.
Distance and Consensus for Preference Relations Corresponding to Ordered Partitions
2019 · ARTICLE · en
Ranking is an important part of several areas of contemporary research, including social sciences, decision theory, data analysis and information retrieval. The goal of this project is to align developments in quantitative social sciences and decision theory with the current thought in computer science, including a few novel results. Specifically, we consider binary preference relations, the so-called weak orders that are in one-to-one correspondence with rankings. We show that the conventional symmetric difference distance between weak orders, considered as sets of ordered pairs, coincides with the celebrated Kemeny distance between the corresponding rankings, despite the seemingly much simpler structure of the former. Based on this, we review several properties of the geometric space of weak orders involving the ternary relation “between”, and contingency tables for cross-partitions. Next we reformulate the consensus ranking problem as a variant of finding an optimal linear ordering, given a correspondingly defined consensus matrix. The difference is in a subtracted term, the partition concentration, that depends only on the distribution of the objects in the individual parts. We apply our results to the conventional Likert scale to show that the Kemeny consensus rule is rather insensitive to the data under consideration and, therefore, should be supplemented with more sensitive consensus schemes.
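The coincidence of the two distances stated in this abstract can be checked numerically on small cases. The sketch below is only an illustration, not code from the article: it assumes a ranking is given as a vector of ranks (lower is better, equal ranks are ties), that a weak order is the set of ordered pairs (i, j) with i ranked no worse than j, and that the Kemeny distance is half the sum of absolute differences between the +1/0/-1 pairwise comparison matrices. All function names are hypothetical.

```python
from itertools import product


def weak_order(ranks):
    """Weak order as a set of ordered pairs (i, j): i is ranked no worse than j."""
    n = len(ranks)
    return {(i, j) for i in range(n) for j in range(n) if ranks[i] <= ranks[j]}


def symmetric_difference_distance(ranks_a, ranks_b):
    """Number of ordered pairs that belong to exactly one of the two weak orders."""
    return len(weak_order(ranks_a) ^ weak_order(ranks_b))


def sign(x):
    return (x > 0) - (x < 0)


def kemeny_distance(ranks_a, ranks_b):
    """Half the sum of |a_ij - b_ij| over all ordered pairs, where a_ij is
    +1 if i precedes j, -1 if j precedes i, and 0 if they are tied."""
    n = len(ranks_a)
    total = 0
    for i in range(n):
        for j in range(n):
            a = sign(ranks_a[j] - ranks_a[i])
            b = sign(ranks_b[j] - ranks_b[i])
            total += abs(a - b)
    return total // 2  # the sum is always even because the matrices are antisymmetric


def distances_agree_everywhere(n_objects=4, n_levels=3):
    """Exhaustively compare the two distances over all pairs of rank vectors."""
    vectors = list(product(range(1, n_levels + 1), repeat=n_objects))
    return all(
        symmetric_difference_distance(a, b) == kemeny_distance(a, b)
        for a in vectors
        for b in vectors
    )


if __name__ == "__main__":
    a, b = [1, 2, 2, 3], [2, 1, 3, 3]
    print(symmetric_difference_distance(a, b), kemeny_distance(a, b))
    print(distances_agree_everywhere())
```

A strict reversal of a pair contributes 2 to either distance and a tie broken on one side contributes 1, which is why the two counts match pair by pair under the assumed definitions.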
Computational Generalization in Taxonomies Applied to: (1) Analyze Tendencies of Research and (2) Extend User Audiences
2019 · CHAPTER · en
We define a most specific generalization of a fuzzy set of topics assigned to leaves of the rooted tree of a domain taxonomy. This generalization lifts the set to its “head subject” node in the higher ranks of the taxonomy tree. The head subject is supposed to “tightly” cover the query set, possibly bringing in some errors referred to as “gaps” and “offshoots”. Our method, ParGenFS, globally minimizes a penalty function combining the numbers of head subjects and gaps and offshoots, differently weighted. Two applications are considered: (1) analysis of tendencies of research in Data Science; (2) audience extending for programmatic targeted advertising online. The former involves a taxonomy of Data Science derived from the celebrated ACM Computing Classification System 2012. The analysis is based on a collection of research papers published by Springer in 1998–2017 and applies in-house methods for text analysis, retrieval and clustering. The head subjects of these clusters inform us of some general tendencies of the research. The latter involves publicly available IAB Tech Lab Content Taxonomy. Each of about 25 million users is assigned a fuzzy profile within this taxonomy, which is generalized offline using ParGenFS. Our experiments show that these head subjects effectively extend the size of targeted audiences at least twice without losing quality.
Ordinal Equivalence Classes for Parallel Coordinates
2019 · CHAPTER · en
We give a mathematical treatment to the concept of ordinal equivalence defined relative to all m! possible permutations of parallel axes. We prove that the ordinal equivalence is determined by the pair-wise co-monotonicity equivalence relations, thus leading to simple algorithmic procedures for finding the corresponding partition. Each ordinal equivalence class can be visualized as a profile of co-monotone polylines, in this way preventing any clutter in the image. We illustrate our approach with two datasets taken from the literature.
Qualitative Judgement of Research Impact: Domain Taxonomy as a Fundamental Framework for Judgement of the Quality of Research
2018 · ARTICLE · en
The appeal of metric evaluation of research impact has attracted considerable interest in recent times. Although the public at large and administrative bodies are much interested in the idea, scientists and other researchers are much more cautious, insisting that metrics are but an auxiliary instrument to the qualitative peer-based judgement. The goal of this article is to propose availing of such a well positioned construct as domain taxonomy as a tool for directly assessing the scope and quality of research. We first show how taxonomies can be used to analyse the scope and perspectives of a set of research projects or papers. Then we proceed to define a research team or researcher’s rank by those nodes in the hierarchy that have been created or significantly transformed by the results of the researcher. An experimental test of the approach in the data analysis domain is described. Although the concept of taxonomy seems rather simplistic to describe all the richness of a research domain, its changes and use can be made transparent and subject to open discussions.
Braverman’s Spectrum and Matrix Diagonalization Versus iK-Means: A Unified Framework for Clustering
2018 · CHAPTER · en
In this paper, I discuss current developments in cluster analysis to bring forth earlier developments by E. Braverman and his team. Specifically, I begin by recalling their Spectrum clustering method and Matrix diagonalization criterion. These two include a number of user-specified parameters such as the number of clusters and similarity threshold, which corresponds to the state of affairs as it was at early stages of data science developments; it remains so currently, too. Meanwhile, a data-recovery view of the Principal Component Analysis method admits a natural extension to clustering which embraces two of the most popular clustering methods, K-Means partitioning and Ward agglomerative clustering. To see that, one just needs to adjust the point of view and recognise an equivalent complementary criterion demanding the cluster to be simultaneously “large-sized” and “anomalous”. Moreover, this paradigm shows that the complementary criterion can be reformulated in terms of object-to-object similarities. This criterion appears to be equivalent to the heuristic Matrix diagonalization criterion by Dorofeyuk-Braverman. Moreover, a greedy one-by-one cluster extraction algorithm for this criterion appears to be a version of the Braverman’s Spectrum algorithm – but with automated adjustment of parameters. An illustrative example with mixed scale data completes the presentation.
Courses (6)
- Basics of Data Analysis · 3 times
  2025/2026, 2024/2025, 2023/2024 · Master's / Mago-Lego · English
- Basic Methods of Data Analysis · 5 times
  2025/2026, 2024/2025, 2023/2024, 2022/2023, 2021/2022 · Bachelor's / Mago-Lego · Russian
- Mentor's Seminar · 3 times
  2025/2026, 2024/2025, 2023/2024 · Master's · English
- Data Analysis
  2023/2024 · Master's / Mago-Lego · English
- 01.04.02. Applied Mathematics and Informatics
  2022/2023 · Master's · English
- Modern Methods of Data Analysis · 2 times
  2022/2023, 2021/2022 · Master's / Mago-Lego · English