Boris Grigorievich Mirkin

Faculty of Economic Sciences

Tel.: +7(499) 152-0151 | 27776 | +7(963) 723-4021

Professional interests

mathematical methods of analysis

Positions

Leading Research Fellow, Faculty of Economic Sciences, International Centre of Decision Choice and Analysis
Professor, Faculty of Computer Science, Department of Data Analysis and Artificial Intelligence

Bio

· Began working at HSE University in 2008.
· Research and teaching experience: 51 years.

Education

1990 · Doctor of Sciences: Institute for Systems Studies of the USSR Academy of Sciences and the State Committee for Science and Technology (GKNT), specialty 05.13.01 "System Analysis, Control and Information Processing", dissertation topic: A Matrix-Approximation Approach to the Analysis of Mixed-Type Information
1975 · Senior Research Fellow
1975 · Academic title: Associate Professor (Docent)
1966 · Candidate of Sciences: specialty 01.01.06 "Mathematical Logic, Algebra and Number Theory", dissertation topic: Some Constructions in the Algebra of Events
1964 · Specialist degree: Saratov State University named after N.G. Chernyshevsky, specialty "Mathematics", qualification "Mathematician"

Work experience

· 1967-1982: Junior Research Fellow, Senior Research Fellow (1968), Head of Sector (1975), Institute of Economics and Industrial Engineering, Siberian Branch of the USSR Academy of Sciences, Novosibirsk
· 1982-2000: Senior Research Fellow, Leading Research Fellow (1989), Central Economics and Mathematics Institute of the Russian Academy of Sciences, Moscow
· 2000-2008: Professor, Birkbeck College, University of London, London, United Kingdom
· 2008-present: Professor, National Research University Higher School of Economics, Moscow, Russia
· Part-time positions
· 1970-1982: Associate Professor, Department of Mathematical Methods in Economics, Novosibirsk State University, MVSSO RF (Ministry of Higher and Secondary Special Education), Novosibirsk
· 1984-1991: Associate Professor, Department of Higher Mathematics and Statistics, Higher School of the Trade Union Movement of the VTsSPS (All-Union Central Council of Trade Unions), USSR, Moscow
· 2008-2010: Professor, Birkbeck College, University of London, London, United Kingdom
· 2009-2024: Leading Research Fellow, International Laboratory of Decision Choice and Analysis, HSE University, Moscow, Russia

Awards and commendations

· Letter of Gratitude from HSE University (April 2025)
· Certificate of Honour of the Ministry of Science and Higher Education of the Russian Federation (July 2023)
· Medal "Recognition: 15 Years of Successful Work", HSE University (December 2022)
· Badge of Honour, 2nd Class, of the Higher School of Economics (March 2018)
· Certificate of Honour of the HSE Faculty of Computer Science (December 2017)
· Certificate of Honour of the Higher School of Economics (December 2016)
· Higher School of Economics "Golden HSE" (Zolotaya Vyshka) Award (December 2012)
· Letter of Gratitude from the Higher School of Economics (November 2012)
· Bonus for a publication in an international peer-reviewed academic publication (2020–2021, 2018–2020)
· Bonus for regular publications in international peer-reviewed academic publications (2021–2026)
· Bonus for an article in a foreign peer-reviewed journal (2015–2017, 2013–2015, 2011–2013)
· Bonus for an article in a foreign peer-reviewed academic publication (2016–2018)
· Best Teacher: 2013, 2011
· Winner of the 2012 "Golden HSE" (Zolotaya Vyshka) Award in the category Achievements in Science

Grants and projects

- · for the academic degree of Candidate of Sciences

Conferences (3)

· 2021: 2021 IEEE International Conference on Artificial Intelligence and Big Data Analytics (Bandung). Presentation: Method for Taxonomic Content-Analysis of Text Collections
· 2020: Research and Practice Conference "Data Science" (Saint Petersburg). Presentation: Data Science and Statistics: History, Content, Prospects
· 2015: Systems Analysis 2015 (Vienna). Presentation: Data Summarization at Clustering and Ranking

Researcher identifiers

ORCID: 0000-0001-5470-8635
ResearcherID: B-6591-2013
RSCI SPIN: 3067-5534
Google Scholar: https://scholar.google.com/citations?user=MpNtL58AAAAJ
Scopus AuthorID: 35428028600

Publications (147)

Globally Optimal Parsimoniously Lifting a Fuzzy Query Set Over a Taxonomy Tree

2020 · CHAPTER · en

This paper presents a relatively rare case of an optimization problem in data analysis that admits a globally optimal solution by a recursive algorithm. We are concerned with finding a most specific generalization of a fuzzy set of topics assigned to leaves of a domain taxonomy represented by a rooted tree. The idea is to “lift” the set to its “head subject” in the higher ranks of the taxonomy tree. The head subject is supposed to “tightly” cover the query set, possibly bringing in some errors, either “gaps” or “offshoots” or both. Our method globally minimizes a penalty function combining the numbers of head subjects, gaps and offshoots, differently weighted. We apply this to a collection of 17645 research papers on Data Science published in 17 Springer journals over the past 20 years. We extract a taxonomy of Data Science (TDS) from the international Association for Computing Machinery Computing Classification System 2012. We find fuzzy clusters of leaf topics over the text collection, optimally lift them to head subjects in TDS, and comment on the tendencies of current research following from the lifting results.

A Method for Audience Extending in Programmatic Advertising by Using Parsimonious Generalization of User Segments

2020 · CHAPTER · en

We propose a novel method for efficient target audience augmentation in programmatic digital advertising. This method uses the ParGenFS algorithm for finding the most adequate generalization in taxonomies, which was developed by the authors in a joint work. ParGenFS extends user segments by parsimoniously lifting them off-line, as a fuzzy set over the IAB content taxonomy, into a higher rank ‘head subject’. This algorithm was initially intended as an intelligent information retrieval tool. Here it is applied to the very different task of targeted advertisement, as an effective tool for augmenting audiences.

Parsimonious Generalization of Fuzzy Thematic Sets in Taxonomies Applied to the Analysis of Tendencies of Research in Data Science

2020 · ARTICLE · en

This paper proposes a novel method, referred to as ParGenFS, for finding a most specific generalization of a query set represented by a fuzzy set of topics assigned to leaves of the rooted tree of a taxonomy. The query set is generalized by “lifting” it to one or more “head subjects” in the higher ranks of the taxonomy. The head subjects should cover the query set, with the possible addition of some “gaps”, taxonomy nodes covered by the head subject but irrelevant to the query set. To decrease the numbers of gaps, we admit some “offshoots”, nodes belonging to the query set but not covered by the head subject. The method globally minimizes the total number of head subjects, gaps and offshoots, each suitably weighted. Our algorithm is applied to the structural analysis and description of a collection of 17685 abstracts of research papers published in 17 Springer journals related to Data Science for the 20-year period 1998-2017. Our taxonomy of Data Science (TDS) is extracted from the Association for Computing Machinery Computing Classification System 2012 (ACM-CCS), a six-level hierarchical taxonomy manually developed by a team of ACM experts. The TDS also includes a number of additional leaves that we added to cater for recent developments not represented in the ACM-CCS taxonomy. We find fuzzy clusters of leaf topics over the text collection, using specially developed machinery. Three of the clusters are indeed thematic, relating to the Data Science sub-areas of (a) learning, (b) information retrieval, and (c) clustering. These three clusters are then lifted in the TDS using ParGenFS, which allows us to draw some conclusions about tendencies in developments in these areas.
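
The abstract does not spell out the details of ParGenFS, so the following is only a minimal sketch of the lifting idea under simplifying assumptions that are not from the source: the query set is crisp rather than fuzzy, every head subject costs 1, a gap is any non-query leaf covered by a head subject and costs `gap_w`, and an offshoot is a query leaf left uncovered and costs `off_w`. The function names, the tree encoding and the weights are hypothetical. Under these assumptions a bottom-up recursion reaches a globally minimal penalty, because each subtree can be solved independently once it is decided whether its root becomes a head subject.

```python
from typing import Dict, List, Set, Tuple

Tree = Dict[str, List[str]]  # node -> children; a leaf has no entry (assumed encoding)
Result = Tuple[float, Set[str], Set[str], Set[str]]  # penalty, heads, gaps, offshoots


def leaves_under(tree: Tree, node: str) -> Set[str]:
    """Collect all leaves of the subtree rooted at `node`."""
    children = tree.get(node, [])
    if not children:
        return {node}
    found: Set[str] = set()
    for child in children:
        found |= leaves_under(tree, child)
    return found


def lift(tree: Tree, node: str, query: Set[str],
         gap_w: float, off_w: float) -> Result:
    """Simplified crisp lifting (not ParGenFS itself): minimize
    (#heads) + gap_w * (#gaps) + off_w * (#offshoots) over the subtree."""
    children = tree.get(node, [])
    if not children:
        if node not in query:
            return 0.0, set(), set(), set()     # irrelevant leaf: nothing to pay
        if off_w < 1.0:
            return off_w, set(), set(), {node}  # cheaper to leave it as an offshoot
        return 1.0, {node}, set(), set()        # otherwise the leaf is its own head

    # Option A: no head subject here; combine the children's optimal solutions.
    pen, heads, gaps, offs = 0.0, set(), set(), set()
    for child in children:
        p, h, g, o = lift(tree, child, query, gap_w, off_w)
        pen += p
        heads |= h
        gaps |= g
        offs |= o

    # Option B: make this node a head subject covering every leaf below it;
    # the covered leaves outside the query set become gaps.
    gaps_here = leaves_under(tree, node) - query
    head_pen = 1.0 + gap_w * len(gaps_here)
    if head_pen < pen:
        return head_pen, {node}, gaps_here, set()
    return pen, heads, gaps, offs


# Toy taxonomy (hypothetical): three of four leaves under A are in the query set.
toy_tree = {"root": ["A", "B"], "A": ["a1", "a2", "a3", "a4"], "B": ["b1", "b2", "b3"]}
toy_query = {"a1", "a2", "a3", "b1"}
penalty, heads, gaps, offshoots = lift(toy_tree, "root", toy_query, gap_w=0.5, off_w=0.6)
```

With these toy weights the three query leaves under `A` are lifted to `A` at the price of one gap, while the single query leaf under `B` is cheaper to leave as an offshoot than to lift.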

Core clustering as a tool for tackling noise in cluster labels

2020 · ARTICLE · en

Real-world data sets often contain mislabelled entities. This can be particularly problematic if the data set is being used by a supervised classification algorithm at its learning phase. In this case, the accuracy of this classification algorithm, when applied to unlabelled data, is likely to suffer considerably. In this paper, we introduce a clustering-based method capable of reducing the number of mislabelled entities in data sets. Our method can be summarised as follows: (i) cluster the data set; (ii) select the entities that have the most potential to be assigned to correct clusters; (iii) use the entities of the previous step to define the core clusters and map them to the labels using a confusion matrix; (iv) use the core clusters and our cluster membership criterion to correct the labels of the remaining entities. We perform numerous experiments to validate our method empirically using k-nearest neighbour classifiers as a benchmark. We experiment with both synthetic and real-world data sets with different proportions of mislabelled entities. Our experiments demonstrate that the proposed method produces promising results. Thus, it could be used as a preprocessing data correction step of a supervised machine learning algorithm.
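
Steps (i) to (iv) can be illustrated with a small sketch. The abstract does not define how core entities are selected or what the cluster membership criterion is, so this sketch makes its own assumptions: plain K-means is the clustering method, the core of a cluster is the share of its members closest to the centroid, the core cluster is mapped to its most frequent label, and every member of the cluster then receives that label. All function names and parameters are hypothetical.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points: Sequence[Point]) -> Point:
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))


def kmeans(points: Sequence[Point], centers: List[Point], iters: int = 20):
    """Plain K-means iterations starting from the given centers."""
    assign: List[int] = []
    for _ in range(iters):
        assign = [min(range(len(centers)), key=lambda k: sq_dist(p, centers[k]))
                  for p in points]
        for k in range(len(centers)):
            members = [p for p, a in zip(points, assign) if a == k]
            if members:
                centers[k] = mean(members)
    return assign, centers


def correct_labels(points: Sequence[Point], labels: Sequence[str],
                   init_centers: Sequence[Point], core_share: float = 0.5) -> List[str]:
    # (i) cluster the data set
    assign, centers = kmeans(points, list(init_centers))
    corrected = list(labels)
    for k in range(len(centers)):
        idx = [i for i, a in enumerate(assign) if a == k]
        if not idx:
            continue
        # (ii) assumed core criterion: the members closest to the centroid
        idx.sort(key=lambda i: sq_dist(points[i], centers[k]))
        core = idx[:max(1, int(len(idx) * core_share))]
        # (iii) map the core cluster to its most frequent label
        label = Counter(labels[i] for i in core).most_common(1)[0][0]
        # (iv) assumed membership rule: every member of the cluster takes that label
        for i in idx:
            corrected[i] = label
    return corrected


# Toy data (hypothetical): one mislabelled outlying point in each of two groups.
pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (3.0, 3.0),
       (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (8.0, 8.0)]
noisy = ["a", "a", "a", "b", "b", "b", "b", "a"]
fixed = correct_labels(pts, noisy, init_centers=[(0.0, 0.0), (10.0, 10.0)])
```

The label mapping in step (iii) is the row-wise maximum of a cluster-by-label confusion matrix restricted to core entities; `Counter` computes one row of that matrix at a time.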

A Hybrid Approach to Interpretable Analysis of Research Paper Collections

2020 · CHAPTER · en

We define and find a most specific generalization of a fuzzy set of topics assigned to leaves of the rooted tree of a taxonomy. This generalization lifts the set to a “head subject” in the higher ranks of the taxonomy, which is supposed to “tightly” cover the query set, possibly bringing in some errors, both “gaps” and “offshoots”. Our method involves two more automated analysis techniques: a fuzzy clustering method, FADDIS, involving both additive and spectral properties, and a purely structural string-to-text relevance measure based on suffix trees annotated by frequencies. We apply this to extract research tendencies from two collections of research papers: (a) about 18000 research papers published in Springer journals on data science over 20 years, and (b) about 27000 research papers retrieved from Springer and Elsevier journals in response to data science related queries. We consider a taxonomy of Data Science based on the Association for Computing Machinery Computing Classification System (ACM-CCS 2012). Our findings allow us to make some comments on the tendencies of research that cannot be derived by using more conventional techniques.

Exploring patterns of corporate social responsibility using a complementary K-means clustering criterion

2020 · ARTICLE · en

Companies’ objectives extend beyond mere profitability, to what is generally known as Corporate Social Responsibility (CSR). Empirical research on CSR typically concentrates on a limited number of aspects. We focus on the whole set of CSR activities to identify any structure in that set. In this analysis, we take data from 1850 of the largest international companies via the conventional MSCI database and focus on four major dimensions of CSR: Environment, Social/Stakeholder, Labor, and Governance. To identify any structure hidden in almost constant average values, we apply the popular technique of K-means clustering. When determining the number of clusters, which is especially difficult in the case at hand, we use an equivalent clustering criterion that is complementary to the square-error K-means criterion. Our use of this complementary criterion aims at obtaining clusters that are both large and farthest away from the center. We derive from this a method of extracting anomalous clusters one-by-one with a follow-up removal of small clusters. This method has allowed us to discover a rather impressive process of change from predominantly uniform patterns of CSR activities along the four dimensions in 2007 to predominantly single-focus patterns of CSR activities in 2012. This change may reflect the dynamics of increasingly interweaving and structuring CSR activities into business processes that are likely to be extended into the future.
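
The abstract does not state the complementary criterion explicitly. One standard form, assumed here, follows from the decomposition of the data scatter for centred data: T = W + B, where W is the square-error K-means criterion and B is the sum over clusters of N_k times the squared norm of the centroid c_k. Since T does not depend on the partition, minimizing W is equivalent to maximizing B, which favours clusters that are large (N_k) and far from the origin, that is, from the grand mean. The sketch below checks this identity numerically; the function names and data are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def center(data: Sequence[Sequence[float]]) -> List[List[float]]:
    """Shift the data so that the grand mean becomes the origin."""
    n = len(data)
    grand = [sum(col) / n for col in zip(*data)]
    return [[x - g for x, g in zip(row, grand)] for row in data]


def scatter_parts(data: Sequence[Sequence[float]],
                  assign: Sequence[int]) -> Tuple[float, float, float]:
    """Return (T, W, B): data scatter, square-error criterion,
    and the assumed complementary criterion sum_k N_k * ||c_k||^2."""
    clusters: Dict[int, List[Sequence[float]]] = {}
    for row, k in zip(data, assign):
        clusters.setdefault(k, []).append(row)
    total = sum(x * x for row in data for x in row)
    within = 0.0
    between = 0.0
    for members in clusters.values():
        n_k = len(members)
        c_k = [sum(col) / n_k for col in zip(*members)]
        within += sum((x - m) ** 2 for row in members for x, m in zip(row, c_k))
        between += n_k * sum(m * m for m in c_k)
    return total, within, between


# Toy data (hypothetical): two tight groups and one point near the grand mean.
toy = center([[1.0, 2.0], [2.0, 1.0], [8.0, 9.0], [9.0, 8.0], [5.0, 5.0]])
T, W, B = scatter_parts(toy, [0, 0, 1, 1, 2])
```

Comparing `W` and `B` for different assignments of the same data shows that a partition with a smaller square error always has a larger complementary value, because the two sum to the same `T`.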

A Data Recovery Method for Community Detection in Feature-Rich Networks

2020 · CHAPTER · en

The problem of community detection in a network with features at its nodes takes into account both the graph structure and node features. The goal is to find relatively dense groups of interconnected entities sharing some features in common. We apply the so-called data recovery approach to the problem by combining the least-squares recovery criteria for both the graph structure and node features. In this way, we obtain a new clustering criterion and a corresponding algorithm for finding clusters one-by-one, so that the process can indeed be interpreted as that of detecting communities. We show that our proposed method is effective on real-world data, as well as on synthetic data involving either only quantitative features or only categorical attributes or both. In the cases in which attributes are categorical, state-of-the-art algorithms are available. Our algorithm appears competitive against them.

A Hybrid Approach to the Analysis of a Collection of Research Papers

2020 · CHAPTER · en

We define and find a most specific generalization of a fuzzy set of topics assigned to leaves of the rooted tree of a taxonomy. This generalization lifts the set to a “head subject” in the higher ranks of the taxonomy, which is supposed to “tightly” cover the query set, possibly bringing in some errors, both “gaps” and “offshoots”. Our hybrid method involves two more automated analysis techniques: a fuzzy clustering method, FADDIS, involving both additive and spectral properties, and a purely structural string-to-text relevance measure based on suffix trees annotated by frequencies. We apply this to extract research tendencies from two collections of research papers: (a) about 18000 research papers published in Springer journals on data science over 20 years, and (b) about 27000 research papers retrieved from Springer and Elsevier journals in response to data science related queries. We consider a taxonomy of Data Science based on the Association for Computing Machinery Computing Classification System (ACM-CCS 2012). Our findings allow us to make some comments on the tendencies of research that cannot be derived by using more conventional techniques.

A One-by-One Method for Community Detection in Attributed Networks

2020 · CHAPTER · en

The problem of community detection in a network with features at its nodes takes into account both the graph structure and node features. The goal is to find relatively dense groups of interconnected entities sharing some features in common. We apply the so-called data recovery approach to the problem by combining the least-squares recovery criteria for both the graph structure and node features. In this way, we obtain a new clustering criterion and a corresponding algorithm for finding clusters/communities one-by-one. We show that our proposed method is effective on real-world data, as well as on synthetic data involving either only quantitative features or only categorical attributes or both. Our algorithm appears competitive against state-of-the-art algorithms.

A Data Recovery Method for Community Detection in Feature-Rich Networks

2020 · CHAPTER · en

Courses (6)

Basics of Data Analysis · 3 times

2025/2026, 2024/2025, 2023/2024 · Master's / Mago-Lego · English

Basic Methods of Data Analysis · 5 times

2025/2026, 2024/2025, 2023/2024, 2022/2023, 2021/2022 · Bachelor's / Mago-Lego · Russian

Mentor's Seminar · 3 times

2025/2026, 2024/2025, 2023/2024 · Master's · English

Data Analysis

2023/2024 · Master's / Mago-Lego · English

01.04.02. Applied Mathematics and Informatics

2022/2023 · Master's · English

Modern Methods of Data Analysis · 2 times

2022/2023, 2021/2022 · Master's / Mago-Lego · English