Boris Grigorievich Mirkin
Faculty of Economic Sciences
Positions
- Leading Research Fellow, Faculty of Economic Sciences, International Centre of Decision Choice and Analysis
- Professor, Faculty of Computer Science, Department of Data Analysis and Artificial Intelligence
Bio
- Joined HSE University in 2008.
- Teaching and research experience: 51 years.
Education
- 1990 · Doctor of Sciences: Institute for Systems Studies of the USSR Academy of Sciences and the State Committee for Science and Technology (GKNT), specialty 05.13.01 "Systems Analysis, Control and Information Processing", dissertation: A Matrix-Approximation Approach to the Analysis of Mixed-Type Information
- 1975 · Senior Research Fellow
- 1975 · Academic title: Associate Professor (Docent)
- 1966 · Candidate of Sciences: specialty 01.01.06 "Mathematical Logic, Algebra and Number Theory", dissertation: Some Constructions in the Algebra of Events
- 1964 · Specialist degree: Saratov State University named after N.G. Chernyshevsky, specialty "Mathematics", qualification "Mathematician"
Work Experience
- 1967-1982: Junior Research Fellow; Senior Research Fellow (1968); Head of Sector (1975), Institute of Economics and Industrial Engineering, Siberian Branch of the USSR Academy of Sciences, Novosibirsk
- 1982-2000: Senior Research Fellow; Leading Research Fellow (1989), Central Economics and Mathematics Institute of the Russian Academy of Sciences, Moscow
- 2000-2008: Professor, Birkbeck College, University of London, London, UK
- 2008-present: Professor, National Research University Higher School of Economics, Moscow, Russia
- Concurrent positions:
- 1970-1982: Associate Professor, Department of Mathematical Methods in Economics, Novosibirsk State University, Ministry of Higher and Secondary Specialized Education (MVSSO), Novosibirsk
- 1984-1991: Associate Professor, Department of Higher Mathematics and Statistics, Higher School of the Trade Union Movement of the All-Union Central Council of Trade Unions (VTsSPS), USSR, Moscow
- 2008-2010: Professor, Birkbeck College, University of London, London, UK
- 2009-2024: Leading Research Fellow, International Laboratory of Decision Choice and Analysis, HSE University, Moscow, Russia
Awards and Honors
- HSE University Letter of Appreciation (April 2025)
- Certificate of Honor of the Ministry of Science and Higher Education of the Russian Federation (July 2023)
- HSE University medal "Recognition: 15 Years of Successful Work" (December 2022)
- HSE University Badge of Honor, 2nd class (March 2018)
- Certificate of Honor of the HSE Faculty of Computer Science (December 2017)
- HSE University Certificate of Honor (December 2016)
- HSE University "Golden HSE" Award (December 2012)
- HSE University Letter of Appreciation (November 2012)
- Bonus for publication in an international peer-reviewed academic journal (2020–2021, 2018–2020)
- Bonus for regular publications in international peer-reviewed academic journals (2021–2026)
- Bonus for an article in a foreign peer-reviewed journal (2015–2017, 2013–2015, 2011–2013)
- Bonus for an article in a foreign peer-reviewed academic publication (2016–2018)
- Best Teacher: 2013, 2011
- Winner of the 2012 "Golden HSE" Award in the category "Achievements in Science"
Grants and Projects
- For the academic degree of Candidate of Sciences
Conferences (3)
- 2021: 2021 IEEE International Conference on Artificial Intelligence and Big Data Analytics (Bandung). Talk: Method for Taxonomic Content-Analysis of Text Collections
- 2020: Research and Practice Conference "Data Science" (St. Petersburg). Talk: Data Science and Statistics: History, Content, Prospects
- 2015: Systems Analysis 2015 (Vienna). Talk: Data Summarization at Clustering and Ranking
Researcher Identifiers
- ORCID: 0000-0001-5470-8635
- ResearcherID: B-6591-2013
- SPIN (RSCI): 3067-5534
- Google Scholar: https://scholar.google.com/citations?user=MpNtL58AAAAJ
- Scopus AuthorID: 35428028600
Publications (147)
Modeling Generalization in Domain Taxonomies Using a Maximum Likelihood Criterion
2022 · CHAPTER · en
We define a most specific generalization of a fuzzy set of topics assigned to leaves of the rooted tree of a domain taxonomy. This generalization lifts the set to its “head subject” node in the higher ranks of the taxonomy tree. The head subject is supposed to “tightly” cover the query set, possibly involving some errors referred to as “gaps” and “offshoots”. We develop a method to globally maximize the likelihood of a scenario involving gains and losses of the general concept manifested in a fuzzy cluster of leaf nodes of the taxonomy. Probabilities of the gain and loss events are derived from multiple runs of our earlier method of maximum parsimony starting with randomly generated values for the two parameters involved. Supplemented with fuzzy c-means clustering, this allows us to obtain meaningful generalizations for six fuzzy thematic clusters of Data Science topics using over 17000 abstracts from 17 research journals published by Springer.
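The abstract mentions supplementing the method with fuzzy c-means clustering. For readers unfamiliar with it, here is a minimal sketch of the textbook fuzzy c-means iteration (Bezdek's alternating updates, fuzzifier m = 2) in plain Python. The function name `fuzzy_c_means`, its parameters, and the random initialization are illustrative assumptions, not the chapter's implementation.

```python
import math
import random


def fuzzy_c_means(points, c, m=2.0, iters=100, tol=1e-6, seed=0):
    """Textbook fuzzy c-means (illustrative sketch): returns (centers, memberships).

    memberships[i][k] is the degree to which point i belongs to cluster k;
    each row sums to 1.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships, normalized so each row sums to 1.
    u = []
    for _ in range(n):
        row = [rng.random() for _ in range(c)]
        s = sum(row)
        u.append([x / s for x in row])
    centers = []
    for _ in range(iters):
        # Center update: mean of all points weighted by u_ik ** m.
        centers = []
        for k in range(c):
            w = [u[i][k] ** m for i in range(n)]
            total = sum(w)
            centers.append([sum(w[i] * points[i][d] for i in range(n)) / total
                            for d in range(dim)])
        # Membership update: u_ik = 1 / sum_j (d_ik / d_ij) ** (2 / (m - 1)).
        new_u = []
        for i in range(n):
            dists = [math.dist(points[i], centers[k]) for k in range(c)]
            if min(dists) == 0.0:
                # The point coincides with a center: full membership there.
                row = [1.0 if d == 0.0 else 0.0 for d in dists]
                s = sum(row)
                row = [x / s for x in row]
            else:
                row = [1.0 / sum((dists[k] / dists[j]) ** (2.0 / (m - 1.0))
                                 for j in range(c))
                       for k in range(c)]
            new_u.append(row)
        shift = max(abs(new_u[i][k] - u[i][k]) for i in range(n) for k in range(c))
        u = new_u
        if shift < tol:
            break  # memberships have stabilized
    return centers, u
```

Each row of the returned membership matrix sums to 1, so a point can belong partially to several clusters; this partial membership is what makes the resulting clusters fuzzy sets.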
A Maximum Likelihood Method for Generalizing Fuzzy Sets in Taxonomies
2022 · CHAPTER · ru
The paper proposes a new method for generalizing a thematic text collection equipped with a domain taxonomy. Spectral clustering methods are used to extract, from the text collection, fuzzy sets of taxonomy leaves that correspond to concepts used together in the collection's articles. These fuzzy sets are generalized by lifting them in the taxonomy tree using a maximum likelihood criterion. Optimal lifting means finding a node or a set of nodes in the taxonomy tree that most tightly cover the leaf concepts of the set being generalized. Our method has two main stages: (1) extracting clusters from the text collection and (2) generalizing these clusters. This work improves both stages. The algorithms are applied to the structural analysis and description of a text collection of 17,000 abstracts of research articles in Data Science published in Springer journals. The Data Science taxonomy used in this work is a six-level hierarchical taxonomy developed manually by the international Association for Computing Machinery (ACM-CCS [4]).
Progress in Artificial Intelligence: 21st EPIA Conference on Artificial Intelligence, EPIA 2022, Lisbon, Portugal, August 31–September 2, 2022, Proceedings
2022 · BOOK · en
This work proposes a spatiotemporal clustering approach for the analysis of coastal upwelling from Sea Surface Temperature (SST) grid maps derived from satellite images. The algorithm, Core-Shell clustering, models the upwelling as an evolving cluster whose core points are constant during a certain time window while the shell points move through an in-and-out binary sequence. Minimizing the least-squares clustering criterion allows the key parameters to be derived automatically. The algorithm is initialized with STSEC, an extension of Seeded Region Growing with self-tuning thresholding, which precisely delineates the upwelling region in each instantaneous SST map. Applying STSEC to the SST grid maps as temporal data also automates the search for relatively stable “time windows”, here called “time ranges”, over which the core clusters are obtained. Experiments with three yearly collections of SST data for the Portuguese coast showed that the core-shell clusters precisely recognize the upwelling regions: taking the STSEC segmentations as ground truth, the Kulczynski similarity scores exceeded 98%. The extracted time series of upwelling features also showed consistent regularities across the three independent upwelling seasons.
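The evaluation above compares cluster masks against STSEC segmentations with a Kulczynski similarity score. A minimal sketch of such a score for two binary masks, assuming it is the second Kulczynski coefficient (the mean of the two conditional overlap proportions |A∩B|/|A| and |A∩B|/|B|); the function name and the toy masks are hypothetical, and the paper's exact variant may differ.

```python
def kulczynski(mask_a, mask_b):
    """Kulczynski similarity of two binary masks with the same number of cells.

    Assumed definition: the mean of |A & B| / |A| and |A & B| / |B|,
    where A and B are the sets of cells marked truthy in each mask.
    """
    flat_a = [bool(x) for row in mask_a for x in row]
    flat_b = [bool(x) for row in mask_b for x in row]
    if len(flat_a) != len(flat_b):
        raise ValueError("masks must have the same number of cells")
    both = sum(a and b for a, b in zip(flat_a, flat_b))  # |A & B|
    size_a, size_b = sum(flat_a), sum(flat_b)
    if size_a == 0 or size_b == 0:
        raise ValueError("both masks must contain at least one marked cell")
    return 0.5 * (both / size_a + both / size_b)
```

For example, a cluster of 4 cells lying entirely inside a ground-truth region of 5 cells scores 0.5 × (4/4 + 4/5) = 0.9. Because the score averages both directions of overlap, it penalizes both missed cells and extra cells.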
A Method for Community Detection in Networks with Mixed Scale Features at Its Nodes
2021 · CHAPTER · en
The problem of community detection in a network with features at its nodes takes into account both the graph structure and the node features. The goal is to find relatively dense groups of interconnected entities that share some features. Algorithms based on probabilistic community models require the node features to be categorical. We use a data-driven model that combines least-squares data recovery criteria for both the graph structure and the node features. This allows us to take into account both quantitative and categorical features. After deriving an equivalent complementary criterion to optimize, we apply a greedy algorithm that detects communities one at a time. We show experimentally that the proposed method is effective on both real-world and synthetic data. For categorical attributes, we compare our approach with state-of-the-art algorithms; our algorithm appears competitive with them.
Least-squares community extraction in feature-rich networks using similarity data
2021 · ARTICLE · en
We explore a doubly greedy approach to community detection in feature-rich networks. In this approach, both the network and the feature data are straightforwardly recovered from the underlying unknown non-overlapping communities, each supplied with a center in the feature space and intensity weight(s) over the network. Our least-squares additive criterion allows us to search for communities one by one and to build each community by adding entities one by one. A focus of this paper is that the feature-space part of the data is converted into a similarity matrix. The similarity/link values can be used in either of two modes: (a) as measured on the same scale, so that similarity values can be meaningfully compared and summed across the entire similarity matrix (summability mode), or (b) as values in one column that should not be compared with those in other columns (nonsummability mode). The two input matrices and two modes lead us to develop four Iterative Community Extraction from Similarity data (ICESi) algorithms, which determine the number of communities automatically. Our experiments on real-world and synthetic datasets show that these algorithms are valid and competitive.
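The one-by-one search described above can be illustrated, in heavily simplified form, by the sketch below. It is not the ICESi algorithms. It uses a single similarity matrix, centers it by subtracting the mean off-diagonal similarity (a hypothetical stand-in for the summability mode), zeroes the diagonal, and grows each cluster S from its most similar pair by greedily adding the entity that most increases f(S) = (sum of a_ij over i, j in S) / |S|, which equals the mean within-cluster similarity times |S| - 1. Found clusters are removed from the pool and the search repeats until no pair with above-average similarity remains, so the number of clusters is not fixed in advance; that stopping rule and all function names are assumptions.

```python
def center_global(a):
    """Subtract the mean off-diagonal similarity and zero the diagonal (hypothetical preprocessing)."""
    n = len(a)
    off = [a[i][j] for i in range(n) for j in range(n) if i != j]
    mu = sum(off) / len(off)
    return [[0.0 if i == j else a[i][j] - mu for j in range(n)] for i in range(n)]


def score(a, s):
    """f(S) = (sum of a_ij over i, j in S) / |S|; diagonal entries are zero."""
    return sum(a[i][j] for i in s for j in s) / len(s)


def grow_cluster(a, pool):
    """Greedily grow one cluster, starting from the most similar pair in `pool`."""
    pairs = [(a[i][j], i, j) for i in pool for j in pool if i < j]
    best, i, j = max(pairs)
    if best <= 0:
        return None  # no above-average pair left: stop extracting
    cluster = {i, j}
    current = score(a, cluster)
    while True:
        candidates = [(score(a, cluster | {k}), k) for k in pool if k not in cluster]
        if not candidates:
            break
        value, k = max(candidates)
        if value <= current:
            break  # no single addition improves the criterion
        cluster.add(k)
        current = value
    return cluster


def extract_communities(a):
    """Extract clusters one by one, removing each found cluster from the pool."""
    a = center_global(a)
    pool = set(range(len(a)))
    communities = []
    while len(pool) >= 2:
        cluster = grow_cluster(a, pool)
        if cluster is None:
            break
        communities.append(sorted(cluster))
        pool -= cluster
    return communities
```

On a matrix with two blocks of mutually similar entities and low similarity between blocks, the function returns the two blocks as separate communities; entities left over when the loop stops stay unassigned.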
Summable and nonsummable data‐driven models for community detection in feature‐rich networks
2021 · ARTICLE · en
A feature-rich network is a network whose nodes are characterized by categorical or quantitative features. We propose a data-driven model for finding a partition of the nodes that approximates both the network link data and the feature data. The model involves summary quantitative characteristics of both network links and features. We distinguish between two modes of using the network link data. One mode postulates that the link values are comparable and summable across the network (summability); the other models the case in which different nodes represent different measurement systems, so that the link data are neither comparable nor summable across different nodes (nonsummability). We derive a Pythagorean decomposition of the combined data scatter involving our least-squares data recovery criterion. We address the equivalent problem of maximizing its complementary part, the contribution of a found partition to the combined data scatter, and follow a doubly greedy strategy to do so: first, communities are found one by one, and second, entities are added one by one while a community is being identified. Our algorithms determine the number of clusters automatically. The nonsummability version proves to have a niche of its own and is also faster than the other version. In our experiments, both versions appear competitive on generated synthetic data sets and on six real-world data sets from the literature.
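In its simplest form, for feature data alone, a Pythagorean decomposition of the data scatter is the familiar identity for a partition whose cluster centers c_k are the cluster means: the total scatter T = Σ_i ||y_i||² splits into an explained part Σ_k N_k ||c_k||² (the contribution of the partition) and a residual Σ_k Σ_{i∈S_k} ||y_i − c_k||², so minimizing the residual is equivalent to maximizing the contribution. The sketch below checks this identity numerically; it covers only the feature part, not the combined link-plus-feature scatter of the paper, and the function names are hypothetical.

```python
def squared_norm(v):
    """Sum of squared coordinates of a vector."""
    return sum(x * x for x in v)


def scatter_decomposition(points, labels):
    """Return (total, explained, residual) scatter for a partition.

    total     = sum_i ||y_i||^2
    explained = sum_k N_k * ||c_k||^2, with c_k the mean of cluster k
    residual  = sum_k sum_{i in cluster k} ||y_i - c_k||^2
    """
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    total = sum(squared_norm(p) for p in points)
    explained = 0.0
    residual = 0.0
    for members in clusters.values():
        n_k = len(members)
        center = [sum(coord) / n_k for coord in zip(*members)]
        explained += n_k * squared_norm(center)
        residual += sum(squared_norm([x - c for x, c in zip(p, center)])
                        for p in members)
    return total, explained, residual
```

For the points (1, 2), (3, 0), (-1, 4), (2, 2) with labels [0, 0, 1, 1], the function returns (39.0, 28.5, 10.5), and 28.5 + 10.5 = 39.0. The identity holds for any partition as long as each center is its cluster's mean.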
An Extension of K-Means for Least-Squares Community Detection in Feature-Rich Networks
2021 in press · CHAPTER · en
We propose an extension of the celebrated K-means algorithm for community detection in feature-rich networks. Our least-squares criterion leads to a straightforward extension of the conventional batch K-means clustering method as an alternating optimization strategy for the criterion. By replacing the innate squared Euclidean distance with cosine distance, we effectively tackle the so-called curse of dimensionality. We compare the proposed methods with state-of-the-art algorithms from the literature on synthetic and real-world data. The cosine distance-based version appears to be the overall winner, especially on larger datasets.
Detecting Communities in Feature-Rich Networks with a K-Means Method
2021 in press · CHAPTER · en
The main result of this paper is an extension of the K-means algorithm to community detection in feature-rich networks. It is based on a data-recovery criterion that additively combines conventional least-squares criteria for approximating the network link data and the feature data at the network nodes. The dimension of the space in which the method operates is the sum of the number of nodes and the number of features, which may be quite high. To tackle the so-called curse of dimensionality, we replace the innate Euclidean distance with cosine distance. We experimentally validate the proposed methods and demonstrate their efficiency by comparing them with the most popular approaches on both synthetic and real-world data.
Community detection in feature-rich networks to meet K-means
2021 · CHAPTER · en
We derive two extensions of the celebrated K-means algorithm as a tool for community detection in feature-rich networks. We define a data-recovery criterion that additively combines conventional least-squares criteria for approximating the network link data and the feature data at the network nodes by a partition along with its within-cluster "centers". The dimension of the space in which the method operates is the sum of the number of nodes and the number of features, which may be quite high. To tackle the so-called curse of dimensionality, we may in some cases replace the innate Euclidean distance with cosine distance. We experimentally validate the proposed methods and demonstrate their efficiency by comparing them with the most popular approaches.
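A minimal sketch of the idea these K-means abstracts share, assuming each node is represented by the concatenation of its adjacency row and its feature vector (so the space has as many dimensions as there are nodes plus features), and that the cosine variant assigns each node to the center with the highest cosine similarity and then recomputes each center as the mean of its members. The function names, the absence of any feature standardization or weighting, and the toy data are assumptions, not the authors' exact procedure.

```python
import math
import random


def cosine_similarity(u, v):
    """Cosine of the angle between u and v (0.0 if either vector is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def kmeans_feature_rich(adjacency, features, k, iters=50, seed=0):
    """Batch K-means with cosine distance on [adjacency row | feature row]."""
    # One vector per node: its links to all nodes, followed by its features.
    rows = [list(map(float, a)) + list(map(float, f))
            for a, f in zip(adjacency, features)]
    rng = random.Random(seed)
    centers = [list(r) for r in rng.sample(rows, k)]  # k distinct nodes as seeds
    labels = None
    for _ in range(iters):
        # Assignment step: nearest center in cosine distance (1 - similarity).
        new_labels = [max(range(k), key=lambda c: cosine_similarity(r, centers[c]))
                      for r in rows]
        if new_labels == labels:
            break  # partition is stable
        labels = new_labels
        # Update step: each center becomes the mean of its members.
        for c in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:  # keep the old center if a cluster empties
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels
```

Cosine similarity ignores vector length, so a node with many links and a node with few links can still fall into the same cluster when their link and feature profiles point in the same direction; this is one reason the cosine variant is attractive in high-dimensional spaces.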
Courses (6)
- Basics of Data Analysis · 3 times
  2025/2026, 2024/2025, 2023/2024 · Master's / Mago-Lego · English
- Basic Methods of Data Analysis · 5 times
  2025/2026, 2024/2025, 2023/2024, 2022/2023, 2021/2022 · Bachelor's / Mago-Lego · Russian
- Mentor's Seminar · 3 times
  2025/2026, 2024/2025, 2023/2024 · Master's · English
- Data Analysis
  2023/2024 · Master's / Mago-Lego · English
- 01.04.02 Applied Mathematics and Informatics
  2022/2023 · Master's · English
- Modern Methods of Data Analysis · 2 times
  2022/2023, 2021/2022 · Master's / Mago-Lego · English