Sergey Igorevich Nikolenko
Faculty of Computer Science
Professional Interests
Positions
- Professor, Faculty of Computer Science, Department of Data Analysis and Artificial Intelligence
Bio
- Joined HSE University in 2023.
Education
- 2009 · Candidate of Physical and Mathematical Sciences: Saint Petersburg State University, specialty 01.01.06 "Mathematical Logic, Algebra, and Number Theory", dissertation topic: New Constructions of Cryptographic Primitives Based on Semigroups, Groups, and Linear Algebra
- 2005 · Specialist degree: Saint Petersburg State University, specialty "Mathematics", qualification "Mathematician"
Work Experience
- 2005–2008 · PhD student, Laboratory of Mathematical Logic, PDMI RAS, Saint Petersburg
- 2006–2010 · Teaching assistant, SPbSU ITMO, Saint Petersburg
- 2008–2010 · Senior researcher, Speech Technology Center, Saint Petersburg
- 2011–2012 · Senior researcher, Laboratory of Algorithmic Biology, St. Petersburg Academic University of RAS, Saint Petersburg
- 2011–2014 · Director of Development, Surfingbird, Moscow
- 2008–... · Associate professor, St. Petersburg Academic University of RAS, Saint Petersburg
- 2008–... · Researcher, Laboratory of Mathematical Logic, PDMI RAS, Saint Petersburg
Awards and Honors
- Salary bonus for a publication in an international peer-reviewed venue (2021–2022, 2020–2022, 2018–2020)
- Salary bonus for an article in a foreign peer-reviewed journal (2015–2017, 2013–2015)
- Best Teacher: 2020–2021, 2017
Grants and Projects
- For the degree of Candidate of Sciences
Researcher Identifiers
- ORCID: 0000-0001-7787-2251
- ResearcherID: I-7696-2013
- RSCI SPIN: 8186-1253
- Google Scholar: http://scholar.google.ru/citations?&user=_lk95cEAAAAJ
- Scopus AuthorID: 13608710100
Publications (89)
User Profiling in Text-Based Recommender Systems Based on Distributed Word Representations
2017 · CHAPTER · en
We introduce a novel approach to constructing user profiles for recommender systems based on full-text items such as posts in a social network and implicit ratings (in the form of likes) that users give them. The profiles measure a user’s interest in various topics mined from the full texts of the items. As a result, we get a user profile that can be used for cold start recommendations for items, targeted advertisement, and other purposes. Our experiments show that the method performs on a level comparable with classical collaborative filtering algorithms while at the same time being a cold start approach, i.e., it does not use the likes of an item being recommended.
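The abstract describes user profiles built from the full texts of liked items and then used to score new items without looking at their likes. A minimal sketch of that general idea follows, under hypothetical simplifications: an item is the average of its word vectors, a profile is the average of the liked items, and a new item is scored by cosine similarity. The chapter itself mines topics from the texts, which this sketch skips; the toy vectors and all function names are invented for illustration.

```python
import math

# Hypothetical toy 3-dimensional embeddings; a real system would load pretrained vectors.
EMBEDDINGS = {
    "football": [1.0, 0.0, 0.0],
    "match": [0.9, 0.1, 0.0],
    "election": [0.0, 1.0, 0.0],
    "vote": [0.1, 0.9, 0.0],
    "recipe": [0.0, 0.0, 1.0],
}
DIM = 3


def mean_vector(vectors):
    """Component-wise mean; the zero vector if there is nothing to average."""
    if not vectors:
        return [0.0] * DIM
    return [sum(v[k] for v in vectors) / len(vectors) for k in range(DIM)]


def item_vector(text):
    """Represent an item by the mean embedding of the known words in its full text."""
    return mean_vector([EMBEDDINGS[w] for w in text.lower().split() if w in EMBEDDINGS])


def user_profile(liked_texts):
    """Profile = mean vector of the items the user liked (implicit ratings)."""
    return mean_vector([item_vector(t) for t in liked_texts])


def cold_start_score(profile, new_text):
    """Cosine similarity between profile and a new item; uses only the item's text."""
    v = item_vector(new_text)
    dot = sum(a * b for a, b in zip(profile, v))
    norm = math.sqrt(sum(a * a for a in profile)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0
```

For a user who liked "Football match" and "match", `cold_start_score` ranks a new post about football well above one about an election, even though neither new post has any likes yet.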
FDR-controlled metabolite annotation for high-resolution imaging mass spectrometry
2017 · ARTICLE · en
High-mass-resolution imaging mass spectrometry promises to localize hundreds of metabolites in tissues, cell cultures, and agar plates with cellular resolution, but it is hampered by the lack of bioinformatics tools for automated metabolite identification. We report pySM, a framework for false discovery rate (FDR)-controlled metabolite annotation at the level of the molecular sum formula, for high-mass-resolution imaging mass spectrometry (https://github.com/alexandrovteam/pySM). We introduce a metabolite-signal match score and a target–decoy FDR estimate for spatial metabolomics.
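The target–decoy idea can be illustrated in its generic form: real candidate annotations (targets) and deliberately implausible ones (decoys) are scored the same way, and the FDR at a score threshold is estimated as the ratio of decoys to targets that pass it. This sketch assumes one decoy per target and does not reproduce pySM's actual decoy generation or its metabolite-signal match score; the function names are hypothetical.

```python
def estimate_fdr(target_scores, decoy_scores, threshold):
    """Estimated FDR = (#decoys passing) / (#targets passing), assuming one decoy per target."""
    n_targets = sum(s >= threshold for s in target_scores)
    n_decoys = sum(s >= threshold for s in decoy_scores)
    if n_targets == 0:
        return 0.0
    return min(1.0, n_decoys / n_targets)


def annotate_at_fdr(target_scores, decoy_scores, max_fdr=0.1):
    """Return target scores kept at the lowest threshold whose estimated FDR is acceptable."""
    # Lower thresholds keep more annotations, so try candidate thresholds in increasing order.
    for t in sorted(set(target_scores)):
        if estimate_fdr(target_scores, decoy_scores, t) <= max_fdr:
            return [s for s in target_scores if s >= t]
    return []
```

Because the estimate is not monotone in the threshold, the sketch scans every candidate threshold rather than stopping at the first one that fails.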
Automated Prediction of Demographic Information from Medical User Reviews
2017 · CHAPTER · en
The advent of personalized medicine and wide-scale drug tests has led to the development of methods intended to automatically mine and extract information regarding drug reactions from user reviews. For medical purposes, it is often important to know demographic information on the authors of these reviews; however, existing studies usually either presuppose that this information is available or disregard the issue. We study automatic mining of demographic information from user-generated texts, comparing modern natural language processing techniques, including extensions of topic models and deep neural networks, for this problem on datasets mined from health-related web sites.
Measuring Prejudice and Ethnic Tensions in User-Generated Content
2017, in press · ARTICLE · en
With the spread of social media, ethnic prejudice is becoming publicly available to widening audiences and may have serious offline consequences. This creates demand to detect prejudice and other signs of ethnic tension in user-generated texts, a task fundamentally different from measuring prejudice with surveys, the approach traditionally developed in psychology. In this work we use a hand-coding instrument based on psychological definitions of prejudice and sociological methods of questionnaire construction. Compared to our previous research, we double our hand-coded collection, which now reaches 14,998 unique user texts retrieved from Russian-language social media. We then train computer classification algorithms to “guess” prejudice as detected by human coders and show significant improvement in quality compared to our earlier results. Still, as not all aspects of prejudice get detected sufficiently well, we analyze potential causes of low quality and outline directions for further improvement.
Who’s Bad? Attitudes Toward Resettlers From the Post-Soviet South Versus Other Nations in the Russian Blogosphere
2017 · ARTICLE · en
Communication in social media is increasingly being found to reproduce or even reinforce ethnic prejudice and hostility toward migrants. In 2010s Russia, which has the world’s second-largest immigrant population, polls have detected high levels of hostility of the Russian population toward migranty (migrants), a label attached to resettlers from Central Asia and the Caucasus. We tested the online hostility hypothesis using data from 363,000 posts from the Russian-language LiveJournal. We applied data mining, regression analysis, and selective interpretative reading to map bloggers’ attitudes toward migranty, among other ethnicities and nations. Our findings significantly alter the picture drawn from the polls: Migranty neither provoke the biggest amount of discussion nor experience the worst treatment in Russian blogs, in which Americans take the lead. Furthermore, Central Asians and North Caucasians are treated very differently.
Detecting interethnic relations with the data from social media
2017, in press · CHAPTER · en
The ability of social media to rapidly spread users’ judgments about ethnicity and to influence offline ethnic relations creates a need for methods that automatically monitor ethnicity-related online content. In this study, we measured the total volume of ethnicity-related discussion in Russian-language social media and developed an approach that automatically detects various aspects of attitudes toward these ethnic groups. To this end, we compiled an extensive list of ethnonyms and related bigrams covering 97 post-Soviet ethnic groups and collected all Russian-language social media texts that contain at least one word from the list and were written during a two-year period starting in 2014 (N = 2,660,222 texts). We manually annotated 7,181 messages and used them to train a number of classifiers to recognize various aspects of the authors’ attitudes toward ethnicities, as well as other features of the texts. Based on a range of standard quality metrics, we concluded that we achieved good quality in detecting intergroup conflict, positive intergroup contact, and general negative and positive sentiment. At the same time, prediction accuracy was lowest for whether a text concerns ethnicity at all and for the overall attitude toward an ethnic group. In addition, some aspects, such as calls for violence against an ethnic group, were underrepresented in the coded data, which prevented us from reliably detecting texts with these features.
The impact of processing order on performance: A taxonomy of semi-FIFO policies
2017 · ARTICLE · en
Modern network processors increasingly deal with packets that require heterogeneous processing. We consider a bounded-size input queue buffer where each packet requires several rounds of processing before transmission. Usually the transmission order of packets is induced by processing order, but processing order can have significant impact on the performance of buffer management policies even if the transmission order is fixed. We introduce the class of Semi-FIFO policies that decouple processing order from transmission order, restricting the latter to First-In-First-Out (FIFO). We build a taxonomy of Semi-FIFO policies and provide worst-case guarantees for different processing orders. We consider various special cases and properties of Semi-FIFO policies: greedy, work-conserving, lazy, and push-out policies, and show how these properties affect performance. We generalize our results to additional constraints related to copying cost and conduct a comprehensive simulation study that validates our results.
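The key point of the abstract, that processing order matters even when transmission order is fixed to FIFO, can be seen in a toy slotted simulation. This is a sketch under assumed rules, not any specific policy from the paper: one processing cycle per slot, greedy admission with no push-out, and only fully processed packets at the head of the queue may leave. The two processing-order functions are hypothetical examples.

```python
def simulate_semi_fifo(arrivals, buffer_size, choose):
    """Count transmitted packets; arrivals[t] lists the work of packets arriving in slot t."""
    queue = []  # remaining work per buffered packet, kept in arrival (FIFO) order
    transmitted = 0
    for batch in arrivals:
        # Greedy admission without push-out: accept while there is room, drop otherwise.
        for work in batch:
            if len(queue) < buffer_size:
                queue.append(work)
        # One processing cycle goes to the packet chosen by the processing-order policy.
        pending = [i for i, w in enumerate(queue) if w > 0]
        if pending:
            queue[choose(queue, pending)] -= 1
        # Transmission stays FIFO: only fully processed packets at the head can leave.
        while queue and queue[0] == 0:
            queue.pop(0)
            transmitted += 1
    return transmitted


def process_head_first(queue, pending):
    """Processing order identical to transmission order."""
    return pending[0]


def process_newest_first(queue, pending):
    """Process the most recently admitted unfinished packet first."""
    return pending[-1]


# Arrivals in slots 0..2, then idle slots so the buffer can drain.
ARRIVALS = [[2, 1], [1], [1], [], [], []]
```

With `ARRIVALS` and `buffer_size=2`, head-first processing transmits 3 packets while newest-first transmits only 2: delaying the head keeps the buffer full longer, so later arrivals are dropped.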
Efficient FIB Representations on Distributed Platforms
2017 · ARTICLE · en
The Internet routing ecosystem is facing substantial scalability challenges due to continuous, significant growth of the state represented in the data plane. Distributed switch architectures introduce additional constraints on the efficiency of implementations, in terms of both lookup time and memory footprint. In this paper we explore efficient forwarding information base (FIB) representations in common distributed switch architectures. Our approach introduces substantial savings in memory footprint transparently for existing hardware. Our results are supported by an extensive simulation study on real IPv4 and IPv6 FIBs.
Network simplification preserving bandwidth and routing capabilities
2017 · CHAPTER · en
We introduce structural transformations that allow simplifying a given network while preserving its original “bandwidth” and “routing” capabilities, transparently to specific allocations. We minimize a certain objective such as the aggregate capacity of network links, number of nodes, or number of links, in such a way that all the bandwidth that could be routed in the original network can also be routed in the reduced one. This improves cost-efficiency for both inter- and intra-datacenter connections and simplifies network management. We also identify a fundamental tradeoff between extra added capacity and simplicity of representation for a given network. Our analytic results are supported by extensive simulation results on hundreds of real network topologies. One result is that by adding 10-30% extra capacity to evaluated real-world networks one can simplify them down to a star topology with a single switch, while all routing and bandwidth allocation decisions on the simplified topology can be mapped back to the original network. This is an important step towards simplifying network management via a reduced virtualized network infrastructure.
Throughput optimization with latency constraints
2017 · CHAPTER · en
Modern datacenters are increasingly required to deal with latency-sensitive applications. A major question here is how to represent latency in desired objectives. Incorporation of multiple traffic characteristics (e.g., packet values and processing requirements) significantly increases the complexity of buffer management policies. In this work, we consider weighted throughput optimization (total transmitted value) in the setting where every incoming packet is branded with intrinsic value, required processing, and slack (an offset from the arrival time when a packet should be transmitted), and the buffer is unbounded but effectively bounded by slacks. The main result is a 3-competitive algorithm as the slack-to-work ratio increases. Our results are supported by a comprehensive evaluation study on CAIDA network traces.
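To make the setting concrete (value, required work, and slack per packet; an unbounded buffer that slack effectively bounds), here is a toy slotted simulation with a simple greedy rule. It is an illustrative heuristic with no competitiveness guarantee, not the paper's 3-competitive algorithm; the `Packet` fields, the deadline convention, and the value-per-remaining-cycle rule are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    arrival: int   # slot in which the packet arrives
    value: float   # value gained if the packet finishes in time
    work: int      # required processing cycles
    slack: int     # assumed convention: must finish by time arrival + slack


def greedy_value(packets, horizon):
    """Total transmitted value under a hypothetical value-per-remaining-cycle greedy rule."""
    remaining = {}  # packet index -> remaining work, for packets in the buffer
    total = 0.0
    for t in range(horizon):
        for i, p in enumerate(packets):
            if p.arrival == t:
                remaining[i] = p.work
        # Slack bounds the buffer: a packet with r cycles left at slot t finishes
        # no earlier than t + r, so drop it if that misses its deadline.
        for i in list(remaining):
            p = packets[i]
            if t + remaining[i] > p.arrival + p.slack:
                del remaining[i]
        if not remaining:
            continue
        # Spend this slot's single processing cycle on the best value-per-cycle packet.
        best = max(remaining, key=lambda i: packets[i].value / remaining[i])
        remaining[best] -= 1
        if remaining[best] == 0:
            del remaining[best]
            total += packets[best].value
    return total
```

For example, with `Packet(0, 3, 2, 2)` and `Packet(0, 1, 1, 1)`, the rule processes the first packet twice and lets the second expire, earning a value of 3.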
Courses (3)
- Machine Learning · 4 times
  2025/2026, 2024/2025, 2023/2024, 2022/2023 · Master's / Mago-Lego · Russian
- Deep Generative Models
  2022/2023 · Mago-Lego / Nizhny Novgorod · English
- 01.04.02 Applied Mathematics and Informatics
  2022/2023 · Master's · Russian