Sergey Igorevich Nikolenko (Николенко Сергей Игоревич)

Faculty of Computer Science

Publications: 89
Languages: 2
Awards: 3
Conferences: 0

Professional interests

  • 20.00.00 Informatics
  • 27.00.00 Mathematics

Positions

  • Professor, Faculty of Computer Science, Department of Data Analysis and Artificial Intelligence

Bio

  • Began working at HSE University (НИУ ВШЭ) in 2023.

Education

  • 2009 · Candidate of Sciences in Physics and Mathematics: Saint Petersburg State University, specialty 01.01.06 "Mathematical Logic, Algebra, and Number Theory", dissertation topic: New constructions of cryptographic primitives based on semigroups, groups, and linear algebra
  • 2005 · Specialist degree: Saint Petersburg State University, specialty "Mathematics", qualification "Mathematician"

Work experience

  • 2005-2008: PhD student, Laboratory of Mathematical Logic, PDMI RAS (ПОМИ РАН), Saint Petersburg
  • 2006-2010: teaching assistant, ITMO University (SPbGU ITMO), Saint Petersburg
  • 2008-2010: senior researcher, Speech Technology Center (Центр речевых технологий), Saint Petersburg
  • 2011-2012: senior researcher, Algorithmic Biology Laboratory, SPbAU RAS (СПбАУ РАН), Saint Petersburg
  • 2011-2014: director of development, Surfingbird, Moscow
  • 2008-...: associate professor, SPbAU RAS, Saint Petersburg
  • 2008-...: researcher, Laboratory of Mathematical Logic, PDMI RAS, Saint Petersburg

Awards and honors

  • Bonus for a publication in an international peer-reviewed academic outlet (2021–2022, 2020–2022, 2018–2020)
  • Bonus for an article in a foreign peer-reviewed journal (2015–2017, 2013–2015)
  • Best Teacher: 2020–2021, 2017

Grants and projects

  • for the degree of Candidate of Sciences

Researcher identifiers

Publications (89)

druGAN: An Advanced Generative Adversarial Autoencoder Model for de Novo Generation of New Molecules with Desired Molecular Properties in Silico

2017 · ARTICLE · en

Deep generative adversarial networks (GANs) are the emerging technology in drug discovery and biomarker development. In our recent work, we demonstrated a proof-of-concept of implementing deep generative adversarial autoencoder (AAE) to identify new molecular fingerprints with predefined anticancer properties. Another popular generative model is the variational autoencoder (VAE), which is based on deep neural architectures. In this work, we developed an advanced AAE model for molecular feature extraction problems, and demonstrated its advantages compared to VAE in terms of (a) adjustability in generating molecular fingerprints; (b) capacity of processing very large molecular data sets; and (c) efficiency in unsupervised pretraining for regression model. Our results suggest that the proposed AAE model significantly enhances the capacity and efficiency of development of the new molecules with specific anticancer properties using the deep generative models.

Online Scheduling FIFO Policies with Admission and Push-Out

2016 · ARTICLE · en

We consider the problem of managing a bounded size First-In-First-Out (FIFO) queue buffer, where each incoming unit-sized packet requires several rounds of processing before it can be transmitted out. Our objective is to maximize the total number of successfully transmitted packets. We consider both push-out (when a policy is permitted to drop already admitted packets) and non-push-out cases. We provide worst-case guarantees for the throughput performance of our algorithms, proving both lower and upper bounds on their competitive ratio against the optimal algorithm, and conduct a comprehensive simulation study that experimentally validates predicted theoretical behavior.
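As a rough illustration of the setting in this abstract, the sketch below simulates a bounded FIFO buffer in which each packet needs several processing cycles, with and without push-out. It is not the authors' algorithm: the function name `simulate_fifo`, the one-cycle-per-slot processing model, and the push-out rule (evict the buffered packet with the most remaining work when the newcomer needs less) are all assumptions made for this example.

```python
from collections import deque


def simulate_fifo(arrivals, capacity, push_out=False):
    """Return the number of packets transmitted from a bounded FIFO buffer.

    arrivals[t] lists the required processing cycles of the unit-sized
    packets arriving in time slot t. (Hypothetical model for illustration.)
    """
    buffer = deque()  # each entry is a packet's remaining processing cycles
    transmitted = 0
    for slot_arrivals in arrivals:
        # Arrival phase: admit, push out, or drop each incoming packet.
        for work in slot_arrivals:
            if len(buffer) < capacity:
                buffer.append(work)
            elif push_out:
                # Assumed rule: evict the packet with the most remaining work
                # if the new packet needs strictly less work than it does.
                worst = max(range(len(buffer)), key=lambda i: buffer[i])
                if buffer[worst] > work:
                    del buffer[worst]
                    buffer.append(work)
            # Otherwise the buffer is full and the new packet is dropped.
        # Processing phase: one cycle for the head-of-line packet.
        if buffer:
            buffer[0] -= 1
            if buffer[0] == 0:
                buffer.popleft()
                transmitted += 1
    return transmitted


trace = [[3, 3, 1], [1], [1], [], []]
print(simulate_fifo(trace, capacity=2, push_out=False))  # 1
print(simulate_fifo(trace, capacity=2, push_out=True))   # 3
```

On this toy trace the push-out variant transmits more packets because it replaces heavy packets with light ones; the paper itself studies worst-case competitive ratios rather than single traces.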

Detecting Opinion polarisation on Twitter by constructing pseudo-bimodal networks of mentions and retweets.

2016 · CHAPTER · en

We present a novel approach to analyze and visualize opinion polarisation on Twitter based on graph features of communication networks extracted from tweets. We show that opinion polarisation can be legibly observed on unimodal projections of artificially created bimodal networks, where the most popular users in retweet and mention networks are considered nodes of the second mode. For this purpose, we select a subset of top users based on their PageRank values and assign them to be the second mode in our networks, thus called pseudo-bimodal. After projecting them onto the set of “bottom” users and vice versa, we get unimodal networks with more distinct clusters and visually coherent community separation. We developed our approach on a dataset gathered during the Russian protest meetings on 24th of December, 2011 and tested it on another dataset by Conover [13] used to analyze political polarisation, showing that our approach not only works well on our data but also improves the results from previous research on that phenomenon.

Exploiting Order Independence for Scalable and Expressive Packet Classification

2016 · ARTICLE · en

Efficient packet classification is a core concern for network services. Traditional multi-field classification approaches, in both software and ternary content-addressable memory (TCAMs), entail tradeoffs between (memory) space and (lookup) time. TCAMs cannot efficiently represent range rules, a common class of classification rules confining values of packet fields to given ranges. The exponential space growth of TCAM entries relative to the number of fields is exacerbated when multiple fields contain ranges. In this work, we present a novel approach which identifies properties of many classifiers which can be implemented in linear space and with worst-case guaranteed logarithmic time and allows the addition of more fields including range constraints without impacting space and time complexities. On real-life classifiers from Cisco Systems and additional classifiers from ClassBench (with real parameters), 90–95% of rules are thus handled, and the other 5–10% of rules can be stored in TCAM to be processed in parallel.

Online recommender system for radio station hosting based on information fusion and adaptive tag-aware profiling

2016 · ARTICLE · en

We present a new recommender system developed for the Russian interactive radio network FMhost. To the best of our knowledge, it is the first model and associated case study for recommending radio stations hosted by real DJs rather than automatically built streamed playlists. To address such problems as cold start, gray sheep, boosting of rankings, preference and repertoire dynamics, and absence of explicit feedback, the underlying model combines a collaborative user-based approach with personalized information from tags of listened tracks in order to match user and radio station profiles. This is made possible with adaptive tag-aware profiling that follows an online learning strategy based on user history. We compare the proposed algorithms with singular value decomposition (SVD) in terms of precision, recall, and normalized discounted cumulative gain (NDCG) measures; experiments show that in our case the fusion-based approach demonstrates the best results. In addition, we give a theoretical analysis of some useful properties of fusion-based linear combination methods in terms of graded ordered sets.
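The evaluation measures named in this abstract (precision, recall, NDCG) can be sketched as follows. This is a minimal illustration under common textbook definitions, not the paper's evaluation code; the function names are hypothetical, and the ideal ranking is assumed to be the given relevance list sorted in descending order.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain with a log2(rank + 1) discount, ranks from 1."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances))


def ndcg_at_k(ranked_relevances, k):
    """NDCG@k; the ideal ordering is the same list sorted by relevance (assumption)."""
    ideal_dcg = dcg(sorted(ranked_relevances, reverse=True)[:k])
    if ideal_dcg == 0:
        return 0.0
    return dcg(ranked_relevances[:k]) / ideal_dcg


def precision_recall_at_k(recommended, relevant, k):
    """Precision@k and recall@k for a ranked list against a set of relevant items."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for item in recommended[:k] if item in relevant)
    precision = hits / k
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


print(ndcg_at_k([3, 2, 1], k=3))  # 1.0, the list is already ideally ordered
print(precision_recall_at_k(["a", "b", "c", "d"], {"a", "c", "e"}, k=2))
```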

Communities of co-commenting in the Russian LiveJournal and their topical coherence

2016 · ARTICLE · en

Purpose – The paper addresses the problem of what drives the formation of latent discussion communities, if any, in the blogosphere: topical composition of posts or their authorship? The purpose of this paper is to contribute to the knowledge about structure of co-commenting. Design/methodology/approach – The research is based on a dataset of 17,386 full text posts written by top 2,000 LiveJournal bloggers and over 520,000 comments that result in about 4.5 million edges in the network of co-commenting, where posts are vertices. The Louvain algorithm is used to detect communities of co-commenting. Cosine similarity and topic modeling based on latent Dirichlet allocation are applied to study topical coherence within these communities. Findings – Bloggers unite into moderately manifest communities by commenting roughly the same sets of posts. The graph of co-commenting is sparse and connected by a minority of active non-top commenters. Communities are centered mainly around blog authors as opinion leaders and, to a lesser extent, around a shared topic or topics. Research limitations/implications – The research has to be replicated on other datasets with more thorough hand coding to ensure the reliability of results and to reveal average proportions of topic-centered communities. Practical implications – Knowledge about factors around which co-commenting communities emerge, in particular clustered opinion leaders that often attract such communities, can be used by policy makers in marketing and/or political campaigning when individual leadership is not enough or not applicable. Originality/value – The research contributes to the social studies of online communities. It is the first study of communities based on co-commenting that combines examination of the content of commented posts and their topics.

BASEL (Buffer mAnagement SpEcification Language)

2016 · CHAPTER · en

Buffering architectures and policies for their efficient management constitute one of the core ingredients of a network architecture. In this work we introduce a new specification language, BASEL, that allows expressing virtual buffering architectures and management policies representing a variety of economic models. BASEL does not require the user to implement policies in a high-level language; rather, the entire buffering architecture and its policy are reduced to several comparators and simple functions. We show examples of buffer management policies in BASEL and demonstrate empirically the impact of various settings on performance.

Gibbs Sampler Optimization for Analysis of a Granulated Medium

2016 · ARTICLE · ru

We propose a new variation of the method for reconstructing probability distribution densities for topic modeling problems. We examine the shortcomings of the Gibbs sampling algorithm and propose a modified version, the granulated sampling method. Statistical simulation shows that the proposed algorithm is more stable than two other variants of the sampling algorithm.

Topic Quality Metrics Based on Distributed Word Representations

2016 · CHAPTER · en

Automated evaluation of topic quality remains an important unsolved problem in topic modeling and represents a major obstacle for development and evaluation of new topic models. Previous attempts at the problem have been formulated as variations on the coherence and/or mutual information of top words in a topic. In this work, we propose several new metrics for evaluating topic quality with the help of distributed word representations; our experiments suggest that the new metrics are a better match for human judgement, which is the gold standard in this case, than previously developed approaches.
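One simple way to score a topic with distributed word representations is the average pairwise cosine similarity of its top words' vectors. The sketch below illustrates that general idea only; it is not one of the metrics proposed in the paper, and the function names and the toy two-dimensional vectors are assumptions made for this example.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def topic_quality(top_words, vectors):
    """Average pairwise cosine similarity of the top words that have vectors."""
    known = [vectors[w] for w in top_words if w in vectors]
    pairs = list(combinations(known, 2))
    if not pairs:
        return 0.0
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)


# Toy vectors (hypothetical); real use would load pretrained embeddings.
toy_vectors = {"cat": [1.0, 0.0], "dog": [1.0, 0.0], "car": [0.0, 1.0]}
print(topic_quality(["cat", "dog"], toy_vectors))  # 1.0
print(topic_quality(["cat", "car"], toy_vectors))  # 0.0
```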

Large profits or fast gains: A dilemma in maximizing throughput with applications to network processors

2016 · ARTICLE · en

We consider the fundamental problem of managing a bounded size queue buffer where traffic consists of packets of varying size, each packet requires several rounds of processing before it can be transmitted out, and the goal is to maximize the throughput, i.e., total size of successfully transmitted packets. Our work addresses the tension between two conflicting algorithmic approaches: favoring packets with fewer processing requirements as opposed to packets of larger size. We present a novel model for studying such systems and study the performance of online algorithms that aim to maximize throughput.

Courses (3)