DSA Faculty

Браславский Павел Исаакович

Faculty of Computer Science

Publications: 50
Languages: 3
Awards: 2
Conferences: 0

Positions

  • Senior Research Fellow, Faculty of Computer Science, Laboratory for Models and Methods of Computational Pragmatics
  • Associate Professor, Faculty of Computer Science, Big Data and Information Retrieval School

Bio

  • Began working at HSE University in 2020.

Education

  • 2000 · Candidate of Sciences
  • 1997 · Specialist degree: Ural State Technical University, Yekaterinburg, specialty "Computers, Complexes, Systems and Networks", qualification "Systems Engineer"

Work experience

  • Yandex, SKB Kontur, JetBrains Research

Awards and honors

  • Bonus for a publication in a List A journal (or an equivalent scholarly publication) (2025–2026, 2023–2024)
  • Bonus for a publication in an international peer-reviewed scholarly publication (2022–2023, 2021–2022, 2020–2021)

Grants and projects

  • for the degree of Candidate of Sciences

Researcher identifiers

Publications (50)

Large Dataset and Language Model Fun-Tuning for Humor Recognition

2019 · CHAPTER · en

The task of humor recognition has attracted a lot of attention recently due to the need to process large amounts of user-generated text and the rise of conversational agents. We collected a dataset of jokes and funny dialogues in Russian from various online resources and carefully complemented them with unfunny texts with similar lexical properties. The dataset comprises more than 300,000 short texts, which is significantly larger than any previous humor-related corpus. Manual annotation of 2,000 items proved the reliability of the corpus construction approach. Further, we applied language model fine-tuning for text classification and obtained an F1 score of 0.91 on a test set, which constitutes a considerable gain over baseline methods. The dataset is freely available to the research community.
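For readers unfamiliar with the metric, the F1 score mentioned above is the harmonic mean of precision and recall. The following is a minimal sketch in Python; the function name `f1_score` and the toy labels are illustrative assumptions and are not taken from the paper or its dataset.

```python
def f1_score(y_true, y_pred, positive=1):
    """Compute F1 for the positive class from two equal-length label lists."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy labels: 1 = funny, 0 = not funny.
y_true = [1, 1, 1, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 1]
score = f1_score(y_true, y_pred)
print(score)
```

In this toy case there are three true positives, one false positive, and one false negative, so precision and recall are both 0.75.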

Words and Topics: Content Representations for Book Recommendation

2019 · CHAPTER · en

The paper describes an exploratory study on content-based book recommendation. We use a large dataset of book ratings along with book content. We experiment with several topic modeling variants and a tf.idf representation. Predictions based on one of the topic modeling variants slightly outperform a simple baseline of averaged book scores. The obtained results suggest that content features can potentially improve hybrid book recommender systems.

Log-Based Reading Speed Prediction: A Case Study on War and Peace

2019 · CHAPTER · en

In this exploratory study, we analyze reading behavior using logs from an ebook reading app. The logs contain users’ page turns along with time stamps and page sizes in characters. We focus on 17 readers of War and Peace by Leo Tolstoy who read at least 80% of the novel. We aim to learn a regression model for reading speed based on shallow textual features (e.g. word and sentence lengths) and contextual features (e.g. time of day and position in the book). Contextual features outperform textual ones and make it possible to predict reading speed with moderate quality. We share insights about the results and outline directions for future research. The analysis of reading behavior can be beneficial for school education, reading promotion, book recommendation, as well as for traditional creative writing and interactive fiction design.

Word Embedding for Semantically Related Words: An Experimental Study

2019 · ARTICLE · en

The ability to identify semantic relations between words has made the word2vec model widely used in NLP tasks. The idea of word2vec is based on a simple rule: two words are more similar if they occur in similar contexts. Each word is represented as a vector, so words whose vectors lie close together can be interpreted as similar. This makes it possible to extract semantic relations (synonymy, hypernymy, hyponymy, and others) automatically. Extracting semantic relations by hand is considered a time-consuming and biased task that requires the help of experts. Unfortunately, the word2vec model produces an associative list of words that does not consist of semantically related words only. In this paper, we present some additional criteria that may help solve this problem. Observations and experiments with well-known characteristics, such as word frequency and position in the associative list, might be useful for improving the extraction of semantic relations for the Russian language using word embeddings. In the experiments, we use a word2vec model trained on Flibusta, with pairs from Wiktionary serving as examples of semantic relationships. Semantically related words are applicable to thesauri, ontologies, and intelligent systems for natural language processing.
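The notion of an associative list can be illustrated with a small sketch: rank every other word by the cosine similarity of its vector to the target word's vector. This is a toy illustration only; the three-dimensional vectors, the vocabulary, and the helper names `cosine` and `associative_list` are assumptions made up for the example and do not come from the paper or from any trained word2vec model.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def associative_list(word, vectors, top_n=3):
    """Rank all other words by cosine similarity to `word`, best first."""
    target = vectors[word]
    scored = [
        (other, cosine(target, vec))
        for other, vec in vectors.items()
        if other != word
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Hypothetical toy embeddings (real models use hundreds of dimensions).
vectors = {
    "cat": [0.9, 0.1, 0.0],
    "kitten": [0.8, 0.2, 0.1],
    "dog": [0.7, 0.3, 0.0],
    "car": [0.0, 0.1, 0.9],
}
neighbors = associative_list("cat", vectors)
for rank, (word, sim) in enumerate(neighbors, start=1):
    print(rank, word, round(sim, 3))
```

The rank printed for each neighbor corresponds to the "position in an associative list" that the abstract names as one possible filtering criterion; a word such as "car" appears in the list despite being unrelated, which is the problem the abstract describes.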

Personal Names Popularity Estimation and Its Application to Record Linkage

2018 · CHAPTER · en

In this study, we investigate several statistical techniques for personal name popularity estimation and perform a record linkage experiment guided by name popularity estimates. The results show that name popularity can improve personal name matching in databases and may be of interest in many other domains.

A Cross-Cultural Analysis of Trust in Recommender Systems

2018 · CHAPTER · en

User system trust is critical to the uptake of recommendations, and several factors of trust have been identified and compared. In this paper, we present a cross-cultural, crowdsourced study examining user perceptions of nine factors of trust and link the observed differences to trust development processes and cultural dimensions. While some factors consistently instil trust, others are preferred only in certain countries. Our findings and the discovered links are important for the design of trusted recommender systems.

Stierlitz Meets SVM: Humor Detection in Russian

2018 · CHAPTER · en

In this paper, we investigate the problem of humor detection for the Russian language. For the experiments, we used a large collection of jokes from social media and a contrast collection of non-funny sentences, as well as a small collection of puns. We implemented a large set of features and trained several SVM classifiers. The results are promising and establish a baseline for further research in this direction.

Cleaning Up After a Party: Post-processing Thesaurus Crowdsourced Data

2018 · CHAPTER · en

The study deals with post-processing of a noisy collection of synsets created using crowdsourcing. First, we cluster long synsets in three different ways. Second, we apply four cluster cleaning techniques based either on word popularity or word embeddings. Evaluation shows that the method based on word embeddings and existing dictionary definitions delivers the best results.

How to Evaluate Humorous Response Generation, Seriously?

2018 · CHAPTER · en

Nowadays natural language user interfaces, such as chatbots and conversational agents, are very common. A desirable trait of such applications is a sense of humor. It is, therefore, important to be able to measure the quality of humorous responses. However, humor evaluation is hard since humor is highly subjective. To address this problem, we conducted an online evaluation of 30 dialog jokes from different sources by almost 300 participants (volunteers and Mechanical Turk workers). We collected joke ratings along with participants' age, gender, and language proficiency. Results show that demographics and joke topics can partly explain variation in humor judgments. We expect that these insights will aid humor evaluation and interpretation. The findings can also be of interest for humor generation methods in conversational systems.

Extraction of Hypernyms from Dictionaries with a Little Help from Word Embeddings

2018 · CHAPTER · en

The paper investigates several techniques for hypernymy extraction from a large collection of dictionary definitions in Russian. First, definitions from different dictionaries are clustered, then single words and multiwords are extracted as hypernym candidates. A classification-based approach on pre-trained word embeddings is implemented as a complementary technique. In total, we extracted about 40K unique hypernym candidates for 22K word entries. Evaluation showed that the proposed methods applied to a large collection of dictionary data are a viable option for automatic extraction of hyponym/hypernym pairs. The obtained data is available for research purposes.

Courses (2)