Браславский Павел Исаакович

Faculty of Computer Science

Profile on hse.ru, tel.: 27276

Positions

Senior Research Fellow: Faculty of Computer Science, Laboratory for Models and Methods of Computational Pragmatics
Associate Professor: Faculty of Computer Science, Department of Big Data and Information Retrieval

Bio

· Started working at HSE University in 2020.

Education

2000 · Candidate of Sciences
1997 · Specialist degree: Ural State Technical University, Yekaterinburg, specialty "Computers, Complexes, Systems and Networks", qualification "Systems Engineer"

Work experience

· Yandex, SKB Kontur, JetBrains Research

Awards and honors

· Bonus for publication in a List A journal (or an equivalent scholarly publication) (2025–2026, 2023–2024)
· Bonus for publication in an international peer-reviewed scholarly publication (2022–2023, 2021–2022, 2020–2021)

Grants and projects

· for the degree of Candidate of Sciences

Researcher identifiers

ORCID: 0000-0002-6964-458X
ResearcherID: P-5139-2016
RSCI SPIN: 7958-6601
Google Scholar: https://scholar.google.com/citations?user=ch2vRdcAAAAJ&hl=en
Scopus AuthorID: 16548847400

Publications (50)

A Pinch of Humor for Short-Text Conversation: An Information Retrieval Approach

2017 · CHAPTER · en

The paper describes work in progress on humorous response generation for short-text conversation using an information retrieval approach. We gathered a large collection of funny tweets and implemented three baseline retrieval models: BM25, a query term reweighting model based on syntactic parsing and named entity recognition, and a doc2vec similarity model. We evaluated these models in two ways: in situ on a popular community question answering platform and in laboratory settings. The approach proved to be promising: even simple search techniques demonstrated satisfactory performance. The collection, test questions, evaluation protocol, and assessors' judgments lay the groundwork for future research towards more sophisticated models.

Which IR model has a better sense of humor? Search over a large collection of jokes

2017 · CHAPTER · en

This paper describes experiments on humorous response generation for short-text conversations. First, we compiled a collection of 63,000 jokes from online social networks (VK and Twitter). Second, we implemented several context-aware joke retrieval models: BM25 as a baseline, query term reweighting, a word2vec-based model, and a learning-to-rank approach with multiple features. Finally, we evaluated these models in two ways: on the community question answering platform Otvety@Mail.ru and in laboratory settings. The evaluation shows that an information retrieval approach to humorous response generation yields satisfactory performance.
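
For readers unfamiliar with the BM25 baseline named in the abstract, here is a minimal sketch of Okapi BM25 scoring over a toy joke collection. It is not the authors' implementation: the function name `bm25_scores`, the parameter defaults `k1=1.2` and `b=0.75`, the smoothed IDF variant, and the three example jokes are all assumptions made for illustration.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenized document against a tokenized query (Okapi BM25)."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Smoothed IDF; the "1 +" inside the log keeps weights non-negative.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation with document-length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Hypothetical toy collection standing in for a joke corpus.
jokes = [
    "why did the chicken cross the road".split(),
    "my cat ignores me unless it is hungry".split(),
    "a cat walks into a bar and knocks every glass off the counter".split(),
]
scores = bm25_scores("what does my cat want".split(), jokes)
best = max(range(len(jokes)), key=scores.__getitem__)
print(" ".join(jokes[best]))
```

In this toy setup the second joke ranks first because it matches two query terms and is shorter than the third; the first joke shares no terms with the query and scores zero.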

What Do You Mean Exactly?: Analyzing Clarification Questions in CQA

2017 · CHAPTER · en

Search as a dialogue is an emerging paradigm that is fueled by the proliferation of mobile devices and technological advances, e.g. in speech recognition and natural language processing. Such an interface allows search systems to engage in a dialogue with users aimed at fulfilling their information needs. One key capability required to make such search dialogues effective is asking clarification questions (CLARQ) proactively, when a user's intent is not clear, which could help the system provide more useful responses. With this in mind, we explore the dialogues between the users on a community question answering (CQA) website as a rich repository of information-seeking interactions. In particular, we study the clarification questions asked by CQA users in two different domains, analyze their behavior, and the types of clarification questions asked. Our results suggest that the types of CLARQ are very diverse, while the questions themselves tend to be specific and require both domain- and general knowledge. However, focusing on popular CLARQ types and domains can be fruitful. As a first step towards automatic generation of clarification questions, we explore the problem of predicting the specific subject of a clarification question. Our findings can be useful for future improvements of intelligent dialog search and question answering systems.

Large‐scale log analysis of digital reading

2016 · CHAPTER · en

In this paper, we address daily reading practices of the general public in Russia analyzing 10 months of log data from the commercial ebook site Bookmate. We study different reading characteristics with ebooks, i.e. the reading volume and preferences, reading schedule, reading speed and reading style (including parallel reading patterns and book abandonment rates), with respect to reader gender, book length and genre of the book. We find that book genres impact certain reading behaviors, while gender differences or book length seem to play less of a role in ebook reading. Parallel book reading and book abandonment occur very frequently, possibly pointing towards changing reading behaviors in the ebook environment. The obtained insights demonstrate the high potential of log analysis for book reading studies.

Does everybody lie? characterizing answerers in health-related CQA

2016 · CHAPTER · en

The study described in this paper aims at a multi-faceted characterization of active community question answering (CQA) users who provide answers to health-related questions. The study employs various research techniques, both qualitative (surveys) and quantitative. With two online surveys we gain insights into 1) how medical professionals perceive online health-related information and its use by patients and 2) the motivation of the most active CQA answerers, a significant share of whom are apparently users with medical education. In the second series of experiments we apply topic modeling to a one-year collection of questions and answers from a popular Russian CQA service in order to find users focused on a particular topic. Further, we attempt to find users with a professional medical background based on the lexis of their answers. The obtained results provide a better understanding of the motivation and background of CQA users and can be used for the improvement of CQA services, as well as for solving problems such as CQA content quality evaluation, expert search, and question routing.

Family Matters: Company Relations Extraction from Wikipedia

2016 · CHAPTER · en

The study described in the paper deals with the extraction of relations between organizations from the Russian Wikipedia. We experiment with two data sources for supervised methods – manual annotations made from scratch and relations from infoboxes with subsequent sentence matching, as well as different feature sets and learning methods – SVM, CRF, and UIMA Ruta. Results show that the automatically obtained training data delivers worse results than manually annotated data, but the former approach is promising due to its scalability. Evaluation of relations extracted from a subset of Wikipedia pages that are mapped to the Russian state company registry proves that external sources can enrich and complement official databases.

Ten Months of Digital Reading: An Exploratory Log Study

2016 · CHAPTER · en

We address digital reading practices in Russia by analyzing 10 months of log data from a commercial ebook mobile app. We describe the data and focus on three aspects: reading schedule, reading speed, and book abandonment. The exploratory study demonstrates the high potential of the data and the proposed approach.

Show Me How to Tie a Tie: Evaluation of Cross-Lingual Video Retrieval

2016 · CHAPTER · en

In this study we investigate the potential of cross-lingual video retrieval for how-to questions. How-to questions are the most frequent among wh-questions and constitute almost 1% of the entire query stream. At the same time, how-to videos are popular on video sharing services. We analyzed a dataset of 500M+ Russian how-to questions. First, we carried out manual labelling of 1,000 queries, which shows that about two thirds of all how-to question queries are potentially suitable for answers in the form of video in a language other than the language of the query. Then, we evaluated video retrieval quality for original and machine-translated queries on a crowdsourcing platform. The evaluation reveals that machine-translated questions yield video search quality comparable to that for original questions. Cross-lingual video search for how-to queries can improve recall and diversity of search results, as well as compensate for the shortage of original content in emerging markets.

Multiple features for multiword extraction: A learning-to-rank approach

2016 · CHAPTER · en

This paper describes the extraction of multiword expressions (MWEs) from corpora for inclusion in a large online lexical resource for Russian. The novelty of the proposed approach is twofold: 1) we use two corpora (the Russian National Corpus and Russian Wikipedia) in parallel and 2) we employ an extended set of features based on both data sources. To combine syntactic and statistical features derived from the two corpora, we experiment with several learning-to-rank (LETOR) methods that have proven highly effective in information retrieval (IR) scenarios. We use bigrams from existing dictionaries for learning, which keeps manual annotation effort very low. Evaluation shows that machine-learned rankings with rich features significantly outperform traditional corpus-based association measures and their combinations. Analysis of the resulting lists supports the claim that multiple features and diverse data sources improve the quality of extracted MWEs. The proposed method is language-independent.
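
As an illustration of the "traditional corpus-based association measures" that the abstract uses as a point of comparison, the sketch below ranks bigrams by pointwise mutual information (PMI). PMI is chosen here only as a well-known example of such a measure; the abstract does not say which measures the paper evaluated. The function name `pmi_bigrams`, the `min_count` threshold, and the toy corpus are hypothetical.

```python
import math
from collections import Counter


def pmi_bigrams(tokens, min_count=2):
    """Rank adjacent word pairs by pointwise mutual information (PMI)."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)
    scores = {}
    for (w1, w2), count in bigrams.items():
        # Rare pairs get unreliable PMI estimates, so skip them.
        if count < min_count:
            continue
        p_xy = count / n_bi
        p_x = unigrams[w1] / n_uni
        p_y = unigrams[w2] / n_uni
        scores[(w1, w2)] = math.log2(p_xy / (p_x * p_y))
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy corpus; real MWE extraction would use a large corpus.
corpus = (
    "the red tape slows the project while the red tape grows "
    "and the project waits for the budget"
).split()
ranked = pmi_bigrams(corpus)
for pair, score in ranked:
    print(pair, round(score, 2))
```

On this toy input the pair ("red", "tape") ranks highest because both words occur almost exclusively together, whereas pairs that begin with the frequent word "the" score lower. A learning-to-rank setup of the kind the abstract describes would treat scores like this as one feature among many.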

YARN: Spinning-in-progress

2016 · CHAPTER · en

YARN (Yet Another RussNet), a project started in 2013, aims at creating a large open WordNet-like thesaurus for Russian by means of crowdsourcing. The first stage of the project was to create noun synsets. Currently, the resource comprises 100K+ word entries and 46K+ synsets. More than 200 people have taken part in assembling synsets throughout the project. The paper describes the linguistic, technical, and organizational principles of the project, as well as the evaluation results, lessons learned, and future plans.

Courses (2)

09.06.01. Informatics and Computer Engineering

2022/2023 · Doctoral programme · English
Research Problems in Natural Language Processing

2021/2022 · Doctoral programme · English