
Pavel Braslavski (Браславский Павел Исаакович)

Faculty of Computer Science

Publications: 50
Languages: 3
Awards: 2
Conferences: 0

Positions

  • Senior Research Fellow, Faculty of Computer Science, Laboratory for Models and Methods of Computational Pragmatics
  • Associate Professor, Faculty of Computer Science, Big Data and Information Retrieval School

Bio

  • Started working at HSE University in 2020.

Education

  • 2000 · Candidate of Sciences
  • 1997 · Specialist degree: Ural State Technical University, Yekaterinburg, specialty "Computers, Complexes, Systems and Networks", qualification "Systems Engineer"

Work experience

  • Yandex, SKB Kontur, JetBrains Research

Awards and incentives

  • Bonus for publication in a List A journal (or an equivalent scientific publication) (2025–2026, 2023–2024)
  • Bonus for publication in an international peer-reviewed scientific publication (2022–2023, 2021–2022, 2020–2021)

Grants and projects

  • for the degree of Candidate of Sciences

Researcher identifiers

Publications (50)

A Systematic Evaluation of Transfer Learning and Pseudo-labeling with BERT-based Ranking Models

2021 · CHAPTER · en

Due to high annotation costs, making the best use of existing human-created training data is an important research direction. We therefore carry out a systematic evaluation of the transferability of BERT-based neural ranking models across five English datasets. Previous studies focused primarily on zero-shot and few-shot transfer from a large dataset to a dataset with a small number of queries. In contrast, each of our collections has a substantial number of queries, which enables a full-shot evaluation mode and improves the reliability of our results. Furthermore, since source dataset licenses often prohibit commercial use, we compare transfer learning to training on pseudo-labels generated by a BM25 scorer. We find that training on pseudo-labels (possibly with subsequent fine-tuning using a modest number of annotated queries) can produce a competitive or better model compared to transfer learning. Yet it is necessary to improve the stability and/or effectiveness of few-shot training, which can sometimes degrade the performance of a pretrained model.

NEREL: A Russian Dataset with Nested Named Entities, Relations and Events

2021 · CHAPTER · en

In this paper, we present NEREL, a Russian dataset for named entity recognition and relation extraction. NEREL is significantly larger than existing Russian datasets: to date it contains 56K annotated named entities and 39K annotated relations. Its important difference from previous datasets is annotation of nested named entities, as well as relations within nested entities and at the discourse level. NEREL can facilitate development of novel models that can extract relations between nested named entities, as well as relations on both sentence and document levels. NEREL also contains the annotation of events involving named entities and their roles in the events. The NEREL collection is available via https://github.com/nerel-ds/NEREL.

Overview of SimpleText CLEF 2021 workshop and pilot tasks

2021 · CHAPTER · en

Scientific literacy is important for people to make the right decisions, evaluate information quality, maintain physiological and mental health, and avoid spending money on useless items. However, scientific publications are difficult for people outside the domain, so they do not read them at all, even if they are accessible. Text simplification approaches can remove some of these barriers to using scientific information, thereby promoting the use of objective scientific findings and keeping users from relying on shallow information in sources that prioritize commercial or political incentives over correctness and informational value. The CLEF 2021 SimpleText workshop addresses head-on the opportunities and challenges of text simplification approaches for improving access to scientific information. This year, we run three pilot tasks trying to answer the following questions: (1) What information should be simplified? (2) Which terms should be contextualized by giving a definition and/or application? (3) How to improve the readability of a given short text (e.g. by reducing vocabulary and syntactic complexity) without significant information distortion?

Overview of SimpleText 2021 - CLEF Workshop on Text Simplification for Scientific Information Access

2021 · CHAPTER · en

Misbeliefs and Biases in Health-Related Searches

2021 · CHAPTER · en

Comparative Web Search Questions

2020 · CHAPTER · en

Comparing Intelligent Personal Assistants on Humor Function

2020 · CHAPTER · en

Intelligent personal assistants (IPA) use humor to engage and entertain users as well as mitigate performance limitations. In order to understand the types of users’ humorous interactions with IPA, we developed a classification of humorous utterances that included categories of questions about IPA personality, requests for jokes, rhetorical statements, and others. In order to illustrate the usefulness of the classification for analyzing IPA interactions, we used it for comparing the four major IPAs on their responses to humorous utterances. A representative sample of 96 humorous utterances in each humor category and IPA type was developed and tested by 14 participants. The study found that IPA responses to specific requests for jokes received the highest humor ratings from users. The study also found that, overall, Alexa was rated as the most humorous IPA, followed by Google Assistant and Cortana. An interpretation of the findings in light of humor theories and IPA features is provided.

SberQuAD – Russian Reading Comprehension Dataset: Description and Analysis

2020 · CHAPTER · en

The paper presents SberQuAD – a large Russian reading comprehension (RC) dataset created similarly to English SQuAD. SberQuAD contains about 50K question-paragraph-answer triples and is seven times larger than the next competitor. We provide its description, thorough analysis, and baseline experimental results. We scrutinized various aspects of the dataset that can have an impact on task performance: question/paragraph similarity, misspellings in questions, answer structure, and question types. We applied five popular RC models to SberQuAD and analyzed their performance. We believe our work makes an important contribution to research in multilingual question answering.

RuBQ: A Russian Dataset for Question Answering over Wikidata

2020 · CHAPTER · en

The paper presents RuBQ, the first Russian knowledge base question answering (KBQA) dataset. The high-quality dataset consists of 1,500 Russian questions of varying complexity, their English machine translations, SPARQL queries to Wikidata, reference answers, as well as a Wikidata sample of triples containing entities with Russian labels. The dataset creation started with a large collection of question-answer pairs from online quizzes. The data underwent automatic filtering, crowd-assisted entity linking, automatic generation of SPARQL queries, and their subsequent in-house verification. The freely available dataset will be of interest for a wide community of researchers and practitioners in the areas of Semantic Web, NLP, and IR, especially for those working on multilingual question answering. The proposed dataset generation pipeline proved to be efficient and can be employed in other data annotation projects.

Black-Box Testing of Financial Virtual Assistants

2020 · CHAPTER · en

We propose a hybrid technique for black-box testing of virtual assistants (VAs) in the financial sector. The specifics of this highly regulated industry impose numerous limitations on the testing process: GDPR and other data protection requirements, the absence of interaction logs with real users, restricted access to internal data, etc. These limitations also decrease the applicability of a few VA testing methods that are widely described in the research literature. The approach suggested in this paper consists of semi-controlled logging of interactions by trained testers and subsequent augmentation of the collected data for automated testing.

Courses (2)