Карданова Елена Юрьевна

Institute of Education

Tel.: +7 (495) 772-95-90 | 22095 | +7 (916) 393-86-49

Professional interests

psychometrics · test development · scaling of test results · Item Response Theory · Rasch models

Positions

Associate Professor: Institute of Education, Department of Educational Programmes
Leading Research Fellow: Institute of Education, Centre for Psychometrics and Measurement in Education
Academic Supervisor: Institute of Education, Centre for Psychometrics and Measurement in Education
Academic Supervisor of the educational programme: Science of Learning and Assessment
Head of the professional development programme: Theory and Practice of Developing Assessment Instruments in Education

Bio

· Joined HSE University in 2010.
· Research and teaching experience: 40 years.

Education

2009 · Candidate of Sciences in Physics and Mathematics: Leningrad State University named after A.A. Zhdanov, specialty 01.01.02 "Differential Equations, Dynamical Systems and Optimal Control"
1992 · Academic title: Associate Professor
1980 · Specialist degree: Leningrad State University named after A.A. Zhdanov, Faculty of Mathematics and Mechanics, specialty "Mathematics", qualification "Mathematician"

Work experience

· 2010: present
· HSE University, Institute of Education
· Director of the Centre for Psychometrics and Measurement in Education,
· Associate Professor, Department of Educational Programmes,
· Tenured Professor.
· Yaroslav-the-Wise Novgorod State University, Institute of Electronic and Information Systems
· Department of Higher Mathematics
· Associate Professor, Head of the Testing Technologies Section
· Veliky Novgorod, Russia
· Federal Testing Centre of the Ministry of Education of the Russian Federation
· Moscow, Russia
· Researcher, Head of the Department of Test Item and Test Analysis

Awards and honours

· Acknowledgement from the Institute of Education, HSE University (December 2025)
· Letter of Appreciation from the Vice Rector of HSE University (December 2024)
· Letter of Appreciation from the Vice Rector of HSE University (December 2024)
· Medal "Recognition: 10 Years of Successful Work", HSE University (March 2024)
· Letter of Appreciation from the Vice Rector of HSE University (February 2024)
· Letter of Appreciation from the Vice Rector of HSE University (February 2024)
· Letter of Appreciation from the Rector of the Higher School of Economics (December 2022)
· Acknowledgement from the Higher School of Economics (September 2020)
· Acknowledgement from the Higher School of Economics (June 2018)
· Certificate of Honour from the Ministry of Education and Science of the Russian Federation (June 2012)
· Certificate of Honour from the Ministry of Education and Science of the Russian Federation (January 2012)
· Certificate of Honour from the Committee for Education, Science and Youth Policy of the Novgorod Region (June 2008)
· Bonus for academic work (2015–2016)
· Bonus for a publication in an international peer-reviewed journal (2021–2022, 2019–2021, 2017–2018)
· Bonus for regular publications in international peer-reviewed journals (2024–2029, 2023–2028, 2022–2027)
· Bonus for an article in a foreign peer-reviewed journal (2016–2017)
· Best Teacher: 2022–2023, 2013–2015
· Winner of the HSE University Competition for the Best Russian-Language Research and Popular Science Works, 2023

Grants and projects

· for the degree of Candidate of Sciences

Conferences (23)

· 2018: 19th Annual Conference of the Association for Educational Assessment - Europe (Nijmegen). Talk: A possibility for cross-country comparisons of early reading for children starting school in Russia and in the UK
· 2018: Seventh International Conference on Probabilistic Models for Measurement (Perth). Talk: Setting benchmarks of children reading development for potential comparisons across countries and cultures
· 2017: 18th Annual Conference of the Association for Educational Assessment - Europe (Prague). Talk: Providing Validity Evidence for the Engineering Students Professional Competences Test (evidence from Russia and China)
· 2017: 2017 WERA Focal Meeting. Talk: Behavior problems of students in primary school and their impact on academic achievement and progress: case of Russia
· 2016: XVII April International Academic Conference "Modernization of Economy and Society" (Moscow). Talk: First-graders' progress over the first year of schooling: capturing inequality at the primary stage of education
· 2016: XVII April International Academic Conference "Modernization of Economy and Society" (Moscow). Talk: Assessing the quality of engineering education in elite and non-elite Russian universities
· 2015: 16th Annual AEA-Europe Conference (Glasgow). Talk: The challenges of equating tests between Russia and Scotland
· 2014: 40th Annual Conference IAEA 2014 (International Association of Educational Assessment): Assessment Innovations for the 21st Century. Talk: The study of international Performance Indicators for Primary Schools (iPIPS): a trial in Russia
· 2014: "Differentiation in Dynamically Changing Higher Education Systems: Challenges and Opportunities", V International Conference of the Russian Association of Higher Education Researchers (Moscow). Talk: Heterogeneity of the Russian higher education system: an introduction to the problem and approaches to assessment
· 2014: 40th Annual Conference IAEA 2014 (International Association of Educational Assessment): Assessment Innovations for the 21st Century. Talk: The Results of Student Achievement Monitoring in Primary School in the Context of Educational Environment
· 2014: 2014 WERA Focal Meeting (Edinburgh). Talk: The challenges of equating tests between Russia and the UK
· 2014: 15th Annual AEA-Europe Conference (Tallinn). Talk: Cross-cultural comparability of SAM-Math results
· 2014: ECER 2014, "The Past, Present and Future of Educational Research in Europe" (Porto). Talks: "Cross-cultural Comparability of Survey Data: Case of Nordic-Baltic Comparative Study"; "Measuring Heterogeneity of the Educational System"
· 2013: 7th International Technology, Education and Development Conference INTED 2013 (Valencia). Talk: Academic Heterogeneity of Universities Freshmen
· 2013: 7th International Technology, Education and Development Conference INTED 2013 (Valencia). Talk: The results of student achievement monitoring in primary school and their connection to educational environment characteristics
· 2013: XXIII All-Russian Scientific and Methodological Conference "Problems of Education Quality", Ufa-Moscow, 20-27 May 2013 (Moscow-Ufa). Talk: Specification of competency-oriented assessment tools
· 2013: XXIII All-Russian Scientific and Methodological Conference "Problems of Education Quality", Ufa-Moscow, 20-27 May 2013 (Moscow-Ufa). Talk: On an approach to developing competency-oriented assessment tools
· 2013: 14th Annual AEA-Europe Conference (Paris). Talk: Comparative Study on Mathematics Teachers’ Beliefs and Practices
· 2013: 14th Annual AEA-Europe Conference (Paris). Talk: Student Achievement Monitoring Toolkit (SAM): Validity and Reliability Issues
· 2012: 4th International Conference on Education and New Learning Technologies (EDULEARN12) (Barcelona). Talk: New technologies of assessment: School Achievements Monitoring
· 2012: 38th Annual Conference IAEA 2012 (International Association of Educational Assessment) (Astana). Talk: School achievement monitoring toolkit: validation study
· 2012: 13th annual AEA-E Conference (Berlin). Talk: School Achievements Monitoring Toolkit: Framework and First Results

Researcher identifiers

ORCID: 0000-0003-2280-1258
ResearcherID: A-8437-2014
SPIN (RSCI): 8135-8630
Google Scholar: https://scholar.google.ru/citations?user=pivOY3sAAAAJ&hl=ru&citsig=AMstHGTDe8QjCn5AKY6fV3zm6b2tIJdl5Q
Scopus AuthorID: 56556956700

Publications (68)

Validation of the UNT in Kazakhstan: An Evidence Base for Fair Testing

2026, in press · ARTICLE · ru

Worldwide, entrance examinations, as high-stakes exams, are held to high standards: they must be as objective, reliable, and fair as possible. This means that their development, administration, and scoring must be carefully controlled and standardized to rule out any possibility of bias. In Kazakhstan, university admission is based on the results of a state entrance examination, the Unified National Testing (UNT). The aim of this work is to study the validity of UNT results in Kazakhstan, using mathematics as an example. The focus is on two blocks of validity evidence: evidence based on internal structure and evidence based on relations to other variables, in particular criteria of students' subsequent success in higher education. S. Messick's theory of validity serves as the theoretical framework of the study. A series of validation studies was conducted in accordance with the methodological requirements of the joint AERA, NCME, and APA standards, using the approaches of classical test theory and modern test theory as well as statistical analysis, including hierarchical linear modeling. The study supports the conclusion that the UNT mathematics exam has high psychometric quality. In addition, both the total UNT score across five subjects and the UNT mathematics score alone successfully predict students' future academic success at university. The quality of the statistical models we used and the variance in first-semester student grades that they explain are largely consistent with the results of other researchers from various countries, including Russia and the USA.

Can Large Language Models Develop High-Stakes Physics Exam Items? A Comprehensive Study of Cognitive and Psychometric Efficacy

2026 · ARTICLE · en

High-stakes assessment is crucial for evaluating student performance and making significant educational decisions. Traditionally, the development of test items for such examinations has relied on manual development by subject matter experts. However, Automated Item Generation (AIG) using Large Language Models (LLMs) has emerged as a promising alternative, though systematic research on their application in high-stakes assessments, particularly in STEM fields like physics, is limited. High-stakes physics assessments must evaluate a range of cognitive skills, from basic recall to advanced analytical thinking. Previous AIG studies have predominantly focused on lower-order cognitive skills, neglecting higher-order thinking. Moreover, the psychometric quality of items generated by LLMs has not been thoroughly validated, raising concerns about their validity and reliability for high-stakes contexts. To explore this, we investigated LLM capabilities in generating physics test items suitable for Nigeria’s Unified Tertiary Matriculation Examination (UTME). Based on our preliminary findings, the current study utilized Gemini 2.0 Flash and an instructional prompting technique for item generation. Bloom’s taxonomy was used as a framework to generate items of different cognitive levels in line with the UTME blueprint. The quality of the generated items was evaluated through expert reviews using a six-criteria rubric and pilot testing with 527 final-year high school students. Psychometric analysis, based on Classical Test Theory and Item Response Theory, confirmed the high quality of the LLM-generated physics test, demonstrating unidimensionality and good psychometric properties. This study demonstrates that under controlled conditions and with expert review, LLMs can produce high-stakes physics MCQ items whose psychometric qualities (difficulty, discrimination) are comparable to those of a real UTME physics test. This suggests LLMs have potential as cost-effective item-generation tools for educational institutions and testing organizations.

Determinants of economic literacy among Russian university students: A Hierarchical Linear Modeling approach

2026 · ARTICLE · en

This study investigates the factors influencing economic literacy among university students in Russia, addressing a significant gap in the existing literature. Economic literacy is recognized as a universal competency essential for individuals to navigate rapidly changing economic environments. Using a psychometrically robust Test of Economic Literacy (TEL) and a supplementary survey, the research employed Hierarchical Linear Modeling (HLM) to analyze data from 1,115 students nested within 56 academic groups across five Russian universities, accounting for both individual-level and group-level influences. Key results identified academic specialization (both at high school and university levels) and individual interest in economics as the strongest positive predictors of economic literacy. Notably, receiving pocket money irregularly (vs. regularly) significantly enhanced economic literacy, suggesting adaptability benefits. Conversely, students in non-economic fields (e.g., humanities, social sciences, technical sciences, pedagogy, service/tourism) demonstrated lower levels of economic literacy compared to economics majors. Socio-demographic factors like gender and age showed no significant effects. These findings highlight the importance of both formal academic pathways and informal experiential learning, offering insights for educators and policymakers aiming to enhance economic literacy in Russia’s evolving economic landscape.

The Unified State Exam and Academic Performance: A Three-Year Analysis of Relationships Across Selection Method and Gender in University Students

2025, in press · ARTICLE · en

The primary objective of this study is to evaluate the efficiency of current university admission tests in selecting qualified students in a public university by measuring the extent to which an applicant’s performance in admission tests can predict his/her academic performance after enrolling at the university.

A comparison of the stability of ability parameter estimation based on the maximum likelihood and Bayesian estimation: A case study of dichotomous scoring test results

2025 · ARTICLE · en

This research draws on Item Response Theory (IRT), which is essential for determining the best method for estimating participants' abilities on a test of English listening ability. This study aims to (1) determine the characteristics of the test instrument measuring English listening ability, (2) determine the effect of test length on the stability of ability estimation using the maximum likelihood (ML) method, (3) determine the effect of test length on the stability of ability estimation using the Bayes method, and (4) compare the stability of the ability estimates between ML and Bayes. This research is an exploratory descriptive study using a simulation approach. The best-fitting model is selected to generate data. The generated data consist of the true ability (θ) and the participants' responses; ability is then estimated with maximum likelihood and Bayes over 10 replications, and the estimates are compared using the MSE (mean squared error). The method with the smaller MSE is considered more stable and the better estimator. The results show that (1) the 2PL model is the best, (2) test length affects the stability of ability estimation in the ML method, which is most stable when the test contains 46 items, (3) test length affects the stability of ability estimation in the Bayes method, which is most stable when the test contains 46 items, and (4) the Bayes method is better and more accurate for estimating ability.

The Influence of Demographic Characteristics on Student Academic Performance in University Admission Tests

2025 · ARTICLE · en

Background. Psychological attention to the role of academic performance in university admission tests is still growing. This attention includes both the psychometric characteristics of these tests and the cognitive and non-cognitive factors that may affect performance in admission tests. Objective. This study aims to assess the stability and changes in the academic performance of students across two consecutive admission tests. It also aims to examine how participants’ gender and language relate to student academic performance. Design. To test the research hypotheses, we sampled a large group of school graduates (n = 25,563) who applied and took two national university admission tests in 2024 in Kazakhstan. Results. The results of the Pearson correlation revealed a positive association between applicants' scores on the two consecutive national admission tests (r = .913). The paired t-test results suggested that the score changes were statistically significant, with the largest increase in overall scores (ΔM = .96). Gender was found to affect academic performance such that females outperformed males (85.6% vs. 64.6% pass rates), showing stronger effect sizes. There were also statistical differences between the two groups of Russian and Kazakh test takers; however, this difference was not practically significant. Conclusion. These findings suggest that demographic characteristics of applicants can considerably influence the university admission results, and it is recommended that test developers be attentive to this issue. Academic performance showed strong stability over time, especially in STEM subjects, with significant but modest overall improvements across subjects.

Guidelines for the Standardization of Psychodiagnostic Instruments: Requirements and Quality Assessment: A Textbook

2024 · BOOK · ru

At present, as the evidence-based approach gains ground in psychology and education, it is increasingly important for specialists in these fields to build their practice on research findings. The use of valid and reliable diagnostic instruments is a necessary component of the evidence-based approach and is relevant for researchers and practitioners in any area of psychology and education. The Guidelines aim to:
· Provide developers of psychodiagnostic instruments (methods, tests, scales, questionnaires) with a description of the main procedures for creating reliable and valid psychodiagnostic instruments in accordance with classical test theory (CTT). Methods of modern test theory (IRT) may be used as a supplement.
· Define the parameters by which the expert community assesses the quality of psychodiagnostic instruments (methods, tests, scales, questionnaires) on the basis of an article in a peer-reviewed journal reporting the results of standardization.
· Give users of psychodiagnostic instruments, namely researchers and practicing psychologists in organizations and institutions of the education system, benchmarks for evidence of the validity and reliability of the instruments they use. Information on the psychometric characteristics of diagnostic methods can support well-founded decisions about their use, improve users' qualifications, and raise the overall culture of their application.
· Contribute to the further development and expansion of the "Open Register of Psychodiagnostic Methods Trusted by the Professional Community".

A Systematic Review of the Use of Technology in Educational Assessment Practices: Lesson Learned and Direction for Future Studies

2024 · ARTICLE · en

Previous studies have demonstrated that technology helps achieve learning outcomes. However, many studies focus on just one aspect of technology’s role in educational assessment practices, leaving a gap in studies that examine how various aspects affect the use of technology in assessments. Hence, through a systematic work, we analyzed the extent and manner in which technology is integrated into educational assessments and how education level, domain of learning, and region may affect the use of technology. We reviewed empirical studies from two major databases (i.e., Scopus and ERIC) and a national journal whose focus and scope are on educational measurement and assessment, following PRISMA guidelines for systematic reviews. The findings of the present study are directed towards emphasizing the roles of technology in educational assessment practices and how these roles are adapted to varying educational contexts such as the level of education, the three domains of learning (i.e., cognitive, psychomotor, and affective), and the setting in which the assessment was conducted. These findings not only highlight the current roles of technology in educational assessment but also provide a roadmap for future research aimed at optimizing the integration of technology across diverse educational contexts.

Automatic generation of physics items with Large Language Models (LLMs)

2024 · ARTICLE · en

High-quality items are essential for producing reliable and valid assessments, offering valuable insights for decision-making processes. As the demand for items with strong psychometric properties increases for both summative and formative assessments, automatic item generation (AIG) has gained prominence. Research highlights the potential of large language models (LLMs) in the AIG process, noting the positive impact of generative AI tools like ChatGPT on educational assessments, recognized for their ability to generate various item types across different languages and subjects. This study fills a research gap by exploring how AI-generated items in secondary/high school physics align with an educational taxonomy. It utilizes Bloom's taxonomy, a well-known framework for designing and categorizing assessment items across various cognitive levels, from low to high. It focuses on a preliminary assessment of LLMs' ability to generate physics items that match the Application level of Bloom's taxonomy. Two leading LLMs, ChatGPT (GPT-4) and Gemini, were chosen for their strong performance in creating high-quality educational content. The research utilized various prompts to generate items at different cognitive levels based on Bloom's taxonomy. These items were assessed using multiple criteria: clarity, accuracy, absence of misleading content, appropriate complexity, correct language use, alignment with the intended level of Bloom's taxonomy, solvability, and assurance of a single correct answer. The findings indicated that both ChatGPT and Gemini were skilled at generating physics assessment items, though their effectiveness varied based on the prompting methods used. Instructional prompts, particularly, resulted in excellent outputs from both models, producing items that were clear, precise, and consistently aligned with the Application level of Bloom's taxonomy.

Application of the Contemporary Psychometrics for Assessing Economic Literacy

2024 · ARTICLE · en

Currently, new skills and various types of "new literacies" relevant to the modern world are becoming issues of growing importance. One of them is economic literacy; however, there are only a few assessment instruments that fulfil the academic requirements for its assessment among university students. One such internationally established instrument is the Test of Understanding in College Economics (TUCE), which is a popular tool in empirical studies of economic literacy in many countries around the world. Despite its advantages, the currently available version of the TUCE, designed for American colleges back in 2006, is prone to cheating and provides limited opportunities for formative feedback. The purpose of this paper is to present the Updated Test of Understanding in College Economics (U-TUCE). In developing the U-TUCE, we utilized the capabilities of contemporary psychometrics, which offer sufficient advances in overcoming all limitations of the original TUCE mentioned before. First, we present a revised theoretical framework of the U-TUCE, highlighting that the test measures different types of mastery of economic literacy. Second, we describe the approaches used for modifying the TUCE items and developing new items. Half of the original test items have been replaced or redesigned to reflect the economic context that has changed since 2006. Third, we utilize the logic of automatic item generation algorithms to increase the level of test protection against cheating. We made all changes in such a way as to maintain comparability with the previous versions of the TUCE test if necessary. Finally, the use of Item Response Theory (IRT) is paired with that of Cognitive Diagnostic Modeling (CDM) to ensure the quality of the U-TUCE and enhance its formative value. We show that IRT can be used to estimate the construct as a whole (which is of interest to researchers, administrators, and policy makers), while CDM provides information relating to each of the construct components, which are of interest to educational practitioners and students themselves. The results of the data analyses show that the test can be used for both purposes simultaneously.

Courses (6)

Psychometric Theories and Analysis of Test Items · 4 times

2025/2026, 2024/2025, 2023/2024, 2022/2023 · Master's programme / Mago-Lego · Eng

Development of Measurement Instruments · 5 times

2025/2026, 2024/2025, 2023/2024, 2022/2023, 2021/2022 · Doctoral programme / Doctoral programme, field: 00.00.00. Doctoral programme / Doctoral programme, field: 44.06.01. Education and Pedagogical Sciences · Rus

Advanced Psychometrics · 4 times

2024/2025, 2023/2024, 2022/2023, 2021/2022 · Doctoral programme / Doctoral programme, field: 00.00.00. Doctoral programme / Doctoral programme, field: 44.06.01. Education and Pedagogical Sciences · Rus

37.04.01. Psychology · 3 times

2023/2024, 2022/2023, 2021/2022 · Master's programme · Eng

National and International Programmes for Assessing Educational Achievement

2022/2023 · Mago-Lego · Rus

Psychometric Theories and Analysis of Test Items

2021/2022 · Master's programme · Rus