Investigating the impact of Music-Speech Integration Pedagogy (MSIP) on children’s music aptitude

Last registered on March 10, 2026


Pre-Trial

Trial Information

General Information

Title

Investigating the impact of Music-Speech Integration Pedagogy (MSIP) on children’s music aptitude

RCT ID

AEARCTR-0017895

Initial registration date

March 06, 2026

Initial registration date is when the trial was registered.

It corresponds to when the registration was submitted to the Registry to be reviewed for publication.

First published

March 10, 2026, 10:31 AM EDT

First published corresponds to when the trial was first made public on the Registry after being reviewed.

Locations

Country

Hong Kong

Region

Primary Investigator

Name

Siu-Hang Kong

Affiliation

The Department of Early Childhood Education, The Education University of Hong Kong


Other Primary Investigator(s)

PI Name

Ming Ming Chiu

PI Affiliation

Analytics/Assessment Research Centre, The Education University of Hong Kong


PI Name

William Choi

PI Affiliation

Speech and Music Perception Laboratory, The University of Hong Kong


PI Name

Alfredo Bautista

PI Affiliation

The Department of Early Childhood Education, The Education University of Hong Kong


Additional Trial Information

Status

Ongoing

Start date

2026-02-28

End date

2026-04-26

Keywords

Behavior, Education

Additional Keywords

music intervention, kindergarten, randomised controlled trial, phonological awareness, tonal-rhythmic sensitivity, Cantonese, absolute pitch, Hong Kong

JEL code(s)

I21, C93

Secondary IDs

Prior work

This trial does not extend or rely on any prior RCTs.

Abstract

This study is a randomised controlled trial (RCT) evaluating the Music–Speech Integration Pedagogy (MSIP), a small-group aural-training programme designed to strengthen kindergarten children’s tonal and rhythmic sensitivity and to examine potential transfer to English phonological awareness. Forty Cantonese-speaking children aged 4–5 years in Hong Kong will be randomly assigned to either (a) MSIP (twelve 30-minute sessions delivered outside school hours in small groups) or (b) business-as-usual kindergarten music instruction. Children will complete individual assessments at pre-test, immediate post-test, and a delayed post-test (approximately two months later). Primary outcomes include tonal and rhythmic discrimination measured by the abbreviated Primary Measures of Music Audiation (aPMMA-T and aPMMA-R) and English phonological awareness measured by an English Phonological Awareness (EPA) task. An additional outcome is performance on an absolute pitch–like pitch identification task within one octave. The analysis plan will estimate intervention impacts using multilevel difference-in-differences models and will explore whether improvements in tonal or rhythmic sensitivity mediate changes in phonological awareness. Findings will inform the feasibility and efficacy of an integrated music–speech pedagogy in a tone-language context.

External Link(s)

Registration Citation

Citation

Bautista, Alfredo et al. 2026. "Investigating the impact of Music-Speech Integration Pedagogy (MSIP) on children’s music aptitude." AEA RCT Registry. March 10. https://doi.org/10.1257/rct.17895-1.0

Sponsors & Partners

Interventions

Intervention(s)

The treatment group receives the Music–Speech Integration Pedagogy (MSIP), a manualised small-group aural-training programme for kindergarten children. MSIP consists of twelve 30-minute sessions delivered outside school hours. Sessions emphasise (a) attentive listening, (b) tonal and rhythmic training using fixed-do solfège and rhythm syllables, and (c) imitation/repetition through call-and-response. Activities include body percussion, simple percussion (e.g., cabasa), movement to melodic contour, solfège echo-singing, and reproducing short pitch patterns on an electronic keyboard/organ. The control group continues with business-as-usual kindergarten music instruction.

Intervention (Hidden)

MSIP is delivered in small groups (approximately 3–5 children; target group size n≈5) by a single trained tutor (registered early childhood teacher with ABRSM Grade 6 Piano) using standardised lesson plans. The programme comprises 12 sessions (30 minutes each; 6 hours in total). Each session follows a fixed sequence: (1) coordinated movement with fixed-do solfège (~5 min); (2) rhythmic imitation using body percussion, rhythm syllables, and cabasa (~5 min); (3) solfège “fast response” echoing blocks (~5 min; typically 16–24 tonal echo trials per block; three blocks per session); (4) song singing in solfège (~5 min); (5) instrumental reproduction on an electronic organ/keyboard (~5 min); (6) music-and-story pitch games (~3 min) and brief reflection (~1–2 min).
Materials use the white-key diatonic scale (C–D–E–F–G–A–B) in 12-tone equal temperament (A4 = 440 Hz) and include children’s rhymes and short original melodies; assessment items are excluded from instruction to reduce confounding. Each session also includes 1–2 minutes of spoken-word chants synchronised with body percussion to map prosodic stress patterns onto musical meter (e.g., “TA-ta” vs “ta-TA”). Fidelity is supported by a per-session checklist; ~20% of sessions are audio/video recorded for independent adherence ratings (target ≥85% adherence). Attendance is recorded for each child.
Control condition: children continue with usual kindergarten music activities without MSIP materials.

Intervention Start Date

2026-03-07

Intervention End Date

2026-04-19

Primary Outcomes

Primary Outcomes (end points)

Primary outcomes are (1) tonal and rhythmic music aptitude measured by the abbreviated Primary Measures of Music Audiation (aPMMA-T and aPMMA-R; accuracy scores), and (2) English phonological awareness measured by the English Phonological Awareness (EPA) task (percent correct). Outcomes are assessed at pre-test, immediate post-test, and delayed post-test (~2 months after the intervention). The primary estimand is the between-group difference in change from pre-test to post-test (and to delayed post-test).
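As a reading aid, the estimand described above can be written in its simplest unadjusted form. This is only a sketch: the symbols are introduced here for illustration and do not appear in the registration, and the planned multilevel models estimate an adjusted version of the same contrast. Here \bar{Y} denotes a group mean score on a given outcome (aPMMA-T, aPMMA-R, or EPA) at a given wave.

```latex
\hat{\delta}_{\text{post}}
  = \left(\bar{Y}_{\text{MSIP},\,\text{post}} - \bar{Y}_{\text{MSIP},\,\text{pre}}\right)
  - \left(\bar{Y}_{\text{control},\,\text{post}} - \bar{Y}_{\text{control},\,\text{pre}}\right)

\hat{\delta}_{\text{delayed}}
  = \left(\bar{Y}_{\text{MSIP},\,\text{delayed}} - \bar{Y}_{\text{MSIP},\,\text{pre}}\right)
  - \left(\bar{Y}_{\text{control},\,\text{delayed}} - \bar{Y}_{\text{control},\,\text{pre}}\right)
```

A positive value indicates that the MSIP group improved more than the control group over the corresponding interval.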

Primary Outcomes (explanation)

For each assessment wave (pre-test, post-test, delayed post-test), aPMMA-T and aPMMA-R are scored as the number (or proportion) of correct same/different judgments across 15 tonal items and 15 rhythmic items, respectively (accuracy-based scores). English phonological awareness (EPA) is scored as percent correct across 23 items total (6 syllable-deletion items and 17 onset-phoneme deletion items), with each item scored 1 (correct) or 0 (incorrect). Primary impact estimates focus on between-group differences in changes in these scores from pre-test to post-test and from pre-test to delayed post-test.
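The scoring rule described above could be sketched as follows. This is an illustrative sketch, not the study's scoring script: the function names and the example response vectors are hypothetical, and only the item counts (15 per aPMMA subtest; 6 + 17 for EPA) and the 0/1 item scoring come from the registration.

```python
from typing import Sequence


def proportion_correct(item_scores: Sequence[int]) -> float:
    """Return the share of items scored 1 (correct) among 0/1 item scores."""
    if not item_scores:
        raise ValueError("no item scores supplied")
    if any(score not in (0, 1) for score in item_scores):
        raise ValueError("item scores must be 0 or 1")
    return sum(item_scores) / len(item_scores)


def apmma_accuracy(item_scores: Sequence[int]) -> float:
    """Accuracy on one aPMMA subtest (tonal or rhythmic), 15 same/different items."""
    if len(item_scores) != 15:
        raise ValueError("expected 15 items per aPMMA subtest")
    return proportion_correct(item_scores)


def epa_percent_correct(syllable_items: Sequence[int],
                        onset_items: Sequence[int]) -> float:
    """EPA percent correct over 6 syllable-deletion and 17 onset-phoneme deletion items."""
    if len(syllable_items) != 6 or len(onset_items) != 17:
        raise ValueError("expected 6 syllable-deletion and 17 onset-phoneme deletion items")
    return 100.0 * proportion_correct(list(syllable_items) + list(onset_items))


# Hypothetical child at one wave (made-up responses, for illustration only).
tonal_accuracy = apmma_accuracy([1] * 12 + [0] * 3)
epa_score = epa_percent_correct([1, 1, 1, 1, 0, 0], [1] * 10 + [0] * 7)
```

Because the two EPA item types are pooled into one percent-correct score, the 17 onset-phoneme items carry more weight than the 6 syllable-deletion items.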

Secondary Outcomes

Secondary Outcomes (end points)

Secondary outcome: performance on an absolute pitch–like pitch identification task (APT) within one octave (C4, D4, E4, F4, F#4, G4, A4, B4). The outcome is accuracy (proportion correct) across test trials at each wave (pre, post, delayed).

Secondary Outcomes (explanation)

APT is administered using electronically generated instrumental tones (A4=440 Hz). After each tone is presented, the child identifies the corresponding fixed-do solfège label (do, re, mi, fa, fi, sol, la, ti). The score is computed as the proportion of correctly identified pitch categories across the administered trials at each assessment wave.
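For reference, the pitch categories listed for the APT can be mapped to frequencies under 12-tone equal temperament with A4 = 440 Hz, the tuning stated in the registration. The sketch below is illustrative only: the dictionary and function names are hypothetical, and the study's actual stimulus files may differ in timbre and other properties that are not modelled here.

```python
A4_HZ = 440.0  # tuning reference stated in the registration

# Semitone distance of each APT pitch from A4 (negative = below A4).
SEMITONES_FROM_A4 = {
    "C4": -9, "D4": -7, "E4": -5, "F4": -4,
    "F#4": -3, "G4": -2, "A4": 0, "B4": 2,
}

# Fixed-do solfège labels in the order given in the registration.
SOLFEGE_LABEL = {
    "C4": "do", "D4": "re", "E4": "mi", "F4": "fa",
    "F#4": "fi", "G4": "sol", "A4": "la", "B4": "ti",
}


def frequency_hz(note: str) -> float:
    """Equal-tempered frequency: each semitone scales frequency by 2**(1/12)."""
    return A4_HZ * 2 ** (SEMITONES_FROM_A4[note] / 12)


def apt_accuracy(presented: list[str], responses: list[str]) -> float:
    """Proportion of trials where the child's label matches the presented pitch."""
    if not presented or len(presented) != len(responses):
        raise ValueError("presented and responses must be non-empty and equal in length")
    correct = sum(SOLFEGE_LABEL[note] == label
                  for note, label in zip(presented, responses))
    return correct / len(presented)


for note in SEMITONES_FROM_A4:
    print(f"{note:>3} ({SOLFEGE_LABEL[note]}): {frequency_hz(note):.2f} Hz")
```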

Experimental Design

A two-arm parallel-group randomised controlled trial (RCT) will evaluate the Music–Speech Integration Pedagogy (MSIP) for Cantonese-speaking kindergarten children aged 4–5 in Hong Kong. Eligible children will complete baseline (pre-test) assessments of music aptitude (abbreviated Primary Measures of Music Audiation: tonal and rhythm subtests, aPMMA-T/R), pitch identification within one octave (Absolute Pitch Test; APT), and English phonological awareness (EPA). Children will then be randomly assigned to either (i) MSIP small-group instruction (12 sessions × 30 minutes; total 6 hours) delivered outside school hours or (ii) a control condition that continues usual kindergarten music instruction (“business-as-usual”). Outcomes will be reassessed immediately after the intervention (post-test) and two months later (delayed post-test). The primary analysis will follow an intention-to-treat approach and estimate treatment effects as differential changes over time between groups.

Experimental Design Details

This is a prospective, pre-registered, two-arm, parallel-group RCT with repeated measures at three time points: pre-test, immediate post-test, and delayed post-test (two months post-intervention). The target population is Cantonese-speaking children aged 4–5 enrolled in Hong Kong kindergartens. Recruitment will use convenience sampling via kindergarten flyers and online channels. Parents complete a screening questionnaire covering eligibility (age, Cantonese L1, current enrolment, no structured music training, no special educational needs per parent report) and demographics (parent education/income; child age/gender; kindergarten type). After baseline testing, participants are assigned to MSIP or control using a computer-generated random sequence with permuted blocks of variable size, stratified by age (4 vs 5) and gender, with allocation concealment via sequentially numbered opaque sealed envelopes opened after baseline completion.

Intervention delivery: MSIP consists of twelve 30-minute sessions, delivered in small groups of approximately 3–5 children (implementation aim n≈5) by a single trained tutor (Hong Kong registered ECE teacher with ABRSM Grade 6 Piano) using manualised lesson plans. MSIP integrates listening, tonal–rhythmic training (including fixed-do solfège), imitation/repetition, movement, body percussion and cabasa work, and keyboard (Yamaha Electone) reproduction, plus brief chant-based speech-beat integration (nonsense syllables) intended to map speech prosody to musical meter while minimising vocabulary demands. Attendance is recorded to quantify dosage. Fidelity is monitored via per-session checklists and audio/video recording of 20% of sessions with independent adherence ratings (target ≥85% adherence).

Outcome measurement: Music outcomes include the aPMMA tonal and rhythmic subtests (15 items each; yes/no same-different format) and the APT (13-note sequence across one octave; solfège identification under brown noise). The speech outcome is EPA percent accuracy from syllable deletion (6 items) and onset-phoneme deletion (17 items), adapted from prior work. Planned analyses include multilevel, multivariate difference-in-differences models to account for nested item/trial structure and repeated measures, with multiple imputation for missing data and sensitivity checks.

Randomization Method

Computer-based randomisation conducted by the research team using a computer-generated sequence with permuted blocks of variable size, stratified by child age (4 vs 5 years) and gender. Allocation is concealed using sequentially numbered, opaque, sealed envelopes opened only after baseline assessment completion.
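A minimal sketch of stratified permuted-block randomisation with variable block sizes is shown below. It illustrates the general technique and is not the research team's actual procedure: the block sizes (2 and 4), the seed, the function names, and the example roster are all assumptions, since the registration does not report them. The sealed-envelope concealment step is a physical procedure and is not modelled.

```python
import random
from collections import defaultdict

ARMS = ("MSIP", "control")


def permuted_block_sequence(n, rng, block_sizes=(2, 4)):
    """Build an allocation sequence of length n from balanced, shuffled blocks.

    Block sizes are drawn at random (assumed values) so that the next
    assignment is harder to predict. A final incomplete block can leave
    a small imbalance within a stratum.
    """
    sequence = []
    while len(sequence) < n:
        size = rng.choice(block_sizes)
        block = [ARMS[0]] * (size // 2) + [ARMS[1]] * (size // 2)
        rng.shuffle(block)
        sequence.extend(block)
    return sequence[:n]


def allocate(children, seed):
    """Assign each (child_id, age, gender) tuple to an arm within its stratum."""
    rng = random.Random(seed)  # fixed seed makes the sequence reproducible
    strata = defaultdict(list)
    for child_id, age, gender in children:
        strata[(age, gender)].append(child_id)

    allocation = {}
    for stratum in sorted(strata):  # stable stratum order for reproducibility
        ids = strata[stratum]
        for child_id, arm in zip(ids, permuted_block_sequence(len(ids), rng)):
            allocation[child_id] = arm
    return allocation


# Hypothetical roster of 40 children, 10 per age-by-gender stratum.
roster = [(i, 4 + (i % 2), "F" if (i // 2) % 2 else "M") for i in range(40)]
assignments = allocate(roster, seed=2026)
```

With small strata, incomplete final blocks mean the overall split may not be exactly 20/20, so the realised arm sizes are worth checking against the planned allocation.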

Randomization Unit

Individual-level randomisation (each child is randomised to MSIP or control). MSIP is delivered in small groups (approximately 3–5 children per session group) for logistical delivery, but assignment occurs at the individual child level.

Was the treatment clustered?

Experiment Characteristics

Sample size: planned number of clusters

N/A – individual-level randomisation (0 clusters)

Sample size: planned number of observations

40 children (Cantonese-speaking, aged 4–5 years, Hong Kong).

Sample size (or number of clusters) by treatment arms

Planned allocation: 20 children in MSIP treatment; 20 children in control (business-as-usual).

Minimum detectable effect size for main outcomes (accounting for sample design and clustering)

Planned power analysis (G*Power) assumes a standardised mean difference of d = 0.50 for main outcomes, α = .05, power = 0.80, and 10% attrition. Under these assumptions, the minimum detectable effect size (MDE) is approximately 0.50 SD (standard deviation units) for primary outcomes such as changes in aPMMA-T/R and EPA from pre- to post-test between groups. The planned sample size is N = 40 (20 per arm) to maintain power given potential attrition (minimum required N≈28; 14 per arm).
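The attrition arithmetic behind the planned sample size can be checked with a short sketch. The inputs (minimum required N of 28, 10% attrition, planned N of 40) come from the registration; the function name is hypothetical, and the power calculation itself is not reproduced here because the G*Power test family and its settings are not reported.

```python
import math


def enrolment_needed(required_completers: int, attrition_rate: float) -> int:
    """Smallest enrolment that still leaves the required completers after attrition."""
    if not 0.0 <= attrition_rate < 1.0:
        raise ValueError("attrition_rate must be in [0, 1)")
    return math.ceil(required_completers / (1.0 - attrition_rate))


required_n = 28   # minimum required N stated in the registration
attrition = 0.10  # assumed attrition stated in the registration
planned_n = 40    # planned enrolment (20 per arm)

needed = enrolment_needed(required_n, attrition)
expected_completers = planned_n * (1.0 - attrition)

print(f"Enrolment needed to retain {required_n} completers: {needed}")
print(f"Expected completers from {planned_n} enrolled: {expected_completers:.0f}")
```

Under these inputs, the planned enrolment of 40 exceeds the attrition-adjusted requirement, which is consistent with the statement that N = 40 is intended to maintain power.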

Supporting Documents and Materials

There is information in this trial unavailable to the public.

IRB