Effect of summarizing scaffolding and textual cues on learning performance, mental model, and cognitive load in a virtual reality environment: An experimental study (2023)


Virtual reality (VR) is a digital environment that simulates highly realistic three-dimensional visual, auditory, tactile, olfactory, and interactive experiences (Di Natale et al., 2020), which increases students' positive feelings and facilitates their full involvement in learning activities (Ustun, Karaoglan-Yilmaz, & Yilmaz, 2020). VR technology has increasingly been used in various disciplines, such as science (Boda & Brown, 2020; Huang, 2019; Parong & Mayer, 2018), history (Parong & Mayer, 2021), language (Acar & Cavas, 2020; Xie et al., 2019), biology (Pande et al., 2021), and medical education (Taggar, Foster, Li, & Bhagavath, 2022; Ustun, Karaoglan-Yilmaz, & Yilmaz, 2020), resulting in increased learning engagement, knowledge transfer, empathy, and learner agency (Calvert & Abadia, 2020; Shim, 2023; Villena-Taranilla et al., 2022; Xie et al., 2019). With the development of advanced technology, the immersion and interactivity of virtual reality have greatly improved, yet these gains have not yielded the desired learning performance (Luo, Li, Feng, Yang, & Zuo, 2021; Wu et al., 2020). Several studies have argued that VR applications with low-cost head-mounted displays (HMDs) are more competitive in teaching practice (Geng et al., 2018; Vishwanath et al., 2017). Wu et al. (2020) and Luo et al. (2021) reached a similar conclusion after reviewing the literature and finding that the overall learning effect (Hedges' g) of immersive virtual reality (IVR) was smaller than that of desktop virtual reality (DVR).

The cognitive affective model of immersive learning (CAMIL) states that “it is not the medium of IVR that causes more or less learning, but rather that which instructional strategy used in an IVR lesson” (Makransky & Petersen, 2021, p. 940). Summarizing, as a generative strategy, is commonly used in learner-oriented interventions and is considered an effective instructional strategy. However, there is no consensus on its effectiveness in the IVR environment. The summarizing strategy requires learners to extract the main ideas from a lesson and to associate newly acquired knowledge with related experience and existing knowledge stored in memory (Fiorella & Mayer, 2015). Existing studies have shown that generating a summary can enhance academic efficacy (Peterson & Roseth, 2016), students’ comprehension (Cordero-Ponce, 2000; Fiorella & Mayer, 2016), academic writing (Khazaal, 2019), and reading achievement (Hamida et al., 2012). However, most of these studies were conducted in traditional classrooms with non-immersive simulation interventions, while the strategy's effectiveness in VR-based teaching remains controversial. Parong and Mayer (2018) found that summarizing after VR instruction had a positive impact on science learning performance, whereas Zhao et al. (2020) found that the summarizing strategy did not improve biology comprehension among college students. Additionally, most studies of summarizing strategies have involved adult students (Parong & Mayer, 2018; Zhao et al., 2020). Whether summarizing strategies work for young children, who have difficulty identifying the main ideas in materials (Brod, 2021), has not been sufficiently investigated.

Applying visual cues in the presentation may guide learners’ attention to critical instructional material and enhance the effectiveness of summarizing strategies (Alpizar et al., 2020). In particular, visual cues may relieve young students of the struggle to identify critical information on their own, because cues guide learners to relevant information concerning the overall structure while increasing essential processing (Mayer & Fiorella, 2014; Wang et al., 2020). Existing studies have reported positive effects of various cues, such as highlighting or underlining text in a document (Xie et al., 2019), using pointing gestures in 2D instructional videos (Pi et al., 2019), and adding visual cues to 3D animated videos (Huk et al., 2010). However, most research on cues has been conducted with university students, and little with elementary school students (Alpizar et al., 2020; Zhang et al., 2023). Additionally, whether added cues impose extra cognitive load on learners in an IVR environment needs further exploration.

Furthermore, most studies have evaluated VR-based instruction in terms of academic performance (e.g., recall, comprehension, retention, transfer), behavior, and emotion (Klingenberg et al., 2020; Luo et al., 2021), and few have focused on learners’ mental models. Here, a mental model refers to a particular type of mental representation that can predict and explain phenomena and concepts in the world (Chen et al., 2015; Johnson-Laird, 1983; Lucas & Mai, 2022; Sharma, 2022) and that plays an important role in the development of learners' thinking. Moreover, findings on whether mental models improve in various simulated environments have been inconsistent. For example, Chen et al. (2015) found that a dynamic 3D representation mode with spatial visualization rotation significantly enhanced learners' mental model of atomic orbitals, while the opposite finding was obtained with a 2D representation mode (Tuckey et al., 1991). Whether learners' mental models can be improved in an immersive VR environment therefore requires further research.

The purpose of this study was therefore to conduct a randomized experiment to explore the effects of summarizing scaffolding and textual cues on learning performance, mental model, and cognitive load in a VR setting. This study also investigated the interaction effects of summarizing strategies and textual cues.

The participants were 152 fourth- and fifth-grade students (76 boys and 76 girls) from a suburban elementary school. The average age of the participants was 9.59 years (SD=0.49, min: 9, max: 10). Before the experiment, we obtained written informed consent from all the parents of the participants. The research protocol was approved by the Institutional Review Board (IRB). Before the experiment, informal interviews were conducted with science teachers, as well as with several randomly selected

Learning performance

The analysis of the prior knowledge test showed that there was no significant difference in prior knowledge among the four conditions (C1: M=11.21, SD=3.86; C2: M=12.03, SD=3.71; C3: M=11.57, SD=4.18; C4: M=12.79, SD=4.10; F(3, 148)=1.133, p=.338). We further analyzed the total score, retention, and transfer determined from the performance test. As shown in Table 2, the factorial ANOVA results indicated significant main effects of textual cues in three dimensions: total
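As a rough check on the prior-knowledge comparison reported above, the one-way ANOVA F statistic can be approximated from the published group means and standard deviations alone. The sketch below is illustrative and is not the authors' analysis script. It assumes equal group sizes of 38 (152 participants split evenly across four conditions), which the text does not state, and the function name `one_way_f_from_summary` is hypothetical.

```python
from statistics import fmean

# Means and SDs reported in the text for conditions C1-C4.
means = [11.21, 12.03, 11.57, 12.79]
sds = [3.86, 3.71, 4.18, 4.10]

# ASSUMPTION: equal group sizes (152 / 4 = 38); not stated in the text.
n_per_group = 38


def one_way_f_from_summary(means, sds, n):
    """One-way ANOVA F statistic from group means and SDs, equal group size n."""
    k = len(means)
    # With equal group sizes the grand mean is the plain mean of group means.
    grand_mean = fmean(means)
    ss_between = n * sum((m - grand_mean) ** 2 for m in means)
    ss_within = (n - 1) * sum(sd ** 2 for sd in sds)
    df_between = k - 1
    df_within = k * (n - 1)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


f_stat, df_between, df_within = one_way_f_from_summary(means, sds, n_per_group)
print(f"F({df_between}, {df_within}) = {f_stat:.2f}")
```

Under the equal-size assumption the degrees of freedom match the reported F(3, 148), and the recomputed F lands close to the reported 1.133. A small gap is expected, because the published means and SDs are rounded and the actual group sizes may differ.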


This study investigated the effects of textual cues and summarizing scaffolding on students' learning performance, mental model, and cognitive load in a virtual reality environment with a randomized experiment. The statistical results supported the effectiveness of textual cues and summarizing scaffolding, and revealed the interaction between the two strategies. The specific findings on the effects of the two strategies on learning are discussed in detail below.

WenHao Li: Conceptualization, Methodology, Writing – Original Draft, Funding acquisition. Qinna Feng: Formal Analysis, Investigation, Writing – Original Draft, Visualization. Xiya Zhu: Investigation, Formal Analysis, Data Curation. Qiuchen Yu: Writing – Reviewing and Editing. Qiyun Wang: Writing – Reviewing and Editing, Supervision.


This work was supported by the Collaborative Innovation Center for Informatization and Balanced Development of K-12 Education by MOE of China and Hubei Province [grant number xtzd2021-003].

There is no potential conflict of interest.

