The impact of AI-generated technologies-driven digital cultural heritage platforms on users’ offline cultural participation intentions

admin

7 months ago



Theoretical foundation and research framework

This study systematically elucidates the mechanisms that lead from user experiences on platforms enabled by AI-generated content (AIGC) to offline cultural participation behaviors. It integrates two complementary perspectives from the literature. On the one hand, the research focuses on the application pathways and contextual characteristics of AIGC in the digital cultural heritage domain. It identifies core challenges and emerging trends related to digital preservation, content presentation, and ethical governance. On the other hand, the Net Valence Model (NVM) serves as the theoretical foundation, explaining the psychological trade-offs users experience between perceived benefits and perceived risks during technology adoption, which subsequently influence their behavioral judgments and participation intentions. The synthesis of these two perspectives provides a robust theoretical basis for developing a user-behavior model centered on benefit–risk perception mechanisms.

Application pathways and challenges of AI-generated technologies in digital cultural heritage

At the international level, AI-generated technologies have already given rise to diverse applications in the field of cultural heritage. For example, in the digital restoration of Dwelling in the Fuchun Mountains, research teams employed Generative Adversarial Networks (GANs) and Diffusion Models to complete missing sections and generate stylistically consistent continuations. By learning Huang Gongwang’s brushstroke characteristics and landscape painting style, the model can generate transitional landscape structures, thereby achieving a “digital recomposition” of the fragmented scroll and opening a new pathway for the digital regeneration of ancient paintings (see Fig. 1). Meanwhile, The Digital Palace Museum: Duobao Pavilion platform integrates more than 600 artifacts into a mobile mini-program. Users can browse decorative motifs and generate personalized “cultural footprint maps” and exclusive posters through AI algorithms. It demonstrates the platform’s capacity for interactive storytelling and personalized content generation (see Fig. 2). At the National Museum of Korea, the exhibition Animals in Old Paintings Come Alive allows visitors to select animals from traditional paintings via mobile devices and project them onto large screens. Powered by AI generative models, the system produces dynamic figures via touch-based interaction, creating an immersive “human–AI co-creation” experience that transcends the limitations of traditional static displays (see Fig. 3).

These cases illustrate that AI-generated technologies have expanded from digital restoration to personalized interaction and immersive presentation, opening new avenues for preserving and disseminating cultural heritage. They also provide an important contextual reference for the subsequent analysis of the Cloud Tour of Dunhuang platform in this study.

Against this backdrop, scholars have conceptualized such systems as digital cultural heritage platforms—comprehensive systems that integrate content generation, resource display, and user interaction, typically in the form of web-based platforms and mobile applications¹⁸. With the advancements in AIGC technologies, an increasing number of platforms have incorporated image-generation and semantic-interaction modules to enable digital preservation, broaden dissemination, and enhance public engagement¹⁹. Their core functions can be broadly summarized into three categories: (1) Content Generator: leveraging large language models, image synthesis networks, and semantic reconstruction techniques to produce cultural texts, images, and situational content; (2) Narrative Constructor: enhancing the intelligibility and immersiveness of cultural heritage through multimodal explanations, virtual tours, and scene-based storytelling; and (3) Virtual Guide or AI Successor: employing natural language generation and multi-turn dialog systems to facilitate personalized exploration and context co-creation. These functions span the entire process of digital preservation, presentation, and interaction, constituting the fundamental mechanisms of AIGC cultural platforms.

Firstly, in the domain of digital preservation, AIGC technologies utilize generative models and linguistic restoration techniques to document and semantically enrich traditional stories, architectural ornamentation, artifact imagery, and craft processes²⁰. This process fills existing gaps in historical documentation and enhances readability and intergenerational transmission by utilizing contemporary language and multimodal representations²¹. However, current practices often exhibit fragmented characteristics, predominantly focusing on the generation of isolated cultural elements without systematically reconstructing the overarching cultural context²². As a result, such generated content frequently struggles to integrate into established cultural narratives, thus limiting its efficacy in sustaining cultural value and ensuring long-term preservation.

In terms of content presentation, AIGC integrates natural language and image generation technologies to provide platforms with personalized interpretive texts, immersive audiovisual content, and interactive 3D environments²³. Machidon et al.²⁴ demonstrated that users engaging in virtual interactions with AI characters can access more contextually rich cultural experiences, thereby deepening their understanding of cultural backgrounds and enhancing emotional resonance.

However, despite the growing diversity of presentation formats, existing research remains primarily focused on visual outputs and technical implementation, with limited attention to the underlying cultural narrative structures and user psychological mechanisms. This technological bias undermines the explanatory power and contextual expressiveness of AIGC in cultural communication, thereby constraining its potential to foster a holistic understanding of users and to stimulate effective offline cultural participation.

In terms of user interaction, AIGC technologies, through natural language generation (NLG), multi-turn dialog systems, and virtual agents²⁵,²⁶, have introduced a novel “dialog–response–guidance” mechanism to cultural heritage platforms, enabling users to engage more actively in exploring cultural content²¹. However, two key limitations persist in current systems. First, the underlying interaction logic is still largely rule- and template-based, lacking a deep semantic and emotional understanding, which often results in repetitive and mechanical content generation⁷. Second, cultural adaptability remains limited, as AI systems struggle to tailor their outputs dynamically based on users’ cultural backgrounds, linguistic preferences, or contextual nuances²⁷. These challenges diminish the potential of AIGC to contribute meaningfully to cultural education and foster psychological resonance with heritage content.

Overall, AIGC demonstrates multifaceted advantages within digital cultural heritage platforms. Regarding digital preservation, it facilitates the semantic restoration and content expansion of cultural elements; in content presentation, it enhances narrative expression and contextual reconstruction; and in user interaction, it establishes mechanisms of identity resonance through semantic generation and dynamic guidance.

In contrast, traditional platforms often adopt “digital archive” or “virtual exhibition” models that are expert-driven and content-predefined, with linear dissemination pathways that position users as passive recipients, thereby limiting opportunities for deep engagement and personalized meaning-making²⁸. AIGC, by comparison, can generate personalized content based on users’ interests and behavioral trajectories, while enabling dynamic co-construction of cultural meaning through interactive and context-sensitive adaptations. This user-centric adaptability enables cultural expressions to vary by individual and by need, highlighting the structural advantages and irreplaceable role of AIGC in reshaping the pathways of cultural communication²⁹.

Despite its growing promise, the development of AIGC also faces profound challenges. On the one hand, to enable immersive experiences and personalized recommendations, platforms must rely heavily on users’ behavioral and physiological data to train models. However, the lack of transparency in data collection mechanisms raises serious concerns about privacy breaches, algorithmic manipulation, and emotional interference, ultimately undermining user trust³⁰,³¹. On the other hand, AIGC is constrained by the limitations of training corpora and model architectures, often resulting in homogenized cultural expression, symbolic formalism, and semantic superficiality, all of which compromise the authenticity and diversity of cultural representation³². Therefore, future optimization of AIGC platforms should place greater emphasis on enhancing users’ cultural understanding, trust, and willingness to co-create. It is essential to improve content quality while safeguarding user privacy, reinforcing cultural identity, and promoting active participation.

Applicability and extensibility of the Net Valence Model (NVM) in user perception research

The theoretical foundation of the Net Valence Model (NVM) can be traced back to the concept of “valence” proposed by Lewin et al.¹⁵, which posits that individual behavior results from a dynamic trade-off between positive and negative motivational forces. Building on this idea, Fishbein¹⁶ developed the “expectancy–valence model,” arguing that behavioral intention arises from individuals’ evaluations of the anticipated outcomes—both positive and negative—weighted by their subjective importance. The core proposition suggests that when perceived benefits outweigh the costs or risks associated with an action, individuals are more likely to exhibit a willingness to engage in that behavior¹⁶.
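The weighting idea described above can be written compactly. The following is a minimal sketch under stated assumptions: the symbols $BI$ (behavioral intention), $w_i$ (subjective importance of outcome $i$), and $v_i$ (signed evaluation of outcome $i$) are introduced here for illustration and are not notation taken from the cited works.

```latex
BI \;\propto\; \sum_{i=1}^{n} w_i \, v_i ,
\qquad w_i \ge 0, \quad
v_i > 0 \ \text{for anticipated benefits}, \quad
v_i < 0 \ \text{for anticipated costs or risks}
```

Under this reading, intention strengthens when the weighted positive evaluations exceed the weighted negative ones, which matches the core proposition stated in the paragraph.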

The Net Valence Model (NVM) suggests that individual behavior emerges from a comprehensive judgment formed through the dynamic interplay between positive drivers (e.g., anticipated benefits) and negative inhibitors (e.g., potential risks). Users typically seek a psychological equilibrium between maximizing gains and minimizing risks, thereby developing a clear behavioral tendency³³. NVM demonstrates strong theoretical flexibility and cross-context applicability, particularly in technology environments characterized by high uncertainty and cognitive complexity³⁴,³⁵.

In AIGC-enabled digital cultural heritage platforms, users may simultaneously encounter positive experiences—such as cultural re-cognition, virtual storytelling, and immersive guided tours—and potential risks, including privacy breaches, cultural misinterpretation, ambiguous content authenticity, and algorithmic manipulation. These environments are often characterized by information opacity and the inherent incomprehensibility of generative mechanisms³⁶, as well as the sensitivity and ethical controversy surrounding cultural expression³⁷. As a result, users face highly dynamic and cognitively ambiguous decision-making contexts in which behavioral responses tend to be nonlinear, emotionally driven, and sometimes deviate from rational expectations.

Traditional rational behavior theories, such as the Theory of Reasoned Action (TRA) and the Theory of Planned Behavior (TPB), are built on the assumption that individuals are rational decision-makers. These models suggest that behavioral intentions are linear and predictable, primarily driven by cognitive factors such as attitudes, subjective norms, and perceived behavioral control⁹,¹⁰,³⁸. They are effective in structured environments with minimal external interference. However, they cannot adequately explain user behavior and psychological responses in high-uncertainty contexts where perceived benefits and risks are deeply intertwined.

In contrast, the Net Valence Model (NVM) highlights the psychological mechanism by which users dynamically weigh perceived benefits against potential risks. Its core pathway, “Perceived Benefits / Perceived Risks → Behavioral Intention,” builds upon the Technology Acceptance Model’s (TAM) emphasis on perception-based variables and offers considerable theoretical extensibility. NVM can integrate a wide range of influencing factors, including emotional responses, value conflicts, and social norms. It serves as a robust dual-path framework for explaining users’ psychological trade-offs and behavioral decision-making processes.

In subsequent research, Li et al.³⁵ structurally extended the Net Valence Model (NVM) and applied it to the social media context to examine users’ psychological trade-offs in seeking and sharing health information. Their findings validated the applicability of NVM in scenarios characterized by perceived risks and benefits. NVM has been increasingly applied across a range of emerging technology domains, including autonomous driving, online healthcare, and social networking platforms³⁹,⁴⁰,⁴¹. It demonstrates notable theoretical advantages in capturing the psychological conflicts, risk evaluations, and nonlinear decision-making pathways underlying user behavior.

Regarding perceived benefits, users’ perceived positive value brought by a technology, product, or service is one of the core driving forces behind behavioral decision-making. Prior studies have shown that such benefit perceptions are often closely linked to the system’s functionality, creativity, and interactivity⁴². Schreier et al.⁴³ found that creative design and content innovation significantly enhance users’ acceptance and willingness to engage with a product. Building on this, Yang et al.⁴⁴ further argued that when confronted with culturally rich content with high cognitive demands and strong emotional expectations, users tend to rely more heavily on creative visual expressions and interactive narrative mechanisms to achieve contextual understanding and esthetic engagement. As a result, users become sensitive to a platform’s creative and interactive features, which in turn amplify the role of perceived benefits in motivating usage and participation.

Correspondingly, perceived risk focuses on users’ systematic evaluations of the potential negative impacts associated with a given technology. Wang et al.⁴⁵ identified privacy concerns, system reliability, and psychological discomfort as key sources of perceived risk in digital applications in their technology risk model. Eslami et al.⁴⁶ further revealed that opacity in data processing and algorithmic bias significantly undermine users’ trust in digital systems. In the context of AIGC, Tsvetkova et al.⁴⁷ emphasized that algorithmic misinterpretation of semantics and the misuse of cultural symbols during content generation can trigger user concerns about authenticity and cultural appropriateness, thereby dampening their willingness to engage in offline cultural activities. The lack of cultural sensitivity has thus emerged as a critical barrier to user behavioral transformation.

Moreover, a significant trade-off exists between perceived benefits and perceived risks. Empirical research by Martins et al.⁴⁸ suggests that users are more likely to accept or adopt a technology when its positive value substantially outweighs its potential risks. Conversely, when risk perception becomes dominant, users often exhibit avoidance or resistance behaviors³⁴. Therefore, a key challenge for AIGC platform development and cultural communication design lies in enhancing perceived benefits (through creative design, narrative strategies, and emotional engagement) while simultaneously mitigating user concerns related to privacy, authenticity, and cognitive overload.
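The trade-off described above can be illustrated with a small numerical sketch. It assumes that benefit and risk perceptions are measured on the same rating scale (for example, Likert items) and that net valence is the difference between their means; the function names and the three-way labeling are hypothetical and are not taken from the cited studies.

```python
def net_valence(benefit_scores, risk_scores):
    """Mean perceived benefit minus mean perceived risk (same rating scale assumed)."""
    mean_benefit = sum(benefit_scores) / len(benefit_scores)
    mean_risk = sum(risk_scores) / len(risk_scores)
    return mean_benefit - mean_risk


def behavioral_tendency(benefit_scores, risk_scores):
    """Label the direction of the trade-off; the labels are illustrative only."""
    valence = net_valence(benefit_scores, risk_scores)
    if valence > 0:
        return "approach"    # benefits outweigh risks
    if valence < 0:
        return "avoid"       # risk perception dominates
    return "indifferent"     # benefits and risks balance out
```

A respondent who rates benefit items highly and risk items low would be labeled "approach", while one whose risk ratings dominate would be labeled "avoid". Real studies estimate these relationships statistically rather than by simple subtraction.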

Building on this research logic, this study developed an extended influence mechanism model under the framework of the Net Valence Model (NVM), tailored to the contextual characteristics of digital cultural heritage dissemination. The model incorporates both perceived benefits and perceived risks, encompassing four benefit-related dimensions—creative design, creative content, narrative design, and entertainment experience—and three risk-related dimensions—privacy concerns, ethical considerations, and negative psychological responses. Based on this structure, corresponding hypotheses are proposed to systematically examine how users’ perceived trade-offs on AIGC-enabled cultural heritage platforms influence their willingness to engage in offline cultural experiences.

The relationship between creative design/content and perceived benefits

In the context of AIGC, creative design typically refers to innovative expression forms enabled by artificial intelligence technologies, encompassing virtual scene construction, dynamic interactive presentation, and immersive storytelling⁴⁹. Im et al.⁵⁰ argue that highly novel creative designs can significantly enhance users’ perceived value of a product or service, particularly in high technological complexity scenarios where such designs are more likely to capture user attention and stimulate exploratory interest. Moon and Han⁵¹ further emphasize that creative design is not limited to technical breakthroughs; it also involves personalizing and diversifying the user experience, thereby strengthening immersion and interactive engagement.

As a subjective evaluation criterion for the positive value provided by a technological system, perceived benefits typically encompass multiple dimensions, such as functionality, emotional value, and social value. Noble and Kumar⁵² found that creative design can significantly enhance users’ perception of functional value and emotional benefits, thereby improving their overall satisfaction. Similarly, Huang et al.⁵³ noted that creative visual content and narrative approaches strengthen users’ perceived benefits and foster their acceptance of digital cultural heritage platforms and their inclination toward cultural participation. Accordingly, this study proposes the following hypotheses:

H1: Creative design significantly positively affects perceived benefits.

H2: Creative content significantly positively affects perceived benefits.

The relationship between entertainment experience and perceived benefits

Entertainment experience generally refers to the pleasure and satisfaction users derive during interactions with a system or its content, and it represents an important psychological factor influencing their subjective evaluations and overall user experience⁵⁴. Prior studies have demonstrated that in contexts such as cultural communication, tourism, and education, entertainment elements not only enhance user engagement and emotional experience but also strengthen the perceived value of service systems⁵⁵,⁵⁶. The entertainment experience enhances users’ sense of immersion and cognitive engagement through dynamic interaction, contextualized storytelling, and visual design, thereby improving their perceptions of content quality as well as their overall evaluation of platform value. Tan and Chou⁵⁷ further emphasized that entertaining interfaces and interaction mechanisms help mitigate users’ perceived technological complexity, ultimately improving both their usage experience and perceived benefits. Accordingly, this study proposes the following hypothesis:

H3: Entertainment experience significantly positively affects perceived benefits.

The relationship between narrative design and perceived benefits

In AIGC-generated digital cultural heritage content, narrative is regarded as a design approach that integrates cultural resources with contemporary contexts in a dynamic and layered manner, aiming to enhance users’ emotional resonance and understanding⁵⁸. Compared to traditional modes of information presentation, AIGC-driven narratives not only generate visually structured content but also reconstruct the historical backgrounds, traditional customs, and cultural stories associated with heritage, thereby enriching the cultural connotations and expressive intensity of the content⁵⁹. This narrative-oriented content generation approach enhances the educational value and emotional warmth of cultural information, significantly improving users’ perceived benefits. Gu et al.⁶⁰ noted that narrative-based interactive design effectively reduces users’ cognitive load and improves their satisfaction and experience of the technology. Accordingly, this study proposes the following hypothesis:

H4: Narrative design significantly positively affects users’ perceived benefits.

The relationship between privacy concerns and perceived risks

Privacy and security concerns are key challenges affecting users’ acceptance of AIGC-driven digital heritage content. Studies have shown that users often adopt a cautious and even resistant attitude toward technologies involving the collection and use of personal data⁶¹. AIGC platforms typically process large volumes of personal information, including camera surveillance, geolocation, biometric, and behavioral data, for content recommendation and targeted marketing purposes⁶². These mechanisms can enhance the personalization of cultural experiences. However, the opacity of data handling practices and the potential for information leakage can trigger users’ privacy-related anxiety, with possible severe consequences such as identity theft⁶³. The absence of robust privacy protection mechanisms undermines users’ trust in the platform and significantly increases their perceived risk toward AIGC cultural content, thereby reducing their willingness to engage in cultural participation. Based on this, the following hypothesis is proposed:

H5: Privacy and security concerns significantly positively affect users’ perceived risks.

The relationship between ethical concerns and perceived risks

AIGC-driven digital cultural heritage content offers users personalized and immersive cultural experiences. However, its technical design and content generation processes still involve multiple ethical risks, particularly regarding the authenticity, fairness, and semantic appropriateness of cultural expression⁶⁴. In selecting and reproducing cultural symbols, AIGC often overlooks the values and cultural backgrounds of specific groups, resulting in partial, stereotypical, or even distorted representations⁶⁵. Moreover, content recommendation mechanisms are typically optimized based on mainstream user preferences, thereby marginalizing the needs and niche cultures of underrepresented communities, fostering a perceived sense of cultural exclusion.

Algorithmic personalization based on user preferences enhances the relevance of content. However, it can intensify “information blind spots,” making it difficult for users to access diverse cultural materials and potentially directing them toward homogenized experiential paths⁶⁶. This erosion of cultural choice and information transparency can undermine user trust and significantly increase the perceived risk associated with the platform. Based on this, the following hypothesis is proposed:

H6: Ethical concerns significantly positively affect perceived risks.

The relationship between negative psychological responses and perceived risks

AIGC-driven digital heritage content offers users a more immersive and expressive cultural experience. However, its generative mechanisms may trigger potential negative psychological responses that influence users’ perceived risk. In reconstructing virtual cultural scenes, AIGC systems often idealize representations of historical events, ritual practices, or cultural spaces, resulting in visually refined content that may deviate from real-world contexts⁶⁷. This “perceptual distortion” can create a sense of detachment and raise doubts about authenticity, especially among users who seek genuine cultural understanding. For such users, the imbalance between expectation and representation may lead to disappointment, confusion, or even feelings of alienation⁶⁸.

Moreover, AIGC cultural content often emphasizes the production of visual or informational elements while neglecting the social context and emotional interactions between the user and the heritage itself. This may lead to cognitive disengagement and emotional desensitization⁶⁹, weakening users’ holistic understanding of cultural heritage and diminishing their emotional connection to it. Such psychological dissonance and cultural detachment can intensify user skepticism regarding the platform’s authenticity and reliability, constituting a major source of perceived risk. Based on this, the following hypothesis is proposed:

H7: Negative psychological responses significantly positively affect perceived risks.

The relationship between perceived benefits/risks and users’ AIGC experience behaviors

Perceived benefits refer to the positive value users gain from engaging with AIGC. They are manifested in the convenience of information access, enhanced cultural identity, and immersive entertainment experiences. Katifori et al.⁷⁰ noted that AIGC facilitates users’ intuitive understanding of the cultural context behind heritage through dynamic narratives and creative visual representations. It increases satisfaction with the platform and strengthens users’ recognition of cultural value, ultimately enhancing their willingness to engage in online experiences.

In contrast, perceived risk reflects users’ concerns about the potential negative consequences of AI-generated content, primarily including issues such as privacy leakage, distortion of cultural expression, and algorithmic bias⁶⁸. Users may experience cognitive dissonance and emotional disengagement when content becomes overly formulaic or detached from authentic cultural contexts, which in turn undermines trust and reduces their behavioral intention⁷¹. From a cognitive trade-off perspective, the relative strength of perceived benefits and perceived risks jointly determines whether users are willing to accept the cultural content and extended experiences offered by AIGC platforms.

Users’ actual experiences on AIGC platforms not only influence their perceptions and satisfaction with digital cultural content but may also extend through multiple psychological mechanisms to translate into offline cultural participation. First, immersive narratives and cultural interaction experiences can stimulate users’ positive emotions and learning interests, which in turn translate into motivational drivers for offline cultural behaviors⁷²,⁷³. Second, the knowledge gains and cultural value recognition derived from online experiences often influence users’ real-world behavioral intentions through motivational extension mechanisms, manifesting as a tendency toward real-world cultural exploration driven by virtual experiences⁷⁴. At the same time, prior studies have pointed out that although virtual cultural experiences can effectively stimulate interest, their limitations in authenticity and completeness frequently trigger users’ compensatory psychological needs, prompting them to seek more authentic offline cultural experiences⁷⁵. Based on this reasoning, the following hypotheses are proposed:

H8: Perceived benefits significantly positively affect users’ willingness to engage with AIGC platforms.

H9: Perceived risks significantly negatively affect users’ willingness to engage with AIGC platforms.

H10: Perceived benefits have a significant positive effect on users’ offline cultural participation intentions.

H11: Perceived risks have a significant negative effect on users’ offline cultural participation intentions.

H12: Users’ actual experiences on AIGC platforms have a significant positive effect on their intentions for offline cultural participation.

This study constructs a user experience behavior model for AIGC-enabled cultural heritage platforms by incorporating seven core variables (creative design, creative content, entertainment experience, narrative design, privacy concerns, ethical considerations, and negative psychological responses) and introducing perceived benefits and perceived risks as antecedent factors. The model systematically investigates the transformation mechanism from online experience to offline participation (see Fig. 4).
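For readers who prefer an algebraic view, hypotheses H1–H12 can be summarized as a set of linear structural equations. This is a sketch under stated assumptions rather than the estimated model: the abbreviations (CD, CC, EE, ND for creative design, creative content, entertainment experience, and narrative design; PC, EC, NPR for privacy concerns, ethical concerns, and negative psychological responses; PB and PR for perceived benefits and risks; AEB for AIGC experience behavior; OPI for offline participation intention), the coefficients $\beta_k$, and the error terms $\zeta_j$ are introduced here for illustration. Treating the "willingness to engage" of H8–H9 and the "actual experiences" of H12 as one construct (AEB) is also an assumption.

```latex
\begin{aligned}
PB  &= \beta_1\,CD + \beta_2\,CC + \beta_3\,EE + \beta_4\,ND + \zeta_1 \\
PR  &= \beta_5\,PC + \beta_6\,EC + \beta_7\,NPR + \zeta_2 \\
AEB &= \beta_8\,PB + \beta_9\,PR + \zeta_3 \\
OPI &= \beta_{10}\,PB + \beta_{11}\,PR + \beta_{12}\,AEB + \zeta_4
\end{aligned}
```

Each coefficient $\beta_k$ corresponds to hypothesis H$k$. The hypotheses predict $\beta_9 < 0$ and $\beta_{11} < 0$, with all other coefficients positive.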

Case Study: The “Cloud Tour of Dunhuang” mobile platform of the Dunhuang Museum

This study adopts the Cloud Tour of Dunhuang digital cultural platform, jointly developed by the Dunhuang Academy and Tencent, as the empirical case, with its mobile application serving as the subject for user surveys and behavioral pathway analysis (see Figs. 5–13). Drawing on the mural resources of the Mogao Caves, a UNESCO World Heritage site, the platform establishes a digital communication system that integrates mobile interaction, virtual roaming, and AI-generated technologies.

**Fig. 5: a–e “Explore Dunhuang – Mirror-Seeking Module”.**

In terms of operation, users can freely navigate the cave spaces via a virtual map. By clicking on specific mural nodes, they can access textual guides, image restorations, and audio commentaries. They may further enter dynamic narrative videos or interactive question-and-answer (Q&A) modules corresponding to the selected murals (Fig. 5a–d). In addition, the platform incorporates “digital cave tasks” and a content-sharing recommendation mechanism, guiding users along a “discovery–reflection–sharing” pathway that progressively deepens their cultural experience (Fig. 5e).

In the Digital Library Cave module, the platform achieves millimeter-level 1:1 precision replication, faithfully reconstructing the murals, sculptures, and artifact details of the Mogao Caves’ “Three-Story Building” and Cave 17. From color and material to weathering traces, the reproduction closely mirrors the real site, delivering an ultra-realistic museum experience. This was made possible through more than 30,000 multi-angle captured images and an ultra-detailed 3D model comprising over 900 million polygons, further enhanced by AI-based super-resolution and material recognition algorithms to improve clarity and texture representation (Fig. 6).

It is worth emphasizing that the platform’s content structure comprises two categories of elements: manually preset components and AI-generated components. The manually preset elements mainly include the identity settings of NPCs (e.g., historical prototypes such as Master Hongbian or Taoist Wang), the overarching narrative framework (e.g., the historical timeline from the late Tang to the Northern Song to the late Qing), cave numbering, and the basic spatial topology. These elements are predefined by Dunhuang studies experts and the development team to ensure the accuracy of historical narratives and the reliability of cultural connotations.

The AI-generated elements encompass mural restoration, spatial reconstruction, lighting rendering, NPC dynamic performance, and classical-text question answering. Their implementation relies on generative AI techniques (e.g., GANs, diffusion models, speech synthesis, and retrieval-augmented generation). These techniques produce dynamic and differentiated results that are distinguishable from manually preset content.

Regarding mural restoration, the platform applies AI-based completion and generation technologies. Based on the annotations and scholarly validation provided by Dunhuang experts, GAN models employ adversarial training between generators and discriminators to learn mural brushstrokes, pigment textures, and local structural logics, thereby inferring possible textures and details in missing areas. Diffusion models, by contrast, adopt an iterative noise-adding and denoising inversion mechanism to generate smooth and continuous transitions of color and brushwork, making them particularly suitable for high-fidelity restoration of large-scale damaged regions (Fig. 7). By integrating expert knowledge with AI-based content generation models, the platform achieves authentic restoration of heritage remains and ensures consistency of user perception and an enhanced sense of immersion.
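The noise-adding half of a diffusion model can be sketched in a few lines. The code below is an illustrative toy and not the platform's implementation: it assumes a linear noise schedule and a flat list of pixel intensities, omits the learned denoising network entirely, and uses hypothetical function names. The last helper shows a simplified inpainting constraint in which undamaged pixels are kept and only the missing region is left to the generator.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances, increasing linearly (an assumed schedule)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t: running product of (1 - beta_s) for all s up to t."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def add_noise(pixels, t, alpha_bars, rng):
    """Sample x_t from x_0 directly: sqrt(a_bar) * x0 + sqrt(1 - a_bar) * eps."""
    a_bar = alpha_bars[t]
    signal, noise = math.sqrt(a_bar), math.sqrt(1.0 - a_bar)
    return [signal * x + noise * rng.gauss(0.0, 1.0) for x in pixels]


def keep_known_pixels(generated, original, known_mask):
    """Simplified inpainting constraint: keep intact pixels, fill only the gaps."""
    return [o if known else g for g, o, known in zip(generated, original, known_mask)]
```

As the step index grows, the cumulative factor shrinks toward zero, so the image is progressively replaced by Gaussian noise. Restoration then amounts to learning to reverse that process while respecting the pixels that survive.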

In the overall spatial reconstruction stage, the platform integrates physics-constrained generative models with AI-based content completion techniques to simulate the stacked scene of more than 60,000 scrolls stored in the cave a century ago. The former employs finite element mechanical simulation and fiber fracture modeling to generate natural scroll forms and wear features; the latter, in the absence of complete image records, uses AI generative models to automatically produce local decorative and textural details of the scrolls, thereby ensuring both physical plausibility in geometric form and visual completeness in surface texture. At the same time, the system incorporates a stacking logic inference mechanism, which applies constraint-based 3D arrangement algorithms to deduce the spatial relationships among scrolls (e.g., stacking order, contact and occlusion, center-of-gravity shifts). This ensures that the restoration results are visually realistic and are consistent with the physical plausibility of the historical on-site stacking state (see Fig. 8). As a result, during immersive browsing, users can perceive an authentic approximation of the “original stacking” from a century ago and experience a digitally reconstructed environment imbued with greater narrative tension and historical depth.

In the virtual scene rendering stage, the platform integrates physically based rendering (PBR), global dynamic illumination, and deep learning–driven lighting generation models. PBR rendering employs physics-level modeling of material reflectivity, roughness, and normal maps, enabling the lime plaster base of murals, pigment layers, and sculpture surfaces to exhibit realistic refraction and diffuse reflection effects under virtual light sources. Global illumination further simulates the reflection and refraction paths among multiple light sources, ensuring unified and natural light–shadow relationships throughout the cave space. Building on this, the platform incorporates generative lighting modeling, which takes users’ real-time perspectives as input and dynamically produces realistically distributed rays and shadows using deep learning models. An adaptive compensation mechanism is then applied to optimize brightness gradients in dark corners and corridor areas. This fusion of physics-based modeling and AI-driven lighting generation not only overcomes the limitations of on-site visits characterized by “localized lighting and restricted visibility” but also significantly enhances the fidelity and continuity of lighting details in high-dynamic-range (HDR) environments. It provides users with a clearer, more complete, and immersive virtual exhibition experience (see Figs. 9, 10).

In terms of character and narrative design, users can assume the role of a “Guardian of the Digital Library Cave.” They can select one of six virtual characters and “travel” through different historical periods, including the late Tang, Northern Song, and late Qing dynasties. During this process, users interact with NPCs (e.g., Master Hongbian, Monk Daozhen, and Taoist Wang) who connect the historical narratives of cave excavation, artifact dispersal, and cultural rediscovery. By leveraging AI-based facial animation and speech generation models, the platform enables NPCs’ expressions, voices, and narrative dialogs to appear more natural and fluid, thereby creating a dynamic and humanized cultural experience within the pre-set historical framework (see Fig. 11). Compared with traditional static exhibitions, this AI-generated narrative pathway and character performance significantly enhance users’ sense of immersion and presence.

For heritage sites lacking image records (such as Sanjie Monastery), the platform first employs AI-based semantic modeling to transform historical features annotated by Dunhuang studies experts—such as the proportional layout of pagodas in Five Dynasties–Song temples, the arrangement of monks’ quarters, and the structure of stables—into computable parameter constraints. Next, style transfer and material generation algorithms are applied to unify styles and enhance realism in details such as brick-and-stone textures, wooden structures, and painted eaves, ensuring that the overall appearance aligns with historical authenticity. Building on this, the platform further leverages AI generative models to automatically infer and reconstruct missing architectural structures under semantic constraints and expert validation, thereby producing a complete three-dimensional spatial scene. Through this “AI semantic modeling → generative completion → style optimization” technical pipeline, users in virtual roaming can directly “see” the reconstructed entirety of Sanjie Monastery based on scholarly inference and experience a stronger sense of historical atmosphere and immersion conveyed through AI-generated architectural details (see Fig. 12).

In the ancient book exhibition module, the platform incorporates a large-model–driven Retrieval-Augmented Generation (RAG) system. Users can directly pose questions such as “What is the Diamond Sutra about?” or “When was the Library Cave discovered?” The system performs keyword retrieval and semantic matching within its knowledge base, after which the large model generates concise summaries or multilingual responses, along with background information and key ideas. This transforms ancient texts from being merely “visible” into becoming “comprehensible and interactive” knowledge experiences (see Fig. 13).
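The retrieve-then-generate flow described above can be illustrated with a minimal sketch. It is not the platform's implementation: the three-passage knowledge base, the function names, and the prompt wording are hypothetical. A simple bag-of-words cosine similarity stands in for the keyword retrieval and semantic matching step, and the call to a large model is left out, so the sketch stops at assembling the prompt.

```python
import math
import re
from collections import Counter

# Hypothetical mini knowledge base; the platform's actual corpus is not described in the text.
KNOWLEDGE_BASE = [
    "The Diamond Sutra is a Mahayana Buddhist text on non-attachment and the nature of perception.",
    "The Library Cave (Cave 17) at the Mogao Caves was discovered in 1900 by Taoist Wang.",
    "Master Hongbian was a prominent monk at Dunhuang in the late Tang period.",
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counter objects)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question, passages, top_k=1):
    """Return the top_k passages most similar to the question (assumed stand-in for semantic matching)."""
    query = Counter(tokenize(question))
    ranked = sorted(passages, key=lambda p: cosine(query, Counter(tokenize(p))), reverse=True)
    return ranked[:top_k]


def build_prompt(question, passages):
    """Assemble the text that would be handed to a large model (the model call itself is omitted)."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Answer using only the context below.\nContext:\n{context}\nQuestion: {question}"


question = "When was the Library Cave discovered?"
prompt = build_prompt(question, retrieve(question, KNOWLEDGE_BASE))
```

A production system would typically replace the bag-of-words scoring with dense embeddings and pass the assembled prompt to a language model; both choices are outside what the text specifies.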

In addition, the platform is designed with multi-layered interaction and dissemination mechanisms. First, through modules such as “character tasks,” mural tours, and knowledge Q&A, it enables pathway-based and contextualized immersive interaction. Second, it supports features such as likes, favorites, and multimodal navigation, which enhance the personalized processing of cultural information. Third, it integrates social sharing functions, transforming individual experiences into collective dissemination and thereby reinforcing external resonance and diffusion effects of cultural identity.

Currently, common AI-generated platforms can be roughly categorized into two types. The first type comprises image recognition–driven platforms, typically exemplified by certain museum apps, where users can scan artifacts or exhibits to trigger 3D reconstructions or text-and-image explanations⁷⁶. These platforms excel at enhancing visual expression and providing rapid feedback, but their generative logic typically relies on preset scripts or template invocation. As a result, they lack dynamic responses to users’ semantic inputs and struggle to construct coherent narratives or deep cultural understanding. The second type comprises text-generation–driven platforms, which rely on AI text generation and semantic Q&A as their core functions. Users can obtain explanations or knowledge responses by posing natural language questions⁷⁷. While suitable for information-retrieval scenarios, this mechanism generally offers limited interaction formats, and the generated content tends to be disconnected from users’ behavioral pathways, making it difficult to evoke contextual engagement or cultural identity.

Compared to these two types of platforms, Cloud Tour of Dunhuang demonstrates an integrated advantage of narrative-fusion and game-based mechanisms. In terms of cultural narrative depth, structural consistency of generative logic, and integrity of immersive pathways, it shows higher system integration capacity while highlighting distinct gamified features. By fusing multimodal content (images, texts, sound effects, and interactive commands), the platform creates an RPG-like immersive experience in which users gain not only knowledge but also situational enjoyment throughout the process of “exploration–learning–creation.” Through the organic integration of narrative logic, gamified mechanisms, and AI content generation technologies, Cloud Tour of Dunhuang constructs a layered and immersive cultural experience environment. This differentiated interaction mechanism provides a solid practical foundation and theoretical support for this study. It also enhances the observability and verifiability of the transformation from virtual experience to real-world participation.

Questionnaire design

To ensure that the measurement indicators are closely aligned with the research context, this study designed the questionnaire around the Cloud Tour of Dunhuang platform. Based on the AI-generated content and user interaction modes embedded in its core functional modules, we developed measurement indicators covering dimensions such as creative design, narrative design, ethical concerns, and platform experience. The platform presents diverse forms of AI-generated content, including image restoration, textual narration, voice-guided tours, and interactive tasks. Its generative mechanisms can be broadly categorized into three types: fully AI-automated generation, human–AI collaborative generation, and platform algorithm-assisted generation (see Table 1).

Table 1 Constructs and Measurement Items

The survey employed in this study consists of 25 items. To enhance content validity, each latent variable was measured using multiple indicators. All measurement items were adapted from well-established scales in the existing literature and translated into Chinese using the back-translation method (see Table 1). Creative design and creative content were measured using four items adapted from scales developed by Bloch and Zhou et al.⁷⁸,⁷⁹. Recreation experience was measured using two items from Holbrook and Hirschman⁸⁰, and narrative design using two items adapted from Escalas⁸¹. Privacy concerns were measured using two items from Malhotra et al.⁸², while ethical concerns were based on two items from Hunt and Vitell⁸³. Negative psychological responses were measured using two items from Watson et al.⁸⁴. Perceived benefits and perceived risks were assessed using six items adapted from Li et al.³⁵. Online platform experience was measured using two items from Witmer and Singer⁸⁵, and offline cultural participation intention was measured using three items adapted from Ajzen¹⁰.

The use of multi-item measures was intended to overcome the limitations of single-item indicators and to capture the core constructs of each latent variable. All items were rated on a five-point Likert scale (1 = strongly disagree, 5 = strongly agree).

This study struck a balance between theoretical rigor and model compatibility in the design of measurement items. Although several latent variables were measured using only two items, prior research has shown that dual-item measures can yield stable and valid constructs when the theoretical dimensions are clearly defined and the measurement objectives are specific. Drolet and Morrison⁸⁶ argue that increasing the number of items provides limited incremental information and may induce “mechanical responding” and higher inter-item error correlations, ultimately compromising data quality. Similarly, Hayduk⁸⁷ emphasizes that when constructs are well-defined, one or two high-quality indicators are sufficient for latent variable modeling, whereas excessive redundancy can lead to model instability and reduced explanatory power.

Based on these theoretical insights, dual-item designs were adopted for certain variables for two main reasons. First, the underlying dimensions are conceptually focused, and the measurement objectives are unambiguous, allowing two indicators to capture the core construct. Second, limiting the number of items helps reduce the cognitive burden on respondents, thereby improving data quality and enhancing the stability of path estimation within the model.

Questionnaire distribution and data collection

This study adopted a combination of purposive sampling and snowball sampling to test the research hypotheses. The questionnaires were distributed through both online and offline channels. To ensure that the survey content closely reflected actual usage scenarios, all participants were guided by the research team to engage in a 10–15-minute experience with the “Cloud Tour of Dunhuang” platform prior to completing the questionnaire. This pre-survey interaction was designed to establish a basic understanding of the platform’s operations and interactive features.

Given the study’s focus on AIGC-driven digital cultural heritage experiences, the sample was inevitably concentrated within specific interest groups and social networks, raising the potential risk of structural bias. To mitigate self-selection bias, the study employed a diversified seed-user recruitment strategy, including university students, cultural heritage enthusiasts, and users from diverse professional backgrounds. This research also used a social media forwarding mechanism to expand outreach and enhance sample diversity across age groups, occupational categories, and geographic regions.

A total of 1073 questionnaires were collected. After excluding invalid responses (those completed in under 90 seconds, showing highly repetitive answer patterns, or containing logical inconsistencies), 986 valid responses were retained, meeting the basic sample size requirements for the hybrid SEM–ANN analysis (see Table 2).
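The screening rules above can be expressed as a short filter. This is a minimal sketch under stated assumptions: the 90-second threshold comes from the text, whereas the record layout, the function names, and the 90% cut-off used to flag “highly repetitive” answers are hypothetical, and the logical-consistency check is omitted because its criteria are not specified.

```python
def is_valid(response, min_seconds=90, max_same_share=0.9):
    """Apply two of the screening rules to one questionnaire record.

    response is a dict with 'seconds' (completion time) and 'answers'
    (a list of 1-5 Likert ratings). The max_same_share cut-off is an
    assumed operationalization of "highly repetitive answer patterns".
    """
    if response["seconds"] < min_seconds:
        return False  # completed too quickly
    answers = response["answers"]
    most_common = max(answers.count(value) for value in set(answers))
    if most_common / len(answers) >= max_same_share:
        return False  # nearly all items received the same rating
    return True


def screen(responses):
    """Keep only the records that pass both checks."""
    return [r for r in responses if is_valid(r)]


# Illustrative records only; these are not data from the study.
sample = [
    {"seconds": 60, "answers": [1, 2, 3, 4, 5] * 5},   # too fast
    {"seconds": 200, "answers": [4] * 25},             # straight-lined
    {"seconds": 240, "answers": [1, 2, 3, 4, 5] * 5},  # retained
]
kept = screen(sample)
```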

Table 2 Descriptive statistics of participant demographics

According to statistical results from SPSS 27.0, respondents aged 18–32 accounted for 61.3%, constituting the primary demographic group. Additionally, 85.5% of participants held an associate degree or higher. Although the sample is somewhat concentrated in terms of age and education level, this reflects the digital literacy and cultural engagement capacity of the core user base targeted by AIGC heritage platforms. Therefore, this structural composition is both appropriate and purposeful for this study.

Reliability and validity assessment

Based on 986 valid survey responses, this study conducted reliability and validity tests for 11 latent variables and their 25 measurement items using SPSS 27.0. The results showed that all variables had Cronbach’s α coefficients exceeding 0.70, indicating good internal consistency of the scales (see Table 3). Confirmatory Factor Analysis (CFA) further demonstrated that the Average Variance Extracted (AVE) for each latent construct was above 0.50 and the Composite Reliability (CR) exceeded 0.70, meeting the convergent validity criteria proposed by Fornell and Larcker⁸⁸. These results confirm that the measurement model possesses satisfactory convergent validity.
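For readers who wish to reproduce these checks, the three statistics can be computed from their standard definitions. The sketch below is illustrative: the function names and the example scores and loadings are hypothetical and are not taken from Table 3. It assumes standardized factor loadings, so that each indicator's error variance is one minus its squared loading.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for a scale; items is a list of item columns (one list of scores per item)."""
    k = len(items)
    sum_item_variances = sum(variance(column) for column in items)
    totals = [sum(row) for row in zip(*items)]  # per-respondent scale totals
    return k / (k - 1) * (1 - sum_item_variances / variance(totals))


def average_variance_extracted(loadings):
    """AVE: mean of the squared standardized loadings."""
    return sum(l ** 2 for l in loadings) / len(loadings)


def composite_reliability(loadings):
    """CR: (sum of loadings)^2 / ((sum of loadings)^2 + sum of error variances)."""
    squared_sum = sum(loadings) ** 2
    error_variance = sum(1 - l ** 2 for l in loadings)
    return squared_sum / (squared_sum + error_variance)


# Hypothetical two-item construct with standardized loadings 0.8 and 0.7.
example_loadings = [0.8, 0.7]
ave = average_variance_extracted(example_loadings)
cr = composite_reliability(example_loadings)
meets_criteria = ave > 0.50 and cr > 0.70
```

With these hypothetical loadings the construct would pass both thresholds cited above (AVE above 0.50 and CR above 0.70).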

Table 3 Relationships between observed variables and latent constructs

In addition, Pearson correlation analysis (see Table 4) revealed that the relationships among the variables were all statistically significant and in the expected directions. Specifically, creative design (CD), creative content (CC), recreation experience (RE), and narrative design (ND) were significantly and positively correlated with perceived benefits (PB); similarly, privacy concerns (PI), ethical concerns (EQ), and negative psychological responses (NE) were positively correlated with perceived risks (PR). These findings provide strong theoretical and empirical support for the subsequent structural path modeling.
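As a reminder of what each entry of the correlation matrix represents, the Pearson coefficient between two construct scores can be computed as follows. The function name and the example scores are hypothetical and are not data from the study.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)


# Illustrative scores only: a perfectly linear positive relationship.
narrative_scores = [1, 2, 3, 4]
benefit_scores = [2, 4, 6, 8]
r = pearson_r(narrative_scores, benefit_scores)
```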

Table 4 Pearson correlation analysis

Model fit and hypothesis verification

Based on 986 valid survey responses, a structural equation model (SEM) was constructed and tested using AMOS 27.0. The primary model fit indices indicate a good fit (see Table 5). Among the absolute fit indices, CMIN/df = 2.45, RMSEA = 0.067, GFI = 0.915, and AGFI = 0.903. Among the incremental fit indices, NFI, CFI, and IFI all exceed the 0.90 threshold, indicating that the model fits substantially better than the baseline model. The parsimony-adjusted index PGFI is 0.512, which falls within an acceptable range.

Table 5 Model Fit Indices of the structural equation model

Path analysis results (see Table 6) show that creative design (CD), creative content (CC), recreation experience (RE), and narrative design (ND) all exert significant positive effects on perceived benefits (PB). Specifically, CD, CC, and RE are significant at the 0.1% level (P < 0.001), while ND is significant at the 1% level (P = 0.003). Regarding perceived risks (PR), ethical concerns (EQ), negative psychological responses (NE), and privacy concerns (PI) all have significant positive effects. EQ and NE are significant at the 1% level, while PI is significant at the 5% level (P = 0.034).

Table 6 Model estimation results and hypothesis testing

Further analysis reveals that perceived benefits (PB) significantly and positively influence both AIGC platform experience (AE) and offline cultural engagement intention (OE) (P < 0.001), whereas perceived risks (PR) exhibit significant negative effects on both AE and OE. Notably, the path coefficient from PR to OE is significant at the 5% level, though only marginally (P = 0.048).

The results validate the structural model and suggest that AIGC-enabled cultural heritage platforms enhance users’ perceived benefits through creative design and cultural expression. This, in turn, positively influences both online experiences and offline engagement behaviors. At the same time, users’ concerns regarding privacy breaches, cultural distortion, and emotional detachment contribute to perceived risk, which negatively affects behavioral intention through the risk pathway. The full structural model is illustrated in Fig. 14.

SEM–ANN-based analysis of how AIGC-driven digital cultural heritage platforms influence users’ offline experience intentions

To further enhance the model’s predictive power and explanatory precision, this study incorporates an Artificial Neural Network (ANN) approach alongside the Structural Equation Modeling (SEM) analysis, resulting in a hybrid SEM–ANN model. This integrated framework explores the mechanisms through which AIGC-driven digital cultural heritage platforms influence users’ intentions for offline cultural engagement. Drawing on the methodological framework proposed by Liébana⁸⁹, four ANN sub-models (Models A–D) were constructed based on the significant SEM path results and the principles of ANN modeling to enhance the accuracy of fitting multi-path relationships and capture non-linear patterns more effectively (see Fig. 15).

**Fig. 15: A–D ANN models for predicting perceived benefits and perceived risks.**

Model A: The input layer includes creative design (CD), creative content (CC), narrative design (ND), and recreation experience (RE), with perceived benefits (PB) as the output layer. This model estimates the relative importance of positive content dimensions in predicting PB.

Model B: The input layer includes privacy concerns (PI), ethical concerns (EQ), and negative psychological responses (NE), with perceived risks (PR) as the output layer. This model measures the impact strength of various negative cognitive factors on PR.

Model C: The input layer consists of perceived benefits (PB) and perceived risks (PR), with AIGC platform experience (AE) as the output layer. This model examines the driving factors of cognitive evaluations of the online experience.

Model D: The input layer comprises perceived benefits (PB), perceived risks (PR), and online experience (AE), with offline cultural engagement intention (OE) as the output layer. This model tests the predictive strength of the complete behavioral transformation path.
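The input and output layers of the four sub-models can be summarized in a small lookup structure. This is only an illustrative encoding: the construct abbreviations follow the text, while the dictionary layout, the helper name, and the example record are hypothetical.

```python
# Inputs and output of each ANN sub-model, as listed in the text.
ANN_MODELS = {
    "A": {"inputs": ["CD", "CC", "ND", "RE"], "output": "PB"},
    "B": {"inputs": ["PI", "EQ", "NE"], "output": "PR"},
    "C": {"inputs": ["PB", "PR"], "output": "AE"},
    "D": {"inputs": ["PB", "PR", "AE"], "output": "OE"},
}


def select_columns(record, model_key):
    """Split one respondent's construct scores into (input vector, target) for a sub-model."""
    spec = ANN_MODELS[model_key]
    x = [record[name] for name in spec["inputs"]]
    y = record[spec["output"]]
    return x, y


# Illustrative construct scores for one respondent (not data from the study).
respondent = {"CD": 4.0, "CC": 3.5, "ND": 4.5, "RE": 3.0, "PI": 2.0, "EQ": 2.5,
              "NE": 1.5, "PB": 4.0, "PR": 2.0, "AE": 4.5, "OE": 4.0}
x_d, y_d = select_columns(respondent, "D")
```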

Subsequently, the artificial neural network (ANN) models were constructed, as defined in Eq. (1).

$$\hat{y}=f\left({W}^{(2)}\cdot \sigma \left({W}^{(1)}\cdot \mathbf{x}+{\mathbf{b}}^{(1)}\right)+{b}^{(2)}\right) \tag{1}$$

where $\mathbf{x}\in {\mathbb{R}}^{n}$ represents the input vector, ${W}^{(1)}$ and ${W}^{(2)}$ are the weight matrices, ${\mathbf{b}}^{(1)}$ and ${b}^{(2)}$ are the bias terms, $\sigma (\cdot )$ denotes the hidden-layer activation function, and $f(\cdot )$ denotes the output-layer activation function. In this study, both the hidden layer and the output layer adopt the Sigmoid activation function, defined in Eq. (2).

$$\sigma (z)=\frac{1}{1+{e}^{-z}} \tag{2}$$
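Equations (1) and (2) can be traced with a minimal forward-pass sketch of a single-hidden-layer network with Sigmoid activations in both layers. The number of hidden units and the weight values below are hypothetical, since the text does not report them, and the function names are illustrative.

```python
import math


def sigmoid(z):
    """Eq. (2): the logistic Sigmoid function."""
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, W1, b1, W2, b2):
    """Eq. (1) with Sigmoid in both layers.

    x  : input vector (list of n floats)
    W1 : hidden-layer weights, one row of n weights per hidden unit
    b1 : hidden-layer biases, one per hidden unit
    W2 : output-layer weights, one per hidden unit
    b2 : output-layer bias (scalar)
    """
    hidden = [
        sigmoid(sum(w * xi for w, xi in zip(row, x)) + bias)
        for row, bias in zip(W1, b1)
    ]
    z_out = sum(w * h for w, h in zip(W2, hidden)) + b2
    return sigmoid(z_out)


# Hypothetical Model A-sized network: 4 normalized inputs, 2 hidden units (assumed).
x = [0.75, 0.50, 1.00, 0.25]
W1 = [[0.2, -0.1, 0.4, 0.3], [-0.3, 0.5, 0.1, 0.2]]
b1 = [0.0, 0.1]
W2 = [0.6, -0.4]
b2 = 0.05
y_hat = forward(x, W1, b1, W2, b2)
```

Because the output layer is also a Sigmoid, the prediction always lies strictly between 0 and 1, which matches targets rescaled to the unit interval.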

All input and output variables were normalized using the min-max normalization method to improve the model’s training performance (Eq. 3).

$$x^{\prime} =\frac{x-\min (x)}{\max (x)-\min (x)}\in [0,1] \tag{3}$$
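A minimal sketch of Eq. (3) follows. The function name and the example scores are hypothetical; the guard for a constant column is an added assumption, since Eq. (3) is undefined when the maximum equals the minimum.

```python
def min_max_normalize(values):
    """Eq. (3): rescale a column of scores to the interval [0, 1]."""
    lowest, highest = min(values), max(values)
    if highest == lowest:
        # Assumed convention for a constant column, where Eq. (3) would divide by zero.
        return [0.0 for _ in values]
    return [(v - lowest) / (highest - lowest) for v in values]


# Illustrative five-point Likert scores (not data from the study).
scaled = min_max_normalize([1, 2, 3, 4, 5])
```

Note that Eq. (3) uses the observed minimum and maximum, so a sample in which no respondent chose an extreme rating is rescaled relative to the observed range rather than the nominal 1 to 5 range.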

This study employed a 10-fold cross-validation strategy to mitigate the risk of overfitting, where 90% of the data were used for training and the remaining 10% for testing in each iteration. The model’s predictive performance was evaluated using the Root Mean Square Error (RMSE) (Eq. 4).

$${\rm{RMSE}}=\sqrt{\frac{1}{n}\sum _{i=1}^{n}{({\hat{y}}_{i}-{y}_{i})}^{2}} \tag{4}$$
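The validation procedure can be sketched as follows: the sample is split into ten folds, each fold serves once as the 10% test set, and Eq. (4) is evaluated on every split. The function names, the random seed, and the mean-predicting stand-in model are hypothetical; in the study the fitted model is the neural network, which is not reproduced here.

```python
import math
import random


def rmse(predicted, actual):
    """Eq. (4): root mean square error between predictions and observations."""
    n = len(actual)
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / n)


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs; each fold is the test set exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, folds[i]


def fit_mean(train_x, train_y):
    """Stand-in model (assumption): always predicts the training mean of the target."""
    mean_y = sum(train_y) / len(train_y)
    return lambda x: mean_y


def cross_validate(xs, ys, fit, k=10):
    """Return the mean test RMSE and the per-fold test RMSEs."""
    scores = []
    for train, test in k_fold_indices(len(ys), k):
        predict = fit([xs[i] for i in train], [ys[i] for i in train])
        scores.append(rmse([predict(xs[i]) for i in test], [ys[i] for i in test]))
    return sum(scores) / len(scores), scores
```

With 986 valid responses, each split holds out 98 or 99 cases for testing and trains on the remaining 887 or 888.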

As shown in Table 7, the RMSE values averaged over the ten cross-validation iterations were 0.2187, 0.2524, 0.2295, and 0.1853 on the training sets for Models A, B, C, and D, respectively, and 0.2280, 0.2570, 0.2175, and 0.1724 on the corresponding test sets. The relatively small standard deviations reported for the models indicate good model fit, stable convergence, and strong generalization capability.

Table 7 Root Mean Square Error (RMSE) of artificial neural network models

To further evaluate the predictive performance of the integrated SEM–ANN model, a sensitivity analysis was conducted for each of the four ANN models. Based on the Normalized Importance metrics reported in Table 8, the following conclusions can be drawn:

Table 8 Sensitivity analysis of artificial neural network models

In Model A, creative design (CD) contributed most significantly to predicting perceived benefits (PB), with a normalized importance of 100%, underscoring its dominant role in enhancing users’ positive perceptions.

In Model B, ethical concerns (EQ) emerged as the key predictor of perceived risk (PR), with a significantly higher weight than other risk-related variables.

In Models C and D, perceived benefits (PB) and AIGC platform experience (AE) were the most influential predictors for their respective dependent variables, indicating that users’ subjective evaluations of content value and immersive experience play a pivotal role in the behavioral transformation mechanism.

Moreover, in the ANN analysis of Model A, narrative design (ND) exhibited a normalized importance of 64.12%, substantially higher than that of recreation experience (RE) at 28.56%. This ranking contrasts with the results of the Structural Equation Model (SEM), in which recreation experience showed the stronger linear path effect on perceived benefits. The ANN model thus points to the dominant psychological role of narrative content in shaping user perceptions.
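As a reading aid for these percentages, normalized importance is commonly obtained by dividing each predictor's raw importance by the largest raw importance in the same model, so that the strongest predictor is fixed at 100%. The sketch below assumes that convention; the function name and the raw importance values are hypothetical and are not taken from Table 8.

```python
def normalized_importance(raw_importance):
    """Express each predictor's importance as a percentage of the largest one."""
    largest = max(raw_importance.values())
    return {name: 100.0 * value / largest for name, value in raw_importance.items()}


# Hypothetical raw importances for the four Model A predictors (illustration only).
raw = {"CD": 0.4, "CC": 0.3, "ND": 0.2, "RE": 0.1}
normalized = normalized_importance(raw)
ranking = sorted(normalized, key=normalized.get, reverse=True)
```

Under this convention the percentages describe relative rank within one model only and cannot be compared across Models A to D.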

This finding highlights that narrative design on AIGC platforms functions not only as an information delivery medium but also as a critical mechanism for value construction and emotional engagement. In comparison, while recreational experiences may elicit immediate enjoyment and motivate short-term participation, their influence within the deeper psychological chain of “meaning-making – emotional resonance – value evaluation” is relatively limited. Narrative elements, on the other hand, more effectively trigger advanced cognitive processing and emotional involvement, significantly enhancing users’ overall perception of benefits.

From a psychological mechanism perspective, the strengths of narrative design can be understood through three key pathways: (1) Enhancing immersion and cultural identification through role-based engagement and cultural contextualization; (2) Reducing information complexity via causal structures that improve cognitive manageability; (3) Evoking nostalgia, reverence, and a sense of cultural continuity through contextualized representations of heritage.

These mechanisms are jointly activated and weighted within the ANN’s feature interaction modeling process, amplifying their importance in predicting user behavior.

Therefore, this study argues that although recreation experience appears more prominent in traditional linear models due to its significant path coefficients, narrative design plays a more central role in psychological guidance and perception shaping within the context of cultural value transmission and deep user experience construction. The ANN findings complement the explanatory gaps of SEM and empirically support the user psychology hypothesis that “narrative outweighs entertainment,” offering strong theoretical and practical insights.
