 
In addition, in order to further analyze the attractiveness of the museum's historical and cultural exhibits, the proposed model algorithm is compared with the performance of Mini_Xception, ResNet, and the method of Benradi et al. (2023). The recognition accuracy, scalability rate, system data transmission delay, and the speed of recognizing tourists' emotions are analyzed using the human-computer interaction (HCI) system. Additionally, the higher the attraction of an exhibit, the longer visitors stay in its exhibition area. The system therefore records tourists' facial expression scores in front of different numbered exhibits together with their stay times, and compares the attractiveness of the exhibits according to these two measures.
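The attractiveness comparison described above combines an expression score with dwell time. It can be sketched as a simple weighted score; the weighting, the dwell-time cap, and the function names below are illustrative assumptions, not the system's published formula:

```python
def attractiveness_score(expr_score, dwell_s, max_dwell_s=300.0, w_expr=0.6):
    """Combine a facial-expression score in [0, 1] with dwell time.

    expr_score: mean positive-emotion score for visitors at one exhibit.
    dwell_s: average stay time in seconds, capped at max_dwell_s.
    w_expr: illustrative weight on the expression component.
    """
    dwell_norm = min(dwell_s, max_dwell_s) / max_dwell_s
    return w_expr * expr_score + (1.0 - w_expr) * dwell_norm

# Toy data: exhibit number -> (expression score, average stay time in seconds).
exhibits = {"E01": (0.82, 210.0), "E02": (0.55, 90.0), "E03": (0.90, 260.0)}

# Rank exhibits by the combined score, most attractive first.
ranked = sorted(exhibits, key=lambda k: attractiveness_score(*exhibits[k]), reverse=True)
```

Capping dwell time keeps a single lingering visitor from dominating the score; the 60/40 split between expression and dwell components is purely a placeholder to be tuned against observed data.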
  
=='''4. Results and Discussion'''==

===4.1. Comparison of Recognition Accuracy and Expansion Rate of DTs System===
 
In summary, the proposed model exhibits low latency in data transmission and improves the accuracy of emotion recognition. Furthermore, a comparison was made with other studies. Razzaq et al. (2022) proposed DeepClass-Rooms, a DTs framework for attendance and course content monitoring in public sector schools in Punjab, Pakistan. It employed high-end computing devices with readers and fog layers for attendance monitoring and content matching, and CNN was utilized for on-campus and online courses to enhance the educational level [31]. Sun et al. (2021) introduced a technique that formalizes personality as a DTs model by observing users' posting content and liking behavior. A multi-task learning deep neural network (DNN) model was employed to predict users' personalities from the two types of data representations, and experimental results demonstrated that combining them improves the accuracy of personality prediction [32]. Lin and Xiong (2022) proposed a framework for controllable facial editing in video reconstruction. By retraining the generator of a generative adversarial network, a novel personalized generative adversarial network inversion was proposed for real face embeddings cropped from videos, preserving the identity details of real faces. The results indicate that this method achieves notable identity preservation and semantic disentanglement in controllable facial editing, surpassing recent state-of-the-art methods [33]. In conclusion, a growing body of research suggests that combining DTs technology with DNNs can enhance the accuracy and efficiency of facial expression recognition.
  
=='''5. Conclusion'''==

With the rapid advancement of science and technology, AI and neural network algorithms have pervaded diverse domains across society. The integration of information technology has profoundly impacted daily life through the analysis of extensive datasets. In disseminating historical and cultural narratives, conventional museums often rely on staff members' prolonged visual observations and intuitive assessments to gauge the appeal of exhibits to visitors; this qualitative approach, while workable, is difficult to quantify. This research leverages DTs to digitally model historical and cultural artifacts and enhances the CNN algorithm to construct a facial emotion recognition model. By coupling this with visitors' dwell time within the exhibit areas, the degree of attraction exerted by each booth upon tourists is assessed and quantified, providing insight into the functional analysis of historical and cultural artifacts at the National Museum of China. However, certain limitations persist, foremost among them the accuracy of the facial expression recognition algorithm. Subsequent research will seek to realize real-time detection of changes in visitors' facial expressions and dwell time through refinement of the face detection algorithm, thereby addressing the appeal of cultural exhibits to tourists via intelligent HCI approaches.
  
=='''References'''==

[1] Wang D. Exploring a narrative-based framework for historical exhibits combining JanusVR with photometric stereo. Neural Computing and Applications, 2018, 29(5), pp. 1425-1432.

Revision as of 06:54, 9 May 2024


ABSTRACT

This research aims to explore the method of combining digital twins (DTs) with Convolutional Neural Network (CNN) algorithms to analyze the attraction of museum historical and cultural exhibits, so as to achieve intelligent and digital development of museum exhibitions. Firstly, DTs technology is used to digitally model the museum's historical and cultural exhibits, realizing virtual exhibit display and interaction. Then, the Mini_Xception network is proposed to improve the CNN algorithm and is combined with the ResNet algorithm to construct a human facial emotion recognition model. Finally, the proposed model is used to predict the attraction of museum historical and cultural DTs exhibits by recognizing people's facial expressions when observing them. Comparative experimental results show that the proposed recognition method greatly improves accuracy and scalability: compared with traditional recognition methods, recognition accuracy is improved by 5.53%, and the model's data transmission delay is reduced by 2.71 s. The enhanced scalability of the recognition types can also meet real-time interaction requirements in a shorter time. This research has important reference value for the digital and intelligent development of museum historical and cultural exhibitions.

Keywords: Digital Twins; Convolutional Neural Network; Museum Exhibits; Mini_Xception Network.

1. Introduction

The historical and cultural exhibits in the National Museum of China include handicrafts and fashionable cultural consumer goods [1,2]. The National Museum, located in Beijing, is one of the largest comprehensive museums in the country. It houses a rich collection of historical and cultural artifacts, including ancient Chinese handicrafts, artworks, and representative cultural consumer goods. These exhibits carry abundant historical and cultural connotations and possess significant educational and research value. However, museum cultural products face problems such as slow innovation and fierce competition, and finding a good balance between the quality, fashion, and price of museum products is a real and urgent problem. At present, traditional products in many areas receive only superficial artistic processing, with production driven mainly by cost, speed, and quantity, and the shortcomings are obvious. First, the daily practicability of traditional artworks is poor and their artistic value is limited. With the spread of industrial production methods, many artworks that merchants once made by hand have become shoddy replicas that lack craftsmanship and spiritual value. Such replicas are cheap and fast to produce, but as a result museum-specific products have become standard street goods, lacking unique cultural and artistic connotations and taste. Second, some museum products are produced on industrial templates: their appearance relies on simple animation and urban tourism motifs, with little creativity or technical content and few stylistic variations. Some products change only the text content and are not original. The teams managing raw materials and derived resources are weak, and the level of management and control needs to be improved. Operations lack both macro-level planning and detailed implementation of basic knowledge, and the product design process does not reflect the core cultural content well [3]. Digital twins (DTs) can map entities or systems from the real world to the digital world and support modeling, simulation, and analysis [4]. Applying this technology to cultural exhibit attraction assessment can help evaluators understand the characteristics, strengths, and weaknesses of cultural exhibits more intuitively.

In order to give full play to the educational function of museums and better adapt to the rapid development of the industry, relevant policy documents need to be effectively implemented. Development, construction, and resource advantages are transformed into market advantages, enriching spiritual connotations, avoiding single-product content, and overcoming the problem of insufficient creativity. The presentation of core culture and the development of cultural products must pay attention to product development and guide cultural opening through cultural creation [5, 6]. This research analyzes the current product development status in the National Museum of China and analyzes the existing production system and consumer demand. The results can not only guide product design practice and promote design upgrading but also enrich the theoretical system of design evaluation and overcome the randomness of subjective experience evaluation, thereby nourishing practice and promoting industrial development.

The objective of this research is to investigate the appeal of historical and cultural displays. The approach involves several key steps. Firstly, DTs are employed to digitally model historical and cultural museum exhibits, enabling virtual presentation and interaction. Subsequently, enhancements are made to the Convolutional Neural Network (CNN) algorithm, and a visitor facial emotion recognition model is developed by combining Mini_Xception with ResNet. Lastly, a cultural exhibit attractiveness assessment model is established using facial emotion recognition. The innovation of this research lies in the application of DTs technology for the digital modeling of historical and cultural museum exhibits, enabling virtual display and interaction. This represents a significant stride in digital and intelligent advancement. Furthermore, the research introduces the Mini_Xception network to refine the CNN algorithm and merges it with the ResNet algorithm to construct a visitor facial emotion recognition model, which yields more precise predictions of visitor attraction to exhibits. The overall organizational structure of this research is delineated below. Section 1 provides an overview of the historical and cultural context of the National Museum of China, along with the museum's functions. Section 2 arranges data based on the features of the museum's cultural exhibits by referring to relevant literature on the CNN algorithm. Section 3 develops an evaluation model for the appeal of cultural exhibits, integrating CNN and facial emotion recognition strategies. Section 4 presents experimental data results acquired through data transmission and experiments. Section 5 draws experimental conclusions by analyzing and organizing the experimental data, ultimately leading to the experiment's conclusion. This research holds practical applicability in enhancing the allure of historical and cultural exhibits within museums.

2. Related work

2.1. CNN algorithm and related research on museum exhibits

In the context of relevant CNN research, Sakai et al. (2021) [7] employed graph CNN to anticipate the pharmacological behavior of chemical structures, utilizing the constructed model for virtual screening and the identification of a novel serotonin transporter inhibitor. The efficacy of this new compound is akin to that of marketed drugs in vitro, and it has demonstrated antidepressant effects in behavioral analyses. Rački et al. (2022) [8] utilized CNN to investigate surface defects in solid oral pharmaceutical dosage forms. The proposed structural framework demonstrates cutting-edge capabilities: the model attains remarkable performance with just 3% of parameter computations, yielding an approximate eightfold enhancement in drug property identification efficiency. Yoon et al. (2022) [9] employed CNN and generative adversarial networks to synthesize and explore colonoscopy imagery through the development of a proficiently trained system using imbalanced polyp data. Generative adversarial networks were harnessed to synthesize high-resolution comprehensive endoscopic images. The findings reveal that the system augmented with synthetic image enhancement exhibits a 17.5% greater sensitivity in image recognition. Benradi et al. (2023) [10] introduced a hybrid approach for facial recognition, melding CNN with feature extraction techniques. Empirical outcomes validate the method's efficacy in facial recognition, significantly enhancing precision and recall. Ahmad et al. (2023) [11] proposed a methodology for recognizing human activities grounded in deep temporal learning. The amalgamation of CNN features with bidirectional gated recurrent units, along with the application of feature selection strategies, enhances accuracy and recall rates. Experimental results underscore the method's substantial practicality and accuracy in human activity recognition.

Furthermore, certain scholars have conducted relevant studies into museum exhibit attributes. Ahmad et al. (2018) [12] investigated the requisites of museums as perceived by the public, deriving insights from museum scholars and experts to outline the trajectory for developing museum exhibitions in Malaysia aimed at facilitating public learning. The results highlight the unique role of museum exhibits in residents' lifelong learning journey. Ryabinin et al. (2021) [13] harnessed a scientific visualization system to explore cyber-physical museum displays rooted in a system-on-a-chip architecture with a customized user interface. This research introduced an intelligent scientific visualization module capable of interactive engagement with and display of museum exhibits. Shahrizoda (2022) [14] delved into architectural and artistic solutions for museum exhibitions, clarifying architectural matters based on existing scientific and historical documents, and assessing prevailing characteristics of architectural and artistic solutions aligned with museum objectives.

These pertinent studies offer valuable points of reference and thought-provoking insights for the current research. They exemplify the diverse applications of CNN across domains such as drug research, image synthesis and recognition, and facial identification. These investigations underscore CNN's robust modeling and predictive capabilities, indicating its potential significance within the realm of cultural exhibits. They present intriguing concepts for image synthesis and enhancement within the museum exhibit domain. Additionally, in the arena of museum product development, the exploration of the integration of deep learning with other technologies holds promise for elevating the precision and effectiveness of product design.

2.2. A review of research on the attractiveness of museum exhibits

The exploration of the allure exhibited by cultural displays within museums can draw parallels with the allure that cities exert upon tourists. Boivin et al. (2019) [15] delved into the determinants of urban tourism allure. Utilizing Bordeaux as an illustrative case, the theoretical framework was applied to scrutinize the factors influencing the city's tourism appeal. Through an analysis rooted in a threefold theoretical framework of allure, the research unveiled varying assessments of allure associated with the Internet and social media. Raimkulov et al. (2021) [16] examined destination appeal within the context of Silk Road tourism and probed the intermediary role of revisitation satisfaction and loyalty. Findings indicated a direct correlation between tourist satisfaction and loyalty, affirming the mediating role of revisitation satisfaction and loyalty. Hong et al. (2022) [17] highlighted the charm intrinsic to Chinese traditional musical instruments and ethnic art tourism, investigating the practical application of augmented reality technology in the preservation of ethnic culture and art. By generating a comprehensive database of traditional musical instrument information, they crafted a virtual reality application tailored for educational and tourism purposes. Outcomes demonstrated that information technology has the potential to augment the appeal of cultural and artistic tourism. Palumbo (2022) [18] embarked on a study of the National Museum through digital avenues, exploring the impact of digitalization on the allure of institutions and cultural displays. Results underscored that digital technologies can directly enhance the appeal of exhibits by furnishing digital services to physical visitors.

Numerous academics have extensively examined the attributes of cultural tourism within urban settings and its appeal. As information technology advances, the evolution and refinement of the CNN algorithm enable the practical implementation of facial expression recognition technology in real-life situations. The utilization of DTs facilitates the digital modeling of cultural displays. The allure of historical and cultural artifacts housed in the National Museum of China is investigated through the application of an attractiveness analysis model and historical-cultural exhibit-focused artificial intelligence algorithms. This endeavor aims to enhance the appeal of cultural exhibits while broadening the reach and popularity of these cultural treasures.

3. Model research on the attractiveness of historical and cultural exhibits in the National Museum based on CNN

3.1. Functional analysis of cultural products in the National Museum of China

Viewed from a cultural design standpoint, the National Museum of China encompasses a historical journey spanning 5,000 years of traditional Chinese culture. Consequently, cultural products should meticulously select representative elements from the wealth of cultural resources, aligning with their distinctive historical significance. Through these cultural products, one can propagate informational culture within the cultural heritage, uphold esteemed traditional practices, and fortify the educational role of cultural offerings [19-21]. Firstly, the design of museum exhibition booths can consider isolating certain popular exhibits to disperse visitor traffic effectively. Additionally, booth design should encourage visitor engagement in interactive activities, fostering experiences deeply rooted in cultural impact, and regularly rotating less popular exhibits can align with the concept of catering to visitors. Given the subjective and challenging nature of exhibit evaluation, different visitors have varying cultural content preferences, inevitably leading to differences. A broader audience can be attracted by contemplating and crafting cultural artworks, cultural attributes, and innovative choices, thus cultivating a positive feedback loop. The cultural products and functional attributes of the museum are delineated in Fig. 1:


Fig. 1 The structure of the cultural product features of the National Museum

The architectural framework of the National Museum of China's cultural product characteristics is illustrated in Fig. 1. This framework comprises the input layer, data input embedding layer, standardization and normalization layer, output layer, and accuracy loss layer. The input layer receives data from cultural exhibit images as input. The data input embedding layer pre-processes and extracts features from input images, transforming them into a format the model can handle. The standardization and normalization layer processes the data for standardization and normalization, ensuring the model's universality across different datasets. The output layer is the model's final layer, used to predict the features of cultural exhibits. The accuracy loss layer calculates the error between the model's predicted and actual results, updating model parameters through backpropagation. This process aims to minimize error loss and enhance model accuracy. Through the collaborative operation of these layers, the essential structural features of cultural products can be effectively extracted and abstracted, achieving automatic recognition, classification, and recommendation capabilities for cultural exhibits.
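As a rough sketch of the layer sequence just described (input, embedding, standardization/normalization, output, accuracy loss), a minimal forward pass might look as follows. The layer sizes, the softmax output, and the cross-entropy form of the loss are illustrative assumptions, not the paper's exact architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

def embed(x, W_e):
    """Data-input embedding layer: project flattened image features."""
    return x @ W_e

def standardize(h, eps=1e-6):
    """Standardization/normalization layer: zero mean, unit variance per feature."""
    return (h - h.mean(axis=0)) / (h.std(axis=0) + eps)

def output_layer(h, W_o):
    """Output layer: class logits turned into probabilities via softmax."""
    z = h @ W_o
    z = z - z.max(axis=1, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def accuracy_loss(probs, labels):
    """Accuracy-loss layer: mean cross-entropy between predictions and labels."""
    return -np.log(probs[np.arange(len(labels)), labels] + 1e-12).mean()

# Toy batch: 4 flattened 8x8 exhibit images, 3 illustrative feature classes.
x = rng.normal(size=(4, 64))
W_e = rng.normal(size=(64, 16))
W_o = rng.normal(size=(16, 3))

probs = output_layer(standardize(embed(x, W_e)), W_o)
loss = accuracy_loss(probs, np.array([0, 1, 2, 0]))
```

In training, the gradient of this loss would be propagated back through the layers to update `W_e` and `W_o`, which is the backpropagation step the paragraph above describes.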

3.2. Analysis of DTs applied to digital modeling of museum historical and cultural exhibits

DTs can provide more realistic simulations of exhibition venues and exhibits, enabling visitors to understand better the context and meaning of exhibits [22,23]. This research applies DTs to the digital modeling of museum historical and cultural exhibits, including key element entity modeling, DTs virtual modeling, and virtual-real mapping association modeling.

In solid modeling, the elements involved in the activities include exhibit characteristics, functions, performance, etc. Considering the above factors, this research uses a formal modeling language to model the key elements of the exhibit digitization process. The model definition is shown in Eq. (1):

PS = {PE, PP, PW, R_P}    (1)

In Eq. (1), PS refers to the physical space of key elements in the digitization process of exhibits. PE refers to the collection of characteristics of historical and cultural exhibits in physical space. PP refers to the collection of functions of historical and cultural exhibits in physical space. PW refers to the collection of historical and cultural exhibits in physical space. R_P refers to the natural connection between PE, PP, and PW, indicating the natural interaction among the three. PE, PP, and PW are all dynamic collections; the collection elements and their states can be continuously updated with the dynamic operation of the digitization process of exhibits.

DTs volume modeling needs high modularity, good scalability, and dynamic adaptability, which can be achieved in information space using the parametric modeling method. Virtual models of physical entities are established in Tecnomatix, Demo3D, Visual Components, and other software [24,25]. In addition to describing the geometric information and topological relationships of the automated production line, the virtual model also contains a complete dynamic engineering information description of each physical object [26,27]. The multi-dimensional attributes of the model are parametrically defined to realize real-time mapping of the digital modeling process of exhibits. The specific definition of DTs volume modeling is shown in Eq. (2):

CS = {DE, DP, DW, R_D}    (2)

In Eq. (2), CS refers to the information space of the key elements of the digitization process of exhibits. DE refers to the collection of characteristics of historical and cultural exhibits in the information space. DP refers to the collection of functions of historical and cultural exhibits in the information space. DW refers to the collection of historical and cultural exhibits in the information space. R_D refers to the natural connection between DE, DP, and DW, indicating the natural interaction among the three. DE, DP, and DW are all dynamic collections; the collection elements and their states in the information space are updated synchronously with the dynamic operation of the digitization process of the exhibits in physical space.
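The two tuple models in Eqs. (1) and (2) can be sketched as a small data structure with a synchronization step. The class and function names below are illustrative, and a real DTs platform would stream incremental updates rather than copy whole sets:

```python
from dataclasses import dataclass, field

@dataclass
class SpaceModel:
    """Key-element tuple for one space: characteristics (E), functions (P), exhibits (W)."""
    E: set = field(default_factory=set)
    P: set = field(default_factory=set)
    W: set = field(default_factory=set)

def sync(physical: SpaceModel, twin: SpaceModel) -> None:
    """Two-way mapping step: keep the twin's dynamic collections consistent
    with the physical entity model's current state."""
    twin.E, twin.P, twin.W = set(physical.E), set(physical.P), set(physical.W)

# Physical space PS and its information-space twin CS (toy exhibit data).
PS = SpaceModel(E={"bronze", "jade"}, P={"display", "interact"}, W={"ding-cauldron"})
CS = SpaceModel()

sync(PS, CS)              # CS now mirrors PS
PS.W.add("silk-scroll")   # physical space changes dynamically...
sync(PS, CS)              # ...and the update propagates to the twin
```

This captures the point of Eq. (2)'s definition: the information-space collections are not static copies but track the physical space as it evolves.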

Finally, the virtual-real mapping association is modeled. The virtual-real mapping relationship between the two is further established based on establishing the physical space entity model PS and the information space twin model CS. The formal modeling language is used to model its virtual-real mapping relationship. The model definition is shown in Eq. (3):

M = {PS ↔ CS, R_M}    (3)

In Eq. (3), PS ↔ CS refers to the real two-way mapping between the physical space entity model PS and the information space twin model CS, and R_M refers to the natural connections between the different models. Therefore, PE and DE, PP and DP, and PW and DW should maintain a synchronous two-way real-virtual mapping.

3.3. Design and research of facial emotion recognition strategy based on CNN

Emotion refers to the subjective emotional experience and internal psychological state that humans have in response to external stimuli. It encompasses complex experiences and reactions such as happiness, sadness, surprise, anger, and fear. Facial expressions are the outward manifestation of emotions, reflecting emotional states through muscle movements and neural transmission. Facial expressions can be automatically recognized and classified using computer vision and image processing techniques, helping people gain a more accurate understanding of visitors' emotional states and responses in front of historical and cultural exhibits in museums.

Facial recognition technology can identify facial expressions automatically by analyzing the changes and combinations of facial features, such as the eyes, mouth, and eyebrows. For example, when visitors appreciate an exhibit in a museum, they may display different emotions and attitudes, such as liking, disliking, or surprise. By utilizing facial recognition technology, people can monitor visitors' emotional changes and responses in real time and connect this information with the knowledge and stories related to the displayed historical and cultural artifacts. This enhances visitors' experience and cultural awareness. In conclusion, facial expressions are the outward manifestation of emotions, and facial recognition technology can infer visitors' emotional states and responses by analyzing these expressions. By employing these technologies, people can better understand visitors' emotional experiences and responses in front of historical and cultural exhibits in museums, providing a more scientific and effective reference for exhibition planning and cultural heritage preservation.

Since the input data source of tourist facial expressions is image-sequence information collected by surveillance cameras, the model must maintain high recognition accuracy while also maintaining high recognition speed [28]. The mini_xception model boasts a lightweight design, enabling it to maintain relatively high accuracy while reducing the network's parameter count and computational load. This lightweight structure makes the mini_xception network well-suited for integration into practical applications, particularly in resource-constrained scenarios [29,30]. Therefore, mini_xception is chosen as the basic network structure. Firstly, batch normalization layers are added after each convolution to speed up training and improve the model's generalization ability. Dropout regularization is used in the model to randomly eliminate some nodes in the network with a given probability; the network structure is simplified while overfitting is avoided, and the classification ability of the network is improved. The principle is that, during forward propagation, a certain proportion of neurons' activation values are randomly suppressed, so the model becomes less dependent on particular local features. The actual visitors of the museum differ, and the lighting intensity of each booth also differs, so the main pre-processing methods adopted include grayscale conversion, normalization, and histogram processing. Based on CNN, the structural framework of the designed facial emotion recognition model is shown in Fig. 2:
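The dropout principle described above can be sketched in NumPy (an illustrative stand-in, not the paper's implementation): during forward propagation a random mask deactivates a fraction of activations, and the survivors are rescaled so the expected output is unchanged ("inverted" dropout).

```python
import numpy as np

def dropout_forward(x, rate=0.5, training=True, rng=None):
    """Inverted dropout: randomly zero a fraction `rate` of activations
    and rescale survivors by 1/(1-rate) so the expected value is kept."""
    if not training or rate == 0.0:
        return x
    rng = rng or np.random.default_rng(0)
    mask = rng.random(x.shape) >= rate      # keep with probability 1-rate
    return x * mask / (1.0 - rate)

x = np.ones((4, 8))
y = dropout_forward(x, rate=0.5)
# Roughly half the entries are zeroed; the survivors are scaled to 2.0
```

At inference time (`training=False`) the input passes through unchanged, which is what makes the rescaling during training necessary.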

Draft Zhang 775425255-image9-c.png

Fig. 2 Framework of facial emotion recognition model structure based on Mini_Xception and ResNet

Assume there is a sequence of facial images of a group of tourists encompassing diverse emotional expressions as they stand in front of museum exhibits. The objective is to employ a model for recognizing and classifying these emotions while maintaining a balance between speed and accuracy. The mini_xception architecture has been chosen for the network structure. Following each convolutional layer, batch normalization layers have been added to expedite training and enhance model generalization. Additionally, dropout regularization has been employed within the model to randomly deactivate some nodes, mitigating the risk of overfitting. Given a sequence of facial images of different tourists, during the forward propagation phase, a portion of neuron activations is randomly halted at a certain proportion. This ensures the model does not excessively rely on specific localized features. Considering the varying illumination conditions faced by real tourists in the museum, pre-processing is essential for input facial images. Operations such as cropping, scaling, and grayscale conversion are conducted to optimize the image input for network processing. Subsequently, the mini_xception algorithm is applied to extract features from the facial images. This algorithm accelerates recognition speed while reducing the number of parameters and computations. The mini_xception algorithm transforms tourists' facial images into feature-rich representations encompassing spatial and semantic information. Following this, the ResNet algorithm is employed to fuse the features extracted by mini_xception. ResNet, being a residual network, leverages residual connections to reduce training time and enhance accuracy. This algorithm combines the spatial and semantic features derived from mini_xception to elevate the accuracy and generalization capabilities of emotion classification.
Ultimately, a Softmax classifier is utilized to categorize the fused features, thereby predicting the emotional category of the facial image. This classifier maps the feature vector into a probability space, yielding the predicted probabilities for each category. Once the model has been trained, a facial image of a tourist can, for example, be predicted as belonging to the "liking" emotional category. From capturing facial images to feature extraction and emotional classification, this algorithm aids in the automated recognition of tourists' emotional expressions in front of historical and cultural exhibits in museums. Consequently, it offers a more scientific and effective basis for museum exhibit planning and cultural heritage preservation.
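The final Softmax step can be illustrated with a minimal NumPy sketch (the score values and the six-class label list are hypothetical, chosen only to mirror the six expression categories mentioned later in the paper):

```python
import numpy as np

# Assumed labels for illustration; the paper's six categories are not enumerated here
EMOTIONS = ["liking", "disliking", "surprise", "happiness", "sadness", "anger"]

def softmax(logits):
    """Map raw class scores to probabilities; subtract the max for numerical stability."""
    z = logits - np.max(logits)
    e = np.exp(z)
    return e / e.sum()

logits = np.array([2.1, 0.3, 0.9, 1.5, 0.2, 0.1])   # hypothetical fused-feature scores
probs = softmax(logits)                              # probabilities summing to 1
predicted = EMOTIONS[int(np.argmax(probs))]          # highest-probability category
```

The predicted category is simply the argmax of the probability vector; here the first score dominates, so the sketch would report "liking".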

In this model, W and I represent the weight tensor and input tensor of one network layer, respectively, where c represents the number of channels and w and h represent the width and height. The network approximates W through a binary tensor B ∈ {+1, −1}^(c×w×h) and a scaling factor C, as shown in Eq. (4):

W ≈ CB        (4)


In Eq. (4), W refers to the real weight.

In order to obtain the best approximation, it is assumed that vectors w and b represent weights and binarized weights, respectively, as shown in Eq. (5):

w ≈ Cb,  b ∈ {+1, −1}ⁿ,  n = c × w × h        (5)


The determination of the optimal scale factor and binarization parameters is shown in Eq. (6):

C*, b* = argmin_{C,b} ‖w − Cb‖²        (6)


The two sides of the above formula are subtracted, and the optimization function is obtained, as shown in Eqs. (7) and (8):

J(b, C) = ‖w − Cb‖²        (7)
C*, b* = argmin_{b,C} J(b, C)        (8)


In Eq. (8), C* and b* represent the optimized scale factor and the optimized binarized weight, respectively. Eq. (7) is expanded to obtain Eq. (9):

J(b, C) = C²bᵀb − 2Cwᵀb + wᵀw        (9)


In Eq. (9), b ∈ {+1, −1}ⁿ, so bᵀb = n is a constant. w refers to a known quantity, so wᵀw is also a constant. Let q = wᵀw; then Eq. (9) can be simplified to Eq. (10):

J(b, C) = C²n − 2Cwᵀb + q        (10)


C refers to a positive value. The optimization problem of b is transformed into Eq. (11):

b* = argmax_b wᵀb,  s.t.  b ∈ {+1, −1}ⁿ        (11)


According to the characteristics of the value of b, Eq. (12) is calculated:

b* = sign(w)        (12)


In order to obtain the optimal value C*, the partial derivative of J with respect to C is taken and set to zero. The optimization result for C is obtained, as shown in Eq. (13):

C* = wᵀb* / n        (13)


The value of b* is plugged into Eq. (13), and Eq. (14) is obtained:

C* = wᵀ sign(w) / n = (Σᵢ |wᵢ|) / n = ‖w‖₁ / n        (14)
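The optimality claimed in Eqs. (12)–(14) can be checked numerically: binarizing with b* = sign(w) and scaling by C* = ‖w‖₁/n minimizes the approximation error, and any other scale does worse. A small NumPy sketch with an arbitrary toy weight vector:

```python
import numpy as np

def binarize(w):
    """Optimal 1-bit approximation of w, i.e. w ≈ C*·b* per Eqs. (12)-(14)."""
    b = np.sign(w)              # b* = sign(w), entries in {+1, -1}
    c = np.abs(w).mean()        # C* = ||w||_1 / n
    return c, b

w = np.array([0.7, -1.2, 0.4, -0.1, 0.9])   # toy weights (no zeros, so sign is well defined)
c, b = binarize(w)
err_opt = np.sum((w - c * b) ** 2)          # error with the optimal scale
err_alt = np.sum((w - 0.5 * c * b) ** 2)    # a perturbed scale gives a larger error
```

For this vector C* = 3.3/5 = 0.66, and `err_opt` is strictly smaller than `err_alt`, matching the derivation.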


In order to further reduce the quantization error, a scaling factor is introduced into the network's weight and activation quantization process. The quantized weights Kb(w) and quantized activations Kb(a), including scale factors, are obtained, as shown in Eqs. (15) and (16):

Kb(w) = C b        (15)
Kb(a) = β h        (16)


In Eqs. (15) and (16), b and h are the results of the weight vector w and the activation vector a after the sign function, and C and β are the corresponding scaling factors. The convolution operation of the network's forward process is defined as shown in Eq. (17):

z = σ(Kb(a) ⊛ Kb(w)) = σ(Cβ (h ⊛ b))        (17)


In Eq. (17), z refers to the output, σ refers to the activation function, h refers to the activation value after binarization, and β is the scaling factor for the activation. ⊛ is the convolution operation, and ⊙ denotes the inner product.

The expression for the approximate dot product between the activation vector a and the weight vector w is shown in Eq. (18):

aᵀw ≈ Cβ (hᵀb)        (18)


The optimization problem to be solved is shown in Eq. (19):

C*, b*, β*, h* = argmin_{C,b,β,h} ‖a ⊙ w − Cβ (h ⊙ b)‖        (19)


The weight vectors are optimized, and the emotions of faces exhibited by DTs are analyzed precisely.

In order to solve the optimization problem defined by Eq. (19) using swarm intelligence optimization algorithms, the steps are as follows:

Step 1: Randomly generate a set of particles, where each particle represents a possible solution in the solution space. Each particle has its own position and velocity, initialized with appropriate values.

Step 2: Calculate the fitness value of each particle based on the objective function defined in Eq. (19). The fitness value indicates the quality of the solution corresponding to the particle.

Step 3: For each particle, update its individual best position based on its current fitness value and historical best fitness value.

Step 4: Update the global best position of the swarm based on the fitness values of all particles. Record the optimal solution and its fitness value.

Step 5: Update the velocity and position of each particle based on its current velocity, individual best position, and global best position.

Step 6: Check if the termination condition is met, such as reaching the maximum number of iterations or the fitness value reaching a predetermined threshold. If the condition is met, proceed to step 8; otherwise, continue with steps 2 to 6.

Step 7: Repeat steps 3 to 6 until the termination condition is met.

Step 8: The optimal solution recorded during the iteration process is the optimization result of the problem.
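The eight steps above can be sketched as a bare-bones particle swarm optimizer. This is an illustrative stand-in minimizing a simple quadratic, not the paper's actual objective from Eq. (19); the inertia and acceleration coefficients (0.7, 1.5, 1.5) are common textbook defaults, assumed here:

```python
import numpy as np

def pso(f, dim=2, n_particles=30, iters=200, seed=0):
    """Minimal particle swarm optimization: each particle tracks its own best
    position, the swarm tracks a global best (Steps 1-8 in the text)."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-5, 5, (n_particles, dim))      # Step 1: random positions
    v = np.zeros_like(x)                            #         zero velocities
    pbest = x.copy()                                # Steps 2-3: individual bests
    pbest_val = np.array([f(p) for p in x])
    g = pbest[np.argmin(pbest_val)]                 # Step 4: global best
    for _ in range(iters):                          # Steps 5-7: iterate
        r1, r2 = rng.random(x.shape), rng.random(x.shape)
        v = 0.7 * v + 1.5 * r1 * (pbest - x) + 1.5 * r2 * (g - x)
        x = x + v
        vals = np.array([f(p) for p in x])          # Step 2: re-evaluate fitness
        improved = vals < pbest_val                 # Step 3: update individual bests
        pbest[improved], pbest_val[improved] = x[improved], vals[improved]
        g = pbest[np.argmin(pbest_val)]             # Step 4: update global best
    return g, f(g)                                  # Step 8: recorded optimum

best, best_val = pso(lambda p: np.sum((p - 3.0) ** 2))
# best converges toward [3, 3] and best_val toward 0
```

The termination condition here is a fixed iteration budget (Step 6); a fitness threshold could be checked inside the loop instead.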

3.4. Evaluation model of the attractiveness of cultural exhibits based on face recognition algorithm

The data recognized by the face recognition algorithm is mainly used to associate the subsequent facial expression information with character recognition. The residence time of each visitor in the current state can also be calculated from the identity recognition information. Since facial features change little between consecutive images, continuous recognition inevitably leads to many repeated expression results during actual computation; therefore, facial expression information is extracted every 15 frames. Facial expression classification theory and the created datasets are used as the basis, and the visitors' facial expressions are divided into six categories. Simple results from facial expression recognition may not accurately reflect visitors' satisfaction with an exhibit. Therefore, information about the expressions of tourists is incorporated into the index, and the attractiveness evaluation model for cultural exhibits is analyzed. The evaluation process for the attractiveness of cultural products is shown in Fig. 3:

Draft Zhang 775425255-image34-c.jpeg

Fig. 3 The evaluation process of the attraction of museum historical and cultural products

In Fig. 3, the evaluation process of the attractiveness of historical and cultural artifacts in museums can be divided into five steps. Step 1: Model Initialization. An appropriate model needs to be selected, and its parameters and network structure are initialized. Commonly used models include CNN and Recurrent Neural Network (RNN). Step 2: Computing Hidden Layer and Unit Outputs. The selected model is used to compute the hidden layer and unit outputs by inputting the relevant data of historical and cultural artifacts in museums. Step 3: Data Input and Vectorization. The relevant data of historical and cultural artifacts in museums is transformed into vector form for computation and processing. Step 4: Calculating Deviation between Target and Prediction. A comparison is made between the predicted results and the actual results to calculate the deviation or error between them. Step 5: Output and Weight Updating. The model parameters are updated through backpropagation based on the magnitude of the error. This aims to minimize the error loss and improve the accuracy of the model. This process can be performed in both forward and backward directions, continuously optimizing the model's performance by updating weights and biases using the backpropagation algorithm. In summary, through the iterative process of these five steps, the accuracy and precision of the attractiveness evaluation model for historical and cultural artifacts in museums can be gradually improved. This provides a more scientific and effective basis for museum exhibition planning and cultural heritage preservation.
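Steps 2 through 5 amount to a standard forward/backward pass with weight updates. A minimal single-layer NumPy sketch (illustrative only, with a toy regression problem rather than the paper's network or data):

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.normal(size=(64, 4))              # Step 3: vectorized input data
true_w = np.array([1.0, -2.0, 0.5, 3.0])  # hidden "target" parameters (toy setup)
y = X @ true_w                            # targets for the toy problem

w = np.zeros(4)                           # Step 1: initialize model parameters
losses = []
for _ in range(100):
    pred = X @ w                          # Step 2: compute layer/unit outputs
    err = pred - y                        # Step 4: deviation between target and prediction
    losses.append(np.mean(err ** 2))
    grad = 2 * X.T @ err / len(X)         # Step 5: backpropagate the error...
    w -= 0.1 * grad                       # ...and update the weights
# the loss shrinks each iteration and w approaches true_w
```

Iterating these steps is exactly the loop Fig. 3 describes: the error drives the weight updates, and the updates shrink the error.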

3.5. Data transmission and experimental research

The experiment is conducted in a museum located in City B. The duration of the experiment is one month. Firstly, a random sample of 500 visitors who visited the museum was selected as the subjects. Then, cameras are installed in various exhibition halls to collect facial expression data of the visitors, using facial recognition technology, in front of different exhibits. The data is acquired by installing cameras at the museum's ten booths. When a file is saved, the identification information of the exhibit point in the filename is used to distinguish the video data of each exhibit point. Because visitor traffic is high, the amount of data to be considered is limited: 10 minutes of input video data is selected for each exhibit point. The CNN algorithm detects faces in each frame of video data. Then, a 160×160-pixel face image is cut out, and the face edge is output at a specific position of the frame image. After the face images are collected, each exhibit point is assigned an identification code. From the MySQL database, quantitative scoring information, identity recognition information, residence time information, and exhibit point information are imported. Front-end code is used to connect to the back-end, JavaScript is used to access the database data, and finally an object library that displays attractiveness metrics is used to present the data.

In addition, in order to further analyze the attractiveness of the museum's historical and cultural exhibits, the proposed model algorithm is compared with the performance of Mini_Xception, ResNet, and Benradi et al. (2023). The recognition accuracy, scalability rate, data transmission delay of the system, and the recognition speed of tourists' emotions are analyzed using the human-computer interaction (HCI) system. Additionally, the higher the attraction of the exhibits, the longer the visitors stay in the exhibition area. The system analyzes the facial expression scores of tourists in front of different numbered exhibits and the visitors' stay time. It compares the attractiveness of different numbered exhibits according to the facial expression score and the visitors' stay time.

4. Results and Discussion

4.1. Comparison of Recognition Accuracy and Expansion Rate of DTs System

In order to analyze the performance of each algorithm, different algorithms are used in the DTs system to analyze the accuracy of tourist emotion recognition and the system's scalability, as shown in Fig. 4.

Draft Zhang 775425255-image35-c.png

Fig. 4. Recognition accuracy and scalability curves of different algorithms in museum exhibit DTs system (a. recognition accuracy curve; b. scalability rate change curve)

In Fig. 4, as the model step size parameter increases, the accuracy of different algorithms in the DTs system for emotion recognition slowly increases. When the model step size parameter is 0, the proposed algorithm's recognition accuracy is 75.60%, while the recognition accuracy of the other models is less than 73.67%. When the model step size parameter is increased to 100, the recognition accuracy of the proposed model algorithm reaches 95.92%, while the other model algorithms do not exceed 90.39%. The recognition accuracy of each model algorithm, sorted from largest to smallest, is: proposed model algorithm > Benradi et al. (2023) > Mini_Xception > ResNet. In addition, when the model step size parameter is 80, the proposed algorithm achieves 95.75%, which is better than the other model algorithms. The results of the comparative experiments show that the proposed recognition method can greatly improve the recognition accuracy and scalability rate. The recognition accuracy can be increased by 5.53% compared with existing methods. When visitors identify DTs exhibits, the model can achieve more accurate emotion recognition.

4.2. Comparison of results between data transmission delay and tourist emotion recognition speed

The system uses different algorithms to compare the data transmission delay time and the speed of tourist emotion recognition, as shown in Fig. 5.

Draft Zhang 775425255-image36.png

Fig. 5. The data transmission delay time and tourist emotion recognition speed change curve of different algorithms in the system (a. data transmission delay time; b. tourist emotion recognition speed change curve)

In Fig. 5, in the system, as the number of model iterations increases, the data transmission delay time of each model algorithm shows a downward trend. When the model iterates 100 times, the data transmission delay of the traditional algorithm ResNet is 3.96s. The data transmission delay of the improved CNN model is only 2.71s. After 550 model iterations, the data transmission delay of the traditional algorithm ResNet drops to 2.49s. The data transmission delay of the optimized CNN algorithm can be reduced to 1.67s. In addition, the emotion recognition speed of the traditional algorithm ResNet is 3.22s after 550 iterations, and the emotion recognition speed of the improved CNN algorithm is only 1.33s. The data analysis results show that the data transmission delay of the proposed model algorithm can be reduced to 2.71s. The algorithm is superior to the traditional algorithm regarding emotion recognition speed.

In addition, the resource consumption of the four models is compared, as shown in Fig. 6.

Draft Zhang 775425255-image37-c.jpeg

Fig. 6. Comparison of Computational Resource Consumption for Different Algorithms

In Fig. 6, the proposed algorithm's computational time, memory usage, and GPU memory consumption are 4.2s, 120MB, and 500MB, respectively, the lowest values among the four algorithms. This data indicates that the proposed algorithm has advantages over other models in terms of computational time, memory usage, and GPU memory consumption. The result means that the proposed algorithm can efficiently utilize computational resources and has lower resource consumption in the task of recognizing the emotions of museum visitors.

Furthermore, the research also compared the results of different algorithms in terms of data transmission latency and the speed of visitor emotion recognition. The findings revealed that with an increase in the number of model iterations, the data transmission latency for each algorithm exhibited a decreasing trend. For instance, after 100 rounds of model iteration, the data transmission latency for the conventional ResNet algorithm was 3.96 seconds, whereas the improved CNN model demonstrated a data transmission latency of only 2.71 seconds. Upon further augmentation to 550 rounds of model iteration, the data transmission latency of the conventional ResNet algorithm decreased to 2.49 seconds, while the optimized CNN algorithm achieved a reduced data transmission latency of 1.67 seconds. Similarly, the emotion recognition speed of the conventional ResNet algorithm was 3.22 seconds after 550 iterations, whereas the improved CNN algorithm exhibited a recognition speed of merely 1.33 seconds. Based on the results of data analysis, the proposed model algorithm is capable of reducing data transmission latency to 2.71 seconds and outperforms the traditional algorithm in terms of emotion recognition speed.

4.3. Analysis of facial expression scores and visitor stay time results for exhibits with different numbers

The facial expression scores of tourists in front of different numbered exhibits and the data on tourists' staying time in the exhibition area are analyzed, as shown in Fig. 7.

Draft Zhang 775425255-image38-c.png

Fig. 7. The average score of facial expressions of museum visitors to different exhibits and the average stay time change

In Fig. 7, the horizontal axis represents the museum's numbered historical and cultural exhibits. The left vertical axis corresponds to the average facial expression scores of 500 visitors in front of various exhibits, while the right vertical axis represents the visitors' average duration of stay and admiration time in front of these exhibits. There is a certain variation in both the average facial expression scores and the average duration of stay for visitors in front of different museum exhibits. Exhibit 4 displays a higher average facial expression score (73) and a relatively longer average stay time (106 seconds). In contrast, exhibit 5 exhibits a lower average facial expression score (46) along with a relatively shorter average stay time (39 seconds). This observation implies a potential correlation between visitors' facial expressions and their duration of stay when encountering different exhibits.

Within this experiment, emotional reactions are the feelings experienced by visitors while observing the exhibits. These emotional reactions may encompass excitement, pleasure, curiosity, and surprise. Such emotional responses could directly impact visitors' preferences for the exhibits. For instance, if a visitor experiences strong excitement and pleasure when observing a particular exhibit, they are likely to favor it. Facial expressions serve as physiological manifestations of emotional experiences conveyed through facial muscle movements like smiling or frowning. Different facial expressions could be associated with distinct emotions and moods, which, in turn, may affect visitors' preferences for the exhibits. For instance, a visitor displaying a noticeable smile while observing an exhibit may indicate their pleasure and satisfaction with it. Positive emotional experiences associated with an exhibit may lead to a higher degree of liking. Comparing visitors' facial expressions in front of different exhibits can help determine which exhibits align better with their interests and preferences. Moreover, visitors might exhibit positive facial expressions such as smiling or widening eyes when they come across exhibits they like. These positive expressions could signify their interest or satisfaction with the exhibit, prompting them to spend more time better appreciating it. Conversely, negative facial expressions like furrowing brows or frowns might indicate dissatisfaction, disinterest, or confusion, leading to reduced interest and shorter stays. Therefore, the experiment compares visitors' average facial expression scores and their average duration of stay. The result shows a significance level (P) of 0.0138, which is less than the conventional threshold of 0.05, indicating a statistically significant positive correlation between visitors' facial expressions and their duration of stay. 
In this experiment, the research explored the correlation between facial expression scores and dwell time among visitors by recognizing and analyzing visitors' facial expressions. Facial expression scores were evaluated based on facial muscle movements to assess visitors' emotional experiences, while dwell time referred to the time visitors spent viewing exhibits. The research results demonstrated a statistically significant positive correlation between facial expression scores and dwell time. This implies that when visitors have higher facial expression scores, they tend to spend a longer time in front of exhibits. A higher facial expression score might indicate a positive emotional experience, such as excitement, curiosity, or satisfaction, prompting them to develop a greater fondness for the exhibit and be willing to invest more time in appreciation. Furthermore, the value of the correlation coefficient provides a deeper understanding. Based on the magnitude and sign of the correlation coefficient, the degree and direction of the association between facial expression scores and dwell time can be determined. As an illustrative instance of this research, the significance test yielded a p-value of 0.0138, falling below the established conventional threshold of 0.05. The data indicate a significant positive correlation between the two, implying that as facial expression scores increase, visitors' dwell time also correspondingly increases.
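The correlation analysis described above can be sketched with a plain Pearson correlation on made-up data (the per-exhibit averages below are hypothetical illustrations, not the study's measurements; the reported p-value of 0.0138 would come from a significance test on the real sample):

```python
import numpy as np

# Hypothetical per-exhibit averages: facial expression score, dwell time (s)
scores = np.array([73, 46, 61, 55, 68, 50, 64, 58])
dwell  = np.array([106, 39, 80, 62, 95, 48, 88, 70])

def pearson_r(a, b):
    """Pearson correlation coefficient: covariance over the product of std devs."""
    a, b = a - a.mean(), b - b.mean()
    return (a @ b) / np.sqrt((a @ a) * (b @ b))

r = pearson_r(scores, dwell)   # strongly positive for this toy data
```

A value of r near +1 corresponds to the pattern in Fig. 7: exhibits with higher expression scores also hold visitors longer.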

Therefore, the findings of this research reveal a close relationship between visitors' facial expression scores and their dwell time within the exhibition hall. This discovery holds important implications for exhibition design and improvement, aiding in enhancing the attractiveness of exhibits and visitor experiences, thereby elevating visitor satisfaction and the overall success of the exhibition.

4.4. Discussion

In summary, the proposed model exhibits low latency in data transmission and improves the accuracy of emotion recognition. Furthermore, a comparison was made with other studies. Razzaq et al. (2022) proposed DeepClassRooms, a DTs framework for attendance and course content monitoring in public sector schools in Punjab, Pakistan. It employed high-end computing devices with readers and fog layers for attendance monitoring and content matching. CNN was utilized for on-campus and online courses to enhance the educational level [31]. Sun et al. (2021) introduced a novel technique that formalizes personality as a DTs model by observing users' posting content and liking behavior. A multi-task learning deep neural network (DNN) model was employed to predict users' personalities based on two types of data representations. Experimental results demonstrated that combining these two types of data could improve the accuracy of personality prediction [32]. Lin and Xiong (2022) proposed a framework for performing controllable facial editing in video reconstruction. By retraining the generator of a generative adversarial network, a novel personalized generative adversarial network inversion was proposed for real face embeddings cropped from videos, preserving the identity details of real faces. The results indicate that this method achieves notable identity preservation and semantic disentanglement in controllable facial editing, surpassing recent state-of-the-art methods [33]. In conclusion, an increasing body of research suggests that combining DTs technology with DNN can enhance the accuracy and efficiency of facial expression recognition.

5. Conclusion

With the rapid advancement of science and technology, AI and neural network algorithms have pervaded diverse domains across society. The integration of information technology has profoundly impacted daily life through the meticulous analysis of extensive datasets. Within the context of disseminating historical and cultural narratives, conventional museums often rely on staff members' prolonged visual observations and intuitive assessments to gauge the appeal of exhibits to visitors. This qualitative approach, while serviceable, poses challenges in terms of quantifiability. This research leverages DTs to digitally model historical and cultural artifacts, enhancing the CNN algorithm to construct a facial emotion recognition model. By coupling this with visitors' dwell time within the exhibit areas, the degree of attraction exerted by each booth upon tourists is assessed and quantified. The functional analysis of historical and cultural artifacts at the National Museum of China provides insight. However, certain limitations persist. Foremost among these is the need for enhancement in the accuracy of the facial expression recognition algorithm. Subsequent research endeavors will seek to realize real-time detection of changes in visitors' facial expressions and dwell time through the refinement of the face detection algorithm, thereby addressing the appeal of cultural exhibits to tourists via intelligent HCI approaches.

References

[1] Wang D. Exploring a narrative-based framework for historical exhibits combining JanusVR with photometric stereo. Neural Computing and Applications, 2018, 29(5),pp. 1425-1432.

[2] Garcia-Luis V, Dancstep T. Straight from the girls: The importance of incorporating the EDGE Design Attributes at exhibits. Curator: The Museum Journal, 2019, 62(2), pp. 195-221.

[3] Li P, Shi Z, Ding Y, et al. Analysis of the Temporal and Spatial Characteristics of Material Cultural Heritage Driven by Big Data—Take Museum Relics as an Example. Information, 2021, 12(4),pp. 153.

[4] Niccolucci F, Felicetti A, Hermon S. Populating the Data Space for Cultural Heritage with Heritage Digital Twins. Data, 2022, 7(8),pp. 105.

[5] Götz F M, Ebert T, Gosling S D, et al. Local housing market dynamics predict rapid shifts in cultural openness: A 9-year study across 199 cities. American Psychologist, 2021, 76(6),pp. 947.

[6] Bebenroth R, Goehlich R A. Necessity to integrate operational business during M&A: the effect of employees' vision and cultural openness. SN Business & Economics, 2021, 1(8), pp. 1-17.

[7] Sakai M, Nagayasu K, Shibui N, et al. Prediction of pharmacological activities from chemical structures with graph convolutional neural networks. Scientific reports, 2021, 11(1), pp. 1-14.

[8] Rački D, Tomaževič D, Skočaj D. Detection of surface defects on pharmaceutical solid oral dosage forms with convolutional neural networks. Neural Computing and Applications, 2022, 34(1), pp. 631-650.

[9] Yoon D, Kong H J, Kim B S, et al. Colonoscopic image synthesis with generative adversarial network for enhanced detection of sessile serrated lesions using convolutional neural network. Scientific reports, 2022, 12(1),pp. 1-12.

[10] Benradi H, Chater A, Lasfar A. A hybrid approach for face recognition using a convolutional neural network combined with feature extraction techniques. IAES International Journal of Artificial Intelligence, 2023, 12(2),pp. 627.

[11] Ahmad T, Wu J, Alwageed H S, et al. Human Activity Recognition Based on Deep-Temporal Learning Using Convolution Neural Networks Features and Bidirectional Gated Recurrent Unit With Features Selection. IEEE Access, 2023, 11, pp. 33148-33159.

[12] Ahmad S, Abbas M Y, Yusof W Z M, et al. Creating Museum Exhibition: What The Public Want. Asian Journal Of Behavioural Studies (Ajbes), 2018, 3(11),pp. 27-36.

[13] Ryabinin K V, Kolesnik M A. Automated creation of cyber- physical museum exhibits using a scientific visualization system on a chip. Programming and Computer Software, 2021, 47(3), pp. 161-166.

[14] Shahrizoda B. Architectural and Artistic Solution of Museum Exhibition. European Multidisciplinary Journal of Modern Science, 2022, 5, pp. 149-151.

[15] Boivin M, Tanguay G A. Analysis of the determinants of urban tourism attractiveness: The case of Québec City and Bordeaux. Journal of Destination Marketing & Management, 2019, 11, pp. 67-79.

[16] Raimkulov M, Juraturgunov H, Ahn Y. Destination attractiveness and memorable travel experiences in silk road tourism in Uzbekistan. Sustainability, 2021, 13(4),pp. 2252.

[17] Hong X, Wu Y H. The use of AR to preserve and popularize traditional Chinese musical instruments as part of the formation of the tourist attractiveness of the national art of Guizhou province. Digital Scholarship in the Humanities, 2022, 37(2), pp. 426-440.

[18] Palumbo R. Enhancing museums' attractiveness through digitization: An investigation of Italian medium and large- sized museums and cultural institutions. International Journal of Tourism Research, 2022, 24(2),pp. 202-215.

[19] Xianying D. Analysis on the Educational Function of Campus Culture in Higher Vocational Colleges. The Theory and Practice of Innovation and Entrepreneurship, 2021, 4(23), pp. 77.

[20] Chikaeva K S, Gorbunova N V, Vishnevskij V A, et al. Corporate culture of educational organization as a factor of influencing the social health of the Russian student youth. Práxis Educacional, 2019, 15(36), pp. 583-598.

[21] DeBacker J M, Routon P W. A culture of despair? Inequality and expectations of educational success. Contemporary Economic Policy, 2021, 39(3),pp. 573-588.

[22] Zhong X, Babaie Sarijaloo F, Prakash A, et al. A multidisciplinary approach to the development of digital twin models of critical care delivery in intensive care units. International Journal of Production Research, 2022, 60(13), pp. 4197-4213.

[23] Zhang J, Kwok H H L, Luo H, et al. Automatic relative humidity optimization in underground heritage sites through ventilation system based on digital twins. Building and Environment, 2022, 216, pp. 108999.

[24] Yang Y, Yang X, Li Y, et al. Application of image recognition technology in digital twinning technology: Taking tangram splicing as an example. Digital Twin, 2022, 2: 6.

[25] Herman H, Sulistyani S, Ngongo M, et al. The structures of visual components on a print advertisement: A case on multimodal analysis. Studies in Media and Communication, 2022, 10(2), pp. 145-154.

[26] Kim D, Kim B K, Hong S D. Digital Twin for Immersive Exhibition Space Design. Webology, 2022, 19(1), pp. 4736-4744.

[27] Zhang X, Yang D, Yow C H, et al. Metaverse for Cultural Heritages. Electronics, 2022, 11(22), pp. 3730.

[28] Wang Y, Li Y, Song Y, et al. The influence of the activation function in a convolution neural network model of facial expression recognition. Applied Sciences, 2020, 10(5), pp. 1897.

[29] Tian X, Tang S, Zhu H, et al. Real-time sentiment analysis of students based on mini-Xception architecture for wisdom classroom. Concurrency and Computation: Practice and Experience, 2022, 34(21): e7059.

[30] Briceño I E S, Ojeda G A G, Vásquez G M S. Arquitectura mini- Xception para reconocimiento de sexo con rostros mestizos del norte del Perú. MATHEMA, 2020, 3(1): 29-34.

[31] Razzaq S, Shah B, Iqbal F, et al. DeepClassRooms: a deep learning based digital twin framework for on-campus class rooms. Neural Computing and Applications, 2022, pp. 1-10.

[32] Sun J, Tian Z, Fu Y, et al. Digital twins in human understanding: a deep learning-based method to recognize personality traits. International Journal of Computer Integrated Manufacturing, 2021, 34(7-8),pp. 860-873.

[33] Lin C, Xiong S. Controllable face editing for video reconstruction in human digital twins. Image and Vision Computing, 2022, 125, pp. 10451


Document information

Published on 04/06/24
Accepted on 20/05/24
Submitted on 08/05/24

Volume 40, Issue 2, 2024
DOI: 10.23967/j.rimni.2024.05.010
Licence: CC BY-NC-SA license
