
Applied research on innovation and development of blue calico of Chinese intangible cultural heritage based on artificial intelligence

This paper proposes a blue calico product design method based on design data and generative adversarial networks, as illustrated in Fig. 1. The framework comprises five interconnected stages that synergistically integrate traditional artistic features into modern design. The process begins with design data preparation, where blue calico patterns are systematically collected from historical archives and contemporary workshops, supplemented by the MSCOCO2014 dataset as content images. Key structural elements such as geometric motifs and floral arrangements are annotated through multi-channel analysis, while hierarchical features including indigo dye variations and texture complexity (e.g., stitch density) are extracted through spectral analysis.
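For concreteness, a minimal sketch of how such an unpaired two-domain dataset could be assembled in PyTorch is shown below; the directory layout, file types, and class name are illustrative assumptions, not the paper's actual data pipeline.

```python
import random
from pathlib import Path

from PIL import Image
from torch.utils.data import Dataset
from torchvision import transforms


class UnpairedStyleDataset(Dataset):
    """Unpaired two-domain dataset: content photos (domain X, e.g. MSCOCO2014
    images) and blue calico patterns (domain Y). Paths are placeholders."""

    def __init__(self, content_dir, style_dir, size=256):
        self.content = sorted(Path(content_dir).glob("*.jpg"))
        self.style = sorted(Path(style_dir).glob("*.jpg"))
        self.tf = transforms.Compose([
            transforms.Resize(size),
            transforms.CenterCrop(size),
            transforms.ToTensor(),
            transforms.Normalize([0.5] * 3, [0.5] * 3),  # scale to [-1, 1]
        ])

    def __len__(self):
        return max(len(self.content), len(self.style))

    def __getitem__(self, idx):
        x = Image.open(self.content[idx % len(self.content)]).convert("RGB")
        # draw the style image at random so the two domains stay unaligned
        y = Image.open(random.choice(self.style)).convert("RGB")
        return self.tf(x), self.tf(y)
```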

Subsequently, the style transfer stage employs an enhanced Cycle-Consistent GAN architecture with three critical modifications: (1) Replacement of traditional convolution layers with Ghost convolution modules to reduce parameters and computational costs; (2) Integration of SRM attention modules for style feature recalibration through channel-wise adaptive weighting; (3) Implementation of cycle consistency loss combined with identity loss (L1 norm) to preserve semantic content. The generator network, structured with sequential Ghost convolution, SRM attention, and deconvolution modules, achieves precise transfer of blue calico characteristics like line breaks and indigo gradients while maintaining structural coherence.

The adversarial optimization framework employs a dual-generator architecture with gradient-normalized discriminators, combining adversarial training objectives with cyclic consistency constraints. This configuration achieves parameter reduction compared to baseline models while enhancing image fidelity metrics. The system automatically routes suboptimal outputs back through the processing pipeline based on predefined quality thresholds, maintaining continuous refinement of generated patterns.

Finally, the product generation stage converts validated designs into 2D patterns compatible with traditional craftsmanship techniques, demonstrated through practical applications in ceramic printing and textile weaving. The complete workflow achieves systematic transformation of blue calico artistry through three core innovations: Ghost convolution optimization for lightweight computation, SRM-enhanced style feature extraction, and dual-domain cyclic consistency preservation. Figure 1 explicitly visualizes the data flow from pattern analysis to synthetic image generation, including the critical feedback loop for manual refinement and model retraining.

Fig. 1

Technical roadmap of blue calico image style transfer based on design data and generative adversarial networks, comprising five key stages: data preparation, style transfer, manual screening, network training, and product generation.

Cycle-consistent generative adversarial networks

CycleGAN consists of two mirror-symmetric generative adversarial networks that form a cyclic structure, containing two generators and two discriminators, to achieve bidirectional mapping between image domains X and Y.

Based on extensive literature review and technical analysis, we selected CycleGAN as our base framework for three main considerations: First, CycleGAN employs an unpaired dataset training paradigm, making it particularly suitable for artistic style transfer tasks, which is especially crucial for blue calico style transfer with its unique visual characteristics. Second, its cycle consistency loss mechanism effectively preserves the semantic information integrity of source domain images. Finally, the dual architecture design significantly enhances the stability and robustness of the style transfer process.

In this network framework, Generator G is responsible for mapping ordinary images to the target domain, transforming them into images with blue calico artistic features, while Generator F performs reverse mapping to ensure stylized images can be reconstructed back to their original domain. Meanwhile, two discriminators evaluate the authenticity of images in their respective domains. This bidirectional mapping mechanism provides important guarantees for maintaining semantic consistency during the style transfer process. The network structure of CycleGAN is shown in Fig. 2.

Fig. 2

Schematic diagram of the cycle-consistent generative adversarial network structure used in this paper, comprising two generators (G and F) and discriminators (DY and DX) to achieve bidirectional mapping between image domains.
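The bidirectional structure can be summarized in a few lines of PyTorch. The generators below are deliberately simplified stand-ins (the paper's generator, with Ghost convolution and SRM modules, is detailed in the following subsections); the sketch only illustrates how G, F, and the cycle x → G(x) → F(G(x)) fit together.

```python
import torch
import torch.nn as nn


def tiny_generator(ch=64):
    """Stand-in generator; the improved generator described below replaces
    these plain convolutions with Ghost convolution and SRM modules."""
    return nn.Sequential(
        nn.Conv2d(3, ch, 7, padding=3), nn.InstanceNorm2d(ch), nn.ReLU(True),
        nn.Conv2d(ch, ch, 3, padding=1), nn.InstanceNorm2d(ch), nn.ReLU(True),
        nn.Conv2d(ch, 3, 7, padding=3), nn.Tanh(),
    )


G = tiny_generator()   # G: X -> Y (ordinary image -> blue calico style)
F = tiny_generator()   # F: Y -> X (reverse mapping)

x = torch.randn(1, 3, 256, 256)   # sample from domain X
y_fake = G(x)                     # stylized image in domain Y
x_rec = F(y_fake)                 # reconstruction back in domain X
# cycle consistency requires x_rec ≈ x; D_X and D_Y judge realism per domain
```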

Generator networks

Because the generator network in the original model fails to meet performance expectations, the Ghost convolution module and the SRM attention module are introduced into the network, and the original 7 \(\times\) 7 convolution module is replaced with three 3 \(\times\) 3 convolution modules to reduce the network parameters. To improve the style refinement of image generation, this study introduces the SRM attention module into the converter to achieve refined feature extraction between style encoding and feature transformation. This approach maintains spatial consistency while making the transferred image more semantically consistent with the content image in terms of color and segmentation information. The improved generator network consists of four main components: a 3 \(\times\) 3 ordinary convolution module, a Ghost convolution module, an SRM attention module, and a deconvolution module; the structure is shown schematically in Fig. 3.

Fig. 3

Enhanced generator network architecture, consisting of 3 \(\times\) 3 conventional convolution module, Ghost convolution module, SRM attention module, and deconvolution module from left to right.

To reduce the learning cost of the generator network during image style transfer and to shrink the computation the model spends on non-critical feature information, while simultaneously improving generator performance, Ghost convolution is combined with an instance normalization layer and a nonlinear activation layer to form a Ghost convolution module, which is then added to the generator network, as shown in Fig. 4.

Fig. 4

Structure of Ghost convolution module, achieving computational optimization through feature generation and concatenation via main and parallel branches.

The Ghost convolution module first obtains the intrinsic feature map from the input features using an ordinary convolution, computed as

$$\begin{aligned} Y=X*f+b \end{aligned}$$

(1)

where \(X\in {{R}^{c\times h\times w}}\) is the input feature, \(f\) is the convolution kernel, \(b\) is the bias term, and \(Y\in {{R}^{m\times {h}'\times {w}'}}\) is the intrinsic feature map. Subsequently, the intrinsic feature map is processed by a depthwise separable convolution operation to obtain output feature maps similar to the intrinsic feature map, which is calculated as

$$\begin{aligned} {Y}'_{ij}=\Phi _{ij}({{Y}_{i}}),\ i\in [1,m],\ j\in [1,s] \end{aligned}$$

(2)

Finally, the intrinsic and output feature maps are concatenated along the channel dimension, and the concatenated result is passed through the instance normalization layer and the nonlinear activation layer in sequence to obtain the output features.
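A minimal PyTorch sketch of such a Ghost convolution module follows, assuming the common GhostNet formulation (a primary convolution for the intrinsic maps and a grouped depthwise convolution as the cheap linear transformation); the class and parameter names, such as ratio for s, are illustrative.

```python
import math

import torch
import torch.nn as nn


class GhostConvModule(nn.Module):
    """Ghost convolution block as described above: a primary convolution
    produces intrinsic maps, a depthwise ("cheap") convolution produces
    Ghost maps, and the concatenation passes through instance normalization
    and a nonlinear activation. ratio plays the role of s."""

    def __init__(self, in_ch, out_ch, kernel=3, ratio=2, dw_kernel=3):
        super().__init__()
        self.out_ch = out_ch
        init_ch = math.ceil(out_ch / ratio)        # m intrinsic maps
        ghost_ch = init_ch * (ratio - 1)           # (s-1)*m Ghost maps
        self.primary = nn.Conv2d(in_ch, init_ch, kernel,
                                 padding=kernel // 2, bias=True)   # Eq. (1)
        self.cheap = nn.Conv2d(init_ch, ghost_ch, dw_kernel,
                               padding=dw_kernel // 2,
                               groups=init_ch, bias=False)         # Eq. (2)
        self.norm = nn.InstanceNorm2d(out_ch)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        y = self.primary(x)                        # intrinsic feature maps
        y_ghost = self.cheap(y)                    # Ghost feature maps
        out = torch.cat([y, y_ghost], dim=1)[:, :self.out_ch]
        return self.act(self.norm(out))            # concat -> IN -> ReLU
```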

Ghost convolution module

GhostNet23 (Ghost Module for Efficient Neural Networks) is an innovative module proposed to improve computational efficiency and reduce parameter counts, functioning as a lightweight functional unit. It generates partial feature maps through a small number of conventional convolution operations, then produces additional Ghost feature maps through linear transformations and other operations on these feature maps. These feature maps are subsequently combined, significantly reducing computational costs while maintaining or even enhancing network performance. The Ghost module can be seamlessly integrated into various neural network architectures. The implementation process of the Ghost module is illustrated in Fig. 5.

Fig. 5

Implementation process of Ghost module, demonstrating the complete processing flow of feature extraction, transformation, and concatenation.

In this paper, we employ the Ghost module to replace traditional convolution layers to improve computational efficiency and reduce memory consumption. The specific process is as follows:

  1. (1)

    Input Processing: The input image undergoes a conventional convolution operation with kernel size \(k\times k\) and \(c'\) channels to generate m intrinsic feature maps.

  2. (2)

    Ghost Feature Generation: Apply \(1\times 1\) convolution kernels to each intrinsic feature map for linear transformation, generating \(s-1\) Ghost feature maps for each intrinsic feature map.

  3. (3)

    Feature Map Combination: Concatenate the intrinsic feature maps with Ghost feature maps to form the final output feature maps, significantly reducing computational costs.

Compared to traditional convolution operations, the Ghost module reduces computational complexity by approximately s-fold when generating the same number of feature maps, thus substantially reducing memory and computational requirements while maintaining performance.
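The approximate s-fold saving can be made concrete with the standard GhostNet complexity analysis. Writing \(n=m\cdot s\) for the number of output feature maps, c for the input channels, \(k\times k\) for the primary convolution kernel, \(d\times d\) for the linear-transformation kernel, and \({h}'\times {w}'\) for the output resolution, the theoretical speedup of the Ghost module over an ordinary convolution is

$$\begin{aligned} {{r}_{s}}=\frac{n\cdot {h}'\cdot {w}'\cdot c\cdot k\cdot k}{\frac{n}{s}\cdot {h}'\cdot {w}'\cdot c\cdot k\cdot k+(s-1)\cdot \frac{n}{s}\cdot {h}'\cdot {w}'\cdot d\cdot d}\approx \frac{s\cdot c}{s+c-1}\approx s \end{aligned}$$

since \(d\times d\) has the same magnitude as \(k\times k\) and \(c\gg s\).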

SRM attention module

SRM24 (A Style-based Recalibration Module for Convolutional Neural Networks), proposed by Lee et al., is a soft attention mechanism that adaptively recalibrates feature map styles by extracting style information from hidden-layer feature maps. By integrating the SRM module into the CycleGAN generator, we enhance the model's capability to extract and express blue calico style features, as illustrated in Fig. 6.

Fig. 6

Architecture of SRM attention module, comprising style pooling and style integration layers for adaptive style adjustment of feature maps.

The SRM module is composed of two parts: a style pooling layer and a style integration layer. A feature map entering the module is first processed by the style pooling layer, which extracts the style features of each channel by applying average pooling and standard deviation pooling to that channel's feature responses. The calculation process is as follows:

$$\begin{aligned} {{\mu }_{nc}}=\frac{1}{H\times W}\sum \limits _{h=1}^{H}{\sum \limits _{w=1}^{W}{{{X}_{nchw}}}} \end{aligned}$$

(3)

$$\begin{aligned} {{\sigma }_{nc}}=\sqrt{\frac{1}{H\times W}\sum \limits _{h=1}^{H}{\sum \limits _{w=1}^{W}{{{({{X}_{nchw}}-{{\mu }_{nc}})}^{2}}}}} \end{aligned}$$

(4)

In this module, the input feature is designated as \(X\in {{R}^{N\times C\times H\times W}}\), where N represents the number of samples, C denotes the number of channels, and H and W signify the height and width of the feature map, respectively. After statistical processing by the style pooling layer, the style feature \({{T}_{nc}}\in {{R}^{N\times C\times d}}\) is obtained, where \(d=2\) because each channel contributes its mean and standard deviation. The style feature is expressed as follows:

$$\begin{aligned} {{T}_{nc}}=[{{\mu }_{nc}},{{\sigma }_{nc}}] \end{aligned}$$

(5)

The style integration layer is an adaptive weighting operation comprising a channel-wise fully connected (CFC) layer, a batch normalization layer, and a sigmoid activation function. It converts the style features into channel-level style weights, which represent the relative importance of the style features in each channel; the model's learning of each style feature is reinforced or inhibited according to its weight value.
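A compact PyTorch sketch of the SRM module as described by Eqs. (3)-(5) is given below; following the original SRM design, the channel-wise fully connected layer is realized as a grouped 1-D convolution over the two pooled statistics.

```python
import torch
import torch.nn as nn


class SRM(nn.Module):
    """Style-based Recalibration Module: style pooling (channel mean and
    standard deviation) followed by style integration (channel-wise fully
    connected layer, batch normalization, sigmoid gating)."""

    def __init__(self, channels):
        super().__init__()
        # CFC: one weight pair per channel, realized as a grouped 1-D conv
        self.cfc = nn.Conv1d(channels, channels, kernel_size=2,
                             groups=channels, bias=False)
        self.bn = nn.BatchNorm1d(channels)

    def forward(self, x):
        n, c, h, w = x.shape
        flat = x.view(n, c, -1)
        mu = flat.mean(dim=2, keepdim=True)        # Eq. (3): average pooling
        sigma = flat.std(dim=2, keepdim=True)      # Eq. (4): std pooling
        t = torch.cat([mu, sigma], dim=2)          # Eq. (5): T, with d = 2
        z = self.bn(self.cfc(t))                   # style integration
        g = torch.sigmoid(z).view(n, c, 1, 1)      # channel style weights
        return x * g                               # recalibrate the features
```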

Discriminator networks

The discriminator used is PatchGAN25, which is mainly composed of three network-layer components: an ordinary convolutional layer, an instance normalization layer, and a nonlinear activation layer. The basic idea of the discriminator is to segment the input image into multiple patches, judge each patch as real or fake, and finally aggregate all patch judgments to obtain the authenticity of the whole image. To control the smoothness of the function, the traditional batch normalization (BN) in the Markovian discriminator is replaced by gradient normalization (GN). Gradient normalization effectively balances the gradient distribution of each layer in the network, improves the convergence speed and stability of the network, and reduces the risk of overfitting. The structure of the discriminator network is shown in Fig. 7.

Fig. 7

PatchGAN-based discriminator network architecture, consisting of cascaded convolutional layer components for evaluating the authenticity of generated images.
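The sketch below shows a standard PatchGAN layout together with one plausible reading of the gradient-normalization step, namely scaling the discriminator output by its input-gradient norm; treat it as an illustrative approximation under those assumptions rather than the paper's exact configuration.

```python
import torch
import torch.nn as nn


def patchgan_discriminator(in_ch=3, ch=64):
    """PatchGAN: cascaded conv blocks ending in a 1-channel patch map;
    each patch score judges the realism of one receptive-field region."""
    def block(cin, cout, stride):
        return [nn.Conv2d(cin, cout, 4, stride, 1),
                nn.InstanceNorm2d(cout),
                nn.LeakyReLU(0.2, inplace=True)]
    return nn.Sequential(
        nn.Conv2d(in_ch, ch, 4, 2, 1), nn.LeakyReLU(0.2, inplace=True),
        *block(ch, ch * 2, 2), *block(ch * 2, ch * 4, 2),
        *block(ch * 4, ch * 8, 1),
        nn.Conv2d(ch * 8, 1, 4, 1, 1),   # patch-level real/fake map
    )


def gradient_normalize(d_net, x):
    """Gradient normalization: divide D(x) by its input-gradient norm so
    the discriminator stays approximately 1-Lipschitz (smooth)."""
    x = x.detach().requires_grad_(True)
    out = d_net(x)
    grad, = torch.autograd.grad(out.sum(), x, create_graph=True)
    grad_norm = grad.flatten(1).norm(2, dim=1).view(-1, 1, 1, 1)
    return out / (grad_norm + out.abs())
```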

Loss function

To improve the quality of the generated image, keep the output of the generative network as close as possible to the input image, and prevent the generator from altering the color tone of the generated image, this paper adds a constant-mapping loss function (identity loss) to the total loss function. The identity loss function is shown in Eq. (6):

$$\begin{aligned} {{L}_{identity}}(G,F)={{E}_{y\sim {{P}_{data}}(y)}}\left[ ||G(y)-y|{{|}_{1}} \right] +{{E}_{x\sim {{P}_{data}}(x)}}\left[ ||F(x)-x|{{|}_{1}} \right] \end{aligned}$$

(6)

The generator G maps the input image x from the data distribution of the X domain to the data distribution associated with the Y domain, producing \(G\left( x \right)\), and the discriminator \({{D}_{Y}}\) discriminates G(x) and feeds the resulting adversarial loss back to the generator G. The loss at this point is shown in Eq. (7):

$$\begin{aligned} {{L}_{GAN}}(G,{{D}_{Y}},X,Y)={{E}_{y\sim {{P}_{data}}(y)}}\left[ \log {{D}_{Y}}\left( y \right) \right] +{{E}_{x\sim {{P}_{data}}\left( x \right) }}\left[ \log \left( 1-{{D}_{Y}}\left( G\left( x \right) \right) \right) \right] \end{aligned}$$

(7)

Symmetrically, the generator F maps the input image y from the data distribution of the Y domain to the data distribution F(y) associated with the X domain, and the discriminator \({{D}_{X}}\) discriminates \(F\left( y \right)\) and feeds the resulting adversarial loss back to the generator F. At this point, the loss is as shown in Eq. (8):

$$\begin{aligned} {{L}_{GAN}}\left( F,{{D}_{X}},Y,X \right) ={{E}_{x\sim {{P}_{data}}\left( x \right) }}\left[ \log {{D}_{X}}\left( x \right) \right] +{{E}_{y\sim {{P}_{data}}\left( y \right) }}\left[ \log \left( 1-{{D}_{X}}\left( F\left( y \right) \right) \right) \right] \end{aligned}$$

(8)

The CycleGAN model introduces a cycle consistency loss while learning the feature mappings of the generators F and G. The principle of cycle consistency is to obtain the forged sample G(x) from the X-domain sample x, feed it into the generator F to obtain F(G(x)), and constrain the reconstruction to match the source-domain sample, \(F(G(x))\approx x\); that is, \(x\rightarrow G(x)\rightarrow F(G(x))\approx x\). Similarly, a sample y in the Y domain satisfies the cycle consistency \(y\rightarrow F(y)\rightarrow G(F(y))\approx y\). The cycle consistency loss function can be expressed by Eq. (9):

$$\begin{aligned} {{L}_{cyc}}(G,F)={{E}_{x\sim {{P}_{data}}(x)}}\left[ ||F(G(x))-x|{{|}_{1}} \right] +{{E}_{y\sim {{P}_{data}}(y)}}\left[ ||G(F(y))-y|{{|}_{1}} \right] \end{aligned}$$

(9)

Finally, the total loss function of the model is the sum of the two adversarial losses and the cyclic consistency loss, as shown in Eq. (10):

$$\begin{aligned} L(G,F,{{D}_{X}},{{D}_{Y}})={{L}_{GAN}}(G,{{D}_{Y}},X,Y)+{{L}_{GAN}}(F,{{D}_{X}},Y,X)+\lambda {{L}_{cyc}}(G,F) \end{aligned}$$

(10)

Thus, the proposed expression for the total loss function for model training is

$$\begin{aligned} {{L}_{Generator}}={{L}_{GAN}}(G,{{D}_{Y}},X,Y)+{{L}_{GAN}}(F,{{D}_{X}},Y,X)+{{\lambda }_{1}}{{L}_{cyc}}(G,F)+{{\lambda }_{2}}{{L}_{identity}}(G,F) \end{aligned}$$

(11)

where the parameters \({{\lambda }_{1}}\) and \({{\lambda }_{2}}\) control the weighting of the terms in this linear combination of losses.
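Assembled in PyTorch, the full generator objective of Eq. (11) might look as follows; the least-squares adversarial loss and the λ values are common CycleGAN defaults, assumed here rather than taken from the paper.

```python
import torch
import torch.nn as nn

adv_loss = nn.MSELoss()   # least-squares GAN loss, a common CycleGAN choice
l1 = nn.L1Loss()


def generator_loss(G, F, D_X, D_Y, x, y, lam1=10.0, lam2=5.0):
    """Total generator objective of Eq. (11). lam1/lam2 values here are
    illustrative defaults, not the paper's reported settings."""
    fake_y, fake_x = G(x), F(y)

    # adversarial terms, Eqs. (7) and (8): G must fool D_Y, F must fool D_X
    loss_gan = adv_loss(D_Y(fake_y), torch.ones_like(D_Y(fake_y))) + \
               adv_loss(D_X(fake_x), torch.ones_like(D_X(fake_x)))

    # cycle consistency, Eq. (9): x -> G(x) -> F(G(x)) ≈ x and vice versa
    loss_cyc = l1(F(fake_y), x) + l1(G(fake_x), y)

    # identity mapping, Eq. (6): feeding a target-domain image through its
    # own generator should change nothing, preserving the color tone
    loss_id = l1(G(y), y) + l1(F(x), x)

    return loss_gan + lam1 * loss_cyc + lam2 * loss_id
```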
