Design of WeChat teaching information retrieval platform based on DCMH
Lingyun Huang
Proceedings Volume 12506, Third International Conference on Computer Science and Communication Technology (ICCSCT 2022); 1250649 (28 December 2022); https://doi.org/10.1117/12.2662354
Event: International Conference on Computer Science and Communication Technology (ICCSCT 2022), 2022, Beijing, China
Abstract
Deep Cross-Modal Hashing (DCMH) is a cross-modal hash retrieval method that combines deep feature learning and hash learning in a single deep learning framework. The method uses the extracted text and image features to learn hash codes, forming an end-to-end network that directly optimizes discrete binary codes. In the era of new media, the WeChat public platform is one of the most important channels for teaching information services in colleges and universities, allowing information services to address the needs of teachers and students directly. Using the WeChat official account as the front end, we designed a DCMH-based method for extracting fine-grained features from images and texts, set up feature learning and label prediction for multimodal data, built a multimodal retrieval platform on top of the official account, and designed a model of DCMH-based feature training, data mining, and front-end multimodal data interaction. The platform keeps pace with the requirements of the digital age, supports a substantial change in teaching management, and improves the efficiency and level of service.

1. INTRODUCTION

At this stage, the rapid development of "Internet+" and big data technology has brought new challenges and opportunities for the innovation of teaching management [1]. Actively meeting the challenges of the digital age is imperative. In teaching management, students' daily activities and administrative processes generate massive multimodal data: the data take many forms, are large in volume, and are unsystematic and inconvenient to retrieve. Traditional single-modal retrieval cannot meet teachers' and students' needs across these data types, and the wide variety of multimedia data poses a challenge to retrieval [2-6]. Thanks to the low storage cost and high retrieval efficiency of hash codes, multimodal hash retrieval has gradually attracted attention and research among the many retrieval methods.

The key issue in multimodal hash retrieval is how to exploit the latent associations in heterogeneous data to narrow the semantic gap. Most methods ignore the interactive exploration of latent semantically relevant information and use only a binary matrix to represent the degree of relevance. They fail to capture deeper semantic information in multi-label data and neglect the maintenance of semantic structure and the preservation of data features, which hurts cross-modal retrieval performance [7, 8]. Hash-based multimodal retrieval eases the limitations of large-scale data storage and retrieval time, and makes it feasible to build a corresponding platform.

At the same time, WeChat is one of the most important social applications in daily life. Applying the WeChat public platform to teaching management lets teachers and students obtain personalized teaching services without installing new application software, eliminating the sense of exclusion and unfamiliarity that new educational software brings. This effectively reduces the cost of teaching management and makes the platform one of the better entry points for the front end of a multimodal system.

2. DATA INTERACTION METHOD BASED ON MULTI-MODAL HASH RETRIEVAL OF WECHAT PUBLIC ACCOUNT

The WeChat official account provides a development interface for multimodal data interaction. When a user sends a message to the official account (or when a specific user operation triggers an event push), WeChat issues a POST request to the developer's server, and the developer returns a specific XML structure in the response packet. The sketch below returns graphic (news) data to the user; data of other modalities are returned through similar XML structures [9-15]:

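A minimal Python sketch of such a reply, filling in the standard passive-reply XML fields for a single news item (the template constant and helper name are illustrative):

```python
import time

# Passive-reply XML for a news (graphic) message; field names follow the
# official-account message interface. Template and helper are illustrative.
NEWS_REPLY_TEMPLATE = """<xml>
  <ToUserName><![CDATA[{to_user}]]></ToUserName>
  <FromUserName><![CDATA[{from_user}]]></FromUserName>
  <CreateTime>{create_time}</CreateTime>
  <MsgType><![CDATA[news]]></MsgType>
  <ArticleCount>1</ArticleCount>
  <Articles>
    <item>
      <Title><![CDATA[{title}]]></Title>
      <Description><![CDATA[{description}]]></Description>
      <PicUrl><![CDATA[{pic_url}]]></PicUrl>
      <Url><![CDATA[{url}]]></Url>
    </item>
  </Articles>
</xml>"""

def build_news_reply(to_user, from_user, title, description, pic_url, url):
    """Fill the XML response packet returned to WeChat's POST request."""
    return NEWS_REPLY_TEMPLATE.format(
        to_user=to_user, from_user=from_user,
        create_time=int(time.time()),
        title=title, description=description, pic_url=pic_url, url=url)
```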

The data received and returned support text, image, news (image and text), voice, video, and music messages, providing a unified entrance for the construction of the educational administration platform, as shown in Figure 1.

Figure 1. Multimodal data interaction platform based on the official account.

3. MATHEMATICAL MODEL OF MULTIMODAL HASH RETRIEVAL BASED ON DCMH

DCMH is the first cross-modal hash retrieval method to combine deep feature learning and hash learning in a complete deep-learning-based framework. The method uses the extracted text and image features to learn hash codes, forming an end-to-end network that directly optimizes discrete binary codes. Under the supervision of labels, the learned hash codes remain semantically correlated. The objective function of DCMH is defined as follows:

$$
\min_{\mathbf{B},\,\theta_x,\,\theta_y} \mathcal{J}
= -\sum_{i,j}\Bigl(S_{ij}\Theta_{ij}-\log\bigl(1+e^{\Theta_{ij}}\bigr)\Bigr)
+ \gamma\bigl(\lVert \mathbf{B}-\mathbf{F}\rVert_F^2+\lVert \mathbf{B}-\mathbf{G}\rVert_F^2\bigr)
+ \eta\bigl(\lVert \mathbf{F}\mathbf{1}\rVert_F^2+\lVert \mathbf{G}\mathbf{1}\rVert_F^2\bigr)
$$

$$
\text{s.t.}\quad \mathbf{B}\in\{-1,+1\}^{c\times n},\qquad
\Theta_{ij}=\tfrac{1}{2}\,\mathbf{F}_{*i}^{\top}\mathbf{G}_{*j},
$$

where $\mathbf{F}$ and $\mathbf{G}$ are the continuous feature matrices produced by the image and text networks and $\mathbf{B}$ is the shared binary code matrix.

The three terms are, respectively, the cross-modal similarity (negative log-likelihood) loss; the quantization loss, which keeps the discretized hash codes close to the continuous features extracted by the deep networks and thus maintains the similarity correlation; and the balance loss. For input, DCMH uses $X=\{x_i\}_{i=1}^{n}$ to represent the image data and $Y=\{y_j\}_{j=1}^{n}$ to represent the text data. In addition, the similarity matrix $S$ measures the similarity between the text and image modalities: $S_{ij}=1$ means that text $y_j$ and image $x_i$ are similar.
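As a concrete reading of the objective, the following PyTorch sketch computes the three terms; the tensor shapes and the weighting names gamma and eta follow the formula above, and the function itself is illustrative rather than the paper's original code:

```python
import torch

def dcmh_loss(F, G, B, S, gamma=1.0, eta=1.0):
    """Sketch of the DCMH objective. F, G: (c, n) continuous image/text
    features; B: (c, n) binary codes in {-1, +1}; S: (n, n) 0/1 similarity."""
    theta = 0.5 * F.t() @ G                                      # Theta_ij
    neg_log_lik = -(S * theta - torch.log1p(theta.exp())).sum()  # similarity loss
    quant = (B - F).pow(2).sum() + (B - G).pow(2).sum()          # quantization loss
    ones = F.new_ones(F.shape[1], 1)
    balance = (F @ ones).pow(2).sum() + (G @ ones).pow(2).sum()  # bit-balance loss
    return neg_log_lik + gamma * quant + eta * balance
```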

The DCMH framework, taking text and images as examples, is divided into two parts, as shown in Figure 2. First, feature learning is performed; this part can be seen as projecting the text and image data into corresponding real-valued representations. Image data are fed through the CNN-F model, while text data are first represented as bag-of-words vectors and then passed through fully connected layers [16-24]. A sketch of the text branch follows.
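A minimal sketch of the text branch under these assumptions (the layer sizes and code length are illustrative):

```python
import torch.nn as nn

class TextNet(nn.Module):
    """Bag-of-words vector -> fully connected layers -> c-dim hash features.
    vocab_size, hidden, and code_len are illustrative hyper-parameters."""
    def __init__(self, vocab_size, hidden=8192, code_len=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(vocab_size, hidden),
            nn.ReLU(inplace=True),
            nn.Linear(hidden, code_len),
        )

    def forward(self, bow):
        return self.net(bow)
```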

Figure 2. Framework diagram of the DCMH algorithm.

A central difficulty in retrieving multimodal data quickly is that the semantic features of different data types are often inconsistent. Low-level image features such as size, colour, and outline cannot fully describe the content. To resolve this heterogeneity, associations must be built between semantic expressions and the different modalities, and similarity must be measured through semantic association to complete cross-modal hash retrieval: data of different types but the same semantics should learn the same hash codes as far as possible, while samples with different semantics learn different hash codes.
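Once such codes are learned, cross-modal retrieval reduces to Hamming ranking over the code database; a minimal NumPy sketch (function and variable names are illustrative):

```python
import numpy as np

def hamming_rank(query_code, db_codes):
    """Rank database codes by Hamming distance to a query code.
    Codes are in {-1, +1}, so distance = (c - <q, b>) / 2."""
    c = db_codes.shape[1]
    dists = (c - db_codes @ query_code) // 2
    return np.argsort(dists)  # indices of nearest database items first
```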

4. DESIGN OF DCMH MULTIMODAL HASH SYSTEM BASED ON PYTHON

4.1 Image semantic feature extraction

Image training distinguishes different types of images by their semantic information. It is a fundamental problem in computer vision and the basis of other high-level vision tasks such as image detection, image segmentation, object tracking, and behaviour analysis [25-30]. Images can be processed programmatically according to the set training rules; a sketch follows.

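A minimal sketch of the image branch using torchvision (AlexNet stands in for CNN-F, which torchvision does not provide; the 64-bit code length and the recent torchvision weights API are assumptions):

```python
import torch.nn as nn
from torchvision import models

def build_image_net(code_len=64):
    """Pretrained CNN backbone with its classifier replaced by a hash head.
    AlexNet is a stand-in for CNN-F; requires torchvision >= 0.13."""
    net = models.alexnet(weights=models.AlexNet_Weights.DEFAULT)
    net.classifier[-1] = nn.Linear(4096, code_len)  # hash-feature output layer
    return net
```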

To detect an affordance cue in a previously unseen image, all features extracted from the test image are matched to the codebook, as shown in Figure 3. For each matched codebook entry, every stored occurrence casts a probabilistic vote for a hypothesized object position. The weight of a vote reflects the feature's distance to the codebook entry in feature space, the edge strength of the corresponding image feature, and the overlap between the stored occurrence and the interaction region of the original training affordance cue.

Figure 3. Image feature and codebook matching diagram.

We use standard density estimation techniques to find the modes of the voting distribution and accept a mode only if it passes a confidence threshold. Since we are interested in accurately estimating positions in the image, we back-project the contributing features into the image within a fixed volume, which has a significant effect on both modalities. Current visual object category detection methods focus mainly on recognizing basic-level categories, such as animals, cars, people, and text in pictures.
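A minimal sketch of this mode-finding step, assuming a weighted kernel density estimate over the voted positions (SciPy's gaussian_kde with its weights parameter; all names are illustrative):

```python
import numpy as np
from scipy.stats import gaussian_kde

def strongest_vote(positions, weights):
    """positions: (n, 2) voted object positions; weights: (n,) vote weights.
    Returns the position whose neighbourhood has the highest vote density."""
    kde = gaussian_kde(positions.T, weights=weights)  # scipy >= 1.2
    density = kde(positions.T)                        # density at each vote
    return positions[np.argmax(density)]
```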

4.2 Text semantic feature extraction

Labelled data are used to train two classifiers on two complementary views; agreement between their class predictions generates pseudo-labels for the unlabelled data, and the pseudo-labelled data are combined with the labelled data for further training, as sketched below.

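A minimal scikit-learn sketch of one such co-training round (the classifier choice, confidence threshold, and variable names are illustrative):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def cotrain_round(Xa, Xb, y, Xa_u, Xb_u, thresh=0.95):
    """One co-training round: train a classifier on view A, pseudo-label the
    unlabelled pool where it is confident, and return an enlarged training
    set for view B (the caller applies the symmetric step for view A)."""
    clf_a = LogisticRegression(max_iter=1000).fit(Xa, y)
    proba = clf_a.predict_proba(Xa_u)
    confident = proba.max(axis=1) >= thresh          # keep confident predictions
    pseudo = clf_a.classes_[proba.argmax(axis=1)[confident]]
    Xb_new = np.vstack([Xb, Xb_u[confident]])        # grow view B's training set
    y_new = np.concatenate([y, pseudo])
    return Xb_new, y_new
```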

4.3 Improving the extracted features with semantic labels and storing the hash codes in a database for platform retrieval

Taking images as an example, using semantic labels to improve the feature learning part preserves the semantic information of the learned features and maintains the invariance of cross-modal data. In addition, the inter-modal loss, the cross-entropy loss, and the quantization loss guarantee that the ranking correlation of similar instance pairs is higher than that of dissimilar pairs. Adding semantic labels makes it possible to learn more consistent hash codes for interrelated cross-modal data, reducing the modal gap and improving cross-modal retrieval performance. A sketch of the label design follows.

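A minimal PyTorch sketch of a label-supervision head (the layer size, label count, and multi-label loss choice are illustrative):

```python
import torch.nn as nn

class LabelHead(nn.Module):
    """Predicts semantic labels from hash features so that the learned
    features keep label-level semantics across modalities."""
    def __init__(self, code_len=64, num_labels=24):
        super().__init__()
        self.fc = nn.Linear(code_len, num_labels)

    def forward(self, features):
        return self.fc(features)

# Multi-label cross-entropy between predicted and ground-truth label vectors.
label_criterion = nn.BCEWithLogitsLoss()
```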

5. DESIGN OF A TEACHING INFORMATION RETRIEVAL MODEL BASED ON PUBLIC ACCOUNT MULTIMODAL HASH

The WeChat service account exposes an open API, whose developer documentation can be used to customize development according to user needs. The front end authenticates users through WeChat OAuth2.0 and, through the WeChat JS-SDK interface, connects to the feature hash code warehouse trained with DCMH; through data mining it retrieves multimodal data and interacts with users via the WeChat public platform to provide cross-modal data services. The specific model flow is shown in Figure 4.
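As an illustration of the authentication step, the sketch below uses the published OAuth2.0 web-authorization endpoints (the appid/secret placeholders and the helper function are illustrative):

```python
import requests

# Step 1: redirect the user's browser to the authorization URL.
AUTH_URL = ("https://open.weixin.qq.com/connect/oauth2/authorize"
            "?appid={appid}&redirect_uri={redirect_uri}"
            "&response_type=code&scope=snsapi_base#wechat_redirect")

# Step 2: exchange the returned code for an access token and openid.
def exchange_code(appid, secret, code):
    resp = requests.get(
        "https://api.weixin.qq.com/sns/oauth2/access_token",
        params={"appid": appid, "secret": secret,
                "code": code, "grant_type": "authorization_code"})
    return resp.json()  # contains access_token and openid on success
```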

Figure 4. DCMH feature training, data mining, and front-end multimodal data interaction model diagram.

6. CONCLUSION

By analysing the multimodal retrieval problem based on DCMH, feature training was designed and applied to image and text modal data. The main challenge of cross-modal retrieval remains the "semantic gap" between different modalities; in the back end, semantic label training is added to improve retrieval performance. The public platform is one of the important functions of WeChat: it brings convenience to life, work, and study, and it is an application with a very high penetration rate among teachers and students. Using the WeChat public platform as the front end, together with its open API, shortens development. Combined with the trained multimodal data warehouse, we designed the application framework of a WeChat mobile teaching management platform. The new information technology prompts us to explore and establish a better teaching management model, laying a solid foundation for further building convenient, personalized teaching services.

REFERENCES

[1] Li, Y., Miao, Z. and Wang, J., "Deep binary constraint hashing for fast image retrieval," Electronics Letters, 54(1), 25-27 (2018). https://doi.org/10.1049/ell2.v54.1
[2] Tang, X. M., Wang, Y. F. and Zou, F. H., "A fast large-scale image retrieval method based on multi-hash algorithm," Computer Engineering and Science, (7), 1316-1321 (2016).
[3] Li, S., Tao, Z. Q., Li, K. and Fu, Y., "Visual to text: Survey of image and video captioning," IEEE Transactions on Emerging Topics in Computational Intelligence, 3(4), 297-312 (2019).
[4] Lin, Z. J., Ding, G. G. and Hu, M. Q., "Semantics-preserving hashing for cross-view retrieval," IEEE Computer Society, 3864-3872 (2015).
[5] Ding, G. G., Guo, Y. C. and Zhou, J. L., "Collective matrix factorization hashing for multimodal data," IEEE Computer Society, 2083-2090 (2014).
[6] Zhang, D. Q. and Li, W. J., "Large-scale supervised multimodal hashing with semantic correlation maximization," in Proc. of the AAAI Conf. on Artificial Intelligence, 2177-2183 (2014).
[7] Wang, D., Gao, X. B. and Wang, X. M., "Semantic topic multi-modal hashing for cross-media retrieval," AAAI, 3890-3896 (2015).
[8] Liu, F. M. and Zhang, H., "A discriminant cross-modal hash retrieval algorithm based on multilevel semantics," Computer Applications, 41(8), 2187-2192 (2021).
[9] Zhang, Q. Q., Tian, Y. D. and Yang, F., "Fusion retrieval model based on mathematical text and expression transformation," Computer Engineering, 45(3), 175-181+187 (2019).
[10] Miao, F., Jia, H. D. and Xiong, Y. N., "Approximate neighbor selection method for mobile users based on service similarity," Computer Engineering, 44(5), 168-173+179 (2018).
[11] Wang, K., He, R. and Wang, L., "Joint feature selection and subspace learning for cross-modal retrieval," IEEE Transactions on Pattern Analysis and Machine Intelligence, 1-14 (2016).
[12] Bronstein, A. M. and Michel, F., "Data fusion through cross-modality metric learning using similarity-sensitive hashing," in IEEE Computer Society Conf. on Computer Vision and Pattern Recognition, 3594-3601 (2010).
[13] Kumar, S. and Udupa, R., "Learning hash functions for cross-view similarity search," in Proc. of the 22nd Inter. Joint Conf. on Artificial Intelligence, 1360-1365 (2011).
[14] Wang, B., Yang, Y. and Xu, X., "Adversarial cross-modal retrieval," in Proc. of the 25th ACM Inter. Conf. on Multimedia, (2017).
[15] Creswell, A., White, T. and Dumoulin, V., "Generative adversarial networks: An overview," IEEE Signal Processing Magazine, 35(1), 53-65 (2018). https://doi.org/10.1109/MSP.2017.2765202
[16] Li, C., Deng, C. and Li, N., "Self-supervised adversarial hashing networks for cross-modal retrieval," IEEE Computer Society, 4242-4251 (2018).
[17] Datar, M., Immorlica, N. and Indyk, P., "Locality-sensitive hashing scheme based on p-stable distributions," in Symp. on Computational Geometry, 253-262 (2004).
[18] Kulis, B. and Darrell, T., "Learning to hash with binary reconstructive embeddings," in Conf. on Neural Information Processing Systems, 1042-1050 (2009).
[19] Liu, W., Wang, J. and Ji, R., "Supervised hashing with kernels," in IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2074-2081 (2012).
[20] Simonyan, K. and Zisserman, A., "Very deep convolutional networks for large-scale image recognition," arXiv:1409.1556 (2014).
[21] Russakovsky, O., Deng, J. and Su, H., "ImageNet: A large-scale hierarchical image database," in Proc. of the 2009 IEEE Conf. on Computer Vision and Pattern Recognition, 248-255 (2009).
[22] Huiskes, M. J. and Lew, M. S., "The MIR Flickr retrieval evaluation," in Proc. of the 1st ACM Inter. Conf. on Multimedia Information Retrieval, 39-43 (2008).
[23] Chua, T. S., Tang, J. and Hong, R., "NUS-WIDE: A real-world web image database from National University of Singapore," in Proc. of the ACM Inter. Conf. on Image and Video Retrieval, 1-9 (2009).
[24] Yao, T., Kong, X. and Fu, H., "Discrete semantic alignment hashing for cross-media retrieval," IEEE Transactions on Cybernetics, (99), 1-12 (2019).
[25] Toulouse, T., Rossi, L., Campana, A., Celik, T. and Akhloufi, M. A., "Computer vision for wildfire research: An evolving image dataset for processing and analysis," Fire Safety Journal, 92 (2017). https://doi.org/10.1016/j.firesaf.2017.06.012
[26] Liu, H., Qi, W. J. and Dai, J. F., "Application of pattern recognition technology in digital city image processing," Microcomputer Information, (20), 107-108 (2005).
[27] Wang, L. C., "Research status and prospect of database technology for subject information retrieval," Chinese Library Journal, (04), 60-63 (2004).
[28] Shih, H. C. and Liu, E. R., "New quartile-based region merging algorithm for unsupervised image segmentation using color-alone feature," Information Sciences, 342, 24-36 (2016). https://doi.org/10.1016/j.ins.2015.12.030
[29] Pourreza, A. and Kiani, K., "A partial-duplicate image retrieval method using color-based SIFT," Iranian Conf. on Electrical Engineering, 1410-1415 (2016).
[30] Wang, Q. W., Wan, S. H. and Yue, L. H., "A new spatial color histogram and its measurement method," Small Microcomputer System, 35(06), 1338-1341 (2014).