IRMA-International.org: Creator of Knowledge
Information Resources Management Association
Advancing the Concepts & Practices of Information Resources Management in Modern Organizations

Semantic Annotation Model and Method Based on Internet Open Dataset

Author(s): Xin Gao (State Grid Beijing Electric Power Company, China), Yansong Wang (State Grid Beijing Electric Power Company, China), Fang Wang (State Grid Beijing Electric Power Company, China), Baoqun Zhang (State Grid Beijing Electric Power Company, China), Caie Hu (State Grid Beijing Electric Power Company, China), Jian Wang (State Grid Beijing Electric Power Company, China), and Longfei Ma (State Grid Beijing Electric Power Company, China)
Copyright: 2025
Volume: 21
Issue: 1
Pages: 19
Source title: International Journal of Intelligent Information Technologies (IJIIT)
Editor(s)-in-Chief: Vijayan Sugumaran (Oakland University, Rochester, USA)
DOI: 10.4018/IJIIT.370966

Abstract

Traditional semantic annotation struggles with dataset diversity: each field and scenario requires its own annotation effort, which typically demands substantial manpower and time. To address these challenges, this paper studies a semantic annotation model and method based on internet open datasets, aiming to improve annotation efficiency and accuracy and to promote the sharing and utilization of data resources. The Common Crawl dataset is selected to provide sufficient training samples; the data are preprocessed with methods such as stop-word removal and deduplication to improve data quality; and a keyword extraction model based on heuristic rules and text context is constructed. For the semantic annotation itself, the paper constructs a model based on Bidirectional Long Short-Term Memory (BiLSTM) that makes full use of the part-of-speech information in the corpus context, captures the part-of-speech features of the corpus, and generates semantic tags through supervised learning.
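
The abstract names stop-word removal and deduplication as preprocessing steps but does not specify how they are implemented. The following is a minimal sketch of what such a step might look like, not the authors' method. It assumes English text, exact-match deduplication on the normalized token sequence, and a deliberately tiny stop-word list; the names `STOP_WORDS`, `remove_stop_words`, and `deduplicate` are hypothetical.

```python
import hashlib
import re

# Hypothetical, deliberately small stop-word list (an assumption for
# illustration; a real pipeline would use a much larger curated list).
STOP_WORDS = frozenset(
    {"a", "an", "the", "is", "are", "of", "to", "and", "in", "on"}
)

# Lowercase alphanumeric runs serve as tokens in this sketch.
_TOKEN_RE = re.compile(r"[a-z0-9]+")


def remove_stop_words(text):
    """Lowercase the text, tokenize it, and drop stop words."""
    tokens = _TOKEN_RE.findall(text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def deduplicate(documents):
    """Return token lists for documents, skipping exact duplicates.

    Two documents count as duplicates when their stop-word-free token
    sequences are identical. Empty documents are dropped.
    """
    seen = set()
    unique = []
    for doc in documents:
        tokens = remove_stop_words(doc)
        if not tokens:
            continue
        # Hash the normalized tokens so only fingerprints are stored.
        fingerprint = hashlib.sha1(" ".join(tokens).encode("utf-8")).hexdigest()
        if fingerprint not in seen:
            seen.add(fingerprint)
            unique.append(tokens)
    return unique


if __name__ == "__main__":
    docs = [
        "The grid is stable.",
        "the GRID is stable",
        "Load forecasting on the grid",
    ]
    for tokens in deduplicate(docs):
        print(tokens)
```

Exact-match fingerprints only catch documents that are identical after normalization; near-duplicate detection for web-scale corpora would need a different technique, which this sketch does not attempt.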