[1] QIN Ruilin. An Improved Method Based on n-gram Model for Ancient Chinese Sentence Segmentation and Punctuation[J]. Journal of Jimei University (Natural Science), 2025, (2): 198-204.
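An Improved Method Based on n-gram Model for Ancient Chinese Sentence Segmentation and Punctuation
Journal of Jimei University (Natural Science) [ISSN: 1007-7405 / CN: 35-1186/N]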
- Issue: No. 2, 2025
- Pages: 198-204
- Section: Mathematical and Physical Sciences and Information Engineering
- Publication date: 2025-03-28
Article Info
- Title: An Improved Method Based on n-gram Model for Ancient Chinese Sentence Segmentation and Punctuation
- Author(s): QIN Ruilin
- Affiliation: College of Computer Engineering, Jimei University, Xiamen 361021, China
- Keywords: ancient Chinese; sentence segmentation; punctuation; n-gram model; deep learning
- Document code: A
- Abstract:
Automatic sentence segmentation and punctuation of ancient Chinese texts is of great significance for raising the level of automation in the collation of Chinese ancient books. Most existing algorithms for ancient Chinese sentence segmentation and punctuation do not consider the mutual influence between preceding and following punctuation marks. To address this issue, this paper proposes an improved n-gram-based method for ancient Chinese sentence segmentation and punctuation. The method combines contextual information from 2-grams through 5-grams, computes the punctuation probability at the current position by weighting, and then uses this probability to assist in computing the punctuation probabilities at the preceding and following positions, thereby reflecting the mutual influence between neighboring punctuation marks. Experiments on a variety of ancient-book corpora show that the proposed method achieves higher F1 scores than existing n-gram and GRU-RNN models on sentence segmentation, and that it outperforms the BiLSTM+CRF model on sentence segmentation and punctuation on some corpora.
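To make the weighting idea in the abstract concrete, here is a minimal Python sketch of combining 2-gram through 5-gram evidence into a single break probability for one position. It is an illustration only, not the paper's algorithm: the counting scheme, the `WEIGHTS` values, the `threshold`, and the function names `train`, `break_probability`, and `segment` are all assumptions. It handles only break/no-break decisions (no punctuation types) and leaves out the step in which the current position's probability feeds into the neighboring positions.

```python
from collections import Counter

ORDERS = (2, 3, 4, 5)
# Hypothetical weights (assumption, not from the paper): longer contexts count more.
WEIGHTS = {2: 0.1, 3: 0.2, 4: 0.3, 5: 0.4}


def train(passages):
    """Count contexts from passages given as lists of already segmented clauses."""
    seen = Counter()    # (n, context) -> how often the context occurred
    broken = Counter()  # (n, context) -> how often a break followed it
    for clauses in passages:
        text = "".join(clauses)
        break_positions, pos = set(), 0
        for clause in clauses:
            pos += len(clause)
            break_positions.add(pos)
        for i in range(1, len(text) + 1):
            for n in ORDERS:
                k = n - 1  # an n-gram here is k characters plus the break decision
                if i < k:
                    continue
                ctx = text[i - k:i]
                seen[(n, ctx)] += 1
                if i in break_positions:
                    broken[(n, ctx)] += 1
    return seen, broken


def break_probability(text, i, seen, broken):
    """Weighted average of per-order estimates that a break follows text[:i]."""
    score, total_weight = 0.0, 0.0
    for n in ORDERS:
        k = n - 1
        if i < k:
            continue
        ctx = text[i - k:i]
        if seen[(n, ctx)] == 0:
            continue  # unseen context: this order contributes nothing
        score += WEIGHTS[n] * broken[(n, ctx)] / seen[(n, ctx)]
        total_weight += WEIGHTS[n]
    return score / total_weight if total_weight else 0.0


def segment(text, seen, broken, threshold=0.5):
    """Split unpunctuated text wherever the weighted probability reaches threshold."""
    clauses, start = [], 0
    for i in range(1, len(text)):
        if break_probability(text, i, seen, broken) >= threshold:
            clauses.append(text[start:i])
            start = i
    clauses.append(text[start:])
    return clauses


# Toy training data (illustrative only).
passages = [["学而时习之", "不亦说乎"], ["有朋自远方来", "不亦乐乎"]]
seen, broken = train(passages)
print(segment("学而时习之不亦说乎", seen, broken))
# → ['学而时习之', '不亦说乎']
```

Normalizing by the weights of the orders that actually matched keeps the estimate in [0, 1] when longer contexts are unseen. A real system would need smoothing and far more data than this toy corpus.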
Last Update: 2025-04-25