
[1] QIN Ruilin. An Improved Method Based on n-gram Model for Ancient Chinese Sentence Segmentation and Punctuation[J]. Journal of Jimei University (Natural Science), 2025, (2): 198-204.

An Improved Method Based on n-gram Model for Ancient Chinese Sentence Segmentation and Punctuation

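Journal of Jimei University (Natural Science) [ISSN: 1007-7405 / CN: 35-1186/N]

Volume:
Issue:
2025, No. 2
Pages:
198-204
Column:
Mathematical Sciences and Information Engineering
Publication date:
2025-03-28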

Article Info

Title:
An Improved Method Based on n-gram Model for Ancient Chinese Sentence Segmentation and Punctuation
Author(s):
QIN Ruilin (秦瑞琳)
College of Computer Engineering, Jimei University, Xiamen 361021, Fujian, China
Keywords:
ancient Chinese; sentence segmentation; punctuation; n-gram model; deep learning
Classification number:
-
DOI:
-
Document code:
A
Abstract:
Automatic sentence segmentation and punctuation of ancient Chinese texts is important for raising the level of automation in the collation of Chinese ancient books. Most existing algorithms for ancient Chinese sentence segmentation and punctuation do not account for the interaction between neighboring punctuation marks. To address this, the paper proposes an improved n-gram-based method. The method combines contextual information from 2-grams through 5-grams, computes a weighted probability for the punctuation mark at the current position, and uses that probability to assist in computing the punctuation probabilities at the preceding and following positions, thereby capturing the mutual influence between adjacent marks. Experiments on several ancient-book corpora show that the proposed method achieves higher F1 scores than existing n-gram and GRU-RNN models on sentence segmentation, and that it outperforms the BiLSTM+CRF model on both sentence segmentation and punctuation on some corpora.
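
The weighted combination of 2-gram to 5-gram evidence described in the abstract can be illustrated with a small Python sketch. This is not the paper's implementation: the abstract gives neither the weights nor the formulas, so the weight values, the definition of a context (n-1 characters before a gap plus one after), and the names `gap_context`, `train`, and `break_probability` are all assumptions made for illustration. The sketch covers only the per-position weighted estimate for sentence breaks; the step in which that estimate feeds into the neighboring positions is left out because the abstract does not specify it.

```python
from collections import Counter

ORDERS = (2, 3, 4, 5)
# Hypothetical weights (longer contexts count more); not taken from the paper.
WEIGHTS = {2: 0.1, 3: 0.2, 4: 0.3, 5: 0.4}


def gap_context(chars, i, n):
    """Return the n characters around the gap before chars[i]:
    n-1 characters to the left of the gap and 1 to the right."""
    start = i - (n - 1)
    if start < 0 or i >= len(chars):
        return None  # not enough characters for this order
    return "".join(chars[start:i + 1])


def train(segmented_sentences):
    """Count how often each context occurs and how often its gap is a break.

    segmented_sentences: list of clause lists, e.g. [["學而時習之", "不亦說乎"]].
    """
    seen, broken = Counter(), Counter()
    for clauses in segmented_sentences:
        chars = "".join(clauses)
        # Collect every index i such that a break precedes chars[i].
        break_at, pos = set(), 0
        for clause in clauses[:-1]:
            pos += len(clause)
            break_at.add(pos)
        for i in range(1, len(chars)):
            for n in ORDERS:
                ctx = gap_context(chars, i, n)
                if ctx is None:
                    continue
                seen[(n, ctx)] += 1
                if i in break_at:
                    broken[(n, ctx)] += 1
    return seen, broken


def break_probability(chars, i, seen, broken):
    """Weighted average of the per-order break estimates for the gap before chars[i]."""
    num = den = 0.0
    for n in ORDERS:
        ctx = gap_context(chars, i, n)
        if ctx is None or seen[(n, ctx)] == 0:
            continue  # no evidence at this order; skip it and renormalize
        num += WEIGHTS[n] * broken[(n, ctx)] / seen[(n, ctx)]
        den += WEIGHTS[n]
    return num / den if den else 0.0
```

Calling `train` on a punctuated corpus and then `break_probability(text, i, seen, broken)` for each gap of an unpunctuated text yields one score per gap; a threshold on that score (another free parameter in this sketch) would then decide where to break.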


Memo:
Last Update: 2025-04-25