Transformer Optimization and Application in Named Entity Recognition of Apple Diseases and Pests

PU Pan, ZHANG Yue, LIU Yong, NIE Yan-ming, HUANG Lü-wen

Citation: PU Pan, ZHANG Yue, LIU Yong, NIE Yan-ming, HUANG Lü-wen. Transformer Optimization and Application in Named Entity Recognition of Apple Diseases and Pests[J]. Transactions of the Chinese Society for Agricultural Machinery, 2023, 54(6): 264-271.

Funding:

Key Research and Development Program of Shaanxi Province (2019ZDLNY07-06-01)

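Details
    About the author:

    PU Pan (b. 1982), male, lecturer, Ph.D., mainly engaged in research on the agricultural Internet of Things and sensor networks. E-mail: pupan@nwsuaf.edu.cn

    Corresponding author:

    HUANG Lü-wen (b. 1976), male, associate professor, mainly engaged in research on biological image processing. E-mail: huanglvwen@nwsuaf.edu.cn

  • CLC number: TP391.1; S436.611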

Transformer Optimization and Application in Named Entity Recognition of Apple Diseases and Pests

  • Abstract: To improve the accuracy of entity recognition in the apple production domain, an optimized Transformer model is proposed. First, to address the lack of an apple production dataset, an industry corpus focused on apple diseases and pests was built from the knowledge and experience of horticultural experts in apple cultivation. Character vectors and word vectors were concatenated to improve the accuracy of the semantic representation of text. Second, position information is crucial to text semantics, but the standard Transformer does not capture its directionality; an attention mechanism that is aware of both direction and distance was therefore introduced into the Transformer encoder, and its output was integrated by averaging with the long-distance contextual dependency features from a BiLSTM to strengthen the semantic representation. Finally, a conditional random field (CRF) was used to constrain the predicted label sequence, yielding the optimized Transformer model. Experimental results showed that the proposed method reached an F1 score of 92.66% in Chinese named entity recognition of apple diseases and pests, indicating that it can effectively identify such entities and can provide a technical means for the accurate, intelligent recognition of other agricultural named entities.
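The last two stages described in the abstract, averaging the two encoders' outputs and constraining the label sequence, can be illustrated with a minimal sketch. It is not the authors' implementation. The tag set (`DIS`, `PEST`), the score values, and the function names are hypothetical. The sketch averages per-tag scores as a stand-in for averaging feature vectors, and it uses hard BIO rules in place of the learned transition scores a trained CRF would have.

```python
NEG_INF = float("-inf")
# Hypothetical BIO tag set for disease and pest entities.
TAGS = ["O", "B-DIS", "I-DIS", "B-PEST", "I-PEST"]


def allowed(prev, curr):
    """BIO rule: I-X may only follow B-X or I-X."""
    if curr.startswith("I-"):
        kind = curr[2:]
        return prev in ("B-" + kind, "I-" + kind)
    return True


def average_scores(a, b):
    """Element-wise mean of two [tokens x tags] score matrices."""
    return [[(x + y) / 2.0 for x, y in zip(row_a, row_b)]
            for row_a, row_b in zip(a, b)]


def viterbi(emissions):
    """Best tag sequence under the hard BIO constraints above."""
    k = len(TAGS)
    # A sequence may not start with an I- tag.
    score = [NEG_INF if TAGS[j].startswith("I-") else emissions[0][j]
             for j in range(k)]
    back = []
    for t in range(1, len(emissions)):
        new_score, pointers = [], []
        for j in range(k):
            best_i, best = -1, NEG_INF
            for i in range(k):
                if allowed(TAGS[i], TAGS[j]) and score[i] > best:
                    best_i, best = i, score[i]
            pointers.append(best_i)
            new_score.append(best + emissions[t][j] if best_i >= 0 else NEG_INF)
        back.append(pointers)
        score = new_score
    # Trace back from the best final tag.
    j = max(range(k), key=lambda idx: score[idx])
    path = [j]
    for pointers in reversed(back):
        j = pointers[j]
        path.append(j)
    path.reverse()
    return [TAGS[j] for j in path]


# Made-up scores for three tokens from two hypothetical encoders.
transformer_scores = [[1.0, 1.0, 2.0, 0.0, 0.0],
                      [0.0, 0.0, 4.0, 0.0, 0.0],
                      [2.0, 0.0, 0.0, 0.0, 0.0]]
bilstm_scores = [[1.0, 0.8, 2.0, 0.0, 0.0],
                 [0.0, 0.0, 2.0, 0.0, 0.0],
                 [2.0, 0.0, 0.0, 0.0, 0.0]]

fused = average_scores(transformer_scores, bilstm_scores)
greedy = [TAGS[max(range(len(TAGS)), key=lambda j: row[j])] for row in fused]
decoded = viterbi(fused)
print("greedy :", greedy)
print("decoded:", decoded)
```

In this toy input, picking the top-scoring tag per token starts the sequence with `I-DIS`, which is not a valid BIO labeling. The constrained decoder instead opens the entity with `B-DIS`, which is the kind of correction the CRF layer is meant to provide.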
Publication history
  • Received: 2022-11-21
  • Published: 2023-06-24
