RESEARCH AND APPLICATION OF AUTOMATICAL ANNOTATION ON CHINESE PATENTS
Deng Na
Hubei University of Technology
Copyright © 2017 by Cayley Nielson Press, Inc.
ISBN: 978-0-9992443-1-9
Cayley Nielson Press Scholarly Monograph Series Book Code No.: 140-1-2
US$96.75
Preface
As society develops, people have become increasingly aware of the tremendous changes that innovation brings to everyday life. Patents, one of the most important means of protecting innovation, are receiving growing attention, and the worldwide body of patents keeps expanding as the number of patent applications increases year by year. Because patents contain rich technical, economic, and legal information, patent analysis and mining has become an important research topic in the field of data mining. At present, however, the annotation of Chinese patents still relies on manual work. This book focuses on the research and application of automatic annotation of Chinese patents.
This work was supported by the Research Foundation for Advanced Talents of Hubei University of Technology (No. BSQD12131), the Fundamental Research Funds for the Young Teachers' Innovation Project of Zhongnan University of Economics and Law (No. 2014147), the National Natural Science Foundation of China (No. 61201250), the Natural Science Foundation of Anhui Province (No. 1308085QF103), the Guangxi Natural Science Foundation (No. 2012GXNSFBA053174), and the Guangxi University Key Lab of Cloud Computing and Complex System Found.
As co-instructors, Professor Chunzhi Wang and Professor Zhiwei Ye devoted a great deal of effort to this work. Because the author's circumstances and expertise are limited, the selection and evaluation of the methods may in places be inappropriate or even wrong. I would be grateful for any criticism and corrections from readers.
Contents
Preface
1 Automatically generation and evaluation of stop words list for Chinese Patents
1.1 Introduction
1.2 The stop word list for general Chinese texts
1.3 Word Segmentation of Chinese
1.4 Two methodologies of generating stop words lists for Chinese patents
1.4.1 Methodology One: based on the most simple and general strategy
1.4.2 Methodology Two: based on statistics
1.4.3 Modification and adjustment of SAT
1.5 Analysis and Evaluation of Experiment
1.5.1 Dataset
1.5.2 Using Methodology One to generate stop words list under corpuses with different scales
1.5.3 The accuracies of stop words list under corpuses with different scales using Methodology One
1.5.4 Methodology One’s list compared with the list for general texts
1.5.5 The accuracies of stop words list under corpuses with different scales using Methodology Two
1.6 Conclusion
2 Keywords extraction for Chinese patents based on Part-of-speech tagging and stop words elimination
2.1 Introduction
2.2 Related work
2.3 Our methodology
2.4 Word Segmentation and Part-of-speech tagging
2.5 Elimination of stop words in patents
2.6 Algorithm
2.7 Experiment
2.7.1 Dataset
2.7.2 Extraction result
2.7.3 The accuracy of keywords extraction and its influence factors
2.8 Conclusion
3 A functional clauses extraction method of Chinese patents in specific domain based on templates
3.1 Introduction
3.2 Concept
3.2.1 Function clause
3.2.2 Template
3.3 The collection and classification of templates
3.3.1 The collection of templates
3.3.2 The classification of templates
3.4 The extraction method of function clauses
3.5 Experiments
3.6 Conclusions
4 The construction method of clue words thesaurus in Chinese patents based on iteration and self-filtering
4.1 Introduction
4.2 Related work
4.3 Clue words
4.4 Algorithm
4.4.1 Self-filtering
4.4.2 Locating candidate effect statements
4.5 Experiments
4.5.1 Collection of initial clue words
4.5.2 Iteration
4.6 Conclusion and future work
5 PaEffExtr: A Method to Extract Effect Statements Automatically from Patents
5.1 Introduction
5.2 Related Work
5.3 Characteristics of Patent Abstracts
5.4 Multi-features fused scoring algorithm
5.4.1 Calculation of distribution score
5.4.2 Calculation of morphological score
5.4.3 Algorithm PaEffExtr
5.4.4 Evaluation of algorithm
5.5 Experiments
5.5.1 Clue words
5.5.2 Comparative experiments
5.5.3 Runtime
5.6 Conclusion and future work
6 Intelligent Recommendation of Chinese Traditional Medicine Patents Supporting New Medicine’s R&D
6.1 Introduction
6.2 Related Work
6.3 Chinese traditional medicine patents’ intelligent recommendation
6.3.1 Architecture
6.3.2 Construction of Alias Database
6.3.3 Technology Annotation Automatically in Chinese Traditional Medicine Patents
6.3.4 The Similarity of Patents
6.4 Experiments
6.4.1 Recalling optimization
6.4.2 Patent recommendation
6.5 Conclusion and Future Work
References