NATURAL LANGUAGE PROCESSING AND ANALYSIS ON PATENTS


Deng Na

Hubei University of Technology

Copyright © 2024 by Cayley Nielson Press, Inc.

ISBN: 978-1-957274-20-1

Cayley Nielson Press Scholarly Monograph Series Book Code No.: 214-10-1

US$185.50


Preface


With China's increasing emphasis on intellectual property protection, the number of patent applications in China has grown year by year. As a special type of text, patents contain rich technical, economic, legal, and market information, and the effective analysis and use of this information is of great significance to inventors, patent examiners, and enterprises. Drawing on natural language processing, artificial intelligence, and complex network technology, this book explores information extraction, classification, community detection, and evolution analysis in patent texts, using real patent data from the fields of machinery, electric vehicles, and traditional Chinese medicine as experimental objects. Solutions to these problems can support research directions such as patent knowledge graph construction, patent mining, core patent analysis, and job opportunity discovery.

I would like to express my special thanks to my graduate and undergraduate students from the School of Computer Science at Hubei University of Technology, including Du Tiansi, Zheng Cheng, Cui Ruiyi, Fu Hao, and Chen Tianci. They provided a great deal of assistance in completing many of the experiments and text-related tasks.

Owing to the limits of my knowledge, this book may contain mistakes and flaws, and I would be grateful for readers' criticism and corrections.

Deng Na
Hubei University of Technology
Wuhan, China
April 8, 2024


Contents


Preface
1 Terminology Extraction in Patents
1.1 Problem Description
1.2 Related Work
1.3 Model building
1.3.1 BERT pre-training model
1.3.2 LSTM layer
1.3.3 Conditional Random Field
1.3.4 The Proposed Model
1.4 Experiment
1.4.1 New energy vehicle patent text corpus construction
1.4.2 Annotation text processing flow
1.4.3 Experimental parameter setting
1.4.4 Experimental results and analysis
1.5 Conclusion
2 Technology and efficacy extraction of mechanical patents
2.1 Problem Description
2.2 Related Work
2.3 Conditional Random Field
2.4 Long Short-Term Memory Neural Networks
2.5 Our Model
2.6 Experiments
2.6.1 Experimental Data
2.6.2 Evaluation Criteria
2.6.3 Experimental Environment and Parameters
2.7 Conclusion
3 Named Entity Recognition of Traditional Chinese Medicine Patents
3.1 Problem Description
3.2 Related Work
3.3 Conditional Random Field
3.4 Long Short-Term Memory Neural Network
3.5 The Proposed Model
3.5.1 Input Layer
3.5.2 Embedding Layer
3.5.3 BiLSTM Layer
3.5.4 CRF Layer
3.6 Experiment
3.6.1 Experimental Procedure
3.6.2 Dataset Preparation
3.6.3 Sequence Labeling
3.6.4 Model Training
3.6.5 Model Test
3.6.6 Results and Discussion
3.7 Conclusion
4 TCM Patent Annotation to Support Medicine R&D and Patent Acquisition Decision-Making
4.1 Problem Description
4.2 Model building
4.3 Entity recognition process
4.4 Experimental results and analysis
4.4.1 Experimental data
4.4.2 Experimental results and analysis
4.5 Conclusion
5 NER and Association Rules Mining on Traditional Chinese Medicine Patents
5.1 Problem Description
5.2 Solution
5.2.1 Recognition of Words for TCM
5.2.2 Bitmap Representation
5.2.3 Apriori and Association Rules
5.3 Experimental results
5.3.1 Experimental Protocol
5.3.2 Statistical Results of Medicine Frequency
5.3.3 Frequent Itemset and Association Rules with Different Minimum Support
5.3.4 Association Rules Presentation
5.4 Conclusion
6 Classification of Patents
6.1 Problem Description
6.2 Related Work
6.3 Model building
6.3.1 BERT character embedding layer
6.3.2 BiLSTM layer
6.4 Experiments and results analysis
6.4.1 Data Collection and Cleaning
6.4.2 Dataset Production
6.4.3 Experimental evaluation criteria
6.4.4 Experimental environment and parameter settings
6.4.5 Experimental results
6.4.6 Experimental analysis
6.5 Conclusion
7 Evolution analysis of R&D jobs based on patents’ technology efficacy labeling
7.1 Problem Description
7.2 Related Work
7.3 Job Prediction Model
7.3.1 Data Pre-processing
7.3.2 BiLSTM-CRF Neural Network Model Construction
7.3.3 Text Vectorization Representation
7.3.4 Similarity Calculation
7.4 Experimental Results and Analysis
7.4.1 Experimental Data
7.4.2 Experimental Environment
7.4.3 Technical Efficacy Extraction Results
7.4.4 Word Embedding
7.4.5 Similarity calculation
7.4.6 Patent Development
7.4.7 Evolution of Mechanical R&D Jobs
7.5 Conclusion
8 Detection and Evolution Analysis of TCM Patent Community
8.1 Problem Description
8.2 Related work
8.3 Related definitions
8.3.1 Node correlation degree
8.3.2 Node importance
8.3.3 Node similarity
8.4 Evaluation indicators
8.4.1 Global modularity
8.4.2 Local modularity
8.5 Algorithm
8.5.1 Algorithm Description
8.5.2 Algorithm Pseudocode
8.5.3 Effectiveness Analysis of the Algorithm
8.6 Experiment and analysis
8.6.1 Data Sources
8.6.2 Data preprocessing
8.6.3 Experimental results
8.6.4 Result analysis
8.7 Conclusions
References


Readership


This book should be useful for students, researchers, engineers, and professionals working in natural language processing, text mining, artificial intelligence, complex network analysis, and intellectual property, as well as for inventors, patent examiners, and enterprises that analyze and use patent information.