Volume 19, Issue 2, April 2022
Yong Rui, Vicente Ivan Sanchez Carmona, Mohsen Pourvali, Yun Xing, Wei-Wen Yi, Hui-Bin Ruan, Yu Zhang. Knowledge Mining: A Cross-disciplinary Survey. Machine Intelligence Research, vol. 19, no. 2, pp.89-114, 2022. https://doi.org/10.1007/s11633-022-1323-6

# Knowledge Mining: A Cross-disciplinary Survey

##### doi: 10.1007/s11633-022-1323-6
• Author Bios:

Yong Rui received the B. Sc. degree in electrical engineering from Southeast University, China in 1991, the M. Sc. degree in electrical engineering from Tsinghua University, China in 1994, and the Ph. D. degree in electrical and computer engineering from the University of Illinois at Urbana-Champaign (UIUC), USA in 1999. He is currently the Chief Technology Officer and Senior Vice President of Lenovo Group, China. He is a Fellow of ACM, IEEE, IAPR, SPIE, CCF and CAAI, and a Foreign Member of Academia Europaea. He holds 70 patents, and is the recipient of the prestigious 2018 ACM SIGMM Technical Achievement Award and the 2016 IEEE Computer Society Edward J. McCluskey Technical Achievement Award. His research interests include multimedia, artificial intelligence, big data and knowledge mining. E-mail: yongrui@lenovo.com (Corresponding author)

Vicente Ivan Sanchez Carmona received the B. Eng. and M. Eng. degrees in computer engineering from the National Autonomous University of Mexico, Mexico in 2008 and 2011, respectively, and the Ph. D. degree in computer science from University College London, UK in 2018. He is currently a researcher in Lenovo's AI Lab, China. He has served as a reviewer for conferences such as AAAI, ACL, CoNLL and COLING, among others. His research interests include artificial intelligence, behavioral science, cognitive science and human-computer interaction. E-mail: vcarmona@lenovo.com

Mohsen Pourvali received the Ph. D. degree in computer science from Ca' Foscari University of Venice, Italy in 2017. During his Ph. D. studies, he worked on text summarization and document enrichment. Currently, he is an advisory researcher at the AI Lab in Lenovo. He is an experienced lecturer with a demonstrated history of teaching in universities. His research interests include explainable artificial intelligence and knowledge graphs, especially domain-adaptive information extraction. E-mail: mpourvali@lenovo.com ORCID iD: 0000-0003-2653-9613

Yun Xing received the B. Sc. degree in optical information science and technology from Beijing Institute of Technology, China in 2012, and the M. Eng. degree in electronics and optics from Polytech Orleans, France in 2016. Currently, he is an NLP researcher in the AI Lab at Lenovo Research, China. His research interests include natural language processing with machine learning and deep learning. E-mail: xingyun44@hotmail.com

Wei-Wen Yi received the B. Eng. and M. Eng. degrees in information and communication engineering from Beijing University of Posts and Telecommunications, China in 2017 and 2020, respectively. Currently, she is a natural language processing researcher at Lenovo Research, China. She received the Best Paper Award of the EAI International Conference on Communications and Networking in China in 2018. She is a member of EAI and IEEE. Her research interests include named entity recognition, relation extraction and entity linking. E-mail: yiww1@lenovo.com

Hui-Bin Ruan received the M. Sc. degree in computer technology from Soochow University, China in 2020. Currently, she is a researcher in natural language processing at Lenovo, China. Her research interests include discourse parsing, text classification and entity linking. E-mail: ruanhb2@lenovo.com

Yu Zhang received the B. Eng. degree in human factors, the B. Sc. (Minor) degree in applied mathematics, and the M. Sc. degree in engineering physics from Beihang University, China in 2008, 2008 and 2011, respectively. He is currently the technical assistant to the chief technology officer at Lenovo, China, and a Ph. D. candidate in computer science at Southeast University, China. His research interests include human-computer interaction and human-centered AI. E-mail: zhangyu29@lenovo.com

• Accepted Date: 2022-01-29
• Publish Online: 2022-03-10
• Publish Date: 2022-04-01
• Knowledge mining is an active research area across disciplines such as natural language processing (NLP), data mining (DM), and machine learning (ML). The overall objective of extracting knowledge from a data source is to create a structured representation that allows researchers to better understand such data and operate upon it to build applications. Each of these disciplines has produced an ample body of research, proposing different methods that can be applied to different data types. A significant number of surveys have summarized the research works in each discipline. However, no survey has presented a cross-disciplinary review in which traits from different fields are exposed in order to further stimulate research ideas and to build bridges among these fields. In this work, we present such a survey.

• 1 First-order logic (FOL) allows us to represent facts (objects and their relations) through predicates[4]. In this way, a predicate accounts for a relation type, and a predicate symbol refers to the name of a relation. The arity of a predicate indicates the number of arguments it can receive. Representing factual knowledge using FOL not only aligns with the traditional way of representing knowledge in artificial intelligence[4], but it also satisfies some useful characteristics of a representation for natural languages, such as verifiability, lack of ambiguity, inference, and expressiveness, among others[5].
2 Usually, knowledge bases contain several thousands of semantic relation types.
3 A token can be a word or a punctuation mark.
4 Probably, the earliest work for NER using an LSTM is that of [22].
5 Character embeddings can be pre-trained on a corpus and then fine-tuned on the target dataset via a convolutional neural network.
6 Negative instances can be generated by pairing entities that have no relation in the knowledge base, assigning them the label no-relation, and extracting features from sentences where these two entities co-occur.
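The negative-instance generation described above can be sketched in a few lines. The mini knowledge base and entity names below are invented for illustration, and feature extraction from sentences where the entities co-occur is omitted:

```python
import itertools

# Hypothetical mini knowledge base of (head, relation, tail) triples.
kb = {
    ("Paris", "capital_of", "France"),
    ("Berlin", "capital_of", "Germany"),
}

def generate_negatives(kb):
    """Pair entities that share no relation in the KB (in either direction)
    and label each such pair no-relation."""
    related = {(h, t) for h, _, t in kb} | {(t, h) for h, _, t in kb}
    entities = {e for h, _, t in kb for e in (h, t)}
    return [
        (h, "no-relation", t)
        for h, t in itertools.permutations(sorted(entities), 2)
        if (h, t) not in related
    ]

negatives = generate_negatives(kb)
```

In practice one would then collect sentences in which each negative pair co-occurs and extract features from them, exactly as for the positive instances.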
7 Recall, as shown in (4) in Section 2.5, cannot be computed since we do not know all the instances from which relation labels are to be recovered. To alleviate this problem, different recall levels are proposed for increasingly bigger samples, where the system's predictions are first ranked from highest to lowest confidence scores.
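This ranked evaluation can be sketched as follows; the prediction identifiers and the partial gold set are invented, and recall is estimated only against the known-correct instances, since the full set is unavailable:

```python
def recall_at_levels(predictions, gold, levels):
    """Rank predictions by confidence, then estimate recall within each
    top-k sample against the (incomplete) set of known-correct instances."""
    ranked = [inst for inst, _ in sorted(predictions, key=lambda p: -p[1])]
    recalls = {}
    for k in levels:
        hits = sum(1 for inst in ranked[:k] if inst in gold)
        recalls[k] = hits / len(gold)
    return recalls

# Hypothetical (instance id, confidence) pairs and a partial gold set.
recalls = recall_at_levels(
    [("r1", 0.9), ("r2", 0.8), ("r3", 0.3), ("r4", 0.1)],
    gold={"r1", "r3"},
    levels=[2, 4],
)
```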
8 Association rules extracted from transactional data are a type of Boolean association rules where an item either appears or does not appear in the rule. In turn, we can see these types of rules as FOL-like rules. For example, the association rule ${\rm{diapers}} \to$${\rm{beer}}$ can be written as ${\rm{buys}}\left( {{\rm{X}},{\rm{ diapers}}} \right) \to {\rm{buys}}\left( {{\rm{X}},{\rm{ beer}}} \right)$, as noted in [79].
9 As a note, a plausible explanation behind the association rule ${\rm{diapers}} \to {\rm{beer}}$ seems to be a change in the activities of people who used to frequent bars but can no longer do so because they now have parenting duties. Thus, these people now purchase the target product in supermarkets rather than in a bar[80].
10 The support count of an itemset can be seen as the unnormalized support of a rule. Formally, the support of a rule can be computed as the joint probability of the itemsets in the antecedent and consequent of the rule: $p\left( {A \cap B} \right)$.
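The support and confidence computations can be sketched as below; the toy transactions are invented, and the joint probability $p(A \cap B)$ is estimated as the relative frequency of transactions containing both itemsets:

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in the itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def rule_support(antecedent, consequent, transactions):
    # support(A -> B) is the joint probability of A and B appearing together.
    return support(antecedent | consequent, transactions)

def confidence(antecedent, consequent, transactions):
    # confidence(A -> B) = p(A and B) / p(A)
    return rule_support(antecedent, consequent, transactions) / support(antecedent, transactions)

# Hypothetical transactional data.
transactions = [
    {"diapers", "beer", "milk"},
    {"diapers", "beer"},
    {"diapers", "bread"},
    {"milk", "bread"},
]
```

Here the unnormalized support count of {diapers, beer} is 2, so the rule diapers → beer has support 2/4 and confidence (2/4)/(3/4) = 2/3.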
11 Sometimes also referred to as explainable artificial intelligence (xAI).
12 Or set-of-parameters-level.
13 As noted in [113, 114], a decision tree can be converted into IF-THEN logic rules, where each internal node of the tree on a path from top to bottom represents an antecedent (the IF part of the rule), and leaf nodes deciding on the class label of an instance represent consequents (the THEN part of the rule). Thus, a path in a decision tree has a logic rule counterpart of the form IF ${f_i} = valu{e_i}$ AND ${f_j} = valu{e_j}$ AND ${f_k} = valu{e_k}$ THEN $class\_label = \hat y$, where ${f_i}$, ${f_j}$ and ${f_k}$ represent features of the input space.
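The tree-to-rules conversion noted in [113, 114] can be sketched as follows. The toy tree below (house features and class labels) is invented; each internal node tests one feature, and every root-to-leaf path becomes one IF-THEN rule:

```python
# Hypothetical toy tree: an internal node is (feature, {value: subtree}),
# a leaf is a class-label string.
tree = ("rooms", {
    "few": ("age", {"old": "cheap", "new": "expensive"}),
    "many": "expensive",
})

def tree_to_rules(node, conditions=()):
    """Convert every root-to-leaf path into an IF-THEN rule string."""
    if isinstance(node, str):  # leaf: emit the accumulated antecedents
        antecedent = " AND ".join(f"{f} = {v}" for f, v in conditions)
        return [f"IF {antecedent} THEN class_label = {node}"]
    feature, branches = node
    rules = []
    for value, child in sorted(branches.items()):
        rules.extend(tree_to_rules(child, conditions + ((feature, value),)))
    return rules

rules = tree_to_rules(tree)
```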
14 We note that the scope of a proxy model can be either at the local or the global level. While a local explanation of an ML system targets a single prediction, a global explanation aims to account for the predictive behavior of the black-box system across a collection of instances; thus, a global proxy model can explain the system's behavior for any input instance. Choosing one or the other explanation type usually corresponds to the algorithmic complexity of the method to extract the proxy model (global explanations may be NP-hard to compute in some cases[115]).
15 It is assumed that each input neuron in a neural network receives an input value (a feature) that is human-understandable. For example, if the concept that the target NN learns is to classify houses as cheap or expensive, possible input features are the number of rooms in a house, the age of the house, or the size of the house in square meters.
16 This system makes a prediction by applying a sigmoid function to the dot product of the vector representations corresponding to an entity pair and a relation.
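A minimal sketch of such a scoring function, assuming the entity-pair and relation vectors have already been learned (the vectors below are placeholders, not the system's actual embeddings):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def predict(pair_vec, rel_vec):
    """Score an (entity pair, relation) candidate as the sigmoid of the
    dot product of their vector representations."""
    return sigmoid(sum(p * r for p, r in zip(pair_vec, rel_vec)))
```

A score above 0.5 (i.e., a positive dot product) would indicate that the relation is predicted to hold for the pair.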
17 We hold the same assumption as in Section 4.1, namely that the input features of a neural network are human interpretable.
18 As we saw in Section 4.1, a single instance is perturbed to generate the behavioral dataset; thus, the number of predictions obtained is proportional to the number of perturbations performed. However, some of the perturbations may yield out-of-domain instances that are not representative of the domain the black-box system was trained on, and these instances may elicit inconsistent behavior from the black-box system.
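The perturbation scheme above can be sketched for a text instance as follows; the token-dropping strategy and the trivial keyword "black box" are illustrative assumptions standing in for the real perturbation operator and system:

```python
import random

def perturb(tokens, n_samples, seed=0):
    """Create perturbed copies of a single instance by randomly dropping tokens."""
    rng = random.Random(seed)
    return [[t for t in tokens if rng.random() > 0.5] for _ in range(n_samples)]

def behavioral_dataset(tokens, black_box, n_samples=100):
    """Query the black box on every perturbation; note that heavily perturbed
    copies may fall outside the domain the black box was trained on."""
    return [(s, black_box(s)) for s in perturb(tokens, n_samples)]

# Hypothetical black box: a keyword rule standing in for the real classifier.
dataset = behavioral_dataset(
    ["the", "movie", "was", "great"],
    black_box=lambda s: "great" in s,
    n_samples=10,
)
```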
19 Domains such as medicine require applications that, as accurately as possible, extract knowledge such as gene and drug entities and their different types of relations from biomedical texts, or extract explanations from machine learning systems that predict relationships between entities.

Figures(3)  / Tables(1)
