Review | Knowledge Mining: A Cross-disciplinary Survey

Knowledge mining is an active research area across disciplines such as natural language processing (NLP), data mining (DM), and machine learning (ML). The overall objective of extracting knowledge from a data source is to create a structured representation that allows researchers to better understand the data and operate on it to build applications. Many surveys have summarized the research within each discipline. In contrast, a Lenovo Research team led by Lenovo CTO & SVP Dr. Yong Rui presents a cross-disciplinary survey that exposes traits from the three fields of natural language processing, data mining, and machine learning in order to stimulate research ideas and build bridges among these fields. The work was published in the second issue of Machine Intelligence Research in 2022.


 


 

 

Automatic extraction of knowledge from diverse sources of data is a challenging task across different fields. For example, in natural language processing (NLP), research on the extraction of structured knowledge bases from natural language text has received much attention due to its applications (e.g., automatically building these knowledge structures from biomedical text to understand drug-drug interactions).

 

In data mining (DM), a large body of research has focused on mining rules from structured databases that can help people discover novel associations between items or features and make decisions in diverse contexts such as business or education.
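As a rough illustration of what mining a rule from a structured database can look like, the sketch below scores one candidate association rule by its support and confidence, two measures commonly used in association rule mining. The transactions, item names, and function names are hypothetical and are not taken from the survey; this is a minimal sketch under those assumptions, not the method of any particular work.

```python
# Toy transaction database (hypothetical data, for illustration only).
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent) from the transactions."""
    base = support(antecedent, transactions)
    if base == 0:
        return 0.0  # the rule never fires, so no confidence can be estimated
    joint = support(set(antecedent) | set(consequent), transactions)
    return joint / base


# Score the candidate rule {bread} -> {milk}.
rule_support = support({"bread", "milk"}, transactions)
rule_confidence = confidence({"bread"}, {"milk"}, transactions)
print(f"support={rule_support:.2f}, confidence={rule_confidence:.2f}")
```

A rule miner would typically enumerate many candidate rules and keep those whose support and confidence exceed chosen thresholds; only the scoring step is shown here.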

 

Furthermore, in the field of machine learning (ML), considerable effort has been devoted to extracting knowledge, mainly in the form of logic rules, from both a machine learning system's predictions and its parameters in order to build an interpretable representation that helps to explain the system's decisions (the so-called interpretability problem), a capability highly sought after in medicine, for example.

 

Extracting or mining knowledge from data (be it unstructured, structured, or behavioral data) is an open problem that has been tackled across different research fields. This wide scenario has not only led to different definitions and ways to represent the construct of knowledge (and consequently, to define the task of knowledge mining), but it has also resulted in diverse research perspectives, which seem to use different methodologies to extract knowledge and different metrics to evaluate the consistency of the knowledge extracted.

 

In the NLP field, for instance, a knowledge base is usually represented as a tensor structure in which each entry corresponds to a probabilistic assignment of the belief in a fact.
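To make the tensor view concrete, here is a minimal sketch that assumes a three-way tensor indexed by (relation, subject, object), where each entry holds the belief that the corresponding fact is true. The entities, relations, probabilities, and helper names are all hypothetical and chosen only for illustration; the survey does not prescribe this layout.

```python
# Hypothetical toy vocabulary; none of these names come from the survey.
entities = ["aspirin", "warfarin", "ibuprofen"]
relations = ["interacts_with", "treats_same_condition"]

E = {name: i for i, name in enumerate(entities)}
R = {name: i for i, name in enumerate(relations)}

# tensor[r][s][o] is the belief that the fact (s, r, o) holds.
# 0.0 means no evidence has been recorded for that fact.
tensor = [[[0.0 for _ in entities] for _ in entities] for _ in relations]


def set_belief(subject, relation, obj, p):
    """Store probability `p` as the belief in the fact (subject, relation, obj)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("belief must be a probability in [0, 1]")
    tensor[R[relation]][E[subject]][E[obj]] = p


def belief(subject, relation, obj):
    """Look up the belief in the fact (subject, relation, obj)."""
    return tensor[R[relation]][E[subject]][E[obj]]


def facts_above(threshold):
    """List (subject, relation, object, belief) for entries with belief >= threshold."""
    return [
        (s, r, o, tensor[R[r]][E[s]][E[o]])
        for r in relations
        for s in entities
        for o in entities
        if tensor[R[r]][E[s]][E[o]] >= threshold
    ]


# Made-up beliefs, as an extraction system might assign them.
set_belief("aspirin", "interacts_with", "warfarin", 0.92)
set_belief("ibuprofen", "interacts_with", "warfarin", 0.81)
set_belief("aspirin", "treats_same_condition", "ibuprofen", 0.40)

for fact in facts_above(0.5):
    print(fact)
```

Thresholding the beliefs, as `facts_above` does, is one simple way to turn the probabilistic tensor into a set of accepted facts; real systems may use more elaborate inference.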

 

Finally, in the field of machine learning, knowledge mining has been motivated by the need to understand and validate ML systems which, due to their complexity, are not easy to inspect manually. Here, the choice of knowledge representation has been constrained to be understandable by humans; a common and widely accepted representation in this area is logic rules.

 

From this brief overview of knowledge mining across fields, we can observe the diversity of objectives and constructs and the wide scenario we claimed at the beginning, which leads us to the questions: How is knowledge mining characterized across research fields? What are their proposed approaches and shared traits? And how can we consolidate them?

 

We note that while there are already several in-depth surveys in the literature of each field showing the methods and algorithms to extract knowledge, to the best of our knowledge, there is no survey that jointly traverses these research areas to answer the above questions.

 

Furthermore, the importance of mining knowledge has permeated different fields and has also impacted industry. Therefore, we believe that a cross-disciplinary literature review, taking a landscape-oriented approach that encompasses all these varying degrees of freedom underlying the problem of mining knowledge from data, is called for.

 

In this paper, rather than surveying a plethora of methods and previous works across these three research areas, we intend to give an overview of the nuances, and the attendant idiosyncrasies, of the approaches taken to extract knowledge from a target data source.

 

Hence, this paper advocates for an additive overview of the problem of extracting knowledge across the fields of natural language processing, data mining and machine learning to show their key objectives, methods, and evaluations, and how some previous works have made links among these areas for the task of knowledge mining.

 

The final aim of this paper is to stimulate and provoke new ideas and research agendas among researchers from the different disciplines so that new bridges among the areas surveyed can emerge to further advance the task of knowledge mining. Following this approach, we avoid providing a single definition of knowledge and knowledge mining, and instead present how these constructs have been embraced across fields. To do so, we begin from a starting point common to all fields: we fix the choice of knowledge representation to that of logic, or logic-like formulas, a representation widely used across these fields.

 

Based on this knowledge representation, in Sections 2–4, we will walk through the different goals and key approaches of each field, from a problem-oriented perspective, to gain a refined insight into how knowledge mining is embodied and what traits we find in these research areas.

 

Finally, in Section 5, we will summarize these approaches and their shared traits, and pinpoint some examples of previous efforts in forging a bridge among the three fields while closing with a proposed future research direction. We believe this paper will contribute to creating future research directions for the task of knowledge mining that encompass the three, so far unlinked, research areas of NLP, DM, and ML.

 

 

Download full text

Knowledge Mining: A Cross-disciplinary Survey

Yong Rui, Vicente Ivan Sanchez Carmona, Mohsen Pourvali, Yun Xing, Wei-Wen Yi, Hui-Bin Ruan, Yu Zhang

https://link.springer.com/article/10.1007/s11633-022-1323-6

https://www.mi-research.net/article/doi/10.1007/s11633-022-1323-6


Release Date: 2022-04-07