Ontology engineering using formal concept analysis from unstructured textual data
Abstract Knowledge extraction, especially from unstructured data such as text, has always been considered one of the most sought-after capabilities, with applications in almost every industry. Designing and building solutions that can extract knowledge in an almost automated way is far from easy. For decades, researchers have proposed a variety of methodologies and algorithms describing how to give structure to textual data with the ultimate goal of knowledge extraction. A key element of these solutions is the use of an ontology, a graph-like structure for representing knowledge. Building ontologies, especially from textual data, is however not straightforward. To the best of our knowledge, there is not yet a comprehensive methodology describing how to form an ontology by processing textual data in a given domain of interest, so that it can later be used for explicit as well as implicit (or semantic) knowledge extraction.
In this thesis, we propose a pipeline describing how to start from the analysis of texts and end up with an ontology that is equipped with the most informative statements of the text corpus about a given context, so that it can be used for knowledge extraction. The proposed pipeline combines three different yet complementary data analysis methods: (i) natural language processing, (ii) formal concept analysis, and (iii) ontology learning. In a nutshell, the pipeline starts by mining the input text corpus (in a given domain of interest) using state-of-the-art natural language processing techniques. Formal concept analysis is then used to form the concepts and build the hierarchy among them (i.e., a concept lattice) as the cornerstone of the desired ontology. Finally, the most informative statements extracted from the text corpus are embedded into the ontology, which is derived from the concept lattice by a set of proposed algorithms.
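To make the formal concept analysis step more concrete, here is a minimal Python sketch; it is an illustration under stated assumptions, not the algorithms proposed in the thesis. It assumes the natural language processing step has already produced a formal context, i.e., a mapping from objects (terms) to the attributes they were observed with. The toy `CONTEXT`, all helper names, and the `Concept_<intent>` naming scheme are hypothetical. The code enumerates every formal concept (a closed set of objects paired with the attributes they all share) by brute force, derives the cover relation of the concept lattice, and emits each cover edge as an `rdfs:subClassOf` statement.

```python
from itertools import combinations

# Hypothetical toy formal context: objects (terms assumed to come from the
# NLP step) mapped to the attributes they were observed with.
CONTEXT = {
    "aspirin": {"treats_pain", "is_drug", "oral"},
    "ibuprofen": {"treats_pain", "is_drug", "oral"},
    "insulin": {"is_drug", "injectable"},
    "morphine": {"treats_pain", "is_drug", "injectable"},
}


def intent(objects, context):
    """Attributes shared by all given objects (all attributes if none given)."""
    shared = set().union(*context.values())
    for obj in objects:
        shared &= context[obj]
    return frozenset(shared)


def extent(attributes, context):
    """Objects that have every one of the given attributes."""
    return frozenset(o for o, attrs in context.items() if attributes <= attrs)


def formal_concepts(context):
    """All (extent, intent) pairs, found by closing every subset of objects.

    Brute force and exponential in the number of objects: toy use only.
    """
    objs = list(context)
    concepts = set()
    for r in range(len(objs) + 1):
        for subset in combinations(objs, r):
            b = intent(subset, context)
            concepts.add((extent(b, context), b))
    return concepts


def covers(concepts):
    """(sub, super) edges of the lattice's Hasse diagram, ordered by extent."""
    return [
        (c1, c2)
        for c1 in concepts
        for c2 in concepts
        if c1[0] < c2[0]
        and not any(c1[0] < c3[0] < c2[0] for c3 in concepts)
    ]


def concept_name(concept):
    """Hypothetical naming scheme: label a concept by its sorted intent."""
    _, b = concept
    return "Concept_" + ("_".join(sorted(b)) if b else "Top")


def subclass_axioms(context):
    """Turn each cover edge into an (s, rdfs:subClassOf, o) statement."""
    return sorted(
        (concept_name(sub), "rdfs:subClassOf", concept_name(sup))
        for sub, sup in covers(formal_concepts(context))
    )


if __name__ == "__main__":
    for axiom in subclass_axioms(CONTEXT):
        print(*axiom)
```

For this toy context the lattice has six concepts and seven cover edges, e.g. `Concept_is_drug_treats_pain rdfs:subClassOf Concept_is_drug`. Brute-force enumeration does not scale; for realistic corpora an algorithm such as Ganter's NextClosure would be the usual choice, and giving concepts meaningful names is a separate problem.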
To validate the accuracy of the proposed pipeline, we tested it on a few toy examples as well as on a real use case in the pharmaceutical domain. We demonstrated that an ontology engineered in this way can be used to query valuable knowledge and insights from unstructured textual data, and can serve as the core component of smart search engines with applications in semantic analysis. One advantage of the proposed solution is that it requires little human intervention, unlike many existing solutions whose performance depends heavily on the presence of a subject matter expert throughout the ontology engineering process. This does not, however, mean that the pipeline cannot benefit from such additional resources: human expertise can further strengthen it in shaping ontologies.
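Querying an engineered ontology can return implicit knowledge, not only the statements that were explicitly asserted. As a hedged illustration of that distinction, the Python sketch below takes a few hypothetical explicit statements (the class names, individuals, and helper functions are all invented for illustration) and answers an instance query by following subclass links transitively, so individuals never directly asserted as members of a class are still returned. This mirrors the subclass entailment an RDFS/OWL reasoner, or a SPARQL 1.1 property path such as `rdf:type/rdfs:subClassOf*`, would provide; it is not the query mechanism used in the thesis.

```python
from collections import defaultdict

# Hypothetical explicit statements, standing in for what an engineered
# ontology might contain after the pipeline has run.
SUBCLASS_OF = [
    ("Painkiller", "Drug"),
    ("Opioid", "Painkiller"),
    ("NSAID", "Painkiller"),
]
INSTANCE_OF = [
    ("morphine", "Opioid"),
    ("ibuprofen", "NSAID"),
    ("insulin", "Drug"),
]


def superclasses(cls, subclass_of):
    """All direct and indirect superclasses of cls (transitive closure)."""
    parents = defaultdict(set)
    for sub, sup in subclass_of:
        parents[sub].add(sup)
    seen, stack = set(), [cls]
    while stack:
        for parent in parents[stack.pop()]:
            if parent not in seen:  # also guards against cycles
                seen.add(parent)
                stack.append(parent)
    return seen


def instances_of(cls, instance_of, subclass_of):
    """Individuals asserted in cls or in any of its (indirect) subclasses."""
    return sorted(
        ind
        for ind, asserted in instance_of
        if asserted == cls or cls in superclasses(asserted, subclass_of)
    )


if __name__ == "__main__":
    # Explicit query: only direct assertions (none for Painkiller).
    print([i for i, c in INSTANCE_OF if c == "Painkiller"])
    # Implicit query: follows subClassOf links.
    print(instances_of("Painkiller", INSTANCE_OF, SUBCLASS_OF))
```

Here `instances_of("Painkiller", INSTANCE_OF, SUBCLASS_OF)` returns `['ibuprofen', 'morphine']` even though neither individual was asserted as a `Painkiller` directly.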
Keywords Ontology Learning, Knowledge Representation, Formal Concept Analysis, OWL, Unstructured Data
Citation Jabbari, S. (2019). Ontology engineering using formal concept analysis from unstructured textual data, Doctoral thesis, Neuchâtel, Neuchâtel.
Type Thesis (English)
Year 2019
Academic department Faculté des sciences économiques (Faculty of Economics), IMI
University Neuchâtel (Neuchâtel)
Degree Doctorate