Methodology for Mining Comprehensible Rules from Sequential Data
Project leader Kilian Stoffel
Collaborator Paul Cotofrei
Abstract This project addresses a current need: discovering knowledge from large data collections comprising multiple sequences that evolve over time. To that end, it proposes a methodology for temporal rule extraction. To obtain what we call temporal rules, a discretisation phase that extracts events from the raw data is applied first, followed by an inference phase in which classification trees are constructed from these events. Because an event, by definition, has both discrete and continuous characteristics, statistical tools and techniques from artificial intelligence can be applied to the same data.
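
The two phases can be illustrated with a minimal sketch. It is not the project's actual procedure: the event labels ("up", "down", "stable"), the threshold, the window size and the function names `discretise` and `lagged_table` are all assumptions made for illustration. The first function turns a raw numeric series into events; the second arranges the events into rows (previous events, current event) of the kind a classification tree learner could take as input.

```python
from typing import List, Tuple


def discretise(series: List[float], threshold: float) -> List[str]:
    """Map each step of a raw series to an event label.

    The three labels and the threshold rule are illustrative assumptions.
    """
    events = []
    for prev, curr in zip(series, series[1:]):
        delta = curr - prev
        if delta > threshold:
            events.append("up")
        elif delta < -threshold:
            events.append("down")
        else:
            events.append("stable")
    return events


def lagged_table(events: List[str], window: int) -> List[Tuple[Tuple[str, ...], str]]:
    """Build (previous `window` events, current event) rows.

    Each row pairs the predictor events with the event to be predicted,
    which is the tabular shape a tree learner expects.
    """
    rows = []
    for t in range(window, len(events)):
        rows.append((tuple(events[t - window:t]), events[t]))
    return rows


raw = [1.0, 1.5, 1.4, 0.6, 0.7, 1.6]
events = discretise(raw, threshold=0.3)
rows = lagged_table(events, window=2)
```

Each path from the root to a leaf of a tree trained on such rows would read as a candidate rule of the form "if these events occurred at the preceding time points, then this event occurs now".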

A theoretical framework for this methodology, based on first-order temporal logic, is also defined. This formalism allows the main notions (event, temporal rule, constraint) to be defined formally. The concept of a consistent linear time structure allows us to introduce the notions of general interpretation, support and confidence, the last two being the counterparts of the measures of the same name used in data mining. These notions make it possible to use statistical approaches in the design of algorithms for inferring higher-order temporal rules, called temporal meta-rules.
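
As a rough illustration of support and confidence for a temporal rule, the sketch below evaluates a rule of the form "if the body holds at time t - lag, then the head holds at time t" over a finite sequence of events. The definitions used here follow common data mining practice and are an assumption, not the project's formal definitions; the function name `support_and_confidence` and the example sequence are hypothetical.

```python
from typing import Callable, List, Tuple


def support_and_confidence(
    events: List[str],
    body: Callable[[str], bool],
    head: Callable[[str], bool],
    lag: int,
) -> Tuple[float, float]:
    """Estimate support and confidence of 'body at t - lag implies head at t'.

    Assumed definitions:
      support    = (points where body and head both hold) / (all usable points)
      confidence = (points where body and head both hold) / (points where body holds)
    """
    points = range(lag, len(events))
    body_hits = [t for t in points if body(events[t - lag])]
    rule_hits = [t for t in body_hits if head(events[t])]
    support = len(rule_hits) / len(points) if points else 0.0
    confidence = len(rule_hits) / len(body_hits) if body_hits else 0.0
    return support, confidence


sequence = ["up", "down", "up", "down", "up", "up"]
support, confidence = support_and_confidence(
    sequence,
    body=lambda e: e == "up",    # event required one step earlier
    head=lambda e: e == "down",  # event predicted at the current step
    lag=1,
)
```

For this sequence the body holds at three of the five usable time points and the head follows at two of them, so the support is 2/5 and the confidence is 2/3.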

The formalism is then extended to capture the concept of time granularity. To keep a unified view of the meaning of the same formula at different time scales, the usual definition of the interpretation of a predicate symbol is changed within the temporal granular logic: it now returns a degree of truth (a real value between zero and one) rather than a truth value (true or false).
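
One simple way to picture a degree of truth at a coarser time scale is sketched below. It assumes that a granule groups a fixed number of consecutive fine-grained time points and that the degree of truth of a predicate over a granule is the fraction of those points at which the predicate is true. Both assumptions, and the name `degree_of_truth`, are illustrative and are not taken from the project's definitions.

```python
from typing import List


def degree_of_truth(truth_values: List[bool], granule_size: int) -> List[float]:
    """Aggregate fine-grained truth values into one degree per granule.

    Each granule covers `granule_size` consecutive time points (the last
    granule may be shorter); its degree is the share of true points.
    """
    degrees = []
    for start in range(0, len(truth_values), granule_size):
        granule = truth_values[start:start + granule_size]
        degrees.append(sum(granule) / len(granule))
    return degrees


fine = [True, True, False, True, False, False, False, False]
coarse = degree_of_truth(fine, granule_size=4)
```

With a granule size of 1 the degrees are only 0.0 or 1.0, which matches the usual two-valued interpretation; larger granules give intermediate values such as 0.75 for the first granule here.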

Finally, a probabilistic model is attached to the initial formalism to define a stochastic first-order temporal logic. Using advanced theorems from stochastic limit theory, it was possible to prove that a certain form of dependence, called near-epoch dependence, is the highest degree of dependence that is still sufficient to induce the property of consistency.
Keywords temporal data mining, formalism of temporal rules
Web page http://www2.unine.ch/imi/page-18327.html
Project type Basic research
Research area computer science
Funding source FNS (Swiss National Science Foundation)
Status Completed
Project start 1-4-2001
Project end 30-9-2003
Allocated budget 103068
Contact Kilian Stoffel