Selecting optimal split-functions for large datasets

Author(s)
Stoffel, Kilian  
Rectorat  
Raileanu, Laura Elena
Editor(s)
Bramer, Max
Preece, Alun
Coenen, Frans
Publisher
Godalming: Springer-Verlag London Ltd
Date issued
2001
In
Research and Development in Intelligent Systems XVII
From page
62
To page
72
Series
Research and Development in Intelligent Systems XVII
Abstract
Decision tree induction has become one of the most popular methods for classification and prediction. The key step in inferring decision trees is finding the right criteria for splitting the training set into smaller and smaller subsets so that, ideally, all elements of a subset finally belong to one class. These split criteria can be defined in different ways (e.g. minimizing the impurity of a subset, or minimizing the entropy in a subset), and therefore they emphasize different properties of the inferred tree, such as size or classification accuracy. In this paper we analyze whether the split functions introduced in a statistical and machine learning context are also well suited for a KDD context. We selected two well-known split functions, namely the Gini Index (CART) and Information Gain (C4.5), introduced our own family of split functions, and tested them on 9,000 data sets of different sizes (from 200 to 20,000 tuples). The tests have shown that the two popular functions are very sensitive to variation in the training set size, and therefore the quality of the inferred trees is highly dependent on the training set size. At the same time, however, we were able to show that the simplest members of the introduced family of split functions behave in a very predictable way and, furthermore, that the created trees were superior to the trees inferred using the Gini Index or Information Gain, based on our evaluation criteria.
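For readers unfamiliar with the two standard criteria named in the abstract, the following is a minimal sketch of how Gini impurity and entropy-based information gain score a candidate split. It uses the textbook definitions and is not taken from the paper; the function names (`gini`, `entropy`, `split_gain`) and the toy labels are illustrative assumptions. The authors' own family of split functions is not specified in the abstract and is therefore not shown.

```python
from collections import Counter
from math import log2


def gini(labels):
    """Gini impurity of a list of class labels (textbook definition)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((count / n) * log2(count / n) for count in Counter(labels).values())


def split_gain(parent, children, impurity):
    """Impurity of the parent minus the size-weighted impurity of the children.

    With impurity=entropy this is information gain; with impurity=gini it is
    the Gini decrease. A larger value indicates a purer split.
    """
    n = len(parent)
    weighted = sum(len(child) / n * impurity(child) for child in children)
    return impurity(parent) - weighted


# Hypothetical toy data: a split that separates the two classes perfectly.
parent = ["a", "a", "a", "b", "b", "b"]
children = [["a", "a", "a"], ["b", "b", "b"]]

gini_decrease = split_gain(parent, children, gini)
info_gain = split_gain(parent, children, entropy)
print(gini_decrease, info_gain)
```

Both scores reach their maximum for this two-class example because each child subset contains a single class, which is the ideal outcome the abstract describes.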
Publication type
book part
Identifiers
https://libra.unine.ch/handle/20.500.14713/23696