
Simple and efficient classification scheme based on specific vocabulary

Author(s)
Savoy, Jacques  
Institut d'informatique  
Kummer, Olena  
Chaire de linguistique computationnelle  
Date issued
2012
In
Computational Management Science
Vol
9
No
3
From page
401
To page
415
Subjects
Statistics in lexical analysis; Corpus linguistics; Text categorization; Machine learning; Natural language processing (NLP)
Abstract
Assuming a binomial distribution for word occurrence, we propose computing a standardized Z score to define the specific vocabulary of a subset compared to that of the entire corpus. This approach is applied to weight the terms (character n-grams, words, stems, lemmas, or sequences of them) that characterize a document. We then show how these Z score values can be used to derive a simple and efficient categorization scheme. To evaluate this proposal and demonstrate its effectiveness, we conduct two experiments. In the first, the system must categorize speeches given by B. Obama as either electoral or presidential. In the second, sentences are extracted from these speeches and then categorized under the headings electoral or presidential. Based on these evaluations, the proposed classification scheme tends to perform better than a support vector machine model in both experiments. Compared with a Naïve Bayes classifier, it performs better on the first test and slightly worse on the second (10-fold cross-validation).
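The idea in the abstract can be illustrated with a minimal Python sketch. It rests on several assumptions that the abstract does not state: the occurrence probability of a term is estimated from its relative frequency in the whole corpus, the Z score is the usual standardization of a binomial count (observed minus expected, divided by the standard deviation), and a document is assigned to the category whose mean Z score over its tokens is highest. The function names `z_scores` and `classify`, the decision rule, and the toy token lists are all hypothetical and invented for illustration; they are not taken from the article.

```python
import math
from collections import Counter


def z_scores(subset_tokens, corpus_tokens):
    """Standardized Z score of each corpus term within a subset.

    Assumption: p(term) is estimated as the term's relative
    frequency in the whole corpus (binomial model).
    """
    corpus_counts = Counter(corpus_tokens)
    subset_counts = Counter(subset_tokens)
    n = len(corpus_tokens)   # corpus size in tokens
    n0 = len(subset_tokens)  # subset size in tokens
    scores = {}
    for term, tf in corpus_counts.items():
        p = tf / n
        expected = n0 * p                # binomial mean
        variance = n0 * p * (1.0 - p)    # binomial variance
        if variance == 0.0:
            continue  # degenerate case: empty subset or single-term corpus
        scores[term] = (subset_counts[term] - expected) / math.sqrt(variance)
    return scores


def classify(doc_tokens, scores_by_category):
    """Pick the category with the highest mean Z score (hypothetical rule)."""
    def mean_score(scores):
        total = sum(scores.get(t, 0.0) for t in doc_tokens)
        return total / max(len(doc_tokens), 1)
    return max(scores_by_category, key=lambda c: mean_score(scores_by_category[c]))


# Toy data, invented purely for illustration.
electoral = "change hope vote change vote hope".split()
presidential = "congress budget law congress budget law".split()
corpus = electoral + presidential

scores_by_category = {
    "electoral": z_scores(electoral, corpus),
    "presidential": z_scores(presidential, corpus),
}
label = classify(["vote", "hope"], scores_by_category)
```

In this sketch a positive Z score marks a term as over-used in the subset relative to the corpus and a negative score marks it as under-used, which is one way to read "specific vocabulary". The scoring rule actually used in the article may differ from the mean-score rule assumed here.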
Publication type
journal article
Identifiers
https://libra.unine.ch/handle/20.500.14713/65915
DOI
10.1007/s10287-012-0149-z
File(s)
Name
Savoy_Jacques-Simple_and_efficient_classification-20130104.pdf
Type
Main Article
Size
7.21 MB
Format
Adobe PDF
