Publication
Metadata only

Simple and efficient classification scheme based on specific vocabulary

2012, Savoy, Jacques, Zubaryeva, Olena

Publication
Open access

Simple and efficient classification scheme based on specific vocabulary

Savoy, Jacques, Zubaryeva, Olena

Assuming a binomial distribution for word occurrence, we propose computing a standardized Z score to define the specific vocabulary of a subset compared to that of the entire corpus. This approach is applied to weight the terms (character n-grams, words, stems, lemmas or sequences of them) that characterize a document. We then show how these Z score values can be used to derive a simple and efficient categorization scheme. To evaluate this proposition and demonstrate its effectiveness, we conduct two experiments. In the first, the system must categorize speeches given by B. Obama as either electoral or presidential. In the second, sentences are extracted from these speeches and then categorized under the headings electoral or presidential. Based on these evaluations, the proposed classification scheme tends to perform better than a support vector machine model in both experiments. Compared with a Naïve Bayes classifier, it performs better on the first test and slightly worse on the second (10-fold cross-validation).
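The Z score described in this abstract can be illustrated with a minimal sketch. It assumes the usual standardization of a binomial count: a term with corpus-wide probability p is expected n0 * p times in a subset of n0 tokens, with variance n0 * p * (1 - p). The abstract does not give the exact formula or any thresholds, so the function name `z_scores`, the token-list inputs and the toy data below are illustrative assumptions, not the authors' implementation.

```python
import math
from collections import Counter


def z_scores(subset_tokens, corpus_tokens):
    """Standardized score of each term in a subset relative to the whole corpus.

    Assumption: term occurrence follows a binomial distribution, so a term with
    corpus-wide probability p is expected n_subset * p times in the subset.
    The corpus is taken to include the subset.
    """
    n_corpus = len(corpus_tokens)
    n_subset = len(subset_tokens)
    corpus_tf = Counter(corpus_tokens)
    subset_tf = Counter(subset_tokens)

    scores = {}
    for term, tf_corpus in corpus_tf.items():
        p = tf_corpus / n_corpus              # estimated occurrence probability
        expected = n_subset * p               # binomial mean
        variance = n_subset * p * (1.0 - p)   # binomial variance
        if variance == 0.0:
            continue                          # degenerate case: score undefined
        scores[term] = (subset_tf[term] - expected) / math.sqrt(variance)
    return scores


# Toy data (hypothetical): the subset is part of the corpus.
subset = ["vote", "vote", "hope", "the"]
rest = ["the", "budget", "the", "law"]
scores = z_scores(subset, subset + rest)
for term, z in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{term:8s} {z:+.3f}")
```

Terms with a large positive score are over-represented in the subset and would belong to its specific vocabulary; terms with a large negative score are under-represented. Where to place the cut-off is a design choice that this sketch leaves open.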

Publication
Open access

Investigation in statistical language-independent approaches for opinion detection in English, Chinese and Japanese

2009, Zubaryeva, Olena, Savoy, Jacques

In this paper we present a new statistical approach to opinion detection and its evaluation on English, Chinese and Japanese corpora. The proposed method is compared with three baselines: a Naïve Bayes classifier, a language model, and an approach based on significant collocations. These models are language independent; using the English corpus as an example, we show how they can be improved with language-dependent techniques. Our method almost always performs better than the baselines considered.

Publication
Open access

Classification automatique d’opinions dans la blogosphère (Automatic classification of opinions in the blogosphere)

Savoy, Jacques, Zubaryeva, Olena

This paper addresses the automatic classification of opinions in the blogosphere. After retrieving relevant sentences, the system must categorize them as opinionated or factual. To achieve this objective, different representations and automatic categorization models can be used. As a baseline, we use the Naïve Bayes approach to classify the retrieved sentences as opinionated or not. As a second model, we use an SVM (based on a tf idf representation), which increases overall performance. The system we propose detects the use of a vocabulary specific to each category by computing a normalized score (Z score) for each term according to its presence or absence in opinionated sentences. Based on these Z scores, we can determine whether a given sentence belongs to the opinionated or the non-opinionated category. The proposed system has been implemented and evaluated using the NTCIR English test collection. The evaluation shows that the suggested classification method performs significantly better than the other approaches. Using a specialized thesaurus, we can further improve the overall categorization performance.
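One way to picture how per-term Z scores could decide between the two categories is sketched below. The abstract does not state the decision rule, so summing the scores of a sentence's terms per category and picking the larger total is an assumption made here for illustration. The function name `classify` and the score tables are hypothetical toy values, not results from the paper.

```python
def classify(tokens, z_by_category):
    """Assign a sentence to the category whose term scores sum highest.

    Assumption: the decision rule is a plain sum of per-term Z scores per
    category. Returns None when no token appears in any score table.
    """
    known = [
        tok for tok in tokens
        if any(tok in table for table in z_by_category.values())
    ]
    if not known:
        return None  # no evidence either way

    totals = {
        category: sum(table.get(tok, 0.0) for tok in known)
        for category, table in z_by_category.items()
    }
    return max(totals, key=totals.get)


# Hypothetical Z scores, for illustration only.
z_tables = {
    "opinionated": {"terrible": 3.1, "think": 2.4, "released": -1.2},
    "factual": {"terrible": -2.8, "think": -1.9, "released": 1.5},
}

print(classify("i think it is terrible".split(), z_tables))
print(classify("released today".split(), z_tables))
```

A sum treats every term as independent evidence. Other aggregations, such as counting only the terms whose score exceeds a threshold, would fit the same description.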