An Approach for Optimal Feature Subset Selection using a New Term Weighting Scheme and Mutual Information
With the development of the web, large numbers of documents are available on the Internet, and their number grows rapidly day by day. Automatic text categorization is therefore becoming more and more important for dealing with massive data. The major problem of document categorization, however, is the high dimensionality of the feature space. Reducing the feature dimension without degrading recognition performance is known as the problem of optimal feature extraction or selection. Working with a reduced set of relevant features can be both more efficient and more effective. The objective of feature selection is to find a subset of features that retains the characteristics of the full feature set. Dependency among features is also important for classification, and in past years various metrics have been proposed to measure the dependency among different features. A popular approach to capturing dependency is maximal relevance feature selection: selecting the features with the highest relevance to the target class. The new feature weighting scheme we propose yields a substantial improvement in dimensionality reduction of the feature space. The experimental results show that this integrated method, which combines the new term weighting scheme with mutual information, performs better than the other methods compared.
Feature selection; Web page Classification; Feature subset selection; Mutual Information
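As an illustration of maximal relevance feature selection, the following minimal sketch ranks terms by their mutual information with the class label and keeps the top k. It is a plain mutual information ranking over binary term occurrence, not the term weighting scheme proposed in the paper, which the abstract does not specify. The function names, the toy documents, and the alphabetical tie-breaking rule are assumptions made for this example.

```python
import math
from collections import Counter


def mutual_information(feature_values, labels):
    """Return I(X; Y) in bits for two discrete sequences of equal length."""
    n = len(labels)
    joint = Counter(zip(feature_values, labels))
    px = Counter(feature_values)
    py = Counter(labels)
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        # p_xy / (p_x * p_y) simplifies to count * n / (px[x] * py[y])
        mi += p_xy * math.log2(count * n / (px[x] * py[y]))
    return mi


def select_top_k(docs, labels, k):
    """Rank terms by relevance to the class and keep the k best.

    docs   -- list of sets of terms (hypothetical input format)
    labels -- list of class labels, one per document
    """
    vocab = sorted({term for doc in docs for term in doc})
    scores = {
        term: mutual_information([term in doc for doc in docs], labels)
        for term in vocab
    }
    # Highest score first; ties are broken alphabetically (an arbitrary choice).
    ranked = sorted(vocab, key=lambda term: (-scores[term], term))
    return ranked[:k], scores


# Toy corpus, invented for illustration only.
docs = [
    {"goal", "match"},
    {"goal", "team"},
    {"stock", "market"},
    {"stock", "team"},
]
labels = ["sport", "sport", "finance", "finance"]

selected, scores = select_top_k(docs, labels, k=2)
print(selected)
```

In this toy corpus, "goal" and "stock" each occur in exactly one class and so receive the highest scores, while "team" occurs equally often in both classes and scores zero. Ranking each term independently in this way ignores dependency among the selected features themselves.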
Published by INSIGHT - Indonesian Society for Knowledge and Human Development