Identification of a sufficient number of the best attributes in the intuitionistic fuzzy models: Dimension reduction of the models


Introduction
Dimensionality reduction is an important task that refers to reducing the number of input variables (attributes, features) in a dataset. A reduced but still sufficient number of input variables makes a model more transparent and computationally simpler. The problem is challenging from both theoretical and practical points of view. The existing methods have their pros and cons, but there is no single "best" method. There are two approaches to model reduction, namely feature (attribute) extraction and feature (attribute) selection. Feature extraction combines existing features (attributes) to derive new ones; its results can be difficult to interpret. The second approach, feature (attribute) selection, boils down to selecting the most relevant features. In this paper, we examine attribute selection for sets of data expressed by Atanassov's intuitionistic fuzzy sets (IFSs, for short).
Atanassov's intuitionistic fuzzy sets (Atanassov [2][3][4]) are a generalization of fuzzy sets (Zadeh [48]). The IFSs can be viewed as a tool that may help better model systems in the presence of a lack of knowledge. An advantage of the IFSs is the inherent possibility of taking a lack of knowledge into account through the so-called hesitation margin, or intuitionistic fuzzy index.
Certainly, the problem of too many variables occurs for IFS models just as for other types of models. The counterpart of the well-known Principal Component Analysis (PCA) (Jackson [9], Jolliffe [10], Mardia et al. [12]) for the IFSs (cf. Szmidt and Kacprzyk [37], Szmidt [15]) gives correct results but, again, it is computationally involved, and the final result is not transparent enough for some users.
Here we analyze a simple method of feature selection for data sets expressed by intuitionistic fuzzy sets (IFSs). We make use of the three term representation of IFSs, which enables us to construct a convincing method of feature selection that is simple, efficient, transparent, and computationally easy. Moreover, the approach considered here makes it possible to rank the attributes (not all methods enable this).
The discussed method is tested on well-known benchmark data from the UCI Machine Learning Repository (https://archive.ics.uci.edu/ml/datasets). We deal with classification tasks, trying to reduce the number of input attributes while still obtaining satisfactory results.
The results of our approach are compared to Principal Component Analysis (cf. Jackson [9], Jolliffe [10], Mardia et al. [12]) and to the method using the well-known Gain Ratio (Quinlan [13]). Additionally, we propose to reduce the number of calculations by using a graphical representation of the proposed method.

A brief introduction to IFSs
One of the possible generalizations of a fuzzy set in X (Zadeh [48]), given by

A′ = {⟨x, µ_{A′}(x)⟩ | x ∈ X}    (1)

where µ_{A′}(x) ∈ [0, 1] is the membership function of the fuzzy set A′, is an intuitionistic fuzzy set (Atanassov [2][3][4]) A, given by

A = {⟨x, µ_A(x), ν_A(x)⟩ | x ∈ X}    (2)

where

0 ≤ µ_A(x) + ν_A(x) ≤ 1    (3)

and µ_A(x), ν_A(x) ∈ [0, 1] denote a degree of membership and a degree of non-membership of x ∈ A, respectively. (See Szmidt and Baldwin [16] for assigning memberships and non-memberships for IFSs from data.) Obviously, each fuzzy set may be represented by the following IFS:

A = {⟨x, µ_{A′}(x), 1 − µ_{A′}(x)⟩ | x ∈ X}

An additional concept for each IFS in X, which is not only an obvious consequence of (2) and (3) but is also relevant for applications, is (Atanassov [3]) the hesitation margin of x ∈ A:

π_A(x) = 1 − µ_A(x) − ν_A(x)    (4)

which expresses a lack of knowledge of whether x belongs to A or not (cf. Atanassov [3]). It is obvious that 0 ≤ π_A(x) ≤ 1 for each x ∈ X.
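For readers who prefer code to set notation, the following is a minimal Python sketch of a single IFS element and its hesitation margin; the class name IFSElement and the validation logic are ours, purely for illustration:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class IFSElement:
    """One element x of an intuitionistic fuzzy set: <mu, nu> with mu + nu <= 1."""
    mu: float  # degree of membership, in [0, 1]
    nu: float  # degree of non-membership, in [0, 1]

    def __post_init__(self):
        if not (0.0 <= self.mu <= 1.0 and 0.0 <= self.nu <= 1.0):
            raise ValueError("mu and nu must lie in [0, 1]")
        if self.mu + self.nu > 1.0:
            raise ValueError("IFS condition (3) violated: mu + nu must not exceed 1")

    @property
    def pi(self) -> float:
        """Hesitation margin (4): pi = 1 - mu - nu, always in [0, 1]."""
        return 1.0 - self.mu - self.nu

# A fuzzy set is the special case pi = 0, i.e. nu = 1 - mu:
fuzzy_like = IFSElement(mu=0.7, nu=0.3)   # pi == 0.0
uncertain  = IFSElement(mu=0.5, nu=0.2)   # pi == 0.3, a genuine lack of knowledge
```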

Three term representation of the IFSs as a basis for attribute selection
In this paper we use the three term representation of the IFSs, i.e., we take into account the membership values µ, the non-membership values ν, and the hesitation margins π. The three term representation is very useful, especially from the practical point of view (cf. Szmidt [15], Szmidt and Kacprzyk [21, 22, 26, 27, 34-36, 38, 39]). We also used the algorithm of [16] to derive the IFS parameters of a model from relative frequency distributions (histograms), but in the further considerations it is assumed that the parameters are known.
Having in mind the interpretation of the three terms, we can indicate the most relevant attributes. As the values of each attribute A_k, k = 1, ..., K, differ from instance to instance, an attribute can be described by the average values of its memberships (5), non-memberships (6), and hesitation margins (7), obtained by the weight operator W (cf. [4]), i.e.:

µ_{A_k} = (1/n) Σ_{i=1}^{n} µ_{A_k}(x_i)    (5)

ν_{A_k} = (1/n) Σ_{i=1}^{n} ν_{A_k}(x_i)    (6)

π_{A_k} = (1/n) Σ_{i=1}^{n} π_{A_k}(x_i)    (7)

where n is the number of instances. The description of the attributes by (5)-(7) makes it possible to indicate the most discriminative attributes. An intuitionistic fuzzy attribute A_k is most discriminative if its average intuitionistic fuzzy index (7) is as small as possible, and the difference between the average membership value and the average non-membership value, |µ_{A_k} − ν_{A_k}|, is as big as possible. The simplest function which makes it possible to find out the most relevant attributes, i.e., the one fulfilling the above conditions for π_{A_k} and |µ_{A_k} − ν_{A_k}|, is denoted f(A_k) (8). Among its properties: if the value of π is fixed, f(A_k) behaves dually to a very simple sort of entropy measure. The shape of (8) and its contour plot are shown in Figure 1. Making use of the characteristic f(A_k) (8) of each attribute, we find "the best" attribute:

A_best = arg max_{1 ≤ k ≤ K} f(A_k)    (9)

where A_k is the k-th attribute, k = 1, ..., K.
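As a hedged illustration, the sketch below computes the per-attribute description (5)-(7) from the instance-level degrees and scores an attribute's discriminativeness. Since the closed form of f(A_k) (8) is specified above only through its required behaviour, we substitute the stand-in f = (1 − π)·|µ − ν|, which satisfies both stated conditions (it grows with |µ − ν| and shrinks with π); this stand-in is our assumption, not the paper's formula:

```python
import numpy as np

def attribute_description(mu: np.ndarray, nu: np.ndarray):
    """Average membership (5), non-membership (6), and hesitation margin (7)
    of one attribute over its n instances (equal weights, i.e. 1/n each)."""
    pi = 1.0 - mu - nu                      # hesitation margin per instance, cf. (4)
    return mu.mean(), nu.mean(), pi.mean()  # (5), (6), (7)

def discriminativeness(mu: np.ndarray, nu: np.ndarray) -> float:
    """Illustrative stand-in for f(A_k) (8): grows with |mu_avg - nu_avg|
    and shrinks as the average hesitation margin pi_avg grows.
    NOTE: an assumption -- the paper's exact formula is not reproduced here."""
    mu_avg, nu_avg, pi_avg = attribute_description(mu, nu)
    return (1.0 - pi_avg) * abs(mu_avg - nu_avg)
```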
We can rank all K attributes from the most to the least discriminative by repeating (9) K − 1 times, each time over the attributes not yet selected.
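Equivalently, repeating the arg-max (9) over the not-yet-selected attributes amounts to a descending sort of the scores, e.g.:

```python
def rank_attributes(scores: dict[str, float]) -> list[str]:
    """Order attributes from most to least discriminative by their f(A_k) values,
    i.e. the result of applying (9) repeatedly, K - 1 times."""
    return sorted(scores, key=scores.get, reverse=True)

# e.g. rank_attributes({"A3": 0.41, "A1": 0.12, "A4": 0.38}) -> ["A3", "A4", "A1"]
```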
The Diabetic Retinopathy dataset contains features extracted from the Messidor image set [1]. It has 20 attributes; the last (20th) attribute is the class label, stating whether an image contains signs of diabetic retinopathy or not. There are 1151 instances.
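A minimal loading sketch follows; the dataset is distributed by the UCI repository in the ARFF format, and the local file name messidor_features.arff is our assumption about the download:

```python
import pandas as pd
from scipy.io import arff

# Assumed local copy of the UCI "Diabetic Retinopathy Debrecen" ARFF file.
data, meta = arff.loadarff("messidor_features.arff")
df = pd.DataFrame(data)

X = df.iloc[:, :19].astype(float)        # the 19 input attributes
label = df.iloc[:, 19]                   # 20th attribute: the class label
if label.dtype == object:                # nominal ARFF columns load as bytes
    label = label.str.decode("utf-8")
y = label.astype(int)

assert X.shape == (1151, 19)             # 1151 instances, as stated above
```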
The order of the first 10 best attributes and the respective values of the measure (8) are given in Table 1. In Figure 2, all the attributes evaluated by (8) are presented in descending order, from the best to the worst. Besides the classification accuracy (the fraction of instances properly assigned to the classes considered), we have also paid attention to the area under the ROC curve [8]. The results are in Table 2.
The best accuracy with all the attributes (Table 2), equal to 74.62%, is obtained by the function Logistic. The accuracies of the other algorithms with the best results, namely the tree LMT, the Multilayer Perceptron, and the Random Forest, are 71.95%, 71.16%, and 68.66%, respectively. We wished to see how many attributes are redundant, i.e., for how many attributes, after selection, we would still have high accuracy. We started the calculation with only one, the best attribute, and in the following steps we added the next "best" attributes one by one, each time verifying the accuracy obtained. The procedure of adding attributes was continued until a satisfactory accuracy was obtained. For the "Diabetic Retinopathy" data set, taking into account only the 4 "best" attributes (Table 3), we obtained an accuracy of 74.07% for the function Logistic, 72.19% for the tree LMT, 72.02% for the Multilayer Perceptron, and 67.82% for the fourth algorithm (Random Forest).
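The add-one-attribute-at-a-time verification can be mirrored outside WEKA, e.g. with scikit-learn; in the sketch below, LogisticRegression stands in for WEKA's function Logistic, and the stopping threshold target_accuracy is an illustrative parameter of ours:

```python
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def forward_selection(X, y, ranked_attrs, target_accuracy):
    """Add attributes one by one, in the order given by f(A_k) (8)-(9),
    until the cross-validated accuracy reaches a satisfactory level."""
    clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    chosen = []
    for attr in ranked_attrs:
        chosen.append(attr)
        # scoring="roc_auc" would track the area under the ROC curve instead.
        acc = cross_val_score(clf, X[chosen], y, cv=10, scoring="accuracy").mean()
        print(f"{len(chosen):2d} attrs: accuracy = {acc:.4f}")
        if acc >= target_accuracy:
            break
    return chosen

# e.g. forward_selection(X, y, ranked, target_accuracy=0.74)
```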
In the same table (Table 3) we give the results obtained by PCA and by the Gain Ratio. For the first three algorithms, the PCA results are worse than the results obtained by the measure (8).
Summing up, the selection algorithm (8)-(9) meets our expectations for the Diabetic Retinopathy data set. The advantage of selecting the attributes by the measure (8), in comparison with selecting them by the Gain Ratio, is illustrated in Figures 2 and 3. First, we can see that the order of the attributes obtained by the measure (8) in Figure 2 and the order obtained by the Gain Ratio in Figure 3 are different. The best attributes by (8) are: 3, 4, 5, 6, whereas those by the Gain Ratio are quite different: 1, 16, 15, 14. We have calculated the cumulative percentage participation of the four attributes in both measures (cf. Table 4 and Table 5). It turns out that:

• the cumulative percentage participation of the four attributes (3, 4, 5, 6) in the measure (8) is equal to 60.9%,

• the cumulative percentage participation of the four attributes (1, 16, 15, 14) in the Gain Ratio is equal to 54.1%.
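The cumulative percentage participation is simply a normalized partial sum of the ranking scores; a short sketch:

```python
def cumulative_participation(scores: list[float], k: int) -> float:
    """Share [%] of the total score mass carried by the k best attributes,
    assuming `scores` is already sorted in descending order."""
    return 100.0 * sum(scores[:k]) / sum(scores)

# e.g. for the measure (8): cumulative_participation(sorted_f_values, k=4) -> ~60.9
```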
In other words, the four attributes selected by the measure (8) "cover" more of the area under the curve in Figure 2 (60.9%) than the four attributes selected by the Gain Ratio cover under the curve in Figure 3 (54.1%). This result explains why the accuracy obtained with the four attributes selected by the measure (8) is better than the accuracy obtained with the four attributes selected by the Gain Ratio (cf. Table 3).

Table 4.
"Diabetic Retinopathy" - the order of the attributes by f(A_k) (8), the values of the measure (8), and the cumulative values of (8)

The presented method gave promising results also for other data sets (cf. [41]). However, to determine a satisfactory number of selected attributes, we performed calculations using WEKA for many algorithms; for every algorithm we started with only one attribute, adding further attributes one by one, in the order pointed out by the proposed function f(A_k) (8). The approach makes it possible to indicate a satisfactory number of attributes. However, even with the promising attribute selection method (8)-(9), it is still profitable to simplify the calculations. We have observed that the number of selected attributes giving satisfactory results can be found using a figure showing the order of the attributes selected by f(A_k) (8). For the Diabetic Retinopathy data set we use Figure 2. The ordered values of f(A_k) (8) form a decreasing sequence in which regions of decrease are followed by regions of similar values. For example, for the attributes 3, 4, 5, 6 the function decreases, whereas for the attributes 6 and 19 it is almost the same; next, the function decreases for the attributes 19, 8, and 7, is very similar for the attributes 7 and 10, decreases again for the attributes 10 and 11, and is almost the same for the attributes 11, 12, 17, etc. The idea of simplifying the calculations is to verify the successive subsets of attributes over which the function f(A_k) (8) decreases, instead of verifying the accuracy while adding the attributes one by one. In Figure 2 we can see that the first such subset of attributes is {3, 4, 5, 6}; the next subsets are {19, 8, 7}, {10, 11}, {17, 15}, etc. As we have already verified, the first subset of attributes, {3, 4, 5, 6}, is enough to obtain a satisfactory classification accuracy (74.1% (Table 3) instead of 74.6% for all the attributes (Table 2)). This graphical method of finding a satisfactory number of attributes does not work in the case of the Gain Ratio (cf. Figure 3). We have observed similar dependencies when testing other data sets.
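The graphical shortcut described above - testing whole subsets of attributes delimited by plateaus of f(A_k) instead of adding attributes one by one - can be phrased as grouping consecutive ranked attributes until the score stops decreasing noticeably; in the sketch below, the plateau tolerance eps is our assumed tuning knob:

```python
def plateau_groups(ranked: list, scores: dict, eps: float = 0.01) -> list:
    """Split attributes (already ranked by f(A_k), descending) into consecutive
    groups: a group closes once the score difference to the next attribute
    falls below `eps`, i.e. at the start of a plateau in Figure 2."""
    groups, current = [], [ranked[0]]
    for prev, nxt in zip(ranked, ranked[1:]):
        if scores[prev] - scores[nxt] < eps:   # plateau: close the current group
            groups.append(current)
            current = []
        current.append(nxt)
    groups.append(current)
    return groups

# For Diabetic Retinopathy this should reproduce subsets like
# [3, 4, 5, 6], [19, 8, 7], [10, 11], ... (cf. Figure 2).
```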

Figure 3.
The values of the Gain Ratio for all the Diabetic Retinopathy attributes, ranked from the best to the worst

Table 1.
"Diabetic Retinopathy" - the first ten attributes selected by f(A_k) (8)

Figure 2.
The values of (8) for all the Diabetic Retinopathy attributes, ranked from the best to the worst

Next, using WEKA (http://www.cs.waikato.ac.nz/ml/weka/), we evaluated the accuracy of 12 different classifiers using all 19 attributes (without selection). A simple cross-validation scheme was applied: 10 experiments of 10-fold cross-validation. The best results of the classification were obtained for the algorithms: Logistic, tree LMT, Multilayer Perceptron, and Random Forest (cf. Table 2).

Table 2.
"Diabetic Retinopathy" -comparison of the classification accuracy by different classifiers with all 19 attributes

Table 3.
"Diabetic Retinopathy" -comparison of the classification accuracy with 4 attributes pointed out by: f (A k ) (9), PCA, and the Gain Ratio

Table 5.
"Diabetic Retinopathy" - the order of the attributes by the Gain Ratio (GR), the values of the Gain Ratio, and the cumulative values of the Gain Ratio [%]