Selection of the attributes in intuitionistic fuzzy models

Abstract: We present a novel method of attribute selection for databases which are expressed via intuitionistic fuzzy sets (IFSs, for short). We use the three-term representation of the IFSs, which makes it possible to construct a transparent and well-justified function for selecting attributes for widely understood decision making.


Introduction
The problem of model dimensionality reduction has been investigated for a long time and, in spite of new approaches being constantly proposed (each with its pros and cons), the process continues as there is no overall "best method". There are two possible approaches to model dimensionality reduction. The first is the so-called feature (attribute) extraction, in which the dimensionality is reduced by using combinations of features (attributes), which may result in difficulties with model interpretation. The second is the so-called feature (attribute) selection, in which only the most relevant features are selected and used. Here we consider the latter: attribute selection for data sets which are expressed by intuitionistic fuzzy sets (IFSs, for short).
The intuitionistic fuzzy sets [1][2][3] are a very convenient tool for modeling systems in the presence of a lack of knowledge, which is crucial for decision making and at the same time difficult to foresee. The IFSs, being an extension of Zadeh's fuzzy sets [28], make it possible to take a lack of knowledge into account by making use of the so-called hesitation margin, or intuitionistic fuzzy index.
However, the IFS models can again be described by too many variables to efficiently perform simulations. So, here again, we face the well-known problem of reducing the dimensionality of data. The well-known Principal Component Analysis (PCA) for the IFSs [14,23] gives correct results but, again, it is quite complicated from the computational point of view, and the final result is not transparent enough for some users.
Here we propose a novel and simple method of feature selection for data sets which are expressed by intuitionistic fuzzy sets (IFSs). The three-term representation of the IFSs makes possible a simple and efficient feature selection process. The method is transparent and computationally easy. Moreover, the proposed approach makes it possible to rank the attributes (not all methods can do this).
We test the proposed method using an example well known from the literature. The results are compared with those obtained by other methods of dimensionality reduction.

A brief introduction to the IFSs
One of the possible generalizations of a fuzzy set in X [28], given by

$$ A' = \{ \langle x, \mu_{A'}(x) \rangle \mid x \in X \} \qquad (1) $$

where µ_{A'}(x) ∈ [0, 1] is the membership function of the fuzzy set A', is an IFS [1][2][3] A given by

$$ A = \{ \langle x, \mu_A(x), \nu_A(x) \rangle \mid x \in X \} \qquad (2) $$

where µ_A : X → [0, 1] and ν_A : X → [0, 1] are such that

$$ 0 \le \mu_A(x) + \nu_A(x) \le 1 \qquad (3) $$

and µ_A(x), ν_A(x) ∈ [0, 1] denote a degree of membership and a degree of non-membership of x ∈ A, respectively. (See Szmidt and Baldwin [15] for deriving memberships and non-memberships for A-IFSs from data.) An additional concept for each IFS in X, which is not only an obvious consequence of (2) and (3) but is also relevant for applications, is [2]

$$ \pi_A(x) = 1 - \mu_A(x) - \nu_A(x) \qquad (4) $$

called a hesitation margin of x ∈ A, which expresses a lack of knowledge of whether x belongs to A or not (cf. [2]). It is obvious that 0 ≤ π_A(x) ≤ 1 for each x ∈ X.
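As a concrete illustration, the three-term representation and the constraints (3)-(4) can be encoded as follows. This is a minimal sketch of ours, not taken from the literature; the class name IFSElement is hypothetical.

```python
# A minimal sketch: one element of an IFS represented by its three terms,
# with the constraint (3) validated on construction.
from dataclasses import dataclass

@dataclass
class IFSElement:
    mu: float   # degree of membership, mu_A(x)
    nu: float   # degree of non-membership, nu_A(x)

    def __post_init__(self):
        if not (0.0 <= self.mu <= 1.0 and 0.0 <= self.nu <= 1.0):
            raise ValueError("mu and nu must lie in [0, 1]")
        if self.mu + self.nu > 1.0:
            raise ValueError("constraint (3) violated: mu + nu must not exceed 1")

    @property
    def pi(self) -> float:
        # hesitation margin (4): pi = 1 - mu - nu
        return 1.0 - self.mu - self.nu

x = IFSElement(mu=0.6, nu=0.3)
print(x.pi)  # 0.1 (up to floating-point rounding)
```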
Because of space limitations, we present below only the necessary material and direct the reader to the respective literature.

Three-term representation of the IFSs as a foundation for selecting attributes
In [15] we presented an algorithm for deriving the IFS parameters of a model from relative frequency distributions (histograms). To justify the (automatic) method, we showed some similarities/parallels between intuitionistic fuzzy set theory and mass assignment theory, a well-known tool for dealing with both probabilistic and fuzzy uncertainties (the proof is in Baldwin et al. [5]). The next step of our approach was to recall a semantics for membership functions, an interpretation having its roots in possibility theory. Finally, in [15] we proposed an automatic algorithm assigning all three terms (memberships, non-memberships and hesitation margins) describing the intuitionistic fuzzy sets.
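The details of the algorithm of [15] are beyond the scope of this paper, but the general idea can be conveyed by the following deliberately simplified sketch: frequency mass clearly supporting membership feeds µ, mass clearly contradicting it feeds ν, and the remaining ambiguous mass becomes π. The splitting rule and all names below are our illustrative assumptions, not the mass-assignment-based procedure of [15].

```python
# Illustrative sketch only: a simplified frequency-based assignment of the
# three IFS terms from the counts in one histogram bin. The real algorithm
# of [15] is grounded in mass assignment theory; here we merely split the
# relative frequencies into supporting, contradicting and ambiguous mass.
def three_terms_from_counts(n_support: int, n_against: int, n_ambiguous: int):
    n = n_support + n_against + n_ambiguous
    if n == 0:
        raise ValueError("empty histogram bin")
    mu = n_support / n     # mass clearly supporting membership
    nu = n_against / n     # mass clearly contradicting membership
    pi = n_ambiguous / n   # remaining mass -> hesitation margin
    return mu, nu, pi      # mu + nu + pi == 1 by construction

print(three_terms_from_counts(6, 3, 1))  # (0.6, 0.3, 0.1)
```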
In the intuitionistic fuzzy model considered in this paper the attributes are described by the above-mentioned three terms. Having in mind the interpretation of the three terms, we can try to point out the most relevant attributes. As the values of each attribute A_k, k = 1, ..., K, differ from instance to instance, an attribute can be described by the average values of its memberships (5), non-memberships (6), and hesitation margins (7), i.e.:

$$ \bar{\mu}_{A_k} = \frac{1}{n} \sum_{i=1}^{n} \mu_{A_k}(x_i) \qquad (5) $$

$$ \bar{\nu}_{A_k} = \frac{1}{n} \sum_{i=1}^{n} \nu_{A_k}(x_i) \qquad (6) $$

$$ \bar{\pi}_{A_k} = \frac{1}{n} \sum_{i=1}^{n} \pi_{A_k}(x_i) \qquad (7) $$

where n is the number of instances. The most relevant attributes should be the most discriminative. For a specific intuitionistic fuzzy attribute A_k this means that its average intuitionistic fuzzy index (7) should be as small as possible, and the difference between the average membership value and the average non-membership value, |µ_{A_k} − ν_{A_k}|, should be as big as possible. The simplest function which fulfills these conditions for A_k is

$$ f(A_k) = \frac{1}{2} \left( 1 + |\bar{\mu}_{A_k} - \bar{\nu}_{A_k}| - \bar{\pi}_{A_k} \right) \qquad (8) $$

Having in mind that f(A_k) is in fact a function of (µ_k, ν_k, π_k), among the properties of (8) is that, for a fixed value of π, f(A_k) behaves dually to a very simple sort of entropy measure, 1 − |µ_k − ν_k|, i.e., it grows as |µ_k − ν_k| grows.
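To make (5)-(8) concrete, here is a minimal Python sketch; the function name f_score and the input layout (a list of (µ, ν) pairs, one per instance) are our assumptions for illustration.

```python
# A sketch of equations (5)-(8): per-attribute averages of the three terms
# and the discrimination score f. An attribute is assumed to be given as a
# list of (mu, nu) pairs, one per instance, with pi = 1 - mu - nu.
def f_score(attribute):
    n = len(attribute)
    mu_avg = sum(mu for mu, _ in attribute) / n              # eq. (5)
    nu_avg = sum(nu for _, nu in attribute) / n              # eq. (6)
    pi_avg = sum(1.0 - mu - nu for mu, nu in attribute) / n  # eq. (7)
    # eq. (8): a large |mu - nu| and a small pi both raise the score
    return 0.5 * (1.0 + abs(mu_avg - nu_avg) - pi_avg)

# e.g. averages (0.6, 0.2, 0.2) give f = 0.5 * (1 + 0.4 - 0.2) = 0.6
print(f_score([(0.6, 0.2)]))  # 0.6
```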
In Figure 1 we can see the shape of (8) and its contour plot. It is worth noticing that the components in (8) are independent, i.e., |µ_k − ν_k| is independent of π, since the shape of |µ_k − ν_k| is always the same regardless of π.
From (8) we find "the best" attribute as

$$ A^{*} = \arg\max_{k=1,\dots,K} f(A_k) \qquad (9) $$

where A_k is the k-th attribute, k = 1, ..., K. Repeating (9) K − 1 times, each time over the attributes not yet selected, we can order all K attributes from the most to the least discriminative.
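A corresponding sketch of the ranking step (9), reusing the f_score function from the previous sketch; a single sort by the score is equivalent to taking the argmax K − 1 times over the remaining attributes. The data values below are hypothetical toy numbers, not taken from the paper.

```python
# Sketch of (9): order the K attributes from the most to the least
# discriminative. One sort by f is equivalent to repeated argmax selection.
def rank_attributes(attributes):
    # attributes: dict mapping an attribute name to its list of (mu, nu) pairs
    return sorted(attributes, key=lambda k: f_score(attributes[k]), reverse=True)

data = {  # hypothetical toy values, not the data of Table 3
    "A1": [(0.7, 0.1), (0.6, 0.2)],
    "A2": [(0.4, 0.3), (0.5, 0.3)],
}
print(rank_attributes(data))  # ['A1', 'A2']
```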

Results
As an illustration of how the proposed method works, we recall a well-known problem formulated by Quinlan [13], expressed here in terms of the IFSs. Quinlan's example, the so-called "Saturday Morning" example, considers classification with nominal data. The example is small and illustrative, yet it is a challenge to many classification and machine learning methods. The main idea of Quinlan's solution was to select the best attributes (variables) to split the training set (Quinlan used the so-called Information Gain, a measure dual to Shannon's entropy). In Quinlan's example [13] (Table 1) we have objects described by attributes. Each attribute represents a feature and takes on discrete, mutually exclusive values. For example, if the objects were "Saturday Mornings" and the classification involved the weather, possible attributes might be [13] outlook, temperature, humidity, and windy. The limitation of space does not let us discuss in detail the method of deriving the IFS counterpart of Quinlan's example (Table 2) (cf. Szmidt and Kacprzyk [22]); we only present the final results here.

Table 3. Characteristics of the "Saturday Morning" attributes

In the last column of Table 3 there are the values obtained from (8) for each attribute. Making use of (9) we can order the attributes (the bigger the value from (8), the better), namely: Outlook, Humidity, Windy, Temperature. We can notice that the values of (8) are not very big, which is the result of both the substantial values of the hesitation margins (second column of Table 3) and the not very big differences between µ and ν.
It is worth noticing that Quinlan [13] obtained 100% classification accuracy, and the optimal solution (the minimal possible ID3 tree), using the first three attributes pointed out by our method (Outlook, Humidity, Windy).
The selection of the attributes obtained via Hellwig's method [8] adapted to IFS data (cf. Szmidt and Kacprzyk [24]), and verified on Quinlan's example [24], again gave the same result.
Finally, we compared these results with those obtained for the same example by Principal Component Analysis (PCA), one of the best known and most widely used linear dimensionality reduction techniques [9][10][11] in the sense of the mean-square error.
After performing PCA adapted to the data expressed via the IFSs (cf. Szmidt and Kacprzyk [23]), we noticed that the first three eigenvalues explain most of the variability of the data (85%) and summarize its most important features. This is again a sort of confirmation of the result obtained with the novel method (9) presented here (although PCA, as a reduction method, points out combinations of attributes, not the initial attributes).
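For reference, the explained-variance check can be sketched as follows with standard PCA on an ordinary numeric data matrix. This is not the IFS-adapted PCA of [23], merely an illustration of how a figure like "the first three eigenvalues explain 85% of the variability" is typically computed.

```python
# Sketch only: fraction of variability explained by each principal
# component, via the eigenvalues of the covariance matrix of a numeric
# data matrix X (n instances x K features).
import numpy as np

def explained_variance_ratio(X: np.ndarray) -> np.ndarray:
    Xc = X - X.mean(axis=0)                  # center the data
    cov = np.cov(Xc, rowvar=False)           # K x K covariance matrix
    eigvals = np.linalg.eigvalsh(cov)[::-1]  # eigenvalues, descending
    return eigvals / eigvals.sum()

# e.g. explained_variance_ratio(X)[:3].sum() >= 0.85 would mirror the
# observation that the first three eigenvalues explain 85% of the variability.
```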
Clearly, our example serves only as an illustration, as feature reduction makes sense mainly for large problems (with very many features), where a reduction in the number of attributes is usually considerable and very welcome.

Conclusions
We have proposed a novel method of feature selection for databases for which the IFS model is justified. We have used the three-term model of the IFSs, which made it possible to formulate a very natural and understandable function as the foundation of the selection process. The method is transparent, computationally simple, intuitively appealing, and gives promising results. We plan to verify the proposed method on bigger databases.