Interval-valued intuitionistic fuzzy sets as tools for evaluation of data mining processes

Intuitionistic Fuzzy Sets (IFSs), proposed in 1983, are extensions of fuzzy sets. Some years after their introduction, interval-valued IFSs (IVIFSs) were introduced. During the last 30 years, their properties have been studied and these sets have been used as tools for the evaluation of different objects and processes in the area of Artificial Intelligence. A short review of these lines of research is offered, with some concrete ideas for possible new directions of study. On this basis, an informal discussion is raised on the benefits of applying various elements of IVIFSs as tools for the evaluation of Data Mining processes.


Introduction
This paper is a continuation of the author's paper [8]. Here, we discuss the origin, current state of research and applications in the area of Data Mining (DM) of one extension of Intuitionistic Fuzzy Sets (IFSs) and Logics (IFLs), called Interval-Valued IFSs (IVIFSs) and Logics (IVIFLs).
The first research related to IVIFSs started in 1988-1989 [3,13]. Their basic definitions, together with the definitions of the operations, relations and operators defined over them, are described in [6] and in a series of papers, e.g., [10,11]. Here, we use some of these definitions.
The components in the IVIFS- and IVIFL-definitions give more and larger possibilities for evaluation and determine the place of the IVIFSs and IVIFLs among the separate types of fuzzy sets. In the last 25 years the IVIFSs have been used for evaluating processes in a wide range of areas, e.g., Systems Theory (ST), Artificial Intelligence (AI) and Intelligent Systems (IS), medicine, chemical industry, ecology, etc.
Here we describe some of the IVIFS-applications in AI and IS and their benefits, and we discuss the possibilities for applying the IVIFSs as tools for evaluating DM-processes.

2 IVIFSs and Data Mining: possibilities for the future
Following [8], we ask: "What is Data Mining?" The answer to this question is as unclear as the answer to the question about the areas of AI. Again, different specialists give different answers. For example: "The aim of DM is to make sense of large amounts of mostly unsupervised data, in some domain" [25]; "The aim of DM is to extract implicit, previously unknown and potentially useful (or actionable) patterns from data. DM consists of many up-to-date techniques such as classification (decision trees, naive Bayes classifier, k-nearest neighbor, NNs), clustering (k-means, hierarchical clustering, density-based clustering), association (one-dimensional, multi-dimensional, multilevel association, constraint-based association)" [63]; "DM stands at the confluence of the fields of statistics and machine learning" [52]; "DM is a term that covers a broad range of techniques being used in a variety of industries" [50]; "DM is the core of the knowledge discovery in databases process, involving the inferring of algorithms that explore the data, develop the model and discover previously unknown patterns" [43].
DM is a process of finding reasonable correlations, repeating patterns and trends in large Data Bases (DBs) and Big Data (BD). As a basis of our research, we use the publications [18-21, 24-27, 29-31, 33-35, 35-41, 43-53, 62, 63, 66, 67]. In the literature, different areas of AI are identified as components of DM, for example, the algorithms of decision making, pattern recognition, neural networks, genetic algorithms, etc.
Extending and modifying [5,8], we review here some of the problems related to the above ones, both those already existing and those planned for future research. Everywhere we emphasize:
• the way in which the process (object) has been IVIFS-estimated up to now (if any);
• other ways for IVIFS-realization of this estimation;
• possible extensions or generalizations of already existing IFL-estimations of the corresponding processes (objects) and ways for their modification.

2.1 IVIF-estimations in expert systems, data bases, data warehouses, big data, OLAP-structures
As the author mentioned in [8], "A lot of colleagues already assert that the Expert Systems (ESs) are dying. The author supports the idea that they will live their "Renaissance", obtaining a special place in the instrumentation of DM. Preserving their basic purpose to generate a new knowledge by answering to hypotheses, we can essentially extend the area of their possibilities. When some unclear situation arises in a process controlled by DM-tools, and when some hypotheses for its future development are generated, then the new type of ESs can help."
In [4], the concept of an Intuitionistic Fuzzy ES (IFES) was introduced. It was essentially extended in [5,22,23,42]. In these ESs, each fact F has an IF-estimation $\langle \mu(F), \nu(F) \rangle$, determining its degrees of validity and non-validity. So, the answer to whether a given hypothesis is valid or not obtains an essentially more precise evaluation. In the near future, we will introduce an extension of the IFES whose facts will have IVIF-estimations $\langle M(F), N(F) \rangle$. A next step of the extensions will be the introduction of facts that contain the moments of time when they became valid and the moments at which they stopped being valid (a sequence of time-moments $t_1, t_2, \ldots, t_n$). Then (cf. [5]), on the one hand, we can answer time-related questions ("at the moment", "once", "sometimes", "for long/short time", "often", "rarely", "for short period", "for long period", etc.). On the other hand, the IVIFES rules can have essentially complex forms, containing different logical operations (conjunction, disjunction, implication, negation, ...), quantifiers ("for existence" and "for all") and modal operators in their antecedents. In addition, the facts and rules can have priorities that will determine whether a given fact or rule can stay in the DB or must be replaced by another one.
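As an illustration of how an IVIF-estimated fact could be represented, the following minimal Python sketch stores the two intervals M and N of a fact and checks the condition sup M + sup N ≤ 1. The class name `IVIFValue`, its field names and the method names are hypothetical and chosen only for this example. The conjunction, disjunction and negation follow the usual min/max definitions for IVIFSs; they are one possible choice, and the rules of an IVIFES could equally be built on other operations.

```python
from dataclasses import dataclass

EPS = 1e-12  # tolerance for floating-point rounding in the constraint check


@dataclass(frozen=True)
class IVIFValue:
    """IVIF-estimation <M, N> with M = [m_lo, m_hi] and N = [n_lo, n_hi] (hypothetical helper)."""
    m_lo: float
    m_hi: float
    n_lo: float
    n_hi: float

    def __post_init__(self):
        # Both intervals must lie in [0, 1] and be well ordered.
        if not (0.0 <= self.m_lo <= self.m_hi <= 1.0):
            raise ValueError("M must be a subinterval of [0, 1]")
        if not (0.0 <= self.n_lo <= self.n_hi <= 1.0):
            raise ValueError("N must be a subinterval of [0, 1]")
        # Defining condition of an IVIF pair: sup M + sup N <= 1.
        if self.m_hi + self.n_hi > 1.0 + EPS:
            raise ValueError("sup M + sup N must not exceed 1")

    def conj(self, other):
        """Conjunction: minimum over the M-bounds, maximum over the N-bounds."""
        return IVIFValue(min(self.m_lo, other.m_lo), min(self.m_hi, other.m_hi),
                         max(self.n_lo, other.n_lo), max(self.n_hi, other.n_hi))

    def disj(self, other):
        """Disjunction: maximum over the M-bounds, minimum over the N-bounds."""
        return IVIFValue(max(self.m_lo, other.m_lo), max(self.m_hi, other.m_hi),
                         min(self.n_lo, other.n_lo), min(self.n_hi, other.n_hi))

    def neg(self):
        """Classical negation: the intervals M and N change places."""
        return IVIFValue(self.n_lo, self.n_hi, self.m_lo, self.m_hi)


fact_f = IVIFValue(0.5, 0.75, 0.0, 0.25)
fact_g = IVIFValue(0.25, 0.5, 0.25, 0.5)
both = fact_f.conj(fact_g)    # estimation of "F and G"
either = fact_f.disj(fact_g)  # estimation of "F or G"
```

A rule antecedent built from several facts can then be evaluated by combining their estimations step by step with `conj`, `disj` and `neg`.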
In the future, the ES-answers can be additionally modified, so that they take an optimistic, pessimistic, or other form. Similar directions for extensions of the DBs, Data Warehouses (DWs), BD, OLAP-structures, etc. can be realized.
As we assumed in [8] for IFESs, and now assume for IVIFESs, solving each of the above problems, or, of course, all of them, will promote not only the theory and application of IVIFSs, but also the research in the area of DM.
In the next section, an example is given that can be used as an illustration of how the M- and N-evaluations of the facts can be determined.

IVIF-estimations of a procedure for inductive reasoning
As mentioned in [31], "the rule induction is one of the fundamental tools of DM. Usually [...]"
Let p be the number of degrees $\mu_i$ that are equal to 1, q be the number of degrees $\nu_i$ that are equal to 1, r be the number of degrees $\mu_i$ that satisfy $1 > \mu_i > \frac{1}{2}$, and s be the number of degrees $\nu_i$ that satisfy $1 > \nu_i > \frac{1}{2}$. Obviously, $p + q + r + s \le n$. Hence, we obtain a more precise estimation of the validity of the procedure for inductive reasoning than in the cases of standard, fuzzy and intuitionistic fuzzy inductive reasoning. If in the beginning we determine some threshold of validity $t_v$, then we can assert that a decision is positive sufficiently valid if $\sup M > t_v$, and that it is strongly positive sufficiently valid if $\inf M > t_v$. On the other hand, if we determine some threshold of non-validity $t_n$, then we can assert that a decision is negative sufficiently valid if $\inf N < t_n$, and that it is strongly negative sufficiently valid if $\sup N < t_n$.
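The threshold conditions above can be sketched in a few lines of Python. The function name `classify_decision` and the representation of the intervals M and N as (inf, sup) tuples are assumptions made only for this illustration; the four comparisons are copied directly from the conditions stated in the text.

```python
def classify_decision(M, N, t_v, t_n):
    """Check the four threshold conditions for an IVIF-estimation <M, N>.

    M and N are (inf, sup) tuples; t_v and t_n are the thresholds of
    validity and non-validity. Hypothetical helper, for illustration only.
    """
    inf_m, sup_m = M
    inf_n, sup_n = N
    return {
        "positive": sup_m > t_v,           # positive sufficiently valid
        "strongly_positive": inf_m > t_v,  # strongly positive sufficiently valid
        "negative": inf_n < t_n,           # negative sufficiently valid
        "strongly_negative": sup_n < t_n,  # strongly negative sufficiently valid
    }


result = classify_decision(M=(0.5, 0.75), N=(0.125, 0.25), t_v=0.625, t_n=0.2)
```

Since inf M ≤ sup M and inf N ≤ sup N, each strong condition implies the corresponding weak one.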

IVIF-estimations in decision making procedures
The procedures for decision making include multi-criteria decision making procedures, which can be reorganized so that they use IVIF-estimations. For example, let n experts estimate some object or process. Let p of them estimate it as "perfect", "the best" or "very good"; q of them as "worst" or "very bad"; r as "good", "suitable" or "useful"; and s as "bad", "unsuitable" or "useless". Then we can estimate the object or process by IVIF-estimations, using the formulas from the previous section, with p, q, r, s and n in the same roles.
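The counting step of this example can be sketched as follows. The sketch only tallies the verbal grades of the experts into the counters p, q, r, s and the total n of the previous section; it does not compute the intervals M and N themselves. The mapping `GRADE_TO_COUNTER` and the function `tally_grades` are hypothetical names introduced for the illustration, and experts who give any other answer are simply left uncounted.

```python
# Hypothetical mapping from verbal grades to the counters of the previous section.
GRADE_TO_COUNTER = {
    "perfect": "p", "the best": "p", "very good": "p",
    "worst": "q", "very bad": "q",
    "good": "r", "suitable": "r", "useful": "r",
    "bad": "s", "unsuitable": "s", "useless": "s",
}


def tally_grades(grades):
    """Return (p, q, r, s, n) for a list of verbal expert grades."""
    counts = {"p": 0, "q": 0, "r": 0, "s": 0}
    for grade in grades:
        key = GRADE_TO_COUNTER.get(grade)
        if key is not None:  # any other answer is not counted
            counts[key] += 1
    n = len(grades)          # total number of experts
    return counts["p"], counts["q"], counts["r"], counts["s"], n


p, q, r, s, n = tally_grades(
    ["perfect", "good", "bad", "very bad", "undecided", "useful"]
)
```

Because undecided experts are not counted, the tallies satisfy p + q + r + s ≤ n, as required in the previous section.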
In [8], a new type of decision making procedure is discussed, based on the apparatus of the intercriteria analysis (see, e.g., [12,14]). It is called intercriteria decision making. Its aim is to find dependences among the criteria used. For example, it is very suitable when separate experts offer different criteria for use in a concrete procedure. After the procedure has finished, we can determine whether there are connections between some of these criteria. For the IFS-case, this procedure is discussed in [17], while for the IVIFS-case similar research has appeared. The new method is based on the apparatus of the index matrices (see [2,7]).
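To give an idea of how a dependence between two criteria can be measured, here is a minimal sketch of the pairwise counting commonly described for the IF-case of intercriteria analysis: for every pair of objects, the two criteria either order the objects in the same way, in opposite ways, or at least one of them gives a tie. The function name `intercriteria_pair` and the treatment of ties as uncertainty are assumptions of this sketch; the exact formulation used in [12,14,17] may differ in details.

```python
from itertools import combinations


def _sign(x):
    """Return -1, 0 or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def intercriteria_pair(values_k, values_l):
    """Degrees of agreement and disagreement between two criteria.

    values_k[i] and values_l[i] are the evaluations of object i by the
    two criteria. Returns (mu, nu) with mu + nu <= 1.
    """
    m = len(values_k)
    if m != len(values_l) or m < 2:
        raise ValueError("need two equally long lists with at least 2 objects")
    total = m * (m - 1) // 2  # number of object pairs
    agree = disagree = 0
    for i, j in combinations(range(m), 2):
        sk = _sign(values_k[i] - values_k[j])
        sl = _sign(values_l[i] - values_l[j])
        if sk != 0 and sk == sl:
            agree += 1        # both criteria order the pair in the same way
        elif sk != 0 and sl != 0:
            disagree += 1     # the criteria order the pair in opposite ways
        # otherwise at least one tie: the pair counts as uncertainty
    return agree / total, disagree / total


mu, nu = intercriteria_pair([1, 2, 2, 4], [2, 1, 1, 3])
```

A pair (mu, nu) close to (1, 0) suggests that the two criteria are strongly connected, while a pair close to (0, 1) suggests an inverse connection.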

IVIFS-estimations in pattern recognition procedures
The apparatus of the IVIFSs is suitable for the estimation of different pattern recognition procedures. Here, we give the following two short examples, inspired by [9].
Example 1. Let us have the original pattern, in our example triangle ABC, that must be compared to another pattern, e.g., triangle AFG (see Fig. 1). Let the section BC be fuzzified, i.e., let it be modified to the region BCED.
Let us denote by #X the area of region X and let [...] Obviously, a + b + c + d + e = s. The following example is more complex.
Example 2. Let us have (see Fig. 2) the original pattern, in the new example again triangle ABC, that must be compared to the other pattern, now triangle AGI. Let the sections BC and GI be fuzzified, i.e., let them be modified to the regions BCED and FGIH. Let [...] Obviously, a + b + c + d + e + f + g + h = s.

IF-estimations in neural networks and evolutionary algorithms
The first results related to the IF-estimations in neural networks date back to 1990 [32] and they are continued in [15, 16, 54-61, 64, 65]. On the one hand, these estimations are related to the initial values in the input vectors, which now will have the form $\langle M_i, N_i \rangle$ for the i-th input neuron, where $M_i, N_i \subseteq [0, 1]$ and $\sup M_i + \sup N_i \le 1$. On the other hand, they are related to the weight coefficients of the connections between the nodes, which have the form $\langle V_{i,j}, W_{i,j} \rangle$ for the i-th and j-th neurons lying in sequential layers, where $V_{i,j}, W_{i,j} \subseteq [0, 1]$ and $\sup V_{i,j} + \sup W_{i,j} \le 1$.
When some of these coefficients have the truth-value $\langle [0, 0], [1, 1] \rangle$, we can interpret this as meaning that the respective object (a node or an arc between nodes) does not exist. So, we can modify the neural network structure at the current time.
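The structural modification described above can be sketched as a simple filtering step. In the sketch, the weights are stored in a dictionary that maps an arc (i, j) to a pair of (inf, sup) tuples for the two intervals of its coefficient; this representation and the names `ABSENT` and `prune_arcs` are assumptions made only for the illustration.

```python
# Truth-value <[0, 0], [1, 1]>, read as "the object does not exist".
ABSENT = ((0.0, 0.0), (1.0, 1.0))


def prune_arcs(weights):
    """Drop every arc whose IVIF weight equals <[0, 0], [1, 1]>.

    weights maps an arc (i, j) to a pair (V, W) of (inf, sup) tuples.
    """
    for arc, ((v_lo, v_hi), (w_lo, w_hi)) in weights.items():
        # Validate the IVIF conditions before pruning.
        if not (0.0 <= v_lo <= v_hi <= 1.0 and 0.0 <= w_lo <= w_hi <= 1.0):
            raise ValueError(f"arc {arc}: intervals must lie in [0, 1]")
        if v_hi + w_hi > 1.0 + 1e-12:
            raise ValueError(f"arc {arc}: sup V + sup W must not exceed 1")
    return {arc: vw for arc, vw in weights.items() if vw != ABSENT}


weights = {
    (0, 1): ((0.5, 0.75), (0.0, 0.25)),
    (0, 2): ((0.0, 0.0), (1.0, 1.0)),  # this arc is interpreted as missing
}
pruned = prune_arcs(weights)
```

The same filter could be applied to the input estimations of the nodes, so that a node with this truth-value is removed together with its arcs.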
In [28], it is mentioned that "The paradigm of Evolutionary Algorithms (EAs) consists of stochastic search algorithms inspired by the process of neo-Darwinian evolution. ... There are several kinds of EAs, such as Genetic Algorithms, Genetic Programming, Classifier Systems, Evolution Strategies, Evolutionary Programming, Estimation of Distribution Algorithms, etc." In this direction of research, the focus in the near future will be on the above-mentioned EAs.
In [8], some other areas of DM that can use IF-estimations were described. All of them can use IVIF-estimations, too. Some of these areas are the following:
• machine learning and e-learning
• clusterisation and classification of data
• knowledge discovery processes
• processes for imputation (filling in) of missing data, and others.

Conclusion
The present paper aims to offer a new look at different aspects and procedures of DM from the point of view of IVIFSs.
Having in mind that the elements of each IVIFS have four parameters (inf M, sup M, inf N, sup N), we can mention that in all the areas of DM commented on above, we see the application of interval-valued intuitionistic fuzziness as a tool for more precise estimation, which can take into account simultaneously opposite patterns of behaviour, as well as uncertainty.
In [68], an idea for a new direction in AI was formulated by L. Zadeh, based on the concept of a granule. However, at the moment there is no good formal definition of this concept. Probably, the estimations of the IVIFS-elements can be used as a model. Indeed, the geometrical interpretation of an IVIFS-element x is given in Figure 3. Of course, this is only the first step in the development of this idea, which has the potential to be developed in the future.