Big data, intuitionistic fuzzy sets and MapReduce operators

One of the main restrictions of the relational data model is its lack of support for flexible, imprecise and vague information in data encoding and retrieval. Fuzzy set theory, and more specifically intuitionistic fuzzy sets (IFS), provide an effective way to model data imprecision in relational databases. Several works over the last 30 years have used fuzzy set theory to extend the relational data model to permit representation and retrieval of imprecise data. However, to the best of our knowledge, such approaches have not been designed to scale up to very large datasets. In this paper, we develop MapReduce algorithms to enhance the standard relational operations with IFS predicates.


Introduction
Mainstream relational database queries use Boolean logic to characterize users' answers. This means that the query condition is either satisfied or not satisfied. The use of Boolean logic restricts the expression of preference or ranking in query results. For instance, it seems quite natural for an online hotel-room search to answer questions such as: "Give me all hotel rooms which are not too expensive and are close to the city centre".
Intuitionistic fuzzy sets (IFS) [2,3] and possibility theory [4] provide an effective solution to represent and process imprecise relational information. IFS theory is an extension of classical fuzzy set theory [12]. Each element of an intuitionistic fuzzy set has degrees of membership (µ) and non-membership (ν), which do not necessarily sum to 1.0, thus leaving a degree of indefiniteness or hesitation margin (π). The classical definition of a fuzzy set Ã in a universe X is given by
Ã = {⟨x, µÃ(x)⟩ | x ∈ X},
where µÃ(x) ∈ [0, 1] is the membership function of the fuzzy set Ã. As an extension, an intuitionistic fuzzy set A is given by
A = {⟨x, µA(x), νA(x)⟩ | x ∈ X},
where µA: X → [0, 1] and νA: X → [0, 1], with 0 ≤ µA(x) + νA(x) ≤ 1, denote a degree of membership and a degree of non-membership of x ∈ A, respectively. Obviously, each fuzzy set may be represented by the following intuitionistic fuzzy set:
A = {⟨x, µÃ(x), 1 − µÃ(x)⟩ | x ∈ X}.
For each intuitionistic fuzzy set in X, we will call πA(x) = 1 − µA(x) − νA(x) an intuitionistic fuzzy index (or a hesitation margin) of x ∈ A, which expresses a lack of knowledge of whether x belongs to A or not.
In [7,8] we defined the main operations over intuitionistic fuzzy relations (IFR), such as projection, selection and join. Let R be an IFR, i.e.,
R = {⟨x, µR(x), νR(x)⟩ | x ∈ X},
where x is an ordered tuple belonging to a given universe X, {col1, …, coln} is the set of attributes of the elements of X, µR(x) is the degree of membership of x in the relation R, and νR(x) is its degree of non-membership. In other words, R is an intuitionistic fuzzy subset of X with membership and non-membership functions µR and νR, respectively.
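The definitions above can be illustrated with a minimal Python sketch. The class name `IFPair`, the helper `from_fuzzy` and the sample relation are hypothetical names introduced here for illustration only; they are not part of the original formulation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IFPair:
    """Intuitionistic fuzzy pair <mu, nu> with mu + nu <= 1."""
    mu: float  # degree of membership
    nu: float  # degree of non-membership

    def __post_init__(self):
        if not (0.0 <= self.mu <= 1.0 and 0.0 <= self.nu <= 1.0):
            raise ValueError("mu and nu must lie in [0, 1]")
        if self.mu + self.nu > 1.0 + 1e-9:  # small tolerance for rounding
            raise ValueError("mu + nu must not exceed 1")

    @property
    def hesitation(self) -> float:
        """Hesitation margin pi = 1 - mu - nu."""
        return 1.0 - self.mu - self.nu


def from_fuzzy(mu: float) -> IFPair:
    """Embed an ordinary fuzzy membership degree as <mu, 1 - mu>."""
    return IFPair(mu, 1.0 - mu)


# A toy intuitionistic fuzzy relation: each tuple x is mapped to <mu_R(x), nu_R(x)>.
R = {
    ("Hotel A", 80): IFPair(0.7, 0.2),
    ("Hotel B", 200): IFPair(0.1, 0.8),
}
```

Under this representation an ordinary fuzzy set has zero hesitation for every element, while a pair such as ⟨0.7, 0.2⟩ leaves a margin of about 0.1.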
The selection operation defines a relation that contains only those tuples from R for which a certain predicate is satisfied. Selection modifies the degrees of membership and non-membership of R depending on the corresponding value of the predicate:
σP(R) = {⟨x, min(µR(x), µ(P(x))), max(νR(x), ν(P(x)))⟩ | x ∈ X},
where P is the predicate, i.e., the elements of the result relation have degrees of membership and non-membership that are logically AND-ed with the corresponding value of the predicate P.
The traditional project operator ∏f (R) selects all attributes f from all tuples in R, leaving out the attributes not in f. The semantics of an intuitionistic fuzzy project operator πP{⟨x, µR(x), νR(x) ⟩ | x ∈ X} should be that it selects all attributes f from all possibilities in R.
Projection does not affect the associated intuitionistic fuzzy measures.
The intuitionistic fuzzy union operator merges two relations that may contain possibilities for the same real-world objects. To properly calculate the intuitionistic fuzzy measures in the answer, it is beneficial to enumerate the possible worlds, i.e., to consider each possibility of an element existing or not in the operand sets. The intersection and difference can be determined analogously.
A Cartesian product of two relations R × S is identical to the Cartesian product operation defined in intuitionistic fuzzy set theory [8], which uses the logical AND between the degrees of membership.
Let S be another intuitionistic fuzzy relation, S = {⟨y, µS(y), νS(y)⟩ | y ∈ Y}. Then:
R × S = {⟨(x, y), µR(x)·µS(y), νR(x) + νS(y) − νR(x)·νS(y)⟩ | x ∈ X, y ∈ Y}.
The definition of these operations is based on the notion of a probabilistic conjunction (logical AND). This type of conjunction is applied when the operands carry probabilistic semantics, i.e., they express a probability rather than a degree of membership.
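As an illustration, the Cartesian product can be sketched in Python. The sketch assumes the usual probabilistic conjunction of two intuitionistic fuzzy pairs, ⟨µ1·µ2, ν1 + ν2 − ν1·ν2⟩; this form, the function names and the dictionary representation of relations are assumptions made for the example.

```python
from itertools import product


def prob_and(a, b):
    """Probabilistic conjunction of two <mu, nu> pairs (assumed form)."""
    (mu1, nu1), (mu2, nu2) = a, b
    return (mu1 * mu2, nu1 + nu2 - nu1 * nu2)


def cartesian_product(R, S):
    """R and S map tuples to <mu, nu>; the result maps the concatenated
    tuple x + y to the AND-ed degrees of x in R and y in S."""
    return {
        x + y: prob_and(dx, dy)
        for (x, dx), (y, dy) in product(R.items(), S.items())
    }


R = {("a",): (0.5, 0.4)}
S = {("b",): (0.8, 0.1)}
RxS = cartesian_product(R, S)  # one tuple ("a", "b") with its combined degrees
```

With this conjunction the sum of the resulting degrees never exceeds 1, so the result is again a valid intuitionistic fuzzy relation.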
The focus of this paper is to develop MapReduce algorithms that scale up the IFR operations to large crisp datasets. We formulate t-selection, projection, union, difference, intersection and Cartesian product operations with IFS predicates.

MapReduce framework
MapReduce is one of the most common platforms for processing big data. It is built on the primitive operations map and reduce [10], which were originally defined as part of the functional programming language LISP. Many standard algorithms have been adapted to this model [6,9].
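To make the map and reduce primitives concrete, the following single-process Python sketch imitates the map, shuffle and reduce phases in memory. The function `run_mapreduce` and the word-count example are hypothetical and only illustrate the programming model; a real deployment would run on a distributed framework.

```python
from collections import defaultdict


def run_mapreduce(records, map_fn, reduce_fn=None):
    """Run map, shuffle (group by key) and reduce in memory.

    map_fn(record) yields (key, value) pairs; reduce_fn(key, values)
    yields output pairs. With reduce_fn=None the job is map-only.
    """
    mapped = [kv for rec in records for kv in map_fn(rec)]
    if reduce_fn is None:
        return mapped
    groups = defaultdict(list)
    for key, value in mapped:
        groups[key].append(value)
    return [out for key, values in groups.items() for out in reduce_fn(key, values)]


def wc_map(line):
    # Emit (word, 1) for every word of the input line.
    for word in line.split():
        yield word, 1


def wc_reduce(word, ones):
    # Sum the counts collected for one word.
    yield word, sum(ones)


counts = dict(run_mapreduce(["a b a", "b c"], wc_map, wc_reduce))
```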

Fuzzy relational operations in MapReduce
This section describes MapReduce algorithms for IFR operations on a large crisp dataset. The selection, union, intersection and difference operations require little modification of their crisp counterparts.

IFR selection
The selection operator is a map-only job that reads a record r from relation R, computes the degree ⟨µ(P(r)), ν(P(r))⟩ to which r satisfies a given condition P, and emits r and d = ⟨min(µR(r), µ(P(r))), max(νR(r), ν(P(r)))⟩ as key and value, respectively.

The cost of the selection operation is the cost of the Map function across all records, O(|R|), where |R| is the number of records in R.
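The map-only selection can be sketched as follows. The predicate `cheap`, its thresholds and its fixed hesitation share are hypothetical choices made for this example; the input is assumed crisp, so every record enters with degree ⟨1, 0⟩.

```python
def cheap(price, low=50.0, high=150.0, hesitation=0.1):
    """Hypothetical IFS predicate 'not too expensive' returning <mu, nu>."""
    if price <= low:
        mu = 1.0
    elif price >= high:
        mu = 0.0
    else:
        mu = (high - price) / (high - low)
    # Keep a fixed share of the remainder as hesitation (an assumption).
    nu = (1.0 - mu) * (1.0 - hesitation)
    return mu, nu


def select_map(record, predicate, degree=(1.0, 0.0)):
    """Map-only selection: emit (record, <min(mu_R, mu_P), max(nu_R, nu_P)>)."""
    mu_r, nu_r = degree              # crisp input: every record is <1, 0>
    mu_p, nu_p = predicate(record)
    yield record, (min(mu_r, mu_p), max(nu_r, nu_p))


hotels = [("Hotel A", 40.0), ("Hotel B", 100.0), ("Hotel C", 300.0)]
selected = [kv for h in hotels for kv in select_map(h, lambda r: cheap(r[1]))]
```

Each record is processed independently and no reducer is needed, which matches the O(|R|) cost stated above.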

IFR Projection
The Map function for IFR projection πF(R) reads each input record r in R and emits the F attributes of r, denoted r.F, as the key and the membership and non-membership degrees of r in R as the value. A reducer receives a key r.F, produced by any of the map tasks, and the set of membership and non-membership degrees associated with it. It emits r.F together with the maximum of its membership degrees and the minimum of its non-membership degrees. The cost of IFR projection is M + I + R = |R| + |πF(R)| + |πF(R)|, where M, I and R on the left-hand side denote the map, communication and reduce costs, respectively.
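The projection job described above can be sketched in Python as follows. Records are represented as dictionaries and the shuffle phase is imitated with an in-memory grouping; the function names and sample data are hypothetical.

```python
from collections import defaultdict


def project_map(record, degree, attrs):
    """Emit the projected attributes r.F as key and <mu, nu> as value."""
    yield tuple(record[a] for a in attrs), degree


def project_reduce(key, degrees):
    """Keep the max membership and min non-membership seen for r.F."""
    yield key, (max(mu for mu, _ in degrees), min(nu for _, nu in degrees))


def ifr_project(relation, attrs):
    groups = defaultdict(list)  # stands in for the shuffle phase
    for record, degree in relation:
        for key, value in project_map(record, degree, attrs):
            groups[key].append(value)
    return dict(out for k, vs in groups.items() for out in project_reduce(k, vs))


rooms = [
    ({"city": "Paris", "price": 90}, (0.8, 0.1)),
    ({"city": "Paris", "price": 200}, (0.3, 0.6)),
    ({"city": "Rome", "price": 70}, (0.9, 0.0)),
]
cities = ifr_project(rooms, ["city"])
```

The two Paris records collapse into a single output tuple, which is why a reduce phase is required here but not for selection.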

IFR Union
The Map function for fuzzy union R ∪ S reads an input record r (from R or S) and emits r and its ⟨membership, non-membership⟩ degrees (in R or S). The reduce function computes the maximum membership and minimum non-membership degree for each input record it receives as a key.
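A minimal Python sketch of the union job, assuming each relation is a dictionary from records to ⟨µ, ν⟩ pairs (a representation chosen for this example; the function names are hypothetical):

```python
from collections import defaultdict


def union_map(record, degree):
    """Emit the record itself as key and its <mu, nu> pair as value."""
    yield record, degree


def union_reduce(record, degrees):
    """Combine the degrees of one record: max membership, min non-membership."""
    yield record, (max(mu for mu, _ in degrees), min(nu for _, nu in degrees))


def ifr_union(R, S):
    groups = defaultdict(list)  # stands in for the shuffle phase
    for relation in (R, S):
        for record, degree in relation.items():
            for key, value in union_map(record, degree):
                groups[key].append(value)
    return dict(out for k, vs in groups.items() for out in union_reduce(k, vs))


R = {("Hotel A",): (0.6, 0.3), ("Hotel B",): (0.2, 0.7)}
S = {("Hotel A",): (0.9, 0.1), ("Hotel C",): (0.5, 0.5)}
RuS = ifr_union(R, S)
```

A record that appears in only one relation keeps its degrees unchanged, since the reducer then receives a single pair.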

IFR Difference
The Map function for IFR difference R − S reads an input record r and emits r as the key. As the value, it emits the name of the relation (R or S) to which r belongs together with its membership and non-membership degrees in that relation. The reduce function receives a record r as the key and a list of its associated relations with the degrees of r in each relation. If r belongs to R but not to S, it emits r with its degrees in R. If r belongs to both R and S, it emits r with the minimum of its membership degrees in R and in the complement of S, i.e., key = r, value = ⟨min(µR(r), νS(r)), max(νR(r), µS(r))⟩. Each reducer performs a linear operation on the values it receives. Hence, the total cost of IFR union, intersection and difference is linear in the size of the input, |R| + |S|.
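The difference job can be sketched as below. The sketch reads R − S as R AND (complement of S), where the complement swaps the membership and non-membership degrees; this reading, the relation tags "R" and "S", and the function names are assumptions made for the example.

```python
from collections import defaultdict


def difference_map(record, degree, relation_name):
    """Emit the record as key and (relation name, <mu, nu>) as value."""
    yield record, (relation_name, degree)


def difference_reduce(record, tagged):
    """Emit the degree of `record` in R - S, if it occurs in R at all."""
    degrees = dict(tagged)  # e.g. {"R": (mu, nu), "S": (mu, nu)}
    if "R" not in degrees:
        return  # record occurs only in S, so it is not part of R - S
    mu_r, nu_r = degrees["R"]
    if "S" not in degrees:
        yield record, (mu_r, nu_r)
    else:
        mu_s, nu_s = degrees["S"]
        # The complement of S swaps its degrees: <nu_S, mu_S>.
        yield record, (min(mu_r, nu_s), max(nu_r, mu_s))


def ifr_difference(R, S):
    groups = defaultdict(list)  # stands in for the shuffle phase
    for name, relation in (("R", R), ("S", S)):
        for record, degree in relation.items():
            for key, value in difference_map(record, degree, name):
                groups[key].append(value)
    return dict(out for k, vs in groups.items() for out in difference_reduce(k, vs))


R = {"a": (0.9, 0.1), "b": (0.6, 0.2)}
S = {"b": (0.7, 0.2), "c": (1.0, 0.0)}
R_minus_S = ifr_difference(R, S)
```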

IFR Join
The IFR join operation R ⊗ S takes an IFR comparator (such as fuzzy equal, fuzzy greater than or fuzzy less than) and computes the degree to which every pair (r ∈ R, s ∈ S) satisfies the join condition. The IFR join algorithm divides the domain of the join attribute into a number of ranges. Each range is called a partition, and records from each relation are assigned to these partitions according to the range into which their join attribute value falls. If the dataset is small, then one could also assume that each reducer could handle the data sent to it in short order. Unfortunately, all too often [1] neither is the case and some sub-partitioning of the data is needed to ensure load balancing.
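The range partitioning step can be sketched in Python as follows. The comparator `fuzzy_equal`, its `width` parameter and the partition boundaries are hypothetical, and the sketch deliberately ignores pairs that straddle a partition boundary as well as the sub-partitioning needed for load balancing.

```python
from bisect import bisect_right
from collections import defaultdict


def partition_of(value, boundaries):
    """Index of the range that `value` falls into, given sorted boundaries."""
    return bisect_right(boundaries, value)


def join_map(record, relation_name, join_attr, boundaries):
    """Emit the partition id as key so each reducer sees one range."""
    yield partition_of(record[join_attr], boundaries), (relation_name, record)


def fuzzy_equal(a, b, width=10.0):
    """Hypothetical IF 'equal' comparator returning <mu, nu> (no hesitation)."""
    mu = max(0.0, 1.0 - abs(a - b) / width)
    return mu, 1.0 - mu


def join_reduce(partition, tagged, join_attr):
    """Pair the R and S records of one partition and score each pair."""
    r_side = [rec for name, rec in tagged if name == "R"]
    s_side = [rec for name, rec in tagged if name == "S"]
    for r in r_side:
        for s in s_side:
            degree = fuzzy_equal(r[join_attr], s[join_attr])
            if degree[0] > 0.0:  # drop pairs that do not match at all
                yield (r, s), degree


def ifr_join(R, S, join_attr, boundaries):
    groups = defaultdict(list)  # stands in for the shuffle phase
    for name, relation in (("R", R), ("S", S)):
        for record in relation:
            for key, value in join_map(record, name, join_attr, boundaries):
                groups[key].append(value)
    return [out for p, vs in groups.items() for out in join_reduce(p, vs, join_attr)]


R = [("r1", 12.0), ("r2", 48.0)]
S = [("s1", 15.0), ("s2", 90.0)]
joined = ifr_join(R, S, join_attr=1, boundaries=[25.0, 50.0, 75.0])
```

Here only r1 and s1 share a partition, so a single pair is produced; a skewed distribution of join values would send most records to a few reducers, which is the load-balancing problem noted above.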

Conclusions
This paper reports the implementation of a scalable, flexible relational algebra in MapReduce based on IFS theory. The IFR operations discussed are selection, projection, union, difference and equal join. The cost of each algorithm is discussed in terms of the cost of the map and reduce functions as well as the communication cost. For future direction of this work, we will focus on two possible extensions of the current scalable IFR relational algebra specification.
The IFR join algorithm presented in this paper can only be applied when the join condition is fuzzy equal. An extension of the current framework will be considered when the join condition includes IFS greater than or less than comparators.