Unstructured text is a very popular data type that remains largely unexplored in the privacy-preserving data mining field. We consider the problem of providing public information about a set of confidential documents. To that end, we have developed a method to protect a Vector Space Model (VSM), so that it can be made public even if the documents it represents are private. This method is inspired by microaggregation, a popular protection method from statistical disclosure control, and is adapted to work with sparse and high-dimensional data sets.
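As an illustrative sketch of the underlying idea (not the sparse, VSM-specific method of the paper), the following shows generic microaggregation: records are greedily grouped into clusters of at least k elements and each record is replaced by its cluster centroid, so every published vector is shared by at least k records.

```python
import numpy as np

def microaggregate(X, k):
    """Toy microaggregation sketch: greedily build clusters of k close
    records and replace each record by its cluster centroid, so every
    published vector is shared by at least k original records."""
    X = np.asarray(X, dtype=float)
    remaining = list(range(len(X)))
    out = np.empty_like(X)
    while len(remaining) >= 2 * k:
        # seed a cluster at the record furthest from the centroid of the rest
        centre = X[remaining].mean(axis=0)
        seed = max(remaining, key=lambda i: np.linalg.norm(X[i] - centre))
        closest = sorted(remaining, key=lambda i: np.linalg.norm(X[i] - X[seed]))
        cluster = closest[:k]
        out[cluster] = X[cluster].mean(axis=0)
        remaining = [i for i in remaining if i not in cluster]
    out[remaining] = X[remaining].mean(axis=0)  # last cluster absorbs the rest
    return out
```

For example, with `k = 2` the four one-dimensional records `[0, 0.1, 10, 10.1]` collapse into two centroids, each shared by two records.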
Record linkage is used to link records of two different files that correspond to the same individuals. These algorithms are used for database integration. In data privacy, they are used to evaluate the disclosure risk of a protected data set by linking records that belong to the same individual. The degree of success when linking the original (unprotected) data with the protected data gives an estimation of the disclosure risk.
In this paper we propose a new parameterized aggregation operator and a supervised learning method for disclosure risk assessment. The parameterized operator is a symmetric bilinear form, and the supervised learning method is formalized as an optimization problem. The target of the optimization problem is to find the values of the aggregation parameters that maximize the number of re-identifications (or correct links). We evaluate and compare our proposal with other non-parameterized variations of record linkage, such as those using the Mahalanobis distance and the Euclidean distance (one of the most widely used approaches for this purpose). Additionally, we compare it with previously presented parameterized aggregation operators for record linkage, such as the weighted mean and the Choquet integral. These comparisons show that the proposed aggregation operator is able to outperform, or at least match, the results of the other parameterized operators. We also study which conditions the optimization problem must satisfy for the described aggregation functions to be considered metric functions.
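A minimal sketch of the two building blocks, under simplifying assumptions (the paper's learning procedure is not reproduced): a distance induced by a symmetric bilinear form, which reduces to the Euclidean or Mahalanobis distance for particular parameter matrices, and nearest-neighbour record linkage that counts correct links as a risk estimate.

```python
import numpy as np

def bilinear_distance(a, b, M):
    """Squared distance induced by a symmetric bilinear form:
    d(a, b)^2 = (a - b)^T M (a - b).  With M the identity this is the
    squared Euclidean distance; with M the inverse covariance matrix it
    is the squared Mahalanobis distance."""
    d = np.asarray(a, float) - np.asarray(b, float)
    M = np.asarray(M, float)
    assert np.allclose(M, M.T), "the form must be symmetric"
    return float(d @ M @ d)

def record_linkage(original, protected, M):
    """Link each original record to its closest protected record; the
    fraction of correct links (matching indices) estimates disclosure risk."""
    links = [min(range(len(protected)),
                 key=lambda j: bilinear_distance(a, protected[j], M))
             for a in original]
    return sum(i == j for i, j in enumerate(links)) / len(original)
```

The supervised method described above would then search for the entries of `M` that maximize the value returned by `record_linkage` on training pairs.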
We prove a decomposition theorem for hesitant fuzzy sets, which states that every typical hesitant fuzzy set on a set can be represented by a well-structured family of fuzzy sets on that set. This decomposition is expressed through the novel concept of the hesitant fuzzy set associated with a family of hesitant fuzzy sets, in terms of newly defined families of their cuts. Our result constitutes the first representation theorem of hesitant fuzzy sets in the literature. Other related representation results are proven. We also define two novel extension principles that extend crisp functions to functions that map hesitant fuzzy sets into hesitant fuzzy sets.
Since the notion of hesitant fuzzy set was introduced, several clustering algorithms have been proposed to cluster hesitant fuzzy data. Besides the hesitation in the data, there is also hesitation in the clustering (classification) of a crisp data set. This hesitation may arise in the selection of a suitable clustering (classification) algorithm and in its initial parameterization. Hesitant fuzzy set theory is a suitable tool to deal with this kind of problem. In this study, we introduce two different points of view on applying hesitant fuzzy sets in data mining tasks, especially in clustering algorithms.
In this paper, we define hesitant fuzzy partitions (H-fuzzy partitions) to capture the results of the standard fuzzy clustering family (e.g., fuzzy c-means and intuitionistic fuzzy c-means). We define a method to construct H-fuzzy partitions from a set of fuzzy clusters obtained from several executions of fuzzy clustering algorithms with different initializations of their parameters. Our purpose is to combine several locally optimal solutions in the search for a globally optimal one, while also letting the user consider various reliable membership values and cluster centers to evaluate her/his problem using different cluster validity indices.
Microaggregation is an anonymization technique consisting of partitioning the data into clusters of no fewer than k elements and then replacing each cluster by its prototypical representative. Most microaggregation techniques work on numerical attributes. However, many data sets are described by heterogeneous types of data, i.e., numerical and categorical attributes. In this paper we propose a new microaggregation method, based on generalization, for achieving a compliant k-anonymous masked file for categorical microdata. The goal is to build a generalized description satisfied by at least k domain objects and to replace those domain objects by the description. The way this generalization is constructed is similar to the one used in growing decision trees. Records that cannot be generalized satisfactorily are discarded, so some information is lost. The experiments we performed show that the new approach gives good results.
Generalization and suppression are two of the most widely used techniques to achieve k-anonymity. However, the concept of generalization is also used in machine learning to obtain domain models useful for the classification task, and suppression is a way to achieve such generalization. In this paper we address the anonymization of data while preserving the classification task. We propose to use machine learning methods to obtain partial domain theories formed by partial descriptions of classes. Unlike in machine learning, we require such descriptions to be as specific as possible, i.e., formed by the maximum number of attributes. This is achieved by suppressing some values of some records. In our method, we suppress only a particular value of an attribute in only a subset of records, that is, we use local suppression. This avoids one of the problems of global suppression, namely the loss of more information than necessary.
Data science applications often need to deal with data that does not fit into the standard entity-attribute-value model. In this chapter we discuss three of these other types of data: texts, images and graphs. The importance of social media is one of the reasons for the interest in graphs, as they are a way to represent social networks and, in general, any type of interaction between people. In this chapter we present examples of tools that can be used to extract information from, and thus analyze, these three types of data. In particular, we discuss topic modeling using a hierarchical statistical model as a way to extract relevant topics from texts, image analysis using convolutional neural networks, and measures and visual methods to summarize information from graphs.
Non-additive measures generalize additive measures and have been utilized in several applications. They are used to represent different types of uncertainty and also to represent importance in data aggregation. As non-additive measures are set functions, the number of values to be considered grows exponentially. This makes their definition difficult, but also their interpretation and understanding. In order to support understandability, this paper explores the topic of visualizing discrete non-additive measures using node-link diagram representations.
The problem of anonymization in large networks and the utility of released data are considered in this paper. Although there are some anonymization methods for networks, most of them cannot be applied in large networks because of their complexity. In this paper, we devise a simple and efficient algorithm for k-degree anonymity in large networks. Our algorithm constructs a k-degree anonymous network with the minimum number of edge modifications. We compare our algorithm with other well-known k-degree anonymization algorithms and demonstrate that information loss in real networks is lowered. Moreover, we consider edge relevance in order to improve the data utility of anonymized networks. By considering the neighbourhood centrality score of each edge, we preserve the most important edges of the network, reducing the information loss and increasing the data utility. An evaluation of clustering processes is performed on our algorithm, showing that edge neighbourhood centrality increases data utility. Lastly, we apply our algorithm to several large real data sets and demonstrate its efficiency and practical utility.
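To make the notion of k-degree anonymity concrete, here is a toy greedy sketch (not the minimum-modification algorithm of the paper): the degree sequence is adjusted so that every degree value is shared by at least k nodes, which is the property a k-degree anonymous network must satisfy.

```python
def k_anonymize_degrees(degrees, k):
    """Toy k-degree anonymity sketch: sort degrees in decreasing order
    and make each consecutive group of at least k nodes share the group's
    maximum degree (raising degrees, which can be realized by adding edges).
    A leftover group smaller than k is merged into the previous one."""
    order = sorted(range(len(degrees)), key=lambda i: -degrees[i])
    anon = [0] * len(degrees)
    n = len(degrees)
    i = 0
    while i < n:
        j = min(i + k, n)
        if n - j < k:            # leftover smaller than k joins this group
            j = n
        top = degrees[order[i]]  # group maximum (sequence is non-increasing)
        for idx in order[i:j]:
            anon[idx] = top
        i = j
    return anon
```

For example, the degree sequence `[5, 5, 3, 2, 1]` with `k = 2` becomes `[5, 5, 3, 3, 3]`: every degree value now occurs at least twice.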
Recently, a huge number of social networks have been made publicly available. In parallel, several definitions and methods have been proposed to protect users' privacy when these data are publicly released. Some of them were borrowed from relational data set anonymization techniques, which are more mature than network anonymization techniques. In this paper we summarize privacy-preserving techniques, focusing on graph-modification methods, which alter the graph's structure and release the entire anonymous network. These methods allow researchers and third parties to apply all graph-mining processes on anonymous data, from local to global knowledge extraction.
Based on the link between Sugeno integrals and fuzzy measures, we discuss several algebraic properties of discrete Sugeno integrals. We recall that the composition of Sugeno integrals is again a Sugeno integral, and that each Sugeno integral can be obtained as a composition of binary Sugeno integrals. In particular, we discuss the associativity, dominance, commuting and bisymmetry of Sugeno integrals.
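The standard definition of the discrete Sugeno integral referred to above can be written down directly. The sketch below implements it for a function and a fuzzy measure given as dictionaries (element names and measure values are illustrative).

```python
def sugeno_integral(f, mu):
    """Discrete Sugeno integral of f: X -> [0,1] with respect to a fuzzy
    measure mu (a monotone set function with mu(empty)=0, mu(X)=1, given
    here on frozensets):  S(f) = max_i min( f(x_(i)), mu(A_(i)) ),
    where values are taken in increasing order and
    A_(i) = {x : f(x) >= f(x_(i))}."""
    xs = sorted(f, key=f.get)          # elements by increasing f value
    result = 0.0
    for i, x in enumerate(xs):
        A = frozenset(xs[i:])          # elements with f >= f(x)
        result = max(result, min(f[x], mu[A]))
    return result
```

For instance, with `f = {'a': 0.3, 'b': 0.8}` and a measure giving `mu({'b'}) = 0.5`, the integral is `max(min(0.3, 1), min(0.8, 0.5)) = 0.5`.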
The Analytical Hierarchy Process (AHP) has been extensively used to interview experts in order to find the weights of the criteria. We call AHP-like matrices relative preferences of weights. In this paper we propose another type of matrix that we call an absolute preference matrix. These matrices are also used to find weights, and we propose that they can be applied to find the weights of weighted means and also of the Choquet integral.
Aggregation functions are extensively used in decision making processes to combine the available information. The arithmetic mean and the weighted mean are among the most widely used. In order to use a weighted mean, we need to define its weights. The Analytical Hierarchy Process (AHP) is a well-known technique used to obtain weights based on interviews with experts. From the interviews we define a matrix of pairwise comparisons of the importance of the weights. We call these AHP-like matrices absolute preferences of weights. We propose another type of matrix that we call a relative preference matrix. We define this matrix with the same goal: to find the weights for weighted aggregators. We discuss how it can be used for eliciting the weights for the weighted mean and define a similar approach for the Choquet integral.
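A standard way to extract weights from an AHP pairwise-comparison matrix, which the approaches above build on, is the row geometric mean (an approximation of Saaty's principal eigenvector method). A minimal sketch:

```python
import numpy as np

def ahp_weights(P):
    """Derive weights from an AHP pairwise-comparison matrix P, where
    P[i][j] estimates the ratio w_i / w_j.  The row geometric mean,
    normalized to sum to one, is a standard approximation of the
    principal-eigenvector weights; for a perfectly consistent matrix
    it recovers the weights exactly."""
    P = np.asarray(P, dtype=float)
    g = np.prod(P, axis=1) ** (1.0 / P.shape[1])
    return g / g.sum()
```

For a consistent matrix built from weights `(0.5, 0.3, 0.2)` the function returns those weights exactly; for real expert interviews the matrix is only approximately consistent and the result is an estimate.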
The rapid dynamics of COVID-19 calls for quick and effective tracking of virus transmission chains and early detection of outbreaks, especially in the "phase 2" of the pandemic, when lockdown and other restriction measures are progressively withdrawn, in order to avoid or minimize contagion resurgence. For this purpose, contact-tracing apps are being proposed for large-scale adoption by many countries. A centralized approach, where data sensed by the app are all sent to a nation-wide server, raises concerns about citizens' privacy and needlessly strong digital surveillance, thus alerting us to the need to minimize personal data collection and to avoid location tracking. We advocate the conceptual advantage of a decentralized approach, where both contact and location data are collected exclusively in individual citizens' "personal data stores", to be shared separately and selectively (e.g., with a backend system, but possibly also with other citizens), voluntarily, only when the citizen has tested positive for COVID-19, and with a privacy-preserving level of granularity. This approach better protects the personal sphere of citizens and affords multiple benefits: it allows for detailed information gathering about infected people in a privacy-preserving fashion; in turn, this enables both contact tracing and the early detection of outbreak hotspots on a more finely-grained geographic scale. The decentralized approach is also scalable to large populations, in that only the data of positive patients need be handled at a central level. Our recommendation is two-fold. First, to extend existing decentralized architectures with a light touch, in order to manage the collection of location data locally on the device, and allow the user to share spatio-temporal aggregates - if and when they want and for specific aims - with health authorities, for instance.
Second, we favour the longer-term pursuit of realizing a Personal Data Store vision, giving users the opportunity to contribute to the collective good to the extent they want, enhancing self-awareness, and cultivating collective efforts for rebuilding society.
The necessity of dealing with uncertainty in real-world problems has been a long-term research challenge that has originated different methodologies and theories. Recently, the concept of Hesitant Fuzzy Sets (HFSs) was introduced to model the uncertainty that often appears when the membership degree of an element must be established and there are several possible values that make one hesitate about which would be the right one. Many researchers have paid attention to this concept and have proposed diverse extensions, relationships with other types of fuzzy sets, different types of operators to compute with this type of information, applications to information fusion and decision-making, etc.
Nevertheless, some of these proposals are questionable, because they are straightforward extensions of previous works or do not use the concept of HFSs in a suitable way. Therefore, this position paper studies the necessity of HFSs and provides a discussion of current proposals, including a guideline that proposals should follow and some open challenges for HFSs.
Since the concept of fuzzy set was introduced, different extensions and generalizations have been proposed to manage uncertainty in different problems. This chapter focuses on a recent extension, the so-called hesitant fuzzy set. Many researchers have paid attention to it and have proposed different extensions in both quantitative and qualitative contexts. Several concepts, basic operations and their extensions are reviewed in this chapter.
This chapter gives a general introduction to data science as a concept and to the topics covered in this book. First, we present a rough definition of data science, and point out how it relates to the areas of statistics, machine learning and big data technologies. Then, we review some of the most relevant tools that can be used in data science ranging from optimization to software. We also discuss the relevance of building models from data. The chapter ends with a detailed review of the structure of the book.
In this work we present an algorithm for k-anonymization of datasets that change over time. It is intended for preventing identity disclosure in dynamic datasets via microaggregation. It supports adding, deleting and updating records in a database while keeping k-anonymity on each release. We carry out experiments on database anonymization. We expected the additional constraints for k-anonymization of dynamic databases to entail a larger information loss; however, it stays close to MDAV's information loss for static databases. Finally, we carry out a proof-of-concept experiment with directed degree sequence anonymization, in which the removal or addition of records implies the modification of other records.
Several methods for providing edge- and node-differential privacy for graphs have been devised. However, most of them publish graph statistics, not the edge set of the randomized graph. We present a method for graph randomization based on randomized response that allows for publishing differentially private graphs. We show that this method can be applied to sanitize data used to train collaborative filtering algorithms for recommender systems. Our results afford plausible deniability to users in relation to their interests, with a controlled probability predefined by the user or the data controller. We show, in an experiment with Facebook Likes data and psychodemographic profiles, that the accuracy of the profiling algorithms is preserved even when they are trained with differentially private data. Finally, we define privacy metrics to compare our method, for different values of ε, with a k-anonymization method on the MovieLens dataset for movie recommendations.
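The core mechanism referred to above can be sketched as classical randomized response applied to each potential edge of the adjacency matrix. This is a generic sketch, not the exact construction of the paper: keeping each edge bit with probability e^ε/(1+e^ε) and flipping it otherwise yields edge-level ε-differential privacy.

```python
import math
import random

def randomized_response_edges(adj, eps, rng=None):
    """epsilon-DP randomized response on an undirected adjacency matrix:
    keep each potential-edge bit with probability e^eps / (1 + e^eps),
    flip it otherwise.  The entire randomized edge set can then be
    published with edge-level differential privacy."""
    rng = rng or random.Random(0)
    p_keep = math.exp(eps) / (1.0 + math.exp(eps))
    n = len(adj)
    out = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            bit = adj[i][j] if rng.random() < p_keep else 1 - adj[i][j]
            out[i][j] = out[j][i] = bit
    return out
```

A small ε flips many bits (strong privacy, noisy graph); a very large ε keeps the graph essentially unchanged.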
Recently, we have found that the concept of P-stability has interesting applications in network privacy. In the context of Online Social Networks it may be used for obtaining a fully polynomial randomized approximation scheme for graph masking and measuring disclosure risk. Also, by using the characterization of P-stable sequences from Jerrum, McKay and Sinclair (1992), it is possible to obtain optimal approximations for the problem of k-degree anonymity. In this paper, we present results on P-stability considering the additional restriction that the degree sequence must not intersect the edges of an excluded graph X, improving earlier results on P-stability. As a consequence, we extend the P-stable classes of scale-free networks from Torra et al. (2015), obtain an optimal solution for k-anonymity and prove that all the known conditions for P-stability are sufficient for sequences to be graphic.
Mobility data mining can improve decision making, from planning transport in metropolitan areas to localizing services in towns. However, unrestricted access to such data may reveal sensitive locations and pose safety risks if the data is associated with a specific moving individual. This is one of the many reasons to consider trajectory anonymization. Some anonymization methods rely on grouping individual records in a database and publishing summaries in such a way that individual information is protected within the group. Other approaches consist of adding noise, as in differential privacy, so that the presence of an individual cannot be inferred from the data. In this paper, we present a perturbative anonymization method based on swapping segments of trajectory data (SwapMob). It preserves the aggregate information of the spatial database and, at the same time, provides anonymity to the individuals. We have performed tests on a set of GPS trajectories of 10,357 taxis during the period of Feb. 2 to Feb. 8, 2008, within Beijing. We show that home addresses and POIs of specific individuals cannot be inferred after anonymizing them with SwapMob, and that the aggregate mobility data is preserved without changes, such as the average length of trajectories or the number of cars and their directions in any given zone at a specific time.
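To illustrate the segment-swapping idea (a simplified sketch only; SwapMob itself operates on streams of many trajectories, and the time-alignment and crossing criteria here are assumptions): when two trajectories pass close to each other, the identities of their remaining segments are exchanged, so aggregate flows are preserved but no single pseudonym follows a complete trajectory.

```python
def swap_on_crossing(traj_a, traj_b, radius=0.001):
    """Toy SwapMob-style sketch for two trajectories given as lists of
    (t, x, y) points aligned in time: at the first point where they are
    within `radius` of each other at the same instant, swap the tails of
    the two trajectories."""
    for idx, ((ta, xa, ya), (tb, xb, yb)) in enumerate(zip(traj_a, traj_b)):
        if ta == tb and (xa - xb) ** 2 + (ya - yb) ** 2 <= radius ** 2:
            # exchange identities from the crossing point onward
            return (traj_a[:idx] + traj_b[idx:], traj_b[:idx] + traj_a[idx:])
    return traj_a, traj_b  # no crossing: trajectories unchanged
```

Note that the set of visited points at each time step is unchanged, which is why zone-level counts and directions are preserved.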
Real-time mobility data is useful for several applications, such as planning transport in metropolitan areas or localizing services in towns. However, if such data is collected without any privacy protection, it may reveal sensitive locations and pose safety risks to the individual associated with it. Thus, mobility data must be anonymized, preferably at the time of collection. In this paper, we consider the SwapMob algorithm, which mitigates privacy risks by swapping partial trajectories. We formalize the concept of a sufficient sanitizer and show that the SwapMob algorithm is a sufficient sanitizer for various statistical decision problems. That is, it preserves the aggregate information of the spatial database in the form of sufficient statistics, and it also provides privacy to the individuals. This may be used by personalized assistants that take advantage of users' locations, so they can ensure user privacy while providing accurate responses to the user requirements. We measure the privacy provided by SwapMob as the Adversary Information Gain, which measures the capability of an adversary to leverage his knowledge of exact data points to infer a larger segment of the sanitized trajectory. We test the utility of the data obtained after applying SwapMob sanitization in terms of Origin-Destination matrices, a fundamental tool in transportation modelling.
In this paper we study conditions to approximate a given graph by a regular one. We obtain optimal conditions for a few metrics, such as the edge rotation distance for graphs and the rectilinear and Euclidean distances over degree sequences. Then, we require the approximation to have at least k copies of each value in the degree sequence; this property comes from data privacy, where it is called k-degree anonymity.
We give a sufficient condition for a degree sequence to be graphic that depends only on its length and its maximum and minimum degrees. Using this condition, we give an optimal solution of k-degree anonymity for the Euclidean distance when the sum of the degrees in the anonymized degree sequence is even. We present algorithms that may be used to obtain all the mentioned anonymizations.
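For context, the classical full characterization of graphic sequences (as opposed to the simpler sufficient condition described above) is the Erdős–Gallai test, which any anonymized degree sequence must ultimately pass to be realizable as a graph:

```python
def is_graphic(seq):
    """Erdős–Gallai test: a sequence of non-negative integers is graphic
    iff its sum is even and, with the sequence sorted non-increasingly,
    for every prefix length r:
        sum(d[:r]) <= r*(r-1) + sum(min(d_i, r) for the remaining d_i)."""
    d = sorted(seq, reverse=True)
    if sum(d) % 2:
        return False
    for r in range(1, len(d) + 1):
        lhs = sum(d[:r])
        rhs = r * (r - 1) + sum(min(x, r) for x in d[r:])
        if lhs > rhs:
            return False
    return True
```

For example, `[3, 3, 2, 2, 2]` is graphic, while `[3, 1]` is not (a node of degree 3 needs three distinct neighbours).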
Fuzzy measures are used to express background knowledge of the information sources. In fuzzy rule-based models, the rule confidence gives important information about the final classes and their relevance. This work proposes to use fuzzy measures and integrals to combine rule confidences when making a decision. A Sugeno λ-measure and a distorted probability have been used in this process. A clinical decision support system (CDSS) has been built by applying this approach to a medical dataset. We then use our system to estimate the risk of developing diabetic retinopathy. We show performance results comparing our system with others in the literature.
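A Sugeno λ-measure is fully determined by the densities of the singletons: λ is the unique solution greater than -1 of 1 + λ = Π_i (1 + λ μ_i). A minimal numeric sketch (bisection; the specific bracketing is an implementation choice, not from the paper):

```python
import math

def sugeno_lambda(densities, iters=200):
    """Solve 1 + lam = prod_i (1 + lam * mu_i) for the unique root
    lam > -1 of a Sugeno lambda-measure with the given singleton
    densities.  If the densities sum to 1 the measure is additive
    and lam = 0; if the sum is < 1 the root is positive, otherwise
    it lies in (-1, 0)."""
    def g(lam):
        return math.prod(1 + lam * m for m in densities) - (1 + lam)
    s = sum(densities)
    if abs(s - 1) < 1e-12:
        return 0.0                       # additive case
    if s < 1:                            # g < 0 near 0+, g > 0 for large lam
        lo, hi = 1e-9, 1.0
        while g(hi) < 0:
            hi *= 2
    else:                                # g > 0 near -1, g < 0 near 0-
        lo, hi = -1 + 1e-9, -1e-9
    for _ in range(iters):               # bisection on the sign change
        mid = (lo + hi) / 2
        if (g(lo) > 0) == (g(mid) > 0):
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

For densities 0.2 and 0.3, λ = 25/3, and indeed μ({a, b}) = 0.2 + 0.3 + λ·0.06 = 1.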
Most privacy-preserving techniques suffer from an inevitable utility loss due to the perturbations carried out on the input data or the models in order to gain privacy. When it comes to machine learning (ML) based prediction models, accuracy is the key criterion for model selection; thus, an accuracy loss due to privacy implementations is undesirable. The motivation of this work is to implement the privacy model "integral privacy" and to evaluate its eligibility as a technique for machine learning model selection while preserving model utility. In this paper, a linear regression approximation method is implemented based on integral privacy, which ensures high accuracy and robustness while maintaining a degree of privacy for ML models. The proposed method uses a re-sampling based estimator to construct the linear regression model, coupled with a rounding based data discretization method to support integral privacy principles. The implementation is evaluated in comparison with differential privacy in terms of privacy, accuracy and robustness of the output ML models. In this comparison, the integral privacy based solution performs better with respect to the above criteria.
Membership inference attacks (MIAs) have been identified as a distinct threat to privacy when sensitive personal data are used to train machine learning (ML) models. This work aims at deepening our understanding of existing black-box MIAs while introducing a new label-only MIA model. The proposed MIA model can successfully exploit well-generalized models, challenging the conventional wisdom that generalized models are immune to membership inference. Through systematic experimentation, we show that the proposed MIA model can outperform the existing attack models while being more resilient to manipulations of the membership inference results caused by the selection of membership validation data.
Data analysis is expected to provide accurate descriptions of the data. However, this is in opposition to privacy requirements when working with sensitive data. In this case, there is a need to ensure that no disclosure of sensitive information takes place by releasing the data analysis results. Therefore, privacy-preserving data analysis has become significant. Enforcing strict privacy guarantees, as in differential privacy, can significantly distort the data or the results of the data analysis, thus limiting their analytical utility. In an attempt to address this issue, in this paper we discuss how "integral privacy", a re-sampling based privacy model, can be used to compute descriptive statistics of a given dataset with high utility. In integral privacy, privacy is achieved through the notion of stability, which leads to releasing the data analysis result least susceptible to changes in the input dataset. Here, stability is explained by the relative frequency of different generators (re-samples of the data) that lead to the same data analysis result. In this work, we compare the results of integrally private statistics with respect to different theoretical data distributions and real-world data with differing parameters. Moreover, the results are compared with statistics obtained through differential privacy. Finally, through empirical analysis, it is shown that the integral privacy based approach has high utility and robustness compared to differential privacy. Due to the computational complexity of the method, we propose integral privacy as more suitable for small datasets, where differential privacy performs poorly. However, adopting an efficient re-sampling mechanism can further improve the computational efficiency of integral privacy.
Privacy attacks targeting machine learning models are evolving. One of the primary goals of such attacks is to infer information about the training data used to construct the models. "Integral privacy" focuses on machine learning and statistical models, and explains how we can exploit the intruder's uncertainty to provide a privacy guarantee against model comparison attacks. Through experimental results, we show how the distribution of models can be used to achieve integral privacy. Here, we observe two categories of machine learning models based on their frequency of occurrence in the model space. We then explain the privacy implications of selecting each of them, based on a new attack model and empirical results. We also provide recommendations for private model selection based on the accuracy and stability of the models, along with the diversity of the training data that can be used to generate them.
Data anonymization irrecoverably transforms raw data into a protected version by eliminating direct identifiers and removing sufficient detail from indirect identifiers, in order to minimize the risk of re-identification when data publishing is required. Moreover, data protection laws (e.g., the GDPR) do not consider anonymized data to be personal data, thus allowing them to be freely used, analysed, shared and monetized without compliance risk. Given these advantages, it is plausible that data controllers anonymize data before releasing them for data analysis tasks such as machine learning (ML), which is applied in a wide variety of domains where personal data are used. Furthermore, recent research has shown that ML models are vulnerable to privacy attacks, as they retain sensitive information from the training data. Taking all of these facts into consideration, in this work we explore the interplay between data anonymization and ML, with the ultimate aim of clarifying whether data anonymization is sufficient to achieve privacy for ML under different adversarial scenarios. We also discuss the challenges and opportunities of integrating the two domains. Our findings make it conspicuous that, in order to substantially minimize the privacy risks in ML, existing data anonymization techniques have to be applied at high privacy levels, which causes a deterioration in model utility.
“Rounding” can be understood as a way to coarsen continuous data: low-level and infrequent values are replaced by higher-level and more frequent representative values. This concept is explored as a method for data privacy in the statistical disclosure control literature, with perturbative techniques like rounding and microaggregation and non-perturbative methods like generalisation. Even though rounding is well known as a numerical data protection method, to the best of our knowledge it has not been studied in depth or evaluated empirically. This work is motivated by three objectives: (1) to study alternative methods of obtaining the rounding values that represent a given continuous variable, (2) to empirically evaluate rounding as a data protection technique in terms of information loss (IL) and disclosure risk (DR), and (3) to analyse the impact of data rounding on machine learning based models. In order to obtain the rounding values, we consider discretization methods introduced in the unsupervised machine learning literature, along with microaggregation and re-sampling based approaches. The results indicate that microaggregation based techniques are preferred over unsupervised discretization methods due to their fair trade-off between IL and DR.
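A minimal sketch of rounding as coarsening, using equal-frequency (quantile) bins as one simple way to choose the rounding points (the paper evaluates several alternatives; this particular choice is only an illustration):

```python
import numpy as np

def rounding_protect(values, n_points):
    """Rounding as coarsening: choose rounding points as the means of
    equal-frequency bins of the sorted data, then replace every value
    by its nearest rounding point.  Fewer points give more protection
    at the cost of more information loss."""
    v = np.sort(np.asarray(values, dtype=float))
    bins = np.array_split(v, n_points)           # equal-frequency bins
    points = np.array([b.mean() for b in bins if len(b)])
    x = np.asarray(values, dtype=float)
    idx = np.abs(x[:, None] - points[None, :]).argmin(axis=1)
    return points[idx]
```

With two rounding points, the values `[1, 2, 9, 10]` are replaced by `[1.5, 1.5, 9.5, 9.5]`: each published value now represents two original records.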
In light of stringent privacy laws, data anonymization not only supports privacy-preserving data publication (PPDP) but also improves the flexibility of micro-data analysis. Machine learning (ML) is widely used for personal data analysis nowadays; thus, it is paramount to understand how to effectively use data anonymization in the ML context. In this work, we introduce an anonymization framework based on the notion of "probabilistic k-anonymity" that can be applied to mixed datasets, while addressing the challenges brought forward by existing syntactic privacy models in the context of ML. Through systematic empirical evaluation, we show that the proposed approach can effectively limit the disclosure risk in micro-data publishing while maintaining high utility for the ML models induced from the anonymized data.
In data privacy, the evaluation of the disclosure risk has to take into account the fact that several releases of the same or similar information about a population are common. In this paper we discuss this issue within the scope of k-anonymity. We also show how this issue is related to the publication of privacy protected databases that consist of linked tables. We present algorithms for the implementation of k-anonymity for this type of data.
In this paper we discuss some tools for graph perturbation with applications to data privacy. We present and analyse two different approaches. One is based on matrix decomposition and the other on graph partitioning. We discuss these methods and show that they belong to two traditions in data protection: noise addition/microaggregation and k-anonymity.
In this paper we discuss the relations between clustering and error-correcting codes. We show that clustering can be used to construct error-correcting codes. We review the previous works in the literature on this issue, and propose a modification of a previous approach that can be used for code construction from a set of proposed codewords.
In this article we provide a formal framework for reidentification in general. We define n-confusion as a concept for modeling the anonymity of a database table and we prove that n-confusion is a generalization of k-anonymity. After a short survey on the different available definitions of k-anonymity for graphs we provide a new definition for k-anonymous graph, which we consider to be the correct definition. We provide a description of the k-anonymous graphs, both for the regular and the non-regular case. We also introduce the more flexible concept of (k, l)-anonymous graph. Our definition of (k, l)-anonymous graph is meant to replace a previous definition of (k, l)-anonymous graph, which we here prove to have severe weaknesses. Finally, we provide a set of algorithms for k-anonymization of graphs.
Data privacy studies methods to ensure that disclosure of sensitive information does not take place. Masking methods are applied to databases prior to their release so that intruders cannot access sensitive information. Masking methods modify the data, reducing its quality. Information loss measures have been defined to evaluate to what extent the data are still useful for particular analyses. In the case of big data, masking the data and evaluating their utility is a complex problem. In this paper we focus on information loss measurement and explore whether we can estimate, or give bounds on, the information loss for large data sets using only random subsets of the whole data set.
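The idea of estimating an information loss measure from random subsets can be sketched as follows. Both the loss measure (mean absolute difference between original and masked values) and the masking (a constant shift) are toy assumptions for illustration.

```python
import random

def mean_abs_diff(orig, masked):
    """A simple information-loss measure: mean absolute difference
    between paired original and masked values."""
    return sum(abs(o - m) for o, m in zip(orig, masked)) / len(orig)

def subset_estimate(orig, masked, size, trials=50, seed=0):
    """Estimate the loss from random subsets instead of the full data,
    averaging over several trials to reduce sampling variance."""
    rng = random.Random(seed)
    n = len(orig)
    estimates = []
    for _ in range(trials):
        sample = rng.sample(range(n), size)
        estimates.append(mean_abs_diff([orig[i] for i in sample],
                                       [masked[i] for i in sample]))
    return sum(estimates) / trials

orig = [float(i) for i in range(1000)]
masked = [x + 0.5 for x in orig]           # toy masking: constant shift
print(mean_abs_diff(orig, masked))         # exact loss on the full data: 0.5
print(subset_estimate(orig, masked, 50))   # subset-based estimate, also 0.5 here
```

For real masking methods the subset estimate fluctuates around the full-data value, and the spread across trials can be used to derive bounds.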
The Choquet integral permits us to integrate a function with respect to a non-additive measure. When the measure is additive, it corresponds to the Lebesgue integral. This integral was recently used to define families of probability-density functions: the exponential family of Choquet integral (CI) based class-conditional probability-density functions, and the exponential family of Choquet–Mahalanobis integral (CMI) based class-conditional probability-density functions. The latter is a generalization of the former, and also a generalization of the normal distribution.
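The discrete Choquet integral referred to above can be sketched as follows. The set function is given explicitly as a dictionary over subsets; as the abstract notes, an additive measure reduces the integral to a weighted mean.

```python
def choquet_integral(values, measure):
    """Discrete Choquet integral of `values` (indexed 0..n-1) with respect
    to a set function `measure` mapping frozensets of indices to [0, 1]."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    total, prev = 0.0, 0.0
    for pos, i in enumerate(order):
        subset = frozenset(order[pos:])   # indices whose value is >= the current one
        total += (values[i] - prev) * measure[subset]
        prev = values[i]
    return total

# An additive measure: mu({0,1}) = mu({0}) + mu({1}).
mu = {frozenset({0, 1}): 1.0, frozenset({0}): 0.4, frozenset({1}): 0.6}
print(choquet_integral([3.0, 5.0], mu))  # → 4.2, i.e. the weighted mean 0.4*3 + 0.6*5
```

With a non-additive measure the same code captures interactions between the inputs that no weighted mean can express.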
In this paper we study some properties of these distributions and examine the application of a few normality tests.
k-Anonymity and differential privacy can be considered examples of Boolean definitions of disclosure risk. In contrast, record linkage and uniqueness are examples of quantitative measures of risk. Record linkage is a powerful approach because it can model different types of scenarios in which an adversary attacks a protected database with some information and background knowledge. Transparency holds in data privacy when data is published together with details on its processing, including the data protection method used and its parameters. Intruders can use this information to improve their attacks. Specific record linkage algorithms can be defined to take this information into account and to define more accurate disclosure risk measures. Machine learning and optimization techniques also permit us to increase the effectiveness of record linkage algorithms. This talk will focus on disclosure risk measures based on record linkage. We will describe how we can improve the performance of the algorithms under the transparency principle, as well as by using machine learning and optimization techniques.
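Distance-based record linkage, as used here to quantify risk, can be sketched as follows: each original record is linked to its nearest protected record, and the fraction of correct links estimates the disclosure risk. This minimal version uses the Euclidean distance; the index alignment between the two files is an evaluation assumption (in practice the true correspondence is only known to the evaluator).

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def record_linkage(original, protected):
    """Distance-based record linkage: link each original record to its
    nearest protected record and count correct re-identifications,
    assuming records at the same index belong to the same individual."""
    correct = 0
    for i, rec in enumerate(original):
        nearest = min(range(len(protected)),
                      key=lambda j: euclidean(rec, protected[j]))
        if nearest == i:
            correct += 1
    return correct / len(original)   # re-identification rate = risk estimate

orig = [(1.0, 2.0), (5.0, 6.0), (9.0, 1.0)]
prot = [(1.1, 2.2), (4.8, 6.1), (8.7, 1.3)]   # lightly perturbed copies
print(record_linkage(orig, prot))  # → 1.0: every record is correctly linked
```

Replacing the Euclidean distance with a parameterized aggregation operator, and optimizing its parameters to maximize the number of correct links, gives the worst-case (transparency-aware) risk estimate discussed in the talk.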
Transparency has an important effect on disclosure risk. In general, masking methods have to be evaluated taking into account that intruders can use all available information to attack the data. When the masking method and its parameters are disclosed, this information can also be used by an intruder. In this talk we will review results on the effects of transparency in disclosure risk assessment for microdata, giving special emphasis to microaggregation.
Masking methods are used in data privacy to avoid the disclosure of sensitive information. Microaggregation is a perturbative masking method that has been proven effective. Data masked using microaggregation can be attacked when the intruder has information about the masking method and the parameters used. Publishing this information is usual under the transparency principle. Fuzzy microaggregation was introduced a few years ago to avoid this type of transparency attack. In this paper we propose a new, simpler method for microaggregation based on fuzzy c-means and discuss the effectiveness of the approach. One of the advantages of this approach is its low computational complexity.
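As background, basic (crisp) univariate microaggregation can be sketched as follows: sort the records, form groups of at least k consecutive values, and replace each value with its group mean. This is an illustrative simplification, not the fuzzy c-means variant proposed in the paper.

```python
def microaggregate(values, k):
    """Minimal univariate microaggregation: sort values, form groups of at
    least k consecutive records, and replace each value with its group mean."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    masked = [0.0] * len(values)
    for start in range(0, len(order), k):
        group = order[start:start + k]
        if len(group) < k and start >= k:        # merge a short tail group
            group = order[start - k:start + k]   # into the previous group
        mean = sum(values[i] for i in group) / len(group)
        for i in group:
            masked[i] = mean
    return masked

print(microaggregate([1.0, 9.0, 2.0, 8.0, 3.0], k=2))
# → [1.5, 6.666..., 1.5, 6.666..., 6.666...]
```

Because the grouping is deterministic given the data and k, an intruder who knows both can mount the transparency attacks that fuzzy microaggregation is designed to prevent.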
Andness directed aggregation is about selecting aggregators from a desired andness level. In this paper we consider operators of the OWA and WOWA families: aggregation functions that permit us to represent some degree of compensation of the input values. In addition to compensation, WOWA permits us to represent importance (weights) of the input values. Selection of appropriate parameters given an andness level will be based on families of fuzzy quantifiers.
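The OWA operator and quantifier-derived weights can be sketched as follows. The quantifier family Q(x) = x**a used here is one common choice for illustration; larger exponents raise the andness, pushing the result toward the minimum of the inputs.

```python
def owa(values, weights):
    """OWA operator: the weights are applied to the values after sorting
    them in decreasing order."""
    return sum(w * v for w, v in zip(weights, sorted(values, reverse=True)))

def quantifier_weights(n, a):
    """Derive n OWA weights from the fuzzy quantifier Q(x) = x**a:
    w_i = Q(i/n) - Q((i-1)/n)."""
    q = lambda x: x ** a
    return [q(i / n) - q((i - 1) / n) for i in range(1, n + 1)]

vals = [0.9, 0.4, 0.7]
print(owa(vals, quantifier_weights(3, 1)))  # a=1: the plain arithmetic mean
print(owa(vals, quantifier_weights(3, 3)))  # a=3: higher andness, nearer the minimum
```

WOWA extends this scheme by combining the quantifier-derived weights with a second weighting vector expressing the importance of each input.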
Choquet integrals integrate functions with respect to fuzzy measures. From a mathematical point of view, these integrals generalize the Lebesgue integral when the measures are additive. From the point of view of aggregation functions, one of their relevant aspects is that they generalize the weighted mean and the OWA operator. Choquet integrals have been successfully used in decision-making problems where there are interactions between criteria. In this setting we can learn or identify the measures from a set of decisions. This fact seems to indicate that we can consider data as generated from distributions based on the Choquet integral. We will present some results on these types of distributions and on their generalizations.
Priorities are essential in the analytic hierarchy process (AHP). Several approaches have been proposed to derive priorities in the framework of the AHP. Priorities correspond to the weights in the weighted mean as well as in other aggregation operators, such as the ordered weighted averaging (OWA) operators and the quasi-arithmetic means.
Derivation of priorities for the AHP typically starts by eliciting a preference matrix from an expert and then using this matrix to obtain the vector of priorities. For consistent matrices, the vector of priorities is unique. Nevertheless, it is usual that the matrix is not consistent; in this case, different methods exist for extracting this vector from the matrix.
This article introduces a method for this purpose when the cells of the matrix are not a single value but a set of values. That is, we have a set-valued preference matrix. We discuss the relation of this type of matrices and hesitant fuzzy preference relations.
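For the single-valued case, one standard method for extracting a priority vector from a pairwise-comparison matrix is the row geometric mean, sketched below; for a consistent matrix (a_ij = w_i / w_j) it recovers the priorities exactly. The set-valued matrices of the article are not covered by this sketch.

```python
import math

def geometric_mean_priorities(matrix):
    """Derive AHP priorities from a pairwise-comparison matrix using the
    row geometric mean, normalized to sum to one."""
    n = len(matrix)
    gm = [math.prod(row) ** (1.0 / n) for row in matrix]
    total = sum(gm)
    return [g / total for g in gm]

# A consistent matrix built from the priorities (0.5, 0.3, 0.2): a_ij = w_i / w_j.
w = [0.5, 0.3, 0.2]
m = [[wi / wj for wj in w] for wi in w]
print(geometric_mean_priorities(m))  # recovers [0.5, 0.3, 0.2]
```

For inconsistent matrices the geometric mean, the principal eigenvector, and least-squares methods generally give different (all defensible) priority vectors.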
In a recent paper we introduced a definition of f-divergence for non-additive measures. In this paper we use this result to give a definition of entropy for non-additive measures in a continuous setting, based on the KL divergence for this type of measure. We prove some properties and show that the entropy can be used to find a measure satisfying the principle of minimum discrimination.