Examples Of Discrimination In Data Mining


Discrimination Prevention in Data Mining for Intrusion and Crime Detection

Pushkar Aswale, Bhagyashree Borade, Siddharth Bhojwani, Niraj Gojumgunde
Dept. of Computer Engineering, MIT Academy of Engineering, Alandi (D), Pune

Abstract: Data mining involves the extraction of implicit, previously unknown, and potentially useful knowledge from large databases. An important issue in data mining is discrimination. Discrimination can be viewed as the act of unfairly treating people on the basis that they belong to a specific group. For instance, individuals may be discriminated against because of their ideology, gender, age, etc. In economics and the social sciences, discrimination has been studied for over half a century. There are several decision-making …

The paper proposes a new solution to the CND problem: instead of relabeling the data set, it introduces a sampling scheme that makes the data discrimination-free. The algorithm used in this paper is a classification algorithm. The goal of classification is to accurately predict the target class for each case in the data. A classifier predicts categorical labels: it builds a model from the training set and the values of a classifying attribute, and then uses that model to classify new data. The techniques used in this paper are pre-processing, preferential sampling, oversampling, and uniform sampling. In pre-processing, the raw data often contains irrelevant, redundant, or noisy records, which makes knowledge discovery during the training stage more difficult. Data preparation and filtering steps can take a considerable amount of processing time. Data pre-processing includes cleaning, normalization, transformation, and feature extraction and selection. In preferential sampling, the process that determines the data locations and the process being modeled are stochastically dependent. Oversampling is the act of sampling a signal at a rate significantly higher than twice its bandwidth or highest frequency. Oversampling helps avoid aliasing, improves resolution, and reduces noise. The sampling rate is given by fs = 2βB, where fs is the sampling frequency, B is the bandwidth or maximum frequency of the signal, and β > 1 is the oversampling factor. Then …
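As an illustration of how a sampling scheme can replace relabeling, the sketch below shows one common reading of uniform sampling. It assumes that the data is split into groups by sensitive attribute value and class label, and that each group is resampled uniformly (with replacement when it must grow) to the size it would have if the sensitive attribute and the class were statistically independent. The function name `uniform_sampling`, the dictionary-based row format, and the toy data are hypothetical and are not taken from the paper.

```python
import random
from collections import defaultdict


def uniform_sampling(rows, sensitive_key, label_key, seed=0):
    """Resample rows so the sensitive attribute and the label look independent.

    Hypothetical helper: each (sensitive value, label) group is shrunk or
    grown to its expected size under independence,
    |S = s| * |C = c| / |D|, by uniform random sampling.
    Combinations that never occur in the input are left empty.
    """
    rng = random.Random(seed)
    n = len(rows)
    groups = defaultdict(list)
    s_counts = defaultdict(int)
    c_counts = defaultdict(int)
    for row in rows:
        s, c = row[sensitive_key], row[label_key]
        groups[(s, c)].append(row)
        s_counts[s] += 1
        c_counts[c] += 1

    resampled = []
    for (s, c), members in groups.items():
        expected = round(s_counts[s] * c_counts[c] / n)
        if expected <= len(members):
            # Over-represented group: keep a uniform subset.
            resampled.extend(rng.sample(members, expected))
        else:
            # Under-represented group: keep all rows and duplicate some uniformly.
            extra = [rng.choice(members) for _ in range(expected - len(members))]
            resampled.extend(members + extra)
    return resampled


# Toy data (assumed): 2 of 10 women and 6 of 10 men carry the positive label.
toy = (
    [{"gender": "f", "label": "+"}] * 2
    + [{"gender": "f", "label": "-"}] * 8
    + [{"gender": "m", "label": "+"}] * 6
    + [{"gender": "m", "label": "-"}] * 4
)
balanced = uniform_sampling(toy, "gender", "label")
```

On this toy data the expected group sizes are 4 positive and 6 negative rows for each gender, so both groups end up with the same positive rate while the data set keeps its original size. Preferential sampling differs in which rows it picks for duplication or removal, which this sketch does not attempt to model.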

The problems outlined above can be eliminated when production data is collected automatically. When production data is collected automatically as it is generated, you can be assured that it is timely, accurate, and unbiased. Until recently, automatically collecting production data was a costly and unreliable proposition. There are, however, negative social perceptions about data mining, among them potential privacy invasion and potential discrimination. The latter consists of unfairly treating people on the basis of their belonging to a specific group. Automated data collection and data mining techniques such as classification rule mining have paved the way to making automated decisions, like loan granting/denial, insurance premium computation, etc. If the training data sets are biased with respect to discriminatory (sensitive) attributes like gender, race, religion, etc., discriminatory decisions may ensue. For this reason, antidiscrimination techniques, including discrimination discovery and prevention, have been introduced in data mining. Services in the information society allow for automatic and routine collection of large amounts of data. Those data are often used to train association/classification rules in view of making automated decisions, like loan granting/denial, insurance premium computation, personnel selection, etc. At first sight, automating decisions may give a sense of fairness: classification rules do not guide themselves by personal
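To make the idea of biased training data concrete, the following sketch computes one simple discrimination measure for a labeled data set: the difference between the positive-decision rate of the favored group and that of the deprived group. This particular measure, the function name `discrimination_score`, and the toy loan records are assumptions chosen for illustration; the text above does not say which measure its discrimination discovery step uses.

```python
def discrimination_score(rows, sensitive_key, deprived_value, label_key, positive_label):
    """Return P(positive | favored) - P(positive | deprived).

    Hypothetical helper: a value near 0 suggests the positive decision is
    spread evenly; a large positive value suggests the deprived group
    receives the positive decision less often.
    """
    deprived = [r for r in rows if r[sensitive_key] == deprived_value]
    favored = [r for r in rows if r[sensitive_key] != deprived_value]
    if not deprived or not favored:
        raise ValueError("both groups must contain at least one row")

    def positive_rate(group):
        return sum(1 for r in group if r[label_key] == positive_label) / len(group)

    return positive_rate(favored) - positive_rate(deprived)


# Toy loan decisions (assumed): 2 of 10 women and 6 of 10 men are granted a loan.
loans = (
    [{"gender": "f", "decision": "grant"}] * 2
    + [{"gender": "f", "decision": "deny"}] * 8
    + [{"gender": "m", "decision": "grant"}] * 6
    + [{"gender": "m", "decision": "deny"}] * 4
)
score = discrimination_score(loans, "gender", "f", "decision", "grant")
```

Here the score is approximately 0.4 (a 60% grant rate for men against 20% for women), so a classifier trained on these records would be likely to reproduce that gap unless the data is corrected first.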