Analysis of Association Rules for Big Data Using Apriori and FP-Growth Techniques
Abstract
Extracting useful information from very large collections of data is difficult, so association rules are proposed to make analysis and decision making easier. Association rule mining is one of the most popular methods in data mining, and its best-known application is market basket analysis. Association rules express the relationships between the items of a data set. In this paper, we analyze the performance of two techniques, Apriori and FP-Growth, for different numbers of instances in a data set. Many tools and software packages are available for mining data, such as R, MEXL, SAS, and XLMINER.
Storing and accessing large volumes of data is not an issue, but extracting the appropriate information from that data is difficult. Data mining techniques make the analysis of such collected data possible. In data mining, we look for relations and patterns among sets of items in large relational databases, which can help in predicting and improving the performance of a system. A well-known approach for finding these relations is association rule mining, which produces rules describing how data items depend on each other. The large number of association rules generated can also be used to classify the kinds or classes of database instances.
Association rule mining can enumerate every relationship, even in a moderately sized dataset. Its goal, however, is not to find all relationships but only the interesting ones, and what counts as interesting depends on the application. A set of rules is therefore generated and then pruned to remove unnecessary association rules. The two key measures of an association rule are support and confidence, both of which are user-defined measures of interestingness. Support reflects the statistical significance of a rule, and confidence reflects its degree of certainty.
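As an illustration of these two measures, the following is a minimal Python sketch, not the implementation used in this paper. It assumes the standard definitions: the support of an itemset is the fraction of transactions that contain it, and the confidence of a rule X -> Y is support(X and Y together) divided by support(X). The basket data, the function names, and the threshold values are hypothetical and chosen only for the example.

```python
from typing import FrozenSet, Iterable, List

Transaction = FrozenSet[str]


def support(itemset: Iterable[str], transactions: List[Transaction]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    items = frozenset(itemset)
    if not transactions:
        return 0.0
    hits = sum(1 for t in transactions if items <= t)
    return hits / len(transactions)


def confidence(antecedent: Iterable[str], consequent: Iterable[str],
               transactions: List[Transaction]) -> float:
    """support(antecedent and consequent together) / support(antecedent)."""
    a = frozenset(antecedent)
    both = a | frozenset(consequent)
    support_a = support(a, transactions)
    if support_a == 0:
        return 0.0  # rule never applies, so treat its confidence as zero
    return support(both, transactions) / support_a


def is_interesting(antecedent: Iterable[str], consequent: Iterable[str],
                   transactions: List[Transaction],
                   min_support: float = 0.4,      # hypothetical threshold
                   min_confidence: float = 0.6    # hypothetical threshold
                   ) -> bool:
    """Keep a rule only if it meets both user-defined thresholds."""
    rule_items = frozenset(antecedent) | frozenset(consequent)
    return (support(rule_items, transactions) >= min_support
            and confidence(antecedent, consequent, transactions) >= min_confidence)


# Hypothetical market-basket data: four transactions.
baskets: List[Transaction] = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]

# {bread, milk} appears in 2 of 4 baskets; bread appears in 3 of 4.
print(support({"bread", "milk"}, baskets))
print(confidence({"bread"}, {"milk"}, baskets))
print(is_interesting({"bread"}, {"milk"}, baskets))
print(is_interesting({"milk"}, {"butter"}, baskets))
```

With these hypothetical thresholds, the rule bread -> milk is kept, while milk -> butter is pruned because the two items occur together in only one of the four baskets.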
Association rules can help doctors with decision making and medical diagnosis by relating the tests performed for a particular disease. The Wisconsin Breast Cancer dataset, with 699 instances and 10 attributes, has been used to extract association rules that provide high accuracy.
Association rule mining also plays an important role in the analysis of road accidents in India. As discussed in papers [3] and [4], the Apriori algorithm is applied to a road accident data set in which the causes of accidents are treated as attributes. The large data set is first divided into a number of clusters, and association rule mining techniques are then applied to each cluster to generate more effective rules. This can help reduce accidents by identifying the main factors and circumstances that cause them, so that they can be avoided.