A. Group Assignment
a. Discuss the two data mining methodologies
Data mining is the process of searching through massive data sets for unsuspected patterns that can provide us with useful information. Data mining can help us predict future events or group populations of people by similar characteristics.
Cross Industry Standard Process for Data Mining (CRISP-DM) is a six-phase model of the entire data mining process. It is commonly used across industries for a wide array of data mining projects and provides a structured approach to planning a data mining project. The six phases are:
Business Understanding – Focuses on understanding the project objectives, requirements
In this phase, we generated a dashboard in SAS from the data set we already had. From the dashboard, we used different variables to create four charts to help us form our hypothesis: a pie chart, a simple bar chart, a stacked bar chart and a needle plot. While creating each chart, we could choose the two corresponding variables, so that we obtained the chart that made the most sense to us.
Data Preparation – Decides which data are used and covers all activities to construct the final dataset from the initial raw data, taking into account the data mining goals, data quality and technical constraints.
In this phase, we had to decide which variables to input and which to reject. We rejected variables that were not relevant to our analysis, as they would not help us reach a conclusion about our hypothesis.
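A minimal Python sketch of this selection step follows, assuming each record is a dictionary from variable name to value. It is not the SAS procedure the group used, and the variable names in `REJECTED_VARIABLES` and in the sample rows are hypothetical placeholders.

```python
from typing import Dict, List, Set

# Hypothetical variables judged irrelevant to the hypothesis (placeholders only).
REJECTED_VARIABLES: Set[str] = {"CustomerPhone", "InvoiceRemarks"}


def prepare_records(
    records: List[Dict[str, object]], rejected: Set[str]
) -> List[Dict[str, object]]:
    """Return new records with every rejected variable removed.

    The input records are left unchanged.
    """
    return [
        {name: value for name, value in record.items() if name not in rejected}
        for record in records
    ]


if __name__ == "__main__":
    # Made-up rows used only to show the filtering.
    raw = [
        {"ItemID": "ITEM-A", "Quantity": 3, "CustomerPhone": "n/a", "InvoiceRemarks": ""},
        {"ItemID": "ITEM-B", "Quantity": 1, "CustomerPhone": "n/a", "InvoiceRemarks": "rush"},
    ]
    for row in prepare_records(raw, REJECTED_VARIABLES):
        print(row)
```

Keeping the rejected variables in one named set makes it easy to record, and later revisit, which variables were excluded from the final data set.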
Modelling – Various specific modelling techniques are selected and applied. Their parameters are calibrated to obtain the optimal
This bar graph shows the quantity sold of each product. Product 2822 sold the most units, accounting for the largest share of the total quantity sold. Hence, product 2822 was the most-bought product in March.
The graph shows that product 2816 has a higher total sales value than product 2822. Even though more units of 2822 than of 2816 were sold during the outbreak, 2822 still did not bring in the most sales revenue.
Thus, the hypothesis is wrong.
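The key point of this result, that the product with the most units sold need not have the highest sales value, can be illustrated with a short Python sketch. The transactions below are made-up numbers chosen only to show the effect; they are not values from the group's SAS data set, and the `(product_id, quantity, unit_price)` layout and the product names are assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (product_id, quantity, unit_price): hypothetical transaction layout.
Transaction = Tuple[str, int, float]


def totals_by_product(
    transactions: List[Transaction],
) -> Tuple[Dict[str, int], Dict[str, float]]:
    """Aggregate total quantity sold and total sales value for each product."""
    quantity: Dict[str, int] = defaultdict(int)
    value: Dict[str, float] = defaultdict(float)
    for product_id, qty, unit_price in transactions:
        quantity[product_id] += qty
        value[product_id] += qty * unit_price
    return dict(quantity), dict(value)


if __name__ == "__main__":
    # Made-up data: product_A sells many cheap units, product_B fewer expensive ones.
    sample = [
        ("product_A", 50, 2.00),
        ("product_B", 10, 20.00),
        ("product_A", 30, 2.00),
    ]
    qty, val = totals_by_product(sample)
    top_by_quantity = max(qty, key=qty.get)
    top_by_value = max(val, key=val.get)
    print("Most units sold:", top_by_quantity)   # prints: Most units sold: product_A
    print("Highest sales value:", top_by_value)  # prints: Highest sales value: product_B
```

Here product_A sells 80 units worth 160.0 in total, while product_B sells only 10 units worth 200.0, so ranking by quantity and ranking by sales value give different answers.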
Deployment:
To increase sales, the company should not focus on products that treat the specific disease during an outbreak, nor should it deliberately stock up on and promote the specific cure. Instead, it should focus on the products that bring in more total sales, as an outbreak is not a major factor in increasing the company's total sales revenue.
Individual – Jan
After studying all the data, I formed the hypothesis that more revenue will be earned when the employee SalesRepFN130 sells the product Item5 CAP 110 MG 10's. A clustering technique is applied to the pharmaceutical data set. To test this hypothesis, I chose the variables Employee ID, Item ID and PNR 8030 to generate the charts.
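The assignment does not say which clustering algorithm SAS applied, so the following is only a sketch of one common choice, k-means (Lloyd's algorithm), in plain Python. The feature vectors are hypothetical: identifiers such as Employee ID or Item ID are categorical, so they would first need to be turned into numeric features (for example, units sold and revenue per employee and item) before a distance-based method like k-means could group them. The farthest-point initialisation is a simple deterministic choice made for this sketch.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def _nearest(point: Point, centroids: Sequence[Point]) -> int:
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def _initial_centroids(points: List[Point], k: int) -> List[Point]:
    """Farthest-point initialisation: start at the first point, then repeatedly
    add the point farthest from every centroid chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    return centroids


def k_means(
    points: List[Point], k: int, max_iter: int = 100
) -> Tuple[List[Point], List[int]]:
    """Lloyd's k-means: returns the centroids and a cluster label per point."""
    centroids = _initial_centroids(points, k)
    for _ in range(max_iter):
        labels = [_nearest(p, centroids) for p in points]
        updated: List[Point] = []
        for i in range(k):
            members = [p for p, label in zip(points, labels) if label == i]
            # Mean of each coordinate; an empty cluster keeps its old centroid.
            updated.append(
                tuple(sum(axis) / len(members) for axis in zip(*members))
                if members
                else centroids[i]
            )
        if updated == centroids:  # converged: assignments no longer change
            break
        centroids = updated
    # Final labels always match the returned centroids.
    labels = [_nearest(p, centroids) for p in points]
    return centroids, labels


if __name__ == "__main__":
    # Hypothetical 2-D feature vectors, e.g. scaled (units sold, revenue) pairs.
    data = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (10.0, 10.0), (10.2, 9.8), (9.9, 10.1)]
    centers, assignment = k_means(data, k=2)
    print(assignment)  # prints [0, 0, 0, 1, 1, 1]
```

Because k-means uses Euclidean distance, features on very different scales (such as unit counts and revenue in dollars) would normally be rescaled first so that one feature does not dominate the clustering.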