Harrisburg University of Science and Technology
Course: ANALYSIS 500
Subject: Information Systems
Date: Dec 18, 2024
Correlation
Daniela Atayde
2024-08-28

Citations: ChatGPT, Google, Chap 8 Slides

Title: Big Data Analytics Services for Enhancing Business Intelligence

Abstract: This article examines how to use big data analytics services to enhance business intelligence (BI). More specifically, this article proposes an ontology of big data analytics and presents a big data analytics service-oriented architecture (BASOA), and then applies BASOA to BI, where our surveyed data analysis shows that the proposed BASOA is viable for enhancing BI and enterprise information systems. This article also explores temporality, expectability, and relativity as the characteristics of intelligence in BI. These characteristics are what customers and decision makers expect from BI in terms of systems, products, and services of organizations. The proposed approach in this article might facilitate the research and development of business analytics, big data analytics, and BI as well as big data science and big data computing.

Dataset:
- Gender of the participant surveyed on these topics
- Temporality: an average score of the rated ability to adapt to change over time, from 1 (not changing) to 7 (changing a lot)
- Expectability: a rated degree of satisfaction with the BI
- Relativity: an average score rating how much better one system is than another in BI, from 1 (not very good) to 7 (very good)
- Positive emotion: how positive participants felt about BI (higher scores are more positive; ranges from 1 to 7)

##Load the data
correldata = read.csv("07_data.csv")

Data Screening:

Accuracy:
a. Include output that indicates if the data are or are not accurate.
b. If the data are not accurate, delete the inaccurate scores.
c. Include a summary that shows that you fixed the inaccurate scores.

summary(correldata)
##     gender           temporality    expectability     relativity
##  Length:300         Min.   :1.737   Min.   :0.000   Min.   :-2.301
##  Class :character   1st Qu.:2.823   1st Qu.:2.000   1st Qu.: 2.439
##  Mode  :character   Median :3.581   Median :3.000   Median : 3.564
##                     Mean   :3.532   Mean   :3.643   Mean   : 3.569
##                     3rd Qu.:4.225   3rd Qu.:5.000   3rd Qu.: 4.731
##                     Max.   :5.184   Max.   :9.000   Max.   :10.508
##                     NA's   :9       NA's   :9       NA's   :9
##     positive
##  Min.   :-0.9128
##  1st Qu.: 2.0175
##  Median : 3.0780
##  Mean   : 3.1446
##  3rd Qu.: 4.2614
##  Max.   : 8.2126
##  NA's   :9

correldata$relativity[correldata$relativity < 1] = NA
correldata$relativity[correldata$relativity > 7] = NA
correldata$positive[correldata$positive < 1] = NA
correldata$positive[correldata$positive > 7] = NA
summary(correldata)
##     gender           temporality    expectability     relativity
##  Length:300         Min.   :1.737   Min.   :0.000   Min.   :1.065
##  Class :character   1st Qu.:2.823   1st Qu.:2.000   1st Qu.:2.625
##  Mode  :character   Median :3.581   Median :3.000   Median :3.618
##                     Mean   :3.532   Mean   :3.643   Mean   :3.650
##                     3rd Qu.:4.225   3rd Qu.:5.000   3rd Qu.:4.676
##                     Max.   :5.184   Max.   :9.000   Max.   :6.952
##                     NA's   :9       NA's   :9       NA's   :44
##     positive
##  Min.   :1.053
##  1st Qu.:2.337
##  Median :3.250
##  Mean   :3.423
##  3rd Qu.:4.396
##  Max.   :6.918
##  NA's   :41
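The accuracy step above repeats the same out-of-range replacement for each column. A minimal sketch of the same idea as a reusable function follows. The helper name `out_of_range_to_na` and the toy scores are assumptions for illustration and are not part of the original assignment.

```r
# Hypothetical helper (not in the original): set values outside [lo, hi] to NA.
# Existing NAs are left untouched.
out_of_range_to_na <- function(x, lo, hi) {
  x[!is.na(x) & (x < lo | x > hi)] <- NA
  x
}

# Toy scores standing in for a column rated on a 1 to 7 scale.
scores <- c(-2.3, 1.1, 3.6, 6.9, 10.5, NA)
cleaned <- out_of_range_to_na(scores, lo = 1, hi = 7)

cleaned              # the two impossible scores are now NA
sum(is.na(cleaned))  # count of missing values after cleaning
```

Applied to the real data, the call would look like `correldata$relativity <- out_of_range_to_na(correldata$relativity, 1, 7)`, assuming the 1 to 7 range given in the dataset description.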
Missing:
a. Since any accuracy errors will create more than 5% missing data, exclude all data pairwise for the rest of the analyses.

nomissing = correldata
summary(nomissing)
##     gender           temporality    expectability     relativity
##  Length:300         Min.   :1.737   Min.   :0.000   Min.   :1.065
##  Class :character   1st Qu.:2.823   1st Qu.:2.000   1st Qu.:2.625
##  Mode  :character   Median :3.581   Median :3.000   Median :3.618
##                     Mean   :3.532   Mean   :3.643   Mean   :3.650
##                     3rd Qu.:4.225   3rd Qu.:5.000   3rd Qu.:4.676
##                     Max.   :5.184   Max.   :9.000   Max.   :6.952
##                     NA's   :9       NA's   :9       NA's   :44
##     positive
##  Min.   :1.053
##  1st Qu.:2.337
##  Median :3.250
##  Mean   :3.423
##  3rd Qu.:4.396
##  Max.   :6.918
##  NA's   :41

Outliers:
a. Include a summary of your mahal scores.
b. What are the df for your Mahalanobis cutoff? 4
c. What is the cut off score for your Mahalanobis measure? 18.46683
d. How many outliers did you have? No outliers (all 207 rows with a Mahalanobis score fall below the cutoff; the 93 rows with missing values receive no score)

mahal = mahalanobis(nomissing[ , -1],
                    colMeans(nomissing[ , -1], na.rm = TRUE),
                    cov(nomissing[ , -1], use = "pairwise.complete.obs"))
cutoff = qchisq(1 - .001, ncol(nomissing[ , -1]))
cutoff
## [1] 18.46683
ncol(nomissing[ , -1])
## [1] 4
summary(mahal < cutoff)
##    Mode    TRUE    NA's
## logical     207      93
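To show what the cutoff does when an outlier is actually present, here is a small sketch on simulated data. The data frame `toy`, its column names, and the planted extreme value are all assumptions made for illustration; only `mahalanobis()` and `qchisq()` come from the analysis above.

```r
set.seed(1)

# Simulated stand-in for four continuous survey variables (hypothetical data).
toy <- data.frame(a = rnorm(50), b = rnorm(50), c = rnorm(50), d = rnorm(50))
toy$a[1] <- 12  # plant one extreme score in the first row

# Same cutoff rule as the analysis: chi-square at p < .001, df = number of variables.
cutoff <- qchisq(1 - .001, df = ncol(toy))

mahal <- mahalanobis(toy, colMeans(toy), cov(toy))
outlier <- mahal > cutoff

table(outlier)           # how many rows exceed the cutoff
clean <- toy[!outlier, ] # keep only rows below the cutoff
```

With four variables the cutoff equals the 18.46683 reported above. In the real data, rows with missing values get an `NA` score, so a filter such as `mahal < cutoff` would need to decide explicitly how to treat them.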
Assumptions:

Linearity:
a. Include a picture that shows how you might assess multivariate linearity.
b. Do you think you've met the assumption for linearity? Yes

NM = nomissing
random = rchisq(nrow(NM), 7)
fake = lm(random ~ ., data = NM)
standardized = rstudent(fake)
fitted = scale(fake$fitted.values)
{qqnorm(standardized)
abline(0,1)}

Normality:
a. Include a picture that shows how you might assess multivariate normality.
b. Do you think you've met the assumption for normality? Yes

hist(standardized, breaks = 15)
Homogeneity and Homoscedasticity:
a. Include a picture that shows how you might assess multivariate homogeneity.
b. Do you think you've met the assumption for homogeneity? Yes
c. Do you think you've met the assumption for homoscedasticity? Yes

{plot(fitted, standardized)
abline(0,0)
abline(v = 0)}
Hypothesis Testing / Graphs:

Create a scatter plot of temporality and relativity.
a. Be sure to check x/y axis labels and length.
b. What type of relationship do these two variables appear to have? Positive

library("ggplot2")
cleanup <- theme(panel.grid.major = element_blank(),
                 panel.grid.minor = element_blank(),
                 panel.background = element_blank(),
                 axis.line.x = element_line(color = "black"),
                 axis.line.y = element_line(color = "black"),
                 legend.key = element_rect(fill = "white"),
                 text = element_text(size = 15))
scatter <- ggplot(NM, aes(temporality, relativity))
scatter +
  geom_point() +
  geom_smooth(method = "lm", color = "red") +
  xlab("Temporality") +
  ylab("Relativity")
## `geom_smooth()` using formula = 'y ~ x'
## Warning: Removed 52 rows containing non-finite outside the scale range
## (`stat_smooth()`).
## Warning: Removed 52 rows containing missing values or values outside the scale range
## (`geom_point()`).

cor(NM$relativity, NM$temporality)
## [1] NA

Create a scatter plot of expectability and positive emotion.
a. Include a linear line on the graph.
b. Be sure to check x/y axis labels and length.
c. What type of relationship do these two variables appear to have? The correlation between the two variables is close to 0, so they have almost no relationship.

scatter <- ggplot(NM, aes(expectability, positive))
scatter +
  cleanup +
  geom_point() +
  geom_smooth(method = "lm", color = "navy") +
  xlab("expectability") +
  ylab("positive") +
  coord_cartesian(ylim = c(0.5, 7.5))
## `geom_smooth()` using formula = 'y ~ x'
## Warning: Removed 49 rows containing non-finite outside the scale range
## (`stat_smooth()`).
## Warning: Removed 49 rows containing missing values or values outside the scale range
## (`geom_point()`).

Create a scatter plot of expectability and relativity, grouping by gender.
a. Include a linear line on the graph.
b. Be sure to check x/y axis labels and length.
c. What type of relationship do these two variables appear to have for each group? No relationship for women; a slight positive relationship for men.

scp2 = ggplot(NM, aes(expectability, relativity, color = gender))
scp2 +
  cleanup +
  geom_point() +
  geom_smooth(method = "lm") +
  xlab("Expectability rate") +
  ylab("Relativity rate") +
  coord_cartesian(ylim = c(0.5, 7.5)) +
  scale_fill_discrete(name = "Gender", labels = c("Men", "Women")) +
  scale_fill_discrete(name = "Gender", labels = c("Men", "Women"))
## Scale for fill is already present.
## Adding another scale for fill, which will replace the existing scale.
## `geom_smooth()` using formula = 'y ~ x'
## Warning: Removed 52 rows containing non-finite outside the scale range
## (`stat_smooth()`).
## Warning: Removed 52 rows containing missing values or values outside the scale range
## (`geom_point()`).

Include a correlation table of all of the variables (cor).
a. Include the output for Pearson.
b. Include the output for Spearman.
c. Include the output for Kendall.
d. Which correlation was the strongest? Temporality and gender
e. For the correlations with gender, would point biserial or biserial be more appropriate? Why? Point-biserial, because gender is a true dichotomy (two discrete groups) rather than a continuous variable that was artificially split in two.
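The `NA` returned by `cor()` earlier in this section is what base R gives when either variable contains missing values and no `use` argument is supplied. A minimal sketch of a pairwise-complete correlation table for all three methods follows. The data frame `toy` and its columns are simulated assumptions, not the survey data.

```r
set.seed(42)

# Simulated stand-in data with a couple of missing values (hypothetical).
toy <- data.frame(x = rnorm(100))
toy$y <- 0.5 * toy$x + rnorm(100)
toy$z <- rnorm(100)
toy$y[c(3, 17)] <- NA

# Default behaviour: any missing value makes the result NA.
cor(toy$x, toy$y)

# Pairwise deletion, one table per correlation method.
methods <- c("pearson", "spearman", "kendall")
tables <- lapply(methods, function(m) {
  cor(toy, use = "pairwise.complete.obs", method = m)
})
names(tables) <- methods

round(tables$pearson, 3)
round(tables$spearman, 3)
round(tables$kendall, 3)
```

For the homework data, the equivalent call would presumably be `cor(NM[ , -1], use = "pairwise.complete.obs", method = ...)`, since the first column (gender) is a character variable and would need numeric coding before it could enter the table.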
##
##  Pearson's product-moment correlation
##
## data:  relativity and temporality
## t = 4.0776, df = 246, p-value = 6.147e-05
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.1311602 0.3647511
## sample estimates:
##       cor
## 0.2516164

Calculate the difference in correlations for 1) temporality and expectability and 2) temporality and positive emotion.
a. Include the output from the test through Pearson's test.
b. Is there a significant difference in their correlations? Yes, the p-value is less than 0.05.

#install.packages("ppcor")
library(ppcor)
## Loading required package: MASS
with(NM, cor.test(temporality, expectability))
##
##  Pearson's product-moment correlation
##
## data:  temporality and expectability
## t = 5.4191, df = 281, p-value = 1.289e-07
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.1981097 0.4095120
## sample estimates:
##       cor
## 0.3076019
with(NM, cor.test(temporality, positive))
##
##  Pearson's product-moment correlation
##
## data:  temporality and positive
## t = -4.1958, df = 249, p-value = 3.785e-05
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.3690471 -0.1375268
## sample estimates:
##        cor
## -0.2569702
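The t statistic printed by `cor.test()` can be recovered from the correlation and its degrees of freedom with t = r * sqrt(df) / sqrt(1 - r^2). The sketch below plugs in the values reported above for temporality and expectability; the variable names are chosen here for illustration.

```r
# Values copied from the cor.test() output for temporality and expectability.
r  <- 0.3076019
df <- 281

# t statistic for H0: true correlation equals 0.
t_stat <- r * sqrt(df) / sqrt(1 - r^2)

# Two-sided p-value from the t distribution.
p_val <- 2 * pt(-abs(t_stat), df)

t_stat  # should agree with the reported t = 5.4191 up to rounding
p_val   # should agree with the reported p-value = 1.289e-07 up to rounding
```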
#install.packages("cocor")
library(cocor)
cocor(~temporality + expectability | temporality + positive, data = NM[ , -1])
##
##   Results of a comparison of two overlapping correlations based on dependent groups
##
## Comparison between r.jk (temporality, expectability) = 0.2999 and r.jh (temporality, positive) = -0.2421
## Difference: r.jk - r.jh = 0.542
## Related correlation: r.kh = -0.0468
## Data: NM[, -1]: j = temporality, k = expectability, h = positive
## Group size: n = 244
## Null hypothesis: r.jk is equal to r.jh
## Alternative hypothesis: r.jk is not equal to r.jh (two-sided)
## Alpha: 0.05
##
## pearson1898: Pearson and Filon's z (1898)
##   z = 6.4324, p-value = 0.0000
##   Null hypothesis rejected
##
## hotelling1940: Hotelling's t (1940)
##   t = 6.2783, df = 241, p-value = 0.0000
##   Null hypothesis rejected
##
## williams1959: Williams' t (1959)
##   t = 6.2766, df = 241, p-value = 0.0000
##   Null hypothesis rejected
##
## olkin1967: Olkin's z (1967)
##   z = 6.4324, p-value = 0.0000
##   Null hypothesis rejected
##
## dunn1969: Dunn and Clark's z (1969)
##   z = 6.0761, p-value = 0.0000
##   Null hypothesis rejected
##
## hendrickson1970: Hendrickson, Stanley, and Hills' (1970) modification of Williams' t (1959)
##   t = 6.2777, df = 241, p-value = 0.0000
##   Null hypothesis rejected
##
## steiger1980: Steiger's (1980) modification of Dunn and Clark's z (1969) using average correlations
##   z = 5.9687, p-value = 0.0000
##   Null hypothesis rejected
##
## meng1992: Meng, Rosenthal, and Rubin's z (1992)
##   z = 5.8685, p-value = 0.0000
##   Null hypothesis rejected
##   95% confidence interval for r.jk - r.jh: 0.3706 0.7423
##   Null hypothesis rejected (Interval does not include 0)
##
## hittner2003: Hittner, May, and Silver's (2003) modification of Dunn and Clark's z (1969) using a backtransformed average Fisher's (1921) Z procedure
##   z = 5.9685, p-value = 0.0000
##   Null hypothesis rejected
##
## zou2007: Zou's (2007) confidence interval
##   95% confidence interval for r.jk - r.jh: 0.3709 0.7019
##   Null hypothesis rejected (Interval does not include 0)

Calculate the difference in correlations for gender on temporality and relativity.
a. Include the output from the test.
b. Is there a significant difference in their correlations? No. Fisher's z gives p = 0.0654, which is greater than 0.05, and Zou's confidence interval includes 0, so the null hypothesis is retained.

library(cocor)
men = subset(NM, gender == "men")
women = subset(NM, gender == "women")
genderlist = list(men, women)
cocor(~temporality + relativity | temporality + relativity, data = genderlist)
##
##   Results of a comparison of two correlations based on independent groups
##
## Comparison between r1.jk (temporality, relativity) = 0.1688 and r2.hm (temporality, relativity) = -0.0666
## Difference: r1.jk - r2.hm = 0.2355
## Data: genderlist: j = temporality, k = relativity, h = temporality, m = relativity
## Group sizes: n1 = 130, n2 = 118
## Null hypothesis: r1.jk is equal to r2.hm
## Alternative hypothesis: r1.jk is not equal to r2.hm (two-sided)
## Alpha: 0.05
##
## fisher1925: Fisher's z (1925)
##   z = 1.8427, p-value = 0.0654
##   Null hypothesis retained
##
## zou2007: Zou's (2007) confidence interval
##   95% confidence interval for r1.jk - r2.hm: -0.0153 0.4764
##   Null hypothesis retained (Interval includes 0)

Calculate the partial and semipartial correlations for all variables, and include the output.
a. Are any of the correlations significant after controlling for all other relationships? Partial correlation

#install.packages("ppcor")
library(ppcor)
summary(NM)
##     gender           temporality    expectability     relativity
##  Length:300         Min.   :1.737   Min.   :0.000   Min.   :1.065
##  Class :character   1st Qu.:2.823   1st Qu.:2.000   1st Qu.:2.625
##  Mode  :character   Median :3.581   Median :3.000   Median :3.618
##                     Mean   :3.532   Mean   :3.643   Mean   :3.650
##                     3rd Qu.:4.225   3rd Qu.:5.000   3rd Qu.:4.676
##                     Max.   :5.184   Max.   :9.000   Max.   :6.952
##                     NA's   :9       NA's   :9       NA's   :44
##     positive
##  Min.   :1.053
##  1st Qu.:2.337
##  Median :3.250
##  Mean   :3.423
##  3rd Qu.:4.396
##  Max.   :6.918
##  NA's   :41
data_NM = na.omit(NM)
summary(data_NM)
##     gender           temporality    expectability    relativity
##  Length:207         Min.   :1.737   Min.   :0.00   Min.   :1.065
##  Class :character   1st Qu.:2.865   1st Qu.:2.00   1st Qu.:2.634
##  Mode  :character   Median :3.522   Median :3.00   Median :3.591
##                     Mean   :3.511   Mean   :3.57   Mean   :3.661
##                     3rd Qu.:4.188   3rd Qu.:5.00   3rd Qu.:4.684
##                     Max.   :5.102   Max.   :8.00   Max.   :6.952
##     positive
##  Min.   :1.053
##  1st Qu.:2.284
##  Median :3.193
##  Mean   :3.395
##  3rd Qu.:4.378
##  Max.   :6.918
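The partial and semipartial output itself does not appear above. As a sketch of what those quantities mean, the example below computes both by hand from regression residuals in base R, using simulated variables `x`, `y`, and `z` that are assumptions for illustration. The `ppcor` package loaded above provides `pcor()` and `spcor()` for the same purpose on a complete-case numeric data set.

```r
set.seed(7)
n <- 200

# Simulated variables (hypothetical): z influences both x and y.
z <- rnorm(n)
x <- 0.6 * z + rnorm(n)
y <- 0.6 * z + 0.3 * x + rnorm(n)

# Remove the linear influence of z from each variable.
x_res <- resid(lm(x ~ z))
y_res <- resid(lm(y ~ z))

# Partial correlation: z removed from BOTH x and y.
partial <- cor(x_res, y_res)

# Semipartial correlation: z removed from x only, y left as observed.
semipart <- cor(x_res, y)

# Cross-check the partial correlation against the textbook formula.
rxy <- cor(x, y); rxz <- cor(x, z); ryz <- cor(y, z)
partial_formula <- (rxy - rxz * ryz) / sqrt((1 - rxz^2) * (1 - ryz^2))

c(zero_order = rxy, partial = partial, semipartial = semipart)
```

The semipartial value is never larger in magnitude than the partial value, because the semipartial keeps the variance that `z` explains in `y` in the denominator.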
Theory:
- What are we using as our model for understanding the data in a correlational analysis?
> R or B
- How might we determine model fit?
> Confidence interval
- What is the difference between correlation and covariance?
> Covariance indicates the direction of the relationship between two variables, but its size depends on the units of measurement. Correlation is the standardized covariance, so it conveys both the strength and the direction of the relationship on a scale from -1 to 1.
- What is the difference between R and r?
> R is the multiple correlation from a multiple regression.
> r is the simple bivariate correlation.
- When would I want to use a nonparametric correlation over Pearson's correlation?
> Nonparametric (Spearman or Kendall) when the data are not normally distributed, the relationship is not linear, there are outliers, or the data are ordinal.
> Pearson's when those assumptions are met.
- What is the distinction between semi-partial and partial correlations?
> Partial correlation removes the influence of one or more other variables from both variables of interest, showing how strongly the two are related once that influence is gone.
> Semi-partial correlation removes that influence from only one of the two variables.
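The covariance versus correlation answer above can be checked numerically: rescaling a variable changes the covariance but leaves the correlation untouched. The sketch below uses simulated `x` and `y`, which are assumptions for illustration only.

```r
set.seed(3)

# Simulated pair of related variables (hypothetical data).
x <- rnorm(50)
y <- 2 * x + rnorm(50)

# Covariance depends on the measurement scale...
cov_raw    <- cov(x, y)
cov_scaled <- cov(100 * x, y)   # e.g. metres converted to centimetres

# ...while correlation does not.
cor_raw    <- cor(x, y)
cor_scaled <- cor(100 * x, y)

# Correlation is covariance divided by the product of the standard deviations.
cor_by_hand <- cov(x, y) / (sd(x) * sd(y))

c(cov_raw = cov_raw, cov_scaled = cov_scaled)
c(cor_raw = cor_raw, cor_scaled = cor_scaled, cor_by_hand = cor_by_hand)
```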