BM9717-Data Management and Visualization - 23046942

School: Northumbria University
Course: MARKETING NX0474
Subject: Marketing
Date: Jan 9, 2025
BM9717 DATA MANAGEMENT AND VISUALIZATION
Full Name of Student: Nandhini Sathyamoorthy
ID Number: W23046942
Programme Name: MSc Business Analytics
Word Count: 3052
Abstract

This study presents a comprehensive analysis of a dataset on Adidas and offers key insights for strategic decision-making. The dataset covers consumer behavior, market trends, economic indicators, and the effect of Adidas' marketing strategies. The literature review explores these topics and gives an understanding of the athletic apparel industry. The findings highlight the dynamic interaction between consumer preferences, market dynamics, and strategic marketing in Adidas' business planning. Analysis of the Adidas dataset revealed patterns in pricing, customer preferences, and predictive modeling performance. The analysis covered data summaries, model evaluation metrics, and visualizations, offering a broad view for strategic decision-making. The findings improve understanding of market dynamics and support informed business strategies for Adidas. Finally, this analysis contributes perspectives that can help Adidas navigate a rapidly evolving market and sustain growth and competitiveness.
Table of Contents

Introduction
Literature review of related work
    Consumer Behavior and Preferences
    Market Trends and Dynamics
    Economic Indicators and Consumer Spending
    Data Analytics and Predictive Modeling
Data exploration
Experiments
Results
Discussion, Conclusions, and Future Work
References
Introduction

The dataset supports a comprehensive assessment of consumer behavior and market trends, and it is used here to organize significant insights into contemporary market dynamics. It contains a wide range of variables, including consumer preferences, purchasing patterns, and economic indicators, making it an essential resource for analysts and decision-makers. Its importance lies in giving stakeholders an understanding of market forces, which supports informed decision-making and strategic planning.

Embedded in this dataset are valuable insights that illuminate the complexities of consumer behavior, shedding light on emerging trends and latent opportunities. By examining the fine details of consumer choices and preferences, organizations can refine their marketing strategies, tailor product offerings, and gain a competitive advantage (Yang et al. 2022). Because the dataset allows patterns and correlations among variables to be detected, it enables the identification of hidden market dynamics and helps in anticipating future trends. In an era in which data-driven decision-making is vital, this dataset serves as a foundation for organizations trying to navigate the complex landscape of consumer markets.

Data Source

Kaggle: https://www.kaggle.com/datasets/whenamancodes/adidas-us-retail-products-dataset

Literature review of related work

Consumer Behavior and Preferences

For Adidas' market segments, a current examination of consumer behavior and preferences is fundamental. The literature on this subject comprises studies and scholarly articles on the factors that shape consumer choices within the sportswear industry. Understanding consumer behavior involves analyzing the psychological and sociological factors that influence people when they choose Adidas products over competitors (Breunig et al. 2020).

This topic covers the effect of brand loyalty, lifestyle trends, social influences, and the role of advertising in shaping consumer attraction. By synthesizing existing research, one can understand the motivations for choosing Adidas within the diverse sportswear market.
Figure 1: Adidas market trends analysis (Source: Qin et al. 2020)

These factors play an essential part in shaping consumer choices, and the literature in this area examines why certain demographics favor specific Adidas products (Qin et al. 2020). This includes an examination of design and performance, the importance of sustainable practices, and the appeal of technological innovations integrated into Adidas merchandise. By conducting this literature review, businesses can gain valuable insights into the ever-evolving landscape of consumer behavior and preferences (Kudale et al. 2022). This can assist in formulating market strategies that resonate with the diverse and dynamic preferences of the target audience.

Market Trends and Dynamics

A literature review of market trends and dynamics in the context of Adidas' strategies reveals a range of insights that are critical for navigating the competitive sportswear landscape. It shows the emerging trends that shape consumer expectations and industry developments, and gives insight into the complexities of market dynamics (Mohapatra et al. 2023). This research investigates the effect of technological developments on product improvement, highlighting the importance of keeping up with these developments to maintain a competitive edge (Ostadabbas et al. 2021). Market trends include shifts in consumer demand, notably the rising emphasis on sustainable and ethically sourced products, which has become noticeable in recent years.
Figure 2: Adidas online market trend analysis (Source: Gomes et al. 2020)

The dynamics of the athletic sportswear market are examined in relation to economic factors, global events, and cultural shifts. To adapt its strategy so that it appeals to customers and stays relevant in a rapidly changing environment, Adidas needs to understand these trends. The literature in this area frequently draws on case studies, industry reports, and consumer surveys to extract patterns and projections, helping organizations anticipate and respond effectively to market shifts (Gomes et al. 2020). By engaging with existing research, organizations can gain meaningful insights into the pulse of the market and develop strategies that align with emerging trends and dynamics, supporting sustained growth and market leadership.

Economic Indicators and Consumer Spending

The study of economic indicators and consumer spending in the literature is essential for organizations like Adidas that aim to navigate the unpredictable landscape of the athletic apparel industry. This literature examines the relationship between economic conditions and consumer behavior and sets out the effect of macroeconomic factors on spending patterns. Studies frequently show how factors such as GDP growth, inflation rates, and employment levels influence consumers' purchasing power and thereby their willingness to spend on discretionary items such as athletic apparel (Gomes et al. 2020). As this literature demonstrates, companies like Adidas need to follow economic forecasts and indicators because they act as a barometer for potential shifts in consumer spending behavior.

Figure 3: Adidas economic development (Source: Zhang et al. 2022)

This research gives insight into the psychological aspects of consumer spending during economic fluctuations. It investigates how Adidas' sales and market position are affected by perceptions of economic stability or uncertainty. Understanding the complex interplay between economic conditions and consumer spending is instrumental in planning adaptable business strategies (Zhang et al. 2022). With this knowledge the company can adapt its pricing, marketing, and product positioning strategies in response to economic conditions, ensuring agility in a market that is influenced by macroeconomic factors.

Data exploration

For data exploration, the analyst examined the dataset of 845 records using R and several libraries, including readr, dplyr, ggplot2, and tidyr. The dataset includes attributes such as the URL, product details, pricing information, and categorical classifications. The analyst carried out an initial exploration of the structure and summary statistics of the dataset, addressed missing values, and performed data manipulations. Regression and classification models were then fitted to predict selling prices and product categories, respectively (Yang et al. 2022). Evaluation metrics such as RMSE, MAE, confusion matrices, and precision, recall, and F1 scores were used to check model performance. Overall, this exploration clarified the structure of the dataset and prepared the ground for informed modeling and insight into the underlying patterns in the data.
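As an illustration of the classification measures named above, the following is a minimal sketch in base R of how accuracy, precision, recall, and F1 can be derived from a confusion matrix. The labels are hypothetical toy values, not records from the Adidas dataset, and the helper name class_metrics is introduced here only for illustration.

```r
# Hypothetical toy labels (not taken from the Adidas data)
lvls      <- c("Accessories", "Clothing", "Shoes")
actual    <- factor(c("Shoes", "Shoes", "Shoes", "Shoes", "Clothing", "Accessories"),
                    levels = lvls)
predicted <- factor(c("Shoes", "Shoes", "Clothing", "Clothing", "Clothing", "Accessories"),
                    levels = lvls)

# Rows hold the predictions, columns hold the true classes
conf_matrix <- table(Predicted = predicted, Actual = actual)

# Share of all observations that sit on the diagonal
accuracy <- sum(diag(conf_matrix)) / sum(conf_matrix)

# Illustrative helper: per-class precision, recall and F1
class_metrics <- function(cm, cls) {
  tp <- cm[cls, cls]
  fp <- sum(cm[cls, ]) - tp   # predicted as cls, but truly another class
  fn <- sum(cm[, cls]) - tp   # truly cls, but predicted as another class
  precision <- tp / (tp + fp)
  recall    <- tp / (tp + fn)
  f1        <- 2 * precision * recall / (precision + recall)
  c(precision = precision, recall = recall, f1 = f1)
}

metrics <- sapply(lvls, function(cls) class_metrics(conf_matrix, cls))
print(conf_matrix)
print(round(metrics, 3))
```

Because the rows of the table are predictions, false positives come from the row total and false negatives from the column total. If the table were built the other way round, the two would swap.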
Figure 5: Basic structure of the dataset

The dataset, consisting of 845 records, has a structured layout with attributes that give insight into Adidas products. It includes the URL, product name, SKU, selling price, original price, currency, availability, color, category, source, and more. The dataset is organized systematically, which facilitates thorough data analysis. Key numerical attributes such as selling price, original price, average rating, and reviews count show the variability in the dataset. Categorical variables, including currency, availability, color, category, and country, provide a clear classification scheme.
Figure 6: Summary statistics

The summary statistics for the Adidas dataset of 845 records show a diverse range of product attributes. Selling prices range from 9 to 240, with a median of 48, which suggests a varied pricing landscape. Notably, the original price consistently registers as zero, pointing to potential data anomalies or to cases in which the original price was not recorded. The number of reviews varies widely, from 1 to 11,750, indicating varying levels of customer engagement, and the average rating ranges from 1.0 to 5.0, showing the diversity of product satisfaction. The categorical summaries show that USD is the dominant currency, that most products are available ('InStock': 842), and that the predominant colors are White (222) and Black (187).
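One possible explanation for an original price column that is entirely zero is that the raw values are stored as text with a currency symbol, so that a direct numeric conversion yields missing values which are later replaced by zero. This is an assumption, not something the summary confirms. The sketch below uses hypothetical strings to show the difference between direct conversion and stripping non-numeric characters first.

```r
# Hypothetical raw values; the assumption is that prices were scraped as text
raw_price <- c("$55", "$120", "", NA, "$48.5")

# Direct coercion cannot read the currency symbol and returns NA for every entry
direct <- suppressWarnings(as.numeric(raw_price))

# Keep only digits and the decimal point, then coerce
parsed <- as.numeric(gsub("[^0-9.]", "", raw_price))

print(direct)
print(parsed)

# Number of values that are still missing after parsing
n_missing <- sum(is.na(parsed))
print(n_missing)
```

With this approach only the empty and missing entries remain NA, so a later replacement by zero would affect far fewer records. Whether zero is a sensible fill value for a price is a separate modeling decision.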
Figure 7: Checking null values

During the exploration of the Adidas dataset, a careful check for missing values was carried out. The dataset of 845 records was examined for null values across its attributes. The check showed that certain columns, such as original_price, held no usable value in any record, suggesting possible data errors or cases in which this information was not recorded. This observation prompted the use of data imputation, in which original_price was converted to a numeric type and its missing values were replaced with 0.

Experiments

This section covers the stages of data preparation, manipulation, analysis, and visualization carried out on the Adidas dataset. The dataset of 845 records went through systematic cleaning, including changes of data types, handling of missing values, and the creation of new features such as discount_percentage (Yang et al. 2022). Categorical variables were explicitly converted to factors, and outliers in selling_price were identified through boxplots.

Figure 8: Handling null values
A careful check revealed missing values in several columns, which called for a deliberate approach to handling them. Using libraries such as readr, dplyr, ggplot2, and tidyr, the analyst applied data imputation, replacing missing values in the original price column with zeros.

Figure 9: Converting columns and creating new features

Variables such as color, category, and availability were converted into factor types. This step facilitated the subsequent analyses and model building. In addition, a new feature, discount_percentage, was created. The discount percentage feature enriched the dataset with information for the subsequent exploratory and statistical analyses and extended its analytical depth.

Figure 10: Checking for outliers
The data exploration identified outliers in the 'selling_price' variable through boxplot analysis. This graphical analysis revealed possible anomalies in the distribution of selling prices, highlighting values that deviate substantially from the norm. Such care in outlier detection supports the reliability of the subsequent analyses and models.

Figure 11: Dropping unnecessary columns and summary statistics of numerical and categorical data

To focus the analysis, unnecessary columns such as 'url' were dropped. Following this, a concise summary of the categorical data (currency, availability, color, category, country) and the numerical data (selling price, original price, average rating, reviews count, discount percentage) was presented. This reduction tidies the dataset and makes the key variables easier to understand.
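The boxplot-based outlier check described above can be expressed numerically. The following is a minimal sketch in base R using the common 1.5 x IQR rule. The price vector is hypothetical, and the report does not state which rule or threshold was actually applied.

```r
# Hypothetical selling prices (not the actual dataset)
selling_price <- c(9, 25, 30, 40, 48, 55, 60, 70, 85, 240)

# Quartiles and interquartile range
q1  <- unname(quantile(selling_price, 0.25))
q3  <- unname(quantile(selling_price, 0.75))
iqr <- q3 - q1

# Fences: values beyond 1.5 * IQR from the quartiles are flagged
lower <- q1 - 1.5 * iqr
upper <- q3 + 1.5 * iqr
outliers <- selling_price[selling_price < lower | selling_price > upper]
print(outliers)

# boxplot.stats() returns the points a boxplot would draw beyond its whiskers
box_out <- boxplot.stats(selling_price)$out
print(box_out)
```

boxplot.stats() works from hinges rather than from quantile(), so on small samples the two approaches can place the fences slightly differently even when they flag the same points.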
Figure 12: Distribution of Selling Prices

The distribution of selling prices was examined with a histogram, which gives a visual description of the frequency of the different price ranges. This analysis showed the spread of prices across the dataset, highlighting common price intervals and potential outliers. The histogram showed how products cluster within specific price bands, giving a quick grasp of the dataset's pricing landscape.

Figure 13: Distribution of Average Rating

Analyzing the distribution of average ratings in the dataset showed the frequency of the different rating levels. The values range from 1.0 to 5.0 and show which rating scores are most common. This gave some insight into the levels of customer satisfaction and into the general opinion and quality associated with the Adidas products in the dataset.
Figure 14: Selling Price Distribution by Category

The distribution of selling prices across the product categories was explored, giving insight into pricing patterns within the dataset. Examining the selling prices within each category gave useful insights into the pricing dynamics and variation among accessories, clothing, and shoes. The results of this analysis gave a more in-depth understanding of how the product categories in the Adidas dataset relate to pricing strategies.

Figure 15: Count of Products by Color
This analysis shed light on the predominant use of particular colors, revealing useful information about consumer preferences and potential trends. By understanding the distribution of products across colors, researchers or analysts can draw conclusions about the popularity of and market demand for different color options within the Adidas product range.

Figure 16: Reviews Count vs. Average Rating

The relationship between reviews count and average rating was analyzed, giving insight into how these two key measures are associated in the Adidas dataset. This analysis shows how the number of reviews relates to the average rating of products. Such insights can be valuable for assessing customer satisfaction and engagement, offering a glimpse into the dynamics between customer feedback and product ratings within the Adidas product catalog.
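The relationship described above can also be quantified with a correlation coefficient. The following is a minimal sketch in base R on a hypothetical toy data frame. The column names mirror those used in the report, but the values are invented for illustration.

```r
# Hypothetical toy data with the same column names as the report
toy <- data.frame(
  selling_price  = c(20, 35, 48, 60, 90, 120),
  average_rating = c(4.1, 4.5, 4.4, 4.7, 4.3, 4.8),
  reviews_count  = c(10, 250, 40, 1200, 15, 300)
)

# Pearson measures linear association
pearson <- cor(toy$reviews_count, toy$average_rating)

# Spearman works on ranks and is less sensitive to extreme review counts
spearman <- cor(toy$reviews_count, toy$average_rating, method = "spearman")

# Pairwise correlation matrix for all numeric columns
cor_matrix <- cor(toy, use = "complete.obs")

print(pearson)
print(spearman)
print(round(cor_matrix, 2))
```

Since the reviews count in the report ranges from 1 to 11,750, a rank-based measure such as Spearman may be a reasonable complement to Pearson, although the report itself does not say which was used for the scatter plot.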
Figure 17: Correlation Matrix for Numeric Variables

The correlation matrix for the numeric variables in the Adidas dataset was computed and visualized. This analysis gave an overview of the relationships between the numerical features, revealing potential patterns and correlations in the data.

Figure 18: Total Sales by Category
Total sales by category were analyzed, giving insight into the product distribution. The script created informative visualizations, such as bar plots, to show variations in sales, with the data grouped by category. This analysis shed light on the popularity of and demand for Accessories, Clothing, and Shoes within the Adidas dataset.

Figure 19: Distribution of Average Ratings

The script examined the distribution of average ratings across Adidas products, giving a broad overview of customer satisfaction. Using histograms, it explored the spread and concentration of ratings, offering insight into the overall perceived quality of the products in the dataset.
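The category totals described above amount to a grouped sum. The following is a minimal sketch in base R on hypothetical toy rows. The column name total_listed_price is introduced here for illustration only.

```r
# Hypothetical toy rows: one row per product listing
toy <- data.frame(
  category      = c("Shoes", "Shoes", "Clothing", "Accessories", "Clothing", "Shoes"),
  selling_price = c(80, 120, 40, 15, 60, 100)
)

# Sum the selling price within each category
totals <- aggregate(selling_price ~ category, data = toy, FUN = sum)
names(totals)[2] <- "total_listed_price"

# Largest total first
ordered <- totals[order(-totals$total_listed_price), ]
print(ordered)
```

If each row is a product listing rather than a transaction, a sum of selling prices reflects both how many products a category contains and how they are priced. Reading it as sales revenue would require quantities sold, which this sketch assumes are not available.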
Figure 20: Selling Price vs Average Rating

This analysis looks at the relationship between selling prices and average ratings for Adidas products. Using a scatter plot, it showed whether more expensive items are associated with higher average ratings. This gives insight into consumer behavior, indicating whether customers associate higher prices with better product quality or features.
Figure 21: Stock Availability by Category

The examination of stock availability by category in the Adidas dataset gave insight into the availability of footwear, clothing, and accessories. Through visualizations, the analysis showed how stock availability varies within each category, giving insight into potential demand patterns and supply considerations for the different product types.

Figure 22: Category-wise Price Distribution with Average

The category-wise price distribution with averages gave a nuanced understanding of pricing dynamics across Accessories, Clothing, and Shoes. This analysis examined the central tendency of selling prices within each category, revealing potential pricing patterns or variations that could affect consumer choices and purchasing behavior for different types of products.
Figure 23: Total Sales and Average Ratings by Category

The analysis of total sales and average ratings by category in the Adidas dataset shed light on the popularity of the product categories and on customer satisfaction. It gave a close view of consumer preferences, showing which categories attract more sales while also maintaining higher average ratings.

Results

This section presents the results of the fitted models and their accuracy. Screenshots of the R output are shown, each followed by an explanation.

Figure 24: Predicting Selling Price

Figure 24 shows the prediction of the selling price. The Root Mean Squared Error is 25.01865, the Mean Absolute Error is 18.66069, and R-squared is 0.2978753. The figure displays the predictive capacity of the regression model for selling prices by comparing the actual selling prices with the values predicted by the model. It assesses accuracy and identifies potential areas for improvement by showing how well the model matches the actual data.

Figure 25: Regression Model: Actual vs Predicted Prices

Figure 25 is a scatter plot that compares the actual prices with the prices predicted by the regression model. The comparison is based on the Adidas dataset: the regression model was fitted on this dataset and its predictions were then compared with the actual prices.
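To make the three regression measures reported above concrete, the following is a minimal worked sketch in base R. The actual and predicted vectors are hypothetical values chosen for easy arithmetic, not output from the fitted model.

```r
# Hypothetical held-out prices and model predictions
actual    <- c(50, 80, 30, 120, 60)
predicted <- c(55, 70, 40, 100, 65)

errors <- actual - predicted

# Root Mean Squared Error: penalizes large errors more heavily
rmse <- sqrt(mean(errors^2))

# Mean Absolute Error: average size of the error in price units
mae <- mean(abs(errors))

# R-squared: share of variance in the actual prices explained by the predictions
ss_residual <- sum(errors^2)
ss_total    <- sum((actual - mean(actual))^2)
r_squared   <- 1 - ss_residual / ss_total

cat("RMSE:", rmse, "\n")
cat("MAE:", mae, "\n")
cat("R-squared:", r_squared, "\n")
```

Read the same way, the R-squared of 0.2978753 reported above would mean that the regression explains roughly 30 percent of the variance in the held-out selling prices, with RMSE and MAE expressed in the same units as the price.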
Figure 26: Residuals vs. Predicted

Figure 26 is a scatter plot of the residuals against the predicted values. Both are derived from the attributes and variables of the dataset, and the plot gives a clear view of how the prediction errors are distributed.

Figure 27: Predicting category
Figure 27 shows the accuracy of the category prediction on the chosen dataset. The accuracy is 76 percent, which is a reasonable result for this implementation.

Discussion, Conclusions, and Future Work

This study analyzed the Adidas dataset using the R programming language. Machine learning techniques such as regression were applied to the task. A regression model was fitted, and its predictions were compared with the actual values using plots such as a scatter plot, which gives a clear picture of the results (Saura et al. 2023). The results section closed with the classification accuracy of 76 percent, which is considered a reasonable figure for an implementation in R. As a recommendation for future work, a CNN model could be used, which may improve accuracy. CNNs are also regarded as strong models for model fitting and evaluation, so using one could help the research reach better accuracy in future.

References

Breunig, M., Bradley, P.E., Jahn, M., Kuper, P., Mazroob, N., Rösch, N., Al-Doori, M., Stefanakis, E. and Jadidi, M., 2020. Geospatial data management research: Progress and future directions. ISPRS International Journal of Geo-Information, 9(2), p.95.
Gomes, V.C., Queiroz, G.R. and Ferreira, K.R., 2020. An overview of platforms for big earth observation data management and analysis. Remote Sensing, 12(8), p.1253.
Kudale, H.S., Phadnis, M.V., Chittar, P.J., Zarkar, K.P. and Bodhke, B.K., 2022. A Review Of Data Analysis And Visualization Of Olympics Using Pyspark And Dash-Plotly. Int. Res. J. Mod. Eng. Technol. Sci., 4, pp.2093-2097.
Li, H., 2023. RETRACTED ARTICLE: Intelligent business framework for interactive data visualization of small and medium-sized enterprises in developing countries. Annals of Operations Research, 326(Suppl 1), pp.141-141.
Mohapatra, S., Sainath, B., KC, A., Lal, H., K, N.R., Bhandari, G. and Nyika, J., 2023. Application of blockchain technology in the agri-food system: a systematic bibliometric visualization analysis and policy imperatives. Journal of Agribusiness in Developing and Emerging Economies.
Ostadabbas, H., Merz, H. and Weippert, H., 2021. Integration of Urban Spatial Data Management and Visualization with Enterprise Applications Using Open-Source Software. The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, 43, pp.307-312.
Qin, X., Luo, Y., Tang, N. and Li, G., 2020. Making data visualization more efficient and effective: a survey. The VLDB Journal, 29, pp.93-117.
Saiz-Rubio, V. and Rovira-Más, F., 2020. From smart farming towards agriculture 5.0: A review on crop data management. Agronomy, 10(2), p.207.
Saura, J.R., Palacios-Marqués, D. and Barbosa, B., 2023. A review of digital family businesses: setting marketing strategies, business models and technology applications. International Journal of Entrepreneurial Behavior & Research, 29(1), pp.144-165.
Vellido, A., 2020. The importance of interpretability and visualization in machine learning for applications in medicine and health care. Neural computing and applications, 32(24), pp.18069-18083.
Yang, Y., Qu, G., Hua, L. and Wu, L., 2022. Knowledge mapping visualization analysis of research on blockchain in management and economics. Sustainability, 14(22), p.14971.
Zhang, X., Zhi, Y., Xu, J. and Han, L., 2022. Digital Protection and Utilization of Architectural Heritage Using Knowledge Visualization. Buildings, 12(10), p.1604.

Appendix

Below is the complete R code

library(readr)
library(dplyr)
library(ggplot2)
library(tidyr)

# Import the dataset
adidas_data <- read_csv("adidas.csv")

# Data Exploration and Preparation
str(adidas_data)
summary(adidas_data)
sum(is.na(adidas_data))
missing_values <- sapply(adidas_data, function(x) sum(is.na(x)))
missing_data_df <- data.frame(Column = names(missing_values), MissingValues = missing_values)
ggplot(missing_data_df, aes(x = Column, y = MissingValues)) +
  geom_bar(stat = "identity", fill = "blue") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  labs(title = "Missing Values in Each Column", x = "Column", y = "Number of Missing Values")

# Data Manipulation
# Note: as.numeric() returns NA for any value that is not a plain number
adidas_data$original_price <- as.numeric(adidas_data$original_price)
adidas_data$original_price[is.na(adidas_data$original_price)] <- 0
sum(is.na(adidas_data))
summary(adidas_data)
adidas_data$crawled_at <- as.Date(adidas_data$crawled_at)
# Guard against division by zero where the original price was imputed as 0
adidas_data$discount_percentage <- ifelse(
  adidas_data$original_price > 0,
  (1 - adidas_data$selling_price / adidas_data$original_price) * 100,
  NA_real_
)
adidas_data$color <- as.factor(adidas_data$color)
adidas_data$category <- as.factor(adidas_data$category)
adidas_data$availability <- as.factor(adidas_data$availability)
adidas_data$country <- as.factor(adidas_data$country)
adidas_data$currency <- as.factor(adidas_data$currency)
boxplot(adidas_data$selling_price, main = "Selling Price Boxplot")
adidas_data$description <- tolower(gsub("[^[:alnum:][:space:]]", "", adidas_data$description))
adidas_data$url <- NULL
summary(select(adidas_data, where(is.numeric)))
summary(select(adidas_data, where(is.factor)))

# Histogram for Selling Price
ggplot(adidas_data, aes(x = selling_price)) +
  geom_histogram(bins = 30, fill = "blue", color = "black") +
  ggtitle("Distribution of Selling Prices")

# Histogram for Average Rating
ggplot(adidas_data, aes(x = average_rating)) +
  geom_histogram(bins = 10, fill = "green", color = "black") +
  ggtitle("Distribution of Average Ratings")

# Boxplots for Price by Category
ggplot(adidas_data, aes(x = category, y = selling_price)) +
  geom_boxplot() +
  ggtitle("Selling Price Distribution by Category") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

# Bar Plot for Count of Products by Color
ggplot(adidas_data, aes(x = color)) +
  geom_bar(fill = "purple") +
  ggtitle("Count of Products by Color") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

# Scatter Plot for Reviews Count vs. Average Rating
ggplot(adidas_data, aes(x = reviews_count, y = average_rating)) +
  geom_point(alpha = 0.5) +
  ggtitle("Reviews Count vs. Average Rating")

# Correlation matrix for the numeric variables
adidas_numeric_data <- adidas_data %>%
  select(selling_price, original_price, average_rating, reviews_count)
cor_matrix <- cor(adidas_numeric_data, use = "complete.obs")
cor_data <- as.data.frame(as.table(cor_matrix))
names(cor_data) <- c("Variable1", "Variable2", "Correlation")
ggplot(cor_data, aes(x = Variable1, y = Variable2, fill = Correlation)) +
  geom_tile() +
  scale_fill_gradient2(low = "blue", high = "red", mid = "white", midpoint = 0) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  ggtitle("Correlation Matrix for Numeric Variables")

# Total sales by category
adidas_data %>%
  group_by(category) %>%
  summarise(Total_Sales = sum(selling_price)) %>%
  ggplot(aes(x = category, y = Total_Sales, fill = category)) +
  geom_bar(stat = "identity") +
  theme_minimal() +
  labs(title = "Total Sales by Category", x = "Category", y = "Total Sales")

# Distribution of average ratings
ggplot(adidas_data, aes(x = average_rating)) +
  geom_histogram(binwidth = 0.5, fill = "blue", color = "white") +
  theme_minimal() +
  labs(title = "Distribution of Average Ratings", x = "Average Rating", y = "Count")

# Selling price vs average rating
ggplot(adidas_data, aes(x = selling_price, y = average_rating)) +
  geom_point(aes(color = category)) +
  geom_smooth(method = "lm") +
  theme_minimal() +
  labs(title = "Selling Price vs Average Rating", x = "Selling Price", y = "Average Rating")

# Stock availability by category
adidas_data %>%
  group_by(category, availability) %>%
  summarise(Count = n()) %>%
  ggplot(aes(x = category, y = Count, fill = availability)) +
  geom_bar(stat = "identity", position = position_dodge()) +
  theme_minimal() +
  labs(title = "Stock Availability by Category", x = "Category", y = "Count")

# Distribution of reviews count
ggplot(adidas_data, aes(x = reviews_count)) +
  geom_histogram(fill = "green", binwidth = 10) +
  theme_minimal() +
  labs(title = "Distribution of Reviews Count", x = "Reviews Count", y = "Frequency")

# Category-wise price distribution with average ratings
adidas_category_analysis <- adidas_data %>%
  group_by(category, selling_price) %>%
  summarise(Average_Rating = mean(average_rating, na.rm = TRUE))
ggplot(adidas_category_analysis, aes(x = selling_price, y = category, color = Average_Rating)) +
  geom_jitter(alpha = 0.5, size = 2.5, width = 0.1) +
  scale_color_gradient(low = "blue", high = "red") +
  theme_minimal() +
  labs(title = "Category-wise Price Distribution with Average Ratings",
       x = "Selling Price", y = "Category", color = "Average Rating")

# Total sales and average ratings by category
category_analysis <- adidas_data %>%
  group_by(category) %>%
  summarise(Total_Sales = sum(selling_price, na.rm = TRUE),
            Average_Rating = mean(average_rating, na.rm = TRUE)) %>%
  arrange(desc(Total_Sales))
ggplot(category_analysis, aes(x = reorder(category, Total_Sales), y = Total_Sales)) +
  geom_bar(stat = "identity", position = position_dodge(), fill = "steelblue") +
  geom_point(aes(y = Average_Rating * max(Total_Sales) / max(Average_Rating), color = "Average Rating"),
             position = position_dodge(width = 0.9), size = 3) +
  scale_y_continuous(sec.axis = sec_axis(~ . * max(category_analysis$Average_Rating) / max(category_analysis$Total_Sales),
                                         name = "Average Rating")) +
  theme_minimal() +
  labs(title = "Total Sales and Average Ratings by Category", x = "Category", y = "Total Sales") +
  scale_color_manual("", values = "red") +
  theme(legend.position = "bottom")

# Regression: predicting selling price
library(caret)
library(ggplot2)
adidas_data$availability <- as.factor(adidas_data$availability)
adidas_data$category <- as.factor(adidas_data$category)
adidas_data$color <- as.factor(adidas_data$color)
set.seed(123)
index <- createDataPartition(adidas_data$selling_price, p = 0.8, list = FALSE)
train_set <- adidas_data[index, ]
test_set <- adidas_data[-index, ]
model <- lm(selling_price ~ average_rating + reviews_count + category + color, data = train_set)
predictions <- predict(model, test_set)
results <- data.frame(Actual = test_set$selling_price, Predicted = predictions)
correlation <- cor(results$Actual, results$Predicted)
ggplot(results, aes(x = Actual, y = Predicted)) +
  geom_point() +
  geom_abline(intercept = 0, slope = 1, color = "red") +
  labs(title = paste("Regression Model: Actual vs Predicted Prices (Correlation:", round(correlation, 2), ")"),
       x = "Actual Price", y = "Predicted Price") +
  theme_minimal()

# Regression metrics on the test set
library(Metrics)
rmse_value <- rmse(test_set$selling_price, predictions)
mae_value <- mae(test_set$selling_price, predictions)
ss_total <- sum((test_set$selling_price - mean(test_set$selling_price))^2)
ss_residual <- sum((test_set$selling_price - predictions)^2)
r_squared <- 1 - (ss_residual / ss_total)
cat("RMSE (Root Mean Squared Error):", rmse_value, "\n")
cat("MAE (Mean Absolute Error):", mae_value, "\n")
cat("R-squared:", r_squared, "\n")
ggplot(results, aes(x = Predicted, y = Actual - Predicted)) +
  geom_hline(yintercept = 0, linetype = "dashed", color = "red") +
  geom_point(alpha = 0.5) +
  labs(title = "Residuals vs Predicted", x = "Predicted Price", y = "Residuals") +
  theme_minimal()

# Classification: predicting category
adidas_data <- adidas_data %>%
  select(availability, selling_price, average_rating, reviews_count, category, color) %>%
  na.omit()
adidas_data$category <- as.factor(adidas_data$category)
adidas_data$color <- as.factor(adidas_data$color)
adidas_data$availability <- as.factor(adidas_data$availability)
library(nnet)
set.seed(123)
train_index <- sample(1:nrow(adidas_data), 0.7 * nrow(adidas_data))
train_set <- adidas_data[train_index, ]
test_set <- adidas_data[-train_index, ]
multinom_model <- multinom(category ~ selling_price + average_rating + reviews_count, data = train_set)
test_predictions <- predict(multinom_model, newdata = test_set)
# Rows are predictions, columns are the true categories
conf_matrix <- table(Predicted = test_predictions, Actual = test_set$category)
print(conf_matrix)
accuracy <- sum(diag(conf_matrix)) / sum(conf_matrix)
print(paste("Accuracy:", accuracy))

# Per-category precision, recall and F1 score
calc_metrics <- function(conf_matrix, category) {
  true_positives <- conf_matrix[category, category]
  # Row total: everything predicted as this category
  false_positives <- sum(conf_matrix[category, ]) - true_positives
  # Column total: everything that truly belongs to this category
  false_negatives <- sum(conf_matrix[, category]) - true_positives
  precision <- true_positives / (true_positives + false_positives)
  recall <- true_positives / (true_positives + false_negatives)
  f1_score <- 2 * (precision * recall) / (precision + recall)
  return(c(precision, recall, f1_score))
}
categories <- rownames(conf_matrix)
metrics <- sapply(categories, function(cat) calc_metrics(conf_matrix, cat))
print(metrics)