Mastering Correlation Analysis: Stock Prices Study Guide

School: Northeastern University
Course: IE 6400
Subject: Industrial Engineering
Date: Dec 11, 2024
12/4/24, 1:09 PM   Coding_Assessment - Jupyter Notebook   localhost:8888/notebooks/School/6400/Quiz/Coding_Assessment.ipynb

In [68]:
    import pandas as pd
    import matplotlib.pyplot as plt
    import seaborn as sns

Problem 1: Correlation-Weighted Network Analysis of Stock Prices

In [50]:
    #!pip install yfinance

Step 1: Data Collection

In [53]:
    import yfinance as yf
    import pandas as pd

    # List of 20 stock ticker symbols
    tickers = ['AAPL', 'MSFT', 'GOOGL', 'AMZN', 'META', 'TSLA', 'NVDA', 'BRK-B',
               'JNJ', 'V', 'WMT', 'JPM', 'UNH', 'MA', 'PG', 'HD', 'DIS', 'ADBE',
               'NFLX', 'PYPL']

    # Fetch the data for the specified tickers
    data = yf.download(tickers, start='2020-01-01', end='2020-12-31')['Close  # [cut off in source]

    [*********************100%***********************]  20 of 20 completed
In [52]:
    data

Out[52]:
    Ticker            AAPL        ADBE        AMZN       BRK-B         DIS      GOOGL          HD  [cut off]
    Date
    2020-01-02   75.087502  334.429993   94.900497  228.389999  148.199997  68.433998  219.660004     145.97
    2020-01-03   74.357498  331.809998   93.748497  226.179993  146.500000  68.075996  218.929993     144.27
    2020-01-06   74.949997  333.709991   95.143997  226.990005  145.649994  69.890503  219.960007     144.10
    2020-01-07   74.597504  333.390015   95.343002  225.919998  145.699997  69.755501  218.520004     144.97
    2020-01-08   75.797501  337.869995   94.598503  225.990005  145.399994  70.251999  221.789993     144.96
    ...                ...         ...         ...         ...         ...        ...         ...        ...
    2020-12-23  130.960007  496.910004  159.263504  224.240005  173.550003  86.411499  269.809998     151.94
    2020-12-24  131.970001  499.859985  158.634506  226.529999  173.729996  86.708000  270.920013     152.47
    2020-12-28  136.690002  498.950012  164.197998  228.410004  178.860001  88.697998  269.250000     153.19
    2020-12-29  134.869995  502.109985  166.100006  229.570007  177.300003  87.888000  266.190002     154.13
    2020-12-30  133.720001  497.450012  164.292496  229.649994  181.169998  86.812500  265.260010     156.05

    252 rows × 20 columns (the eighth column is cut off at the page edge; the remaining columns are not shown)

Step 2: Data Preparation
In [58]:
    data.isnull().sum()

Out[58]:
    Ticker
    AAPL     0
    ADBE     0
    AMZN     0
    BRK-B    0
    DIS      0
    GOOGL    0
    HD       0
    JNJ      0
    JPM      0
    MA       0
    META     0
    MSFT     0
    NFLX     0
    NVDA     0
    PG       0
    PYPL     0
    TSLA     0
    UNH      0
    V        0
    WMT      0
    dtype: int64

In [60]:
    data.describe()

Out[60]:
    Ticker        AAPL        ADBE        AMZN       BRK-B         DIS       GOOGL          HD  [cut off]
    count   252.000000  252.000000  252.000000  252.000000  252.000000  252.000000  252.000000      252.0
    mean     95.198889  415.637342  133.928454  204.903651  126.184484   73.895361  248.787778      145.7
    std      21.725222   66.018766   27.281605   19.438374   18.831122    8.732823   30.480780        6.6
    min      56.092499  285.000000   83.830498  162.130005   85.760002   52.706501  152.149994      111.1
    25%      77.383127  351.587502  107.306000  186.375004  115.434998   69.832748  231.817501      143.8
    50%      91.421249  432.715012  144.224998  208.905006  124.070000   73.954998  250.680000      147.4
    75%     115.718122  476.065002  158.239494  223.625004  138.519997   78.201376  272.527504      149.6
    max     136.690002  533.799988  176.572495  233.919998  181.169998   91.248497  291.929993      156.0

Step 3: Calculate Correlation Matrix
In [62]:
    correlation_matrix = data.corr()
    correlation_matrix

Out[62]:
    Ticker      AAPL      ADBE      AMZN     BRK-B       DIS     GOOGL        HD       JNJ  JP[cut off]
    Ticker
    AAPL    1.000000  0.964241  0.921036  0.531642  0.631062  0.863635  0.898802  0.463443   0.0847
    ADBE    0.964241  1.000000  0.956658  0.401245  0.516590  0.822411  0.923724  0.440393  -0.0251
    AMZN    0.921036  0.956658  1.000000  0.226102  0.359175  0.741849  0.867120  0.406171  -0.2229
    BRK-B   0.531642  0.401245  0.226102  1.000000  0.842719  0.651695  0.484199  0.464767   0.8062
    DIS     0.631062  0.516590  0.359175  0.842719  1.000000  0.796708  0.549140  0.558986   0.7573
    GOOGL   0.863635  0.822411  0.741849  0.651695  0.796708  1.000000  0.833481  0.570229   0.4088
    HD      0.898802  0.923724  0.867120  0.484199  0.549140  0.833481  1.000000  0.559336   0.1134
    JNJ     0.463443  0.440393  0.406171  0.464767  0.558986  0.570229  0.559336  1.000000   0.3519
    JPM     0.084736 -0.025118 -0.222938  0.806201  0.757334  0.408861  0.113404  0.351911   1.0000
    MA      0.797372  0.779635  0.630829  0.764479  0.773679  0.824471  0.866818  0.649164   0.4930
    META    0.949630  0.946551  0.890530  0.514798  0.632812  0.905699  0.942195  0.518051   0.1442
    MSFT    0.930463  0.967168  0.961978  0.341913  0.495677  0.841637  0.919172  0.532764  -0.0364
    NFLX    0.863603  0.920888  0.970058  0.128222  0.301361  0.689399  0.818572  0.418951  -0.2834
    NVDA    0.957112  0.963487  0.956701  0.367567  0.453585  0.791084  0.884694  0.360147  -0.1137
    PG      0.906669  0.852496  0.783161  0.708216  0.630434  0.807317  0.848308  0.516921   0.2451
    PYPL    0.957396  0.964069  0.951977  0.341031  0.545445  0.834142  0.886766  0.433755  -0.0470
    TSLA    0.937745  0.894893  0.857700  0.497410  0.640783  0.835932  0.772038  0.385215   0.0786
    UNH     0.847671  0.808510  0.761901  0.583578  0.707268  0.922908  0.855122  0.648353   0.2977
    V       0.733058  0.721572  0.568555  0.728150  0.808855  0.864319  0.824319  0.677975   0.5590
    WMT     0.871635  0.832273  0.855916  0.415855  0.473315  0.769359  0.729054  0.412944  -0.0560

    (the ninth column is cut off at the page edge; the remaining columns are not shown)

Step 4: Analyze the Correlation Matrix

The heatmap shows the pairwise correlations between the individual stocks. A value close to 1 indicates a strong positive correlation, a value close to -1 indicates a strong inverse correlation, and a value near 0 indicates little or no linear relationship between the two stocks.

For example, AAPL is very strongly correlated with ADBE (0.96) and AMZN (0.92), while its correlation with JPM is very weak (0.08).
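To make the reading of correlation values concrete, here is a minimal sketch on a made-up four-column table (the column names and numbers are illustrative assumptions, not part of the stock data). It uses the same `DataFrame.corr()` call as the notebook, which computes Pearson correlation by default.

```python
import pandas as pd

# Toy data, invented for illustration only.
toy = pd.DataFrame({
    "a": [1, 2, 3, 4, 5],
    "moves_with_a": [3, 5, 7, 9, 11],     # 2*a + 1: perfectly aligned with a
    "moves_against_a": [5, 4, 3, 2, 1],   # 6 - a: perfectly opposed to a
    "unrelated": [1, 0, -2, 0, 1],        # symmetric around the middle of a
})

# Pairwise Pearson correlations, as in correlation_matrix = data.corr()
toy_corr = toy.corr()
print(toy_corr.round(2))
```

In the row for `a`, the three other columns come out at 1, -1, and 0, matching the three cases described above.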
In [69]:
    plt.figure(figsize=(12, 10))
    sns.heatmap(correlation_matrix, annot=True, fmt=".2f", cmap='coolwarm',  # [cut off in source]
    plt.title('Correlation Matrix of Stock Prices')
    plt.show()

Step 5: Construct the Correlation Network

In [70]:
    import networkx as nx

In [71]:
    threshold = 0.6
    G = nx.Graph()
    for i in range(len(correlation_matrix.columns)):
        for j in range(i):
            if abs(correlation_matrix.iloc[i, j]) > threshold:
                G.add_edge(correlation_matrix.columns[i], correlation_matrix  # [cut off in source]

Step 6: Visualize the Network
In [88]:
    plt.figure(figsize=(12, 8))
    pos = nx.spring_layout(G)
    nx.draw_networkx_nodes(G, pos, node_size=700)
    edge_weights = [G[u][v]['weight'] for u, v in G.edges()]
    nx.draw_networkx_edges(G, pos, width=3, alpha=1)
    nx.draw_networkx_labels(G, pos, font_size=12)
    edge_labels = nx.get_edge_attributes(G, 'weight')
    plt.title('Correlation Network of Stock Prices')
    plt.axis('off')
    plt.show()

Plot with correlation value labels
In [94]:
    plt.figure(figsize=(12, 8))
    pos = nx.spring_layout(G)
    nx.draw_networkx_nodes(G, pos, node_size=700)
    nx.draw_networkx_edges(G, pos, width=1.0, alpha=0.5)
    nx.draw_networkx_labels(G, pos, font_size=12)
    edge_labels = nx.get_edge_attributes(G, 'weight')
    nx.draw_networkx_edge_labels(G, pos, edge_labels={k: f"{v:.2f}" for k, v  # [cut off in source]
    plt.title('Correlation Network of Stock Prices')
    plt.axis('off')
    plt.show()

Step 7: Interpret and Analyze the Network

Many nodes are strongly correlated with numerous other nodes, reflecting the interconnected nature of the stock market, where businesses in similar sectors and other shared factors move prices together.

A group of central nodes, often referred to as hubs, includes AMZN, META, AAPL, and MSFT, which are strongly correlated with many other stocks and with one another. While no node is completely disconnected from the rest, nodes such as JPM and JNJ have relatively weak connectivity.

This suggests that when one stock's price moves, the stocks correlated with it tend to move as well.

It is anticipated that time series analysis can be used to predict future stock prices.
While achieving high-performance results in time series analysis is notoriously challenging, incorporating data from various correlated nodes, as demonstrated here, can help address one of the key limitations of time series models: insufficient data volume.
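The network construction and the hub discussion above can be sketched end to end on a small excerpt. This is a minimal sketch, not the notebook's exact cell: the `G.add_edge(...)` line is cut off in the source, so storing the correlation as a `weight` attribute is an assumption (suggested by the later use of `G[u][v]['weight']`), and `build_correlation_graph` is a hypothetical helper name. The seven tickers and their values are copied from Out[62].

```python
import networkx as nx
import pandas as pd

# Seven-ticker excerpt of the correlation matrix printed in Out[62].
tickers = ["AAPL", "ADBE", "AMZN", "BRK-B", "DIS", "JNJ", "JPM"]
values = [
    [1.000000, 0.964241, 0.921036, 0.531642, 0.631062, 0.463443, 0.084736],
    [0.964241, 1.000000, 0.956658, 0.401245, 0.516590, 0.440393, -0.025118],
    [0.921036, 0.956658, 1.000000, 0.226102, 0.359175, 0.406171, -0.222938],
    [0.531642, 0.401245, 0.226102, 1.000000, 0.842719, 0.464767, 0.806201],
    [0.631062, 0.516590, 0.359175, 0.842719, 1.000000, 0.558986, 0.757334],
    [0.463443, 0.440393, 0.406171, 0.464767, 0.558986, 1.000000, 0.351911],
    [0.084736, -0.025118, -0.222938, 0.806201, 0.757334, 0.351911, 1.000000],
]
corr = pd.DataFrame(values, index=tickers, columns=tickers)


def build_correlation_graph(corr, threshold=0.6):
    """Hypothetical helper: one node per ticker, one weighted edge per pair
    whose absolute correlation exceeds the threshold."""
    graph = nx.Graph()
    # Add every ticker up front so weakly connected ones stay visible.
    graph.add_nodes_from(corr.columns)
    for i in range(len(corr.columns)):
        for j in range(i):  # lower triangle: each pair once, no self-loops
            weight = float(corr.iloc[i, j])
            if abs(weight) > threshold:
                # Assumption: the correlation is stored as the edge weight.
                graph.add_edge(corr.columns[i], corr.columns[j], weight=weight)
    return graph


G = build_correlation_graph(corr)

# Rank nodes by degree (number of neighbours above the threshold).
degree_ranking = sorted(G.degree(), key=lambda pair: pair[1], reverse=True)
# Nodes with no edge at all.
isolated = list(nx.isolates(G))

print(degree_ranking)
print(isolated)
```

Within this excerpt JNJ has no edge, but in the full matrix it does exceed 0.6 with tickers not included here (for example V at 0.677975), so the ranking only becomes meaningful on all 20 tickers. Note also that a graph built with `add_edge` alone never contains isolated nodes, so their absence from a plot does not by itself show that every stock is connected; adding all nodes first, as above, makes that check possible.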
Problem 3: Analyzing Customer Sentiments for Product Improvement
In [2]:
    import pandas as pd
    import random
    import faker

    def generate_customer_comments(num_comments=500):
        # Initialize Faker for generating fake data
        fake = faker.Faker()

        # Define sample product names and IDs
        product_names = ['AlphaPhone', 'BetaBook Pro', 'GammaPad', 'Delta Ea  # [cut off in source]
                         'Zeta Speakers', 'Eta Watch', 'Theta Keyboard', 'Iot  # [cut off in source]
        product_ids = range(1001, 1011)

        # Define sample phrases for dynamic comment generation
        comment_phrases = [
            "Fantastic product, highly recommend!",
            "Absolutely love this item!",
            "Exceeded my expectations in every way.",
            "Great value for money, very satisfied.",
            "Impressive quality, will definitely buy again.",
            "Very disappointed, not as described.",
            "Poor quality, do not recommend.",
            "Terrible experience, will not buy again.",
            "Product broke within a week, very unhappy.",
            "Customer service was unhelpful and rude.",
            "It's okay, nothing special.",
            "Average quality, not bad but not great.",
            "Decent product for the price.",
            "Works fine but has some minor issues.",
            "Not sure if I would buy this again."
        ]

        # Generate the dataset
        data = []
        for _ in range(num_comments):
            customer_id = fake.random_number(digits=5)
            first_name = fake.first_name()
            last_name = fake.last_name()
            product_name = random.choice(product_names)
            product_id = random.choice(product_ids)
            comment = random.choice(comment_phrases)
            data.append([customer_id, first_name, last_name, product_name, product_id  # [cut off in source]

        # Create DataFrame with the columns
        df = pd.DataFrame(data, columns=['CustomerID', 'FirstName', 'LastName', 'ProductName', 'ProductID  # [cut off in source]
        return df

    # Generate the dataset
    df_comments = generate_customer_comments()
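Because several lines of the generator are cut off in the source, the following is a minimal runnable sketch of the same idea rather than the original cell. It assumes the product names visible in the notebook's later outputs, drops the Faker-generated name columns so that only the standard library and pandas are needed, and adds a hypothetical `seed` parameter for reproducibility.

```python
import random

import pandas as pd

# Product names as they appear in the notebook's outputs.
PRODUCT_NAMES = [
    "AlphaPhone", "BetaBook Pro", "GammaPad", "Delta Earbuds", "Epsilon Charger",
    "Zeta Speakers", "Eta Watch", "Theta Keyboard", "Iota Mouse", "Kappa Monitor",
]

# A subset of the comment phrases listed in the original cell.
COMMENT_PHRASES = [
    "Fantastic product, highly recommend!",
    "Absolutely love this item!",
    "Poor quality, do not recommend.",
    "Product broke within a week, very unhappy.",
    "It's okay, nothing special.",
    "Not sure if I would buy this again.",
]


def generate_customer_comments(num_comments=500, seed=0):
    """Sketch of the generator; `seed` is an added, hypothetical parameter."""
    rng = random.Random(seed)
    rows = []
    for _ in range(num_comments):
        rows.append({
            "CustomerID": rng.randint(10000, 99999),
            "ProductName": rng.choice(PRODUCT_NAMES),
            # Drawn independently of the name, as in the original cell.
            "ProductID": rng.choice(range(1001, 1011)),
            "Comment": rng.choice(COMMENT_PHRASES),
        })
    return pd.DataFrame(
        rows, columns=["CustomerID", "ProductName", "ProductID", "Comment"]
    )


df_sketch = generate_customer_comments()
print(df_sketch.head())
```

As in the original, the product name and product ID are sampled independently, which is why the notebook's output shows the same product (Delta Earbuds) with two different IDs. Mapping each name to a fixed ID would avoid that if consistent IDs matter.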
In [4]:
    print(df_comments.head())

       CustomerID FirstName  LastName      ProductName  ProductID  Comment
    0       71892     Tyler   Barrett  Epsilon Charger       1004  Not sure if I would buy this again.
    1       74378      Cody     Brown    Delta Earbuds       1001  Absolutely love this item!
    2       86482   Shannon    Dodson    Delta Earbuds       1007  Poor quality, do not recommend.
    3       47447   Melissa  Shepherd       Iota Mouse       1003  Poor quality, do not recommend.
    4       13530      Todd    Taylor   Theta Keyboard       1003  Poor quality, do not recommend.

In [5]:
    from textblob import TextBlob

    def calculate_sentiment_score(comment):
        analysis = TextBlob(comment)
        return analysis.sentiment.polarity

    def calculate_average_sentiment(df):
        df['SentimentScore'] = df['Comment'].apply(calculate_sentiment_score  # [cut off in source]
        average_sentiment = df.groupby('ProductName')['SentimentScore'].mean  # [cut off in source]
        return average_sentiment
In [15]:
    df_comments

Out[15]:
         CustomerID  FirstName  LastName    ProductName      ProductID  Comment                                     SentimentScore
    0         71892  Tyler      Barrett     Epsilon Charger       1004  Not sure if I would buy this again.              -0.250000
    1         74378  Cody       Brown       Delta Earbuds         1001  Absolutely love this item!                        0.625000
    2         86482  Shannon    Dodson      Delta Earbuds         1007  Poor quality, do not recommend.                  -0.400000
    3         47447  Melissa    Shepherd    Iota Mouse            1003  Poor quality, do not recommend.                  -0.400000
    4         13530  Todd       Taylor      Theta Keyboard        1003  Poor quality, do not recommend.                  -0.400000
    ...         ...  ...        ...         ...                    ...  ...                                                    ...
    495        9927  Christina  Johnson     AlphaPhone            1007  Great value for money, very satisfied.            0.725000
    496       35033  Stephen    Moore       GammaPad              1001  Product broke within a week, very unhappy.       -0.780000
    497       27600  Christina  Williams    Kappa Monitor         1007  Absolutely love this item!                        0.625000
    498       76728  Richard    Green       Zeta Speakers         1006  Decent product for the price.                     0.166667
    499       95493  Mary       Harrington  Theta Keyboard        1008  Fantastic product, highly recommend!              0.300000

    500 rows × 7 columns

1. Interpreting Sentiment Scores:

Which products have the highest average sentiment scores? What does this imply about customer satisfaction for these products?

Kappa Monitor has the highest average sentiment score, 0.122495.

This suggests that customers were, on average, more satisfied with this product than with the others, although a score of about 0.12 is only mildly positive.
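The per-product averaging that `calculate_average_sentiment` performs can be sketched on the ten rows visible in Out[15]. The body of that function is cut off in the source, so the trailing `.reset_index()` is an assumption, inferred from the later cells that treat `average_sentiment` as a DataFrame with `ProductName` and `SentimentScore` columns.

```python
import pandas as pd

# The ten rows visible in Out[15] (product and polarity score only).
scored = pd.DataFrame({
    "ProductName": [
        "Epsilon Charger", "Delta Earbuds", "Delta Earbuds", "Iota Mouse",
        "Theta Keyboard", "AlphaPhone", "GammaPad", "Kappa Monitor",
        "Zeta Speakers", "Theta Keyboard",
    ],
    "SentimentScore": [
        -0.25, 0.625, -0.4, -0.4, -0.4, 0.725, -0.78, 0.625, 0.166667, 0.3,
    ],
})

# Mean score per product; reset_index() (assumed) turns the result back into
# a two-column DataFrame.
average_sentiment_sketch = (
    scored.groupby("ProductName")["SentimentScore"].mean().reset_index()
)

# Highest-scoring product in this ten-row sample.
best = average_sentiment_sketch.sort_values(
    "SentimentScore", ascending=False
).iloc[0]

print(average_sentiment_sketch)
print(best["ProductName"])
```

On these ten rows AlphaPhone ranks first only because its single sampled comment happens to be positive, whereas it ranks last on the full 500 rows. Averages over one or two comments say little, which is worth keeping in mind when a product has few reviews.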
In [38]:
    average_sentiment = calculate_average_sentiment(df_comments)
    average_sentiment

Out[38]:
           ProductName  SentimentScore
    0       AlphaPhone       -0.259659
    1     BetaBook Pro       -0.080398
    2    Delta Earbuds       -0.015726
    3  Epsilon Charger       -0.051216
    4        Eta Watch       -0.113876
    5         GammaPad       -0.091030
    6       Iota Mouse        0.040187
    7    Kappa Monitor        0.122495
    8   Theta Keyboard        0.041880
    9    Zeta Speakers       -0.198860

In [22]:
    average_sentiment.sort_values(by='SentimentScore', ascending=False).iloc  # [cut off in source]

Out[22]:
    ProductName       Kappa Monitor
    SentimentScore         0.122495
    Name: 7, dtype: object

Which products have negative average sentiment scores? What might this indicate, and what actions could the company consider to address this?

In [23]:
    average_sentiment.sort_values(by='SentimentScore').iloc[0]

Out[23]:
    ProductName       AlphaPhone
    SentimentScore     -0.259659
    Name: 0, dtype: object

AlphaPhone, BetaBook Pro, Delta Earbuds, Epsilon Charger, Eta Watch, GammaPad, and Zeta Speakers have negative average sentiment scores, and AlphaPhone has the lowest score (-0.259659).

This indicates that many customers are dissatisfied with these products, so the company should identify the causes and work on improvements.

The resulting plot:
In [41]:
    plt.figure(figsize=(10, 6))
    plt.bar(average_sentiment['ProductName'], average_sentiment['SentimentSco  # [cut off in source]
    plt.ylabel('Average Sentiment Score')
    plt.title('Average Sentiment Score per Product')
    plt.ylim(-0.3, 0.3)
    plt.xticks(rotation=45)
    plt.grid(axis='y', linestyle='--', alpha=0.7)
    plt.show()
In [48]:
    import pandas as pd
    import matplotlib.pyplot as plt

    sentiment_counts = df_comments.groupby('ProductName')['SentimentCategory  # [cut off in source]

    if not sentiment_counts.empty:
        sentiment_counts.plot(kind='bar', color=['red', 'gray', 'blue'], fig  # [cut off in source]
        plt.ylabel('Count of Sentiments')
        plt.title('Sentiment Count per Product')
        plt.xticks(rotation=45)
        plt.legend(title='Sentiment Category', loc='upper right')
        plt.grid(axis='y', linestyle='--', alpha=0.7)
        plt.tight_layout()
        plt.show()
    else:
        print("The sentiment_counts DataFrame is empty.")

2. Sentiment Distribution Analysis:

For which products is the sentiment distribution most positive? How does this align with the average sentiment score?

Kappa Monitor has the most positive sentiment distribution (32 positive comments, the highest count), which aligns with its average sentiment score, also the highest.
In [20]:
    sentiment_counts = df_comments.groupby('ProductName')['SentimentCategory  # [cut off in source]
    sentiment_counts

Out[20]:
    SentimentCategory  Negative  Neutral  Positive
    ProductName
    AlphaPhone               25        1        11
    BetaBook Pro             25        1        29
    Delta Earbuds            20        1        21
    Epsilon Charger          21        4        21
    Eta Watch                32        4        21
    GammaPad                 31        1        29
    Iota Mouse               17        3        27
    Kappa Monitor            16        2        32
    Theta Keyboard           25        2        31
    Zeta Speakers            31        2        14

In [25]:
    most_positive_average = average_sentiment[average_sentiment['SentimentSco  # [cut off in source]
    most_positive_count = sentiment_counts[sentiment_counts['Positive'] == se  # [cut off in source]
    print("Product with the Highest Average Sentiment Score:")
    print(most_positive_average)
    print("\nProduct with the Highest Count of Positive Sentiments:")
    print(most_positive_count)

    Product(s) with the Highest Average Sentiment Score:
         ProductName  SentimentScore
    7  Kappa Monitor        0.122495

    Product(s) with the Highest Count of Positive Sentiments:
    SentimentCategory  Negative  Neutral  Positive
    ProductName
    Kappa Monitor            16        2        32

Which products have the most negative sentiment distribution? How do these products' sentiment distributions compare to their average sentiment scores?

AlphaPhone has the lowest average sentiment score. Its distribution is also the most negative in relative terms: 25 of its 37 comments (about 68%) are negative, which aligns with its average score. In absolute terms, Eta Watch has the most negative comments (32).
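The cell that creates the `SentimentCategory` column is not shown in the source, and the `groupby` line that builds `sentiment_counts` is cut off. The sketch below shows one plausible way to get from polarity scores to a Negative/Neutral/Positive count table. The zero cut-offs in `categorize`, the `value_counts().unstack()` step, and the six sample rows are all assumptions made for illustration.

```python
import pandas as pd


def categorize(score):
    """Map a polarity score to a label; the zero cut-offs are an assumption."""
    if score > 0:
        return "Positive"
    if score < 0:
        return "Negative"
    return "Neutral"


# Made-up sample rows (product names borrowed from the notebook).
scores = pd.DataFrame({
    "ProductName": ["Kappa Monitor"] * 3 + ["AlphaPhone"] * 3,
    "SentimentScore": [0.625, 0.725, -0.4, -0.78, -0.4, 0.0],
})
scores["SentimentCategory"] = scores["SentimentScore"].apply(categorize)

# Count comments per product and category, one column per category.
sentiment_counts_sketch = (
    scores.groupby("ProductName")["SentimentCategory"]
    .value_counts()
    .unstack(fill_value=0)
    # Fix the column order and keep categories that never occur.
    .reindex(columns=["Negative", "Neutral", "Positive"], fill_value=0)
)

print(sentiment_counts_sketch)
```

The explicit `reindex` keeps all three columns present even when a category has no comments, which matters for the later cells that index `sentiment_counts['Negative']` and `sentiment_counts['Positive']` directly.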
In [26]:
    sentiment_counts = df_comments.groupby('ProductName')['SentimentCategory  # [cut off in source]
    most_negative_average = average_sentiment[average_sentiment['SentimentSco  # [cut off in source]
    most_negative_count = sentiment_counts[sentiment_counts['Negative'] == se  # [cut off in source]
    print("Product(s) with the Lowest Average Sentiment Score:")
    print(most_negative_average)
    print("\nProduct(s) with the Highest Count of Negative Sentiments:")
    print(most_negative_count)

    Product(s) with the Lowest Average Sentiment Score:
      ProductName  SentimentScore
    0  AlphaPhone       -0.259659

    Product(s) with the Highest Count of Negative Sentiments:
    SentimentCategory  Negative  Neutral  Positive
    ProductName
    Eta Watch                32        4        21

Identify a product with a balanced sentiment distribution (similar counts of Negative, Neutral, and Positive sentiments). What might this indicate about customer experiences, and what strategies could the company use to improve perceptions?

Epsilon Charger has the most balanced sentiment distribution: its negative and positive counts are equal (21 each), although only 4 comments are neutral. This indicates that opinion on the product is evenly split, suggesting a high likelihood of positive outcomes if the product quality is improved.

Proactive quality improvements are therefore worthwhile.

In [27]:
    sentiment_counts['Balance'] = sentiment_counts[['Negative', 'Neutral', '  # [cut off in source]
    balanced_products = sentiment_counts[sentiment_counts['Balance'] == sent  # [cut off in source]
    print("Product(s) with Balanced Sentiment Distribution:")
    print(balanced_products)

    Product(s) with Balanced Sentiment Distribution:
    SentimentCategory  Negative  Neutral  Positive  Balance
    ProductName
    Epsilon Charger          21        4        21       17

3. Strategic Recommendations

Based on the sentiment analysis, which products should the company prioritize for improvement, and why?
AlphaPhone, Zeta Speakers, and Eta Watch have the lowest average sentiment scores (-0.26, -0.20, and -0.11 respectively).

Such negative reviews can adversely affect the company's image and its other products, making prompt improvements essential.

In [31]:
    low_average_sentiment = average_sentiment[average_sentiment['SentimentSco  # [cut off in source]
    high_negative_counts = sentiment_counts[sentiment_counts['Negative'] >=  # [cut off in source]
    balanced_products = sentiment_counts[(sentiment_counts['Balance'] < 5) &  # [cut off in source]
    print("Products with Low Average Sentiment Scores:")
    print(low_average_sentiment)
    print("\nProducts with High Counts of Negative Sentiments:")
    print(high_negative_counts)
    print("\nBalanced Products with Notable Negative Counts:")
    print(balanced_products)

    Products with Low Average Sentiment Scores:
         ProductName  SentimentScore
    0     AlphaPhone       -0.259659
    4      Eta Watch       -0.113876
    9  Zeta Speakers       -0.198860

    Products with High Counts of Negative Sentiments:
    SentimentCategory  Negative  Neutral  Positive  Balance
    ProductName
    AlphaPhone               25        1        11       24
    BetaBook Pro             25        1        29       28
    Delta Earbuds            20        1        21       20
    Epsilon Charger          21        4        21       17
    Eta Watch                32        4        21       28
    GammaPad                 31        1        29       30
    Iota Mouse               17        3        27       24
    Kappa Monitor            16        2        32       30
    Theta Keyboard           25        2        31       29
    Zeta Speakers            31        2        14       29

    Balanced Products with Notable Negative Counts:
    Empty DataFrame
    Columns: [Negative, Neutral, Positive, Balance]
    Index: []

Which products could be used in marketing campaigns as examples of customer satisfaction? How could positive reviews be leveraged in promotional materials?

High-scoring products such as Kappa Monitor are good candidates. By promoting products that receive positive feedback, the company can build a positive brand image.
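The `Balance` values printed above are consistent with "largest category count minus smallest category count" for each product, although the line that computes the column is cut off in the source, so that formula is an inference. A minimal sketch using the counts from Out[20] follows; the `NegativeShare` column is an addition for illustration, not part of the notebook.

```python
import pandas as pd

# Counts copied from Out[20].
products = [
    "AlphaPhone", "BetaBook Pro", "Delta Earbuds", "Epsilon Charger", "Eta Watch",
    "GammaPad", "Iota Mouse", "Kappa Monitor", "Theta Keyboard", "Zeta Speakers",
]
counts = pd.DataFrame(
    {
        "Negative": [25, 25, 20, 21, 32, 31, 17, 16, 25, 31],
        "Neutral": [1, 1, 1, 4, 4, 1, 3, 2, 2, 2],
        "Positive": [11, 29, 21, 21, 21, 29, 27, 32, 31, 14],
    },
    index=pd.Index(products, name="ProductName"),
)
categories = ["Negative", "Neutral", "Positive"]

# Inferred formula: spread between the largest and smallest category count.
counts["Balance"] = counts[categories].max(axis=1) - counts[categories].min(axis=1)

# Product(s) with the smallest spread.
balanced = counts[counts["Balance"] == counts["Balance"].min()]

# Added for illustration: fraction of each product's comments that are negative.
counts["NegativeShare"] = counts["Negative"] / counts[categories].sum(axis=1)

print(counts)
print(balanced.index.tolist())
```

This reproduces the printed `Balance` column and also shows why the "Balanced Products with Notable Negative Counts" filter returned an empty DataFrame: it requires `Balance < 5`, and the smallest value in the table is 17. Because neutral comments are rare for every product, a spread over all three categories is dominated by the neutral count; comparing only the negative and positive counts may be a more useful notion of balance here.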
In [29]:
    most_positive_average = average_sentiment[average_sentiment['SentimentSco  # [cut off in source]
    most_positive_count = sentiment_counts[sentiment_counts['Positive'] == se  # [cut off in source]
    print("Product(s) with the Highest Average Sentiment Score:")
    print(most_positive_average)
    print("\nProduct(s) with the Highest Count of Positive Sentiments:")
    print(most_positive_count)

    Product(s) with the Highest Average Sentiment Score:
         ProductName  SentimentScore
    7  Kappa Monitor        0.122495

    Product(s) with the Highest Count of Positive Sentiments:
    SentimentCategory  Negative  Neutral  Positive  Balance
    ProductName
    Kappa Monitor            16        2        32       30

If the company wants to reduce negative feedback, which product should be addressed first? Could you provide a reasoned recommendation based on the sentiment distribution and average sentiment score?

Improving the product with the most negative feedback would be the most effective: AlphaPhone, which has the lowest average sentiment score (-0.259659), followed by Zeta Speakers and Eta Watch.

It is much easier to improve products with clearly negative feedback and reduce the number of such cases than to address products with a balance of positive and negative opinions.

In [32]:
    low_average_sentiment = average_sentiment[average_sentiment['SentimentSco  # [cut off in source]
    high_negative_counts = sentiment_counts[sentiment_counts['Negative'] >=  # [cut off in source]
    products_to_address = low_average_sentiment[low_average_sentiment['Produ  # [cut off in source]
    print("Products that should be addressed first to reduce negative feedba  # [cut off in source]
    print(products_to_address)

    Products that should be addressed first to reduce negative feedback:
         ProductName  SentimentScore
    0     AlphaPhone       -0.259659
    4      Eta Watch       -0.113876
    9  Zeta Speakers       -0.198860

4. Product Development and Feedback Integration
How could the company use these sentiment analysis results to inform product development and customer service strategies?

Clearly identifying the strengths and weaknesses of the company and focusing on enhancing strengths while addressing weaknesses is likely the most effective approach.

Through this sentiment analysis, it is possible to identify which products generate positive feedback and which products negatively impact the company.

Suggest a method for continuously monitoring customer sentiment over time to track improvements or declines in customer satisfaction.

To verify whether the methods applied during the improvement process had a positive impact, it is necessary to observe changes through A/B testing or similar approaches.

Continuous testing can help identify which strategies are most effective in enhancing the positive image of the company's products.
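One simple way to observe such changes over time is to average sentiment per product per month and look at the month-over-month difference. The sketch below assumes each review carries a timestamp; the `Date` column and the six sample rows are hypothetical, since the notebook's dataset has no date field.

```python
import pandas as pd

# Hypothetical dated reviews for one product (invented for illustration).
reviews = pd.DataFrame({
    "Date": pd.to_datetime([
        "2024-01-05", "2024-01-20", "2024-02-03",
        "2024-02-18", "2024-03-02", "2024-03-15",
    ]),
    "ProductName": ["AlphaPhone"] * 6,
    "SentimentScore": [-0.4, -0.25, -0.4, 0.3, 0.625, 0.3],
})

# Average score per product per calendar month ("MS" = month start).
monthly = (
    reviews.groupby(["ProductName", pd.Grouper(key="Date", freq="MS")])[
        "SentimentScore"
    ]
    .mean()
    .unstack("ProductName")
)

# Month-over-month change: positive values indicate improving sentiment.
change = monthly.diff()

print(monthly)
print(change)
```

With real data, the same table can be plotted as one line per product, and the date of each product change or support-policy change can be marked on it to see whether sentiment moved afterwards. A controlled comparison such as the A/B test mentioned above is still needed to attribute a change to a specific intervention.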