Process and Analyze Multiple Excel Workbooks with Python
School
York Early College Academy**We aren't endorsed by this school
Course
SS 101
Subject
Information Systems
Date
Dec 12, 2024
Pages
2
Uploaded by baojiaocuo
average_sales = pd.DataFrame(total_sales / number_of_sales) workbook_total_sales.append(total_sales) workbook_number_of_sales.append(number_of_sales) data = {'workbook': os.path.basename(workbook), ‘worksheet': worksheet_name, 'worksheet_total': total_sales, ‘worksheet_average': average_sales} worksheet_data_frames.append(pd.DataFrame(data, \ columns=[ 'workbook', 'worksheet', \ 'worksheet_total', 'worksheet_average'])) worksheets_data_frame = pd.concat(\ worksheet_data_frames, axis=0, ignore_index=True) workbook_total = pd.DataFrame(workbook_total_sales).sum() workbook_total_number_of_sales = pd.DataFrame(\ workbook_number_of_sales).sum() workbook_average = pd.DataFrame(\ workbook_total / workbook_total_number_of_sales) workbook_stats = {'workbook': os.path.basename(workbook), 'workbook_total': workbook_total, 'workbook_average': workbook_average} workbook_stats = pd.DataFrame(workbook_stats, columns=\ ['workbook', 'workbook_total', 'workbook_average']) workbook_data_frame = pd.merge(worksheets_data_frame, workbook_stats, \ on="workbook', how='left") data_frames.append(workbook_data_frame) all_data_concatenated = pd.concat(data_frames, axis=0, ignore_index=True) writer = pd.ExcelWriter(output_file) all_data_concatenated.to_excel(writer, sheet_name='sums_and_averages', \ index=False) writer.save() To run the script, type the following on the command line and hit Enter: python pandas_sum_average_multiple_workbooks.py "C:\Users\Clinton\Desktop"\ output_files\pandas_output.xls You can then open the output file, pandas_output.xls, to review the results. We've covered a lot of ground in this chapter. We've discussed how to read and parse an Excel workbook, navigate rows in an Excel worksheet, navigate columns in an Excel worksheet, process multiple Excel worksheets, process multiple Excel work- books, and calculate statistics for multiple Excel worksheets and workbooks. If you've followed along with the examples in this chapter, you have written 14 new Python scripts! The best part about all of the work you have put into working through the examples in this chapter is that you are now well equipped to navigate and process Excel files, one of the most common file types in business. Moreover, because many business Processing Multiple Workbooks | 141
divisions store data in Excel workbooks, you now have a set of tools you can use to process the data in these workbooks regardless of the number of workbooks, the size of the workbooks, or the number of worksheets in each workbook. Now you can take advantage of your computer’s data processing capabilities to automate and scale your analysis of data in Excel workbooks. The next data source we'll tackle is databases. Because databases are a common data store, it’s important for you to know how to access their data. Once you know how to access the data, you can process it in the same row-by-row fashion that you've learned to use when dealing with CSV and Excel files. Having worked through the examples in these two chapters, you're now well prepared to process data in databases, Chapter Exercises 1. Modify one of the scripts that filters rows based on conditions, sets, or regular expressions to print and write a different set of rows than the ones we filtered for in the examples. 2. Modify one of the scripts that filters columns based on index values or column headings to print and write a different set of columns that the ones we filtered for in the examples. 3. Create a new Python script that combines code from one of the scripts that filters rows or columns and code from the script that concatenates data from multiple workbooks to generate an output file that contains specific rows or columns of data from multiple workbooks. 142 | Chapter 3: Excel Files