Calculate Sum and Average from Multiple CSV Files with Python

School
Yale University**We aren't endorsed by this school
Course
ARCG 731
Subject
Computer Science
Date
Dec 12, 2024
Pages
2
Uploaded by zxrrmyt
To run the script, type the following on the command line and hit Enter: python pandas_concat_rows_from_multiple_files.py "C:\Users\Clinton\Desktop"\ output_files\pandas_output.csv You can then open the output file, pandas_output.csv, to review the results. Sum and Average a Set of Values per File Sometimes when you have multiple input files, you need to calculate a few statistics for each input file. The example in this section uses the three CSV files we created earlier and shows how to calculate a column sum and average for each input file. Base Python To calculate a column sum and average for multiple files with base Python, type the following code into a text editor and save the file as 10csv_reader_sum_aver- age_from_multiple_files: 1 #!/usr/bin/env python3 2 import 3 import 4 import 5 import 6 input_path = sys.argv[1] 7 output_file = sys.argv[Z] 8 output_header_list = ['file_name', 'total_sales', 'average_sales'] 9 csv_out_file = open(output_file, 'a', newline='') 10 filewriter = csv.writer(csv_out_file) 11 filewriter.writerow(output_header_list) 12 for input_file in glob.glob(os.path.join(input_path, 'sales _*')): 13 with open(input_file, 'r', newline='') as csv_in_file: 14 filereader = csv.reader(csv_in_file) 15 output_list = [ ] 16 output_list.append(os.path.basename(input_file)) 17 header = next(filereader) 18 total_sales = 0.8 19 number_of_sales = 0.6 20 for row in filereader: 21 sale_amount = row[3] 22 total_sales += float(str(sale_amount).strip('s$').replace(',','")) 23 number_of_sales += 1 24 average_sales = '{0:.2f}',format(total_sales / number_of_sales) 25 output_list.append(total_sales) 26 output_list.append(average_sales) 27 filewriter.writerow(output_list) 28 csv_out_file.close() Line 8 creates a list of the column headings for the output file. Line 10 creates the filewriter object, and line 11 writes the header row to the output file. Sum and Average a Set of Values per File | 97
Background image
Line 15 creates an empty list that will store each row of output that we’ll write to the output file. Because we want to calculate a sum and an average for each input file, line 16 appends the name of the input file into the output_list. Line 17 uses the next function to remove the header row from each input file. Line 18 creates a variable named total_sales and sets its value equal to zero. Similarly, line 19 creates a variable named number_of_sales and sets its value equal to zero. Line 20 is a for loop for iterating over the data rows in each of the input files. Line 21 uses list indexing to extract the value in the Sale Amount column and assigns it to the variable named sale_amount. Line 22 uses the str function to gnsure the value in sale_amount is a string and then uses the strip and replace functions to remove any dollar signs and commas in the value. Then it uses the float function to convert the value to a floating-point number, and adds the value to the value in total_sales. Line 23 adds one to the value in number_of_sales. Line 24 divides the value in total_sales by the value in number_of_sales to calcu- late the average sales for the input file and assigns this number, formatted to two deci- mal places and converted into a string, to the variable average_sales. Line 25 adds the total sales as the second value in output_list. The first value in the list is the name of the input file. This value is added to the list in line 17. Line 26 adds the average sales as the third value in output_list. Line 27 writes the values in out put_Llist to the output file. The script executes this code for each of the input files, so the output file will contain a column of filenames, a column of total sales, and a column of average sales corre- sponding to each of the input files. To run the script, type the following on the command line and hit Enter: python 10csv_reader_sum_average_from_multiple_files.py \ "C:\Users\Clinton\Desktop" output_files\10output.csv You can then open the output file, 10output.csv, to review the results. Pandas Pandas provides summary statistics functions, like sum and mean, that you can use to calculate row and column statistics. The following code demonstrates how to calcu- late two statistics (sum and mean) for a specific column in multiple input files and write the results for each input file to an output file. To calculate these two column statistics with pandas, type the following code into a text editor and save the file as pandas_ sum_average_from_multiple_files.py: #! /usr/bin/env python3 import ¢ as 98 | Chapter2: Comma-Separated Values (CSV) Files
Background image