Downloading and Initial Prepping of Spreadsheets for Category 1 Games 2014 to Present

From Wikibase.slis.ua.edu
Revision as of 18:09, 26 December 2018 by Admin (talk | contribs)
Jump to navigation Jump to search

Download JSON file and complete following steps to prepare FOUR spreadsheet (.csv) files:

  1. Go to online play-by-play data source and download a game's JSON file as follows:
    1. Click on desired year
    2. Click on JSON link
    3. Click on desired week
    4. Click on Full link
    5. Click on desired game
    6. Click on download link button in upper right hand corner
    7. The desired file will now be in your download folder
  2. To convert JSON to CSV, use this website: https://json-csv.com/ as follows:
    1. Click on Upload JSON file
    2. Navigate to your download folder and double click on JSON file downloaded in steps above
    3. Right click on downloaded file (lower left) and open
    4. Save file using year and the competing team names and annotate file name with "FULL_DATA_SPREADSHEET" as follows: 2017-ALA_AUB-FULL_DATA_SPREADSHEET
  3. Create "clean" version of the spreadsheet (that is, without hidden Excel codes) as follows:
    1. Highlight and copy all data (right click -> copy)
    2. Go to blank spreadsheet and paste special as "paste values" with no additional formatting
    3. Save file as csv: 2017-ALA_AUB-FULL_DATA_SPREADSHEET.csv
  4. Enter property numbers (PXX) at top of each relevant column:
    1. Best accomplished by copying and pasting row from known spreadsheet source
    2. PLEASE NOTE: For games from 2014 to 2016, there are two missing columns; THEREFORE, you must delete two cells from inserted row and shift cells to the left
  5. Add data to cells in these columns as follows:
    1. Column BA (P35): Overwrite existing data with drive sequence data (Do not include KOs as a drive)
    2. Column BB (P19): Overwrite existing data with play sequence data starting with Opening Kickoff as play #1
    3. Column BC (P36): Overwrite existing data with play sequence data (play # relative to each drive)
      1. End of periods, end of halfs, end of games, and timeouts are not counted as plays in either game or drive play sequences
  6. Transform data in these columns as follows:
    1. First, copy the entire column BT (P37) and paste it over the data in column BU (P46) (be sure to change the P37 heading of column BU back to P46)
    2. Column BT (P37): Strip out date data using the following example as an exemplar:
      1. For example, if the data in a column's cell is "2018-01-09T01:19:25Z"
      2. Then, do a "find and replace" in that column by pasting the string "2018-01-09T" into the FIND box and keep the REPLACE box empty
      3. Then, do another "find and replace" by typing "Z" into the FIND box and keep the REPLACE box empty
      4. Be sure to "replace all" values in each cell of the column for both find and replace
      5. This should leave you with just the time values in each column's cells ("01:19:25" would remain in the first cell of our example)
    3. Column BU (P46): Transform the data in this column from ISO to Unix time as follows:
      1. We will be using the following online utility to transform the data in this column: Epoch Batch Conversion Tool
      2. IMPORTANT: If there are any blank cells in column BU (P46), then this procedures must be done in parts without including blank cells in the batch conversion process
      3. In column BU (P46), highlight and copy cell with data stopping whenever coming upon a blank cell without data
      4. Paste the data in the box provided online at https://www.epochconverter.com/batch#human2epoch
      5. Change the output option from "Screen" to the second option ("(',' comma separated, use for Excel in the US)
      6. Click the "Convert" button at bottom
      7. Copy the transformed date data and paste it back over the copied cells into our spreadsheet in column BU (P46)
  7. Split column CO (P24/P25) into two columns by copying its contents and pasting it over the data in column CP
    1. Change the label of column CO (P24/P25) to column CO (P24)
    2. Change the label of column CP (P24/P25) to column CP (P25)
    3. Add data to column CP (P25) reflecting the other team from the data as shown in column CO (P24)
  8. Delete all columns right to left that are not labeled P(XX) leaving the following:
    1. Column AX (P8)
    2. Column AY (P10)
    3. Column AZ (P6)
    4. Column BA (P35)
    5. Column BB (P19)
    6. Column BC (P36)
    7. Column BD (P13)
    8. Column BF (P14)
    9. Column BI (P15)
    10. Column BL (P34)
    11. Column BN (P16)
    12. Column BR (P38)
    13. Column BS (P11)
    14. Column BT (P37)
    15. Column BU (P46)
    16. Column CE (P20)
    17. Column CF (P8)
    18. Column CI (P34)
    19. Column CJ (P13)
    20. Column AX (P8)
    21. Column AX (P8)
    22. Column AX (P8)
  9. Save FOUR copies of the spreadsheet as follows:
    1. Name the first copy to include "DRIVE_CREATION_SPREADSHEET" (e.g., 2017-ALA_AUB-DRIVE_CREATION_SPREADSHEET.csv)
    2. Name the second copy to include "DRIVE_DATA_SPREADSHEET" (e.g., 2017-ALA_AUB-DRIVE_DATA_SPREADSHEET.csv)
    3. Name the third copy to include "PLAY-BY-PLAY_CREATION_SPREADSHEET" (e.g., 2017-ALA_AUB-PLAY-BY-PLAY_CREATION_SPREADSHEET.csv)
    4. Name the fourth copy to include "PLAY-BY-PLAY_DATA_SPREADSHEET" (e.g., 2017-ALA_AUB-PLAY-BY-PLAY_DATA_SPREADSHEET.csv)



Navigation: