Downloading and Initial Prepping of Spreadsheets for Category 1 Games 2014 to Present

From Wikibase.slis.ua.edu
Jump to navigation Jump to search

First, make sure you have downloaded and installed a JSON to CSV converter (Desktop Edition) from https://json-csv.en.softonic.com/download

Download JSON file and complete following steps to prepare FOUR spreadsheet (.csv) files:

  1. Go to online play-by-play data source and download a game's JSON file as follows:
    1. Click on desired year
    2. Click on JSON link
    3. Click on desired week
    4. Click on Full link
    5. Click on desired game
    6. Click on download link button in upper right hand corner
    7. The desired JSON encoded file containing that game's statistics will now be in your download folder
  2. Conversion of downloaded file from JSON to CSV:
    1. Open the desktop JSON-CSV software that you downloaded above
      1. Click on "Trial Version"
      2. Click on "File(s)" button (upper left) and
      3. Navigate to your download folder and double click on JSON file downloaded in steps above
      4. This process will open the JSON file in an Excel spreadsheet
    2. Create "clean" version of the spreadsheet (that is, without hidden Excel codes) as follows:
      1. Highlight and copy all data (right click on triangle up in the very upper top left corner of spreadsheet -> copy)
      2. Open a new (empty) spreadsheet and from the upper left hand corner triangle, right click, mouse over the "Paste Special" menu, and then select the second "Paste Values" option ("Values and Number Formatting (A)")
    3. Save file to appropriate folder using the "CSV (Comma delimited)" format and rename the file using the following template changing out the "XX"s for month and data and the "ALA" and "AUB" for the teams that played that game: 2017-XX-XX-ALA_AUB-FULL_DATA_SPREADSHEET
  3. Row and column deletion and the importing of labels for remaining columns
    1. Delete all columns up to start of play by play columns (the header for the first spreadsheet column containing play-by-play data is labeled "drives__previous__plays__period__number" and it should be either column AW or column AX)
    2. Locate, copy and insert property numbers (PXX) as row #2 of your game's spreadsheet as follows:
      1. Locate utilities folder containing source column labeling data
      2. For games 2017 to present: Copy row #2 of the appropriate target file in the Utilities folder and insert the data as row #2 in your game's .csv spreadsheet
      3. For games from 2014 to 2016 do the same, but copy row #2 of the appropriate target file in the Utilities folder and insert the cells as row #2 in your game's .csv spreadsheet
      4. For games from before 2014, likewise copy row #2 of the appropriate target file in the Utilities folder and insert the cells as row #2 in your game's .csv spreadsheet
    3. Delete first row of spreadsheet leaving the line with Property numbers (i.e., PXX numbers) as the first row
    4. Carefully examine spreadsheet to find and DELETE any completely blank rows; that is, rows that have all blank cells stretching from column A to column Z (many times these rows will have blank cells that extend past column Z)
    5. Copy the entire column W (P37) and do the following:
      1. Paste (Ctrl v is fine) over the data in column X (P46). (After pasting data, be sure to change the P37 heading of column X back to P46)
      2. Paste (Ctrl v is fine) over the data in column Y (P51). (After pasting data, be sure to change the P37 heading of column Y back to P51)
    6. Save file as "CSV (Comma delimited)" as before
  4. Add play and other game actions sequence data to game's spreadsheet:
    1. Adding sequence data for plays and other game action:
      1. Play and other game actions sequence data will be added to cells of column R (P19) by overwriting existing data
      2. Each play or other game action corresponds to a row of cells between column A (P8) and column AH (P20) (inclusive)
      3. Starting with Opening Kickoff as play #1 and increment each cell of column R (P19) by 1 down to the bottom of the column
    2. Adding drive sequence data:
      1. Add drive sequence data to column D (P35) overwriting any data that might be there
      2. Each drive is a collection of 1 or more plays and game actions
      3. For each row associated with a play or game action, enter the number for the drive (that is, repeat the same number in each play's cell for each play occurring in a given drive)
      4. Use column K as the indicator as to when a new drive commences; that is, when the number in column K changes, then increment the number of the drive in column D (P35) and repeat the same value for every play's cell for each drive
  5. Transform date/time data in game's spreadsheet (IMPORTANT: You must know the local timezone for the game you are working on (see the item page for your game!):
    1. Transforming data in column W (P37) from UTC to displayed wallclock time for the correct timezone for your game for each cell in the column:
      1. Step 1: Insert 2 blank columns between column W (P37) and column X (P46) for use as a temporary workspace for this transformation process (temporary columns will be deleted); be sure to increase the width of these temporary columns
      2. Step 2: Transform data in column W (P37) from UTC to Excel date/time values in temporary column X:
        1. Highlight and copy the following calculation string and paste it into the second cell ("X2") in your newly created blank temporary column X (don't copy initial and ending quotation marks (""): "=DATEVALUE(LEFT(W2,10))+TIMEVALUE(MID(W2,12,8))+IF("Z"=MID(W2,20,LEN(W2)-19),0,(INT(MID(W2,20,LEN(W2)-19)/100)*60+MOD(MID(W2,20,LEN(W2)-19),100))/1440)" (IMPORTANT! If you are using a cell other than X2, then you must edit this string and replace the 8 W2 cell references with the correct reference for the cell that you are pasting the equation into.) NOTE: This step is adapted from here: http://www.kddart.org/help/kdsmart/html/excel-support.html
        2. Populate every cell in temporary column X by clicking and holding your mouse on the lower right corner of cell X2 (or the cell you pasted the formula into), and drag it down to apply the formula in that cell to every cell beneath it in the column.
      3. Step 3: Transform data in temporary column X to the local timezone of your game in temporary column Y:
        1. Highlight and copy the following calculation string and paste it into the top cell in blank temporary column Y that pertains to a row of play data (likely cell Y2) (don't copy initial and ending quotation marks (""): "=X2-5/24" (IMPORTANT! The "5" parameter in this equation pertains to the number of hours before UTC time; in this case, the "5" represents Eastern Standard Time (EST). ALSO: As above, you need to adjust the X2 cell reference with the correct reference for the cell you are pasting the equation into.)
        2. Populate every cell in temporary column Y by clicking and holding your mouse on the lower right corner of cell X2 (or the cell you pasted the formula into), and drag it down to apply the formula in that cell to every cell beneath it in the column.
      4. Step 4: Reformat cell data in temporary column Y:
        1. Highlight all of the cells in temporary column Y by clicking on the "Y" at the top of the column
        2. Right click and select "Format Cells"
        3. Select "Time" from Category list
        4. Select "1:30:55 PM" from Type list
        5. Click "OK"
      5. Step 5: Copy and paste special the data from temporary column Y and paste over the data in column W (P37) as follows:
        1. Highlight and copy all cells in temporary column Y by clicking on the Y at top of row
        2. Right click and select "Copy"
        3. Highlight all cells in column W (P37) by clicking on the W
        4. Right click and mouse over the "Paste Special" option
        5. From the "Paste Special" menu, select the second "Paste Values" option ("Values and Number Formatting (A)"), which will preserves the formatting of the copied time data from temporary column Y and will cause temporary column Y and temporary column X cells to change to an error code (#VALUE!). You may disregard.
        6. In the header of column W (P37)", retype "P37"
        7. Delete temporary column Y and temporary column X
      6. Step 6: Strip out date data in column X (P46) as follows:
        1. Click on first cell of column containing data; this will reveal "hidden" date data
        2. Highlight and copy the date data (including the space between the date and time data)
        3. Then, do a "find and replace" in that column by copying and pasting the date string values formatted as "XX-XX-XXXX " into the FIND box and keep the REPLACE box empty
        4. Be sure to "replace all" values in each cell of the column for both find and replace
        5. Be sure to visually check the remaining data in Column W (P37) to be sure that just the time data remains in each columns cells (sometimes the "find and replace" feature fails to completely transform data within a column)
        6. All that should remain in the cells of Column W (P37) is a time values formatted as "X:XX:XX PM"
    2. Column X (P46): Transform the data in this column from ISO to Unix time as follows:
      1. We will be using the following online utility to transform the data in this column: Epoch Batch Conversion Tool
      2. In column X (P46), highlight and copy cell data
      3. Paste the data in the box provided online at https://www.epochconverter.com/batch#human2epoch
      4. Change the output option from "Screen" to the second option ("(',' comma separated, use for Excel in the US)
      5. Click the "Convert" button at bottom. This will open a spreadsheet with 6 columns
      6. Widen the second column marked "Epoch timestamp"
      7. Highlight and copy all cells containing data
      8. Paste special (as above) this data back over the cells copied from our spreadsheet in column X (P46)
    3. Column Y (P51): Strip out clock time data leaving just the date data as follows:
      1. Click the first cell with data and highlight and then delete from just after the "T" to the end of the data string up to and including the "Z"
      2. Fill all of the cells of column Y (P51) with the same date data using autofill procedures used above
      3. All that should remain in the cells of Column Y (P51) is the exact same date value in each cell formatted as "XXXX-XX-XXT" (see exception to this in next instruction (3.4)
      4. IMPORTANT: Some games run past midnight, so adjust dates accordingly in corresponding cells from column W (P37)!
  6. Highlight all of the data in column AR (P24/P25) by clicking on "AR" at top of column and copy and paste the data over th data in column AS
    1. Change the label of column AR (P24/P25) to column AR (P24) (the P24 property is "Team on Offense")
    2. Change the label of column AS (P24/P25) to column AS (P25) (the P25 property is "Team on Defense")
    3. Add data to column AS (P25) reflecting the other team from the data as shown in column AR (P24) ... that is, the team on defense (column AS (P25)) should be the other team than the team on offense column AR (P24)
    4. Autofill all cell data beneath each cell until you reach the other team. Do this for both columns
    5. Now, do a "Find and Replace" separately on each full column by finding and replacing all of the team's alphabetic designation (e.g., "ALA") with that team's item "Q" number from our wikibase for the name of the university's football team (e.g., Q number for Alabama Crimson Tide football" is Q44, so find "ALA" and replace all with "Q44")
    6. When completed, all cells in column AR (P24) and column AS (P25) should be Q numbers
  7. Use column AB (P49) and column AC (P50) for adding location (stadium) data and lat/longitude data as follows:
    1. Overwrite data in all cells in column AB (P49) with the item "Q" number in our wikibase for the name of the stadium
    2. Overwrite data in all cells in column AC (P50) with the coordinates for the stadium (you will find the coordinates for each stadium on each stadium's item page in our wikibase)
  8. For the following columns, add item "Q" numbers from our wikibase as instructed:
    1. Column A (P8) - Perform a "Find and Replace all" to add item "Q" numbers to this column for each period of the game:
      1. IMPORTANT: Before doing the "Find and Replace all", make sure to click on the "Options" button and put a check in the box for "Match entire cell contents"
      2. Use the item "Q" number from list below
        1. First Quarter: Find "1" and Replace all with "Q10"
        2. Second Quarter: Find "2" and Replace all with "Q11"
        3. Third Quarter: Find "3" and Replace all with "Q12"
        4. Fourth Quarter: Find "4" and Replace all with "Q13"
        5. Overtime Quarter: Find "5" (and higher) and Replace all with "Q14"
    2. Column C (P6): - Perform a "Find and Replace all" to add item "Q" numbers to this column for the scoring result of each play using the following possibilities:
      1. IMPORTANT: Before doing the "Find and Replace all", make sure to click on the "Options" button and put a check in the box for "Match entire cell contents"
      2. No Scoring Result: Leave the "Find" box empty and Replace all with "Q15"
      3. Field Goal: Find "Field Goal" and Replace all with "Q16"
      4. For touchdowns and various extra point options, you need to first check the cells in column B (P10) AND column V (P11) that are in each Touchdown row to see if 6, 7, or 8 points were awarded (this number reflects the extra point)
        1. For 6 points (i.e., touchdowns with Failed Extra Point Attempt), corresponding cell in column C (P6) will be a "Q17"
        2. For 7 points (i.e., touchdown with Successful Point After Kick), corresponding cell in column C (P6) will be a "Q18"
        3. For 8 points (i.e., touchdown with Successful Two Point Conversion), corresponding cell in column C (P6) will be a "Q19"
      5. Safety: Search "Safety" and Replace all with "Q20"
    3. Column L (P15) - Perform a "Find and Replace all" to add item "Q" numbers to this column for each down type as follows:
      1. IMPORTANT: Before doing the "Find and Replace all", make sure to click on the "Options" button and put a check in the box for "Match entire cell contents"
      2. First downs: Find "1" in column and Replace all with "Q21"
      3. Second downs: Find "2" in column and Replace all with "Q22"
      4. Third downs: Find "3" in column and Replace all with "Q24"
      5. Fourth downs: Find "4" in column and Replace all with "Q25"
      6. For the remaining cells without item "Q" numbers in column L (P15), use the value in column Q (P16) as follows:
        1. For "End of game" cell data, replace 0 with "Q26"
        2. For "End of half" cell data, replace 0 with "Q27"
        3. For "End period" cell data, replace 0 with "Q28"
    4. Column Q (P16) - Perform "Find and Replace all" to add item "Q" numbers to this column for each type of play that occurred as follows:
      1. For the first Kickoff of the game, replace "Kickoff" with "Q221" in the top cell in the column
      2. For the opening kickoff of the second half, Find "End of half" in column Q (P16) and the row after, replace "Kickoff" with "Q306"
      3. For the remaining cells in column Q (P16), perform a "Find and Replace all" to add item "Q" numbers using the following possibilities:
        1. IMPORTANT: Before doing the "Find and Replace all", make sure to click on the "Options" button and put a check in the box for "Match entire cell contents"
        2. For "Pass Reception" cell data, replace with "Q29"
        3. For "Field Goal Good" cell data, replace with "Q30"
        4. For "Field Goal Missed" cell data, replace with "Q31"
        5. For "Kickoff" cell data, replace with "Q32"
        6. For "Pass Incompletion" cell data, replace with "Q33"
        7. For "Pass Interception Return" cell data, replace with "Q34"
        8. For "Passing Touchdown" cell data, replace with "Q35"
        9. For "Penalty" cell data, replace with "Q36"
        10. For "Punt" cell data, replace with "Q37"
        11. For "Rush" cell data, replace with "Q38"
        12. For "Rushing Touchdown" cell data, replace with "Q39"
        13. For "Sack" cell data, replace with "Q40"
        14. For "Timeout" cell data, replace with "Q41"
        15. For "End of game" cell data, replace with "Q26"
        16. For "End of half" cell data, replace with "Q27"
        17. For "End period" cell data, replace with "Q28"
  9. Delete all columns working from right to left in your spreadsheet and keeping only the columns that have P(XX) numbers, which should leave the following columns in this order:
    1. Column A (P8)
    2. Column B (P10)
    3. Column C (P6)
    4. Column BA (P35)
    5. Column BB (P19)
    6. Column BC (P36)
    7. Column BD (P13)
    8. Column BF (P14)
    9. Column BI (P15)
    10. Column BL (P34)
    11. Column BN (P16)
    12. Column BR (P38)
    13. Column BS (P11)
    14. Column BT (P37)
    15. Column BU (P46)
    16. Column CE (P20)
    17. Column CF (P8)
    18. Column CI (P34)
    19. Column CJ (P13)
    20. Column CO (P24)
    21. Column CV (P26)
    22. Column CW (P27)
    23. Column CX (P33)
    24. Column CY (P28)
  10. Save FOUR copies of the spreadsheet as follows:
    1. Name the first copy to include "DRIVE_CREATION_SPREADSHEET" (e.g., 2017-ALA_AUB-DRIVE_CREATION_SPREADSHEET.csv)
    2. Name the second copy to include "DRIVE_DATA_SPREADSHEET" (e.g., 2017-ALA_AUB-DRIVE_DATA_SPREADSHEET.csv)
    3. Name the third copy to include "PLAY-BY-PLAY_CREATION_SPREADSHEET" (e.g., 2017-ALA_AUB-PLAY-BY-PLAY_CREATION_SPREADSHEET.csv)
    4. Name the fourth copy to include "PLAY-BY-PLAY_DATA_SPREADSHEET" (e.g., 2017-ALA_AUB-PLAY-BY-PLAY_DATA_SPREADSHEET.csv)
  1. For Drives data
    1. Some rows correspond to "non-plays" in column Q are to be skipped in later procedures. Highlight these rows and fill them with yellow color. These non-plays occur in rows for which the values in column Q are one of the following (to find rows with these values in column Q, highlight the full column and "find" (Ctrl-F) the string matching each):
      1. End period
      2. End of half
      3. End of game
      4. Timeout
      5. Blank
    2. Generating and adding play sequence data relative to each drive to column F (P36)
      1. Each drive is a collection of 1 or more plays that need to be assigned sequencing data on a per drive basis
      2. Overwrite existing data in the cells of column F (P36) with play sequence data by counting each play relative to its drive starting with the first play after the Opening Kickoff as play #1 in the 1st drive
      3. As was the case when adding play sequence data described above, any cell highlighted in yellow (based on above's "non-play" processing) is to be left blank (that is, do not include those cells in the drive sequence data that you generate)

Navigation: