Difference between revisions of "Data Preparation Procedures"


Latest revision as of 20:05, 28 January 2019

This page provides information on preparing data for upload with QuickStatements.

Procedures vary based on the year that a game occurred.

IMPORTANT: Before creating an Item page, be sure to search to confirm it doesn't already exist!


PRELIMINARY STEPS

  1. Make sure a season page exists for each team participating in the game: Indexing a Football Team Season
  2. Create an item page for the game: Indexing a Football Game

PROCEDURES

Spreadsheet preparation procedures cover downloading or otherwise acquiring spreadsheets and then preparing them, and they depend on the time period in which each game occurred. The procedures include cleaning the data and uploading the cleaned data to our Wikibase instance using our QuickStatements tool.

Category 1 Games (with available JSON-encoded play-by-play data sources - 2001 to present)

Football games in this category occurred from 2001 to the present and have JSON-encoded play-by-play data sources. The category is further subdivided by whether wall clock data is available for each play (it is available for games from 2014 to the present).

IMPORTANT: First download and install a JSON-to-CSV converter from https://json-csv.en.softonic.com/download
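For readers who want to see what the JSON-to-CSV conversion step does, here is a minimal Python sketch. It is not the converter linked above and is not part of the documented procedure. It assumes the play-by-play file is a flat JSON array of play objects; the field names `drive`, `play`, and `description` are hypothetical and stand in for whatever the real data source provides.

```python
import csv
import io
import json


def plays_to_csv(json_text, fields):
    """Flatten a JSON array of play objects into CSV text.

    Assumes each play is a flat JSON object; nested structures
    would need extra handling that this sketch does not attempt.
    """
    plays = json.loads(json_text)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    for play in plays:
        # Keys missing from a play become empty cells in the spreadsheet.
        writer.writerow({name: play.get(name, "") for name in fields})
    return buffer.getvalue()


# Hypothetical sample data, for illustration only.
sample = (
    '[{"drive": 1, "play": 1, "description": "Kickoff"},'
    ' {"drive": 1, "play": 2, "description": "Rush for 3 yards"}]'
)
csv_text = plays_to_csv(sample, ["drive", "play", "description"])
print(csv_text)
```

The resulting text can be saved with a `.csv` extension and opened in a spreadsheet program for the cleaning steps listed below.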

  1. Games 2014 to present (JSON files with wall clock data):
    1. Downloading and Initial Prepping of Spreadsheets for Category 1 Games 2014 to Present
    2. Preparing Drive Creation Spreadsheets for Category 1 Games 2014 to Present
    3. Preparing Play Creation Spreadsheets for Category 1 Games 2014 to Present
    4. Preparing Drive Data Spreadsheets for Category 1 Games 2014 to Present
    5. Preparing Play-by-play Data Spreadsheets for Category 1 Games 2014 to Present
    6. Further preparation procedures to come later, including for player participation data (draft version available: Player Indexing)
  2. Games 2001 to 2013 (JSON files without wall clock data):
    1. Downloading and Initial Prepping of Spreadsheets for Category 1 Games 2001 to 2013
    2. Preparing Drive Creation Spreadsheets for Category 1 Games 2001 to 2013
    3. Preparing Drive Data Spreadsheets for Category 1 Games 2001 to 2013
    4. Preparing Play Creation Spreadsheets for Category 1 Games 2001 to 2013
    5. Preparing Play-by-play Data Spreadsheets for Category 1 Games 2001 to 2013
    6. Further preparation procedures to come later, including for player participation data (draft version available: Player Indexing)

Category 2 Games (with play-by-play data sources requiring transcribing)

Football games in this category occurred prior to 2001 and have paper-based play-by-play data sources that require transcribing to spreadsheets.

  1. Transcribing Steps and Initial Prepping of Spreadsheets for Category 2 Games
  2. Preparing Drive Creation Spreadsheets for Category 2 Games
  3. Preparing Drive Data Spreadsheets for Category 2 Games
  4. Preparing Play Creation Spreadsheets for Category 2 Games
  5. Preparing Play-by-play Data Spreadsheets for Category 2 Games
  6. Further preparation procedures to come later, including for player participation data (draft version available: Player Indexing)

Category 3 Games (with play-by-play data requiring reconstruction from newspaper accounts)

Football games in this category occurred prior to 2001 and do not have any play-by-play data sources other than newspaper game accounts that require transcribing to spreadsheets.

  1. Data Gathering, Transcribing Steps, and Initial Prepping of Spreadsheets for Category 3 Games
  2. Preparing Drive Creation Spreadsheets for Category 3 Games
  3. Preparing Drive Data Spreadsheets for Category 3 Games
  4. Preparing Play Creation Spreadsheets for Category 3 Games
  5. Preparing Play-by-play Data Spreadsheets for Category 3 Games
  6. Further preparation procedures to come later, including for player participation data (draft version available: Player Indexing)
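The category definitions above can be summarized as a small decision rule. The Python sketch below is only an illustration: the function name and its parameters are hypothetical, it assumes every game from 2001 onward has a JSON source, and the wall clock cutoff year is taken from the Category 1 description and left adjustable.

```python
def classify_game(year, has_play_by_play_sheet=True, wall_clock_start=2014):
    """Return the procedure category for a game.

    year: season year in which the game occurred.
    has_play_by_play_sheet: for pre-2001 games, whether a paper-based
        play-by-play source exists (otherwise only newspaper accounts).
    wall_clock_start: first year assumed to have wall clock data.
    """
    if year >= 2001:
        # Category 1: JSON-encoded play-by-play data is assumed to exist.
        if year >= wall_clock_start:
            return "Category 1 (with wall clock data)"
        return "Category 1 (without wall clock data)"
    if has_play_by_play_sheet:
        # Category 2: paper play-by-play sheets that need transcribing.
        return "Category 2"
    # Category 3: reconstruct plays from newspaper game accounts.
    return "Category 3"


print(classify_game(2018))
print(classify_game(2005))
print(classify_game(1985))
print(classify_game(1935, has_play_by_play_sheet=False))
```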

Data prep required to create and populate each drive's item page using QuickStatements is a two-step process:

    1. Create a spreadsheet that will derive Q numbers for each drive and incorporate "instance of" statements: Drive Creation Procedure
    2. Create a spreadsheet to populate data on each drive's (now) existing item page: Drive Data Preparation
  1. Play-by-play Data Preparation and Upload Procedures
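As a rough illustration of the two-step drive process, the Python sketch below builds tab-separated commands in the QuickStatements version 1 text format: first `CREATE` and `LAST` lines that make a new item with a label and an "instance of" statement, then lines that add statements to items whose Q numbers are already known. The property and item IDs (`P1`, `Q1`, `P2`, `Q100`) and the label are placeholders, not the real IDs used on this Wikibase instance, and the linked procedure pages remain the authoritative description of the spreadsheets.

```python
INSTANCE_OF = "P1"  # hypothetical ID of the "instance of" property
DRIVE_CLASS = "Q1"  # hypothetical ID of the item representing "drive"


def drive_creation_commands(labels):
    """Step 1: one new item per drive, with a label and 'instance of'."""
    lines = []
    for label in labels:
        lines.append("CREATE")
        lines.append(f'LAST\tLen\t"{label}"')  # English label
        lines.append(f"LAST\t{INSTANCE_OF}\t{DRIVE_CLASS}")
    return "\n".join(lines)


def drive_data_commands(rows):
    """Step 2: statements for items that already exist.

    rows: (q_number, property_id, value) tuples, with each value
    already written in QuickStatements syntax.
    """
    return "\n".join(f"{q}\t{p}\t{v}" for q, p, v in rows)


creation = drive_creation_commands(["Example drive 1"])
data = drive_data_commands([("Q100", "P2", '"example value"')])
print(creation)
print(data)
```

Step 2 can only be prepared after step 1 has been run, because the Q numbers it refers to are assigned when the items are created.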