Lab 1

Read in the `train.csv` data.

1. Initial Split

Split the data into a training set and a test set, stored as two named objects. Print the class of the initial split object and of the training and test sets.
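As a rough illustration of the mechanics, here is a minimal sketch on made-up data rather than `train.csv`. The data frame `toy` and the object names are hypothetical; `initial_split()`, `training()`, and `testing()` are from `{rsample}`, and the default `prop = 3/4` is assumed.

```r
library(rsample)

set.seed(3000)  # arbitrary seed, fixed only so the split is reproducible

# Hypothetical stand-in for train.csv: 200 rows with an id column
toy <- data.frame(id = 1:200, score = rnorm(200))

toy_split <- initial_split(toy)   # default prop = 3/4 goes to training
toy_train <- training(toy_split)  # rows assigned to the training set
toy_test  <- testing(toy_split)   # the remaining rows

# Class of the split object and of the two data sets
class(toy_split)
class(toy_train)
class(toy_test)
```

The split object stores row indices rather than copies of the data, which is why its class differs from that of the two extracted data frames.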

2. Use code to show the proportion of the `train.csv` data that went to each of the training and test sets.

3. k-fold cross-validation

Use 10-fold cross-validation to resample the training data.

4. Use `{purrr}` to add the following columns to your k-fold CV object:

- `analysis_n` = the n of the analysis set for each fold
- `assessment_n` = the n of the assessment set for each fold
- `analysis_p` = the proportion of the training data in the analysis set for each fold
- `assessment_p` = the proportion of the training data in the assessment set for each fold
- `sped_p` = the proportion of students receiving special education services (`sp_ed_fg`) in the analysis and assessment sets for each fold
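The general pattern of mapping over the `splits` column might look like the sketch below. It uses made-up data: `toy_train`, the 0/1 column `flag`, and the column name `flag_p` are hypothetical stand-ins, and only a subset of the requested columns is shown.

```r
library(rsample)
library(dplyr)
library(purrr)

set.seed(3000)

# Hypothetical stand-in for the training data: 150 rows, one 0/1 flag
toy_train <- data.frame(
  id   = 1:150,
  flag = rbinom(150, size = 1, prob = 0.2)
)

toy_cv <- vfold_cv(toy_train, v = 10)

toy_cv <- toy_cv %>%
  mutate(
    # row counts of each fold's analysis and assessment sets
    analysis_n   = map_int(splits, ~ nrow(analysis(.x))),
    assessment_n = map_int(splits, ~ nrow(assessment(.x))),
    # share of the resampled data held by the analysis set
    analysis_p   = analysis_n / (analysis_n + assessment_n),
    # share of flagged rows within each analysis set
    flag_p       = map_dbl(splits, ~ mean(analysis(.x)$flag == 1))
  )

toy_cv
```

`map_int()` and `map_dbl()` return plain vectors, so the new columns sit alongside `splits` and `id` as ordinary numeric columns.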

5. Please demonstrate that there are no common values in the `id` columns of the `assessment` data between `Fold01` & `Fold02`, and `Fold09` & `Fold10` (of your 10-fold cross-validation object).
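One way to check for shared ids is `intersect()`, sketched here on made-up data. `toy_train` and the object names are hypothetical, and only the first pair of folds is compared.

```r
library(rsample)

set.seed(3000)

# Hypothetical stand-in for the training data
toy_train <- data.frame(id = 1:150)
toy_cv <- vfold_cv(toy_train, v = 10)

# Assessment sets of the first two folds (Fold01 and Fold02)
fold01 <- assessment(toy_cv$splits[[1]])
fold02 <- assessment(toy_cv$splits[[2]])

# An empty result means the two assessment sets share no ids
intersect(fold01$id, fold02$id)
length(intersect(fold01$id, fold02$id)) == 0
```

In k-fold cross-validation each row is held out in exactly one fold, so the intersection of any two assessment sets should be empty.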

6. Try to answer these next questions without running similar code on real data.

For the following code, `vfold_cv(fictional_train, v = 20)`:

What proportion of `fictional_train` is in the analysis set for each fold?
What proportion of `fictional_train` is in the assessment set for each fold?

7. Use Monte Carlo CV to resample the training data with 20 resamples and .30 of each resample reserved for the assessment sets.

8. Please demonstrate that there are common values in the `id` columns of the `assessment` data between `Resample 8` & `Resample 12`, and `Resample 2` & `Resample 20` in your MC CV object.
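The same `intersect()` check can be applied to Monte Carlo resamples, sketched here on made-up data. `toy_train` and the object names are hypothetical, and an arbitrary pair of resamples is compared rather than the pairs named above. Note that `prop` in `mc_cv()` is the share kept for the analysis set.

```r
library(rsample)

set.seed(3000)

# Hypothetical stand-in for the training data
toy_train <- data.frame(id = 1:150)

# prop is the analysis share, so 0.70 leaves 0.30 for assessment
toy_mc <- mc_cv(toy_train, prop = 0.70, times = 20)

# Assessment sets of two arbitrary resamples
res_a <- assessment(toy_mc$splits[[1]])
res_b <- assessment(toy_mc$splits[[2]])

# ids held out in both resamples
shared <- intersect(res_a$id, res_b$id)
length(shared)
```

Each Monte Carlo resample draws its assessment set independently of the others, so overlap between assessment sets is expected, unlike in k-fold cross-validation.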

9. You plan on doing bootstrap resampling with a training set with n = 500.

What is the sample size of an analysis set for a given bootstrap resample?
What is the sample size of an assessment set for a given bootstrap resample?
If each row was selected only once for an analysis set:
- what would be the size of the analysis set?
- and what would be the size of the assessment set?

Lab 1

Resampling

Assigned 10/14/20, Due 10/21/20


