MODULE 2.1

READING:

  1. thinkcspy: Chp 2.1-2.3

  2. thinkcspy: Chp 7.1 (READ UP TO Activity: 7.1.3; you do not need to go further)

  3. https://docs.python.org/3/reference/datamodel.html#index-10 (Read only the section on numbers.Integral).

  4. https://stackoverflow.com/questions/57828781/why-does-bool-exist-when-we-can-use-int (note that Python bools do not save space; communication of intent is what matters here)

  5. https://stackoverflow.com/questions/8169001/why-is-bool-a-subclass-of-int
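The point made in readings 3 to 5 can be checked directly in an interpreter. The snippet below is a small illustrative sketch; the list `flags` and the variable `is_treated` are made-up names that do not come from the course materials.

```python
# bool is a subclass of int, so True and False behave like 1 and 0 in arithmetic.
print(issubclass(bool, int))
print(isinstance(True, int))

# Hypothetical example data: summing booleans counts how many are True.
flags = [True, False, True, True]
n_set = sum(flags)
print(n_set)

# A bool communicates intent more clearly than a bare 0 or 1.
is_treated = True
print(type(is_treated).__name__)
```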

TASK:

  1. Answer all questions.

MODULE 2.2

READING:

  1. thinkcspy: Chp 9

  2. Schafer Python Tutorial Videos 2-3

  3. str.format()

  4. f-string

  5. Missing data in pandas

TASK:

  1. Complete P-Set 2

P-Set 2

Remember to document your code well. Report all results in a LaTeX-rendered PDF document. Submit your .py file and your .PDF file in a single compressed folder (.7z, .zip, or .rar).

TASK:

Problem 1:

When running statistical analyses on a database from the CHECC program, we encounter numerous issues and anomalies that don’t make sense. One of the researchers suggested that there are severe issues in the data that must be manually inspected. However, combing through all of the data is time-consuming, so we want a way to manually inspect only the records that we believe contain problems.

(a) Download the CHECC data here and the initial code here. Ensure they are located in the same folder. Install the pandas package in order to run the code. pandas is a package that allows us to inspect, manipulate, and analyze data. View the data (in a program such as Excel) and briefly comment on what you see.

(b) Request an input from the researcher running the code, clearly asking for their name. Then print a message describing the nature of the task using the variables defined in the code. Make sure that all messages are centered. You will need to use either the str.format() method or the newer f-string syntax (recommended). Assume the researcher is called Jane. The messages should look like:

Welcome! What is your name?

jaNe

Hello Jane, you are responsible for inspecting the data of children 1727 to 2010.
These children have a total of 4,501,723 combined assessments.

Additionally, ensure that

(i) The researcher name is capitalized properly in the message.

(ii) The number of assessments is separated by commas.

(iii) The input field is roughly aligned with the first welcome message and is on a new line. (Hint: call input() with a prompt made of blank spaces of sufficient width.)
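The formatting tools mentioned in (b) and (i) to (iii) can be sketched as follows. This is a minimal illustration, not the expected solution: `WIDTH`, `build_messages`, and `input_padding` are hypothetical names, and the numbers are taken from the sample output above rather than from the initial code.

```python
WIDTH = 100  # assumed console width; adjust to taste


def input_padding(text, width=WIDTH):
    """Return the blank prompt that lines input() up with centered `text`."""
    return " " * ((width - len(text)) // 2)


def build_messages(raw_name, first_id, last_id, n_assessments, width=WIDTH):
    """Return the two centered messages for the researcher."""
    # capitalize() upper-cases the first letter and lower-cases the rest.
    name = raw_name.strip().capitalize()
    lines = [
        f"Hello {name}, you are responsible for inspecting "
        f"the data of children {first_id} to {last_id}.",
        # The ',' format specifier inserts thousands separators.
        f"These children have a total of {n_assessments:,} combined assessments.",
    ]
    # '^' centers the text within `width` characters.
    return [f"{line:^{width}}" for line in lines]


welcome = "Welcome! What is your name?"
print(f"{welcome:^{WIDTH}}")
# In the real script the name would come from: input(input_padding(welcome))
messages = build_messages("jaNe", 1727, 2010, 4501723)
for message in messages:
    print(message)
```

Reading the name through `input(input_padding(welcome))` places the cursor on a new line, roughly under the start of the welcome message.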

Problem 2:

Now, we want to inspect every child one by one. A for loop is set up in the code that loops through all relevant children.

(a) Within the for loop, print a message telling the researcher the child’s ID and which cohort the child belongs to. A child’s cohort can be determined by the first digit in its ID. (Hint: use type casting/conversion).
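The type-conversion hint in (a) can be illustrated with a short sketch. The helper `cohort_from_id` and the sample IDs are hypothetical; the loop variable in the initial code may have a different name or type.

```python
def cohort_from_id(child_id):
    """Return the cohort, assumed to be the first digit of the ID."""
    # Convert to str to index the first character, then back to int.
    return int(str(child_id)[0])


for child_id in [1727, 1850, 2010]:  # made-up IDs for illustration
    print(f"Child {child_id} belongs to cohort {cohort_from_id(child_id)}.")
```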

(b) Within the for loop, print a message telling the researcher the child’s treatment status, removing the word “some” in all instances because any child that was treated at all is considered fully treated in our analysis. PK stands for pre-K and PA stands for parent academy. These are respectively a comprehensive pre-K curriculum and courses for parents to learn good parenting methods.
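One possible way to drop the word in (b) is sketched below. The exact treatment labels in the CHECC data are not shown here, so the sample strings and the helper `clean_treatment` are assumptions for illustration only.

```python
def clean_treatment(status):
    """Remove the standalone word 'some' and tidy the remaining spaces."""
    words = [word for word in status.split() if word.lower() != "some"]
    return " ".join(words)


# Hypothetical labels; check the real values in the data file.
for status in ["some PK", "some PK and some PA", "control"]:
    print(f"Treatment status: {clean_treatment(status)}")
```

Splitting on whitespace avoids leaving double spaces behind, which a plain `status.replace("some", "")` would do.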

(c) Within the for loop, print a message telling the researcher the child’s non-cognitive score (“ncog”). The non-cognitive score of a child is calculated by a psychologist’s evaluation of the child’s non-cognitive skills, such as their willingness to share toys, honesty, etc. The evaluation is then compared to the evaluations of all other children and summarized as a percentile value (i.e. it should range from 0 to 1). For our analysis, we wish to use a logarithmic transformation of the non-cognitive score. The formula is:

sqrt(ln(1 + ncog))

Note that ln is the natural logarithm. Called with a single argument, math.log() returns the natural logarithm, and the constant e is available as math.e. Print a message telling the researcher the transformed score.
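The formula above translates to Python roughly as follows. This is a sketch that assumes ncog is already a float between 0 and 1; the function name `transform_ncog` and the sample values are hypothetical.

```python
import math


def transform_ncog(ncog):
    """Return sqrt(ln(1 + ncog)) for a percentile score between 0 and 1."""
    # math.log with one argument is the natural logarithm.
    return math.sqrt(math.log(1 + ncog))


for ncog in [0.0, 0.5, 1.0]:  # made-up scores for illustration
    print(f"ncog = {ncog:.2f}, transformed = {transform_ncog(ncog):.4f}")
```

For scores below 0, ln(1 + ncog) is negative or undefined, and the math functions raise a ValueError.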

Problem 3:

Finally, run the code you produced and note down any anomalies you see. By looking at the raw data and finding the anomalies, try to assess why the anomalies are occurring.