A confidence interval for your random heights

edited June 18 in Assignments

(5pts)

In this problem, we're going to return to our fun web program that generates random CSV data for people. Recall that you can access it via Python like so:

%matplotlib inline
import pandas as pd
df = pd.read_csv('https://www.marksmath.org/cgi-bin/random_data.csv?username=mark')
df.tail()
  first_name   last_name  age     sex  height  weight  income activity_level
0      Donna       Dinan   35  female   65.37  164.26    1947           high
1    Antonia       Davis   39  female   64.95  140.40    2188           none
2  Stephanie        Buss   30  female   60.75  181.83   18108           high
3    Wendell      Elmore   26    male   64.68  157.90    1935       moderate
4       Nina  Mcilhinney   21  female   59.94  163.38    5675           none

Also recall that the data is randomly generated, but the random number generator is seeded using the username query parameter in the URL. If I execute that command several times, I therefore get the same result every time. The result does depend on the username, though, so running it with your forum username gives a different result. We each have our own randomly generated data file!
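To make the seeding idea concrete, here is a minimal sketch of how a username could determine the data. It is only an illustration: the function `random_heights`, the use of `zlib.crc32` to turn the name into a seed, and the mean and spread of 66 and 4 are all assumptions, not a description of how the actual web program works.

```python
import zlib
import numpy as np

def random_heights(username, n=100):
    """Hypothetical stand-in for the server: seed a generator from the username."""
    # crc32 gives the same integer for the same name on every run.
    seed = zlib.crc32(username.encode("utf-8"))
    rng = np.random.default_rng(seed)
    # Made-up mean and spread, chosen only for illustration.
    return rng.normal(loc=66.0, scale=4.0, size=n)

a = random_heights("mark")
b = random_heights("mark")
c = random_heights("someone_else")

print(np.array_equal(a, b))  # same username, same data
print(np.array_equal(a, c))  # different username, different data
```

The point is only that a fixed seed makes "random" data repeatable, which is why everyone can reproduce their own file.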

The problem: Using the code above with your username, generate your data file and then compute

  1. the average value of the heights in your data (which you've done before),
  2. the standard deviation of the heights in your data,
  3. the standard error of the heights in your data,
  4. the margin of error needed to turn the heights in your data into a $(100-s)\%$ confidence interval (where $s$ is your special number), and
  5. the resulting $(100-s)\%$ confidence interval for height.

Be sure to include both the code that you typed and the results in your post.
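For orientation, the five steps might be strung together roughly as follows. This is a sketch under stated assumptions rather than a model answer: the helper `height_confidence_interval` is a hypothetical name, the heights are made-up stand-in data instead of the real CSV, and the special number 5 is just an example. It uses a normal $z^*$ multiplier taken from `norm.ppf`.

```python
import numpy as np
import pandas as pd
from scipy.stats import norm

def height_confidence_interval(heights, special_number):
    """Return mean, std, standard error, margin of error, and the CI."""
    n = len(heights)
    xbar = heights.mean()                  # step 1: sample mean
    s = heights.std()                      # step 2: sample standard deviation (pandas uses ddof=1)
    se = s / np.sqrt(n)                    # step 3: standard error
    conf = (100 - special_number) / 100    # confidence level as a proportion
    z_star = norm.ppf(1 - (1 - conf) / 2)  # two-sided multiplier, e.g. about 1.96 for 95%
    me = z_star * se                       # step 4: margin of error
    return xbar, s, se, me, (xbar - me, xbar + me)  # step 5: the interval

# Made-up stand-in data; with the real file you would pass df.height instead.
rng = np.random.default_rng(0)
heights = pd.Series(rng.normal(66, 4, size=100))

xbar, s, se, me, ci = height_confidence_interval(heights, special_number=5)
print(xbar, s, se, me, ci)
```

Note that the multiplier comes from the upper tail, `1 - (1 - conf) / 2`, so a special number of 5 uses `norm.ppf(0.975)` rather than `norm.ppf(0.95)`.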

Comments

  • edited June 20

    %matplotlib inline
    import pandas as pd
    df = pd.read_csv('https://www.marksmath.org/cgi-bin/random_data.csv?username=jordan')
    df.tail()

    from numpy import sqrt
    heights=df.height.sample(100,random_state=1)
    xbar=heights.mean()
    xbar

    66.73939999999997

    s= heights.std()
    s

    4.122285043492017

    se = s/sqrt(100)
    se

    0.41222850434920166

    from scipy.stats import norm
    z=norm.ppf(0.045)
    z

    -1.6953977102721358

    z= norm.ppf(0.995)
    z

    2.5758293035489004

    me= z*se
    me

    1.061830261260809

    ci= [round(xbar-me, 4), round(xbar+me, 4)]
    ci

    [65.6776, 67.8012]

  • edited June 19
    %matplotlib inline
    import pandas as pd
    df = pd.read_csv('https://www.marksmath.org/cgi-bin/random_data.csv?username=sarah')
    df.tail()
    

    1) Avg Values

    heights=df.height.sample(100,random_state=1)
    xbar=heights.mean()
    xbar
    

    65.5548

    2) Std Deviation

    s= heights.std()
    s
    

    4.113876610652872

    3) Std Error

    from numpy import sqrt
    se = s/sqrt(100)
    se
    

    0.4113876610652872

    4) Margin of Error

    from scipy.stats import norm
    z=norm.ppf(0.045)
    z

    -1.6953977102721358

    zstr=-z
    zstr

    1.6953977102721358

    me=zstr*se
    me
    

    0.6974656986042974

    5) Confidence Interval for 91%

    ci=[xbar-me,xbar+me]
    ci
    

    [64.8573343013957, 66.2522656986043]

  • ben
    edited June 18
    %matplotlib inline
    import pandas as pd
    df = pd.read_csv('https://www.marksmath.org/cgi-bin/random_data.csv?username=ben')
    df.tail()
    

    mean: 66.3
    sd: 3.89
    se: .39
    margin of error: .79
    z*: 2.05

    from scipy.stats import norm
    norm.ppf(.02)
    

    confidence interval for 96%: (65.51,67.09)

  • edited June 18

    Code for my data table:

    %matplotlib inline
    import pandas as pd
    df = pd.read_csv('https://www.marksmath.org/cgi-bin/random_data.csv?username=lillian')
    df.tail()
    

    1.) Average Value of the Heights:

    heights = df.height.sample(100,random_state=1)
    xbar = heights.mean()
    xbar
    

    = 66.30110000000002

    2.) Standard Deviation of the Heights:

    s = heights.std()
    s
    

    = 3.7302560084917624

    3.) Standard Error of the Heights:

    from numpy import sqrt
    se = s/sqrt(100)
    se
    

    = 0.37302560084917624

    4.) Margin of Error:

    me = 2*se
    me
    

    = 0.7460512016983525

    5.) Confidence Interval for Height:

    [xbar - me, xbar + me]
    

    = [65.55504879830167, 67.04715120169837]

    6.) z* Multiplier:

    I had a 94% CI, so my number was 6

    from scipy.stats import norm
    -norm.ppf(0.03)
    

    = 1.880793608151251

  • edited June 20
    %matplotlib inline
    import pandas as pd
    df = pd.read_csv('https://www.marksmath.org/cgi-bin/random_data.csv?username=alex')
    df.tail()
    

    1.

    m= df.height.mean()
    m
    

    m= 66.20770000000003

    2.

    s= df.height.std()
    s
    

    s= 3.490168255907652

    3.

    from numpy import sqrt
    s = df.height.std()
    se = s/sqrt(100)
    se
    

    se= 0.3490168255907652

    4. (100-1)% = 99%

    5.

    from scipy.stats import norm
    z= norm.ppf(0.995)
    z
    
    me= z*se
    me
    
    ci= [m-me, m+me]
    ci
    

    ci= [65.30869223321172, 67.10670776678835]

  • edited June 18

    First, I'll import my data and compute my mean and standard deviation:

    %matplotlib inline
    import pandas as pd
    df = pd.read_csv('https://www.marksmath.org/cgi-bin/random_data.csv?username=audrey')
    m = df.height.mean()
    s = df.height.std()
    [m,s]
    

    [66.49599999999998, 3.9782326922309807]

    Thus, my standard error is:

    se = s/10
    se
    

    0.39782326922309807

    and my $z^*$-multiplier is 1.96 since:

    from scipy.stats import norm
    norm.ppf(0.025)
    

    -1.9599639845400545

  • edited June 20

    1.) Mean:

    df.height.mean()
    

    65.35810000000001

    2.) Standard Deviation:

    df.height.std()

    3.701679306324183

    3.) Standard error

    from numpy import sqrt
    s = df.height.std()
    se = s/sqrt(100)
    se

    0.37016793063241826

    4.) Margin of error

    me = 2*se
    me

    0.7403358612648365

    5.) Confidence Interval

    xbar = df.height.mean()
    [xbar - me, xbar + me]
    

    [64.61776413873517, 66.09843586126485]

    1. Z*= 2.33
  • m = df.height.mean()
    m
    

    mean=67.02449999999999

    s = df.height.std()
    s
    

    standard deviation=3.8097767306462633

    se = s/10
    se
    

    standard error=0.3809776730646263

    sp = (100-s)/100
    sp
    

    0.9619022326935374

    from scipy.stats import norm
    
    z = norm.ppf(sp)
    z
    

    z*=1.7732002261111544

    me = z*se
    me
    

    Margin of Error=0.6755496960214968

    ci = [m-me, m+me]
    ci
    

    Confidence Interval=[66.3489503039785, 67.70004969602148]

  • edited June 20

    This is the code importing my data along with the mean and standard deviation :

     %matplotlib inline
     import pandas as pd
     df = pd.read_csv('https://www.marksmath.org/cgi-bin/random_data.csv?username=beau')
     df.tail()
     s = df.height.std()
     m = df.height.mean()
     [m,s]
    

    [65.81690000000002, 3.854679615661196]

    The standard error is: 0.3854679615661196

    se = s/10
    se
    

    The z multiplier is: 2.2

    from scipy.stats import norm
    norm.ppf(0.015)
    

    -2.1700903775845606

    The margin of error is: 0.8480295154454632

    me = 2.2 * se
    me
    

    The confidence interval is: [64.96887048455456, 66.66492951544548]

    le = m - (2.2 * se)
    re = m + (2.2 * se)
    [le,re]
    
  • 1+2) mean and standard deviation

    %matplotlib inline
    import pandas as pd
    df = pd.read_csv('https://www.marksmath.org/cgi-bin/random_data.csv?username=isabel')
    m = df.height.mean()
    s = df.height.std()
    [m,s]
    

    [66.81249999999999, 4.248674273324336]

    3) standard error

    se = s/10
    se
    

    0.4248674273324336

    4) margin of error

    me = 2*se
    me
    

    0.8497348546648672

    5) confidence interval

    ci = [m-me, m+me]
    ci
    

    [65.96276514533511, 67.66223485466486]
