An archive of Mark's Fall 2017 Intro Stat course.

blaiser1

So maybe I’m just dumb (and that’s certainly what it is) but I have become incredibly lost, today’s class especially. I have no idea what a test statistic is and I don’t know how to find the P value. I have to work so I haven’t had time to go to the math lab and I won’t be able to go in tomorrow before work either. Could someone help and/or point me in the direction of something you’ve found helpful?

Reading the T table especially has lost me, and some questions on the homework have asked for degrees of freedom past 20. I understand what a one- or two-tailed problem theoretically is, but I don’t know how they’re worded.

mark

This is a bit difficult to answer, since it’s so general, but I am sorry to hear of the difficulties so I’ll give it a try. It’ll be easier to answer these questions in the context of a problem - so, let’s make one.

## A problem

Supposedly, a kid trick or treating in Montford will, on average, collect 40 pieces of candy in an hour. A random sample of 10 kids yielded an average of only 37.2 pieces of candy with a standard deviation of 2.3. Can we say that the claim of 40 pieces of candy is incorrect at a 99% level of confidence?

To solve this problem, we’ll need to understand the “test statistic” and the “p-value” that you ask about. We can also compare and contrast the use of R with the use of a table.

### The test statistic

The test statistic, also called the t-score, is directly analogous to the z-score that we used with the normal distribution. That is, if we have an observed mean x from a sample of size n with standard deviation s, and we want to see where it lies relative to a claimed population mean \bar{x}, we compute

T = \frac{x-\bar{x}}{s/\sqrt{n}}.

In the example above, we’ve got
T = \frac{37.2-40}{2.3/\sqrt{10}} \approx -3.849729.
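For anyone who wants to check that arithmetic, here is a minimal sketch of the computation in R. The variable names (`sample_mean`, `claimed_mean`, and so on) are just labels I made up for this illustration; the numbers come straight from the problem.

```r
# Values taken from the candy problem above.
sample_mean  <- 37.2  # average in the sample of kids
claimed_mean <- 40    # the supposed population average
s            <- 2.3   # sample standard deviation
n            <- 10    # sample size

# Standard error of the sample mean.
standard_error <- s / sqrt(n)

# The t-score: how many standard errors the sample mean is from the claim.
t_score <- (sample_mean - claimed_mean) / standard_error
print(t_score)  # approximately -3.849729
```

The negative sign just says that the sample came in below the claimed 40 pieces.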

### The p-value

As always, the p-value is the probability that we could get the observed data or something even more extreme, under the assumption of the null hypothesis. In this particular case, our null and alternative hypotheses are

• H_0: \bar{x} = 40
• H_A:\bar{x} \neq 40

The point behind the t-score is that this probability can be computed as the area under a t-distribution and outside the t-score, counting both tails since the alternative hypothesis is two-sided. This looks something like so:

This is exactly what we’ve learned to compute with R using a command like:

2*pt(-3.849729, 9)
# Out: 0.003907934


Thus, that shaded area (which is the same as our p-value) is 0.003907934. We asked for a 99% level of confidence, so our threshold value is 0.01. Since the p-value is less than that, we reject the null hypothesis.
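As a sanity check, the sketch below recomputes the p-value and compares it to the threshold. It also integrates the t-density `dt` numerically over the left tail, just to illustrate that `pt` really is giving an area under the curve. The names `alpha`, `p_value`, and `tail_area` are my own, and the `integrate` step is only an illustration, not something you need on the homework.

```r
t_score <- -3.849729  # the t-score from the candy problem
df      <- 9          # degrees of freedom: sample size minus 1
alpha   <- 0.01       # threshold for a 99% level of confidence

# Two-tailed p-value: twice the area in one tail.
p_value <- 2 * pt(-abs(t_score), df)
print(p_value)

# The same area, found by numerically integrating the t-density
# over the left tail and doubling it.
tail_area <- integrate(dt, lower = -Inf, upper = -abs(t_score), df = df)$value
print(2 * tail_area)

# TRUE means we reject the null hypothesis.
print(p_value < alpha)
```

Using `-abs(t_score)` means the same line works whether the t-score comes out positive or negative.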

### Tables

You can’t read a p-value directly off of our class t-table, because t-tables are generally much more sparse than normal tables. You can read critical threshold values off of a t-table, though, and this is enough to decide whether to reject a null hypothesis or not. If we take a look at our t-table, we see the following:

Note that the last column corresponds to a two-tail 99% level of confidence. If we go down to the 9 degrees of freedom row, we see that the critical t^* value is 3.25. Since the absolute value of our t-score, 3.85, is even larger than that, we see that we must reject the null hypothesis.
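If you have R handy, you can also get the critical value from `qt` rather than the table. This is a sketch under the assumption that we want a two-tailed test, so the 1% is split evenly between the two tails; the variable names are mine. The value it returns is more precise than a printed table, roughly 3.25 for 9 degrees of freedom.

```r
t_score <- -3.849729  # the t-score from the candy problem
df      <- 9          # degrees of freedom: sample size minus 1
alpha   <- 0.01       # 1 minus the 99% level of confidence

# Critical value: the point with alpha/2 of the area in the tail beyond it.
t_star <- qt(1 - alpha / 2, df)
print(t_star)

# TRUE means the t-score lies beyond the critical value, so we reject.
print(abs(t_score) > t_star)
```

This leads to the same decision as comparing the p-value to 0.01; the two approaches always agree.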

Of course, if you’re working on a HW problem where the table does not include the necessary degrees of freedom, then you can use R to do the computation. On an exam, I guess I can’t go larger than 20 degrees of freedom without expanding the table.
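For those HW problems with more degrees of freedom than the table covers, something like the following might help. The helper `critical_t` is a hypothetical function I'm making up here, not something built into R; it just wraps `qt`. The 25 degrees of freedom in the usage lines is an arbitrary example value.

```r
# Hypothetical helper: critical t value for a given level of confidence.
# confidence: e.g. 0.99 for a 99% level of confidence
# df:         degrees of freedom (sample size minus 1)
# two_tailed: split the leftover probability across both tails?
critical_t <- function(confidence, df, two_tailed = TRUE) {
  alpha <- 1 - confidence
  tail_prob <- if (two_tailed) alpha / 2 else alpha
  qt(1 - tail_prob, df)
}

# Example usage with 25 degrees of freedom, which is past the class table.
print(critical_t(0.99, 25))                      # two-tailed
print(critical_t(0.99, 25, two_tailed = FALSE))  # one-tailed
```

The one-tailed value is smaller because all of the 1% sits in a single tail rather than being split in two.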

So, this stuff is tricky - I can see why you’d call it a “difficulty spike”. I hope this helps, though, and everyone should feel free to post more questions here!

blaiser1

where did the 9 come from?
thank you for the response

blaiser1

also which questions include stuff like

"What is the P-value for t≥(SOME NUMBER) with (SOME NUMBER) degrees of freedom?"

mark

The second argument of the t-distribution functions (the 9 in pt(-3.849729, 9)) controls the degrees of freedom, which happens to be the sample size minus 1. Since there are 10 kids in the sample, there are 9 degrees of freedom.
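As for questions of the form "What is the P-value for t≥(SOME NUMBER)", those are asking for the area in the upper tail, which `pt` can give directly. Here's a sketch with made-up numbers: the t-score of 2.1 and the sample size of 26 are purely hypothetical, chosen only for illustration.

```r
n       <- 26     # hypothetical sample size
df      <- n - 1  # degrees of freedom: sample size minus 1
t_value <- 2.1    # hypothetical t-score from the question

# Area under the t-distribution to the right of t_value.
p_upper <- pt(t_value, df, lower.tail = FALSE)
print(p_upper)

# Equivalent: 1 minus the area to the left.
print(1 - pt(t_value, df))
```

If the question says t≤ instead, drop the `lower.tail = FALSE`; for a two-tailed question, double the tail area as in the candy problem.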