I am studying for my exam and having trouble with these two questions:
1) Colleges rate high schools by comparing a student’s 1st year College average with the student’s average in their top 6 Grade 12 courses. Here is some recent data:
| Student | A | B | C | D | E | F | G | H |
|---|---|---|---|---|---|---|---|---|
| College Average | 45 | 82 | 30 | 80 | 71 | 95 | 67 | 56 |
| High School Average | 50 | 74 | 57 | 73 | 70 | 91 | 69 | 63 |
a) It is clear that College Average will depend on High School Average. Determine the regression equation using the following calculations.
∑x=547; ∑y=526; ∑xy=37,634; ∑x²=38,465; ∑y²=37,740.
b) Predict the College Average of a student who had a 60 for their High School Average.
2) Mr. Kartye wants to investigate if student performance on the MDM 4U final exam is related to time spent studying for the exam. He gathered the following sample data:
| Hours Studying | 10 | 15 | 21 | 6 | 18 | 20 | 12 |
|---|---|---|---|---|---|---|---|
| Exam Mark | 78 | 85 | 96 | 75 | 84 | 45 | 82 |
Using Fathom, Mr. Kartye generated the scatter plot shown at the right. Unfortunately, it does not display what Mr. Kartye wanted. For example, r²=0.0039, meaning there is absolutely no relationship (correlation) between exam marks and hours spent studying. Furthermore, the line of best fit of yp = -0.197x + 81 demonstrates a negative slope. In other words, the more you study, the lower your exam mark. Mr. Kartye determines he must have done something wrong, or not taken something into consideration. Suggest how Mr. Kartye should modify his data set. Implement your suggestion to determine a new line of best fit, and coefficient of determination.
@zxcvbnm1216 1. is pretty doable using the general formula for a least squares regression. Plug and chug.
- Is the sample random?
Thanks @MITer94
- Yes, it is random sampling, @MITer94.
is this right for 1?
a= [8(37,634)-(547)(526)]/[8(38,465)-(547)^2]
=(301,072-287,722)/(307,720-299,209)
=13,350/8511
=1.57
b= 547- (1.57) (526)
= 547-825.82
=-278.82
@zxcvbnm1216 I am using the least squares regression y = b + ax where a = (∑(xi - xbar)(yi - ybar)) / ∑(xi - xbar)^2 and b = ybar - a*xbar (see [Wikipedia article](https://en.wikipedia.org/wiki/Simple_linear_regression#Fitting_the_regression_line)). Note that we accidentally swapped a and b but it doesn’t matter. Note i ranges from 1 to n (the sample size).
Not quite. Your slope works out, but the intercept does not. Remember xbar = 547/8 = 68.375 and ybar = 526/8 = 65.75; the intercept uses these means, not the raw sums, so b = ybar - a*xbar with x as the HS average and y as the college average. For the slope, the definition asks for ∑(xi - 68.375)(yi - 65.75), where xi and yi range over HS averages and college averages respectively. That becomes your numerator. With some algebra you can actually compute the numerator with only the given calculations and no further brute force (hint: expand!)
The denominator works the same way (hint: expand again!), so you can compute it without brute force. Your version, n∑x² - (∑x)², is n times ∑(xi - xbar)^2, and your numerator picks up the same factor of n, so the two forms give the same slope.
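For reference, the "expand" hint can be written out as below. This is a sketch that uses only the sums given in the question, with x as the High School Average, y as the College Average and n = 8. The shorthand S_xy and S_xx is introduced here for convenience and is not notation from the problem.

```latex
\begin{align*}
S_{xy} &= \sum (x_i - \bar{x})(y_i - \bar{y})
        = \sum x_i y_i - \frac{\left(\sum x_i\right)\left(\sum y_i\right)}{n}
        = 37634 - \frac{(547)(526)}{8} = 1668.75 \\
S_{xx} &= \sum (x_i - \bar{x})^2
        = \sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n}
        = 38465 - \frac{547^2}{8} = 1063.875 \\
a &= \frac{S_{xy}}{S_{xx}} = \frac{1668.75}{1063.875} \approx 1.57 \\
b &= \bar{y} - a\,\bar{x} \approx 65.75 - (1.5686)(68.375) \approx -41.50
\end{align*}
```

Multiplying S_xy and S_xx by n = 8 gives 13,350 and 8,511, so this is the same slope as the ratio 13,350/8,511; only the intercept changes once the means are used in place of the sums.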
thanks @MITer94 Now I understand
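For anyone who wants to check the numbers for question 1, here is a minimal Python sketch. It assumes x is the High School Average and y is the College Average, and it uses only the sums given in part a). The variable names and the helper `predict_college_average` are made up for illustration.

```python
# Sums given in the question (x = High School Average, y = College Average).
n = 8
sum_x, sum_y = 547, 526
sum_xy, sum_x2 = 37_634, 38_465

# Means of x and y; the intercept is built from these, not from the raw sums.
x_bar = sum_x / n
y_bar = sum_y / n

# Least squares slope and intercept for y = intercept + slope * x.
slope = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
intercept = y_bar - slope * x_bar


def predict_college_average(hs_average):
    """Predicted College Average for a given High School Average (hypothetical helper)."""
    return intercept + slope * hs_average


print(f"slope     = {slope:.4f}")
print(f"intercept = {intercept:.4f}")
print(f"prediction for a High School Average of 60 = {predict_college_average(60):.2f}")
```

Under these assumptions the line comes out at roughly y = 1.57x - 41.5, which puts the part b) prediction at a little under 53.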
@MITer94 How would you do 2? I got 1 already but am having trouble with 2
@zxcvbnm1216 2 is more of a conceptual question. I want you to think about it.
@MITer94 I seriously do not know what to do. I thought I had to calculate all the regression calculations but after I did them all and solved for the equation, the slope was also negative. Did I do this wrong or is there an equation I am missing?
@zxcvbnm1216 I’m assuming the equation of the line and the value of r or r^2 is correct, given the data points (I didn’t actually compute them).
Instead, figure out why the least squares line has negative slope (there is one big reason I see, and possibly others). Are the conclusions given by Mr. Kartye and the problem itself correct?
@MITer94 I can’t figure it out. The only reason why the regression line is negative is because of the one student who studied 20 hours and only got a 45. Should he calculate the mean of hours studied and mark or am I missing something? I tried to think of everything possible. Maybe because the x and y values should be switched?
@zxcvbnm1216 Switching x and y doesn’t affect r^2, and the slope would still be negative.
Adding the mean point (xbar, ybar) to the data doesn’t change the regression line.
Bingo! Formally, this student/data point is an outlier. What happens if you exclude him?
Another fault is that correlation does not necessarily imply causation.
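To see what excluding that student does, here is a rough Python sketch that fits the line by the least squares formulas from earlier in the thread, once on all seven points and once without the (20 hours, 45 marks) point. The function name `fit_line` and the variable names are made up for illustration. The full-data output need not match the Fathom figures quoted in the question exactly; I have not tried to reproduce those.

```python
def fit_line(xs, ys):
    """Return (slope, intercept, r_squared) for the least squares line y = intercept + slope * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    r_squared = s_xy ** 2 / (s_xx * s_yy)
    return slope, intercept, r_squared


hours = [10, 15, 21, 6, 18, 20, 12]
marks = [78, 85, 96, 75, 84, 45, 82]

# Fit on all seven students, and again with the axes switched.
full = fit_line(hours, marks)
swapped = fit_line(marks, hours)

# Drop the student who studied 20 hours and scored 45, then refit.
kept = [(h, m) for h, m in zip(hours, marks) if (h, m) != (20, 45)]
trimmed = fit_line([h for h, _ in kept], [m for _, m in kept])

for label, (slope, intercept, r_squared) in [
    ("all students", full),
    ("axes switched", swapped),
    ("outlier removed", trimmed),
]:
    print(f"{label}: slope = {slope:.3f}, intercept = {intercept:.2f}, r^2 = {r_squared:.4f}")
```

With the outlier removed the slope turns positive (about 1.23) and r^2 rises to about 0.86, while switching the axes leaves r^2 unchanged and the slope negative. Whether dropping the point is justified is a separate question from the arithmetic.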
oh wow @MITer94 thanks so much!!! I figured that it might have to do with the one student because his test messes up the correlation. Again, Thanks so much