@PengsPhils : I know that in a research university, the “added value” idea is tricky, because as long as there are disparities in the difficulty level of instructors, students will exploit them when they have the chance. Say students on a particular major or pre-professional track tend to take one sequence in one year and a related one the next. One would think performance in the related sequence would reflect the training, or lack thereof, from the previous year. But suppose there were 4-6 non-standardized sections of varying difficulty (and that this variation was well known, as it usually is): students would simply self-sort, with many avoiding any situation where they are held accountable for prerequisite knowledge at a high level. I’ve seen this happen in STEM and econ, for example. Unless there is some minimum threshold of standards or a standardized exam, you won’t necessarily be able to judge added value.

If the subsequent course is a “bottleneck” with, say, only one large section that is reasonably challenging, then maybe some information can be obtained from that. Or you can zoom in on sections meant to be much more challenging than normal and see whether there is a substantial difference between students who took certain instructors in the previous course versus others, but I promise you that those previous instructors who intentionally ran a less intensive course would not like to be exposed. At my undergrad, the chemistry department decided to give an evaluation at the end of organic chemistry 2 that required a bit of higher-order thinking and reading, and needless to say, the instructors whose classes did not typically “emphasize those elements” (code: easier for most students) panicked, because they (a) knew they had gotten mostly the weaker students from the first semester, and (b) their courses did in fact assess more lower-level items or allow for more algorithmic problem-solving approaches.
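Just to make the comparison concrete: the naive version of the "common downstream assessment" check is something like the sketch below. Everything here is hypothetical — the section names, scores, and sample sizes are fabricated for illustration, and this assumes a standardized exam actually exists in the follow-up course.

```python
# Hypothetical sketch: comparing downstream standardized-exam scores
# grouped by which prerequisite section students came from.
# All names and numbers below are made up for illustration.
from statistics import mean, stdev
from math import sqrt

# Fabricated scores on a common second-course exam, keyed by the
# (hypothetical) first-course section each student took.
scores_by_prior_instructor = {
    "easier_section": [61, 58, 70, 64, 55, 67, 59, 63],
    "harder_section": [78, 82, 71, 75, 80, 69, 77, 84],
}

def welch_t(a, b):
    """Welch's t statistic for two independent samples (unequal variances)."""
    na, nb = len(a), len(b)
    va, vb = stdev(a) ** 2, stdev(b) ** 2
    return (mean(a) - mean(b)) / sqrt(va / na + vb / nb)

easy = scores_by_prior_instructor["easier_section"]
hard = scores_by_prior_instructor["harder_section"]
t = welch_t(hard, easy)
print(f"mean gap: {mean(hard) - mean(easy):.1f} points, Welch t = {t:.2f}")
```

Of course, this is exactly where the self-sorting problem bites: if weaker students disproportionately chose the easier section in the first place, a gap like this reflects selection as much as instruction, so it's suggestive at best without some control for incoming preparation.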
One instructor outright attempted to sabotage the results by telling his students it wasn’t that important (and so some students left it blank, and his participation rate was low).
Point is, college instructors seem really antsy about being held accountable in certain ways, especially tenure-track faculty. The ones who don’t want to go a little further do not want to be exposed or held accountable. They seem quite happy hiding behind the bubble-sheet evaluations. (One tenure-track instructor, who is leaving, actually used to receive the highest ratings for ochem 2, but his class would fully participate and get one of the poorest scores on that assessment in both years it was given.) The two highest-performing sections were taught by the two instructors who, of course, were the most difficult. They both rate quite well, and one in fact usually rated number one before the substantially easier person came along.
I also suspect students give a ratings boost based on expectations. For example, students were going to the easier instructor mostly to escape or avoid some harder ones. As long as that easier instructor met expectations of relative ease, he’d get some boost that has little to do with learning. (That guy’s class drew a significant chunk from the easiest first-semester sections and another chunk of students unsatisfied with their first-semester results under the harder instructors, as in they did not make an A.) This reveals itself in the fact that the easiest instructor’s exam averages (his exams were indeed significantly less rigorous than his colleagues’) were often substantially lower than the others’, but would be generously curved. Students who knew they could not compete against more ambitious students on more difficult exams knew they were getting a sweet deal, where the bar was set lower in terms of cognitive complexity and the level of competition in the course. You would be surprised at the amount of calculus we do/did when choosing between instructors for the same course.