It all comes down to how they defined/measured learning, as @Sue22 pointed out. For example, if they used class grades, those have tons of problems with relative curving and with how each teacher grades. Standardizing that across a lot of colleges is going to be difficult. @ALF's measure, success in subsequent classes, would be a better measure IMO.
Beyond that, not all subjects are going to be equally affected. An interesting math teacher and an interesting history teacher, each compared with their less engaging counterparts, could produce different results.
I think that when something this hard to measure and this subjective goes so hard against intuition, skepticism is very reasonable. Anyone in education knows that a student's motivation is one of the key factors in learning; if a student doesn't want to learn, they aren't going to. Teachers who inspire, motivate, and get students excited about the subject may not actually teach the content significantly better than the average teacher, but the motivation they inspire can make the bigger difference.