Figure 1. Conventional and inverse OLS linear regression fits on some real, observed climate variables.
Least squares regression is a very useful technique, widely used in almost all branches of science. It is usually one of the first techniques that is taught in schools for analysing experimental data.
It is also a technique that is misapplied almost as often as it is used correctly.
It can be shown that, under certain conditions, least squares linear regression is the best estimate of the true relationship that can be derived from the available data. In statistics this is often called the ‘best linear unbiased estimator’ of the slope. (Some who enjoy contrived acronyms abbreviate this to “BLUE”.)
There are two main conditions for this result to be an accurate estimate of the slope. One is that the deviations of the data from the true relationship are ‘normally’, or Gaussian, distributed; that is to say, they are of a random nature. This condition can be violated by significant periodic components in the data or by an excessive number of outlying data points. The latter may often occur when only a small number of data points is available and the noise, even if random in nature, is not sufficiently sampled to average out.
The other main condition is that there be negligible error (or experimental uncertainty) in the x variable. If this condition is not met, the OLS result derived from the data will almost always under-estimate the slope of the true relationship. This effect is referred to as regression dilution. The degree to which the slope is under-estimated is determined by the nature of the x and y errors, but most strongly by those in x, since they are required to be negligible for OLS to give the best estimate.
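As a rough illustration of the effect (not taken from the original analysis), the following sketch generates synthetic data with a known slope and then adds random error to x. The slope, noise levels, sample size and seed are all arbitrary assumptions. For independent, additive errors the OLS slope is expected to shrink by roughly the factor var(x) / (var(x) + var(x error)).

```python
import random

def ols_slope(x, y):
    """Ordinary least squares slope of y regressed on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx

# Assumed, arbitrary settings for the synthetic data.
TRUE_SLOPE = 2.0
SD_X = 1.0       # spread of the true x values
SD_X_ERR = 1.0   # random error added to x
SD_Y_ERR = 1.0   # random error added to y
N = 20000

rng = random.Random(1)
x_true = [rng.gauss(0.0, SD_X) for _ in range(N)]
y_obs = [TRUE_SLOPE * x + rng.gauss(0.0, SD_Y_ERR) for x in x_true]
x_obs = [x + rng.gauss(0.0, SD_X_ERR) for x in x_true]

slope_clean = ols_slope(x_true, y_obs)  # x known exactly: the "controlled" case
slope_noisy = ols_slope(x_obs, y_obs)   # x carries error: the diluted case
dilution = SD_X ** 2 / (SD_X ** 2 + SD_X_ERR ** 2)

print(f"true slope               {TRUE_SLOPE:.3f}")
print(f"OLS, error-free x        {slope_clean:.3f}")
print(f"OLS, noisy x             {slope_noisy:.3f}")
print(f"expected diluted slope   {TRUE_SLOPE * dilution:.3f}")
```

Note that the error in y alone does not bias the slope in this sketch; it is the error in x that pulls the fitted slope towards zero.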
In this discussion “errors” can be understood to be both observational inaccuracies and any variability due to some factor other than the supposed linear relationship that it is sought to determine by linear regression of the two variables.
In certain circumstances regression dilution can be corrected for, but doing so requires some knowledge of the nature and size of both the x and y errors. Typically this is not available beyond knowing whether the x variable is a ‘controlled variable’ with negligible error, although several techniques have been developed to estimate the error in the estimated slope.
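When the size of the x error happens to be known, one simple textbook correction is to divide the OLS slope by the estimated ratio of true x variance to observed x variance. The sketch below assumes independent additive errors and a known x-error standard deviation (`sd_x_err`, a hypothetical input); the data and all numbers are synthetic and illustrative only.

```python
import random

def variance(v):
    """Sample variance."""
    m = sum(v) / len(v)
    return sum((a - m) ** 2 for a in v) / (len(v) - 1)

def ols_slope(x, y):
    """Ordinary least squares slope of y regressed on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx

def corrected_slope(x_obs, y_obs, sd_x_err):
    """OLS slope divided by the estimated reliability ratio.

    Assumes the x error is additive, independent of x, and of known size.
    """
    var_obs = variance(x_obs)
    reliability = (var_obs - sd_x_err ** 2) / var_obs
    if reliability <= 0.0:
        raise ValueError("assumed x error exceeds the observed spread in x")
    return ols_slope(x_obs, y_obs) / reliability

# Synthetic test data with assumed values.
TRUE_SLOPE = 2.0
SD_X_ERR = 0.8
rng = random.Random(2)
x_true = [rng.gauss(0.0, 1.0) for _ in range(20000)]
x_obs = [x + rng.gauss(0.0, SD_X_ERR) for x in x_true]
y_obs = [TRUE_SLOPE * x + rng.gauss(0.0, 1.0) for x in x_true]

naive = ols_slope(x_obs, y_obs)
corrected = corrected_slope(x_obs, y_obs, SD_X_ERR)
print(f"naive OLS slope   {naive:.3f}")
print(f"corrected slope   {corrected:.3f}  (true value {TRUE_SLOPE})")
```

The correction is only as good as the assumed error size: if `sd_x_err` is a guess, the corrected slope inherits that guess.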
A controlled variable can usually be attained in a controlled experiment, or when studying a time series, provided that the date and time of observations have been recorded and documented in a precise and consistent manner. It is typically not the case when both sets of data are observations of different variables, as is the case when comparing two quantities in climatology.
One way to demonstrate the problem is to invert the x and y axes and repeat the OLS fit. If the result were valid irrespective of the choice of axes, the first slope would be the reciprocal of the second. However, this is only the case when there are very small errors in both variables, i.e. the data are grouped closely around a straight line. In the case of one controlled variable and one error-prone variable, the inverted result will be incorrect. In the case of two datasets containing observational error, both results will be wrong and the correct result will probably lie somewhere in between.
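The three cases just described can be tried on synthetic data. This is a sketch under assumed noise levels, with a true slope of 2; the function names are hypothetical. The inverted fit regresses x on y and its reciprocal is reported, so that both numbers are directly comparable as slopes of y against x.

```python
import random

def ols_slope(x, y):
    """Ordinary least squares slope of y regressed on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx

def make_data(sd_x_err, sd_y_err, seed, true_slope=2.0, n=20000):
    """Synthetic straight-line data with independent errors on x and y."""
    rng = random.Random(seed)
    x_true = [rng.gauss(0.0, 1.0) for _ in range(n)]
    x_obs = [x + rng.gauss(0.0, sd_x_err) for x in x_true]
    y_obs = [true_slope * x + rng.gauss(0.0, sd_y_err) for x in x_true]
    return x_obs, y_obs

def forward_and_inverted(x, y):
    """Slope of y on x, and the reciprocal of the slope of x on y."""
    return ols_slope(x, y), 1.0 / ols_slope(y, x)

# (x error, y error) for each case; values are assumptions.
cases = {
    "tight data, small errors": (0.05, 0.05),
    "controlled x, noisy y": (0.0, 2.0),
    "both variables noisy": (1.0, 1.0),
}
results = {}
for name, (sd_x_err, sd_y_err) in cases.items():
    x, y = make_data(sd_x_err, sd_y_err, seed=3)
    results[name] = forward_and_inverted(x, y)
    fwd, inv = results[name]
    print(f"{name:26s} forward {fwd:.3f}   inverted {inv:.3f}")
```

With a controlled x the forward fit should sit close to the true slope and the inverted fit well above it; with error on both variables the true slope should fall between the two.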
Where both datasets contain observational error, the two regression fits can be taken as bounding the likely true value, but some knowledge of the relative errors is needed to decide where in that range the best estimate lies. There are a number of techniques, such as bisecting the angle, taking the geometric mean (square root of the product of the two slopes), or some other average, but ultimately they are no more objective unless driven by some knowledge of the relative errors. Clearly bisection would not be correct if one variable had low error, since the true slope would then be close to the OLS fit done with that quantity on the x axis.
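One technique from the wider literature that does use knowledge of the relative errors is Deming regression, which requires the ratio of the y-error variance to the x-error variance (called `delta` here). The sketch below is illustrative only: the data are synthetic, and `delta` is taken from the noise levels used to generate them, which is precisely the knowledge that is usually missing in practice.

```python
import math
import random

def moments(x, y):
    """Return the sample variances of x and y and their covariance."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x) / (n - 1)
    syy = sum((b - mean_y) ** 2 for b in y) / (n - 1)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    return sxx, syy, sxy

def deming_slope(x, y, delta):
    """Deming regression slope; delta = var(y error) / var(x error)."""
    sxx, syy, sxy = moments(x, y)
    diff = syy - delta * sxx
    return (diff + math.sqrt(diff ** 2 + 4.0 * delta * sxy ** 2)) / (2.0 * sxy)

# Synthetic data: true slope 2, equal error on x and y (assumed values).
TRUE_SLOPE = 2.0
SD_X_ERR = 1.0
SD_Y_ERR = 1.0
rng = random.Random(4)
x_true = [rng.gauss(0.0, 1.0) for _ in range(20000)]
x_obs = [x + rng.gauss(0.0, SD_X_ERR) for x in x_true]
y_obs = [TRUE_SLOPE * x + rng.gauss(0.0, SD_Y_ERR) for x in x_true]

sxx, syy, sxy = moments(x_obs, y_obs)
forward = sxy / sxx                       # OLS of y on x
inverted = syy / sxy                      # reciprocal of OLS of x on y
geometric = math.sqrt(forward * inverted)
deming = deming_slope(x_obs, y_obs, delta=SD_Y_ERR ** 2 / SD_X_ERR ** 2)

print(f"forward OLS     {forward:.3f}")
print(f"inverted OLS    {inverted:.3f}")
print(f"geometric mean  {geometric:.3f}")
print(f"Deming          {deming:.3f}  (true value {TRUE_SLOPE})")
```

With these assumed noise levels the geometric mean should fall short of the true slope while the Deming estimate lands close to it; a different error ratio would move the geometric mean's error in either direction, which is the point made above about averages not being objective.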
Figure 2. A typical example of linear regression of two noisy variables produced from synthetic randomised data.
Figure 2b. A typical example of correct application of linear regression to data with negligible x-errors. The regressed slope is very close to the true value.
An Illustration: the Spencer simple model.
The following case is used to illustrate the issue with ‘climate-like’ data. However, the problem is an objective mathematical one, the principle of which is independent of any particular test data used. Whether the following model is an accurate representation of climate (it is not) has no bearing on the regression problem.
In a short article on his site, Dr. Roy Spencer provided a simple, single-slab ocean climate model with a predetermined feedback variable built into it. He observed that attempting to derive the climate sensitivity in the usual way consistently under-estimated the known feedback used to generate the data.
By specifying that sensitivity (with a total feedback parameter) in the model, one can see how an analysis of simulated satellite data will yield observations that routinely suggest a more sensitive climate system (lower feedback parameter) than was actually specified in the model run.
And if our climate system generates the illusion that it is sensitive, climate modelers will develop models that are also sensitive, and the more sensitive the climate model, the more global warming it will predict from adding greenhouse gasses to the atmosphere.
This is a very important observation. Regressing noisy radiative flux change against noisy temperature anomalies does consistently produce incorrectly high sensitivity. However, it is not an illusion created by the climate system; it is an illusion created by the incorrect application of OLS regression. Where there is error on both variables, the OLS slope is no longer an accurate reflection of the underlying linear relationship being sought.
Dr Spencer was kind enough to provide an implementation of the simple model in the form of a spreadsheet for download so that others may experiment and verify the effect.
To demonstrate this problem, the spreadsheet provided was modified to duplicate the Rad vs Temp graph but with the axes inverted, i.e. using exactly the same data for each run but in addition displaying it the other way around. Thus the ‘trend line’ is calculated with the variables inverted. No changes were made to the model.
Three values were used for the feedback variable in turn: 0.9 and 1.9, which Roy Spencer suggests represent the range of IPCC values, and 5.0, which he proposes as a value closer to that which he has derived from satellite observational data.
Here is a snap-shot of the spreadsheet showing a table of results from nine runs for each feedback parameter value. Both the conventional and the inverted regression slopes, and their geometric mean, have been tabulated.
Figure 3. Snap-shot of spreadsheet.
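The same procedure can be imitated outside a spreadsheet. The following is a minimal sketch of a single-slab energy balance model written purely for illustration; it is not Dr Spencer's spreadsheet and does not reproduce its numbers. The update rule, the red-noise forcing, the scaling `step` (standing for time step divided by heat capacity) and every parameter value are assumptions. It regresses the simulated flux on temperature, repeats the fit with the axes inverted, and forms the geometric mean for each of the three feedback values.

```python
import math
import random

def red_noise(n, phi, rng):
    """AR(1) series: each value is phi times the previous one plus white noise."""
    out = [0.0] * n
    for t in range(1, n):
        out[t] = phi * out[t - 1] + rng.gauss(0.0, 1.0)
    return out

def run_slab(feedback, n=20000, step=0.05, phi=0.8, seed=0):
    """Euler integration of d(temp)/dt = (forcing - feedback * temp) / heat capacity."""
    rng = random.Random(seed)
    radiative = red_noise(n, phi, rng)      # assumed random radiative forcing
    non_radiative = red_noise(n, phi, rng)  # assumed random non-radiative forcing
    temp = [0.0] * n
    for t in range(n - 1):
        net = radiative[t] + non_radiative[t] - feedback * temp[t]
        temp[t + 1] = temp[t] + step * net
    # Simulated top-of-atmosphere flux: feedback response minus radiative forcing.
    flux = [feedback * T - r for T, r in zip(temp, radiative)]
    return temp, flux

def slopes(temp, flux):
    """Conventional slope, inverted-axes slope and their geometric mean."""
    n = len(temp)
    mean_t = sum(temp) / n
    mean_f = sum(flux) / n
    stt = sum((a - mean_t) ** 2 for a in temp)
    sff = sum((b - mean_f) ** 2 for b in flux)
    stf = sum((a - mean_t) * (b - mean_f) for a, b in zip(temp, flux))
    conventional = stf / stt   # flux regressed on temperature
    inverted = sff / stf       # reciprocal of temperature regressed on flux
    return conventional, inverted, math.sqrt(conventional * inverted)

table = {}
for feedback in (0.9, 1.9, 5.0):
    table[feedback] = slopes(*run_slab(feedback))
    conv, inv, geo = table[feedback]
    print(f"feedback {feedback:.1f}: conventional {conv:.2f}, "
          f"inverted {inv:.2f}, geometric mean {geo:.2f}")
```

In this sketch the conventional slope should come out below the feedback value that was used. How closely the geometric mean tracks the true value depends entirely on the assumed noise, so the output should not be read as a check on the spreadsheet results.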
Firstly, the table confirms Roy Spencer’s observation that the regression of dRad against dTemp consistently and significantly under-estimates the feedback parameter used to create the data (and hence over-estimates the climate sensitivity of the model). In this limited test, the error is between a third and a half of the correct value. Only one value of the conventional least squares slope is greater than the respective feedback parameter value.
Secondly, the geometric mean of the two OLS regressions does provide a reasonably close estimate of the true feedback parameter for the value derived from satellite observations. Variations are fairly evenly spread either side: the mean is only slightly higher than the true value and the standard deviation is about 9% of the mean.
However, for the two lower feedback values, representing the IPCC range of climate sensitivities, the usual OLS regression is substantially less than the true value while the geometric mean over-estimates it, and so does not provide a reliable correction over the range of feedbacks.
All the feedbacks represent a net negative feedback ( otherwise the climate system would be fundamentally unstable ). However, the IPCC range of values represents less negative feedbacks, thus a less stable climate. This can be seen reflected in the degree of variability in data plotted in the spreadsheet. The standard deviations of the slopes are also somewhat higher. This is not unexpected with less feedback controlling variations.
It can be concluded that the ratio of the proportional variability in the two quantities changes as a function of the degree of feedback in the system. The geometric mean of the two slopes does not provide a good estimation of the true feedback for the less stable configurations.
The simple model helps to see how this relates to Rad / Temp plots and climate sensitivity. However, the problem of regression dilution is a totally general mathematical result and can be reproduced from two series having a linear relationship with added random changes, as shown above.
What the papers say
A quick review of several recent papers on the problems of determining climate sensitivity shows a general lack of appreciation of the regression dilution problem.
Dessler 2010 b  :
Estimates of Earth’s climate sensitivity are uncertain, largely because of uncertainty in the long-term cloud feedback.
Spencer & Braswell 2011  :
Abstract: The sensitivity of the climate system to an imposed radiative imbalance remains the largest source of uncertainty in projections of future anthropogenic climate change.
There seems to be agreement that this is the key problem in assessing future climate trends. However, many authors seem unaware of the regression problem and much published work on this issue seems to rely heavily on the false assumption that OLS regression of dRad against dTemp can be used to correctly determine this ratio, and hence various sensitivities and feedbacks.
Trenberth 2010  :
To assess climate sensitivity from Earth radiation observations of limited duration and observed sea surface temperatures (SSTs) requires a closed and therefore global domain, equilibrium between the fields, and robust methods of dealing with noise. Noise arises from natural variability in the atmosphere and observational noise in precessing satellite observations.
Whether or not the results provide meaningful insight depends critically on assumptions, methods and the time scales ….
Indeed. Unfortunately, they then go on to contradict earlier work by Lindzen and Choi, which did not rely on OLS regression, and they do so by relying on inappropriate use of regression.
Spencer and Braswell 2011 
As shown by SB10, the presence of any time-varying radiative forcing decorrelates the co-variations between radiative flux and temperature. Low correlations lead to regression-diagnosed feedback parameters biased toward zero, which corresponds to a borderline unstable climate system.
This is an important paper highlighting the need to use lagged regression to avoid the decorrelating effect of delays in the response. However, it is ultimately still based on regression of two error-laden variables and thus does not recognise the regression dilution that is also present in this situation. Thus it is likely that this paper is still over-estimating sensitivity.
Dessler 2011  :
Using a more realistic value of σ(dF_ocean)/σ(dR_cloud) = 20, regression of TOA flux vs. dTs yields a slope that is within 0.4% of lambda.
Then in the conclusion of the paper:
warming). Rather, the evolution of the surface and atmosphere during ENSO variations are dominated by oceanic heat transport. This means in turn that regressions of TOA fluxes vs. δTs can be used to accurately estimate climate sensitivity or the magnitude of climate feedbacks.
Also from a previous paper:
Dessler 2010 b 
The impact of a spurious long-term trend in either dRall-sky or dRclear-sky is estimated by adding in a trend of ±0.5 W/m2/decade into the CERES data. This changes the calculated feedback by ±0.18 W/m2/K. Adding these errors in quadrature yields a total uncertainty of 0.74 and 0.77 W/m2/K in the calculations, using the ECMWF and MERRA reanalyses, respectively. Other sources of uncertainty are negligible.
The author is apparently unaware that the inaccuracy of regressing two uncontrolled variables is a major source of uncertainty and error.
Lindzen & Choi 2011 
[Our] new method does moderately well in distinguishing positive from negative feedbacks and in quantifying negative feedbacks. In contrast, we show that simple regression methods used by several existing papers generally exaggerate positive feedbacks and even show positive feedbacks when actual feedbacks are negative.
… but we see clearly that the simple regression always under-estimates negative feedbacks and exaggerates positive feedbacks.
Here the authors have clearly noted that there is a problem with the regression based techniques and go into quite some detail in quantifying the problem, though they do not explicitly identify it as being due to the presence of uncertainty in the x-variable.
All the L&C papers, to their credit, recognise that regression seriously under-estimates the slope and use other techniques to determine the ratio.
It seems the latter authors are exceptional in looking at the sensitivity question without relying on inappropriate use of linear regression. It is certainly part of the reason that their results are considerably lower than almost all other authors on this subject.
Inappropriate use of linear regression can produce spurious and significantly low estimations of the true slope of a linear relationship if both variables have significant measurement error or other perturbing factors.
This is precisely the case when attempting to regress modelled or observed radiative flux against surface temperatures in order to estimate sensitivity of the climate system.
As this regression is conventionally done in climate studies, it will under-estimate the net feedback factor (often denoted as ‘lambda’). Since climate sensitivity is defined as the reciprocal of this term, this results in an over-estimation of climate sensitivity.
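The arithmetic can be made concrete. Taking sensitivity as the reciprocal of lambda, and using the range of under-estimation seen in the spreadsheet test (a third to a half of the correct value), a short sketch gives the corresponding inflation of sensitivity. The function name is hypothetical and the calculation assumes nothing beyond the reciprocal relationship.

```python
def sensitivity_inflation(fraction_underestimated):
    """Factor by which 1/lambda is inflated when lambda is too low by this fraction."""
    if not 0.0 <= fraction_underestimated < 1.0:
        raise ValueError("fraction must be in the range [0, 1)")
    return 1.0 / (1.0 - fraction_underestimated)

for fraction in (1.0 / 3.0, 1.0 / 2.0):
    factor = sensitivity_inflation(fraction)
    print(f"lambda low by {fraction:.2f} of its value: "
          f"sensitivity too high by a factor of {factor:.2f}")
```

An under-estimate of lambda by a third to a half therefore corresponds to sensitivity being over-stated by a factor of 1.5 to 2.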
If an incorrect evaluation of climate sensitivity from observations is used as a basis for the choice of parametrised inputs to climate models, the resulting model will be over-sensitive and produce exaggerated warming. Similarly, faulty analyses of their output will further inflate the apparent model sensitivity.
This situation may account for the difference between regression based estimations of climate sensitivity and those produced by other methods such as recent work by Nic Lewis and others. Many techniques to reduce this effect are available in the broader scientific literature.
Those using linear regression to assess climate sensitivity need to account for this significant source of error when supplying uncertainty values in published estimations of climate sensitivity, or take steps to address the issue.
Nic Lewis: “A Sensitive Matter: How The IPCC Buried Evidence Showing Good News About Global Warming”