Guest Post by Willis Eschenbach (@weschenbach on eX-Twitter)
Charles the questionable Moderator, who thanks you for staying a hit here at WUWT, asked me to look at the new paper yclept “Multivariate Analysis Rejects Theories of Human-Caused Increases in Atmospheric Carbon Dioxide: The Rule of Sea Surface Temperature” by Dai Ato, an independent researcher in Japan. Looks like they’ve got some toys. I will refer to this paper as Ato2024.
I didn’t get far before the alarm bells went off. The study conducted a multivariate analysis using publicly available data to examine the influence of sea surface temperature (SST) and human emissions on atmospheric CO₂ levels.
It concludes that SST is an independent determinant of the annual increase in atmospheric CO₂ concentration. Human emissions were found to be irrelevant in the regression model.
And most clearly, it says:
Furthermore, the predicted atmospheric CO₂ concentration, using the regression equation obtained for SST originating from the UK-HADLEY center after 1960, shows a very high correlation with the actual CO₂ concentration (Pearson correlation coefficient r = 0.9995, P < 3e-92).
BZZZZZ!! Any time I see an r value that high, I know something has gone very wrong… so I took a closer look.
First, let’s start with SST, one of the three variables in the analysis (SST, CO2, and emissions). Here are three SST reconstructions starting in 1854, from three different groups.
Figure 1. Global monthly mean sea surface temperature (SST). The yellow area on the right is the part of the record analyzed by Ato.
While there are some differences, the overall pattern is clear. There was SST warming from ~1850 to about 1870, cooling to ~1910, warming to ~1940, cooling to ~1965, and then warming.
Looking at that, I can see why Ato didn’t want to use the full record: it doesn’t support the claim that SST is an independent determinant of atmospheric CO2 levels. The CO2 data (Figures 2 and 3 below) look nothing like that.
So how did he justify the cutoff? Well, Mauna Loa’s CO2 measurement data starts around 1960. However, it can be extended further using ice core CO2 records. Here’s what it looks like.
Figure 2. Mauna Loa and ice core measurements of background atmospheric CO2 levels, 1000-2010 AD. Data: Ice Core, Mauna Loa
Ato2024 says the ice core records are inaccurate. However, this is refuted by the close agreement of the ice core records with each other and with the Mauna Loa measurements as shown above.
Below is a closer look at the recent end of that data, from 1850 onward, which corresponds to the time frame of the sea surface temperature (SST) data in Figure 1.
Figure 3. As in Figure 2, but post-1850 data only
Given the good agreement of the two ice cores with each other and with the Mauna Loa data, I see no problem in taking them as a good reconstruction of post-1850 CO2 levels.
The problem, of course, is that the pre-1960 sea temperatures look nothing like the pre-1960 CO2 levels… and that disagreement totally undercuts Ato2024. So he had to ignore it.
Next, how did he get such a high correlation, 0.9995, between SST and CO2 in the post-1960 data? Part of the answer lies in what he is looking at. Here is the post-1960 Mauna Loa CO2 record he used. Note that he does not use monthly data, only annual data, which makes it easier to get a higher Pearson correlation coefficient “r”.
Figure 4. Mauna Loa Observatory CO2 observations, along with a linear trend line.
The recent increase in CO2 is a very slowly accelerating curve that is almost a straight line. This leads to many spurious correlations, because the curve is easy to replicate, as we shall see below. This is an ongoing problem in climate science.
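To see how easily a nearly straight curve can be “matched”, here is a minimal sketch using made-up numbers rather than the real records. The names `smooth_curve` and `unrelated` and the coefficients are assumptions chosen only to resemble a slowly accelerating rise over 63 annual steps.

```python
import numpy as np

rng = np.random.default_rng(1)
t = np.arange(63)  # 63 annual steps, e.g. 1960 through 2022

# Synthetic stand-in for a slowly accelerating CO2-like curve (NOT real data)
smooth_curve = 317.0 + 0.8 * t + 0.013 * t**2

# A series with no connection to it: a running total of random positive increments
unrelated = np.cumsum(rng.uniform(0.5, 1.5, t.size))

r = np.corrcoef(smooth_curve, unrelated)[0, 1]
print(f"Pearson r between two unrelated rising series: {r:.3f}")
```

Any two series that both rise fairly steadily will tend to score a high “r” against each other, whether or not one has anything to do with the other.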
But this is only the first problem. The main problem is the procedure used. Here is the description from the paper.
Note that the delta symbol (∆) in the equation means “change in”. So ∆CO2 is the change in CO2 from one year to the next.
Translated, that says:
- Calculate the best-fit linear estimate of the annual change in CO2 (∆CO2), based on the Hadley HadSST sea surface temperature.
- The predicted atmospheric CO2 is then the initial atmospheric CO2 plus the cumulative sum of the estimated annual changes in CO2.
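The two steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: the data are synthetic stand-ins (not Mauna Loa or HadSST), the function name `ato_style_reconstruction` is hypothetical, and pairing each year’s ∆CO2 with that same year’s predictor value is my guess at the alignment, not something taken from the paper.

```python
import numpy as np

def ato_style_reconstruction(co2, predictor):
    """Fit annual changes in CO2 to a predictor, then cumulatively sum the fitted changes."""
    co2 = np.asarray(co2, dtype=float)
    predictor = np.asarray(predictor, dtype=float)
    delta = np.diff(co2)          # change in CO2 from one year to the next
    x = predictor[1:]             # assumed alignment: predictor in the year of each change
    slope, intercept = np.polyfit(x, delta, 1)   # least-squares straight-line fit
    delta_hat = slope * x + intercept            # estimated annual changes
    # Predicted CO2 = initial CO2 + running total of the estimated changes
    co2_hat = np.concatenate(([co2[0]], co2[0] + np.cumsum(delta_hat)))
    return co2_hat, slope, intercept

# Synthetic stand-in data (NOT the real records)
years = np.arange(1960, 2023)
t = years - years[0]
rng = np.random.default_rng(0)
co2 = 317.0 + 0.8 * t + 0.013 * t**2               # smooth, slowly accelerating curve
sst = 0.012 * t + rng.normal(0.0, 0.05, t.size)    # noisy warming trend

co2_hat, slope, intercept = ato_style_reconstruction(co2, sst)
r = np.corrcoef(co2, co2_hat)[0, 1]
print(f"slope={slope:.3f}, intercept={intercept:.3f}, r={r:.4f}")
```

Because the least-squares fit includes an intercept, the fitted changes sum to the same total as the actual changes, so the reconstructed curve is pinned to the actual curve at both the first and the last year.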
Here’s a graph of the first part of the calculation, matching SST to annual changes in CO2.
Figure 5. Post 1960 annual change in atmospheric CO2 (∆CO2), along with the linear trend line of ∆CO2, and the best estimate of ∆CO2 based on Hadley HadSST4.0.1.
Now, there is something odd about plotting ∆CO2, or any ∆ for that matter. It involves a couple of curious transformations. I will use the ∆CO2 graph in Figure 5 as an example.
First, the overall linear trend in the CO2 data is converted to an overall offset from zero (non-zero mean) in the ∆CO2 graph.
Second, any overall acceleration in the CO2 data is converted to an overall linear trend in the ∆CO2 graph.
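Here is a worked version of those two conversions, assuming purely for illustration that CO2 follows a quadratic in time t (in years). The coefficients a, b, and c are hypothetical symbols, not values from the paper.

```latex
\begin{aligned}
\mathrm{CO2}(t) &= a + b\,t + c\,t^{2} \\
\Delta\mathrm{CO2}(t) &= \mathrm{CO2}(t) - \mathrm{CO2}(t-1) \\
&= b\,t + c\,t^{2} - b\,(t-1) - c\,(t-1)^{2} \\
&= (b - c) + 2c\,t
\end{aligned}
```

The linear trend b in CO2 shows up as a constant offset (b − c) in ∆CO2, and the acceleration c shows up as a linear trend with slope 2c.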
So from looking at Figure 5, we can tell that the underlying CO2 data has a positive trend and is accelerating. We can see both in Figure 3 above.
And now that we’ve fitted the SST to the ∆CO2 data to get an estimate of ∆CO2, we just cumulatively sum the estimated changes to estimate the underlying CO2 data. This is the result.
Figure 6. Mauna Loa CO2 data, and the Ato2024 estimate of the Mauna Loa CO2 data.
At this point, I have replicated the results.
Now, remember that I said that a correlation coefficient of 0.999+ means there is some fatal flaw in the logic. So…what’s not to like?
In the note asking me to review this paper, Charles Moderator included an interesting AI analysis of the paper, namely (emphasis mine):
Based on the analysis of the paper, the main circular reasoning issue appears to be in the methodology used to predict atmospheric CO2 concentrations from sea surface temperature (SST) data. In particular:
• The authors used multiple linear regression to derive an equation relating the annual CO2 increase to SST for the period 1960-2022.
• This equation is then used to “predict” CO2 concentrations for the same time period 1960-2022.
• The predicted and measured CO2 concentrations were found to have a very high correlation (r = 0.9995).
Circular reasoning occurs because the same data is used to derive the equation and to test the predictive power. The key equations involved are:
Regression equation (from Step 7 in the paper):
Annual CO2 increase = 2.006 × HAD-SST + 1.143 (after 1959)
Prediction equation:
(CO2)n = Σ(ΔCO2)i + Cst
Where (CO2)n is the predicted CO2 concentration, (ΔCO2)i is the annual increase calculated from the regression equation, and Cst is the actual CO2 concentration in the initial year.
By using this method, the author essentially fits an equation to the data and then uses that same equation to “predict” the data it was fitted to. This ensures a very high correlation that does not demonstrate predictive power or causality.
A proper analysis would use separate training and test datasets, or use techniques like cross-validation, to avoid this circularity.
The very high correlations reported are almost certainly an artifact of this flawed methodology rather than evidence of a genuine relationship between SST and atmospheric CO2 levels.
And the AI is right. Well, partly right. It is correct that the problem is not that Ato2024 fits SST to CO2. The problem is that Ato2024 does not hold back half of the data to verify the results. It’s easy to predict something when you already know the outcome…
But, and it’s a big But… while that problem alone is enough to totally falsify the conclusions, there is another really big problem. To illustrate, I have used the Ato2024 method. But instead of using sea surface temperature as the input to fit the ∆CO2 data as Ato2024 did, I have fitted a straight line to the ∆CO2 data. It’s the blue line in Figure 5 above.
And using the Ato2024 method, I have converted that straight line into an estimate of the same CO2 data, shown in red in Figure 7 below.
Figure 7. As in Figure 6, plus the red line shows the result of using a simple straight line instead of sea surface temperature (SST) as the input to the Ato2024 method.
Interesting. Using the Ato2024 method of fitting a variable to ∆CO2, a straight line as the input does just as well as using SST as the input.
But that does not really show the scope of the problem. To do that, I first split the SST, straight line, and ∆CO2 data into two halves. I used the first half to fit either SST or a straight line to ∆CO2. Then I used that fit to estimate CO2 over the full period. Figure 8 shows the results.
Figure 8. As in Figure 7, but using only the first half of the data to fit the model, and then using the full data to see how well it performs.
This graph shows two separate problems. First, although the fit is poorer than in Figure 6, the Pearson correlation coefficient “r” does not change… meaning that it is not an appropriate measure for this kind of problem.
Next, the straight line continues to do just as well as using SST as the independent variable. This points to an important problem with the underlying Ato method.
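A small sketch of why “r” can stay high while the fit gets worse: Pearson correlation ignores any constant offset or scaling between two series. The numbers below are synthetic, and the deliberately poor estimate `poor` is a made-up example, not anything from Ato2024.

```python
import numpy as np

t = np.arange(63)
actual = 317.0 + 0.8 * t + 0.013 * t**2   # synthetic CO2-like curve (NOT real data)

# A deliberately poor estimate that captures only 70% of the rise
poor = actual[0] + 0.7 * (actual - actual[0])

r = np.corrcoef(actual, poor)[0, 1]
rmse = np.sqrt(np.mean((actual - poor) ** 2))
final_error = actual[-1] - poor[-1]
print(f"r = {r:.4f}, RMSE = {rmse:.1f} ppm, final-year error = {final_error:.1f} ppm")
```

The estimate drifts well away from the actual curve, yet the correlation is essentially perfect. An error measure such as RMSE, ideally computed on held-out years, says far more about how well a reconstruction performs.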
To demonstrate the problem, I’ll show you again Figure 5 from above.
To recap, first, any overall linear trend in the CO2 data is converted to an overall offset from zero (non-zero mean) in the ∆CO2 graph.
Second, any overall acceleration in the CO2 data is converted into an overall linear trend in the ∆CO2 graph.
And here is the key. If you fit the SST data (or more importantly, any data) to the ∆CO2 data, you will get a fitted signal that has the same non-zero mean and the same trend as the ∆CO2 data.
Not only that, but the fit will be balanced, with the amount above and the amount below the fitted line equal.
And all this guarantees that if you start out trying to predict a smooth curve, when you reconstruct the signal using the Ato2024 method, you will get an answer very close to that smooth curve, regardless of what variable is used to reconstruct the signal.
And that’s why using a straight line does just as well as using SST, or any other variable, as the basis for estimating CO2.
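To make that concrete, here is a hedged sketch that runs the same fit-then-cumulate procedure with three different inputs. Everything in it is synthetic: the CO2-like series, the SST-like series, and the helper name `reconstruct` are assumptions for illustration only.

```python
import numpy as np

def reconstruct(co2, predictor):
    """Fit annual CO2 changes to a predictor, then cumulatively sum the fitted changes."""
    delta = np.diff(co2)
    x = predictor[1:]
    slope, intercept = np.polyfit(x, delta, 1)
    return np.concatenate(([co2[0]], co2[0] + np.cumsum(slope * x + intercept)))

rng = np.random.default_rng(42)
t = np.arange(63, dtype=float)

# Synthetic CO2-like series: noisy annual increments that slowly grow (NOT real data)
increments = 0.8 + 0.026 * t[1:] + rng.normal(0.0, 0.4, t.size - 1)
co2 = np.concatenate(([317.0], 317.0 + np.cumsum(increments)))

predictors = {
    "noisy warming trend (SST-like)": 0.012 * t + rng.normal(0.0, 0.05, t.size),
    "straight line": t,
    "pure random noise": rng.normal(0.0, 1.0, t.size),
}

results = {}
for name, p in predictors.items():
    co2_hat = reconstruct(co2, p)
    results[name] = (np.corrcoef(co2, co2_hat)[0, 1], co2_hat[-1] - co2[-1])
    print(f"{name}: r = {results[name][0]:.4f}, end-point error = {results[name][1]:.2e}")
```

In this sketch even pure noise scores a high “r”. Its fitted changes are nearly constant, so the reconstruction is close to a straight line joining the first and last CO2 values, and a straight line already correlates strongly with a slowly accelerating curve.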
I weep for the death of honest peer-review…
All the best to everyone,
w.
Yes, you’ve heard it before: If you’re commenting, please quote the exact words you’re discussing. It avoids endless misunderstandings.