BeeTheory · Foundations · Technical Note XI
Identifying the Missing Parameter:
Step 1 — Systematic Correlation Analysis
Before modifying the model, this note diagnoses which observable parameter best predicts the residual error. Working on the 22-galaxy calibration set of Note VIII, we test the correlation of the prediction error with every physically meaningful variable, then with every bivariate combination, to identify rigorously what the current model has omitted.
1. The result first
The missing parameter is the central surface density
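The central baryonic surface density $\Sigma_d$, taken in logarithm, has one of the strongest non-trivial correlations with the prediction error: $r = +0.62$, $R^2 = 0.39$ on its own. It is matched only by $V_\text{dynamical}$ ($r = +0.63$), and unlike that variable it is nearly independent of $R_d$.
Combining $\log \Sigma_d$ with the disk size $R_d$ in a bivariate model gives $R^2 = 0.43$, compared to $R^2 = 0.07$ with $R_d$ alone. The RMS residual drops from $19.5\%$ to $15.3\%$. Using the closely related mean surface density $M_\text{bar}/R_d^2$ as the second variable lowers it further, to $14.9\%$ ($R^2 = 0.46$).
After absorbing both $R_d$ and $\Sigma_d$, no additional physical observable shows a significant correlation with the residual.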
2. Method
Working on the 22-galaxy calibration set (Note VIII), for each galaxy we have the prediction error $\text{err} = (V_\text{tot} - V_f)/V_f$ and a list of measurable physical parameters. We compute the Pearson and Spearman correlations between the error and each candidate variable, then test bivariate regressions of the form:
$$\text{err}(\%) \;=\; a \cdot R_d \;+\; b \cdot X \;+\; c$$
where $X$ is each candidate variable. The best $X$ is the one that maximises the explained variance $R^2$ on the 22 galaxies. Self-referential variables — those derived from the model output, like $V_\text{wave}$ or $V_\text{tot}$ — are excluded from the search, since their correlation with the error is tautological.
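The two correlation measures named above can be sketched with the Python standard library alone. This is a minimal illustration, not the note's actual pipeline: the function names and the five toy galaxies are hypothetical, and only the error definition $(V_\text{tot} - V_f)/V_f$ is taken from the text.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """Average ranks (1-based); tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def prediction_error(v_tot, v_f):
    """Fractional error (V_tot - V_f) / V_f, as defined in the text."""
    return (v_tot - v_f) / v_f


# Hypothetical toy values, NOT the Note VIII calibration data.
v_tot = [110.0, 95.0, 150.0, 210.0, 80.0]
v_f = [100.0, 100.0, 140.0, 180.0, 90.0]
log_sigma_d = [2.1, 1.6, 2.3, 2.7, 1.3]

err = [prediction_error(a, b) for a, b in zip(v_tot, v_f)]
print("Pearson :", round(pearson(log_sigma_d, err), 3))
print("Spearman:", round(spearman(log_sigma_d, err), 3))
```

Reporting both measures is a useful cross-check: Pearson is sensitive to a few extreme galaxies, while Spearman depends only on the ordering. In practice a statistics library would also return the $p$-values.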
3. Univariate correlations
The table ranks the candidate variables (24 were tested) by absolute Pearson correlation with the error. Variables derived from the model itself, such as $V_\text{wave}$ and $V_\text{tot}$, are tautological; the rows of interest are the genuine physical observables with $|r| > 0.5$.
| Variable | Description | Pearson $r$ | $p$-value | Significance |
|---|---|---|---|---|
| Vw_over_Vf | Vw / Vf ratio | +0.974 | < 0.0001 | ★★★ |
| V_dynamical | V_dyn = √(GM_bar/Rd) | +0.632 | 0.0021 | ★★★ |
| log_Sigma_d | log₁₀(Σ_d) | +0.622 | 0.0026 | ★★★ |
| M_gas | Gas mass (M_sun) | +0.609 | 0.0034 | ★★★ |
| M_HI | HI mass (M_sun) | +0.609 | 0.0034 | ★★★ |
| T | Hubble type | -0.585 | 0.0053 | ★★ |
| Vbar | Baryonic Vbar (km/s) | +0.582 | 0.0057 | ★★ |
| M_bar_over_Rd2 | M_bar / Rd² | +0.559 | 0.0084 | ★★ |
| Vtot | Predicted Vtot (km/s) | +0.555 | 0.0090 | ★★ |
| Vw | Wave Vw (km/s) | +0.550 | 0.0098 | ★★ |
| Vbar_over_Vf | Vbar / Vf ratio | +0.519 | 0.0158 | ★★ |
| log_M_gas | log₁₀(M_gas) | +0.506 | 0.0193 | ★★ |
| log_M_bar | log₁₀(M_bar) | +0.505 | 0.0196 | ★★ |
| M_bar | Baryonic mass (M_sun) | +0.498 | 0.0214 | ★★ |
| log_M_star | log₁₀(M_star) | +0.449 | 0.0414 | ★★ |
| Sigma_d | Surface density (L_sun/pc²) | +0.426 | 0.0544 | ★★ |
| M_star_over_Rd2 | M_star / Rd² | +0.426 | 0.0544 | ★★ |
| M_star | Stellar mass (M_sun) | +0.389 | 0.0815 | ★ |
Reading the table
The single highest correlation is that of $V_\text{wave}/V_f$, with $r = +0.974$. This is tautological: by construction, the error scales directly with $V_\text{wave}$, so this variable reflects the structure of the prediction formula rather than an external physical driver.
Among the genuine physical observables, the highest correlations are those of $V_\text{dynamical}$ ($r = +0.632$), $\log(\Sigma_d)$ ($r = +0.622$), $M_\text{gas}$ ($r = +0.609$), and Hubble type $T$ ($r = -0.585$). These four signals are physically connected: dense disks tend to be more massive, of earlier type, and have higher baryonic dynamical velocity. The question is which is the fundamental driver.
4. Filtering out the redundant variables
Several of the top-correlated variables are themselves strongly correlated with $R_d$, the variable already known to drive the error. The question is which carries independent information.
| Variable | Correlation with $R_d$ | Status |
|---|---|---|
| $\log(M_\star)$ | $r = +0.88$ | Redundant with $R_d$ |
| $\log(M_\text{bar})$ | $r = +0.87$ | Redundant with $R_d$ |
| $\log(M_\text{gas})$ | $r = +0.86$ | Redundant with $R_d$ |
| Hubble type $T$ | $r = -0.66$ | Partially redundant |
| $V_\text{dynamical}$ | $r = +0.50$ | Partially independent |
| $M_\text{bar}/R_d^2$ | $r = -0.19$ | Independent |
| $\log(\Sigma_d)$ | $r = +0.10$ | Independent |
The logarithmic masses correlate strongly with $R_d$ ($r$ between $+0.86$ and $+0.88$): a larger disk simply contains more baryonic material. These variables therefore carry largely the same information as $R_d$ itself. In contrast, $\Sigma_d$ (central surface density) and $M_\text{bar}/R_d^2$ (proportional to the mean baryonic surface density) are almost orthogonal to $R_d$ in this sample ($|r| \leq 0.19$): they capture the structural property of “how compact the matter is”, independently of “how extended the disk is”.
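The filtering step can be expressed as a simple rule on $|r|$ with $R_d$. The sketch below uses the correlations from the table above. The numerical thresholds are illustrative assumptions chosen only to reproduce the table's labels; the note does not state which cut-offs were used.

```python
# Correlations with R_d, copied from the table above.
corr_with_rd = {
    "log_M_star": +0.88,
    "log_M_bar": +0.87,
    "log_M_gas": +0.86,
    "T": -0.66,
    "V_dynamical": +0.50,
    "M_bar_over_Rd2": -0.19,
    "log_Sigma_d": +0.10,
}


def redundancy_status(r, strong=0.8, moderate=0.6, weak=0.3):
    """Label a variable by |r| with R_d (thresholds are assumed, not from the note)."""
    a = abs(r)
    if a >= strong:
        return "Redundant with R_d"
    if a >= moderate:
        return "Partially redundant"
    if a >= weak:
        return "Partially independent"
    return "Independent"


status = {name: redundancy_status(r) for name, r in corr_with_rd.items()}

for name, r in corr_with_rd.items():
    shared = r * r  # fraction of variance shared with R_d in a linear fit
    print(f"{name:16s} r={r:+.2f}  shared variance={shared:.2f}  {status[name]}")
```

The squared correlation makes the contrast concrete: the mass variables share roughly three quarters of their variance with $R_d$, whereas $\log \Sigma_d$ shares about one percent.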
5. Error versus surface density — visualisation
Figure: prediction error plotted against $\log_{10}(\Sigma_d)$ alone, with points coloured by Hubble type.
The trend is positive across the sample: galaxies with higher central surface density are systematically over-predicted by BeeTheory, while diffuse low-density disks are under-predicted. The fitted slope is $33$ percentage points per decade of $\Sigma_d$, over the full range from 15 to 605 $L_\odot/\text{pc}^2$.
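To give a sense of scale, the quoted slope can be multiplied by the width of the sampled range. This is only a reading of the fitted line, assuming it holds from one end of the range to the other:

$$\log_{10}\!\left(\frac{605}{15}\right) \approx 1.61 \ \text{decades}, \qquad 33 \times 1.61 \approx 53 \ \text{percentage points}$$

On that assumption, the fitted error differs by roughly 53 percentage points between the most diffuse and the densest disk in the sample.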
6. Bivariate models — comparison
Adding $R_d$ to each candidate variable gives a clearer ranking. The table below shows the explained variance $R^2$ when $R_d$ is paired with each second variable (tautological combinations excluded):
| Bivariate model | $R^2$ | RMS residual | Notes |
|---|---|---|---|
| $\text{err} = a R_d + c$ (univariate baseline) | 0.074 | $19.5\%$ | Reference, no second variable |
| $\text{err} = a R_d + b f_\text{gas} + c$ | 0.101 | $19.3\%$ | Negligible improvement |
| $\text{err} = a R_d + b \log M_\star + c$ | 0.272 | $17.3\%$ | — |
| $\text{err} = a R_d + b V_\text{bar} + c$ | 0.345 | $16.4\%$ | — |
| $\text{err} = a R_d + b \log M_\text{gas} + c$ | 0.359 | $16.3\%$ | — |
| $\text{err} = a R_d + b T + c$ | 0.367 | $16.2\%$ | — |
| $\text{err} = a R_d + b \log M_\text{bar} + c$ | 0.373 | $16.1\%$ | — |
| $\text{err} = a R_d + b\,V_\text{dynamical} + c$ | 0.402 | $15.7\%$ | Strong |
| $\text{err} = a R_d + b \log\Sigma_d + c$ | 0.430 | $15.3\%$ | Independent of $R_d$ |
| $\text{err} = a R_d + b (M_\text{bar}/R_d^2) + c$ | 0.459 | $14.9\%$ | Best non-tautological model |
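The two numerical columns of this table are not independent. If every model is fitted to the same 22 error values and the RMS residual is taken as $\sqrt{SS_\text{res}/n}$ (an assumption about the note's convention), then the RMS scales as $\sqrt{1 - R^2}$. The sketch below rescales from the baseline row and compares with the tabulated values; the row labels are shorthand introduced here.

```python
import math

# (label, R^2, RMS residual in %) copied from the table above.
rows = [
    ("R_d only", 0.074, 19.5),
    ("R_d + f_gas", 0.101, 19.3),
    ("R_d + log M_star", 0.272, 17.3),
    ("R_d + V_bar", 0.345, 16.4),
    ("R_d + log M_gas", 0.359, 16.3),
    ("R_d + T", 0.367, 16.2),
    ("R_d + log M_bar", 0.373, 16.1),
    ("R_d + V_dynamical", 0.402, 15.7),
    ("R_d + log Sigma_d", 0.430, 15.3),
    ("R_d + M_bar/R_d^2", 0.459, 14.9),
]


def rms_from_r2(r2, r2_ref=0.074, rms_ref=19.5):
    """RMS residual implied by R^2, rescaled from the baseline row.

    Assumes RMS is proportional to sqrt(1 - R^2), which holds when all
    models are fitted to the same set of error values.
    """
    return rms_ref * math.sqrt((1.0 - r2) / (1.0 - r2_ref))


implied = {label: rms_from_r2(r2) for label, r2, _ in rows}

for label, r2, rms in rows:
    print(f"{label:20s} R2={r2:.3f}  table={rms:.1f}%  implied={implied[label]:.1f}%")
```

The implied values agree with the table to within about 0.1 percentage point, which is the precision allowed by the rounding of the inputs.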
The best bivariate model
$$\text{err}(\%) \;=\; a\,R_d \;+\; b\,\frac{M_\text{bar}}{R_d^2} \;+\; c, \qquad R^2 = 0.46$$
The variable $M_\text{bar}/R_d^2$ is, up to a constant factor of $\pi$, the mean baryonic surface density of the disk, $\langle \Sigma_\text{bar} \rangle = M_\text{bar}/(\pi R_d^2)$. It carries information about how compact the visible matter is, independently of how large the disk is. This compactness is what BeeTheory currently fails to account for.
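A bivariate fit of the form $\text{err} = a\,x_1 + b\,x_2 + c$ can be sketched with the standard library by solving the two-by-two normal equations on centred data. The function name, the toy inputs, and the convention $\text{RMS} = \sqrt{SS_\text{res}/n}$ are assumptions made for illustration; none of the numbers below are the calibration data.

```python
import math


def fit_bivariate(x1, x2, y):
    """Least-squares fit of y = a*x1 + b*x2 + c; returns (a, b, c, r2, rms)."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    d1 = [v - m1 for v in x1]
    d2 = [v - m2 for v in x2]
    dy = [v - my for v in y]
    s11 = sum(u * u for u in d1)
    s22 = sum(u * u for u in d2)
    s12 = sum(u * v for u, v in zip(d1, d2))
    s1y = sum(u * v for u, v in zip(d1, dy))
    s2y = sum(u * v for u, v in zip(d2, dy))
    det = s11 * s22 - s12 * s12  # zero if x1 and x2 are perfectly collinear
    a = (s1y * s22 - s2y * s12) / det
    b = (s2y * s11 - s1y * s12) / det
    c = my - a * m1 - b * m2
    resid = [yi - (a * u + b * v + c) for yi, u, v in zip(y, x1, x2)]
    ss_res = sum(r * r for r in resid)
    ss_tot = sum(u * u for u in dy)
    return a, b, c, 1.0 - ss_res / ss_tot, math.sqrt(ss_res / n)


# Hypothetical toy sample: disk scale length (kpc), a compactness proxy
# (log surface density), and prediction error in percent.
r_d = [1.2, 2.5, 3.1, 4.0, 5.3, 2.2, 3.7, 1.8]
log_sigma = [2.6, 1.9, 2.4, 1.5, 2.1, 2.8, 1.7, 2.2]
err_pct = [18.0, -9.0, 12.0, -20.0, 6.0, 25.0, -14.0, 3.0]

a, b, c, r2, rms = fit_bivariate(r_d, log_sigma, err_pct)
print(f"a={a:.2f}  b={b:.2f}  c={c:.2f}  R2={r2:.3f}  RMS={rms:.1f}%")
```

Running the same function with each candidate in place of the second variable, and keeping the one with the largest $R^2$, reproduces the ranking procedure described in the Method section.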
7. Closure check — what remains after $R_d$ and $\Sigma_d$ are accounted for
If $R_d$ and $\log \Sigma_d$ together capture the structural defect, the residual of the bivariate fit should be uncorrelated with every physical observable. Testing this is the formal closure check:
| Variable | Correlation with residual | Status |
|---|---|---|
| $R_d$ | $+0.00$ | By construction |
| $\log \Sigma_d$ | $+0.00$ | By construction |
| $\log M_\star$ | $-0.05$ | Absorbed |
| $\log M_\text{bar}$ | $+0.07$ | Absorbed |
| $\log M_\text{gas}$ | $+0.14$ | Absorbed |
| Hubble type $T$ | $-0.04$ | Absorbed |
| $V_\text{dynamical}$ | $+0.08$ | Absorbed |
| $V_\text{bar}$ | $+0.05$ | Absorbed |
| $f_\text{gas}$ | $+0.28$ | Marginal; below significance |
After accounting for $R_d$ and $\log \Sigma_d$, no physical observable retains a significant correlation with the residual error; the largest, $f_\text{gas}$ at $+0.28$, is below significance. To the extent that linear correlations on 22 galaxies can show, the structural information in the error has been captured by these two variables. The remaining $15\%$ RMS scatter is consistent with observational uncertainty on the SPARC input parameters and with intrinsic galaxy-to-galaxy variability not captured by any of these aggregate descriptors.
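For orientation, the size of correlation that would count as significant on a sample this small can be estimated from the usual $t$-test for a Pearson coefficient. The calculation below assumes a two-sided test at the 5% level with $n - 2 = 20$ degrees of freedom, for which $t_\text{crit} = 2.086$. It is an estimate made here, not a threshold quoted in the note.

$$t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} \quad\Longrightarrow\quad |r|_\text{crit} = \frac{t_\text{crit}}{\sqrt{t_\text{crit}^2 + n - 2}} = \frac{2.086}{\sqrt{2.086^2 + 20}} \approx 0.42$$

Charging additional degrees of freedom for the two regressors already fitted raises this threshold slightly, to about $0.44$. Either way, the largest entry in the closure table ($f_\text{gas}$, $+0.28$) falls well below it.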
8. Physical interpretation
The current BeeTheory model uses the disk scale length $R_d$ in two places: as the spatial scale of the baryonic distribution (the exponential profile $\Sigma \propto e^{-R/R_d}$) and as the coherence length of the wave kernel ($\ell = c_\text{disk}\,R_d$). The amplitude of the baryonic profile $\Sigma_0$ is implicit, scaled to give the correct stellar mass once integrated.
What surface density represents physically
The mean baryonic surface density $\langle \Sigma_\text{bar} \rangle = M_\text{bar}/(\pi R_d^2)$ is the mass per unit area of the disk. Two galaxies with the same $R_d$ but different $\Sigma_d$ have the same geometric extent but different amounts of matter packed in. The current model treats only the geometric extent ($R_d$) as relevant to the wave-coherence length, ignoring how concentrated the matter is. This is precisely the parameter that the residual analysis identifies as missing.
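A short numerical example makes the distinction concrete. The two disks below are hypothetical, and the units (solar masses, kiloparsecs) are assumed for illustration; only the formula $M_\text{bar}/(\pi R_d^2)$ comes from the text.

```python
import math


def mean_surface_density(m_bar_msun, r_d_kpc):
    """Mean baryonic surface density M_bar / (pi R_d^2), in M_sun per pc^2."""
    r_d_pc = r_d_kpc * 1.0e3  # 1 kpc = 1000 pc
    return m_bar_msun / (math.pi * r_d_pc ** 2)


# Two hypothetical disks with the same scale length but different masses.
compact = mean_surface_density(2.0e10, 3.0)
diffuse = mean_surface_density(2.0e9, 3.0)

print(f"compact: {compact:.0f} M_sun/pc^2")
print(f"diffuse: {diffuse:.0f} M_sun/pc^2")
print(f"contrast: {math.log10(compact / diffuse):.1f} dex at identical R_d")
```

Both disks would be assigned the same coherence length $\ell = c_\text{disk}\,R_d$ by the current model, even though their surface densities differ by a factor of ten.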
The direction of the effect
The correlation is positive: the error grows with surface density. This means that for fixed $R_d$, denser disks are over-predicted by the model — the wave field is too strong relative to the rotation curve. Conversely, for a given $R_d$, the model under-predicts diffuse low-density disks. A plausible physical interpretation: the wave coherence length should depend not only on the geometric extent of the source but also on its concentration, with denser matter producing a more localised wave response. This would naturally suppress the wave-field amplitude in high-$\Sigma$ disks and enhance it in low-$\Sigma$ ones.
9. Summary of Step 1
1. On the 22-galaxy calibration set, the prediction error correlates strongly with the central surface density ($r = +0.62$ for $\log \Sigma_d$), on a par with $V_\text{dynamical}$ ($r = +0.63$) and ahead of the other genuine physical observables.
2. Other variables that initially appear strongly correlated (stellar mass, gas mass, baryonic mass) turn out to be highly redundant with $R_d$ (correlations $\geq 0.86$ with $R_d$) and therefore carry little new information.
3. The best non-tautological bivariate model is $\text{err} = a\,R_d + b\,(M_\text{bar}/R_d^2) + c$, with $R^2 = 0.46$ and RMS residual $14.9\%$. The second variable is the mean baryonic surface density of the disk.
4. After accounting for $R_d$ and $\Sigma_d$, no other observable retains significant correlation with the residual. The diagnostic is closed.
5. The missing parameter is identified: the current BeeTheory model accounts for the geometric extent of the baryonic distribution ($R_d$) but not for its surface density ($\Sigma_d$). The next step is to incorporate $\Sigma_d$ as a second input to the wave-coherence length, then to refit the model on the 22-galaxy set.
References. Lelli, F., McGaugh, S. S., Schombert, J. M. — SPARC: Mass Models for 175 Disk Galaxies with Spitzer Photometry and Accurate Rotation Curves, AJ 152, 157 (2016). · Pearson, K. — Mathematical contributions to the theory of evolution III, Phil. Trans. R. Soc. A 187, 253 (1896). Correlation coefficient. · Dutertre, X. — Bee Theory™: Wave-Based Modeling of Gravity, v2, BeeTheory.com (2023).
BeeTheory.com — Wave-based quantum gravity · Diagnostic step 1 · © Technoplane S.A.S. 2026