BeeTheory · Foundations · Technical Note XI

Identifying the Missing Parameter:
Step 1 — Systematic Correlation Analysis

Before modifying the model, this note diagnoses which observable parameter best predicts the residual error. Working on the 22-galaxy calibration set of Note VIII, we test the correlation of the prediction error with every physically meaningful variable, then with every bivariate combination, to identify rigorously what the current model has omitted.

1. The result first

The missing parameter is the central surface density

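The central baryonic surface density $\Sigma_d$, taken in logarithm, shows one of the two strongest non-trivial correlations with the prediction error: $r = +0.62$, $R^2 = 0.39$ on its own (only $V_\text{dynamical}$ is marginally higher, at $r = +0.63$). Unlike the other leading candidates, it is almost uncorrelated with $R_d$.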

Combining $\log \Sigma_d$ with the disk size $R_d$ in a bivariate model gives $R^2 = 0.43$, compared to $R^2 = 0.07$ with $R_d$ alone. The RMS residual drops from $19.5\%$ to $15.3\%$.

After absorbing both $R_d$ and $\Sigma_d$, no additional physical observable carries information about the residual.

2. Method

Working on the 22-galaxy calibration set (Note VIII), for each galaxy we have the prediction error $\text{err} = (V_\text{tot} - V_f)/V_f$ and a list of measurable physical parameters. We compute the Pearson and Spearman correlations between the error and each candidate variable, then test bivariate regressions of the form:

$$\text{err}(\%) \;=\; a \cdot R_d \;+\; b \cdot X \;+\; c$$

where $X$ is each candidate variable. The best $X$ is the one that maximises the explained variance $R^2$ on the 22 galaxies. Self-referential variables — those derived from the model output, like $V_\text{wave}$ or $V_\text{tot}$ — are excluded from the search, since their correlation with the error is tautological.
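The two correlation measures used in this step can be sketched in a few lines. The block below is a minimal illustration in plain Python, not the code used for the note: the function names and the toy data are assumptions, and the real analysis would run on the 22-galaxy table instead.

```python
import math

def pearson(x, y):
    """Pearson linear correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """Average ranks (1-based); tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(x, y):
    """Spearman rank correlation: the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Toy data (NOT from the calibration set): monotonic but non-linear relation.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
r_p = pearson(x, y)
r_s = spearman(x, y)
print(f"Pearson r = {r_p:.3f}, Spearman rho = {r_s:.3f}")
```

On this toy input the Spearman coefficient is exactly 1 because the relation is monotonic, while the Pearson coefficient is slightly below 1 because it is not linear. Reporting both, as the note does, helps flag variables whose link to the error is monotonic but curved.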

3. Univariate correlations

Of the 24 candidate variables tested, 18 are listed below, ranked by absolute Pearson correlation with the error. Variables derived from the model itself (such as Vw and Vtot) are tautological; the entries of interest are the genuine physical observables with $|r| > 0.5$.

Variable	Description	Pearson $r$	$p$-value	Significance
Vw_over_Vf	Vw / Vf ratio	+0.974	< 0.0001	★★★
V_dynamical	V_dyn = √(GM_bar/Rd)	+0.632	0.0021	★★★
log_Sigma_d	log₁₀(Σ_d)	+0.622	0.0026	★★★
M_gas	Gas mass (M_sun)	+0.609	0.0034	★★★
M_HI	HI mass (M_sun)	+0.609	0.0034	★★★
T	Hubble type	-0.585	0.0053	★★
Vbar	Baryonic Vbar (km/s)	+0.582	0.0057	★★
M_bar_over_Rd2	M_bar / Rd²	+0.559	0.0084	★★
Vtot	Predicted Vtot (km/s)	+0.555	0.0090	★★
Vw	Wave Vw (km/s)	+0.550	0.0098	★★
Vbar_over_Vf	Vbar / Vf ratio	+0.519	0.0158	★★
log_M_gas	log₁₀(M_gas)	+0.506	0.0193	★★
log_M_bar	log₁₀(M_bar)	+0.505	0.0196	★★
M_bar	Baryonic mass (M_sun)	+0.498	0.0214	★★
log_M_star	log₁₀(M_star)	+0.449	0.0414	★★
Sigma_d	Surface density (L_sun/pc²)	+0.426	0.0544	★★
M_star_over_Rd2	M_star / Rd²	+0.426	0.0544	★★
M_star	Stellar mass (M_sun)	+0.389	0.0815	★

Reading the table

The single highest correlation is that of $V_\text{wave}/V_f$, with $r = +0.974$. This is tautological: by construction, the error scales directly with $V_\text{wave}$, so this variable simply reflects the structure of the prediction formula, not an external physical driver.

Among the genuine physical observables, the highest correlations are those of $V_\text{dynamical}$ ($r = +0.632$), $\log(\Sigma_d)$ ($r = +0.622$), $M_\text{gas}$ ($r = +0.609$), and Hubble type $T$ ($r = -0.585$). These four signals are physically connected: dense disks tend to be more massive, of earlier type, and have higher baryonic dynamical velocity. The question is which is the fundamental driver.

4. Filtering out the redundant variables

Several of the top-correlated variables are themselves strongly correlated with $R_d$, the variable already known to drive the error. The question is which carries independent information.

Variable	Correlation with $R_d$	Status
$\log(M_\star)$	$r = +0.88$	Redundant with $R_d$
$\log(M_\text{bar})$	$r = +0.87$	Redundant with $R_d$
$\log(M_\text{gas})$	$r = +0.86$	Redundant with $R_d$
Hubble type $T$	$r = -0.66$	Partially redundant
$V_\text{dynamical}$	$r = +0.50$	Partially independent
$M_\text{bar}/R_d^2$	$r = -0.19$	Independent
$\log(\Sigma_d)$	$r = +0.10$	Independent

The masses correlate strongly with $R_d$ ($r \approx 0.87$): a larger disk simply contains more baryonic material. These variables therefore carry much of the same information as $R_d$ itself. In contrast, $\Sigma_d$ (central surface density) and $M_\text{bar}/R_d^2$ (proportional to the mean baryonic surface density) are almost orthogonal to $R_d$ in this sample: they capture the structural property of “how compact the matter is”, independently of “how extended the disk is”.
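One simple way to quantify the redundancy is the fraction of each variable's variance that is not linearly explained by $R_d$, namely $1 - r^2$. The sketch below applies this to the correlations listed in the table; the helper name `independent_fraction` is an illustrative assumption, and the note itself does not state which criterion was used to assign the status labels.

```python
# Correlations with R_d, copied from the table above.
r_with_Rd = {
    "log M_star": 0.88,
    "log M_bar": 0.87,
    "log M_gas": 0.86,
    "Hubble type T": -0.66,
    "V_dynamical": 0.50,
    "M_bar/R_d^2": -0.19,
    "log Sigma_d": 0.10,
}

def independent_fraction(r):
    """Fraction of a variable's variance not linearly explained by R_d."""
    return 1.0 - r * r

fractions = {name: independent_fraction(r) for name, r in r_with_Rd.items()}
for name, f in fractions.items():
    print(f"{name:14s} {100 * f:5.1f}% of variance independent of R_d")
```

With these inputs, roughly three quarters of the variance of each mass variable is shared with $R_d$, whereas $\log \Sigma_d$ shares about one percent.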

5. Error versus surface density — visualisation

Figure: prediction error (%) plotted against $\log_{10}(\Sigma_d)$, with points coloured by Hubble type.

Univariate fit: $\text{err}(\%) = 33\,\log_{10}(\Sigma_d) - 60$, Pearson $r = 0.62$, $R^2 = 0.39$.

The trend is positive: galaxies with higher central surface density tend to be over-predicted by BeeTheory, while diffuse low-density disks tend to be under-predicted. The fitted slope is $33$ percentage points per decade of $\Sigma_d$ over the sampled range, from 15 to 605 $L_\odot/\text{pc}^2$, with the scatter expected for $R^2 = 0.39$.
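To make the size of the effect concrete, the quoted univariate fit can be evaluated at the two ends of the sampled range. This is a small sketch using only the slope and intercept stated above; the function names are illustrative.

```python
import math

SLOPE = 33.0       # percentage points per decade of Sigma_d (quoted fit)
INTERCEPT = -60.0  # percentage points (quoted fit)

def predicted_error_pct(sigma_d):
    """Quoted univariate fit, with Sigma_d in L_sun / pc^2."""
    return SLOPE * math.log10(sigma_d) + INTERCEPT

def zero_error_sigma():
    """Surface density at which the fitted error changes sign."""
    return 10 ** (-INTERCEPT / SLOPE)

low = predicted_error_pct(15.0)    # most diffuse disk in the sample
high = predicted_error_pct(605.0)  # densest disk in the sample
pivot = zero_error_sigma()
print(f"fit at  15 L_sun/pc^2: {low:+.1f}%")
print(f"fit at 605 L_sun/pc^2: {high:+.1f}%")
print(f"fit crosses zero near {pivot:.0f} L_sun/pc^2")
```

The fit runs from about $-21\%$ at the diffuse end to about $+32\%$ at the dense end, and crosses zero near $66\ L_\odot/\text{pc}^2$. These are values of the fitted line, not of individual galaxies.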

6. Bivariate models — comparison

Adding $R_d$ to each candidate variable gives a clearer ranking. The table below shows the explained variance $R^2$ when $R_d$ is paired with each second variable (tautological combinations excluded):

Bivariate model	$R^2$	RMS residual	Notes
$\text{err} = a R_d + c$ (univariate baseline)	0.074	$19.5\%$	Reference, no second variable
$\text{err} = a R_d + b f_\text{gas} + c$	0.101	$19.3\%$	Negligible improvement
$\text{err} = a R_d + b \log M_\star + c$	0.272	$17.3\%$	—
$\text{err} = a R_d + b V_\text{bar} + c$	0.345	$16.4\%$	—
$\text{err} = a R_d + b \log M_\text{gas} + c$	0.359	$16.3\%$	—
$\text{err} = a R_d + b T + c$	0.367	$16.2\%$	—
$\text{err} = a R_d + b \log M_\text{bar} + c$	0.373	$16.1\%$	—
$\text{err} = a R_d + b\,V_\text{dynamical} + c$	0.402	$15.7\%$	Strong
$\text{err} = a R_d + b \log\Sigma_d + c$	0.430	$15.3\%$	Independent of $R_d$
$\text{err} = a R_d + b (M_\text{bar}/R_d^2) + c$	0.459	$14.9\%$	Best non-tautological model
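The bivariate $R^2$ values can be cross-checked from the pairwise Pearson correlations alone, using the standard two-predictor identity $R^2 = (r_{eR}^2 + r_{eX}^2 - 2\,r_{eR}\,r_{eX}\,r_{XR})/(1 - r_{XR}^2)$. The sketch below assumes that $r(\text{err}, R_d)$ is positive, since only its square ($0.074$) is quoted, and that the RMS residual scales as $\sqrt{1 - R^2}$ from the $R_d$-only baseline. Both are assumptions of this illustration, not statements from the note.

```python
import math

R2_RD_ALONE = 0.074   # R^2 of the R_d-only baseline row
RMS_RD_ALONE = 19.5   # RMS residual (%) of the R_d-only baseline row
# Assumption: r(err, R_d) > 0; the note quotes only R^2 for this fit.
R_ERR_RD = math.sqrt(R2_RD_ALONE)

def bivariate_r2(r_err_x, r_x_rd, r_err_rd=R_ERR_RD):
    """R^2 of err ~ R_d + X from the three pairwise Pearson correlations."""
    num = r_err_rd ** 2 + r_err_x ** 2 - 2.0 * r_err_rd * r_err_x * r_x_rd
    return num / (1.0 - r_x_rd ** 2)

def rms_from_r2(r2):
    """RMS residual (%) implied by R^2, scaled from the R_d-only baseline."""
    total_scatter = RMS_RD_ALONE / math.sqrt(1.0 - R2_RD_ALONE)
    return total_scatter * math.sqrt(1.0 - r2)

# (r(err, X), r(X, R_d)) taken from the two correlation tables of this note.
candidates = {
    "log Sigma_d": (0.622, 0.10),
    "M_bar/R_d^2": (0.559, -0.19),
    "V_dynamical": (0.632, 0.50),
    "log M_star": (0.449, 0.88),
}
results = {name: bivariate_r2(*rs) for name, rs in candidates.items()}
for name, r2 in results.items():
    print(f"{name:12s} R^2 = {r2:.3f}  RMS = {rms_from_r2(r2):.1f}%")
```

Under these assumptions the reconstructed values agree with the table to within rounding of the input correlations. The check also shows why a variable with a slightly lower univariate $r$ can win in the bivariate ranking: a negative correlation with $R_d$, as for $M_\text{bar}/R_d^2$, makes the two predictors complementary.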

The best bivariate model

$$\text{err}(\%) \;=\; a\,R_d \;+\; b\,\frac{M_\text{bar}}{R_d^2} \;+\; c, \qquad R^2 = 0.46$$

The variable $M_\text{bar}/R_d^2$ is, up to a factor of $\pi$, the mean baryonic surface density of the disk, $\langle \Sigma_\text{bar} \rangle = M_\text{bar}/(\pi R_d^2)$. It carries information about how compact the visible matter is, independently of how large the disk is. This is the variable that BeeTheory currently fails to account for.

7. Closure check — what remains after $R_d$ and $\Sigma_d$ are accounted for

If $R_d$ and $\log \Sigma_d$ together capture the structural defect, the residual of the bivariate fit should be uncorrelated with every physical observable. Testing this is the formal closure check:

Variable	Correlation with residual	Status
$R_d$	$+0.00$	By construction
$\log \Sigma_d$	$+0.00$	By construction
$\log M_\star$	$-0.05$	Absorbed
$\log M_\text{bar}$	$+0.07$	Absorbed
$\log M_\text{gas}$	$+0.14$	Absorbed
Hubble type $T$	$-0.04$	Absorbed
$V_\text{dynamical}$	$+0.08$	Absorbed
$V_\text{bar}$	$+0.05$	Absorbed
$f_\text{gas}$	$+0.28$	Marginal; below significance

After accounting for $R_d$ and $\log \Sigma_d$, no physical observable retains a significant correlation with the residual error; the largest remaining value, $f_\text{gas}$ at $r = +0.28$, is below significance for 22 galaxies. The structural information that these descriptors can carry about the error has been captured by the two variables. The remaining $15\%$ RMS scatter is consistent with observational uncertainty on the SPARC input parameters and with intrinsic galaxy-to-galaxy variability not captured by any of these aggregate descriptors.
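The "by construction" entries in the closure table follow from a general property of least squares: the residual of a fit with an intercept is uncorrelated with every regressor. The sketch below demonstrates this on synthetic data. The value ranges, coefficients, and noise level are illustrative assumptions, not the calibration set.

```python
import math
import random

def solve3(A, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, 3):
            f = M[r][col] / M[col][col]
            for c in range(col, 4):
                M[r][c] -= f * M[col][c]
    x = [0.0] * 3
    for i in (2, 1, 0):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, 3))
        x[i] = (M[i][3] - s) / M[i][i]
    return x

def fit_bivariate(x1, x2, y):
    """Least-squares fit y = a*x1 + b*x2 + c via the normal equations."""
    cols = [x1, x2, [1.0] * len(y)]
    A = [[sum(p * q for p, q in zip(ci, cj)) for cj in cols] for ci in cols]
    b = [sum(p * q for p, q in zip(ci, y)) for ci in cols]
    return solve3(A, b)

def pearson(x, y):
    """Pearson linear correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((p - mx) * (q - my) for p, q in zip(x, y))
    sxx = sum((p - mx) ** 2 for p in x)
    syy = sum((q - my) ** 2 for q in y)
    return sxy / math.sqrt(sxx * syy)

# Synthetic stand-in for a 22-galaxy sample (hypothetical values throughout).
random.seed(0)
n = 22
rd = [random.uniform(1.0, 8.0) for _ in range(n)]
log_sigma = [random.uniform(1.2, 2.8) for _ in range(n)]
err = [2.0 * r + 30.0 * s - 60.0 + random.gauss(0.0, 15.0)
       for r, s in zip(rd, log_sigma)]

a, b, c = fit_bivariate(rd, log_sigma, err)
resid = [e - (a * r + b * s + c) for e, r, s in zip(err, rd, log_sigma)]
r_res_rd = pearson(resid, rd)
r_res_sigma = pearson(resid, log_sigma)
print(f"corr(resid, R_d) = {r_res_rd:+.2e}")
print(f"corr(resid, log Sigma_d) = {r_res_sigma:+.2e}")
```

Both correlations vanish to floating-point precision, so only the rows for variables outside the fit carry diagnostic information. The same `pearson(resid, ...)` call applied to any other observable would give the corresponding entry of a closure table.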

8. Physical interpretation

The current BeeTheory model uses the disk scale length $R_d$ in two places: as the spatial scale of the baryonic distribution (the exponential profile $\Sigma \propto e^{-R/R_d}$) and as the coherence length of the wave kernel ($\ell = c_\text{disk}\,R_d$). The amplitude of the baryonic profile $\Sigma_0$ is implicit, scaled to give the correct stellar mass once integrated.

What surface density represents physically

The mean baryonic surface density $\langle \Sigma_\text{bar} \rangle = M_\text{bar}/(\pi R_d^2)$ is the mass per unit area of the disk. Two galaxies with the same $R_d$ but different $\Sigma_d$ have the same geometric extent but different amounts of matter packed in. The current model treats only the geometric extent ($R_d$) as relevant to the wave-coherence length, ignoring how concentrated the matter is. This is precisely the parameter that the residual analysis identifies as missing.
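As a worked illustration of the definition, the sketch below evaluates $M_\text{bar}/(\pi R_d^2)$ for two hypothetical disks that share the same $R_d$ but differ by a factor of ten in baryonic mass. The masses and scale length are made-up round numbers, not galaxies from the calibration set.

```python
import math

def mean_surface_density(m_bar_msun, r_d_kpc):
    """Mean baryonic surface density M_bar / (pi R_d^2), in M_sun / pc^2."""
    r_d_pc = r_d_kpc * 1000.0  # 1 kpc = 1000 pc
    return m_bar_msun / (math.pi * r_d_pc ** 2)

# Two hypothetical disks: same R_d = 3 kpc, different baryonic mass.
diffuse = mean_surface_density(2.0e9, 3.0)
compact = mean_surface_density(2.0e10, 3.0)
print(f"diffuse disk: {diffuse:.1f} M_sun/pc^2")
print(f"compact disk: {compact:.1f} M_sun/pc^2")
```

The two disks would enter the current model with the same coherence length, since that depends on $R_d$ only, even though their surface densities differ by a full decade.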

The direction of the effect

The correlation is positive: the error grows with surface density. This means that for fixed $R_d$, denser disks are over-predicted by the model — the wave field is too strong relative to the rotation curve. Conversely, for a given $R_d$, the model under-predicts diffuse low-density disks. A plausible physical interpretation: the wave coherence length should depend not only on the geometric extent of the source but also on its concentration, with denser matter producing a more localised wave response. This would naturally suppress the wave-field amplitude in high-$\Sigma$ disks and enhance it in low-$\Sigma$ ones.

9. Summary of Step 1

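1. On the 22-galaxy calibration set, the prediction error correlates strongly with the central surface density ($r = +0.62$ for $\log \Sigma_d$). Among genuine physical observables only $V_\text{dynamical}$ is marginally higher ($r = +0.63$), and it is partly correlated with $R_d$.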

2. Other variables that initially appear strongly correlated (stellar mass, gas mass, baryonic mass) turn out to be highly redundant with $R_d$ (correlations $\geq 0.86$ with $R_d$) and therefore carry little new information.

3. The best non-tautological bivariate model is $\text{err} = a\,R_d + b\,(M_\text{bar}/R_d^2) + c$, with $R^2 = 0.46$ and RMS residual $14.9\%$. The second variable is the mean baryonic surface density of the disk.

4. After accounting for $R_d$ and $\Sigma_d$, no other observable retains significant correlation with the residual. The diagnostic is closed.

5. The missing parameter is identified: the current BeeTheory model accounts for the geometric extent of the baryonic distribution ($R_d$) but not for its surface density ($\Sigma_d$). The next step is to incorporate $\Sigma_d$ as a second input to the wave-coherence length, then to refit the model on the 22-galaxy set.

References. Lelli, F., McGaugh, S. S., Schombert, J. M. — SPARC: Mass Models for 175 Disk Galaxies with Spitzer Photometry and Accurate Rotation Curves, AJ 152, 157 (2016). · Pearson, K. — Mathematical contributions to the theory of evolution III, Phil. Trans. R. Soc. A 187, 253 (1896). Correlation coefficient. · Dutertre, X. — Bee Theory™: Wave-Based Modeling of Gravity, v2, BeeTheory.com (2023).
