BeeTheory · Foundations · Technical Note X

Anatomy of the Residuals:
A Linear Trend with Disk Size

The 94-galaxy blind test of Note IX showed a systematic residual trend with disk size. This note characterises that trend quantitatively, isolates the largest deviations on each side, and identifies the structural origin of the dispersion.

1. The result first

A linear residual, two opposite populations

The prediction error scales linearly with the disk scale length: $\text{error}\,(\%) \approx -31.7 + 12.8\,R_d$, with Pearson correlation $r = +0.75$. The line crosses zero at $R_d = 2.48$ kpc, essentially the disk size of the Milky Way that anchored the calibration. The two extremes of this regression correspond to two physically distinct outlier populations: large massive spirals (over-predicted) at one end, compact dwarfs (under-predicted) at the other.

2. The residual is linear in $R_d$

Plotting the prediction error against $R_d$, with each point coloured by Hubble type, makes the linearity of the trend immediately visible. The red line is the linear regression of the error on $R_d$ over all 94 blind galaxies.

Figure: prediction error of the 94 blind galaxies plotted against disk scale length $R_d$, coloured by Hubble type. The red line is the linear regression of the error on $R_d$. It crosses zero at $R_d = 2.48$ kpc, essentially the disk size that anchored the original calibration.

Error as a function of disk size

$$\text{error}\,(\%) \;\approx\; -31.7 \;+\; 12.8 \times R_d \,[\text{kpc}]$$

Linear fit on 94 blind galaxies, Pearson $r = +0.75$, RMSE of residuals $= 18.4\%$.
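
The fitted line can be evaluated directly. The following is a minimal Python sketch that uses only the two coefficients quoted above; the function and constant names are illustrative and are not part of any BeeTheory code.

```python
# Coefficients of the linear residual fit quoted in this note.
INTERCEPT_PCT = -31.7       # fitted error extrapolated to R_d = 0, in percent
SLOPE_PCT_PER_KPC = 12.8    # change in error per kpc of disk scale length


def trend_error_pct(r_d_kpc: float) -> float:
    """Return the fitted prediction error (%) for a disk scale length in kpc."""
    return INTERCEPT_PCT + SLOPE_PCT_PER_KPC * r_d_kpc


# The line crosses zero where INTERCEPT + SLOPE * R_d = 0.
zero_crossing_kpc = -INTERCEPT_PCT / SLOPE_PCT_PER_KPC

print(f"zero crossing: {zero_crossing_kpc:.2f} kpc")
for r_d in (0.5, 2.5, 7.5):
    print(f"R_d = {r_d:.1f} kpc -> trend error {trend_error_pct(r_d):+.1f}%")
```

With these coefficients the trend is about $-25\%$ at $R_d = 0.5$ kpc and about $+64\%$ at $R_d = 7.5$ kpc, and the zero crossing is $31.7 / 12.8 \approx 2.48$ kpc.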

Comparison of functional forms

Several alternative parametrisations were compared. The linear form is statistically indistinguishable from log and square-root alternatives:

Model	Pearson $r$	RMSE	Comment
$\text{err} = a + b\,R_d$ (linear)	$+0.749$	$18.4\%$	Cleanest analytical form
$\text{err} = a + b\,\log_{10}R_d$	$+0.748$	$18.4\%$	Statistically equivalent
$\text{err} = a + b\,\sqrt{R_d}$	$+0.768$	$17.7\%$	Marginally better, no real gain
$\text{err} = a + b\,R_d + c\,R_d^2$	—	$17.8\%$	Quadratic term very small ($c \approx -1.1$)

The linear form is therefore adopted as the simplest faithful description of the data.
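
The comparison of functional forms can be sketched as a set of straight-line fits against a transformed predictor. The example below is an assumption-laden illustration, not the procedure used for the table: it runs on only the 20 extreme galaxies tabulated in Sections 3 and 4, because the full 94-galaxy sample is not listed in this note, so its coefficients and statistics will not reproduce the values above. The helper `fit_line` and the RMSE convention (dividing by $N$) are choices made for this sketch.

```python
import math

# Illustrative subset only: the 20 outliers tabulated in Sections 3 and 4
# (R_d in kpc, quoted error in %). This is NOT the full blind sample.
R_D = [7.50, 5.80, 5.50, 8.50, 3.10, 5.20, 5.50, 4.50, 7.50, 2.40,
       0.30, 0.40, 1.00, 1.30, 1.20, 1.20, 0.50, 0.75, 1.60, 0.50]
ERR = [80.0, 56.6, 52.7, 52.0, 48.0, 46.3, 46.2, 43.7, 40.8, 38.9,
       -63.0, -45.6, -44.2, -43.6, -43.4, -41.1, -38.2, -37.4, -36.3, -36.1]


def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b, pearson_r, rmse)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    r = sxy / math.sqrt(sxx * syy)
    # RMSE of the fit residuals, normalised by N (an assumption of this sketch).
    rmse = math.sqrt(sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y)) / n)
    return a, b, r, rmse


# Each functional form is a linear fit against a transformed R_d.
FORMS = {
    "linear": lambda r: r,
    "log10": math.log10,
    "sqrt": math.sqrt,
}

results = {name: fit_line([f(r) for r in R_D], ERR) for name, f in FORMS.items()}
for name, (a, b, r, rmse) in results.items():
    print(f"{name:>6}: a = {a:+.1f}, b = {b:+.1f}, r = {r:+.3f}, RMSE = {rmse:.1f}%")
```

Because the subset contains only the tails of the distribution, it exaggerates the contrast between the two ends of the line; the point of the sketch is the procedure, not the numbers.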

Hubble type distribution along the line

Hubble class	$N$	Median $R_d$ (kpc)	Median error	Position
S0–Sa (early-type)	4	2.9	$+0.0\%$	Centre, near the zero crossing
Sb–Sbc (intermediate)	23	3.2	$+3.9\%$	Right of centre; tail in the over-predicted region
Sc–Scd (late spiral)	27	2.5	$+7.7\%$	Spread across the diagram
Sd–Im (dwarf / irregular)	40	1.6	$-3.2\%$	Left side; tail in the under-predicted region

The colour pattern in the figure is not a signature independent of the linear trend; it is the same signature seen along the morphology axis. In disk galaxies the Hubble sequence correlates with disk size: late-type dwarfs are predominantly compact, intermediate spirals are predominantly large. Each colour therefore sits along a different stretch of the regression line, with Sd–Im on the left, Sc–Scd at the centre, and Sb–Sbc on the right.

A structural residual, not random noise

A residual that depends linearly on a single physical parameter and crosses zero at the calibration point is the signature of a missing additive constant in one of the model’s relations, not of random observational scatter. The deviation is correctable: it can be absorbed by a single additional degree of freedom in the coherence-length law.
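
To illustrate what absorbing the trend would leave behind, the sketch below subtracts the fitted line from the quoted errors of a few galaxies taken from the outlier tables in Sections 3 and 4. This empirical detrending is only a stand-in assumed for illustration; it is not the model-level correction, which would act on the coherence-length law itself.

```python
def trend_error_pct(r_d_kpc):
    """Linear trend quoted in this note: error (%) = -31.7 + 12.8 * R_d [kpc]."""
    return -31.7 + 12.8 * r_d_kpc


# (name, R_d in kpc, quoted error in %) copied from the outlier tables.
GALAXIES = [
    ("UGC00128", 7.50, 80.0),
    ("UGC02885", 8.50, 52.0),
    ("NGC6503", 2.40, 38.9),
    ("NGC6789", 0.30, -63.0),
    ("NGC4183", 1.60, -36.3),
]

# Error remaining once the R_d-dependent trend is removed.
detrended = {name: err - trend_error_pct(r_d) for name, r_d, err in GALAXIES}

for name, r_d, err in GALAXIES:
    print(f"{name:<9} R_d = {r_d:4.2f} kpc  raw {err:+6.1f}%  detrended {detrended[name]:+6.1f}%")
```

For UGC00128 the trend accounts for about 64 of the 80 percentage points. A galaxy near the zero crossing, such as NGC6503, is essentially unchanged, since the trend is close to zero there.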

3. The ten most over-predicted galaxies

These are the galaxies for which BeeTheory predicts a flat rotation velocity higher than observed. Sorted by the size of the residual:

Galaxy	Hubble type	$R_d$ (kpc)	$M_\star/10^{10}\,M_\odot$	$f_\text{gas}$	$\Sigma_d$	$V_f$ obs. (km/s)	$V_\text{tot}$ pred. (km/s)	Error
UGC00128	Sd-Im	7.50	1.06	0.39	60	135	243	+80.0%
NGC0801	Sb-Sbc	5.80	2.01	0.32	190	208	326	+56.6%
NGC2955	Sb-Sbc	5.50	3.99	0.23	420	266	406	+52.7%
UGC02885	Sc-Scd	8.50	3.40	0.41	150	290	441	+52.0%
NGC0925	Sc-Scd	3.10	0.22	0.75	72	105	155	+48.0%
NGC6195	Sb-Sbc	5.20	3.40	0.26	400	260	380	+46.3%
NGC6674	Sb-Sbc	5.50	3.33	0.29	350	260	380	+46.2%
NGC5033	Sb-Sbc	4.50	1.27	0.46	200	195	280	+43.7%
UGC02487	S0-Sa	7.50	5.30	0.23	300	330	465	+40.8%
NGC6503	Sc-Scd	2.40	0.38	0.55	210	121	168	+38.9%
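
The Error column is consistent with the relative difference between $V_\text{tot}$ and $V_f$. The sketch below recomputes it for a few rows, assuming $\text{error} = 100\,(V_\text{tot} - V_f)/V_f$ with $V_\text{tot}$ as the predicted and $V_f$ as the observed flat velocity. That definition is inferred from the tabulated values rather than stated explicitly in this note, and the function name is illustrative.

```python
def relative_error_pct(v_pred_kms, v_obs_kms):
    """Signed relative error (%) of a predicted velocity against the observed one."""
    return 100.0 * (v_pred_kms - v_obs_kms) / v_obs_kms


# (name, V_f, V_tot, quoted error in %) copied from the table above.
ROWS = [
    ("UGC00128", 135, 243, 80.0),
    ("NGC0801", 208, 326, 56.6),
    ("UGC02885", 290, 441, 52.0),
    ("NGC6503", 121, 168, 38.9),
]

recomputed = {name: relative_error_pct(v_tot, v_f) for name, v_f, v_tot, _ in ROWS}

for name, v_f, v_tot, quoted in ROWS:
    print(f"{name:<9} recomputed {recomputed[name]:+.1f}%  quoted {quoted:+.1f}%")
```

The recomputed values agree with the quoted ones to within a few tenths of a percentage point, which is the level expected if the tabulated velocities were rounded to the nearest km/s.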

Property	Median value	Range	Comparison
$R_d$	4.5 kpc	2.4 – 8.5	$2\times$ larger than median
$M_\star$	$1.3 \times 10^{10}\,M_\odot$	$2.2 \times 10^{9}$ – $5.3 \times 10^{10}$	$8\times$ more massive
$f_\text{gas}$	$0.41$	$0.23$ – $0.87$	Below median (0.64)
Hubble $T$	$5$ (Sbc)	$1$ – $8$	Concentrated in intermediate spirals
$V_f$	$195$ km/s	$69$ – $330$	Fastest rotators in the sample

Profile of the over-predicted group

Large, massive, intermediate-type spirals. These galaxies sit on the right side of the regression line, well above the zero crossing. The model’s coherence-length law $\ell = c_\text{disk}\,R_d$ produces values of $\ell$ above 20 kpc in this regime, generating more wave-field mass than the observed rotation requires.

4. The ten most under-predicted galaxies

These are the galaxies for which BeeTheory predicts a flat rotation velocity lower than observed. Sorted by the size of the residual:

Galaxy	Hubble type	$R_d$ (kpc)	$M_\star/10^{10}\,M_\odot$	$f_\text{gas}$	$\Sigma_d$	$V_f$ obs. (km/s)	$V_\text{tot}$ pred. (km/s)	Error
NGC6789	Sd-Im	0.30	0.01	0.53	250	60	22	-63.0%
UGC05764	Sd-Im	0.40	0.00	0.86	80	57	31	-45.6%
UGCA442	Sd-Im	1.00	0.00	0.85	15	57	32	-44.2%
NGC4138	S0-Sa	1.30	0.13	0.33	250	150	85	-43.6%
NGC4389	Sb-Sbc	1.20	0.07	0.37	150	110	62	-43.4%
NGC4085	Sb-Sbc	1.20	0.09	0.42	200	135	79	-41.1%
NGC2915	Sd-Im	0.50	0.01	0.84	160	85	53	-38.2%
NGC2976	Sb-Sbc	0.75	0.04	0.29	220	80	50	-37.4%
NGC4183	Sc-Scd	1.60	0.03	0.81	40	110	70	-36.3%
UGCA281	Sd-Im	0.50	0.01	0.63	80	40	26	-36.1%

Property	Median value	Range	Comparison
$R_d$	1.1 kpc	0.30 – 1.80	$2\times$ smaller than median
$M_\star$	$2.7 \times 10^{8}\,M_\odot$	$4 \times 10^{7}$ – $1.3 \times 10^{9}$	$6\times$ less massive
$f_\text{gas}$	$0.58$	$0.29$ – $0.86$	Below median (0.64)
Hubble $T$	$8$ (Sd)	$1$ – $10$	Concentrated in late-type dwarfs
$V_f$	$82$ km/s	$40$ – $150$	Slow rotators

Profile of the under-predicted group

Compact, low-mass dwarfs and small spirals. These galaxies sit on the left side of the regression line, well below the zero crossing. The coherence-length law $\ell = c_\text{disk}\,R_d$ produces $\ell$ of order $1$–$3$ kpc in this regime, possibly too short to gather the full extent of the wave field.

5. Side-by-side comparison of the three groups

Property (median)	Over-predicted (err > +30%, $N = 15$)	Well-predicted (|err| ≤ 30%, $N = 67$)	Under-predicted (err < -30%, $N = 12$)
$R_d$ (kpc)	4.5	2.4	1.1
$M_\star / 10^{10}$	1.27	0.15	0.027
$M_\text{gas} / 10^{10}$	0.93	0.27	0.04
$f_\text{gas}$	0.41	0.64	0.58
$\Sigma_d$	200	140	115
Hubble $T$	5 (Sbc)	6 (Sc)	8 (Sd)
$V_f$ (km/s)	195	113	82

Almost every property varies monotonically from left to right; the exception is the gas fraction, whose median is highest in the well-predicted group. The over-predicted group is larger, more massive, more star-dominated and faster-rotating; the under-predicted group is smaller, lighter, slower and more gas-rich than the over-predicted group; the well-predicted majority sits in between. The Milky Way ($R_d = 2.6$ kpc, $V_f \approx 230$ km/s) falls naturally within the well-predicted regime where the calibration was anchored.
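
The ordering of the group medians can be checked mechanically. The sketch below copies the medians and group sizes from the comparison table and tests each row for strict monotonicity across the three groups; the dictionary keys and helper name are illustrative.

```python
# Group medians from the comparison table, in the order
# (over-predicted, well-predicted, under-predicted).
MEDIANS = {
    "R_d (kpc)": (4.5, 2.4, 1.1),
    "M_star / 1e10": (1.27, 0.15, 0.027),
    "M_gas / 1e10": (0.93, 0.27, 0.04),
    "f_gas": (0.41, 0.64, 0.58),
    "Sigma_d": (200, 140, 115),
    "Hubble T": (5, 6, 8),
    "V_f (km/s)": (195, 113, 82),
}

GROUP_SIZES = {"over": 15, "well": 67, "under": 12}


def is_monotonic(values):
    """True if the sequence is strictly increasing or strictly decreasing."""
    pairs = list(zip(values, values[1:]))
    return all(a < b for a, b in pairs) or all(a > b for a, b in pairs)


non_monotonic = [name for name, vals in MEDIANS.items() if not is_monotonic(vals)]
total = sum(GROUP_SIZES.values())
well_fraction = GROUP_SIZES["well"] / total

print("rows that are not monotonic:", non_monotonic)
print(f"well-predicted fraction: {GROUP_SIZES['well']}/{total} = {well_fraction:.1%}")
```

In this table the gas fraction is the one row that does not order strictly across the three groups, and the three group sizes add up to the 94 blind galaxies.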

6. Interpretation

The model has a single coupling parameter $\lambda$ and three universal geometric constants $(c_\text{disk}, c_\text{sph}, c_\text{arm})$. These were determined on a galaxy of intermediate size (the Milky Way, $R_d = 2.6$ kpc) and validated on twenty-two galaxies spanning a similar size range. The blind test of Note IX shows that they generalise reasonably well, but with a residual that drifts linearly with disk size.

An affine correction is sufficient

The linearity of the residual in $R_d$, well fit by a single straight line crossing zero at $R_d = 2.48$ kpc, is the signature of a missing additive offset in the coherence-length relation. The current law $\ell = c_\text{disk}\,R_d$ makes the wave-coherence length strictly proportional to the disk scale length. Replacing it with an affine relation $\ell = c_\text{disk}(R_d - R_0)$, where $R_0$ is an offset of about $2.5$ kpc, would produce a residual that vanishes at the calibration point and grows linearly on either side, which is exactly the pattern observed.

The well-predicted majority is broadly representative

About 71% of the sample (67 of 94 galaxies) falls in the well-predicted band. These 67 galaxies span the full range of Hubble types and a factor of $\sim 100$ in stellar mass. The model’s domain of validity is not narrow: it covers most of the SPARC population, with deviations concentrated at the two extremes of disk size, exactly as a linear $R_d$-dependent residual would produce.

7. Summary

1. The prediction error of the 94-galaxy blind test follows a clean linear trend in disk scale length: $\text{error}(\%) \approx -31.7 + 12.8\,R_d$, with Pearson $r = +0.75$ and RMSE of residuals $= 18.4\%$.

2. The linear regression crosses zero at $R_d = 2.48$ kpc, essentially the disk size of the Milky Way that anchored the calibration. The two ends of the line correspond to two physically distinct outlier populations.

3. The 15 galaxies over-predicted by more than $+30\%$ are large, massive, intermediate-type spirals: median $R_d = 4.5$ kpc, $M_\star \approx 10^{10}\,M_\odot$, $V_f \approx 200$ km/s.

4. The 12 galaxies under-predicted by more than $30\%$ (error below $-30\%$) are compact, low-mass dwarfs: median $R_d = 1.1$ kpc, $M_\star \approx 3 \times 10^{8}\,M_\odot$, $V_f \approx 80$ km/s.

5. The deviation is absorbable by an affine correction to the coherence-length law, $\ell = c_\text{disk}(R_d - R_0)$, with $R_0 \approx 2.5$ kpc, which introduces a single new constant.

References. Lelli, F., McGaugh, S. S., Schombert, J. M. — SPARC: Mass Models for 175 Disk Galaxies with Spitzer Photometry and Accurate Rotation Curves, AJ 152, 157 (2016). · de Vaucouleurs, G. et al. — Third Reference Catalogue of Bright Galaxies, Springer (1991). · McGaugh, S. S. — The third law of galactic rotation, Galaxies 2, 601 (2014). · Dutertre, X. — Bee Theory™: Wave-Based Modeling of Gravity, v2, BeeTheory.com (2023).
