diff --git a/inst/validation/method_refactor_report.md b/inst/validation/method_refactor_report.md new file mode 100644 index 0000000..21441a4 --- /dev/null +++ b/inst/validation/method_refactor_report.md @@ -0,0 +1,81 @@ +# Joint G-Computation for ARTnet-Based MSM Network Parameterization: A Methodological Refactor + +_ARTnet 2.9.0_ + +## Introduction + +The ARTnet study (Weiss et al., *Epidemics* 2020) is an anonymous cross-sectional web-based survey of men who have sex with men (MSM) in the United States, conducted in 2017–2018, that has become a canonical input for parameterizing temporal exponential random graph models (TERGMs) of sexual partnership networks in the EpiModelHIV ecosystem. The `ARTnet` R package transforms the survey microdata into three objects consumed downstream by `EpiModelHIV-p`: epidemic parameters (`build_epistats`), per-respondent network statistics (`build_netparams`), and population-scaled ERGM target statistics (`build_netstats`). The current pipeline has carried the same analytic conventions for several years, with refinements layered into geographic stratification, age-range flexibility, and race-stratification options, but the underlying estimation strategy for each ERGM target statistic — a separate univariate Poisson, binomial, or linear regression of that statistic on a single attribute at a time — remained unchanged. + +The motivation for the present refactor arose from a sister project, **ARTnetPredict**, which is building machine-learning-based forward projections of ARTnet target statistics to incorporate the 2022–2024 American Men's Internet Survey (AMIS) waves. Specifying that forward projection surfaced a methodological concern that applied to the ARTnet baseline as well: under the legacy "univariate marginal" approach, each per-attribute estimate carries ARTnet's *conditional joint distribution of the other attributes* baked in. When `build_netstats()` then applies these per-attribute estimates to a synthetic target population whose attribute joint distribution differs from ARTnet's — Atlanta MSM, NHBS-aligned demographics, or any 2024 projected population — three problems emerge. First, attribute interactions on degree (e.g., age × race effects on partnership formation) are structurally invisible to univariate fits, so the marginal estimate is biased in any direction in which such interactions exist. Second, the resulting target statistics are mutually inconsistent: edges computed from `md.main * num / 2` need not equal edges computed as `Σ_r table(race) * nf.race[r] / 2`, and the long-standing `edges.avg` argument in `build_netstats()` was a tell that the math did not close. Third, the target population sampled in `build_netstats()` was a patchwork of reference sources (NCHS general-population age pyramid, `ARTnetData::race.dist`, ARTnet's own degree and role distributions), with no coherent specification of who exactly the population was. + +Before applying any ML-based 2017→2024 projection, the team wanted the within-ARTnet baseline analytics to be defensible on their own terms. The refactor implements joint Poisson and binomial generalized linear models that fit all attributes simultaneously, applies g-computation to aggregate per-respondent or per-dyad predictions against an explicitly-specified synthetic target population, and exposes a unified post-stratification API for that target. The methodological move is from estimating per-attribute marginal effects (which transport poorly across populations) to estimating attribute-conditional effects and predicting at the target's joint distribution — direct standardization in the sense of Hernán & Robins (*Causal Inference: What If*, ch. 13). The work spans nine merged pull requests over the package's `main` branch (#66 through #77) plus three documentation issues (#59, #65, #72), with strict backward compatibility verified throughout via a snapshot-based regression harness. + +## Methods + +The refactor introduces three new arguments across the ARTnet API: `method = c("existing", "joint")` on both `build_netparams()` and `build_netstats()`; `duration.method = c("empirical", "joint_lm")` on `build_netparams()`; and `target_pop` (NULL, list, data.frame, or character) on `build_netstats()`. All defaults preserve byte-identical legacy behavior. + +**Joint estimation in `build_netparams`.** Under `method = "joint"`, seven model objects are fit per network layer (main, casual, one-time partnerships): + +- A **joint Poisson GLM** of partnership count on `age.grp + sqrt(age.grp) + race + cross-layer-degree + hiv2`, with AIC-based selection over `age.grp:race` and `age.grp:cross-layer-degree` interactions. This replaces the five univariate Poisson fits the legacy method ran for `md`, `nf.age.grp`, `nf.race`, `nf.deg.`, and `nf.diag.status`. +- A **joint binomial GLM** of the concurrency indicator (`deg > 1`) on the same RHS, for the main and casual layers. The binomial form was chosen over a Poisson-derived `P(deg > 1) = 1 - exp(-λ)(1+λ)` because `deg.main` and `deg.casl` are truncated in the training data at 2 and 3 respectively; the Poisson-implied tail probability is biased upward by orders of magnitude. +- **Joint binomial GLMs** for `nodematch` on `same.age.grp` and `same.race`, fit on long-form partnership data with ego attributes only on the right-hand side. Partnership pairs are treated as marginally generated for now (a deliberate choice — Option A in the original issue thread); the ego side comes from the corrected target population. +- **Joint Gaussian regressions** for `absdiff_age` and `absdiff_sqrt.age`, with the same ego-attribute RHS structure. +- A **joint log-linear regression** of `log(duration.time)` among ongoing partnerships under `duration.method = "joint_lm"`, with ego attributes plus `same.age.grp` and `same.race` matching terms. + +**G-computation aggregation in `build_netstats`.** Under `method = "joint"`, target statistics are computed by aggregating per-synthetic-node predictions against the synthetic population. Edges become `sum(pred_deg) / 2`, and `nodefactor_[level] = sum(pred_deg | attr == level)`. By construction, `Σ_ nodefactor_[level] = 2 × edges` to machine precision — the long-standing `edges.avg` inconsistency is resolved structurally. For dyad-level targets, per-ego predicted partnership properties are weighted by per-ego predicted degree: `nodematch_[level] = sum(pred_deg * pred_)[ego in level] / 2`. For dissolution offsets, partner-race uncertainty is marginalized via the `joint_nm_race_model` predictions before per-stratum medians are computed and run through the existing geometric transformation `mean.dur.adj = 1 / (1 - 2^(-1 / median))`. + +A discussion early in the work (issue #71) led to the deliberate choice that the duration target stat is the **mean age of extant ties at cross-section**, not mean full partnership duration. Under TERGM's geometric (constant-hazard) dissolution offset, these two quantities coincide for partnerships with exponentially-distributed durations and diverge otherwise; the simulated network can match the former exactly but not the latter. A length-bias-corrected Weibull MLE explored during development confirmed that ARTnet's per-stratum partnership-duration shape parameter is far from 1 (`k ≈ 0.6` for casual partnerships, `k > 2` for older-matched main partnerships), but the direct mean-duration estimate from a Weibull AFT fit cannot be honored by the geometric simulation. The empirical and joint_lm methods both target the cross-sectional age statistic that *can* be honored. + +**Post-stratification API.** The `target_pop` argument unifies what was previously several arguments. As a named list, it overrides any subset of `{age.pyramid, race.prop, deg.casl, deg.main, deg.tot, role.class, risk.grp}` against the legacy default sources. As a data.frame, it supplies the synthetic population in full — required columns `age`, `deg.casl`, `deg.main`, `role.class`, `risk.grp` (plus `race` when race-stratified) — bypassing attribute sampling entirely. A character form is reserved for a future bundle of geography-specific defaults (e.g., `"atlanta"`, `"us_msm_male"`) constructed from data already in `ARTnetData::race.dist` plus the NCHS age pyramid. + +**Validation and reproducibility infrastructure.** Three artifacts under `inst/validation/` support reproducibility. A snapshot harness (`validate_backward_compat.R`) captures golden-reference output of the build_* pipeline on three parameter sets (`atlanta_default`, `national_no_geog`, `atlanta_no_race`) and compares against any subsequent code state at machine precision; defaults across all nine refactor PRs match 3/3 throughout. A pinned reference copy of the downstream `EpiModelHIV-Template/R/A-networks/` scripts documents the public contract that any change must preserve. A method-comparison harness (`method_comparison.R`) runs both methods across four scenarios, walks every numeric target stat, and produces a side-by-side comparison table at `inst/validation/method_comparison.md`. A GitHub Actions workflow runs `R CMD check` on every push and pull request, with `ARTnetData` installed via a fine-grained PAT secret so all package functionality (including the joint fits) is exercised in CI rather than skipped. The full testthat suite contains 571 assertions covering the joint formation fits, joint dyad fits, joint duration fits, target_pop API, parameterization edge cases, and method-comparison structure. + +## Results + +We compared `method = "existing"` against `method = "joint"` + `duration.method = "joint_lm"` across four scenarios: the EpiModelHIV-Template baseline (`atlanta_default`, race-stratified, Atlanta city), no-geography (`national_no_geog`), an NHBS-MSM-shifted Atlanta (`atlanta_nhbs_shifted` with `race.prop = c(0.35, 0.25, 0.40)`), and an Atlanta variant with `race = FALSE`. Across **363 target-statistic cells** (every numeric entry in the netstats per-layer namespace, summed across scenarios), **229 cells (63%) shift by more than 5%** under joint-vs-existing. Per-scenario, the materially-shifted counts are 63/96 (default), 66/96 (NHBS-shifted), 51/96 (national), and 49/75 (no race). The fraction is stable across scenarios with the NHBS-shifted population producing slightly more material shifts than Atlanta default (66 vs 63), consistent with the marginal-vs-joint hypothesis: the larger the gap between target-population and ARTnet-sample composition, the more the correction matters. + +The largest shifts cluster on three target-stat families. **Dissolution durations on matched-and-old strata** drop substantially: `main$diss.byage$duration[6]` (matched within the oldest age group) moves from 934.4 to 491.5 weeks (−47%) on default Atlanta, with the same direction across all three race-stratified scenarios. The empirical method takes a stratum-level median over a small per-cell sample with a few long ongoing partnerships; the joint log-linear smooths across the full sample via the multivariate fit. The legacy `smooth.main.dur` heuristic (averaging matched.5 with matched.4) brings the empirical from 1186 to 934, but joint_lm's matched.5 already lives at 476 before smoothing — the residual ≈2× gap is the difference between two-neighbor averaging and full multivariate borrowing of strength. **One-time partnership nodematch in older age groups** shifts in proportion: `inst$nodematch_age.grp[5]` moves from 7.9 to 3.9 (−51%) on default Atlanta. The dyad-level joint binomial picked up an `age.grp:race.cat.num` interaction that was structurally invisible to the marginal fit, and inst's older age-stratified cells are sparse enough that the marginal estimate was dominated by noise the multivariate fit shrinks toward the population mean. **High-degree casual nodefactor** shifts the other direction: `casl$nodefactor_deg.main[3]` rises from 11.7 to 16.4 (+40%). The joint Poisson on casual degree finds a weaker negative `deg.main` effect once age and HIV status are controlled for, raising the predicted casual edge contribution from respondents reporting two main partners. + +A decomposition of the −15% main-edges shift on Atlanta default illustrates the correction concretely. ARTnet's sample is 80.7% White/Other, 13.8% Hispanic, and 5.5% Black; Atlanta's MSM population (per `ARTnetData::race.dist`) is 51.5% Black, 4.6% Hispanic, and 43.9% White/Other. Per-race average predicted main degree from the joint Poisson is 0.272 (Black), 0.408 (Hispanic), and 0.408 (White/Other). ARTnet's marginal mean main degree is 0.398, applied as `0.398 × 5000 / 2 = 995` edges under the legacy method. The joint method aggregates per-race predictions across Atlanta's actual race composition: `0.515 × 0.272 + 0.046 × 0.408 + 0.439 × 0.408 = 0.338`, giving 845 edges — exactly the −15% shift. The legacy pipeline's edges target implicitly assumed Atlanta MSM had ARTnet's sample race mix. + +Coefficient comparisons on the joint Poisson for main degree show several attribute effects strengthen substantially after adjustment. The `deg.casl` slope moves from −0.24 (marginal) to −0.55 (joint) — the apparent within-respondent substitution between casual and main partnerships is more than twice as strong once age is controlled for, consistent with the marginal estimate being attenuated by the positive correlation between `deg.casl` and youth (and youth's positive direct effect on main degree). The `hiv2` slope moves from +0.09 to +0.25 in the same direction. The age profile (`age.grp + sqrt(age.grp)`) sharpens by approximately 20% — the steeper slope post-adjustment reflects that older respondents in ARTnet differ from younger respondents on more than age alone. AIC selects an `age.grp:deg.casl` interaction (+0.10) that is structurally invisible to the univariate approach. + +End-to-end ERGM estimation under both methods (Atlanta defaults, network.size = 5000, Stochastic-Approximation) converges cleanly across all six target ERGMs (3 layers × 2 methods). Static `netdx` diagnostics on the default-method main model match all formation target stats within `|Z| ≤ 2.05` and `|% diff| ≤ 4.2%` over 1000 simulations. The joint method's `coef.form[edges]` is consistent with its lower edge target, and no degeneracy or sampling pathology appears in either path. + +The complete per-scenario, per-target-stat comparison is committed at `inst/validation/method_comparison.md`. Backward-compat snapshot harness matches 3/3 across all parameter sets under default arguments, explicit `method = "existing"`, and the full legacy combination of `method = "existing"` + `duration.method = "empirical"` + `target_pop = NULL`. The full testthat suite (571 assertions) passes, and `R CMD check` is clean (0 errors, 0 warnings, 0 notes). + +## Discussion + +The 63% material-shift rate is large but methodologically expected: the legacy univariate approach computes target stats one attribute at a time from a sample whose joint attribute distribution does not match the synthetic populations the package targets. The size and pattern of the shifts confirm that this confounding is not a small effect, particularly for derived quantities (nodefactor counts in sparse strata, dissolution durations in small per-stratum cells) where the joint method's effective sample size advantage is largest. Shift directions are stable across scenarios — the joint method is not introducing noise but applying a systematic correction. + +For ongoing EpiModelHIV-p simulations, the immediate implication is that any model parameterized via the legacy ARTnet pipeline carries baseline target statistics that approximately reflect ARTnet's race and (to a lesser extent) age composition, not the modeled target population's composition. For Atlanta-specific models — the dominant use case — the legacy method over-targets main edges by approximately 15%, under-targets casual edges by approximately 8%, and shifts the `nodematch` and dissolution stats heterogeneously across age strata. Whether this propagates to materially different epidemic dynamics depends on the simulation question, but the corrections move in directions one would expect to affect cumulative HIV/STI incidence over a multi-year horizon. Re-parameterizing existing EpiModelHIV-p workflows with `method = "joint"` is now possible without code changes downstream — the netstats object retains its full legacy field set; only the *values* shift. + +The refactor leaves several methodological openings explicit. First, TERGM dissolution remains constant-hazard (geometric); the per-stratum Weibull shape parameters fit during development indicated that the geometric assumption holds for casual partnerships (`k ≈ 0.6` across strata, decreasing hazard) but fails for older-matched main partnerships (`k > 2`, increasing hazard). A duration-dependent dissolution offset would let the simulation honor those shape differences but is outside ARTnet's scope and would require upstream `tergm` extension. Second, ARTnet's cross-sectional sampling design exposes formation-stat estimates to length-biased sampling on partnership-pair targets (nodematch, absdiff) and to partnership-count truncation at 2 main / 3 casual partners on degree-based targets. The first concern is partially addressable by restricting partnership-pair fits to ongoing partnerships only — a small targeted fix tracked on issue #72. The second concern requires a truncated-Poisson likelihood and is harder. Neither bias is corrected by the joint g-computation in the present refactor. Third, the joint duration model uses ongoing partnerships only, consistent with the existing convention but inheriting its length-bias in observed elapsed durations. The estimand here is mean age of extant ties at cross-section — a quantity the geometric-tergm simulation can target exactly — rather than mean full partnership duration. + +For the **ARTnetPredict** project, the refactor unblocks three concrete next steps. The 2017–2018 ARTnet baseline now has a defensible joint-corrected version that any forward projection should rest on rather than the marginal-method baseline. The 2022–2024 AMIS-based projection re-runs naturally via `target_pop = ` constructed from the AMIS demographic posterior, with no dependency on package-shipped reference data. Post-stratification to NHBS-MSM demographics or any other published distribution becomes a one-line argument rather than a manual reconstruction of the synth population. Methodological writeup material — the cross-method comparison tables, the convergence diagnostics, the bias-decomposition examples — is now consolidated as ARTnet-internal artifacts rather than scattered analyst notebooks. + +A natural next step is a methods paper formalizing the joint-vs-marginal comparison with simulation-based validation: fit synthetic data with known joint-attribute structure, show recovery under both methods, and quantify the bias as a function of target-vs-sample population divergence. Coordination with Steve Goodreau (initiated on issue #72) would sharpen the survival-vs-network-dynamics framing, particularly around the inspection-paradox argument that mean age of extant ties — not mean full duration — is the right inferential target given geometric simulation. The same paper would frame the formation-stat sampling-bias work in #72 as the natural extension into territory where ARTnet's sampling design and the TERGM target structure interact directly. Re-analysis of one or two published HIV-simulation findings with the corrected baseline would give concrete substantive stakes to the methodological argument. + +The package is feature-complete on the joint-corrected analytic pipeline that motivated this work. Defaults remain unchanged; downstream users opt in by passing `method = "joint"` and (where relevant) `duration.method = "joint_lm"` and `target_pop = ...`. The legacy contract held throughout the refactor and is now backed by a snapshot-based regression harness that any future modification will continue to verify. + +## References + +- Hernán MA, Robins JM. *Causal Inference: What If*. Chapman & Hall/CRC, 2020. (Chapter 13: g-computation / direct standardization.) +- Weiss KM, Goodreau SM, Morris M, Prasad P, Ramaraju R, Sanchez T, Jenness SM. Egocentric Sexual Networks of Men Who Have Sex with Men in the United States: Results from the ARTnet Study. *Epidemics* 2020; 30: 100386. +- Krivitsky PN, Morris M. Inference for social network models from egocentrically sampled data, with application to understanding persistent racial disparities in HIV prevalence in the US. *Annals of Applied Statistics* 2017; 11(1): 427–455. + +## Reproducibility + +```r +# Backward-compat regression +source(system.file("validation/validate_backward_compat.R", package = "ARTnet")) +compare_to_snapshot(method = "existing") # ALL MATCH 3/3 + +# Method comparison across scenarios +source(system.file("validation/method_comparison.R", package = "ARTnet")) +res <- compare_methods() +summarize_comparison(res) +render_comparison_report(res) # writes inst/validation/method_comparison.md +``` + +The snapshot files used by the regression harness are gitignored at ~12 MB each; capture them once via `capture_snapshot()` on a clean main checkout.