target_pop API: list / data.frame / character forms (#64, Phase 1.4) #77

Merged
smjenness merged 2 commits into main from feature/target-pop-api
Apr 25, 2026

smjenness commented Apr 25, 2026

Closes #64. Adds a single target_pop argument to build_netstats() that unifies post-stratification of the synthetic target population.

API

target_pop accepts four forms:

| Form | What it does |
| --- | --- |
| NULL (default) | Legacy patchwork-of-references behavior, byte-identical to pre-#64 |
| Named list | Per-marginal overrides for {age.pyramid, race.prop, deg.casl, deg.main, deg.tot, role.class, risk.grp}. Missing names fall through to defaults |
| data.frame | One row per node. Required: age, deg.casl, deg.main, role.class, risk.grp (+ race when race = TRUE). Optional with derivation: sqrt.age, age.grp, active.sex, deg.tot, diag.status. network.size is overridden to nrow(target_pop) |
| Character | Reserved for built-in references (e.g., "atlanta", "us_msm_male") bundled from NCHS age pyramid + ARTnetData::race.dist by geography. Raises a clear not-yet-implemented error for now |
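As a rough illustration of the list and data.frame forms in the table above, the sketch below builds both inputs in base R. The value codings (three race levels, deg.main in 0:2, deg.casl in 0:3, role.class in 0:2, risk.grp in 1:5), the length and order of race.prop, and the proportions themselves are assumptions made for illustration, not values taken from this PR. The build_netstats() calls are left commented out because they need ARTnet plus fitted epistats and netparams objects.

```r
set.seed(1)
n <- 500

# List form: override only the race marginal.
# Proportions are illustrative and assumed to be ordered by race level.
tp_list <- list(race.prop = c(0.50, 0.10, 0.40))

# data.frame form: one row per node with the required columns.
# Attributes are drawn independently here only to keep the sketch short;
# a real target cohort would carry whatever joint structure the user wants.
tp_df <- data.frame(
  age        = runif(n, min = 15, max = 65),
  race       = sample(1:3, n, replace = TRUE, prob = tp_list$race.prop),
  deg.main   = sample(0:2, n, replace = TRUE),
  deg.casl   = sample(0:3, n, replace = TRUE),
  role.class = sample(0:2, n, replace = TRUE),
  risk.grp   = sample(1:5, n, replace = TRUE)
)

# Columns the PR lists as required when race = TRUE
required <- c("age", "deg.casl", "deg.main", "role.class", "risk.grp", "race")
stopifnot(all(required %in% names(tp_df)))

# Hypothetical calls (epistats and netparams are assumed to exist already):
# netstats <- build_netstats(epistats, netparams, target_pop = tp_list)
# netstats <- build_netstats(epistats, netparams, target_pop = tp_df)
```

With the data.frame form, the PR states that network.size is overridden to nrow(target_pop), so the cohort above would imply a 500-node target network.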

Why each form

  • List form unifies what was previously several args (age.pyramid, race.prop) and extends the override surface to per-attribute distributions previously sourced silently from netparams. One place to set every marginal you might want to override.
  • data.frame form is the meaningful new capability for the ARTnetPredict 2024-projection workflow: post-stratify ARTnet targets to AMIS demographics or any user-specified joint distribution by handing in a fully-specified synthetic cohort. Because the cohort is supplied row by row, attribute sampling is bypassed entirely and no independence assumption between marginals is needed.
  • Character form is a planned hook for built-in reference populations. Stubbed out with an informative error so the API surface is locked in even though the data isn't shipped yet.
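To make the "optional with derivation" idea for the data.frame form concrete, here is a minimal sketch of how two of the optional columns might be filled in when absent. The helper name derive_optional_sketch and the cap of 3 on deg.tot are assumptions for illustration; the PR only says that deg.tot is derived with a cap and does not state the value.

```r
# Hypothetical helper: fill in sqrt.age and deg.tot only when the user
# did not supply them. The cap value is an assumed default.
derive_optional_sketch <- function(df, deg.tot.cap = 3) {
  if (!"sqrt.age" %in% names(df)) {
    df$sqrt.age <- sqrt(df$age)
  }
  if (!"deg.tot" %in% names(df)) {
    # Total degree is main + casual, truncated at the cap
    df$deg.tot <- pmin(df$deg.main + df$deg.casl, deg.tot.cap)
  }
  df
}

cohort <- data.frame(
  age      = c(16, 25, 49),
  deg.main = c(0, 1, 2),
  deg.casl = c(0, 3, 2)
)
derived <- derive_optional_sketch(cohort)
derived$sqrt.age  # 4 5 7
derived$deg.tot   # 0 3 3
```

A user-supplied deg.tot or sqrt.age column is left untouched in this sketch, which mirrors the pass-through behavior described for diag.status.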

Implementation

  • New private helper .parse_target_pop() does form detection, column / element validation, and normalization (e.g., handles race.props alias for race.prop).
  • The "Nodal Attribute Initialization" block is restructured into an if/else: data.frame form pulls attrs directly into the same attr_* locals the sampling path produces; sampling path applies list-form distribution overrides via .dist_* locals.
  • The diag.status block honors a user-supplied diag.status column when data.frame form provides one; otherwise falls through to the existing epistats-based draw using the user's attribute vectors.
  • Common out$attr assignments factored out so both code paths share one place to populate.
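The form detection described above could look roughly like the sketch below. This is not the code of .parse_target_pop(); the function name parse_target_pop_sketch, its return shape, and all error messages except the final one (quoted in the test list of this PR) are hypothetical.

```r
parse_target_pop_sketch <- function(target_pop, race = TRUE) {
  allowed <- c("age.pyramid", "race.prop", "deg.casl", "deg.main",
               "deg.tot", "role.class", "risk.grp")
  required <- c("age", "deg.casl", "deg.main", "role.class", "risk.grp")
  if (race) required <- c(required, "race")

  if (is.null(target_pop)) {
    return(list(form = "null", value = NULL))
  }
  # Test data.frame before list: is.list() is also TRUE for data frames
  if (is.data.frame(target_pop)) {
    missing_cols <- setdiff(required, names(target_pop))
    if (length(missing_cols) > 0) {
      stop("target_pop is missing required column(s): ",
           paste(missing_cols, collapse = ", "))
    }
    return(list(form = "data.frame", value = target_pop))
  }
  if (is.list(target_pop)) {
    if (length(target_pop) == 0) {
      return(list(form = "list", value = target_pop))
    }
    nm <- names(target_pop)
    if (is.null(nm) || any(nm == "")) {
      stop("list-form target_pop must be fully named")
    }
    # Normalize the alias mentioned in the PR
    nm[nm == "race.props"] <- "race.prop"
    names(target_pop) <- nm
    unknown <- setdiff(nm, allowed)
    if (length(unknown) > 0) {
      stop("unknown target_pop element(s): ", paste(unknown, collapse = ", "))
    }
    return(list(form = "list", value = target_pop))
  }
  if (is.character(target_pop) && length(target_pop) == 1) {
    stop("character-form target_pop is not yet implemented")
  }
  stop("target_pop must be NULL, a list, a data.frame, or a character string")
}

parse_target_pop_sketch(list(race.props = c(0.5, 0.1, 0.4)))$value
```

The ordering of the checks matters: because a data.frame is also a list in R, checking is.list() first would route data.frame input down the list-form path.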

Validation

  • Backward-compat snapshot harness: 3/3 match on default and explicit method = "existing" (this is the real test that the restructuring didn't break anything).
  • Full testthat suite: 571 / 571 pass.
  • R CMD check: 0 errors / 0 warnings / 0 notes.

Tests

tests/testthat/test-target-pop.R — 12 blocks, 25 assertions. Covers:

  • NULL byte-identical to no-arg (regression test)
  • list form: race.prop override produces matching race composition; deg.casl override produces matching distribution; race.props alias normalized to race.prop; unknown list elements raise informative error
  • data.frame form: attributes pass through; deg.tot cap derivation; missing-required-column error; diag.status falls back to epistats when absent; composes with method = "joint" (internal consistency sum(nf_*) == 2 * edges still holds)
  • character form: not-yet-implemented error message
  • Bad input: non-list/df/char raises "must be NULL, a list, a data.frame, or a character string"
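The internal consistency check sum(nf_*) == 2 * edges mentioned above follows from every edge having two endpoints: summing expected degree over all levels of a nodal attribute counts each edge twice. A toy sketch of that identity, assuming nodefactor targets are computed as sums of per-node expected degree within each attribute level (the per-level degree values below are made up):

```r
set.seed(7)
n <- 1000
race <- sample(1:3, n, replace = TRUE)

# Hypothetical per-node expected degree, varying by race level
exp_deg <- c(0.30, 0.40, 0.50)[race]

# Edges target: total expected degree divided by two endpoints per edge
edges <- sum(exp_deg) / 2

# Nodefactor-style targets: expected degree summed within each level
nf_race <- as.numeric(tapply(exp_deg, race, sum))

# The identity holds up to floating-point tolerance
stopifnot(isTRUE(all.equal(sum(nf_race), 2 * edges)))
```

The same identity should hold for any attribute used to partition the nodes, which is why it works as a sanity check when the data.frame form is combined with method = "joint".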

Test plan

  • NULL byte-identical to legacy (snapshot match 3/3)
  • List form overrides each supported marginal correctly
  • data.frame form bypasses sampling, derives missing columns, composes with method = "joint"
  • Character form raises clear error
  • All 4 forms validated end-to-end manually + 25 unit assertions
  • R CMD check 0/0/0
  • Built-in geography-named reference bundles (e.g., "atlanta", "us_msm_male"): tracked as future work. Implementation is a lookup table from name to list(age.pyramid = ..., race.prop = ...) using NCHS age pyramid (already in build_netstats) + ARTnetData::race.dist (already shipped) — no new external data needed.

Closes #64. With #64 + #65 + the joint g-comp refactor (#61#74) all landed, the ARTnet → ERGM target stat pipeline is now fully joint-corrected and post-stratifiable.

smjenness and others added 2 commits April 25, 2026 14:40
Adds a single `target_pop` argument to `build_netstats()` that
accepts four forms and unifies post-stratification of the synthetic
target population:

1. **NULL (default)** -- legacy patchwork-of-references behavior, byte-
   identical to pre-#64. Verified via the inst/validation/ snapshot
   harness (3/3 match on default and explicit method = "existing").

2. **Named list** -- per-marginal overrides for any subset of
   {age.pyramid, race.prop, deg.casl, deg.main, deg.tot, role.class,
   risk.grp}. Names not in the list fall through to existing defaults.
   The list form supersedes the older one-arg-at-a-time approach
   (age.pyramid, race.prop) and extends the override surface to the
   per-attribute distributions previously sourced from netparams.

3. **data.frame** -- one row per node, columns supplying user-specified
   joint attribute values. Required: age, deg.casl, deg.main, role.class,
   risk.grp (plus race when race = TRUE). Optional with derivation:
   sqrt.age, age.grp, active.sex, deg.tot, diag.status. When supplied,
   attribute sampling is bypassed entirely and `network.size` is set to
   `nrow(target_pop)`. Designed for users with a fully-specified joint
   target population (NHBS / AMIS post-stratification, custom synthetic
   cohorts).

4. **Character** (e.g., target_pop = "nhbs_msm_2022") -- raises an
   informative not-yet-implemented error. Built-in reference data
   ships via ARTnetData and requires PI coordination; tracked as a
   future extension on this issue.

Implementation:

- New private helper `.parse_target_pop()` in R/NetStats.R does form
  detection, column / element validation, and normalization (e.g.,
  race.props -> race.prop alias).
- The Nodal Attribute Initialization block is restructured into an
  if/else: data.frame form pulls attributes directly into the same
  attr_* locals the sampling path produces; sampling path applies
  list-form distribution overrides via `.dist_*` locals.
- diag.status block honors a user-supplied diag.status column when
  data.frame form provides one; otherwise falls through to the
  epistats-based draw (init.hiv.prev or hiv.mod) on the user's
  attribute vectors.
- Common attr assignments factored out so both paths share one
  place where out$attr is populated.

New tests: tests/testthat/test-target-pop.R (12 blocks, 25 assertions)
covering: NULL byte-identical to no-arg; list form with race.prop,
race.props alias, deg.casl override; unknown list element error;
data.frame form attribute pass-through; deg.tot cap derivation;
missing-required-column error; diag.status fallback; composition with
method = "joint"; character-form error; non-list/df/char input error.

Validation:
- Backward-compat snapshot harness: 3/3 match on default and explicit
  method = "existing".
- Full testthat suite: 571 / 571 pass.
- R CMD check: 0 errors / 0 warnings / 0 notes.
- Manual exercise of all four forms (NULL, list, data.frame, character)
  produces expected behavior including correct error messages.

Closes #64.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

PR review feedback: the original character-form error message and
docstring used 'nhbs_msm_2022' as a placeholder name from issue #64's
example. NHBS microdata is restricted, not appropriate for an
ARTnetData-shipped reference.

The realistic plan is geography-specific general male population
demographics: NCHS age pyramid (already in build_netstats) + race
composition from ARTnetData::race.dist (already in the package) per
city / state / region. No restricted data needed; bundles like
"atlanta" or "us_msm_male" would just package what's already there
into named entry points.

Updates:
- Error message no longer references NHBS; describes the actual
  planned set (NCHS + ARTnetData::race.dist by geography).
- Roxygen @param doc rewritten to match.
- Test uses target_pop = "atlanta" (a realistic future bundle name)
  instead of the speculative NHBS example.

No code path change; only the user-facing strings and one test
trigger value.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
smjenness merged commit f2f5b87 into main Apr 25, 2026
1 check passed
smjenness deleted the feature/target-pop-api branch April 25, 2026 20:55
smjenness added a commit that referenced this pull request Apr 25, 2026
A 2,800-word standalone writeup at inst/validation/method_refactor_report.md
documenting the methodological refactor delivered by PRs #66-#77.
Structured as introduction / methods / results / discussion + references
+ reproducibility section.

Sections cover:

- Intro: ARTnet's role in EpiModelHIV-p; the marginal-vs-joint
  problem the legacy univariate approach exposed; the ARTnetPredict
  motivation for fixing the within-ARTnet baseline before forward
  projection.
- Methods: the three new arguments (`method`, `duration.method`,
  `target_pop`); per-layer joint Poisson + binomial + Gaussian +
  log-linear fits; g-computation aggregation in build_netstats; the
  cross-sectional age-of-extant-ties target for dissolution; the
  validation infrastructure (snapshot harness, method comparison,
  GHA CI).
- Results: 229/363 cells (63%) shift > 5% across four scenarios;
  worst shifts on dissolution durations in matched-and-old strata
  (-47%), one-time nodematch in older age groups (-51%), and
  high-deg.main casual nodefactor (+40%); decomposition of the -15%
  Atlanta main-edges shift attributed to ARTnet's 80.7% White vs
  Atlanta's 51.5% Black composition; coefficient strengthening on
  deg.casl (-0.24 -> -0.55), hiv2 (+0.09 -> +0.25), age slope, and
  the AIC-selected age:deg.casl interaction; end-to-end ERGM
  convergence with netdx |Z| <= 2.05 across 1000 sims.
- Discussion: implications for EpiModelHIV-p simulations
  (Atlanta-specific models over-target main edges by 15%);
  three explicit limitations (geometric tergm dissolution can't honor
  Weibull k != 1, length-bias and 5-truncation in formation stats not
  yet addressed in #72, joint_lm uses ongoing partnerships only);
  ARTnetPredict's three unblocked next steps (corrected 2017-18
  baseline, 2022-24 AMIS projection via target_pop data.frame, NHBS
  post-stratification as a one-line argument); methods paper outline.

Numbers cited are spot-checked against the committed
inst/validation/method_comparison.md to ensure the report and the
machine-generated comparison agree.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Development

Successfully merging this pull request may close these issues.

[Phase 1.4] Support post-stratification via user-supplied target population distribution
