Data Guide

Operational rules for turning human-written diary notes into reusable telemetry, with a final human review before publishing.

READING GUIDE

What you will find on this page

A quick overview of what the Data Guide covers before you read in detail.

Highlights from 4 key sections

Extraction principles

  • Never infer facts that are not explicitly written
  • Normalize units/time/counts while preserving original intent
  • Keep unknown values as null instead of forced guesses
  • Treat causality as a candidate hypothesis with confidence, not a certainty
  • Prioritize time anchors (e.g., 14:00 delivery) and compare neighboring events within a before/after window around each anchor
  • Validate date/day_number consistency before publishing
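As a rough illustration of the first three principles, the sketch below pulls a time anchor out of a diary fragment only when a time is explicitly written, normalizes it to zero-padded HH:MM, and otherwise leaves the value as null. The function names, the field names (source_fragment, time), and the regular expression are assumptions made for this example; they are not taken from convert_diary_to_json_telemetry_v2.py.

```python
import json
import re
from typing import Optional

# Assumed pattern for this sketch: 24-hour times such as "14:00" or "9:05".
TIME_RE = re.compile(r"\b([01]?\d|2[0-3]):([0-5]\d)\b")


def extract_time_anchor(fragment: str) -> Optional[str]:
    """Return the first explicitly written HH:MM (zero-padded), or None."""
    match = TIME_RE.search(fragment)
    if match is None:
        return None  # nothing written, so nothing is inferred
    hour, minute = match.groups()
    return f"{int(hour):02d}:{minute}"


def to_record(fragment: str) -> dict:
    """Keep the original wording next to the normalized value."""
    return {
        "source_fragment": fragment,             # original intent preserved
        "time": extract_time_anchor(fragment),   # None serializes to JSON null
    }


print(json.dumps(to_record("9:05 walk")))
# {"source_fragment": "9:05 walk", "time": "09:05"}
print(json.dumps(to_record("barked in the afternoon")))
# {"source_fragment": "barked in the afternoon", "time": null}
```

"In the afternoon" is deliberately not mapped to a clock time: guessing 15:00 would be an inferred fact, so the field stays null for the reviewer to fill in or leave empty.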

Human-in-the-loop pipeline

The source is a human-written diary. Automation handles the repeatable middle steps, and a human reviews the final draft by hand before publishing.

  1. Capture the daily diary notes
  2. Generate diary/article markdown
  3. Convert to telemetry JSON (convert_diary_to_json_telemetry_v2.py)
  4. Extract semantic events (behavior, environment, intervention) as time anchors
  5. Compare nearby events in before/after windows (e.g., -30 min to +60 min) to find likely cause-and-effect patterns
  6. Save the primary JSON record (posts/telemetry/ja/telemetry_XXXX.json)
  7. Run daily delta sync to SQLite via WP-Cron (1:30 AM)
  8. Manually review and refine before publishing
  9. Reuse data in dashboards/FAQ/fixed pages
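Step 5 can be sketched as a simple window filter. This is a minimal example, not the pipeline's actual code: the event dictionaries, their keys (label, kind, at), the helper names, and the placeholder date are all assumptions. The delivery, barking, and calm events come from this guide's own example; the evening walk is invented only to show an event falling outside the window.

```python
from datetime import datetime, timedelta


def events_in_window(anchor, events, before_min=30, after_min=60):
    """Return (label, offset_minutes) for events inside the window around anchor."""
    start = anchor["at"] - timedelta(minutes=before_min)
    end = anchor["at"] + timedelta(minutes=after_min)
    nearby = []
    for event in events:
        if event is anchor:
            continue  # the anchor is not its own neighbor
        if start <= event["at"] <= end:
            offset = int((event["at"] - anchor["at"]).total_seconds() // 60)
            nearby.append((event["label"], offset))
    return sorted(nearby, key=lambda pair: pair[1])


def at(hhmm, day="2000-01-01"):
    """Parse HH:MM on a placeholder day (the date is arbitrary, for illustration)."""
    return datetime.strptime(f"{day} {hhmm}", "%Y-%m-%d %H:%M")


delivery = {"label": "delivery", "kind": "environment", "at": at("14:00")}
events = [
    delivery,
    {"label": "barking", "kind": "behavior", "at": at("14:03")},
    {"label": "calm", "kind": "behavior", "at": at("14:10")},
    # Invented event, present only to show the +60 min cutoff.
    {"label": "evening walk", "kind": "intervention", "at": at("18:00")},
]

print(events_in_window(delivery, events))
# [('barking', 3), ('calm', 10)]
```

The signed offset matters: a negative value means the neighbor happened before the anchor and therefore cannot be a reaction to it, which helps the reviewer separate possible causes from possible effects.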

Publish quality gates

Automation validates drafts first; a human reviewer then checks the results and fixes anything uncertain before publishing.

  • title/body length/heading count
  • TODO/TBD placeholder leakage
  • link format sanity
  • frontmatter.date vs telemetry.date consistency
  • time-anchor extraction coverage (timed mentions in the text that were not extracted)
  • evidence fragment linkage for each likely cause-and-effect pattern

Reports are stored in logs/publish_quality_gate.jsonl.
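A few of these gates could look roughly like the sketch below. It covers only placeholder leakage, date consistency, and time-anchor coverage. The function names, the report fields, the failure codes, and the telemetry "anchors" list are assumptions for illustration; only the log path comes from this guide.

```python
import json
import re
from datetime import datetime, timezone

PLACEHOLDER_RE = re.compile(r"\b(?:TODO|TBD)\b")
TIME_RE = re.compile(r"\b(?:[01]?\d|2[0-3]):[0-5]\d\b")


def run_gates(frontmatter: dict, body: str, telemetry: dict) -> dict:
    """Run three illustrative checks and return one report record."""
    failures = []

    if PLACEHOLDER_RE.search(body):
        failures.append("placeholder_leakage")

    # A missing date counts as a failure rather than a silent match.
    fm_date = frontmatter.get("date")
    if fm_date is None or fm_date != telemetry.get("date"):
        failures.append("date_mismatch")

    # Every HH:MM written in the body should appear among extracted anchors.
    mentioned = {t.zfill(5) for t in TIME_RE.findall(body)}
    extracted = {
        a["time"].zfill(5) for a in telemetry.get("anchors", []) if a.get("time")
    }
    missing = sorted(mentioned - extracted)
    if missing:
        failures.append("unextracted_times:" + ",".join(missing))

    return {
        "checked_at": datetime.now(timezone.utc).isoformat(),
        "passed": not failures,
        "failures": failures,
    }


def append_report(report: dict, path: str = "logs/publish_quality_gate.jsonl") -> None:
    """Append one JSON object per line; the logs/ directory must already exist."""
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(report, ensure_ascii=False) + "\n")
```

A failed report does not block anything by itself in this sketch; it gives the human reviewer a short list of items to look at first.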

Time-Granular Causal Inference (Beta)

Use semantic matching to turn daily context and reactions into reviewable cause-and-effect patterns.

Example: “Activity increased at 14:00” alone does not explain why. We link same-day anchors such as “14:00 delivery”, “14:03 barking”, and “14:10 calm” to model a likely chain: delivery stimulus -> arousal response -> recovery.

  • Output 1: likely trigger/reaction/recovery pattern
  • Output 2: evidence (source fragments and time deltas)
  • Output 3: confidence (high / medium / low)
  • Output 4: next capture instruction (what to measure next)
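To make the four outputs concrete, here is a minimal sketch that assembles them from the three anchors in the example above. The Anchor class, the field names, and especially the confidence thresholds (reaction within 5 minutes is "high", within 30 minutes is "medium") are hypothetical choices for this example, not the rules the beta feature actually uses.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Anchor:
    time: str      # "HH:MM" as extracted from the diary
    fragment: str  # source text kept as evidence


def minutes(hhmm: str) -> int:
    """Convert HH:MM to minutes since midnight."""
    hour, minute = hhmm.split(":")
    return int(hour) * 60 + int(minute)


def build_pattern(trigger: Anchor, reaction: Anchor, recovery: Anchor,
                  next_capture: Optional[str] = None) -> dict:
    """Combine three anchors into one reviewable candidate pattern."""
    base = minutes(trigger.time)
    latency = minutes(reaction.time) - base

    # Hypothetical rule: a shorter trigger-to-reaction gap gives higher confidence.
    if 0 <= latency <= 5:
        confidence = "high"
    elif 0 <= latency <= 30:
        confidence = "medium"
    else:
        confidence = "low"

    return {
        "pattern": " -> ".join(a.fragment for a in (trigger, reaction, recovery)),
        "evidence": [
            {"fragment": a.fragment, "delta_min": minutes(a.time) - base}
            for a in (trigger, reaction, recovery)
        ],
        "confidence": confidence,
        "next_capture": next_capture,  # stays null until a reviewer supplies it
    }


result = build_pattern(
    Anchor("14:00", "14:00 delivery"),
    Anchor("14:03", "14:03 barking"),
    Anchor("14:10", "14:10 calm"),
)
print(result["confidence"], [e["delta_min"] for e in result["evidence"]])
# high [0, 3, 10]
```

The result is still a candidate hypothesis: a single day's chain shows timing, not causation, so the confidence label is a review aid rather than a conclusion.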

Review these patterns in Forecast and turn them into practical steps in the Training Guide.
