Data Guide
Operational rules for turning human-written diary notes into reusable telemetry with a final human review before publish.
READING GUIDE
What you will find on this page:
- Extraction principles
- Human-in-the-loop pipeline
- Publish quality gates
- Time-Granular Causal Inference (Beta)
Extraction principles
- Never infer facts that are not explicitly written
- Normalize units/time/counts while preserving original intent
- Keep unknown values as null instead of forced guesses
- Treat causality as a candidate hypothesis with confidence, not a certainty
- Prioritize time anchors (e.g., 14:00 delivery) and compare neighboring events in a window
- Validate date/day_number consistency before publish
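Two of these principles are mechanical enough to sketch in code: keeping unknowns as null rather than guessing, and checking date/day_number consistency before publish. The record shape below (`date`, `day_number`, `duration_min`) is an assumption for illustration, not the actual telemetry schema.

```python
from datetime import date

def validate_day_number(entry: dict, start: date) -> bool:
    """Check that entry['date'] and entry['day_number'] agree.

    Assumes a hypothetical record shape {'date': 'YYYY-MM-DD',
    'day_number': int}, where day 1 is the diary's start date.
    """
    entry_date = date.fromisoformat(entry["date"])
    expected = (entry_date - start).days + 1
    return entry.get("day_number") == expected

# Unknown values stay None (null in JSON) instead of forced guesses:
record = {"date": "2024-06-03", "day_number": 3, "duration_min": None}
assert validate_day_number(record, start=date(2024, 6, 1))
```

A gate like this runs before publish; a mismatched day_number is flagged for the human reviewer rather than silently corrected.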
Human-in-the-loop pipeline
The source is a human-written diary. Automation handles the repeatable middle steps, and the final draft is reviewed by hand before publish.
- Capture the daily diary notes
- Generate diary/article markdown
- Convert to telemetry JSON (convert_diary_to_json_telemetry_v2.py)
- Extract semantic events (behavior, environment, intervention) as time anchors
- Compare nearby events in before/after windows (e.g., -30 min to +60 min) to find likely cause-and-effect patterns
- Save the primary JSON record (posts/telemetry/ja/telemetry_XXXX.json)
- Run daily delta sync to SQLite via WP-Cron (1:30 AM)
- Manually review and refine before publish
- Reuse data in dashboards/FAQ/fixed pages
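The window-comparison step above can be sketched as a simple neighbor search around each time anchor. This is a minimal illustration, assuming events arrive as (timestamp, label) pairs; the -30/+60 minute bounds are the defaults named in the pipeline and would be tuned per dataset.

```python
from datetime import datetime

def neighbors_in_window(events, anchor_time, before_min=30, after_min=60):
    """Return (minute_offset, label) for events within the
    [-before_min, +after_min] window around an anchor, excluding
    the anchor event itself."""
    out = []
    for t, label in events:
        delta = (t - anchor_time).total_seconds() / 60
        if -before_min <= delta <= after_min and delta != 0:
            out.append((round(delta), label))
    return out

# The worked example from the Beta section, anchored on the delivery:
events = [
    (datetime(2024, 6, 3, 14, 0), "delivery"),
    (datetime(2024, 6, 3, 14, 3), "barking"),
    (datetime(2024, 6, 3, 14, 10), "calm"),
]
neighbors_in_window(events, datetime(2024, 6, 3, 14, 0))
# -> [(3, 'barking'), (10, 'calm')]
```

Each returned pair is a candidate cause-and-effect link, not a confirmed one; the human review step decides which candidates survive.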
Publish quality gates
Automation validates drafts first, then a human reviewer checks the results and fixes anything uncertain before publish.
- title/body length/heading count
- TODO/TBD placeholder leakage
- link format sanity
- frontmatter.date vs telemetry.date consistency
- time-anchor extraction coverage (missing timed mentions)
- evidence fragment linkage for each likely cause-and-effect pattern
Reports are stored in logs/publish_quality_gate.jsonl.
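A few of these gates are straightforward to automate. The sketch below is illustrative only: the field names (`title`, `body`, `frontmatter_date`, `telemetry_date`) and the length thresholds are assumptions, but the JSONL append matches the report format described above.

```python
import json
import re
from pathlib import Path

def run_gates(draft: dict) -> list:
    """Run a subset of the publish gates; fields and limits are
    illustrative assumptions, not the real schema."""
    checks = {
        "title_length": 10 <= len(draft.get("title", "")) <= 80,
        "no_placeholders": not re.search(r"\b(TODO|TBD)\b", draft.get("body", "")),
        "date_consistency": draft.get("frontmatter_date") == draft.get("telemetry_date"),
    }
    return [{"check": name, "passed": ok} for name, ok in checks.items()]

def append_report(results, path="logs/publish_quality_gate.jsonl"):
    """Append one JSON line per check result."""
    Path(path).parent.mkdir(parents=True, exist_ok=True)
    with open(path, "a", encoding="utf-8") as f:
        for result in results:
            f.write(json.dumps(result, ensure_ascii=False) + "\n")
```

Failed checks are surfaced to the reviewer; the automation never publishes around a failing gate on its own.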
Time-Granular Causal Inference (Beta)
Use semantic matching to turn daily context and reactions into reviewable cause-and-effect patterns.
Example: “Activity increased at 14:00” alone does not explain why. We link same-day anchors such as “14:00 delivery”, “14:03 barking”, and “14:10 calm” to model a likely chain: delivery stimulus -> arousal response -> recovery.
- Output 1: likely trigger/reaction/recovery pattern
- Output 2: evidence (source fragments and time deltas)
- Output 3: confidence (high / medium / low)
- Output 4: next capture instruction (what to measure next)
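The four outputs can be bundled into one reviewable record. A minimal sketch follows; the confidence thresholds (reaction within 5 minutes of the trigger reads as high, within 15 as medium) are invented for illustration and are not the system's actual scoring rules.

```python
def chain_confidence(trigger_min, reaction_min, recovery_min):
    """Rough confidence from the trigger-to-reaction gap.
    Thresholds here are assumptions, not the real scoring."""
    gap = reaction_min - trigger_min
    if gap <= 5:
        return "high"
    if gap <= 15:
        return "medium"
    return "low"

# The worked 14:00 example, assembled into the four outputs:
chain = {
    "pattern": ["delivery", "barking", "calm"],          # trigger/reaction/recovery
    "evidence": ["14:00 delivery", "14:03 barking", "14:10 calm"],
    "confidence": chain_confidence(0, 3, 10),            # "high"
    "next_capture": "note whether barking also follows non-delivery doorbell events",
}
```

Each record like this goes into the review queue; nothing is treated as a confirmed cause until a human accepts it.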
Review these patterns in Forecast and turn them into practical steps in the Training Guide.