Data integrity
How we keep ScoreView’s data trustworthy
Audit completed.
01 / Why this matters
Housing professionals make decisions based on what ScoreView shows them.
Regulatory positioning, complaint-handling priorities, and board decisions all rest on these figures. We owe you confidence that every number is what it claims to be.
This page explains what we checked, what we changed, and how the platform defends itself against the kind of silent errors that creep into any data product.
02 / What we checked
Three layers, end to end.
- Ingestion. How Housing Ombudsman determinations move from the public record into our database.
- Analytics. How individual records become benchmarks, trends, and peer comparisons.
- Presentation. How those numbers, and the AI briefings derived from them, reach you.
Across these layers we surfaced more than 30 specific risks, such as a number being misread, a record being silently mis-categorised, or an AI-generated briefing being mistaken for an Ombudsman quote.
03 / What we fixed
Six changes, one principle: never let a failure look like a finding.
Silent defaults killed
Previously, if our scraper failed to extract a determination’s outcome, the record fell back to “no maladministration”, making an extraction failure indistinguishable from a landlord being cleared. Likewise, a missing publication date silently became today’s date, surfacing the record in trend charts as fresh news. Both fallbacks are gone. Extraction failures now carry an explicit parse-error flag and are excluded from every analytics view, without exception.
Failures filtered by enforced rule
Every analytics query (sector outcome rates, landlord rankings, time-to-determination percentiles, category-level remedy rates) filters out parse-error records. An automated check runs on every code change to prevent any new query from forgetting this filter.
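To make the rule concrete, here is a minimal Python sketch of the pattern described above. The record shape, field names, and the `cleared_rate` metric are hypothetical illustrations, not ScoreView’s actual schema or code. A missing outcome or date sets `parse_error` instead of being filled in, and the analytics function drops flagged records before computing anything.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Determination:
    # Hypothetical record shape, for illustration only.
    case_id: str
    outcome: Optional[str]      # e.g. "no maladministration"; None if extraction failed
    published: Optional[date]   # None if extraction failed
    parse_error: bool = False


def build_record(case_id: str, outcome: Optional[str], published: Optional[date]) -> Determination:
    # No silent defaults: a missing outcome or date is flagged, never filled in.
    failed = outcome is None or published is None
    return Determination(case_id, outcome, published, parse_error=failed)


def cleared_rate(records: list[Determination]) -> Optional[float]:
    """Share of parsed determinations with a 'no maladministration' outcome."""
    # Every analytics function starts by dropping parse-error records.
    clean = [r for r in records if not r.parse_error]
    if not clean:
        return None  # nothing trustworthy to report
    cleared = sum(1 for r in clean if r.outcome == "no maladministration")
    return cleared / len(clean)
```

For example, given one cleared record, one upheld record, and one whose outcome failed to parse, `cleared_rate` returns 0.5. The old fallback would have counted the failure as a clearance and returned about 0.67.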
Small samples are labelled
Benchmarks and peer comparisons built on fewer than 25 records are labelled as small samples. Below 10 records they are marked low-confidence; below 5, very-low-confidence. AI briefings cannot declare sector trends from underpowered samples and must flag when the most recent quarter is still in progress.
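The labelling tiers translate directly into a threshold function. A minimal Python sketch, using the thresholds stated above; the function names and the exact wording of the under-25 label are hypothetical. The `may_claim_trend` guard is an assumption about how the briefing rule could be expressed: it only permits a trend claim when the sample carries no label and the period is complete.

```python
from typing import Optional

# Thresholds as stated on this page.
SMALL_SAMPLE = 25
LOW_CONFIDENCE = 10
VERY_LOW_CONFIDENCE = 5


def sample_label(n: int) -> Optional[str]:
    """Return the label for a benchmark built on n records, or None if unlabelled."""
    if n < 0:
        raise ValueError("sample size cannot be negative")
    if n < VERY_LOW_CONFIDENCE:
        return "very-low-confidence"
    if n < LOW_CONFIDENCE:
        return "low-confidence"
    if n < SMALL_SAMPLE:
        return "small sample"  # hypothetical wording for the under-25 label
    return None


def may_claim_trend(n: int, period_complete: bool) -> bool:
    # Assumption: a trend claim needs an unlabelled sample AND a finished period.
    return sample_label(n) is None and period_complete
```

Checking the tightest threshold first means each sample size receives exactly one label, the most cautious one that applies.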
Provenance you can quote
Every CSV, PDF, PowerPoint, and Word export carries a methodology page: the corpus snapshot date, the exact filter set used (as a reproducible hash), and a link back to the Housing Ombudsman source for every record. Email digests and the determination detail page do the same. AI-generated summaries are labelled clearly. They are ScoreView’s interpretation of structured metadata, never the Ombudsman’s wording.
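The page does not say how the filter hash is computed. One common approach, sketched below under that assumption, is to serialise the filters canonically (sorted keys, sorted multi-select values) and hash the result with SHA-256, so the same filter set always yields the same identifier regardless of the order in which options were chosen. The function name and the 16-character truncation are illustrative choices.

```python
import hashlib
import json


def filter_hash(filters: dict) -> str:
    """Return a short, order-independent identifier for a filter set."""
    # Assumption: multi-select values are lists of strings, so their order
    # carries no meaning and they can be sorted.
    normalised = {k: sorted(v) if isinstance(v, list) else v for k, v in filters.items()}
    # sort_keys and fixed separators make the JSON text canonical.
    canonical = json.dumps(normalised, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]
```

Selecting categories `["repairs", "damp"]` or `["damp", "repairs"]` produces the same hash. A hash identifies a filter set but cannot be reversed, so reproducing a figure in practice also requires storing the canonical filter text alongside it.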
Staleness is visible
Every dashboard shows when the corpus was last refreshed. If our scheduled ingestion fails or stalls beyond a 10-day window, a red warning appears on screen. You will know the data is stale before you cite it.
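The staleness rule reduces to a single comparison against the time of the last successful refresh. A minimal Python sketch, assuming that timestamp is stored as a timezone-aware UTC datetime; the names are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

STALENESS_WINDOW = timedelta(days=10)


def is_stale(last_refreshed: datetime, now: Optional[datetime] = None) -> bool:
    """True if the corpus has gone longer than the window without a successful refresh."""
    # Both datetimes must be timezone-aware; mixing aware and naive values raises TypeError.
    if now is None:
        now = datetime.now(timezone.utc)
    return now - last_refreshed > STALENESS_WINDOW
```

Because the check is keyed on the last successful refresh rather than on job status, a run that fails outright and a run that stalls silently both trip the same warning.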
Source links on every record
Every determination we display (search results, detail pages, exports, alert emails, sector digests) links back to the original record on the Housing Ombudsman website. Crown copyright is respected: we store structured metadata only, never the determination text.
04 / What we proved at launch
We applied the audit retroactively to every existing record.
- 50 outcomes corrected
- 245 recategorised
- 191 summaries regenerated
- 178 honestly excluded
05 / How we keep it true
The audit was a moment. These controls are permanent.
Seven controls are now built into the platform:
- Database-level guards. Bulk-delete operations on the corpus table are blocked at the PostgreSQL level. Only an explicit, audited override can perform a corpus-wide write.
- Regression tests, mandatory. Every silent-default failure we fixed has a test that would catch its return. Tests run on every code change.
- Static analysis at merge. Any new analytics query that fails to exclude parse-error records is rejected automatically.
- Health-check thresholds on every scrape. If unclassifiable landlords exceed 10%, or unknown categories exceed 5%, the run is flagged for review before the data reaches production.
- AI briefing guard rails. The AI cannot claim a trend from a small sample, cannot characterise the latest in-progress period as a spike or decline, and cannot reproduce determination text.
- Append-only ingestion. The corpus grows by upsert, not by truncate-and-rebuild. Historical records cannot be silently lost.
- Backups before any retro-fix. Every script that touches historical data takes a database backup first, and refuses to run without one.
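The scrape health check in the list above can be sketched as two ratio tests on a completed run. This is a minimal Python illustration using the stated 10% and 5% thresholds; the row shape and field names are hypothetical, and treating an empty run as a failure is an assumption of this sketch, not something stated on this page.

```python
from dataclasses import dataclass
from typing import Optional

# Thresholds as stated on this page; a run is flagged only when a ratio exceeds them.
MAX_UNCLASSIFIED_LANDLORDS = 0.10
MAX_UNKNOWN_CATEGORIES = 0.05


@dataclass
class ScrapedRow:
    # Hypothetical row shape: None means the value could not be classified.
    landlord_type: Optional[str]
    category: Optional[str]


def health_check(rows: list[ScrapedRow]) -> list[str]:
    """Return reasons to hold the run for review; an empty list means it may proceed."""
    if not rows:
        return ["empty scrape run"]  # assumption: an empty run is itself suspicious
    n = len(rows)
    unclassified = sum(r.landlord_type is None for r in rows) / n
    unknown = sum(r.category is None for r in rows) / n
    problems = []
    if unclassified > MAX_UNCLASSIFIED_LANDLORDS:
        problems.append(f"unclassifiable landlords at {unclassified:.1%} (limit 10%)")
    if unknown > MAX_UNKNOWN_CATEGORIES:
        problems.append(f"unknown categories at {unknown:.1%} (limit 5%)")
    return problems
```

Returning reasons rather than a bare pass/fail gives the reviewer something specific to investigate before the data reaches production.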
06 / How to read what you see
Every figure on this platform is five things.
When you read a ScoreView number, you can take it as:
- Clean. Drawn from Housing Ombudsman determinations that parsed without error. Anything that failed extraction is excluded, labelled, and visible to our engineering team for follow-up.
- Fresh. Refreshed weekly. The “as of” date is shown on every dashboard. If the corpus has not been refreshed for more than 10 days, you will see a warning.
- Reproducible. Every export records the filters used and the snapshot date, so a figure you quote today can be reproduced and defended later.
- Sized. Sample counts are shown; benchmarks below the confidence threshold are marked.
- Sourced. Every determination links to its public record on housing-ombudsman.org.uk. You can verify any single data point at source.
AI-generated content (synthesis briefings, summaries, weekly digests) is labelled and works only from cleaned, structured metadata. It is not the Ombudsman’s wording, and it is not a legal opinion.
07 / Questions and corrections
If a number looks wrong, we want to know.
Contact us through your account page. Quote the URL and the date. We will trace the figure back to source and respond.
For the underlying source data we draw from, see our Data Sources and Methodology page.