Introduction to Obvyr
The Testing Confidence Problem
Do your tests actually protect production? Or are you just assuming they do?
Most engineering teams deploy with hope, not evidence:
- Hope that flaky tests aren't hiding real bugs
- Hope that local environment matches CI behaviour
- Hope that your 3,000 tests are all providing value
- Hope that AI-generated code is backed by tests that actually validate it
Obvyr replaces hope with proof.
What Obvyr Does
Obvyr is a testing insights platform that proves test reliability through comprehensive data collection and pattern recognition. Instead of showing you point-in-time test results, Obvyr analyses patterns across thousands of test executions to reveal:
- Which tests are truly flaky vs. which are genuinely broken
- Where your environments diverge between local, CI, and production
- Which tests catch bugs vs. which ones just slow down CI
- Whether AI-generated tests actually validate behaviour or just assume it
You move from assumption-based testing ("we think our tests are good") to evidence-based testing ("we can prove our tests are reliable").
How Obvyr Organises Your Data
To prove test reliability, Obvyr collects and organises test execution data through a flexible hierarchy designed around how engineering teams actually work:
Organisations
Your organisation account represents your company or team. This is where billing is managed and where you control user access across all your projects.
Why it matters: Multi-tenant isolation ensures your test data is completely separate from other organisations, providing both security and clarity in pattern analysis.
Projects
Projects are logical groupings that make sense for your workflow. You might organise by:
- Codebase (one project per repository) - Compare test patterns across different repositories
- Service (frontend, API, mobile app) - Understand test reliability per service
- Team (platform team, product team) - Track team-specific testing practices
- Environment (staging, production) - Analyse environmental test differences
Why it matters: Flexible project organisation lets you analyse test patterns at the granularity that makes sense for your team, whether that's service-level, team-level, or environment-level insights.
Organise for Insights
There's no single "right" way. Use whatever structure helps you analyse your testing data most effectively. The goal is evidence-based insights, not rigid hierarchy.
CLI Agents
Within each project, you'll create CLI agents to collect data from specific types of testing activity. Each CLI agent has its own API key and wraps different commands to capture execution data.
Why it matters: Separate CLI agents for different test types (unit, integration, linting) let you analyse patterns specific to each quality check. You can identify which test types are flaky, which catch the most bugs, and which provide the best ROI.
Example setup:
Wyrd Tech (Organisation)
├── Obvyr API (Project)
│   ├── Typecheck (CLI Agent) - Tracks mypy execution patterns
│   ├── Lint (CLI Agent) - Monitors ruff/black reliability
│   └── Test (CLI Agent) - Analyses pytest behaviour
├── Obvyr CLI (Project)
│   ├── Typecheck (CLI Agent) - Mypy pattern tracking
│   └── Test (CLI Agent) - Pytest execution analysis
└── Obvyr UI (Project)
    ├── Lint (CLI Agent) - ESLint pattern monitoring
    └── Test (CLI Agent) - Vitest execution insights
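To make this concrete, here is a minimal sketch of the wrapped commands the agents in the example above might capture. It assumes every quality check is invoked through the same obvyr-prefix pattern shown in step 4 of the workflow below; the exact tool invocations are illustrative, and per-agent API key configuration is not shown.

```bash
# Obvyr API project (illustrative invocations, assuming the same wrapping
# pattern as the documented pytest example applies to every command)
obvyr mypy .            # Typecheck agent: mypy execution patterns
obvyr ruff check .      # Lint agent: ruff reliability data
obvyr pytest tests/     # Test agent: pytest behaviour

# Obvyr UI project
obvyr eslint src/       # Lint agent: ESLint pattern data
obvyr vitest run        # Test agent: Vitest execution data
```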
Observations
Every time you run a command wrapped by the Obvyr CLI, it creates an observation. This captures:
- Command output (stdout/stderr)
- Execution duration and timing
- User who ran the command
- Environment context and variables
- Test results and framework metadata
Why it matters: Individual observations are data points. Thousands of observations become patterns. Obvyr analyses these patterns to reveal:
- Flaky tests: Tests that fail inconsistently across observations
- Environment drift: Systematic differences between local and CI observations
- Test value: Which tests catch bugs vs. which never fail
- Performance trends: Tests getting slower over time
The Obvyr Difference
What Traditional Testing Shows You
- ✅ Test passed (but was it reliable or just lucky?)
- ❌ Test failed (but is it broken or flaky?)
- 📊 85% coverage (but does that coverage catch bugs?)
- ⏱️ 45-minute CI (but which tests provide value?)
What Obvyr Shows You
- ✅ "This test passed in 847/847 executions across all environments (100% reliable)"
- ❌ "This test failed in 23/150 executions, 91% correlated with CI runner 'ci-3' (environmental issue, not code issue)"
- 📊 "These 234 tests caught 94% of your bugs over 6 months (actual value, not assumed coverage)"
- ⏱️ "These 1,200 tests have never caught a bug and account for 63% of CI time (safe to remove)"
The Obvyr Workflow: From Setup to Insights
1. Set Up Your Structure
Create projects in the Obvyr dashboard that match how your team organises testing
Time: 2 minutes. Value: Clear data organisation for targeted insights
2. Create CLI Agents
Within each project, create CLI agents for different test types you want to monitor
Time: 3 minutes. Value: Separate pattern analysis for unit tests, integration tests, linting, type checking
3. Install and Configure the Obvyr CLI
Install the CLI and configure it with your CLI agent API keys
Time: 2 minutes. Value: Start capturing comprehensive test execution data
4. Wrap Your Commands
Replace `pytest tests/` with `obvyr pytest tests/` (same for any test command)
Time: 1 minute. Value: Zero workflow disruption, immediate data collection
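As a sketch, the change is just a prefix on the command you already run. The pytest line is taken from this step; applying the same pattern to other wrapped commands (such as a Vitest suite) is an assumption based on the example setup earlier on this page.

```bash
# Before: your existing test command
pytest tests/

# After: the same command, wrapped by the Obvyr CLI
obvyr pytest tests/

# Assumed to follow the same prefix pattern for other wrapped commands,
# for example a Vitest test run in a frontend project
obvyr vitest run
```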
5. Analyse the Insights
View patterns, trends, and evidence-based test reliability in the dashboard
Time: Ongoing. Value: Prove test reliability, identify flaky tests, optimise CI, prevent incidents
What's Next?
Ready to prove your tests are reliable?
- Why Obvyr? - Understand the full value proposition and what makes Obvyr different
- Problems Solved - See detailed scenarios of specific testing challenges Obvyr solves
- Getting Started - Set up your first project and start collecting evidence in 10 minutes
From Hope to Proof in 10 Minutes
Stop assuming your tests are reliable. Start proving it. Get started now.