A/B Testing: How to Know What Actually Works
A/B testing isn't just for large companies with massive traffic. Here's how to run experiments that give you actionable insights, even with modest visitor numbers.
What A/B Testing Actually Is
A/B testing means showing two versions of something to randomly divided visitors and measuring which performs better.
Example: Half your visitors see a yellow CTA button, half see a purple one. After collecting enough data, you see which color led to more clicks.
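Under the hood, the split is usually just deterministic random assignment. Here's a minimal sketch in Python, assuming each visitor has a stable ID; the bucket_visitor helper and the yellow/purple labels are illustrative, not from any particular testing tool:

```python
import hashlib

def bucket_visitor(visitor_id: str, experiment: str, variants=("A", "B")) -> str:
    """Deterministically assign a visitor to a variant.

    Hashing the visitor ID with the experiment name gives a stable,
    roughly 50/50 split, so returning visitors always see the same version.
    """
    digest = hashlib.sha256(f"{experiment}:{visitor_id}".encode()).hexdigest()
    index = int(digest, 16) % len(variants)
    return variants[index]

# Roughly half of visitors get the yellow CTA, half the purple one.
variant = bucket_visitor("visitor-12345", "cta-color")
button_color = "yellow" if variant == "A" else "purple"
```

Hashing instead of flipping a coin on every page view keeps the measurement clean: the same person never bounces between versions.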
Why it matters: Removes opinions from design decisions. Data tells you what works.
The Scientific Method for Websites
A/B testing is the scientific method applied to web design:
- Observe - Notice something isn't performing well
- Hypothesize - Form a theory about why and how to improve it
- Experiment - Test your hypothesis with real users
- Analyze - Look at the data objectively
- Conclude - Make a decision based on evidence
This is how optimization compounds over time.
What to Test (And What Not to Test)
High-Impact Elements to Test
Headlines: Even small wording changes can double conversions
CTA buttons:
- Copy ("Get Started" vs. "Try It Free")
- Color
- Size
- Position
Value propositions: Different ways of explaining what you do
Social proof: Testimonials vs. logos vs. statistics
Pricing: Presentation, packaging, trial offers
Low-Impact (Don't Bother)
Minor design tweaks: Button corner radius, exact shade of blue
Footer links: Unless you have data showing people use them
Elements few people see: If it's below the fold and rarely scrolled to, test something else first
The rule: Test things that matter to your business goals, not things that matter to your preferences.
Statistical Significance: When to Trust Results
The mistake most people make: They run a test for a week, see "Version B" ahead, and declare it the winner.
The reality: You need statistical significance to trust results.
What that means:
- Enough conversions (at least 100 per variation is a common rule of thumb)
- Enough time (at least 1-2 weeks to account for day-of-week variance)
- Clear winner (95% confidence level minimum)
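For intuition, here's a rough sketch of the check those tools run under the hood: a two-proportion z-test on the two conversion rates. The numbers are made up, and real tools layer on corrections for peeking and multiple comparisons:

```python
from math import sqrt
from scipy.stats import norm

def ab_significance(conv_a, n_a, conv_b, n_b):
    """Two-proportion z-test: is B's conversion rate different from A's?"""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)             # pooled rate under the null hypothesis
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * norm.sf(abs(z))                        # two-sided p-value
    return p_a, p_b, p_value

# Illustrative numbers: 120 of 2,400 visitors converted on A, 156 of 2,400 on B.
p_a, p_b, p = ab_significance(120, 2400, 156, 2400)
print(f"A: {p_a:.2%}  B: {p_b:.2%}  p-value: {p:.3f}  significant at 95%: {p < 0.05}")
```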
Tools that calculate this for you:
- Google Optimize (free and basic, but discontinued by Google in 2023)
- Optimizely (powerful, expensive)
- VWO (mid-range)
- Convert (focused on statistical rigor)
Common A/B Testing Mistakes
Mistake 1: Testing Too Many Things at Once
Multivariate testing (testing multiple changes simultaneously) requires exponentially more traffic.
Example:
- Testing 2 headlines × 2 button colors × 2 layouts = 8 variations
- Each variation needs its own statistically valid sample, so you need 8x the traffic of a single variation for reliable results
Better approach: Test one thing at a time (serial testing).
Mistake 2: Stopping Tests Early
Scenario: You run a test for 3 days, see B is winning by 15%, and declare victory.
Problem: Not enough data. Random chance could explain the difference.
Solution: Use a sample size calculator before starting. Know the minimum traffic needed.
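As a rough sketch of what a sample size calculator computes, here's the standard two-proportion formula at 95% confidence and 80% power, assuming you know your baseline conversion rate and the smallest relative lift you care about detecting (exact outputs vary by tool):

```python
from math import ceil, sqrt
from scipy.stats import norm

def sample_size_per_variation(baseline, min_lift, alpha=0.05, power=0.80):
    """Visitors needed per variation to detect a relative lift over baseline."""
    p1 = baseline
    p2 = baseline * (1 + min_lift)                 # e.g. 5% baseline, 20% relative lift -> 6%
    p_bar = (p1 + p2) / 2
    z_alpha = norm.ppf(1 - alpha / 2)              # 1.96 for 95% confidence
    z_beta = norm.ppf(power)                       # 0.84 for 80% power
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / (p2 - p1) ** 2)

# Illustrative: 5% baseline conversion, want to detect a 20% relative lift.
print(sample_size_per_variation(0.05, 0.20))       # about 8,200 visitors per variation
```

Run this before the test starts, and don't look at the scoreboard until you've hit the number.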
Mistake 3: Ignoring Context
Week-to-week variation: Traffic behaves differently on different days
Seasonal effects: Holiday traffic behaves differently than normal traffic
Traffic sources: Email visitors behave differently than social visitors
Solution: Run tests for full weeks and compare week-over-week, not day-over-day.
Mistake 4: Testing Opinions Instead of Hypotheses
Bad: "I prefer blue buttons. Let's test blue vs. yellow." Good: "Research shows blue creates trust. Our users need more trust signals. Hypothesis: Blue CTA will increase sign-ups."
The difference: The second one is testable and educational regardless of outcome.
Hypothesis Framework
Use this template for every test:
- Current state: [What's happening now]
- Problem: [Why that's suboptimal]
- Hypothesis: [What we think will improve it]
- Research basis: [Why we think this will work]
- Success metric: [What we're measuring]
- Minimum improvement: [What result would make this worth implementing]
Example:
- Current: CTA says "Submit"
- Problem: Generic, doesn't communicate value
- Hypothesis: "Get My Free Audit" will increase clicks
- Research: Personalized CTAs convert 202% better (HubSpot)
- Metric: Click-through rate
- Min improvement: 10% lift
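If it helps your team stay consistent, the same template can live as structured data so every test gets documented the same way. A minimal sketch, where the TestHypothesis class is illustrative rather than part of any testing tool:

```python
from dataclasses import dataclass

@dataclass
class TestHypothesis:
    """One record per experiment, mirroring the template above."""
    current_state: str
    problem: str
    hypothesis: str
    research_basis: str
    success_metric: str
    min_improvement: str

cta_test = TestHypothesis(
    current_state='CTA says "Submit"',
    problem="Generic, doesn't communicate value",
    hypothesis='"Get My Free Audit" will increase clicks',
    research_basis="Personalized CTAs convert 202% better (HubSpot)",
    success_metric="Click-through rate",
    min_improvement="10% lift",
)
```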
What to Do When You Don't Have Enough Traffic
Reality: Most small business sites don't have thousands of daily visitors.
Alternatives to traditional A/B testing:
1. Sequential Testing
- Run Version A for 2 weeks, measure results
- Run Version B for 2 weeks, measure results
- Compare
Pros: Works with any traffic level
Cons: Less rigorous; more confounding variables (time, seasonality, etc.)
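A minimal sketch of the comparison step, with made-up numbers, assuming you logged visitors and conversions for each two-week period (and keeping in mind that time itself is a confounding variable here):

```python
def period_summary(label, visitors, conversions):
    """Print and return the conversion rate for one test period."""
    rate = conversions / visitors
    print(f"{label}: {conversions}/{visitors} = {rate:.2%}")
    return rate

rate_a = period_summary("Version A (weeks 1-2)", 1800, 54)
rate_b = period_summary("Version B (weeks 3-4)", 1750, 66)
print(f"Relative lift: {(rate_b - rate_a) / rate_a:.1%}")
```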
2. Qualitative Research
- User testing (watch 5 people use your site)
- Heatmaps (see where people click and scroll)
- Session recordings (watch real user behavior)
- Surveys (ask visitors directly)
These often reveal bigger issues than A/B tests.
3. Best Practices Implementation
- Use research-backed principles from behavioral science
- Implement changes based on proven patterns
- Monitor key metrics before and after
At Sparken: We build in research-backed best practices from the start, so you don't need massive traffic to have a high-performing site.
Tools for A/B Testing
Free:
- Google Optimize (discontinued by Google in 2023)
- Microsoft Clarity (heatmaps and session recordings)
Paid:
- Optimizely ($$$)
- VWO ($$)
- Convert ($)
- AB Tasty ($$)
For small businesses: Start with heatmaps and session recordings. They're cheaper and often more revealing than A/B tests.
The Compounding Effect
One 10% improvement = 10% more conversions
Five 10% improvements = 61% more conversions (1.1^5)
Ten 10% improvements = 159% more conversions (1.1^10)
This is why systematic testing matters: Small improvements compound.
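The arithmetic, if you want to verify it yourself:

```python
# Each 10% improvement multiplies conversions by 1.10; lifts compound, not add.
for n in (1, 5, 10):
    print(f"{n} x 10% improvements -> {(1.10 ** n - 1):.0%} total lift")
```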
The Sparken Approach
We build sites with research-backed best practices from behavioral science, so they convert well from day one.
Then we test:
- Identify bottlenecks (where are users dropping off?)
- Form hypotheses (why might that be?)
- Implement changes (one at a time)
- Monitor results (did it work?)
- Document learnings (what did we learn about these users?)
Result: Continuously improving performance based on real user behavior, not guesses.
Want a website built on research-backed principles that gets better over time? Let's talk.