A/B Testing: How to Know What Actually Works
A/B testing isn't just for large companies with massive traffic. Here's how to run experiments that give you actionable insights, even with modest visitor numbers.
What A/B Testing Actually Is
A/B testing means showing two versions of something to randomly divided visitors and measuring which performs better.
Example: Half your visitors see a yellow CTA button, half see a purple one. After collecting enough data, you see which color led to more clicks.
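Under the hood, the split is usually just deterministic random assignment. Here's a minimal sketch in Python, assuming each visitor has a stable ID; the bucket_visitor helper and the yellow/purple labels are illustrative, not from any particular testing tool:

```python
import hashlib

def bucket_visitor(visitor_id: str, experiment: str, variants=("A", "B")) -> str:
    """Deterministically assign a visitor to a variant.

    Hashing the visitor ID with the experiment name gives a stable,
    roughly 50/50 split, so returning visitors always see the same version.
    """
    digest = hashlib.sha256(f"{experiment}:{visitor_id}".encode()).hexdigest()
    index = int(digest, 16) % len(variants)
    return variants[index]

# Roughly half of visitors get the yellow CTA, half the purple one.
variant = bucket_visitor("visitor-12345", "cta-color")
button_color = "yellow" if variant == "A" else "purple"
```

Hashing instead of flipping a coin on every page view keeps the measurement clean: the same person never bounces between versions.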
Why it matters: Removes opinions from design decisions. Data tells you what works.
The Scientific Method for Websites
A/B testing is the scientific method applied to web design:
- Observe - Notice something isn't performing well
- Hypothesize - Form a theory about why and how to improve it
- Experiment - Test your hypothesis with real users
- Analyze - Look at the data objectively
- Conclude - Make a decision based on evidence
This is how optimization compounds over time.
What to Test (And What Not to Test)
High-Impact Elements to Test
Headlines: Even small wording changes can double conversions
CTA buttons:
- Copy ("Get Started" vs. "Try It Free")
- Color
- Size
- Position
Value propositions: Different ways of explaining what you do
Social proof: Testimonials vs. logos vs. statistics
Pricing: Presentation, packaging, trial offers
Low-Impact (Don't Bother)
Minor design tweaks: Button corner radius, exact shade of blue
Footer links: Unless you have data showing people use them
Elements few people see: If it's below the fold and rarely scrolled to, test something else first
The rule: Test things that matter to your business goals, not things that matter to your preferences.
Statistical Significance: When to Trust Results
The mistake most people make: They run a test for a week, see "Version B" ahead, and declare it the winner.
The reality: You need statistical significance to trust results.
What that means:
- Enough conversions (at least 100 per variation is a common rule of thumb)
- Enough time (at least 1-2 weeks to account for day-of-week variance)
- Clear winner (95% confidence level minimum)
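For intuition, here's a rough sketch of the check those tools run under the hood: a two-proportion z-test on the two conversion rates. The numbers are made up, and real tools layer on corrections for peeking and multiple comparisons:

```python
from math import sqrt
from scipy.stats import norm

def ab_significance(conv_a, n_a, conv_b, n_b):
    """Two-proportion z-test: is B's conversion rate different from A's?"""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)             # pooled rate under the null hypothesis
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * norm.sf(abs(z))                        # two-sided p-value
    return p_a, p_b, p_value

# Illustrative numbers: 120 of 2,400 visitors converted on A, 156 of 2,400 on B.
p_a, p_b, p = ab_significance(120, 2400, 156, 2400)
print(f"A: {p_a:.2%}  B: {p_b:.2%}  p-value: {p:.3f}  significant at 95%: {p < 0.05}")
```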
Tools that calculate this for you:
- Google Optimize (free and basic, but discontinued by Google in 2023)
- Optimizely (powerful, expensive)
- VWO (mid-range)
- Convert (focused on statistical rigor)
Common A/B Testing Mistakes
Mistake 1: Testing Too Many Things at Once
Multivariate testing (testing multiple changes simultaneously) requires exponentially more traffic.
Example:
- Testing 2 headlines × 2 button colors × 2 layouts = 8 variations
- Each variation needs its own statistically valid sample, so you need 8x the traffic of a single variation for reliable results
Better approach: Test one thing at a time (serial testing).
Mistake 2: Stopping Tests Early
Scenario: You run a test for 3 days, see B is winning by 15%, and declare victory.
Problem: Not enough data. Random chance could explain the difference.
Solution: Use a sample size calculator before starting. Know the minimum traffic needed.
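As a rough sketch of what a sample size calculator computes, here's the standard two-proportion formula at 95% confidence and 80% power, assuming you know your baseline conversion rate and the smallest relative lift you care about detecting (exact outputs vary by tool):

```python
from math import ceil, sqrt
from scipy.stats import norm

def sample_size_per_variation(baseline, min_lift, alpha=0.05, power=0.80):
    """Visitors needed per variation to detect a relative lift over baseline."""
    p1 = baseline
    p2 = baseline * (1 + min_lift)                 # e.g. 5% baseline, 20% relative lift -> 6%
    p_bar = (p1 + p2) / 2
    z_alpha = norm.ppf(1 - alpha / 2)              # 1.96 for 95% confidence
    z_beta = norm.ppf(power)                       # 0.84 for 80% power
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / (p2 - p1) ** 2)

# Illustrative: 5% baseline conversion, want to detect a 20% relative lift.
print(sample_size_per_variation(0.05, 0.20))       # about 8,200 visitors per variation
```

Run this before the test starts, and don't look at the scoreboard until you've hit the number.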
Mistake 3: Ignoring Context
Week-to-week variation: Traffic behaves differently on different days
Seasonal effects: Holiday traffic behaves differently than normal traffic
Traffic sources: Email visitors behave differently than social visitors
Solution: Run tests for full weeks and compare week-over-week, not day-over-day.
Mistake 4: Testing Opinions Instead of Hypotheses
Bad: "I prefer blue buttons. Let's test blue vs. yellow." Good: "Research shows blue creates trust. Our users need more trust signals. Hypothesis: Blue CTA will increase sign-ups."
The difference: The second one is testable and educational regardless of outcome.
Hypothesis Framework
Use this template for every test:
- Current state: [What's happening now]
- Problem: [Why that's suboptimal]
- Hypothesis: [What we think will improve it]
- Research basis: [Why we think this will work]
- Success metric: [What we're measuring]
- Minimum improvement: [What result would make this worth implementing]
Example:
- Current: CTA says "Submit"
- Problem: Generic, doesn't communicate value
- Hypothesis: "Get My Free Audit" will increase clicks
- Research: Personalized CTAs convert 202% better (HubSpot)
- Metric: Click-through rate
- Min improvement: 10% lift
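If it helps your team stay consistent, the same template can live as structured data so every test gets documented the same way. A minimal sketch, where the TestHypothesis class is illustrative rather than part of any testing tool:

```python
from dataclasses import dataclass

@dataclass
class TestHypothesis:
    """One record per experiment, mirroring the template above."""
    current_state: str
    problem: str
    hypothesis: str
    research_basis: str
    success_metric: str
    min_improvement: str

cta_test = TestHypothesis(
    current_state='CTA says "Submit"',
    problem="Generic, doesn't communicate value",
    hypothesis='"Get My Free Audit" will increase clicks',
    research_basis="Personalized CTAs convert 202% better (HubSpot)",
    success_metric="Click-through rate",
    min_improvement="10% lift",
)
```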
What to Do When You Don't Have Enough Traffic
Reality: Most small business sites don't have thousands of daily visitors.
Alternatives to traditional A/B testing:
1. Sequential Testing
- Run Version A for 2 weeks, measure results
- Run Version B for 2 weeks, measure results
- Compare
Pros: Works with any traffic level
Cons: Less rigorous; more confounding variables (time, seasonality, etc.)
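A minimal sketch of the comparison step, with made-up numbers, assuming you logged visitors and conversions for each two-week period (and keeping in mind that time itself is a confounding variable here):

```python
def period_summary(label, visitors, conversions):
    """Print and return the conversion rate for one test period."""
    rate = conversions / visitors
    print(f"{label}: {conversions}/{visitors} = {rate:.2%}")
    return rate

rate_a = period_summary("Version A (weeks 1-2)", 1800, 54)
rate_b = period_summary("Version B (weeks 3-4)", 1750, 66)
print(f"Relative lift: {(rate_b - rate_a) / rate_a:.1%}")
```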
2. Qualitative Research
- User testing (watch 5 people use your site)
- Heatmaps (see where people click and scroll)
- Session recordings (watch real user behavior)
- Surveys (ask visitors directly)
These often reveal bigger issues than A/B tests.
3. Best Practices Implementation
- Use research-backed principles from behavioral science
- Implement changes based on proven patterns
- Monitor key metrics before and after
At Sparken: We build in research-backed best practices from the start, so you don't need massive traffic to have a high-performing site.
Tools for A/B Testing
Free:
- Google Optimize (discontinued by Google in 2023)
- Microsoft Clarity (heatmaps and session recordings)
Paid:
- Optimizely ($$$)
- VWO ($$)
- Convert ($)
- AB Tasty ($$)
For small businesses: Start with heatmaps and session recordings. They're cheaper and often more revealing than A/B tests.
The Compounding Effect
One 10% improvement = 10% more conversions
Five 10% improvements = 61% more conversions (1.1^5)
Ten 10% improvements = 159% more conversions (1.1^10)
This is why systematic testing matters: Small improvements compound.
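The arithmetic, if you want to verify it yourself:

```python
# Each 10% improvement multiplies conversions by 1.10; lifts compound, not add.
for n in (1, 5, 10):
    print(f"{n} x 10% improvements -> {(1.10 ** n - 1):.0%} total lift")
```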
The Sparken Approach
We build sites with research-backed best practices from behavioral science, so they convert well from day one.
Then we test:
- Identify bottlenecks (where are users dropping off?)
- Form hypotheses (why might that be?)
- Implement changes (one at a time)
- Monitor results (did it work?)
- Document learnings (what did we learn about these users?)
Result: Continuously improving performance based on real user behavior, not guesses.
Want a website built on research-backed principles that gets better over time? Let's talk.