7 February 2025
08 Min. Read
Generating Mock Data: Improve Testing Without Breaking Prod
Just wrapped up a challenging sprint that opened my eyes to the importance of robust mock data generation. Here's what I learned after burning three days trying to test our new financial dashboard: creating realistic test data is an art, not just a checkbox task. Initially, I fell into the classic trap of using simple random number generators and "user1, user2" patterns. The result? Our edge cases stayed hidden until QA, causing unnecessary back-and-forth. - Arturo Martinez, Former Staff Engineer @ Stripe
Mock data doesn’t just isolate components; it ensures testing reflects real-world scenarios. It gives developers confidence that the system won’t break in production.
The Importance of Mock Data in Testing
Mock data serves as a stand-in for real data during testing, allowing teams to simulate various scenarios without the risks associated with using live data. This is particularly important in environments where data privacy and compliance are critical, as well as in situations where the application interacts with external services that may not be available during testing.
Mock data can help teams:
Catch regressions early and ensure consistent testing by simulating real-world API responses before integration, eliminating flakiness caused by dynamic production data.
Reduce reliance on external systems by providing stable mocks when APIs or third-party services are unavailable or unreliable.
Speed up development and debugging by avoiding slow API calls, enabling controlled testing scenarios, and replaying real network interactions.
Enhance CI/CD pipelines with reliable, repeatable test data that mirrors production behavior, ensuring smooth automated testing.
Enable isolated testing by simulating specific conditions without external dependencies, making it easier to validate edge cases and system behaviors.
However, the effectiveness of mock data is heavily dependent on its realism and relevance to actual production scenarios.
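The isolation benefit above can be sketched with Python's unittest.mock. The function names and the balance value here are illustrative assumptions, not part of any real API:

```python
from unittest.mock import Mock

def can_withdraw(user_id: str, amount: int, fetch_balance) -> bool:
    """Business logic under test; the balance lookup is injected so a
    test can replace the live API call with a controlled response."""
    return fetch_balance(user_id) >= amount

# In production, fetch_balance would hit the accounts API over the network.
# In a test, a Mock gives a stable, repeatable response with no network at all:
mock_fetch = Mock(return_value=5000)

assert can_withdraw("user-123", 2000, mock_fetch) is True
assert can_withdraw("user-123", 9999, mock_fetch) is False
mock_fetch.assert_called_with("user-123")  # verify the lookup actually happened
```

Because the mock's return value is fixed, the same test passes identically on every run and in every CI environment.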
Learn about an approach that automatically builds mocks from real application traffic, including database states, third-party APIs, and inter-service contracts.
The challenges of generating realistic mock data, and how to solve them
While mock data is incredibly useful, generating it is not without challenges:
✅ Realism: Mock data must closely resemble real-world data to be effective. Poorly generated mock data can lead to false positives or negatives in testing.
✅ Maintenance: As systems evolve, mock data must be updated to reflect changes in APIs, databases, and queues.
✅ Complexity: Simulating complex interactions, such as API chaining or database transactions, can be difficult.
1. Complexity of Real-World Scenarios
One of the primary challenges in generating realistic mock data is the inherent complexity of real-world scenarios. Production environments often involve intricate interactions between various components, including databases, APIs, and third-party services. Creating mock data that accurately simulates these interactions can be daunting.
Mocking data isn’t just about creating fake numbers; it’s about capturing intricate relationships between data points that affect system behavior.
For instance, consider a fintech application:
Example: A FinTech app with multiple accounts
A user may have different types of accounts (checking, savings, investment), each with unique rules.
🔹 Simple Mock (Fabricated, Unrealistic Data)
A naive mock setup might treat all accounts the same, missing real-world variations.
{
  "user_id": "123",
  "accounts": [
    { "type": "checking", "balance": 5000 },
    { "type": "savings", "balance": 20000 },
    { "type": "investment", "balance": 30000 }
  ]
}
🔹 Realistic Mock (Based on Production Data)
Includes transaction limits per account type.
Captures interest rates and fund lock-in periods.
Adds compliance-related restrictions (e.g., large withdrawals flagged for fraud checks).
{
  "user_id": "123",
  "accounts": [
    { "type": "checking", "balance": 5000, "daily_limit": 2000 },
    { "type": "savings", "balance": 20000, "interest_rate": 1.5 },
    { "type": "investment", "balance": 30000, "locked_until": "2025-01-01", "risk_profile": "moderate" }
  ],
  "fraud_checks_enabled": true
}
Why is this better?
It prevents false positives in testing (e.g., approving transactions that should be blocked).
It simulates edge cases such as exceeding daily withdrawal limits or interest accrual.
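One way to keep those type-specific relationships out of hand-maintained fixtures is a small factory that attaches the right fields per account type. This is a minimal sketch: the rule values mirror the example JSON above and are illustrative, not real product constraints:

```python
# Per-account-type rules; values are illustrative, matching the example JSON.
ACCOUNT_RULES = {
    "checking":   {"daily_limit": 2000},
    "savings":    {"interest_rate": 1.5},
    "investment": {"locked_until": "2025-01-01", "risk_profile": "moderate"},
}

def make_account(acct_type: str, balance: float) -> dict:
    account = {"type": acct_type, "balance": balance}
    account.update(ACCOUNT_RULES[acct_type])  # attach type-specific fields
    return account

def make_user(user_id: str, balances: dict) -> dict:
    return {
        "user_id": user_id,
        "accounts": [make_account(t, b) for t, b in balances.items()],
        "fraud_checks_enabled": True,
    }

user = make_user("123", {"checking": 5000, "savings": 20000, "investment": 30000})
assert user["accounts"][0]["daily_limit"] == 2000
```

When a rule changes, it changes in one place instead of in every fixture that contains an account of that type.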
HyperTest provides auto-updating mocks for databases, APIs, and service contracts, ensuring test reliability without requiring developers to manually write or update mocks.
Learn more about the approach here.

2. Data Volume and Variety
Another significant challenge is the volume and variety of data required for effective testing.
Consider a fintech app again: such apps often handle vast amounts of data, and the diversity of that data can be staggering.
For instance, a personal finance management app must account for different user profiles, transaction histories, budgeting categories, investment portfolios, and financial goals.
Generating a representative sample of this data can be time-consuming and may not cover all edge cases, leading to gaps in testing that could result in production issues.
For example, consider a fintech application that allows users to track their spending, set budgets, and invest in stocks. The application must simulate a wide range of user behaviors, including:
Transaction types, user profiles, investment portfolios, budgeting categories, and compliance and regulatory data.
As the fintech landscape evolves, the data requirements may change, necessitating continuous updates to the mock data sets. For example, if a new feature is introduced that allows users to invest in fractional shares, the mock data must be updated to include scenarios that reflect this new functionality.
❌ Fabricated Mock (Over-simplified, Incomplete Data)
A naive approach might assume only whole shares are traded, ignoring fractional investments, rounding logic, or minimum purchase limits.
{
  "user_id": "123",
  "portfolio": [
    { "symbol": "AAPL", "quantity": 10, "price": 150 },
    { "symbol": "TSLA", "quantity": 5, "price": 700 }
  ]
}
🚨 Problems:
No support for fractional shares; assumes only whole numbers.
Ignores minimum investment requirements (e.g., a $1 minimum buy-in).
Fails to test rounding logic.
✅ Realistic Mock (Production-Like, Covers Edge Cases)
A well-structured mock includes fractional share purchases, investment limits, and rounding rules.
{
  "user_id": "123",
  "portfolio": [
    {
      "symbol": "AAPL",
      "quantity": 10.75,
      "price": 150,
      "fractional_allowed": true,
      "min_investment": 1.00
    },
    {
      "symbol": "TSLA",
      "quantity": 2.5,
      "price": 700,
      "fractional_allowed": true,
      "min_investment": 5.00
    }
  ],
  "last_updated": "2025-02-07T14:30:00Z"
}
Why is this better?
Eliminates backend bugs – HyperTest also generates mocks directly from real network traffic, ensuring that fractional share calculations reflect actual brokerage behaviors.
Enforces business rules automatically – With HyperTest, mocks mirror real production constraints, catching violations like purchases below the minimum investment threshold.
Provides 100% accurate API simulations – Since HyperTest captures live API interactions, its mocks behave exactly like real brokerage APIs, preventing discrepancies between test and production environments.
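To make the contrast concrete, here is a sketch of the validation rules that only the richer mock lets you exercise. The function and field names mirror the example JSON above; the logic itself is an illustrative assumption, not a real brokerage's rules:

```python
def validate_order(holding: dict, quantity: float, price: float) -> str:
    """Reject orders that violate fractional-share or minimum-investment rules."""
    cost = round(quantity * price, 2)
    if not holding["fractional_allowed"] and quantity != int(quantity):
        return "rejected: fractional shares not allowed"
    if cost < holding["min_investment"]:
        return "rejected: below minimum investment"
    return "accepted"

aapl = {"symbol": "AAPL", "fractional_allowed": True, "min_investment": 1.00}

# 0.005 shares at $150 costs $0.75, below the $1.00 minimum buy-in.
assert validate_order(aapl, 0.005, 150) == "rejected: below minimum investment"
# 0.25 shares at $150 costs $37.50 and is allowed.
assert validate_order(aapl, 0.25, 150) == "accepted"
```

The simplified mock, which has no fractional_allowed or min_investment fields, could never drive these two assertions, so the rejection paths would go untested.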

3. Maintenance and Consistency
As applications evolve, so do their data requirements. What was once a reliable set of mock data can quickly become outdated, leading to inconsistencies in testing. Keeping mock data up to date isn’t just about volume, it’s about maintaining complex relationships between services, transactions, and compliance rules.
For example, introducing a new payment method in a fintech app requires more than just adding a new API response. The mock data must also reflect:
How the new method integrates with existing user accounts and transaction histories
Compliance checks and fraud detection mechanisms
Changes in third-party payment gateways
"Every time we roll out a new feature, it feels like I'm back at square one with the mock data. It's exhausting to constantly update and ensure that everything aligns with the latest changes. Sometimes, I wonder if we're spending more time maintaining mock data than actually developing new features."
This frustration is common across teams, where outdated mocks lead to broken tests, delays, and misalignment between engineering and QA.
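Drift like this can at least be detected cheaply. Below is a sketch of a CI check that flags mocks missing fields the current code expects; the field names and the contract format are illustrative assumptions, not a real schema standard:

```python
# Fields the current code expects per payload kind (illustrative contract).
EXPECTED_FIELDS = {
    "payment_method": {"id", "type", "user_id", "fraud_check_status"},
}

def find_drift(kind: str, mock_payload: dict) -> set:
    """Return the expected fields that the mock payload no longer provides."""
    return EXPECTED_FIELDS[kind] - mock_payload.keys()

# A stale mock written before fraud checks were added to payment methods:
stale_mock = {"id": "pm_1", "type": "card", "user_id": "123"}

missing = find_drift("payment_method", stale_mock)
assert missing == {"fraud_check_status"}  # CI can fail loudly on any non-empty set
```

A check like this turns "tests quietly pass against stale data" into "the build fails with the exact missing field named", which is where tool-assisted mock maintenance starts to pay off.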
This is again a solved problem in HyperTest:
HyperTest automatically mocks database calls while testing your code or endpoints. It also updates these mocks when your external system changes, ensuring your tests always reflect the current true behavior of your external system.
Here's a video showing exactly how HyperTest does that 👇
Conclusion
Generating mock data is a critical aspect of modern software testing. It allows you to validate your code in a safe, isolated environment while reducing costs and improving speed. However, generating realistic and maintainable mock data can be challenging.

HyperTest enables developers to replicate database states, third-party APIs, and inter-service interactions seamlessly for testing. Plus, mocks stay in sync with real-world changes, with no manual updates required.