Spike Overview

Spike is a tool to help develop and test code inside service-oriented architectures. It wraps the network connections between services, providing enhanced functionality along the way. With Spike you can isolate components from failures elsewhere, test tricky failure modes before a big launch, monitor the health and topology of your entire system, and more.


Service-oriented architectures are meant to make development and testing easier. The idea is that when a large application is separated into a number of services or micro-services, each component can be developed at its own pace, and deployed and bug-fixed independently. So long as the API contracts between components are agreed upon, a SOA is supposed to help teams move faster.

But in practice, service-oriented architectures introduce a number of problems that are very difficult to get away from, even with proper API contracts.

  • Failures have a ripple effect. Systems downstream will receive bad (or no) data, and be unable to continue. This affects production systems, and will block teams during development.

  • Changes to the system (features and enhancements, bug fixes, etc.) have to be coordinated among several teams. Even when all teams are in agreement, there is still an ordering and scheduling issue to deal with.

  • Teams have direct control over only their own services. If the services are stacked two or more deep, then a team usually has little or no direct communication with the remote groups.

  • Because the overall architecture is distributed between teams and components, no one person understands the system end to end. It becomes easy to miss large problems.

Ideally, we could have the flexibility and advantages of services, but avoid the problems above. Spike solves this by adding functionality at a place which is often overlooked: the network connections themselves.

Fixing the connections

Spike is a distributed network proxy which intercepts all service calls between components. Because it sees all communication between systems, it can:

  • Simulate failure and network problems without disrupting production.
  • Capture all service data for later playback and debugging.
  • Monitor performance stats (latency, bandwidth, etc.) of all your service calls.
  • Show you exactly what your service architecture looks like, and spot design or performance problems at the macro level.
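
To make the idea concrete, here is a minimal in-process sketch of what a proxy on a service connection can do: inject failures, capture responses for playback, and record latency. It is not Spike's API or implementation. The names CallProxy, InjectedFailure, failure_rate, and lookup_user are hypothetical, chosen only for illustration, and a real deployment would sit on the network rather than wrap a Python function.

```python
import random
import time
from dataclasses import dataclass, field
from typing import Any, Callable, List, Tuple


class InjectedFailure(Exception):
    """Raised in place of a real call when a simulated failure fires."""


@dataclass
class CallProxy:
    """Hypothetical stand-in for a proxy wrapped around one service connection."""
    target: Callable[..., Any]      # the real service call being wrapped
    failure_rate: float = 0.0       # fraction of calls to fail on purpose (0.0 to 1.0)
    rng: random.Random = field(default_factory=random.Random)
    captured: List[Tuple[tuple, Any]] = field(default_factory=list)  # (args, response)
    latencies: List[float] = field(default_factory=list)  # seconds per successful call

    def call(self, *args: Any) -> Any:
        # Simulate a failure before the real service is ever contacted.
        if self.rng.random() < self.failure_rate:
            raise InjectedFailure("simulated failure")
        start = time.perf_counter()
        response = self.target(*args)
        self.latencies.append(time.perf_counter() - start)
        # Keep the exchange so it can be replayed or inspected later.
        self.captured.append((args, response))
        return response

    def replay(self) -> List[Any]:
        # Return captured responses in order, without calling the service again.
        return [response for _, response in self.captured]


def lookup_user(user_id: int) -> dict:
    """Assumed example service call; stands in for a remote request."""
    return {"id": user_id, "name": "user-%d" % user_id}


if __name__ == "__main__":
    proxy = CallProxy(target=lookup_user)
    proxy.call(1)
    proxy.call(2)
    print(len(proxy.latencies), "calls recorded")
    print(proxy.replay())
```

Setting failure_rate to 1.0 on a separate CallProxy makes every call raise InjectedFailure, which is one way a caller's error handling could be exercised without disturbing the real service.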

See more in the Features section.

This is a sample Spike deployment. The Spike agent is installed on each machine which makes outgoing service calls. Spike scales easily, and there is no single point of failure.