The PipelineDeals web application recently celebrated its ninth birthday. It’s seen its fair share of developers, all of whom had their own idea of clean code. As a team, we’d been brainstorming ways to wrangle certain areas of the application. The question we’d frequently ask ourselves was, “How do we clean up _______ [some neglected feature of the application]?”
Reasonable solutions ended up being:
- Rewrite it
- Rewrite and put it elsewhere
In short, we chose to rewrite many of the hairy areas of the app into separate services communicating over HTTP. It’s been about a year since our first commit in a separate service, and we’ve learned quite a bit since then. This is part 1 in a series of posts related to our transition to microservices.
How we got here
This was us 18 months ago. PipelineDeals was a crufty Rails 2 application that many of us were scared to open. It’d been several years of adding feature upon feature without consistent knowledge, style, or guidance. And it’s probably not surprising we had what we did. Regardless, we needed to fix it.
One of our goals was to move to Rails 3, and later more updated versions, but in order to get there, we had to refactor (or remove) quite a bit of code to make the transition easier.
This, to me, was a huge factor in our decision to move to a more service-focused approach. At this year’s RailsConf keynote, DHH joked about the majestic monolith and how many companies split out services too early, only to suffer later when they realize it was a premature optimization.
The same could be said for our move. Instead of spinning out separate services, we could have cleaned up the mess we had by refactoring every nasty piece of our app. We could have turned our ugly monolith into a majestic one. But while that would’ve been possible, our team agreed we were better served by more or less starting over. Not in the big-bang rewrite sense, but by standing up brand new service apps when we added new features, and when it made sense. “Made sense” is the key phrase here. There have been many times over the past 12 months when it didn’t make sense. But we’re learning and getting better at identifying the things that are good candidates for a more isolated service.
Do we wait for the next requested feature or what?
At one of our weekly team hangouts, we watched a talk focused on starting by isolating the responsibility of email. It was the perfect introduction and motivation for us to get a small win and some experience under our belts. Before that, for some reason, we didn’t have a great sense of how to start making the transition.
The idea was to take our emails (and there were plenty) and move them to a separate Rails app whose only responsibility is sending email. While it sounds trivial, the idea alone raises a lot of interesting questions: What do we do with those really nasty emails that have 30 instance variables? What do we do if the email service is down? How do we trigger an email to be sent?
We created a new Rails 4 app, removed all the stuff we didn’t need and created a golden shrine where emails could flourish…but seriously, that’s all it did. And it did it really well.
The next question was how to send emails from the main application. We’re very happy Sidekiq Pro users, and one of the benefits we love about Sidekiq is the built-in retries. This gives us a layer of reliability outside of the code layer. So rather than build some ad-hoc retry mechanism by keeping a counter in Ruby and rescuing failures up to a certain number of attempts, we shoot off a job. If it fails because the network is down, or the endpoint isn’t available, the job will retry soon after and continue down the happy path. Sidekiq retries are a recurring theme in our infrastructure. We’ve made a number of decisions around the fact that we already have this capability built in, and we might as well take advantage of it. More on that to come.
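For contrast, the hand-rolled approach we wanted to avoid might look roughly like the sketch below. The with_retries helper is hypothetical and written only for illustration; it is not code from our app.

```ruby
# Hypothetical helper: run a block, retrying up to max_attempts times.
def with_retries(max_attempts: 3)
  attempts = 0
  begin
    attempts += 1
    yield attempts
  rescue StandardError
    # Try again until the attempt budget is used up, then re-raise.
    retry if attempts < max_attempts
    raise
  end
end

# Simulate a service that fails twice before responding.
result = with_retries do |attempt|
  raise IOError, "service unavailable" if attempt < 3
  "sent on attempt #{attempt}"
end
puts result
```

A loop like this ties up the calling process while it retries and forgets everything if that process dies. A Sidekiq job is stored in Redis and retried later, which is the reliability we get outside of the code layer.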
The de facto communication method between services is HTTP, and we did nothing different. Our services use JSON payloads to exchange data, which lets us easily take advantage of Sidekiq on both ends.
So now, rather than invoking a built-in Rails mailer directly, we invoke a PORO to send off the communication:

    Email.to current_user, :user_welcome
There are a number of use-case-specific variables involved, but the email_key is probably the most important. We use it to describe which email should be invoked on the service.
In the above example, we triggered the welcome email on the UserMailer class. We translated this request into an email key of user_welcome. This key then gets interpreted by the Email service app and turned into an actual Mailer class and a method within it. We could have done this in a variety of ways, but we split the string on the service side at the _: the first element describes the mailer, the rest the method. So in this case, it gets interpreted as the welcome method on UserMailer.
One thing this pattern allowed us to do was copy and paste the old mailer methods almost verbatim into the new Email service application.
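The key-splitting convention described above can be sketched in a few lines of plain Ruby. This is a minimal illustration, not our service code: resolve_email_key is a hypothetical name, and the UserMailer here is a stand-in that returns a string instead of building a real message.

```ruby
# Hypothetical stand-in for a mailer copied over from the main app.
class UserMailer
  def self.welcome(user_name)
    "Welcome, #{user_name}!"
  end
end

# Split an email key such as "user_welcome" at the first underscore:
# the first element names the mailer, the rest names the method.
def resolve_email_key(email_key)
  mailer_part, method_part = email_key.to_s.split("_", 2)
  if mailer_part.to_s.empty? || method_part.to_s.empty?
    raise ArgumentError, "malformed email key: #{email_key.inspect}"
  end

  # "user" becomes the constant UserMailer; a NameError means no such mailer.
  mailer_class = Object.const_get("#{mailer_part.capitalize}Mailer")
  [mailer_class, method_part.to_sym]
end

mailer, action = resolve_email_key(:user_welcome)
puts mailer.public_send(action, "Ada")
```

Because only the first underscore is significant, a key like user_password_reset would map to a password_reset method on the same mailer. A real service would probably also want to restrict lookups to a known list of mailers rather than resolving arbitrary constants.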
Failures, failures, failures
“What if the service is down?” you say. “The email request will fail!” Sure will.
So let’s wrap that request in a Sidekiq job to take advantage of the built-in retries.
Rather than making the following call directly in the Email object:

    RestClient.post(ENV["PIPELINE_EMAIL_URL"], json, :content_type => :json)

we’ll change the queue_email method to shoot off a Sidekiq job instead, and let the job make the HTTP request.
There we have it. Network-proof email requests!
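A minimal sketch of what such a job might look like follows. The worker name EmailRequestWorker and the injectable http_post hook are assumptions made for illustration, and the sketch swaps RestClient for Ruby’s standard Net::HTTP so it loads without extra gems. The defined? guard exists only so the example runs where Sidekiq is not installed; in a real app the include would be unconditional.

```ruby
require "json"
require "net/http"
require "uri"

# Hypothetical worker that posts the email request to the Email service.
class EmailRequestWorker
  include Sidekiq::Worker if defined?(Sidekiq::Worker)

  class << self
    # Injectable HTTP call so the sketch can be exercised without a network.
    attr_writer :http_post

    def http_post
      @http_post ||= lambda do |url, body|
        Net::HTTP.post(URI(url), body, "Content-Type" => "application/json")
      end
    end
  end

  # Sidekiq retries a job whenever perform raises, so a non-2xx response
  # (or a network error raised by Net::HTTP) leads to a later retry.
  def perform(json)
    response = self.class.http_post.call(ENV.fetch("PIPELINE_EMAIL_URL"), json)
    return if response.is_a?(Net::HTTPSuccess)

    raise "email service responded with #{response.code}"
  end
end

# With Sidekiq loaded, queue_email would enqueue instead of posting:
#   EmailRequestWorker.perform_async(json)
```

The important design point is that the job raises on failure instead of swallowing the error, since an exception is what tells Sidekiq to schedule a retry.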
Not so fast…
Astute readers will probably recognize that the service-side network communication can potentially also fail. This is becoming a pattern, huh? More communication, more potential for failure and more potential headaches.
On the service side, we have a controller that takes in the request for the email and immediately serializes it to a Sidekiq job.
Because we immediately serialize the job to Sidekiq, we’ve successfully acknowledged the job was received, and the main app’s Sidekiq job completes successfully. Now the email service can move on to doing the heavy lifting in whatever way makes the most sense. In our case, we use Mailgun to send our emails, so the EmailWorker Sidekiq job invokes a new mailer based on the email_key param and sends it off to Mailgun for transport. And because it’s wrapped in a Sidekiq job, we can sleep well knowing that the Mailgun request can fail and the job will retry until it goes through.
Service communication is definitely not for the faint of heart, and as a team we can completely appreciate the challenges that come with keeping services in sync, especially having stood up about 8 new services in the last 12 months.
Sidekiq has been the queueing solution we’ve leaned on to keep communication in sync and reliable. We’ve also written a few internal tools that piggyback off Sidekiq that we’re really excited to share with the community in the near future.
Part II in this series will discuss the methods of communication to consider when implementing a service-based architecture.