HTML Emails Inserting Spaces in Odd Locations

About two weeks ago, my team received a report of a problem in one of our system-generated emails. A handful of words in the longer paragraphs were being split in two.

For example, in one long paragraph (200 words or so), the word “condition” was split into “condit ion”. It was a strange problem, but one related to a previously discovered limitation in the venerable yet pervasive sendmail program, which we used for delivering the emails.

The Challenge

Sendmail splits long lines after the 998th character by inserting a line break (like hitting Return on your keyboard). In our case, the “t” in “condition” was the 998th character of its line, and because HTML renders a line break as whitespace, the word showed up as “condit ion”. Further muddying the water, we were dealing with escaped HTML, so a quote (") is actually represented as &quot;, and there were also tags, which are invisible to a human reader.
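To make the failure concrete, here is a minimal Ruby sketch of the effect. It is only a simulation under my own assumptions: the split_long_lines helper and the sample string are hypothetical, and this is not how sendmail is actually implemented. It simply breaks any line longer than 998 characters and shows where the break lands.

```ruby
# Hypothetical simulation of a mail relay that breaks any line longer than
# `limit` characters. This is NOT sendmail's real code, just the same effect.
LINE_LIMIT = 998

def split_long_lines(text, limit = LINE_LIMIT)
  text.each_line.map { |line|
    # Cut each line into chunks of at most `limit` characters.
    line.chomp.scan(/.{1,#{limit}}/).join("\n")
  }.join("\n")
end

# One long line in which the "t" of "condition" is the 998th character:
# 992 filler characters plus "condit" is exactly 998.
html_line = ("x" * 992) + "condition is met"

first, second = split_long_lines(html_line).split("\n")
puts first[-6..-1]  # prints "condit"
puts second[0, 3]   # prints "ion"
```

An HTML renderer treats that inserted line break as whitespace, which is why the reader sees “condit ion”.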

The Fumbling

I was aware of sendmail’s 998th-character issue, but didn’t know of a good workaround. I started chatting with Jaron, a fellow Notre Dame programmer and good friend of mine, about the problem.

Both of us had initially assumed that HTML emails simply worked, which clearly was not the case.

Important Sidebar

Instead of starting the chat by stating the root cause, I asked for help with my proposed solution: a regular expression to add a carriage return after every period, so long as the period wasn’t part of an attribute of an HTML tag.

Thankfully, I only spent four minutes going down that path before I stated the root cause: carriage returns were being injected into an HTML email, and they were breaking words.

When looking for help with a problem, don’t ask for help with your proposed solution. Instead, clearly state your understanding of the original problem, and only then describe the solution you have in mind for correcting it.

The Solution

After a lot of trial and error, we eventually settled on setting the Content-Transfer-Encoding of the email’s HTML part to base64 and encoding that part’s body in base64.

Below is our Rails 3.0.11 solution. It hasn’t been “cleaned up”, but it highlights the key takeaways:

# Rails.root/app/models/notifier.rb
class Notifier < ActionMailer::Base
  def general_email
    # important configuration stuff
    # setting @object for template access
    mail { |format|
      format.text
      # the encoding name must be a string; a bare base64 raises a NameError
      format.html(:content_transfer_encoding => 'base64')
    }.deliver
  end
end

# Rails.root/app/views/notifier/general_email.html.erb
<%= Base64.encode64(%(<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
   "http://www.w3.org/TR/html4/loose.dtd">

<html lang="en">
<head>
  <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
  <title>#{@object.subject}</title>
</head>
<body>
  #{ @object.body.html_safe }
</body>
</html>))%>
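
As a rough check of why this sidesteps the 998-character limit, the sketch below encodes a deliberately long single-line HTML string and measures the longest line of the result. The sample string and the max_encoded_line_length helper are hypothetical and not part of our application. Ruby’s Base64.encode64 adds a line feed after every 60 encoded characters, so no line of the encoded body comes anywhere near the limit, and the mail client decodes it back to the original HTML.

```ruby
require "base64"

# Hypothetical helper: length of the longest line in the Base64-encoded text.
def max_encoded_line_length(html)
  Base64.encode64(html).each_line.map { |line| line.chomp.length }.max
end

# Hypothetical stand-in for the rendered template: one very long line.
html = "<p>" + ("condition " * 300) + "</p>"

puts html.length                    # prints 3007, well past 998
puts max_encoded_line_length(html)  # prints 60

# Decoding restores the original markup byte for byte.
decoded = Base64.decode64(Base64.encode64(html)).force_encoding(html.encoding)
puts decoded == html                # prints true
```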
