Conditionals to polymorphism and the Chain of Responsibility pattern for Refactoring

Tue May 05 2015

Consider the following situation: we want to expand our customer base through a new online campaign with Google (let's call it the A-Campaign). When a user is interested in our advertisement and clicks on the promo, the link takes them to a landing page where they can fill in a form. This simple form sends an email to a designated account.

Next, we need to process that email to make it useful for sales. For this we have a console application that checks the account for new emails and stores the relevant information in our CRM. The logic is something like this:


class Program
{
  static void Main(string[] args)
  {
    // Get emails from account
    List<string> emails = GetNewMails();

    for (int i = 0; i < emails.Count; i++)
    {
      // Get relevant data from email, in the form <Key, Value>
      Dictionary<string, string> dataFromEmail = GetData(emails[i]);

      // If the email contains data, store it on CRM
      if(dataFromEmail != null)
        SaveData(dataFromEmail);

      DeleteEmail(i); 
    }
  }
}
  

The program is nothing but procedural code: the Main() method gets the data, processes it and then saves it in the CRM.

Let us focus on the GetData(string email) method. As you can see, it receives a plain string (the full source of an email), extracts the relevant information and returns it. Our logic is something like this:


private static Dictionary<string, string> GetData(string email)
{
  Dictionary<string, string> emailReturn = null;

  if (Subject(email).Contains("A-Campaign"))
  {
    // Our implementation uses a Regular expression to get the data, based on a defined structure.
    // Here we create the dictionary and add all the information in (key, value) pair.
  }
  return emailReturn;
}
  

More or less, this method returns an object with all the relevant information, or null when the email does not belong to the campaign.

Now, suppose our company starts two new campaigns (B-Campaign and C-Campaign), each with a different email and data structure. We could extend our method like this:
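To make the commented-out branch more concrete, here is a minimal sketch of what the regular-expression step might look like. The email layout (one "Key: Value" pair per line), the class name ACampaignSketch and the method name ParseBody are assumptions made for illustration; the real A-Campaign structure is not shown in this article.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class ACampaignSketch
{
    // Assumed layout (hypothetical): one "Key: Value" pair per line.
    private static readonly Regex PairPattern =
        new Regex(@"^(?<key>[A-Za-z ]+):\s*(?<value>.+?)\s*$", RegexOptions.Multiline);

    internal static Dictionary<string, string> ParseBody(string body)
    {
        var data = new Dictionary<string, string>();

        // Every matching line becomes one (key, value) entry
        foreach (Match match in PairPattern.Matches(body))
        {
            data[match.Groups["key"].Value.Trim()] = match.Groups["value"].Value;
        }

        // Same convention as GetData(): null means "nothing useful found"
        return data.Count > 0 ? data : null;
    }
}
```

Inside the A-Campaign branch, the result of a call such as ACampaignSketch.ParseBody(email) could then be assigned to emailReturn.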


  private static Dictionary<string, string> GetData(string email)
  {
    Dictionary<string, string> emailReturn = null;
    if (Subject(email).Contains("A-Campaign"))
    {
      // Our implementation uses a Regular expression to get the data, based on a defined structure.
      // Here we create the dictionary and add all the information in (key, value) pair.
    }
    else if(Subject(email) == "B-Campaign")
    {
      // We use another regular expression, or perhaps another logic to get the information
      // All the data is stored in the emailReturn object.
    }
    else if(Subject(email) == "C-Campaign")
    {
      // ... the same idea
    }    
    return emailReturn;
  }  
  

We are starting to have a code smell: a section of code with a weakness in its design, even though it is technically correct because it fulfills its purpose. Let's start refactoring our code.

Replacing conditionals

When we start to see sections of code separated by if or switch statements in which all the branches look quite similar, it's better to refactor by replacing the conditional with polymorphism. That way we can organize and encapsulate each section of the code. The procedure is straightforward:

  1. Create a new abstract class with an abstract method. The name of the method can be the same as the method you're refactoring.
  2. For each branch in the conditional, create a new class that inherits from the base class in point 1. Move the logic of the branch to the method that must be overridden.
  3. In the Main() program, instead of calling the procedural method, work with the abstraction.
  4. At some point you need to get the right class that implements the abstraction, based on a certain condition (in this case, the subject of the email).

class Program
{
  static void Main(string[] args)
  {
    // We declare the abstraction
    Parser.EmailParser parser;

    // Get emails from account
    List<string> emails = GetNewMails();

    for (int i = 0; i < emails.Count; i++)
    {
      // Here we call a method that returns the implementation class
      parser = Parser.EmailParser.GetParser(emails[i]);

      // Get relevant data from email, in the form <Key, Value>
      Dictionary<string, string> dataFromEmail = parser.GetData(emails[i]);

      // ... the rest of the code
    }
  }
}  

abstract class EmailParser
{
  internal static EmailParser GetParser(string email)
  {
    // Returns the implementation class based on email subject.
  }

  // Abstract method. All the derived classes must override this method, 
  // each one with its own implementation.
  internal abstract Dictionary<string, string> GetData(string email);  
}

// ACampaignParser, BCampaignParser and CCampaignParser share the same
// structure. Each one has its own implementation, based on the 
// original branch
class ACampaignParser: EmailParser
{
  internal override Dictionary<string, string> GetData(string email)
  {
    // The A-Campaign implementation
  }
}
  

The modifications in the code are very subtle:

  1. We have an abstraction of type Parser.EmailParser.
  2. Instead of calling the GetData() method directly as before, we first get the class that inherits from Parser.EmailParser, based on the email subject.
  3. Once we have the parser, we retrieve the data using parser.GetData().

With this, we can have as many classes derived from EmailParser as needed.
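The body of GetParser() is left as a comment in the listing above. One way it could look is sketched below. The Subject() helper, the placeholder return values and the NullParser fallback are assumptions for illustration, not the code of the sample project, and only two campaigns are shown to keep it short.

```csharp
using System;
using System.Collections.Generic;

abstract class EmailParser
{
    // Hypothetical helper: reads the "Subject:" header line from the raw email source
    protected static string Subject(string email)
    {
        foreach (string line in email.Split('\n'))
        {
            if (line.StartsWith("Subject:", StringComparison.OrdinalIgnoreCase))
                return line.Substring("Subject:".Length).Trim();
        }
        return string.Empty;
    }

    // The only place that still knows how subjects map to parsers
    internal static EmailParser GetParser(string email)
    {
        string subject = Subject(email);
        if (subject.Contains("A-Campaign")) return new ACampaignParser();
        if (subject.Contains("B-Campaign")) return new BCampaignParser();

        // Assumed fallback so callers never receive a null parser
        return new NullParser();
    }

    internal abstract Dictionary<string, string> GetData(string email);
}

class ACampaignParser : EmailParser
{
    internal override Dictionary<string, string> GetData(string email)
    {
        // Placeholder for the A-Campaign parsing logic
        return new Dictionary<string, string> { { "Campaign", "A" } };
    }
}

class BCampaignParser : EmailParser
{
    internal override Dictionary<string, string> GetData(string email)
    {
        // Placeholder for the B-Campaign parsing logic
        return new Dictionary<string, string> { { "Campaign", "B" } };
    }
}

class NullParser : EmailParser
{
    // Unknown emails yield no data, which Main() already treats as "skip"
    internal override Dictionary<string, string> GetData(string email) => null;
}
```

The conditional has not disappeared, but it now lives in one small factory method, and each branch only picks a class instead of holding the parsing logic.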

This refactoring has several advantages over our first implementation.

The first is that we reduce the size of the GetData() method. Very large methods, measured in LOC (lines of code), are not recommended: they are difficult to maintain and to extend without making mistakes. Have you ever seen a method with, for example, 2 KLOC? I have, and believe me, it's no pleasure to make changes to one. In practice you're coding and praying not to break anything else. You feel like this:

[Image: BombSuit]

Another advantage is that each implementation is completely isolated. If one of the parsers has an error, you can locate the problem and correct it without touching the rest of the code. Thanks to this isolation, you can use TDD to verify each implementation on its own. Also, when a new campaign appears, we only have to create a new class that derives from our base class and decide, at a single point, which parser to use.

Chain of Responsibility

Although our refactoring is complete, I would like to show a minor variation of it, using the Chain of Responsibility pattern.

The Chain of Responsibility is a behavioral pattern that passes a request along a chain of objects, in sequential order. Each object checks whether it is responsible for the request, based on business logic or some other condition. If not, it passes the call to its successor in the chain. The request keeps moving along the chain until one of the objects responds to it.

It works like a relay race, in which each runner receives the message and passes it to the next. The difference lies in the stop condition: a relay race ends when the last runner crosses the finish line, whereas the chain stops as soon as one of the objects responds to the message.

[Image: ChainOfResponsibility diagram]

From the image we can see:

  • We have a Client object that requests something.
  • We have a Handler interface or abstract class that declares the handleRequest() method (this method handles the request from the client). It also has a successor variable of type Handler.
  • We have ConcreteHandler1 and ConcreteHandler2. If Handler is an interface, then both implement it. If Handler is an abstract class, then both classes inherit from it.
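The structure described above can be sketched in a few lines of C#. This is a generic illustration only: the integer request, the thresholds and the returned strings are invented for the example, and Handler is written as an abstract class rather than an interface.

```csharp
using System;

abstract class Handler
{
    // The next handler in the chain (null at the end of the chain)
    protected Handler Successor;

    public void SetSuccessor(Handler successor)
    {
        Successor = successor;
    }

    // Default behaviour: forward the request, or give up (null) if nobody is left
    public virtual string HandleRequest(int request)
    {
        return Successor != null ? Successor.HandleRequest(request) : null;
    }
}

class ConcreteHandler1 : Handler
{
    public override string HandleRequest(int request)
    {
        // Invented rule: this handler is responsible for small values
        if (request < 10) return "ConcreteHandler1 handled " + request;
        return base.HandleRequest(request);
    }
}

class ConcreteHandler2 : Handler
{
    public override string HandleRequest(int request)
    {
        // Invented rule: this handler is responsible for medium values
        if (request < 100) return "ConcreteHandler2 handled " + request;
        return base.HandleRequest(request);
    }
}
```

A client would build the chain once (create both handlers, then call SetSuccessor on the first one) and always talk to the first handler; it never needs to know which object ends up answering.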

We can apply the Chain of Responsibility to our code by changing the way we reach the implementations: instead of asking for a specific class based on the subject, we just send the message to one handler. If the handler can process the message, it returns the data. If not, it passes the message to the next handler in the chain:


class Program
{
  static void Main(string[] args)
  {
    // We only need one parser, which contains the complete chain
    Parser.EmailParser parser = Parser.EmailParser.GetParser();

    // Get emails from account
    List<string> emails = GetNewMails();

    for (int i = 0; i < emails.Count; i++)
    {
      // Get relevant data from email, in the form <Key, Value>
      Dictionary<string, string> dataFromEmail = parser.GetEmailData(emails[i]);

      // ... the rest of the code
    }
  }
}

abstract class EmailParser
{
  // This is the successor in the chain
  private EmailParser _successor;

  // Here we set the successor
  private void SetSuccessor(EmailParser successor)
  {
    _successor = successor;
  }

  // This method returns the complete chain
  internal static EmailParser GetParser()
  {
    EmailParser aCampaignParser = new ACampaignEmailParser();
    EmailParser bCampaignParser = new BCampaignEmailParser();
    EmailParser cCampaignParser = new CCampaignEmailParser();
    EmailParser nullCampaignParser = new NullEmailParser();

    aCampaignParser.SetSuccessor(bCampaignParser);
    bCampaignParser.SetSuccessor(cCampaignParser);
    cCampaignParser.SetSuccessor(nullCampaignParser);

    return aCampaignParser;
  }

  // Here is the logic of the chain:
  // If we CanProcess (based on subject) then we try to get the data
  // If the data is null and we have a successor, we call it.
  internal Dictionary<string, string> GetEmailData(string email)
  {
    Dictionary<string, string> data = null;

    if(this.CanProcess(email))
      data = this.GetData(email);

    if (data == null && _successor != null)
    {
      data = _successor.GetEmailData(email);
    }
    return data;
  }

  // Abstract methods for the execution itself: 
  // GetData() extracts the data from the email
  // CanProcess() returns true if the class can process the email, based on Subject.
  protected abstract Dictionary<string, string> GetData(string email);
  protected abstract bool CanProcess(string email);
}
  

With the Chain of Responsibility we only need to link the successors and control the execution of the chain. GetEmailData() is the handler method through which the client (our console application) talks to the chain. It controls the flow of execution by calling methods implemented in the derived classes. This control flow is an example of the Template Method pattern, in which a method defines the skeleton of an algorithm and leaves some steps to subclasses.
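The concrete parsers are not listed in this article, so here is a minimal sketch of how two links of the chain might look. The base class is trimmed to what the example needs. The Subject() helper, the placeholder data and the behaviour of NullEmailParser (claiming every email and returning an empty dictionary) are assumptions for illustration; the sample project may do this differently.

```csharp
using System;
using System.Collections.Generic;

abstract class EmailParser
{
    private EmailParser _successor;

    // Shortened chain for the sketch: A-Campaign first, then the fallback
    internal static EmailParser GetParser()
    {
        EmailParser head = new ACampaignEmailParser();
        head._successor = new NullEmailParser();
        return head;
    }

    // Template method: same flow as in the article's listing
    internal Dictionary<string, string> GetEmailData(string email)
    {
        Dictionary<string, string> data = null;
        if (CanProcess(email)) data = GetData(email);
        if (data == null && _successor != null) data = _successor.GetEmailData(email);
        return data;
    }

    // Hypothetical helper: reads the "Subject:" header line from the raw email source
    protected static string Subject(string email)
    {
        foreach (string line in email.Split('\n'))
        {
            if (line.StartsWith("Subject:", StringComparison.OrdinalIgnoreCase))
                return line.Substring("Subject:".Length).Trim();
        }
        return string.Empty;
    }

    protected abstract Dictionary<string, string> GetData(string email);
    protected abstract bool CanProcess(string email);
}

class ACampaignEmailParser : EmailParser
{
    protected override bool CanProcess(string email) => Subject(email).Contains("A-Campaign");

    protected override Dictionary<string, string> GetData(string email)
    {
        // Placeholder for the real A-Campaign extraction logic
        return new Dictionary<string, string> { { "Campaign", "A-Campaign" } };
    }
}

// Assumed end of the chain: accepts any email and returns an empty result
class NullEmailParser : EmailParser
{
    protected override bool CanProcess(string email) => true;

    protected override Dictionary<string, string> GetData(string email)
    {
        return new Dictionary<string, string>();
    }
}
```

With this fallback the client always gets a dictionary back. If unrecognized emails should be skipped instead, NullEmailParser could return null and the caller would keep its null check.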

Sample code

You can see an example for this article in my GitHub account. The solution contains three console applications with the same purpose: to read emails and process the data.

Solution

  • ProceduralImplementation is the first application, in which all the logic is in the same method (GetData()).
  • ConditionalsToPolymorphism is the first refactoring. In this code we create an abstract class (EmailParser) and its derived classes ((A|B|C|Null)CampaignEmailParser). The parser is created in the GetParser() method of the abstract class.
  • ConditionalsToPolymorphismWithChain is the refactoring with Chain of Responsibility. The only difference from ConditionalsToPolymorphism is the handler and the way we process the data: instead of first getting the responsible class based on the subject, we just send the mail to the head parser in the chain. The handler relays the message to its successor if it cannot process it.

Final thoughts

Keep in mind that it's a waste of time to apply patterns to every piece of code. Likewise, SOLID principles are an important part of good coding practice, but abusing and overusing them is counterproductive. Sure, you can decouple as much as possible and end up with textbook-perfect code, but at what cost?

If we can extend our code in a safe and practical way, then our design is correct. If, on the other hand, our elaborate extension mechanism jeopardizes the software by making it impractical to modify, the design has failed. For that reason you must weigh the pros and cons of your design decisions.

In the sample, the sequence of events was:

  • At first we only processed one campaign. Our simple code was enough.
  • Months later we needed to incorporate another campaign. OK, we can still handle a simple if/else if.
  • Then the message from the company was that we needed to be prepared for multiple campaigns. At this point we cannot continue with the same conditional logic.

Why go through all of this? Because it's important to understand the how, the why and the when of design patterns. If you are a team leader, it's a good idea to coach your team in these practices. Experience and a lot of trial and error will help you identify when to apply what.

Greetings!
