Refactoring your codes

Thu Feb 05 2015

Have you ever seen a code from another developer (or a group of them) where you have a method that makes a lot of stuffs? For example, a single method that consists of 0.5-1 KLOC? That method was made by you a few years ago, or you contributed on it? Or maybe you have two methods that differ in just a tiny lines of code, but the main structure is the same? (a despicable copy and paste).

You can see a lot of cases like these on legacy codes. The most years the system has, the more different developers/hands work on them and the more smell codes you can find. It's inherent to the nature of the systems: as long as more developers starts and finishes to work on these systems, you can find different hands and ways of making things. Perhaps some of them use more structured codes and tries to apply decoupling, separation of concerns, OO designs, etc., but other simply just start to patch and see if the fix corrects the issue, perhaps breaking some company standards by traversing directly to dependencies instead of interfaces, etc. Once the code is broke the design is more compromised. It's like a crack in a pool: if you don't fix it quickly the fissure can drain it entirely.

Of course, you cannot just throw away years of efforts just because you think the code has a lot of drawbacks, or maybe because you are the supreme god that understand how things need to be done. You must live with legacy code and the history behind it.

Suppose we have to fix certain part of the system that you identified, or maybe add some needed functionality to a method from a class that does not have a good OO design: it's just a bunch of procedural methods under an object oriented framework. Take, for example, this case (based on a real problem I had to fix):

public void MethodWithBug(a lot of parameters, bool isPayCheck)
    MethodWithPayCheck(a lot of parameters);
    MethodWithoutPayCheck(a lot of parameters);

private void MethodWithPayCheck(a lot of parameters)
  // ... a lot of logic

private void MethodWithoutPayCheck(a lot of parameters)
  // ... a lot of logic, almost the same from MethodWithPayCheck

The methods MethodWithPayCheck and MethodWithoutPayCheck were practically the same. They differ just in a few lines of code. I suppose the reason to make this separation was that conceptually you have two different logics. But, roughly speaking, there was just one implementation with a single diferentiation, based on the isPayCheck variable.

The big problem here is that when you need to add/change/fix the logic of the business logic you have to make additional efforts in order to maintain everything in order. This means that you need to maintain both methods in sync on each correction. If, for some reason, nobody remembered that you need to fix two methods instead of one, then at some point you have two different methods, that maybe can never be joined. Remember that the fixes can be executed at different stages (in different years also) and with different developers.

So, it's always a must for us to think proactively when we need to change or add functionalities to our codebase.

When to refactor

It's desirable that you always try to refactor the code in which you are going to make some changes. Especially when you see code smells. Of course, time and budget are a main factor in the development process that restricts a little our refactoring. It's obvious: we need to deliver value to our customers. You have to keep in mind that, trying to see when it's propitious to refactor. The rule of thumb, however, must be: if you cannot refactor, or if it's not necessary, at least be cautious to not introduce a bug or bad design in your modifications. That is:

  • Consider SOLID principles in your codes.
  • If possible, try to test your modification.
  • Never duplicate code. That only add costs to the maintenance process.

There is a lot of tips and good practices on refactoring. The important to note is where we want to go. A good horizont might be to have a system that follows SOLID principles. For example, a god object is a class that knows or does too much. It's a bottleneck in our development process when we know it has a lot of dependencies and that makes a lot of things.

Refactor as an add-on

So, you have to modify a method that has a bug (or you want to add a functionality), you think it needs to be refactored first but this action is not considered in the project schedule. The refactoring has to be secure and under no circumstances it has to add new bugs. One of the reasons why you see a lot of duplicated code is the fear or lost of confidence in minor modifications that can affect the system. Especially when you do not have a good testing policy.

In this case the most secure and almost free of errors is the extract method refactoring.

Extraction method

In this technique, basically you select a group of line codes and extract them into another method. For example, under Visual Studio:

RefactorExtract1 RefactorExtract2 RefactorExtract3

The extract method takes the required parameters and passes them to the new method. It's "almost" safe and the tools at present has good extraction techniques. About the overhead: I'm not really sure how much you lose in efficiency between a big unique code and a lot of mini calls, because of the overhead in value object and pointer references. Maybe you lose something, but the gain is much more because you can have a code that's easy to read and ready to be extended.

If you have a god method or class that's too big it's always a good idea to separate the logic in short methods that have a common meaning. That way your god method can be reduced to short method calls.

Be careful to be too confident with the extraction method. It can always make assumptions on the parameters and the logic you're encapsulating, so it's not a perfect tool. There are sometimes when the extraction send all the parameters to the new method, when it's not completely necessary.

What to extract: branching

Don't be too ambitious on the refactoring, especially when there is no assigned time for that. If you contribute with a little grain of sand it's better than nothing. Remember that.

If a method has a lot of branch logic, try to find common logic and separate them into methods. Another technique is to go from inner to outer branches. For example:

RefactorInnerOuter Inner to outer branch


What to extract: cyclomatic complexity

Cyclomatic complexity is a metric used to indicate the complexity of a program. It has a formal definition that's beyond this article, so for the sake of simplicity we can use it as a counter, considering:

  • It is calculated by developing a Control Flow Graph of the code that measures the number of linearly-independent paths through a program module (reference).
  • Complexity is determined by the number of decision points in a method. Decision points: if, else, switch, loops.

For example:

void CyclomaticCount(int a, int b, int c) // +1 for method entry
  if(a == 10)  // +1
    if(b > c)  // +1
      a = b;   
    else       // +1
      a = c;

// Cyclomatic complexity: 3

Visual Studio has an analysis tool that can measure cyclomatic complexity (Analyze/Calculate Code Metrics):


In this case you can see a number that has a score of 18. If we extract some lines of code from the method into another (and especially if we refactor various repeated lines of code in just one method), we can "balance" the complexity of the codes.

What's the magic number? The literature define certain ranges for the cyclomatic complexity:

  • 1 - 10 is considered normal.
  • 11 - 20 is considered moderate.
  • 21 - 50 is considered risky.
  • more than 50 is considered unstable.

If you have a method that has a score of 20 or more, then it's a good idea to think it as a target for refactoring.

Refactor as a project

When your company dedicates time for refactoring, you have more resources and time. In this case you can be more aggressive in the code modifications. The extract method is useful, but maybe you want to go ahead and do some optimizations on your code.

Based on Test Driven Development, or red/green/refactor, you always has to consider two important issues on refactoring:

  • The objective of the method/class has to be the same, before and after refactoring.
  • You can't break anything.

For both directives you need to work with care. In this case, the most confident way to make refactoring is to first cover your code based on unit testing. With this, you can:

  • Create unit tests to reply what your code does now.
  • Once assured your method is covered entirely by the unit test, refactor it.
  • Assure the tests have the same behaviour as before the refactoring.

If you make test coverage you can be more confident on the modifications you're about to realize. It's a more safe way to refactor, instead of the more easy but simpler extraction method.

How can we cover and unit test our codes if we maybe have methods with a lot of dependencies? Well, we have techniques for isolation that's very used in the TDD world:

  • Interface segregation.
  • Extraction to virtual methods or abstract classes.

Consider the following code that updates the debt for the clientes:

public void UpdateDebts()
  // Access to data layer
  DAL.Clients dalClients = new DAL.Clients();
  DAL.Payments dalPayments = new DAL.Payments();
  DAL.Debts dalDebts = new DAL.Debts();

  IEnumerable<Domain.Client> clients = dalClient.GetClients();
  IEnumerable<Domain.Payment> payments = dalPayments.GetPayments();
  IEnumerable<Domain.Debt> debts = dalDebts.GetDebts();

  foreach(Domain.Client client in clients)
    double totalPayment = 0;
    double totalDebt = 0;
    foreach(Domain.Payment payment in payments)
      if(payment.ClientId == client.Id)
        totalPayment += payment.Amount;
    foreach(Domain.Debt debt in debts)
      if(debt.ClientId == client.Id)
        totalDebt += debt.Amount;
    client.TotalDebt = (totalDebt - totalPayment);

It's a typical code based on table design*. Also, this method depends heavily on other resources (the Data Access Layer).

* All the entities are identified by their table Id (PK and Foreign key). I'm not saying that it's the best code on the Earth. In fact, it uses OO just because the language exists, but finally is nothing but a data algorithm. Badly designed, but it's what you will face in a lot of projects.

If we want to refactor this piece of code, the first thing is to create our unit tests that covers the algorithm. We have a strong dependency on DAL objects. As you can see, the method updates the debt for each client, so the data from external sources is not responsibility of the method. We can decouple it. Let's see how.

Interface segregation

The idea is very straightforward: use an Interface instead of the class implementation. The interfaces can be injected in the constructor, or passed as arguments. With this we can mock the data access in our unit tests:

public interface IClients { }
public interface IPayments { }
public interface IDebts { }

// Constructor injection
public class Calculation
  public Calculation(IClients clientsData, IPayments paymentsData, IDebts debtsData)
    _clientsData = clientsData;
    _paymentsData = paymentsData;
    _debtsData = debtsData;

  public void UpdateDebts()
    IEnumerable<Domain.Client> clients = _clientsData.GetClients();
    IEnumerable<Domain.Payment> payments = _paymentsData.GetPayments();
    IEnumerable<Domain.Debt> debts = _debtsData.GetDebts();
    ... logic

// Parameter injection
public void UpdateDebts(IClients clientsData, IPayments paymentsData, IDebts debtsData)
    IEnumerable<Domain.Client> clients = clientsData.GetClients();
    IEnumerable<Domain.Payment> payments = paymentsData.GetPayments();
    IEnumerable<Domain.Debt> debts = debtsData.GetDebts();
    ... logic

Extraction to virtual methods or abstract classes

In this case we extract the logic of data acquisition in virtual or abstract methods. Then, our unit test class inherits from the class to test and then we overwrite the methods:

public class Calculation
  public void UpdateDebts()
    IEnumerable<Domain.Client> clients = GetClients();
    IEnumerable<Domain.Payment> payments = GetPayments();
    IEnumerable<Domain.Debt> debts = GetDebts();
    ... logic


  public abstract IEnumerable<Domain.Client> GetClients() { }
  public abstract IEnumerable<Domain.Payment> GetPayments() { }
  public abstract IEnumerable<Domain.Debt> GetDebts() { }

// Unit test
public class CalculationTest: Domain.Calculation
  public UpdateDebts_TestCoverage()

  // Here we override the Calculation abstract methods
  public override IEnumerable<Domain.Client> GetClients() { // the mock }
  public override IEnumerable<Domain.Payment> GetPayments() { // the mock }
  public override IEnumerable<Domain.Debt> GetDebts() { // the mock }

Further reading

There is a lot of literature in refactoring. Most of the books serve as a guide on how to handle it. My experience in software development shows me that, however refactoring is very useful and in certain aspects is needed, not much companies invest in it. Maybe there is some bias vision on customer value: our customers only sees the new functionalities, not the best optimization on existing ones. Good or bad, a software does not mean to live forever and certain projects have a short lifetime.

If you want to go deeper in refactoring, I suggest the excelent book from Michael Feathers Working Effectively with Legacy Code. It covers different situations and has useful information based on the author's experience.

Testing and Refactoring Legacy Code from Sandro Mancuso is an excellent and basic tutorial on decoupling legacy code for test coverage and refactoring. It's explained in Java, but the code is easy to understand.

I strongly recommend to read the article from Mariusz Sieraczkiewicz on infoq. He points four big steps towards refactoring, from code refactoring to architecture refactoring. Is a good resume of the flow of work that you might consider in order to evolve your system.


comments powered by Disqus