Ultimate Architecture Enforcement: Prevent Code Violations at Code-Commit Time

Author: Paulo Merson

If you’re a more pragmatic than avid reader, feel free to jump to the solution section of this post to read about my experience using Checkstyle and pre-commit hooks on Subversion to verify the conformance between code and architecture, and more.

Foremost, source code must address the functional requirements and should not have bugs. We usually verify these qualities through testing–absolutely important, but not the subject of this post. Quality source code also involves simple things such as following code conventions and programming best practices, as well as more sophisticated requirements such as modularity, low coupling, and extensibility. The latter characteristics are often achieved through a carefully crafted software architecture.

The Problem

When the software development and maintenance effort involves several programmers and spans months or years, a common phenomenon takes place: the source code exhibits an actual architecture that gradually diverges from the intended architecture. Possible reasons include:

  • The intended architecture is not properly communicated to everybody who’s writing code.
  • There’s high turnover in the team, and newcomers are not familiar with the architecture.
  • Developers are assigned maintenance tasks and take shortcuts in the code, disregarding the now-forgotten intended architecture.
  • Developers of large systems don’t see the big picture of the system.
  • New design decisions are made that modify the intended architecture. This evolution is natural and welcomed. However, it’s too expensive to refactor old code to comply with the new design. All of a sudden, the once-compliant old code doesn’t conform to the (new) intended architecture anymore.

Consequences

When the discrepancy between the intended architecture and the source code grows uncontrolled, maintainability is impaired. The introduction of non-conformant code dependencies (shortcuts) makes the code brittle, hard to understand and to maintain. But we lose more. Design decisions in the intended architecture are aimed at achieving certain qualities, such as reliability, security, modifiability, performance, portability, and interoperability. If the code at some point departs from the architecture, these qualities can be negatively affected.

Addressing the Problem

To keep the source code compliant with the software architecture over time, there are two main things to do: one is to properly communicate the architecture to the developers, and the other is to actively check that the source code follows the intended design. The first thing is essential–we even wrote a book about it–but is not the subject of this post. The second thing is architecture enforcement.

There are many tools and techniques that can be applied for architecture enforcement. The poor man’s solution is manual code review performed by an authority on the architecture being enforced. Actually, this solution is laborious and hence quite expensive–the “poor man” probably can’t afford it. More efficient alternatives include the use of static analysis tools. Some tools give you a reverse-engineered picture of the architecture found in the code, so that you can visually compare it with the intended architecture. Other tools can even contrast the actual architecture found in the code with the intended design automatically.

A Solution Using Checkstyle and an SVN Pre-Commit Hook

Checkstyle is a free open-source static-analysis tool for Java. Like other tools, out of the box it can check for code conventions and numerous programming best practices. Unlike most tools though, Checkstyle offers an API that allows the implementation of “customized checks.”

A check is a Java class that is called when Checkstyle is parsing a Java file. The check is given that file’s AST and can inspect each token and look for constructs that represent a violation of some sort. For example, let’s say your system uses data access objects (DAO) to access the database. DAO classes can be identified by the class name (e.g., “Dao” prefix), by extending an abstract Dao class, or by a specific annotation (e.g., “@Dao”, “@Repository”). Now suppose your architecture dictates that only code in the “service” layer can use DAO classes. Suppose the specific layer is identified by a package namespace. It’s now easy to create a Checkstyle check that will enforce that rule:

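import com.puppycrawl.tools.checkstyle.api.Check;
import com.puppycrawl.tools.checkstyle.api.DetailAST;
import com.puppycrawl.tools.checkstyle.api.FullIdent;
import com.puppycrawl.tools.checkstyle.api.TokenTypes;

/**
 * Classes prefixed by Dao can't be used
 * outside com.mycompany.mysystem.service.*
 */
// Note: newer Checkstyle releases name the base class AbstractCheck.
public class CheckNonServiceUsesDao extends Check {

  // True while the file being parsed belongs to the service layer.
  private boolean inServiceLayer;

  @Override
  public int[] getDefaultTokens() {
    return new int[] {TokenTypes.PACKAGE_DEF, TokenTypes.IDENT};
  }

  @Override
  public void beginTree(DetailAST aRootAST) {
    // Reset for each file, so that a file without a package
    // declaration does not inherit the previous file's state.
    inServiceLayer = false;
  }

  @Override
  public void visitToken(DetailAST aAST) {
    if (aAST.getType() == TokenTypes.PACKAGE_DEF) {
      inServiceLayer = false;
      String packageName = fullyQualifiedPackage(aAST);
      if (packageName != null &&
          packageName.startsWith("com.mycompany.mysystem.service")) {
        inServiceLayer = true;
      }
    } else if (aAST.getType() == TokenTypes.IDENT && !inServiceLayer) {
      if (aAST.getText().startsWith("Dao")) {
        log(aAST.getLineNo(),
          "Classes outside the service layer can't call Dao classes");
      }
    }
  }

  // Helper not shown in the original listing; this is one assumed way to
  // write it. The package name is the node just before the trailing ';'.
  private static String fullyQualifiedPackage(DetailAST aPackageDef) {
    DetailAST nameAST = aPackageDef.getLastChild().getPreviousSibling();
    return FullIdent.createFullIdent(nameAST).getText();
  }
}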

A Subversion (SVN) hook is a program that can be configured in the Subversion repository. The hook is automatically invoked when there is a commit operation. We created a pre-commit hook that simply runs a number of Checkstyle customized checks on the Java source files that are the subject of the commit operation. If any of the checks detects a violation, the commit operation fails and the user gets an error message indicating the source file, line number, and a description of the problem.
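To make the hook's job more concrete, here is a minimal sketch of two of its steps: picking the Java files out of a commit and turning findings into a rejection. It is not the hook we actually run. The class and method names are hypothetical, the Checkstyle invocation itself is left out, and the input is assumed to be the line-per-path output of `svnlook changed`, which a real hook would obtain for the transaction being committed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of a pre-commit hook's file selection and reporting. */
class PreCommitHookSketch {

    /** One finding: source file, line number, and description. */
    static final class Violation {
        final String file;
        final int line;
        final String message;

        Violation(String file, int line, String message) {
            this.file = file;
            this.line = line;
            this.message = message;
        }
    }

    /**
     * Selects added or updated Java files from `svnlook changed` output,
     * where each line holds status flags in the first columns and the
     * path from column 4 onward (for example "A   trunk/src/Foo.java").
     */
    static List<String> javaFilesToCheck(List<String> changedLines) {
        List<String> files = new ArrayList<>();
        for (String line : changedLines) {
            if (line.length() <= 4) {
                continue; // too short to hold a status and a path
            }
            char status = line.charAt(0);
            String path = line.substring(4);
            // Deletions and property-only changes have no new content to check.
            boolean contentChanged = status == 'A' || status == 'U';
            if (contentChanged && path.endsWith(".java")) {
                files.add(path);
            }
        }
        return files;
    }

    /** Formats the findings as the text a hook would write to stderr. */
    static String report(List<Violation> violations) {
        StringBuilder sb = new StringBuilder();
        for (Violation v : violations) {
            sb.append(v.file).append(':').append(v.line)
              .append(": ").append(v.message).append('\n');
        }
        return sb.toString();
    }

    /** A non-zero exit code makes Subversion reject the commit. */
    static int exitCode(List<Violation> violations) {
        return violations.isEmpty() ? 0 : 1;
    }

    public static void main(String[] args) {
        // Sample input standing in for real `svnlook changed` output.
        List<String> changed = Arrays.asList(
            "A   trunk/src/com/mycompany/mysystem/web/OrderPage.java",
            "U   trunk/build.xml",
            "D   trunk/src/com/mycompany/mysystem/web/OldPage.java");
        System.out.println(javaFilesToCheck(changed));

        // Findings would come from running the checks; this one is made up.
        List<Violation> found = new ArrayList<>();
        found.add(new Violation(
            "trunk/src/com/mycompany/mysystem/web/OrderPage.java", 42,
            "Classes outside the service layer can't call Dao classes"));
        System.err.print(report(found));
        // A real hook would finish with System.exit(exitCode(found)).
    }
}
```

Subversion passes whatever the hook writes to stderr back to the committing client when the exit code is non-zero, which is how the developer ends up seeing the file, line number, and description.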

Our Experience

There are two general approaches that architecture teams follow to try to enforce the architecture. One is to act as the “architecture police” to make sure developers are following the published architecture. The other is to mentor and work closely with developers to make sure they understand and naturally follow the architecture. You may be thinking that automated checks that bar source-code commits because of violations fall within the first approach. In our experience, it’s the opposite. To understand why, consider our setting:

  • Every new developer goes through quick training on the architecture.
  • The multi-view architecture documentation is published on a wiki. Each architecture view contains a section describing the rationale for the design decisions.
  • The architecture team often engages in valuable discussions with developers that result in improvements to the architecture.

Despite all efforts to socialize the architecture knowledge, violations crop up here and there. So, we configured the SVN hook to send an email to the architecture team whenever a developer tries to commit source code that violates the architecture. These email notifications expose a lack of understanding of the architecture, to which we can respond immediately. We contact the developer to explain why and how the code violates the architecture. These email messages have been an incredibly effective mechanism to raise awareness of the architecture and coding best practices.

The Life Cycle of a Customized Check

We have created more than 30 customized checks. The process we follow is like this:

  1. Identify an architecture rule that can be expressed in syntactic terms of the Java code.
  2. Program the customized check.
  3. Generate a report that pinpoints all the violations in the code base. Sometimes there’s only a handful, sometimes there are thousands of them.
  4. Go/no-go decision with respect to manually fixing the violations. The best scenario is when we can fix the violations.
  5. Once the number of violations for a customized check is down to zero, the check is enabled in the SVN pre-commit hook. No new violations are added to the code base from then on.
  6. Even when we can’t fix all the violations, the check can still be enabled on SVN. In this case, we either enable it only for SVN add operations (new code) or we adapt the check to ignore the modules with violations (they’re considered acceptable violations).
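Steps 5 and 6 amount to a small per-check policy: is the check off, on for new files only, or on for every change, and which modules are exempt? The sketch below shows one way such a policy could be expressed. Everything in it (the class, the `Scope` enum, the path prefixes) is hypothetical and meant only to illustrate the idea; the status character is assumed to be the first column of `svnlook changed` output, where `A` marks an added path and `U` an updated one.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of how a check could be phased in at commit time. */
class CheckRolloutSketch {

    /** How broadly a check applies in the pre-commit hook. */
    enum Scope { DISABLED, ADDED_FILES_ONLY, ALL_CHANGES }

    private final Scope scope;
    private final List<String> exemptPathPrefixes;

    CheckRolloutSketch(Scope scope, String... exemptPathPrefixes) {
        this.scope = scope;
        this.exemptPathPrefixes = Arrays.asList(exemptPathPrefixes);
    }

    /** Decides whether the check should run on one changed path. */
    boolean appliesTo(char svnStatus, String path) {
        if (scope == Scope.DISABLED) {
            return false; // violations are still being fixed (steps 3 and 4)
        }
        for (String prefix : exemptPathPrefixes) {
            if (path.startsWith(prefix)) {
                return false; // module with accepted violations (step 6)
            }
        }
        if (scope == Scope.ADDED_FILES_ONLY) {
            return svnStatus == 'A'; // new code only (step 6)
        }
        return svnStatus == 'A' || svnStatus == 'U'; // fully enabled (step 5)
    }

    public static void main(String[] args) {
        CheckRolloutSketch policy =
            new CheckRolloutSketch(Scope.ALL_CHANGES, "trunk/legacy/");
        System.out.println(policy.appliesTo('U', "trunk/src/Order.java"));
        System.out.println(policy.appliesTo('U', "trunk/legacy/OldOrder.java"));
    }
}
```

Keeping the policy separate from the check itself means a check can move from "new code only" to "all changes" without touching its logic.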

Conclusion

Checkstyle customized checks have limitations, but are simple to create and yet powerful. In addition to architecture enforcement, we have used customized checks to enforce good coding practices, such as proper exception handling. More recently, a number of checks were implemented to detect security issues in the code, such as SQL injection vulnerabilities in JDBC and Hibernate programming.

For many years, I’ve studied and applied solutions for architecture enforcement. This is the first solution I have seen that is simple to implement, scales up to the whole code base, allows for continuous verification, and (discreetly) names the developers who need further clarification about architecture and other implementation rules.

7 responses to “Ultimate Architecture Enforcement: Prevent Code Violations at Code-Commit Time”

  1. Good to read about those enforcements in checkstyle. For big projects or many professionals, it is a must have.
    I would suggest using sonarsource.org instead of SVN hooks; it provides a better and unified experience. Take a look at http://nemo.sonarsource.org/drilldown/violations/176190 where they use Checkstyle to present a nice view.

    • Claudio, in this organization we already use sonar along with checkstyle, pmd, findbugs, and sonar verifications in some projects. Many of the built-in generic verifications in these tools are enabled. The dashboard and the reports do provide a nice visual picture of the code quality. However, the goal of the approach described in the post goes beyond that and is twofold: creation of *customized* checks that take into account your own architecture and homegrown classes; assurance that code *is not committed* to the repository if it contains violations.

  2. Merson, such a nice post. I really enjoyed it, especially the integration of the approach with an SVN hook. Amazing!

    Our research group at UFMG has proposed an approach called DCL, which allows architects to define acceptable and unacceptable dependencies in Java systems. For instance, we could specify the constraint of the aforementioned example as follows:
    > module ServiceLayer: com.mycompany.mysystem.service.*
    > module Dao: “.*(.Dao).*”
    > only ServiceLayer can-depend Dao

    Take a look at DCLsuite (www.dclsuite.org), an Eclipse plug-in that detects architectural violations as soon as they appear in the source code (incremental building). More interesting still, DCLsuite also provides recommendations on how to repair the detected violations.

    Furthermore, the API provided by Checkstyle reminds me of DesignWizard (www.designwizard.org), proposed by Brunet et al. at UFCG. Using the DesignWizard API for architecture enforcement could be even easier than using the Checkstyle API. Take a look!

  3. This is a great statement of the problem and solution strategy. I was not aware that CheckStyle accepted plugins or that they are so easy. This is news that I can use!

    We have often used static analysis as part of a CI server suite to verify conformance to the agreed standards and found it to be effective and well received by the developers. Most developers really want to do great work, so when asked to conform to a set of rules they believe in, it’s an easy sell. The enforcement is merely a reminder about their agreement to the rule set.

  4. I understand that since this can only enforce module viewtype concerns it will be limited — I’m wondering what your experience has been with actual constraints you want to express. For those, what kind of hit rate do you think you’ve had? I could easily imagine that while in theory this seems like a quite limited technique, in practice you might be able to get pretty good coverage on what you want to enforce, if only indirectly.

    • Let’s say I need to identify a class as pertaining to a given layer (or subsystem). In order to do that, the class must belong to a Java package, have a name with a given prefix or suffix, or have a specific annotation. So, this approach relies on a well-designed Java package namespace and naming conventions. Fortunately, this is commonplace these days. Example: JUnit test classes often have the Test or Tests suffix.
      Once a customized check is done, we generate a report. If there are false positives in the report, we customize the check further. Example: say you want to prevent code in the presentation layer from making calls to the JDBC or Hibernate API. You generate the report and find out that SpecialClassXyz *is* in the presentation layer and (exceptionally) has to make a JDBC call. Then we just add an “if (SpecialClassXyz) then OK” to the customized check.
      Thus, we can trust that all check findings are indeed violations. OTOH, the hit ratio is not 100% in some cases. Example: we created a check to spot SQL injection vulnerabilities. It looks for SQL statement strings used in the JDBC and Hibernate APIs. Then it locates the String literal assignment and looks for variable concatenation, which is the exploitable pattern: the coder concatenated variables instead of using the safe variable binding mechanism. Well, if the SQL statement string is passed as a parameter to a public method, the check can’t see the actual value and hence cannot tell if there’s a problem (the actual value of the parameter is only known at runtime).
      Bottom line is that if you’re familiar with the way code is created in your organization, your customized checks are likely to have 100% or near 100% hit ratio.
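For readers unfamiliar with the pattern that the SQL injection check described above looks for, here is a small illustration of variable concatenation versus variable binding in JDBC. It is only a sketch: the class, the `customer` table, and its columns are hypothetical, and it does not show the check itself.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

/** Hypothetical illustration of concatenation versus binding in JDBC. */
class SqlBindingSketch {

    /** Risky: the variable is concatenated into the SQL statement string. */
    static String concatenatedQuery(String name) {
        return "SELECT id FROM customer WHERE name = '" + name + "'";
    }

    /** Safe: the statement text is constant and uses a placeholder. */
    static final String BOUND_QUERY = "SELECT id FROM customer WHERE name = ?";

    /** The value is bound to the placeholder, so it is never parsed as SQL. */
    static boolean customerExists(Connection conn, String name)
            throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(BOUND_QUERY)) {
            ps.setString(1, name);
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next();
            }
        }
    }

    public static void main(String[] args) {
        // With concatenation, crafted input changes the query's structure.
        System.out.println(concatenatedQuery("x' OR '1'='1"));
    }
}
```

The concatenated version is the kind of code a syntactic check can flag when the string is built near the JDBC call; the bound version keeps the statement text constant.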
