SATURN 2015: Living a Nightmare, Dreaming a Dream: A Drupal Deployment Dilemma (Session Notes)

Gail E. Harris, TVOntario

TVOntario’s mandate since 1970 has been to “use electronic and associated media to provide educational opportunities for all people in Ontario.” TVO has embraced the internet and mobile space: it supports an online high school degree, games for children and curriculum, and current affairs documentaries.

Harris began at TVO as a web development manager and architect. Some weeks into the job, the development team had planned a new release, and the system administrator complained that no one could ever remember how deployments worked. Harris discovered that releases consumed excessive amounts of time, code freeze, and person effort. So much time was spent on releases that little was left for development. A release included verifying the stage environment, verifying that the test release worked, taking the site offline at midnight for hours, and then performing the same regression tests again. They were running the same set of tests three times, and they didn’t really understand why.

This led Harris to her dream, to do three “easy” things: change the way they wrote Drupal code, introduce automated tests, and introduce automation into the release process. She decided to introduce automated tests first because they would provide some value even if she didn’t accomplish the other two things. A few months later, they still had only a small number of tests, not a suite they could depend on. But some changes in staff and some new positions offered the opportunity to focus on automated tests and to recruit for the newly defined needs.

Next, Harris tackled coding practices. She wanted to stop the practice of point-and-click; everything had to be in code. The developers needed to understand the benefit of this practice, and that meant understanding Drupal’s backend better. They decided to put Drupal into a read-only mode, which is not hard. But deploying during the day, rather than at midnight, meant they needed to deploy fast, because they couldn’t stop TVO’s other business functions for hours. They solved this problem by using load balancers so they could deploy on half of the servers at a time.
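
The idea of deploying to half of the servers at a time can be sketched roughly as follows. This is only an illustration, not TVO's actual tooling: the `LoadBalancer` class is a hypothetical in-memory stand-in for whatever control interface a real load balancer exposes, and `deploy_in_halves`, the `deploy` callback, and the host names are all assumed for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set


@dataclass
class LoadBalancer:
    """Hypothetical in-memory stand-in for a real load balancer's control API."""
    in_rotation: Set[str] = field(default_factory=set)

    def drain(self, server: str) -> None:
        # Stop sending traffic to this server.
        self.in_rotation.discard(server)

    def restore(self, server: str) -> None:
        # Resume sending traffic to this server.
        self.in_rotation.add(server)


def deploy_in_halves(servers: List[str], lb: LoadBalancer,
                     deploy: Callable[[str], None]) -> List[int]:
    """Deploy to one half of the servers at a time.

    Returns how many servers were still serving traffic during each batch,
    so a caller can check that the site never went fully offline.
    """
    if len(servers) < 2:
        raise ValueError("need at least two servers to keep the site up")
    mid = len(servers) // 2
    serving_counts = []
    for batch in (servers[:mid], servers[mid:]):
        for server in batch:
            lb.drain(server)          # take this half out of rotation
        serving_counts.append(len(lb.in_rotation))
        for server in batch:
            deploy(server)            # release step is supplied by the caller
        for server in batch:
            lb.restore(server)        # put this half back before the next one
    return serving_counts


# Usage with assumed host names; the "deploy" here only records the order.
servers = ["web1", "web2", "web3", "web4"]
lb = LoadBalancer(in_rotation=set(servers))
released = []
counts = deploy_in_halves(servers, lb, released.append)
print(counts)  # [2, 2]
```

In this sketch, two of the four servers keep serving traffic while each half is updated. A real deployment would presumably also wait for in-flight requests to finish and run a health check before restoring a server, which is omitted here.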

Organizational culture issues were more difficult to overcome than the technical issues. Developers and system administrators weren’t used to collaborating. Project managers needed to learn to include infrastructure work in the backlog. Learning to trust automation was also a challenge, and they had to recruit people with new skills to deal with these capabilities. It was also crucial to have executive support for the process changes required to deploy during the day. One key factor was the two hats she wore: director and architect. Her job was establishing and coaching development staff as well as creating the technology roadmap and system development lifecycle. She had the authority to say “we must automate tests” or “no more point and click,” as well as the authority to change job descriptions. In many organizations, these responsibilities are split, and there are more people to influence and inspire to change the way the job gets done.

Some lessons learned include (1) maintainability drives architecture decisions, (2) architecture decisions drive changes in work habits, (3) changes in work habits drive recruiting and organizational structure decisions, and (4) technology choice matters least of all.
