SATURN 2014 Art and Science of Scalability Session (notes)

Notes by Ziyad Alsaeed, edited by Tamara Marshall-Keim

BI/Big Data Reference Architectures and Case Studies
Serhiy Haziyev and Olha Hrytsay, SoftServe, Inc.

Serhiy and Olha shared their experience with the tradeoffs between modern and traditional (non-relational and relational) reference architectures. They examined the challenges associated with each approach and gave tips from real-life case studies on how to work with big data reference architectures. As a reminder, they revisited some of the known big data challenges:

  • Data is generated from many different sources.
  • As data grows in velocity and volume, it becomes more complicated and heterogeneous until it’s no longer manageable.

To obtain meaningful information from unstructured big data, you need to adopt new architectural approaches, intelligence, and tactics to discover hidden patterns. In modern data analysis, an engineer should focus on predictive questions like “What will happen in the future?” or “Are there any risks that I should be aware of?” instead of the retrospective questions of traditional analysis, like “What happened?” or “Why did it happen?”

Serhiy and Olha presented three major data analytic use cases: data discovery, business reporting, and real-time intelligence. They also showed different reference architectures based on different business architecture drivers. Extended relational, non-relational, and hybrid are the three major reference architectures that they had experience with. They showed in detail where each approach could encounter architectural issues like bottlenecks and the proposed mitigation for such issues. Also, they showed where each approach could be useful against different business drivers. Finally, they described some of their experience with different use cases from real-life examples and how the user requirements led them to adopt different reference architecture approaches and technologies.

Presentation link: http://resources.sei.cmu.edu/asset_files/Presentation/2014_017_101_89659.pdf

MapReduce over a Petabyte Is the Easy Part: Some Important Big-Data Practicalities
Jeromy Carrière, Google

Jeromy’s presentation was not really specific to big data; his goal was to share some of his experience at Google in using various open-source technologies to solve common systems issues. Jeromy first explained why improving algorithms and strategies is important. For example, in 2008 Google took 6 hours to process petabytes of data stored on approximately 8,000 hard drives, while in 2011 it was able to process the same amount of data in 33 minutes. This improvement helped in dealing with hardware failure, since a shorter run leaves less time for a machine or disk to fail mid-job.
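The arithmetic behind those two figures can be sketched as follows. The run times and drive count come from the notes above; the 4% annual drive failure rate is purely an assumed placeholder for illustration, not a number from the talk, and the function names are hypothetical.

```python
MINUTES_PER_YEAR = 365 * 24 * 60

# Figures quoted in the notes: same data volume, two wall-clock times.
MINUTES_2008 = 6 * 60   # 6 hours
MINUTES_2011 = 33       # 33 minutes
DRIVES = 8_000          # "approximately 8,000 hard drives"

# ASSUMPTION: illustrative annual failure rate per drive, not from the talk.
ASSUMED_ANNUAL_FAILURE_RATE = 0.04


def speedup(old_minutes: float, new_minutes: float) -> float:
    """Return how many times faster the newer run is."""
    return old_minutes / new_minutes


def expected_drive_failures(drives: int, annual_failure_rate: float,
                            run_minutes: float) -> float:
    """Expected number of drive failures during one run, assuming
    failures are spread evenly over the year."""
    return drives * annual_failure_rate * run_minutes / MINUTES_PER_YEAR


if __name__ == "__main__":
    print(f"speedup: {speedup(MINUTES_2008, MINUTES_2011):.1f}x")
    for label, minutes in (("2008", MINUTES_2008), ("2011", MINUTES_2011)):
        failures = expected_drive_failures(
            DRIVES, ASSUMED_ANNUAL_FAILURE_RATE, minutes)
        print(f"{label}: about {failures:.3f} expected drive failures per run")
```

The 2011 run is roughly 10.9 times faster, and under the even-failure assumption the expected number of failures per run shrinks by the same factor, whatever failure rate is plugged in.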

Jeromy’s main responsibility at Google is related to cluster management systems. This involves scheduling, monitoring, and logging at a very large scale. To set the context of his presentation, Jeromy described what clusters are composed of and how sets of clusters are managed logically by what Google calls cells. Cells are where cluster management happens. Then Jeromy described how users interact with these machines. Users submit what Google defines as jobs. A job can contain anywhere from one task to thousands of tasks, and these tasks run on different physical machines. Google has different configurations that help manage physical resources such as memory and CPU allocation dynamically, based on the tasks received. As Jeromy described, Google has two kinds of jobs. One type is services, which are latency sensitive, such as searching on Google or opening an email. The other type is batch jobs, which are not latency sensitive. Then Jeromy described the many possible types of failure, starting with machine failure, network interruption, and power outage and ending with rare things like dogs chewing wires!
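The relationship between cells, machines, jobs, and tasks described above can be sketched roughly as a small data model. Everything here is hypothetical: the class names, the CPU and memory fields, and the first-fit placement with services scheduled ahead of batch work are assumptions made for illustration, not a description of Google's actual cluster manager.

```python
from dataclasses import dataclass, field
from enum import Enum


class JobKind(Enum):
    SERVICE = "service"  # latency sensitive (e.g. search, email)
    BATCH = "batch"      # not latency sensitive


@dataclass
class Task:
    cpu: float        # CPU cores requested
    memory_gb: float  # memory requested


@dataclass
class Job:
    name: str
    kind: JobKind
    tasks: list  # one task to thousands of tasks


@dataclass
class Machine:
    name: str
    cpu_free: float
    memory_free_gb: float
    placed: list = field(default_factory=list)

    def fits(self, task: Task) -> bool:
        return (task.cpu <= self.cpu_free
                and task.memory_gb <= self.memory_free_gb)

    def place(self, job_name: str, task: Task) -> None:
        self.cpu_free -= task.cpu
        self.memory_free_gb -= task.memory_gb
        self.placed.append(job_name)


def schedule(cell: list, jobs: list) -> list:
    """First-fit placement of every task onto the machines of one cell.

    ASSUMPTION: services are placed before batch jobs.
    Returns (job name, task index) pairs that could not be placed.
    """
    unplaced = []
    # False sorts before True, so SERVICE jobs come first (stable sort).
    ordered = sorted(jobs, key=lambda job: job.kind is not JobKind.SERVICE)
    for job in ordered:
        for index, task in enumerate(job.tasks):
            machine = next((m for m in cell if m.fits(task)), None)
            if machine is None:
                unplaced.append((job.name, index))
            else:
                machine.place(job.name, task)
    return unplaced
```

Calling `schedule` with a list of `Machine` objects and a mix of service and batch jobs returns the tasks left waiting. In this sketch those tend to be batch tasks, which fits the idea that only batch work can tolerate delay.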

Given the information above, Jeromy started describing his goals. For example, he wants everything to work properly. Also, he needs to make sure that everything is predictable and that his users (Google engineers) understand what’s happening by giving them meaningful feedback in case of failure. Moreover, he needs to think about an important tradeoff between resource efficiency and innovation efficiency.

Jeromy then discussed some of the technologies that helped him achieve his goals and the issues associated with them:

  • Build versioning deployment. The challenge is how to get the right version of the right software in the right place at the right time, continuously. Even more important is how to automate such processes, since at the level of scaling required they can’t afford any other option. Many open-source solutions address this issue, such as Jenkins, the Hadoop ecosystem, Chef, Puppet, SaltStack, and Docker (he touched briefly on each one).
  • Configuration. The question is how to control the behavior of these large systems and avoid detailed specifications. Questions like how to distribute or scale should be answered automatically.
  • Reliability. The most important factor in this subject is that nothing should be unpredictable. That includes the monitoring system itself.
  • Usability. Here, the goal is to make the system accessible for the users. Sawzall was a good solution, but Dremel is now the system Google uses for this purpose.
  • Monitoring and logging. The goal is not only to monitor the services you’re providing but also to make sure that your monitoring systems are reliable. Some of the solutions could be systems like Stackdriver, which Google acquired recently.
  • Utilization. The goal is to manage the non-latency-sensitive tasks. For this topic, Jeromy showed graphs of some of their achievements in reducing load on their systems using good utilization techniques.

As a closing thought, Jeromy stated that the ad hoc tools are where we get innovation from, but we need to make sure that we use them in an efficient way. The challenge is how to bring all those pieces together.

Service Variability in Multi-Tenant Engineering: A Systematic Literature Review on the State of Practice, Limitations, and Prospects
Ouh Eng Lieh, National University of Singapore

The primary goals of Ouh’s presentation were to share his experience and touch on topics like multi-tenancy and variability, service architectural choices, and experience reports. Ouh described a real-life example of multi-tenancy from Singapore shops. The service provider’s primary goal is for tenants to share resources, which is what makes the idea feasible. The tenants’ primary goal, on the other hand, is to be isolated from other tenants. The same thing applies in the software industry, Ouh stated. When designing multi-tenant services, you will need to make architectural decisions in the following areas:

  • Service hosting: Is the service shared by many tenants or dedicated to only one tenant?
  • Service packaging: How are the components packaged? Are they packaged for any tenant or a specific one?
  • Service binding: Do you want to do static or dynamic binding?

The available service architectural models are fully shared, partially shared, and not shared. Most service providers try to adopt the fully shared model, but it’s generally hard to achieve.
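One way to picture how the three decision areas relate to the three models is the sketch below. The enum names and the classification rule are assumptions made for illustration, not Ouh's definitions. The sketch treats a deployment as fully shared when both hosting and packaging are shared, not shared when neither is, and partially shared otherwise. Binding is recorded but, by assumption, does not affect the classification.

```python
from dataclasses import dataclass
from enum import Enum


class Hosting(Enum):
    SHARED = "shared by many tenants"
    DEDICATED = "dedicated to one tenant"


class Packaging(Enum):
    COMMON = "packaged for any tenant"
    TENANT_SPECIFIC = "packaged for a specific tenant"


class Binding(Enum):
    STATIC = "static"
    DYNAMIC = "dynamic"


@dataclass(frozen=True)
class ServiceDesign:
    hosting: Hosting
    packaging: Packaging
    binding: Binding


def sharing_model(design: ServiceDesign) -> str:
    """Classify a design as fully, partially, or not shared.

    ASSUMPTION: only hosting and packaging count toward sharing;
    binding is treated as an orthogonal choice.
    """
    shared_flags = (
        design.hosting is Hosting.SHARED,
        design.packaging is Packaging.COMMON,
    )
    if all(shared_flags):
        return "fully shared"
    if not any(shared_flags):
        return "not shared"
    return "partially shared"
```

For example, a service hosted for many tenants but packaged per tenant comes out as "partially shared" under this rule.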

In their experience, it was hard to figure out the scope of what is configurable and what is sharable. They tried to go for the fully shared model, but they ended up with a partially shared model because customers had different specifications. Their primary takeaway is that a one-size-fits-all solution actually doesn’t fit all.

Presentation link: http://resources.sei.cmu.edu/asset_files/Presentation/2014_017_101_89593.pdf

 
