Developer's black hole - openstack

PoC: Deploy MariaDB, Keystone and Glance with TripleO on Kubernetes

2017-09-18T00:00:00+02:00

I recently posted a small screencast showing part of the progress I've made on the research to deploy OpenStack services on Kubernetes using TripleO.

In this new screencast, I would like to demo a small deployment of Keystone, Glance, and mariadb using the TripleO undercloud deploy command.

What's really new in this screencast is the ability for the APBs being used to bootstrap the services. These new roles create the databases, run the initial migrations and register the endpoints in keystone. Here's the video:

NOTE: Sorry for the small font

What if I want to play with it?

Here's a small recap of what's needed to play with this PoC. Before you do, though, bear in mind that this work is in its very early days and that there are many things that don't work or that could be better. As usual, any kind of feedback and/or contribution are welcome. Note that some of the steps below require root access

1# Clone the tripleo-apbs repository and its submodules:

git clone --recursive https://github.com/tripleo-apb/tripleo-apbs

2# Build the images you want to run:

./build.sh mariadb

 ./build.sh glance

 ./build.sh keystone

3# Clone the undercloud_containers repo and run the doit.sh script. This repo is meant to be used only for development purposes:

git clone https://github.com/flaper87/undercloud_containers

4# Prepare the environment

 cd undercloud_containers && ./doit.sh

5# Deploy the undercloud (as root)

cd $HOME && ./run.sh

The doit.sh scripts uses my fork of tripleo-heat-templates, which contains the changes to use the APBs. It's important to highlight that this fork doesn't introduce changes to the existing API. You can see the comparison between the fork and the main tripleo-heat-template's repo here

Any feedback is welcomed! Remember this is a PoC and there's just 1 guarantee: It may fail ;)

Deploy mariadb on kubernetes with TripleO

2017-08-07T00:00:00+02:00

I've spent quite some time researching how we can migrate TripleO from deploying OpenStack on baremetal to Kubernetes. This work has been going on for around a year already and it started with a migration from baremetal to Docker. Now that this first migration is almost done, I've moved to research how we can do the final migration to Kubernetes.

As in most of our works, we're striving the least possible, backwards compatible, changes. To do this, I've focused on 3 main areas for now:

Unified configuration management: Migrate out of puppet for configuration management and adopt a solution that can be shared across different projects in OpenStack.
Re-use of existing data: Don't require greenfield deployments but be able to consume the existing data - hiera files, basically.
Re-use existing templates and libraries: Avoid rewriting all the templates that have been written already for the first, docker based, migration. There are libraries, CLI tools, and API's that were developed for the first phase that can be re-used in the second one to reduce the amount of work needed.

I'm not planning to go into great detail in this post on what has been done in each area - I'll do that in future posts - but rather show a small screencast that features the TripleO undercloud command deploying mariadb on Kubernetes.

The code used in this screencast includes ansible-role-l8s-mariadb, and ansible-role-l8s-tripleo. The changes to tripleo-heat-template have not been published yet. I'll work on that and update this post (you can see the mysql.yaml file in the video, that's all you need to change).

On communities: When should change happen?

2017-02-13T00:00:00+01:00

One common rule of engineering (and not only engineering, really) is that you don't change something that is not broken. In this context, broken doesn't only refer to totally broken things. It could refer to a piece of code becoming obsolete, or a part of the software not performing well anymore, etc. The point is that it doesn't matter how sexy the change you want to make is, if there's no good reason to make it, then don't. Because the moment you do, you'll break what isn't broken (or known to be broken, at the very least).

Good practices are good for some things, not everything and even the one mentioned above is not an exception. Trying to apply this practice to everything in our lives and everywhere in our jobs is not going to bring the results one would expect. We will soon end up with stalled processes or even worse, as it's the case for communities, we may be dictating the death of the thing we are applying this practice on.

When it comes to communities, I am a strong believer that the sooner we try to improve things, the more we will avoid future issues that could damage our community. If we know there are things that can be improved and we don't do it because there are no signs of the community being broken, we will, in most cases, be damaging the community. Hopefully the example below will help understanding the point I'm making.

Take OpenStack as an example. It's a fully distributed community with people from pretty much everywhere in the world. What this really means is that there are people from different cultures, whose first language is not English, that live in different timezones. One common issues with every new team in OpenStack is finding the best way to communicate across the team. Should the team use IRC? Should the team try video first? Should the team do both? What time is the best time to meet? etc.

The defacto standard mean of communication for instant messaging in OpenStack is IRC. It's accessible from everywhere, it's written, it's logged and it's open. It has been around for ages and it has been used by the community since the very beginning. Some teams, however, have chosen video over IRC because it's just faster. The amount of things that can be covered in a 1h-long call are normally more than the ones covered in a 1h-long IRC meeting. For some people it's just easier and faster to talk. For some people. Not everyone, just some people. The community is distributed and diverse, remember?

Now, without getting into the details of whether IRC is better than video calls, let's assume a new (or existing team) decides to start doing video calls. Let's also assume that the technology used is accessible everywhere (no G+ because it is blocked in China, for example) and that the video calls are recorded and made public. For the current size and members of the hypothetical team, video calls are ok. Members feel comfortable and they can all attend at a reasonable time. Technically, there's nothing broken with this setup. Technically, the team could keep using video calls until something happens, until someone actually complains, until something breaks.

This is exactly where problems begin. In a multi-cultural environment we ought to consider that not everyone is used to speaking up and complaining. While I agree the best way to improve a community is by people speaking up, we also have to take into account those who don't do it because they are just not used to it. Based on the scenario described above, these folks are still not part of the project's team and they likely won't be because in order for them to participate in the community, they would have to give up part of who they are.

For the sake of discussion, let's assume that these folks can attend the call but they are not native English speakers. At this point the problem becomes the language barrier. The language barrier is always higher than your level of extroversion. Meaning, you can be a very extrovert person but not being able to speak the language fluently will leave you off of some discussions, which will likely end up in frustration. Written forms of expression are easier than spoken ones. Our brain has more time to process them, reason about them and apply/correct the login before it even tries to come out of our fingers. The same is not true for spoken communication.

I don't want to get too hung up on the video vs IRC discussion, to be honest. The point made is that, when it comes to communities, waiting for people to complain, or for things to be broken, is the wrong approach. Sit down and reflect how you can make the community better, what things are slowing down its growth and what changes would help you be more inclusive. Waiting until there is an actual problem may be the death of your community. The last thing you want to do is to drive the wrong people away.

If you liked this post, you may also like:

On communities: Trading off our values... Sometimes

2017-01-19T00:00:00+01:00

Not long ago I wrote about how much emotions matter in every community. In that post I explained the importance of emotions, how they affect our work and why I believe they are relevant for pretty much everything we do. Emotions matter is a post quite focused on how we can affect, with our actions, other people's emotional state.

I've always considered myself an almost-thick skinned person. Things affect me but not in a way that would prevent me from keep moving forward. Most of the time, at least. I used to think this was a weakness, I used to think that letting these emotions through would slow me down. With time I came to accept it as a strength. Acknowledging this characteristic of mine has helped me to be more open about the relevance of emotions in our daily interactions and to be mindful about other folks that, like me, are almost-thick skinned or not even skinned at all. I've also come to question the real existence of the so called thick-skinned people and the more I interact with people, the more I'm convinced they don't really exist.

If you would ask me what emotion hits me the most I would probably say frustration. I'm often frustrated about things happening around me, especially about things that I am involved with. I don't spend time on things I can't change but rather try focus on those that not only directly affect me but that I can also have a direct impact on.

At this point, you may be wondering why I'm saying all this and what all this has to do with both, communities and with this post. Bear with me for a bit, I promise you this is relevant.

Culture (as explained in this post), emotions, personality and other factors drive our interactions with other team members. For some people, working in teams is easier than for others, although everyone claims they are awesome team mates (sarcasm intended, sorry). I believe, however, that one of the most difficult things of working with others is the constant evaluation of the things we values as team members, humans, professionals, etc.

There are no perfect teams and there are no perfect team mates. We weight the relevance of our values everyday, in every interaction we have with other people, in every thing we do.

But, what values am I talking about here?

Anything, really. Anything that is important to us. Anything that we stand for and that has slowly become a principle for us, our modus operandi. Our values are our methods. Our values are those beliefs that silently tell us how to react under different circumstances. Our values tell us whether we should care about other people's emotions or not. Controversially, our values are the things that will and won't make us valuable in a team and/or community. Our values are not things we posses, they are things we are and believe. In other words, the things we value are the things we consider important that will determine our behavior, our interaction with our environment and how the events happening around us will affect us.

The constant trading off of our values is hard. It makes us question our own stances. What's even harder is putting other people's values on top of ours from time to time. This constant evaluation is not supposed to be easy, it's never been easy. Not for me, at least. Let's face it, we all like to be stubborn, it feels go when things go the way we like. It's easier to manage, it's easier to reason about things when they go our way.

Have you ever found yourself doing something that will eventually make someone else's work useless? If yes, did you do it without first talking with that person? How much value do you put into splitting the work and keeping other folks motivated instead of you doing most of it just to get it done? Do you think going faster is more important than having a motivated team? How do you measure your success? Do you base success on achieving a common goal or about your personal performance in the process?

Note that the questions above don't try to express an opinion. The answers to those questions can be 2 or more depending on your point of view and that's fine. I don't even think there's a right answer to those questions. However, they do question our beliefs. Choosing one option over the other may go in favor or against of what we value. This is true for many areas of our life, not only our work environment. This applies to our social life, our family life, etc.

Some values are easier to question than others but we should all spend more time thinking about them. I believe the time we spend weighting and re-evaluating our values allow us for adapting faster to new environments and for us to grow as individuals and communities. Your cultural values have a great influence in this process. Whether you come from an individualist culture or a collectivist one (Listen to 'Customs of the world' for more info on this) will make you prefer one option over the other.

Of course, balance is the key. Giving up our beliefs every time is not the answer but not giving them up ever is definitely frustrating for everyone and makes interactions with other cultures more difficult. There are things that cannot be traded and that's fine. That's understandable, that's human. That's how it should be. Nonetheless, there are more things that can be traded than there are things that you shouldn't give up. The reason I'm sure of this is that our world is extremely diverse and we wouldn't be were we are if we wouldn't be able to give up some of our own beliefs from time to time.

I don't think we should give up who we are, I think we should constantly evaluate if our values are still relevant. It's not easy, though. No one said it was.

On communities: Empower humans to be amazing

2016-11-29T00:00:00+01:00

When it comes to communities, a system is the set of processes you put in place to allow for humans to be amazing. It's the means to empower these humans to contribute to your community, learn from it and grow with it.

These systems are essential for your community to exist. Your community is a system in itself and it functions through the processes that have been created along the way. These processes exist even if you're not aware they do. The processes that help your community function are not some kind of magic tunnel through which things happen automatically. These are processes created and tailored for your community. There are many things that can be shared across different communities but there are others that are simply specific to yours.

The way you merge code, the means through which you communicate in your community, the standards you put in place. These are all processes that allow your community to cope with growth and chaos. These processes are all meant to be created, evolve and sometimes die. If you want your community to survive change, you must change your community therefore you must change your processes.

There's no secret recipe for managing these processes, though. One thing to always keep in mind is that the humans that interact with these processes are more important than the processes themselves. If the way you review code is too complex for most of the humans that are doing reviews, change it. If the way you define new processes doesn't allow other humans to actively participate, change it. If the way you allow for contributions to be submitted ends up frustrating your contributors, change it. Your community is made by humans and the sooner you acknowledge that, the sooner you'll adapt your processes to better empower these humans. Remember that humans react to emotions (you can read more here) but they act based on their cultures.

Culture has been defined in many ways. Some definitions involve long explanations about society, evolution and human interaction. When it comes to communities a, perhaps oversimplified, way to define culture is that it's the way humans in the community actually do things. I heard this definition at Zingtrain in June 2016 and it stuck with me. Not only it makes sense on paper but it's also true in reality.

We spend a huge deal of time defining new processes in OpenStack that would help the community evolve and adapt itself and we see over and over how many of these processes change as soon as other humans start interacting with them. Sometimes this interaction ends up in processes being "officially" changed and many times they are left as they are so people can use them the way it works best.

In other words, the processes in your community will be bent by the cultures in your community more often than not and this is fine. You want this to happen. You want your community to adapt itself to the cultures that it embraces. You want your community to embrace more cultures and to allow for new cultures to be created within the community itself. Different cultures have different ways to solve problems and there's lots your community can learn from this.

Your community must be flexible for it to be able to adapt itself and tolerate the bending of its processes caused by the interaction with other cultures. If the processes in your community can't be bent a bit, then some cultures won't be able to interact with them and this will, of course, affect your community.

For humans from different cultures (even the same culture, really) to be able to interact with each other, they must be tolerant to variance. Putting the tolerance to variance at the bases of your community will set the principle that humans in your community will interact based on. There must be balance, though. Being either too tolerant or too intolerant won't help your community. You can't make everyone happy but you definitely must make enough humans happy.

Eventually it will come down to how good your community is at allowing humans to interact with each other and I believe a good summary of this post could be:

You must tolerate variance in your community, to empower humans, hopefully from any culture, to be amazing.

If you liked this post, you may also like:

If you liked this post, you may be interested in the keynote I gave at Pycon South Africa. Keeping up with the pace of a fast growing community without dying

On communities: Emotions matter

2016-11-17T00:00:00+01:00

Technology is social before it's technical - Gilles Deleuze

Humans are driven quite a bit by emotions. You may be a rational human being but your emotions will still drive many of your choices. You can be excited, angry, interested, or sad about things. It doesn't matter, you'll react to those emotions and you'll very often leak that into your communications.

You'll likely leak your emotions and so will other members of the community. If you think humans should suck it up and act like nothing is happening, I'm afraid you are living in a bubble. That is not how humans operate. That's not how humans interact.

Some humans know this and these humans should make sure other humans know this as well. Emotions matter and they affect our daily tasks. Emotions take control over many times during our day and they determine how our day will go. Being thick skinned doesn't really matter. It just means you can control your emotions a bit more than others but you still react to them. You react in a different, perhaps more controlled, way but you still react to your emotions.

There's nothing wrong about this, though. It's one of the things that makes us humans. Emotions are fine, we just need to learn how to deal with them and how they may or may not affect our days. We need to learn how to be smart about our emotions and we need to learn how to react to each one of our emotions. We need to be observant of ourselves and others

I don't think we should turn communities into therapy sessions. But I do think we should learn how to communicate in a open and friendly way. We can be smarter in our interactions with other members of the community. If we're all observant of our emotions, we'll be able to act properly whenever things are not going exactly well.

I try to keep a low profile whenever my days are not going exactly well. I do this because I don't want other people to be affected by this. This doesn't mean I don't talk to other members. Keeping a low profile in the community doesn't mean you should hide your emotions from everyone. To me, it means I just don't engage as much on discussions that require my entire focus.

On the other hand, if you notice someone is having a bad day you can choose to either reach out or to not make their day worse. You should care, your community is made by humans not laptops.

So, assume good faith or better, always communicate in a friendly way regardless of what your current emotions are or what other member's current emotions are. Don't loose it and focus on what really matters. I know this is all easier said than done but hey, I'm on the same boat.

If you liked this post, you may be interested in the keynote I gave at Pycon South Africa. Keeping up with the pace of a fast growing community without dying

Embracing new languages in OpenStack

2016-11-07T00:00:00+01:00

OpenStack has been an (almost) Python-only community for a very long time. Other programming languages have been used for very specific use cases - UI, configuration files, deployment tools, for example - but never for OpenStack's API services until now.

During the Newton cycle, a resolution to accept Go as an official programming language for OpenStack was brought up to the OpenStack's Technical Committee for evaluation. The topic was discussed on several meetings, mailing list threads and on the resolution itself. I won't get into the details of the discussion and how it evolved but I do want to provide some details about the decision, which will be useful for the rest of the post.

The decision was to reject Go as an official language for the time being and be open to re-evaluate a proposal like this again in the future. I believe the reasons behind this rejection can be summarized as follow:

Some members of the TC were concerned about the impact adding a new language would have had in the community. Would accepting the language split the community? Would this create new silos? Would accepting the language raise the bar for new members (experienced and not)?
Some members of the TC were concerned about the lack of information, research, and work on known common areas that exist today in the community. How would the Go code be shared across the community? Would there be a goslo project (thanks Thierry for the name)? What about authentication? What about the messaging layer? How to produce releases? How to maintain stable branches?
The team requesting this change has a history of not working on cross-project tasks beyond their project and this raised the concerns above and made some members of the committee skeptical about this being successful for the entire community.

What would it take to accept a new language?

I want to make it clear that I'm not speaking for any members of the TC and that this is a personal opinion and a way to communicate better what the expectations on this topic are, at least to me. I'll let members of the entire community (dis)agree with me on their own.

During the discussions I was strong on my concerns about #1, mostly because I believe the migration to the "Big Tent" is still not complete. I don't really know what will be the thing that would make us consider the migration as completed but, I can tell for sure that as a community we're hitting some problems that ought to be addressed before we can make any other major change to our policies.

Back to the post. I've become more and more obsessed with setting expectations straight for many things, especially for requests like this which aim to make changes to processes and policies that exist already. By having the expectations laid out, it becomes easier for the people involved to know the direction they need to head towards, and it defines the challenge to make the change happen.

I believe working on #2 would ease my worries around #1. It'd show a stronger commitment for the teams/people involved in this change and it'll help building the initial knowledge base that eventually will be used by other members of the community. I know working on #2 might seem getting a bit ahead of ourselves but it's not. By working out basic things like how the common code will be shared, how the code will be tested, how the code will be shipped, the authentication library, etc we will be setting the bases for the actual work that needs (or will) happen in the future. It's like pretending to run a CI jobs without defining the workflow, OS, etc first.

Anyway, what are these "basic things" that I mentioned above? I'll try to summarize them in the non-exhaustive list below:

Define a way to share code/libraries for projects using the language

The Oslo Team is responsible for maintaining the common libraries used across the OpenStack community. This set of libraries includes the messaging library (oslo.messaging), the i18n library (oslo.i18n), the DB layer library (oslo.db) among other critical libraries.

These libraries don't exist to keep the Oslo team busy. They exist because they collect common code that used to be duplicated across many projects in the community. This code has now been removed, stabilized and release by the Oslo team.

I believe that we, as a community, learned the hard way that this is an inevitable thing. As soon as more projects using the same language will start popping up, the need for sharable code will inevitably come. Therefore, I believe it'd be better for us to define (technically and theoretically) how the code , for any new language willing to be adopted in the community to be accepted, will be shared before it's even accepted.

The above includes defining the team (or initial set of people) that will take care of it is, how the deliverables of this team will be shipped, how they will be tested and how they will be consumed.

I know doing this work ahead of time doesn't mean there won't be work in the future and that everything will be flowers and ponies. I know there are many unpredictable and changing things in our industry. I believe this work will cover most of the initial work, nonetheless.

Work on a basic set of libraries for OpenStack base services

This may seem like a quite high bar to set. While figuring out how code will be shared may seem already like a difficult enough requirement, I believe it's still doesn't cover the minimum for OpenStack services.

OpenStack services that are integrated in the ecosystem require at least one of the following libraries:

keystoneauth / keystone-client
oslo.config
oslo.db
oslo.messaging

Working on a database or messaging abstraction library without consuming it is likely going to provide the wrong abstraction, resulting in a poor API. The authentication layer, on the other hand, is something that pretty much every OpenStack service needs and it shouldn't be such a hard thing to work on, which is not to say it's an easy task.

By working on any of these libraries, it'll be possible to test the CI jobs that will be used for the new language to make sure the bases for new projects are set correctly.

Define how the deliverables are distributed

OpenStack's release process is almost entirely automated. Most of the processes that involve releasing the various deliverables produced by the community are automated and managed by the release team. At the end of the process, tarballs are generated for each deliverable.

As far as Python goes (and the rest of the languages currently supported in OpenStack) generating these tarballs is simple as they just contain the source code. For compiled languages, like Go, it's critical to define what will be shipped as part of these tarballs. Will the tarball contain a binary? Will the tarball contain the source code? If the answer is that it'll contain a binary, should the release team be worried about having 2 different types of tarballs (one containing source files and the other binaries)?

Define how stable maintenance will work

Stable branches are often forgotten in our community and the work that is put in maintaining these branches often goes thankless. Stable branches, however, run many of the cloud providers that use OpenStack today and they are critical for backporting fixes that are backwards compatible.

Each language has its own way to ship libraries, manage compatibility, express stability, etc. When adding a new language to the community, it's critical to work with the rest of the teams that have a horizontal impact so they can ramp up and become familiar with the new language methodologies.

The team proposing the new language should work with the stable maintenance team and help defining the guidelines that should be followed for the new language. Some of the guidelines have been written down in the stable branches section of the project guide team.

Setup the CI pipelines for the new language

Last but not least in my list of minimum requirements for adding a new language there's working with the infrastructure team to setup the CI pipelines that will eventually be used for testing code written with the new language.

This task is probably at the bases of the work required here. In order to address any of the previous tasks, it'll be necessary to setup CI jobs, which involves coordinating with the Infrastructure team. The latter is critical. The involvement of the Infrastructure team is crucial for adding any new language and their feedback will play an important role in any decision.

If we take a look at the list of jobs we've setup for Python, there are some common jobs that most of the projects (service and libraries) have in common. I'd expect the team working on adding a new language to also setup jobs for common things that are used across different projects.

Here's a (non-exhaustive) list that attempts to collect some of this common jobs:

Lint checkers
Doc builders
Release Pipelines

That looks like a lot to do

Going through the above mentioned tasks takes quite some time and it requires people. I'm aware of that. Unfortunately, each of the teams that would be involved in this process don't really have spare hands to work on many other things, which is why I believe most of the effort for adding a new language must come from the group of people interested in the language. This effort will require time from each of these teams anyway, even if most of the researches, documentations and patches are driven by the interested team.

It took the entire community several years to get to the point it is now with Python. I do not expect the team working on adding a new language to do in one week what's been accomplished in 6 years for Python. However, I do not expect this to take as long. The processes have been established, the teams exist already and by working together it'll be possible to address the above points in reasonable time.

I would expect this to be a multi-cycle work, which is why I'd be very skeptical on adding new languages without the above being addressed first. People come and go and even if commitment is promised, I think the best way to guarantee the work is by doing it first and then accepting the language.

Finally, even in the presence of a well-formed process for adding new languages, I'd recommend projects to prefer Python over other languages. This has nothing to do with language preference but the shared knowledge that today exists in our community. I believe this knowledge is invaluable. Changing this knowledge to a new language would take may years whereas making it better is an easier task.

Innovation is important for many projects. We have to accept that things won't stay the same forever, languages change, projects evolve, some projects die. This is part of the evolution of our community and I'd like the OpenStack community to embrace innovation the best way possible. I'd like us to do it in a more conservative way, though. I believe the tasks in this post would help adding new languages safely-enough and yet fast-enough.

This is, of course, a personal view of things. As I mentioned, I've become more and more obsessed with making expectations clear. Therefore, I'll work on a official document that I can submit to the Technical Committee for review and, hopefully, approval.

On communities: Sometimes it's better to over-communicate

2016-10-17T00:00:00+02:00

Communities, regardless of their size, rely mainly on the communication there is between their members to operate. The existing processes, the current discussions, and the future growth depend heavily on how well the communication throughout the community has been established. The channels used for these conversations play a critical role in the health of the communication (and the community) as well.

The things that are communicated are, of course, important. They are the objects being sent among the peers in the community. These things are the messages traveling throughout the system and they must respect a protocol, like every message in every other protocol. Failing to respect this protocol will result in a non-effective communication. Failed communications have side-effects on the system.

A community is a live ecosystem and as such it relies on communications to inform other peers of the system about the current status, evolution, changes, etc. These communications (or channels therefor) cannot guarantee awareness. Let us leave delivery guarantees aside for the sake of the argument being made. Awareness comes after delivery and delivery does not guarantee awareness. A message could have been delivered to other members of the ecosystem but it does not mean the message was processed, therefore the peer may be neither aware of the message nor of the message content even after the message was delivered.

Think of emails, blogs, or any other asynchronous way of communication. None of these channels can guarantee the peers that have received the message have actually read it. This is not under the sender's control. There's a large number of elements that may affect the communication. If you take mailing lists, for example, it may very well be that the receiver of the message is getting too many emails and therefore is subject to missing some of them. This is just one, realistic, example of what could happen. The number of cases that can cause lack of awareness is bigger than what I've mentioned so far but it's not worth exploring it any further.

The way some systems cope with the lack of the above guarantees is by propagating the same message several times - perhaps through different channels - with the same expectations (or lack thereof). Over-communicating won't solve the issue of peers not being aware of the message. This won't get rid of surprises. It does, however, increases the probabilities of the message being processed.

The use of multiple channels will provide different ways for consumers of this message to process it. Communities, specifically, are built by individual peers from different environments and cultures. These peers have different preferences and they may consume messages from different sources. It is indeed impossible to cover all the options and to satisfy every preference. Selecting the right set of channels for these communications and propagating the messages through multiple of these channels when necessary is the key to increase the probability for the messages to be consumed.

Over-communicating does not imply spamming consumers, it does not imply sending the same message, multiple times, through the same channel either. Over-communicating, in the context of communities, requires using different channels to reach different sets of peers. These sets may overlap, nonetheless.

Surprise (sometimes) doesn't mean there's lack of communication or transparency. It's important, however, to reflect on whether the communication channels and methodologies being used are the right ones - or simply enough - for reducing the lack of awareness.

If you liked this post, you may be interested in the keynote I gave at Pycon South Africa. Keeping up with the pace of a fast growing community without dying

Glance Mitaka: Passing the torch

2016-03-09T14:24:00+01:00

I'm not going to run for Glance's PTL position for the Newton timeframe.

There are many motivations behind this choice. Some of them I'm willing to discuss in private if people are interested but I'll go as far as saying there are personal and professional reasons for me to not run again.

As I've always done in my past cycles as PTL, I'd like to take some time to summarize what's happened in the past cycle not only for the new PTL to know what's coming up but for the community to know how things went.

Before I even start, I'd like to thank everyone in the Glance community. I truly believe this was a great cycle for the project and the community has gotten stronger. None of this would have been possible without the help of all of you and for that, I'm deeply in debt with you all. It does not just take an employer to get someone to contribute to a project. Being paid, for those who are, to do Open Source is not enough. It takes passion, motivation and a lot of patience to analyze a technology, think out of the box and look for ways it can be improved either by fixing bugs or by implementing new features. The amount of time and dedication this process requires is probably worth way more than what we get back from it.

Now, with all that being said, here's Glance Mitaka for all of you:

Completed Features

I think I've mentioned this already but I'm proud of it so I'll say it again. The prioritization and scheduling of Glance Mitaka went so well that we managed to release M-3 without any feature freeze exception (FFE) request. This doesn't mean all the features were implemented. In fact, at least 4 were pushed back to Newton. However, the team communicated, reviewed, sprinted and coded in such a way that we were able to re-organize the schedule to avoid wasting time on things we new weren't going to make it. This required transparency and hard decisions but that's part of the job, right?

If the above doesn't sound impressive to you, let me fill you in with some extra info about Glance's community.

Community

Glance's community currently has 12 core members, 3 of which joined during Mitaka and 2 of those 3 members joined at the end of the cycle. That means the team ran on 9 reviewers for most of the cycle except that out of those 9, 1 left the team and joined later in the cycle and 3 folks weren't super active this cycle. That left the team with 5 constant reviewers throughout the cycle.

Now, the above is NOT to say that the success of the cycle is thanks to those 5 constant reviewers. On the contrary, it's to say that we've managed to build a community capable of working together with other non-core reviewers. This was a key thing for this cycle.

I don't think it's a secret to anyone that, at the beginning of the cycle, the community was fragile and somewhat split. There were different opinions on what Glance should (or shouldn't) look like, what new features Glance should (or shouldn't) have and where the project should be headed in the next 6 months.

The team sat down, the team talked and the team agreed on what the project should be and that's what the team did in the Mitaka cycle. Sharing one message with the rest of the OpenStack community (and especially new Glance contributors) was a key for the community to become stronger.

What changed? What did the community do differently?

Priorities and Goals

Mitaka was the first cycle that Glance strictly followed a list of priorities. Funny enough, 2 of those priorities didn't make it in Mitaka but we'll get to that in a bit.

The list of priorities didn't do it all by itself. The list of priorities gave us a target, a goal. It helped us to remain focused. It kept us on track. However, it did way more than that. The list of priorities allowed us for:

Sending a clear message of what the community has agreed on and where the community is headed
Selecting a narrow list of features that we would be able to work on and review throughout the cycle
Scheduling and splitting reviews to accommodate the priorities

Of those points, I believe the second one is the one that really did it for us. We kept the set of new features small so that we could focus on what was important. We had more proposals than we approved and we rejected the rest based on our priorities. This is something I'd like to see happening again in Glance and I'd like to encourage the next PTL to do the same and be strict about it.

Reduce the review backlog

We abandoned patches! We removed from the review queue all the patches that, for 2 or more months, had been in merge conflict, had had -1/-2 from cores or had had -1 from jenkins (hope I'm not missing something here). We did that and we made the backlog shorter, we kept in the review list what was really relevant at that moment.

Something important about the above is that we didn't abandon patches that had stalled for lack of reviews. We prioritized those, we bumped those to the top of our review list and we provided the reviews those patches deserved. Some of them landed, some didn't but the important bit is that those patches were reviewed. Glance's current backlog (verified patches, Workflow 0 and no -2s) is less than 90 patches across all projects (likely way less than that but I just did a rough count) and the most important thing is that ALL these patches have received reviews in 2016. Now, if you don't think this is great, you should have seen our backlog before.

Now, there's no point in cleaning up the review queue if we're going to let it fill up again. Right? This is where the community awesomeness comes to light. We created a review dashboard, which some folks used to organize their reviews. I found it super useful, I used it to prioritize my reviews and help other folks to prioritize theirs. When you're given an organized list of reviews rather than just a list of random reviews, it's way easier for you to know what to review. That right there is the key. To know what to review. I believe, in Mitaka, the team knew what to focus on and the team also knew someone in the community was ready to provide a fresher, cleaner, list of reviews they could focus on. Some folks would prefer to go and make up a list themselves, others will prefer to have one ready. Either way, having a clear story of where the focus should go is the key to help reviews move faster. Remove the noise, it distracts from people from what's really important.

Review Days

Not really a new thing. This has happened before and we just kept doing it. The difference, perhaps, is that we increased the number of review days in the cycle. We tried to do at least 1 review day per milestone and we're now doing a Review Monday until the end of the cycle to get as many bug fixes as possible in before the release. RC1 is looking good already!

So, if you'd ask me, I believe what changed was the community. The community got together, polished some things, and focused on what's important the project. If you read between lines, the above shows one constant pattern, the community matured and it found what its placed in the OpenStack community.

Single Team

The Glance team is now back to being a single reviewing machine rather than several, isolated, teams with specific tasks, which sometimes ended up duplicated. The Glance Driver's team has been merged into the Glance Core team and the Glare team (Artifacts) is not using the Fast Track anymore.

Having smaller teams has resulted in a very useful thing to do for other projects. Depending on the size of the project, it'd be possible to map tasks to smaller teams and then reduce them once the job is done ;). Unfortunately, given Glance's team size, this ended up adding more things to do to members of those smaller teams that were also part of the other teams as well.

One reason to mention this is because we'll have the temptation to do this again in the future but, as it's been proven thus far, Glance's community is not big enough to make such splits worth it and those end up causing more harm to the community than good.

Spec Freeze

The team incorporated a spec freeze in this cycle. The dates that were picked were not the most ideal ones but the freeze helped a lot to bring back focus on code reviews and coding. This freeze put a timeline on folks to get their proposals ready, hence forcing them to have enough time to implement such proposals. Having open milestones distracts the community from the schedule. Announcing such milestones in advance and providing constant reminders helped with making sure folks were prepared and ready to react.

Was it all rainbows?

No, it was not. There were and there are many things we need to work on and improve. For instance, 2 of the priorities didn't make it this cycle. One of them (Nova's adoption of Glance's v2) simply requires a bit of more work and it specifically requires a better alignment with the Nova community's priorities. In other words, Nova needs to make this a priority for them.

The second priority that missed the deadline is the refactor of the image import workflow. Some of you might be thinking "Guys, you had 1 job, ONE job and it was to discuss and implement that refactor". Well, turns out that such refactor has an impact on every cloud and it's not something the team can afford to change a third time (yes, this is the second time the image import workflow is refactored). I'm actually happy it didn't make it in Mitaka because that gave the team more time to evaluate the proposal that had been discussed at the summit, the issues around it and the different alternatives. Nonetheless, I am a bit sad about how things evolved with this proposal because at the very beginning of the cycle we were a bit naive in our planning of this work. That is to say, that we should've probably known from the beginning that we wouldn't have had the time to implement this spec and that it would have taken us the whole cycle to discuss it. The problem is not that we didn't know it to begin with but the fact that we weren't able to communicate that to the community from the beginning. I don't think this is a big deal, though. We realized soon enough that we shouldn't rush this and that dedicating the cycle to discuss this spec was more better than rushing it and then have a poor implementation of it.

We also experimented with a new process for lite specs and it was not a huge success. This impacted some of the lite specs that had been proposed but we did our best to come out of that situation without impacting other's people work. In fact, that situation not just highlighted the issues we had with the process but the team responsible for it (the glance-drivers team), which ended up being merged into the glance core team (as I mentioned in the previous section). This process is being refactored and you can learn a bit more about it in this review.

There's one more thing I wish we would have dedicated more time on. That's tempest. Unfortunately, given the time available, size of the team and the priorities we had, tempest did not receive as much love as we'd have loved to. There are several tempest tests that need to be cleaned up a bit, especially on the V2 side.

To the Glance Community

All the credits for the above goes to you! As a PTL I don't think I can take any credit for what I consider a successful cycle brought by the community itself. I instead recognize that it was all possible because the community decided to go back to being awesome. I'm a believer that the PTL's role is all about enabling the community to be awesome. Planning, prioritization, scheduling, etc. it all serves a single goal, which is to allow the community for doing what they know best and focus on that.

I've enjoyed every single of my stages in this community. Rushing through reviews, coding like crazy, ranting like crazy, leading the community and back to reviewing like crazy. These years as a member of Glance's community have taught me a lot about this project and how critical it is for the rest of the community. As I always say, it's one of those projects that can take your whole cloud down without you even noticing but I do hope you notice it.

Glance is often referred to as a simple project (true), as a small project (kinda true) and sometimes as not super cool (false). I'd like to remind you that not only Glance is a "cool" project to work on but it's also super critical for OpenStack. As I remind you this, I'd like to urge you to help the project stay on track across the cycles. Glance (as every other projects) depends on the ability of its community to dictate what's best for it.

Glance's interoperability has been compromised and there's a plan to help bringing it back. Let's get that done. Glance's v1 is not considered secure and it must be deprecated. Let's do that as well. Glance's stability and security has shown some weaknesses. Let's not ignore that. Working on new features is always sexy. Working on the new cool stuff that other projects are doing might seem like a must do task. I'd argue and say there's a time for everything and, while Glance shares OpenStack's priorities, there are times where the project needs to take a step back, put itself together again and start again. I don't believe Glance has left that self-healing period and I'd like to urge the whole community to keep this in mind.

To the new PTL

Listen! Listen to the things the OpenStack community has to say. Listen to the things external folks have to say. Most importantly, listen to what the Glance community has to say. Glance is not a playground for making random decisions. If you listen to what the community has to say, it'll be easy enough to know what to do and what the next steps are. However, you should be ready for making hard decisions and you need to have the courage to do so. During the last elections, I wrote a post about what being a PTL means and I'd like to encourage you to read it, even if you've done so already.

If you look at the goals we set for Glance during Mitaka and the results we achieved, you'll soon notice what the priorities for the next cycle should be. The community will help shaping those priorities but the baseline is there already.

A great cycle is not measured on how many features the community is able to implement. Therefore, I encourage you to not fall under the temptation of approving as many specs as possible. It is perfectly fine to say no to specs because they conflict with the project's priorities. The more specs the team approves, the more code there will be, the more people the project will need to complete the feature (code wise and review wise). Keep the release small, keep it concise, keep it focused. It's extremely important to communicate the intent of the release to the rest of the community. Do not forget Glance is a critical piece of every cloud.

Glance's community is not formed by the core team. It's formed by every person willing to dedicate time to the project either on reviews or code. Work with them, encourage them. They are helping the project. Some folks simply don't want to do reviews, that's fine. They are still helping with code and bug fixes. Recognize that and make sure they feel part of the community because they are. Expanding the core team is great as long as you can ensure folks in the team are aligned with the team's priorities. Welcome new members and do it gradually.

One more thing, learn to delegate. During my time as a PTL, I relied on other members as much as possible for keeping up with some tasks. For instance, Erno Kuvaja helped immensely with releases and stable maintenance, Nikhil Komawar kept the team updated about the cross-project initiatives, Stuart Mclaren, Hemanth Makkapati and Brian Rosmaita worked with the vulnerability team on security issues, etc. Thanks to all of them for their immense help and I do hope you'll keep up at what you're doing :). In other words, burnout is real and you gotta take care of yourself too. Work with the community, there's no need to take everything on your shoulders as you might end up dropping some balls. When folks don't show up on reviews and they don't share their opinions, do not give those as granted. Find them and ask for it.

And please, I beg you, let's get rid of v1!

So, you're an ATC. Let me tell you something

2015-10-02T09:49:00+02:00

You may be probably wondering what the heck is wrong with me. If you haven't, please, keep reading. If you have, though, please, keep reading.

It's that time of the cycle - ha! you saw this comming, didn't you? -, in OpenStack, when we need to elect new members for the Technical Committee. In a previous post, I talked about what being a PTL means. I talked directly to candidates and I encouraged them to understand each and every point that I've made in that post. This time, though, I'd like to talk directly to ATCs for a couple of reasons. First one is that Thierry Carrez has a great post already where he explains what being a TC member means. Second one is that I think you, my dear ATC, are one of the most valuable member of this community and of the ones with most power throughout OpenStack.

Let's start by laying down what ATC means.

Active Technical Contributor

An Active Technical Contributor (ATC) is a member of the OpenStack Foundation that has contributed to any of the official projects in the last two cycles. Any contributions to the projects will make you an ATC.

Being an ATC, like anything else in OpenStack, is a volunteers job. It's not necessary to be an ATC to be part of the community and, if you are, you're not required to cover for all the ATC responsibilities, although you'll still get all the ATC benefits.

Why do ATCs have power?

As in any other democratic model, members of the communities have the power to elect their leaders. As far as OpenStack goes, every ATC will have the power to vote for the people that will represent the community in the Technical Committee and in the Foundation Board (Individual members only).

If you're not familiar with these groups, I'd really suggest you to read more about the governance structure and I'd also recommend you to take a look at the current Technical Committee.

I'll abstain to give a short version of the current governance model because anything I'll write here won't be as detailed as what's in those links. However, I'd like to encourage you to read them before going forward with this post.

Now that you've a better understanding of OpenStack's governance model and the responsabilities of each of its parts, I hope it's also clearer why your vote, more importantly your conscious vote, is so critical.

Teach me how to vote

Glad you asked because that's what this post is about. I don't mean to tell you who to vote for and I definitely don't mean to share this as the definitive guide of how/why you should vote. However, I do think the points below should be added to your list of considerations when you're casting your vote.

Technical Committee takes time

Being part of the technical committee takes a lot of time. Just like being a PTL and being a super active ATC. It all takes time. Don't ever give for granted that people running for a TC seat have enough time in their hands. If you have doubts, I'd highly recommend you to openly ask to everyone whether they have enough time in their hands.

Look at the candidates tasks. Look at how many things they are doing and ask yourself (or themselves) whether, considering their current tasks, they'll have enough time. For example, PTLs may find it hard to dedicate a significant amount of time to being a TC. It depends on the project, it depends on the satus, etc. But, past has proven that this is normally the case.

The reason you should care about that is not just because you want the TC members to take good care of you and OpenStack. That's an amazing reason. However, you, as an ATC, should also take care of the TC. You don't want members of the TC to burnout when OpenStack is half-way through a cycle. Many times, people underestimate the cost of time and what the TC requirements are.

Did you know the TC meetings are on Tuesdays at 20:00 UTC? That's 22:00 CEST and 8:00 in New Zealand (during summer/winter ;). The only reason I'm mentioning this is because it's relevant for the next topic.

Attending Meetings

You'd think that one should not require anyone to attend meetings but, as I go through my 7th month as a TC member, I can tell you for sure that that's were things are discussed. Yes, there are emails and yes, there are reviews. However, the TC discusses things mostly on meetings. It's a model that has worked well enough so far as it's allowed the TC for reaching consensus in a decent amount of time.

All these meetings are open and logged. The TC and other community members share their opinions there. You can see live how the interactions work, how the TC behaves, what each of the member's opinions are and even if they are active or not.

The point here is that, whenever you're voting for a TC member, you must make sure that people's visions are sound to what your visions are. Think of what you would like OpenStack and the community to be like and then go and judge each of the candidates on emails, reviews, etc.

Many times, current TC members send their candidacies to stay in their current role. Reviewing meetings is a great way to get a feeling of what their work is like. But that's definitely not the only one, there are also reviews. Nonetheless, I think attendance to meetings and the contributions during these meetings are a good way to get a feeling of what the commitment of the members is.

Reviewing TC Reviews

The governance repo is the starting point of many discussions that happen in the meetings. You can get a great feeling of what the TC members opinions, agreements and disagreements by just looking at the governance reviews. There's a dashboard that many of us use for reviews but I'd also recommend you to go and look back to some of the approved ones.

As an ATC, you don't want to just judge the decision. You want to evaluate existing reviews and see how the TC is doing. Having diversity and different opinions is extremely important. The last thing OpenStack needs is tribalism and I'd highly encourage you to seek for folks that have good visions, different opinions and perspectives.

A change on perspective

As I just mentioned in the previous section, different perspectives and diversity is extremely important. The TC needs different views to avoid making decisions that benefit just part of the community. While I don't think this is currently the case, I do believe the lack of a diverse set of views increases the probabilities for that to happen.

When reviewing the candidacies, I'd like to encourage you to take a moment to see what teams that candidate interacts the most with. Is it OPs ? Is it Docs? Is it OpenStack 101? Is it small or big clouds? Is it corporate or startup? Just a couple of ideas, you don't really need to go through them all but I hope those give you an idea of what I mean here.

Think of what you'd like the community to go from here and how each of the candidates would help taking it there. Change is great but it must be done cautiously. Making huge shifts in such a big community comes with lots of risks. Many times, I've agreed with some folks perspectives but then disagreed on the timing. This is important too and you have the power in your hands to make changes like this happen (or not).

The TC is not cool

Yes, exactly. Being part of the TC is not about being cool. It's not about having lables and seriously, there wouldn't be TC without a community like OpenStack's so, I consider being an ATC way cooler than being a member of the TC.

A TC member is always under the spotlight. Anything that the TC does will be, eventually, evaluated by the community. These decisions, while they must be taken on the best interest of OpenStack, don't always make everyone happy. Candidates should be ready to make taugh calls that are on the best interest of the community and you, as a voter, have the chance to ask and/or identify these candidates by looking at their candidacies and previous works.

In other words

Many of my points above will help you evaluate existing TC members that would like to run for another cycle but don't stop there. Take those points and apply them to other candidates. Look at their work, look at their points of views and please, do take the time to think how you would like OpenStack to be and how these candidates can help it get there.

I'm asking you to use that power to help the TC to be better. The TC needs people that are active, people that volunteer for jobs, people that have diverse opinions and people that are also capable of proposing solutions rather than just pointing out things that are wrong. It's always easy to say what's wrong and then sit down waiting for someone else to fix it. We're a small group and we need to get things done.

Look at the candidates, look at whether they are active not only in their communities but also in OpenStack in general. The TC is not a bunch of people that meet every week to share random opinions. Please, base your vote on facts that will help the community because it is OpenStack that we're trying to make better, not just the TC.

Let me tell you something about being a PTL

2015-09-09T15:08:00+02:00

It's that time of the cycle, in OpenStack, when projects need to elect who's going to be the PTL for the next 6 months. People look at the, hopefully many, candidacies and vote based on the proposals that are more sound to them. I believe, for the PTL elections, the voting process has worked decently, which is why this post is not meant for voters but for the, hopefully many, PTL candidates.

First and foremost, thank you. Thanks for raising your hand and willing to take on this role. It's an honor to have you in the community and I wish you the best of lucks in this round. Below are a few things that I hope will help you in the preparation of your candidacy and that I also hope will help making you a better PTL and community member.

Why do you want to be a PTL?

Before even start writing your candidacy, please, ask yourself why you want to be a PTL. What is it that you want to bring to the project that is good for both, the project and the community. You don't really need to get stuck on this question forever, you don't really need to bring something new to the project.

In my opinion, a very good answer for the above could be: "I believe I'll provide the right guidance to the community and the project."

Seriously, one mistake that new PTLs often do is to believe they are on their own. Turns out that PTLs arent. The whole point about being a PTL is to help the community and to improve it. You're not going to do that if you think you're the one pulling the community. PTLs ought to work with the community not for the community.

This leads me to my next point

Be part of the community

Being a PTL is more than just going through launchpad and keeping an eye on the milestones. That's a lot of work, true. But here's a secret, it takes more time to be involved with the community of the project you're serving than going through launchpad.

As a PTL, you have to be around. You have to keep an eye on the mailing list in a daily basis. You have to talk to the members of the community you're serving because you have to be up-to-date about the things that are happening in the project and the community. There may be conflicts in reviews, bugs and you have to be there to help solving those.

Among all the things you'll have to do, the community should be in the top 2 of your priorities. I'm not talking just about the community of the project you're working on. I'm talking about OpenStack. Does your project have an impact on other projects? Is your project part of DefCore? Is your project widely deployed? What are the deprecation guarantees provided? Does your project consume common libraries? What can your project contribute back to the rest of the community?

There are many things related to the project's community and its interaction with the rest of the OpenStack community that are important and that should be taken care of. However, you're not alone, you have a community. Remember, you'll be serving the community, it's not the other way around. Working with the community is the best thing you can do.

As you can imagine, the above is exhausting and it takes time. It takes a lot of time, which leads me to my next point.

Make sure you'll have time

There are a few things impossible in this world, predicting time availability is one of them. Nonetheless, we can get really close estimates and you should strive, before sending your candidacy, to get the closest estimate of your upstream availability for the next 6 months.

Being a PTL is an upstream job, it's nothing - at the very least it shouldn't have - to do with your actual employer. Being a PTL is an upstream job and you have to be upstream to do it correctly.

If you think you won't have time in a couple of months then, please, don't run for PTL. If you think your manager will be asking you to focus downstream then, please, don't run for PTL. If you think you'll have other personal matters to take care of then, please, don't run for PTL.

What I'm trying to say is that you should sit down and think of what your next 6 months will look like time-wise. I believe it's safe enough to say that you'll have to spend 60% to 70% of your time upstream, assuming the porject is a busy one.

The above, though, is not to say that you shouldn't run when in doubt. Actually, I'd rather have a great PTL for 3 months that'll then step down than having the community being led by someone not motivated enough that was forced to run.

Create new PTLs

Just like in every other leading possition, you should help creating other PTLs. Understand that winning the PTL election puts you in a position where you have to strive to improve the project and the community. As part of your responsibilities with regards to the community, you should encourage folks to run for PTL.

Being a PTL takes a lot of time and energy and you'll have to step down[0], eventually. As a PTL, you may want to have folks from the community ready to take over when you'll step down. I believe it's healthy for the community to change PTLs every 2 cycles (if not every cycle).

Community decides

One of the things I always say to PTLs is that they are not dictators. Decisions are still supposed to be taken by the community at large and not by the PTL. However, being in a leading position gives you some extra "trust" that the community may end up following.

Remember that as a PTL, you'll be serving the community and not the other way around. You should lead based on what is best for the project and the community rather than based on what's best for your company or, even worse, based on what will make your manager happy. If those two things happen to overlap, then AWESOME! Many times they don't, therefore you should be ready to take a pragmatic decision that may not be the best for the company you work for and that, certainly, won't make your manager happy.

Are you ready to make that call?

Closing

By all means, this post is not meant to discourage you. If anything, It's meant to encourage you to jump in and be amazing. It's been an honor for me to have served as a PTL and I'm sure it'll be for you as well.

Despite it not being an exhaustive list and the role experiences varying from one project to another, I hope the above will provide enough information about what PTLs are meant to do so that your excitement and desire to serve as one will grow.

Thanks for considering being a PTL, I look forward to read your candidacy.

[0]: Note to existing PTLs, consider stepping down and helping others become PTLs. It's healthier for the community you're serving to change PTLs

Back from Vancouver, towards Liberty

2015-05-27T10:14:00+02:00

Fever is gone (actual fever), energies are coming back and the next six months are blurried by all the things we have ahead. Besides the fever, I'd say this is what a normal summit feels like. Or well, what the feeling after the summit is like.

Just like in every other summit, we had fun, we discussed things, we brainstormed, we (kinda) fought, we enjoyed the excitement of being there and we came back with plans that we consider are a common ground between what we and others think is best for our projects and our community.

Here's a brain dump (not all pages were dumped) of what the summit brought me:

Zaqar (Messaging Service)

If you've followed Zaqar's drama, you know it's gone through several ups and downs (look for previous post and m-l discussions). Short before the summit, it went through another down. The community response turned out to be great and the good news is that it's staying.

Cross-project user-facing notifications

https://etherpad.openstack.org/p/liberty-cross-project-user-notifications

Besides brainstorming a bit on what things should/should not be notified and what format should be used, we also talked a bit about the available technologies that could be used for this tasks. Zaqar was among those and, AFAICT, at the end of the session we agreed on giving this a try. It'll likely not happen as fast as we want but the action item out of this session was to write a cross-project spec describing the things discussed and the technology that will be adopted.

Heat + Zaqar

The 2 main areas where Zaqar will be used in Heat are Software Config and Hooks. The minimum requirements (server side) for this are in place already. There's some work to do on the client side that the team will get to asap.

Sahara (or other guest agent based services) + Zaqar

We discussed 3 different ways to enable services to communicate with their guest agents using Zaqar:

1) Using notification hooks: Assuming the guest agents doesn't need to communicate with the controller, the controller can register a notification hook that will push messages to the guest agent.

2) Inject keystone credentials: The controller would inject keystone credentials into the VM to allow the guest agent to send/receive messages using Zaqar.

3) PreSigned URLs: The controller injects a PreSigned URL in the controller that will grant the guest agent access to a specific tenant/queue with either read or read&write access.

Hallway Discussions

We had a chance to talk to some other folks from teams like Horizon that were also interested in doing some actual integration work with Zaqar as well. Not to mention that some other folks from the puppet team showed interest in helping out with the creation of puppet-manifests.

Glance (Image service)

V1 -> V2 -> V3... wait, WHAT?

We've been talking about killing v1 for several cycles. For better or for worse, we haven't been able to do so. We still want, though. Nonetheless, the big news is that there'll be an experimental V3 of Glance API. You might be wondering what's wrong with Glance's team but hold your breath for a bit, we didn't pull this out of .... a black box.

Back in Atlanta, Alexander Tivelkov and other folks proposed something called Artifacts. Artifacts is - in a very poor definition - a data sets model. An object based API that describes resources that can be as simple as an image or as complicated as a template with dependencies, versions and other more complex features.

They have been working on that since then but there was some push back from the community during Kilo. Part of the community (included myself) felt that Glance was not the right place to do it. To some extent this was related to Glance being a simple image service and Artifacts were way more than that. Without going into the details of why and how these discussions happened, we found ourselves discussing again, at the Liberty summit, what the future of Glance would be. The resolution of this discussion is summarized in this email.

In other words, the work around artifacts will be merged in Glance's code base and it'll be exposed as part of an experimental V3 API. Or, as Jesse Cook put it in that thread, Artifacts is the technical implementation of Glance's V3, which is no more than an object API.

Now, what's important about the above is not the experimental V3 but the radical change in the type of API that Glance will expose. It'll go from being an images API to being an objects API. The resource type, properties and API are completely different.

The images API - v1 and v2 - will still be supported and the transition to v3 will not happen in L. It'll be material for the M summit.

I'll be dedicating time myself to this migration. That is, we'll have a dedicate set of people working on moving images to artifacts in the future and making sure existing deployments remain untouched. I'll be also working closely with the DefCore team to provide the required info about this transition.

However, I'd like to encourage people, at least during the L cycle, to keep considering Glance as an Image Service until the experimental V3 API has been released and the team decides to completely moves towards a fully objects API.

This change is huge and it'll require time, lots of tests, even more discussions and some other changes that are not technical at all. Fun times ahead.

CIS -> SearchLight

The CIS (Catalog Index Service) side of Glance announced during the summit that they'll split out off of Glance into its own project to satisfy not only Glance's needs but many other project's needs. Therefore, expect glance to shrink a bit from this side but don't be to happy, it'll get fatter as soon as the v3 machinery gets going.

CIS folks have done an amazing job and they deserve all the best and glory for it.

Misc

Definitely not least important but certainly less controversial.

We also had sessions for topics like:

Optimize image's cache (link)
Image's uploads (link)
Support for OVF (link)
Research on a NoSQL database driver (link)
Image Signing and Encryption (link)

If you're interested in any of the above, please, do not hesitate to jump into *#openstack-glance@freenode` and ask about them.

Oslo

Unfortunately, I wasn't able to attend as many Oslo sessions as I'd have liked. The discussions above and other commitments took part of the time I had scheduled for Oslo and well, there are always overlaps.

However, there are many interesting things for Oslo and I highly encourage you to look into the etherpads and ask questions.

Personally, I'll dedicating more time to oslo.messaging during Liberty than other parts of it. We'll see.

Rhones-Alpes meetup summary

2014-12-24T14:35:00+01:00

I love meetups. They are a more intimate moment for local communities to get closer, interact and dive into topics related to the main focus of the meetup group. I organize myself 3 meetups in Milan and I really enjoy the opportunity to get to know local people and learn from them. However, it's also really important to participate in and learn from other non-local communities. Therefore, I'm always looking forward to attend - and hopefully talk - in other meetups around my continent - going to other continents for a meetup would be harder to afford.

Since I want everyone to learn from other communities as I do, I'll start writing about my experiences in these events. I'll start with the last meetup I attended outside Italy.

On December 4th, I had the pleasure to attend Lyon's OpenStack meetup. Among the OpenStack meetups I've attended so far, this has been the one with more attendees. At least 45 people were there and there were a variety of topics. The talks that were presented are:

C'est quoi OpenStack?
IaaS beyond the Infrastructure
OpenStack et NFV
La haute-disponibilité pour l'infrastructure OpenStack
OpenStack sous Debian

What surprised me about the Rhônes-Alpes' meetup is the good organization it has and the number of attendees. There were at least ~50 ppl, which is something we're still quite far from in Milan. The meetup was organized in a local college, which helped a lot with spreading the word and engaging more people.

Some pictures were tweeted through @openstackfr's account:

Enjoy!

What's coming in Kilo for Glance, Zaqar and Oslo?

2014-11-21T15:33:00+01:00

As usual, here's a write up of what happened last week during the OpenStack Summit. More than a summary, this post contains the plans we discussed for the next 6 months.

Glance

Lots of things happened in Juno for Glance. Work related to artifacts was done, async workers were implemented and glance_store was created. If none of these things excite you, I'm sorry to tell you that you're missing the big picture.

The 3 features mentioned above are the bases of many things that will happen in Kilo. For long time, we've been waiting for async workers to land and now that we have them we can't but use them. One of the first things that will consume this feature is image introspection, which will allow glance to read image's metadata and extract useful information from them. In addition to this, we'll messing with images a bit more by implementing basic support for image conversion to allow for automatic conversion of images during uploads and also as a manual operation. There are many things to take care of here and tons of subtle corner cases so please, keep an eye on these things and help us out.

The work on artifacts is not complete, there are still many things to do there and lots of patches and code are being written. This still seems to be the path the project is going down to for Kilo to allow more generic catalogs and support for storing data assets.

One more thing on Glance, all the work that happened in glance_store during Juno, will finally pay off in Kilo. We'll start refactoring the library and it'll likely be adopted by Nova in K-2. Noticed I said likely? That's because before we get there, we need to clean up the messy glance wrapper nova has. In that same session we discussed what to do with that code and agreed on getting rid of it and let nova consume glanceclient directly, which will happen in kilo-1 before the glance_store adoption. Here's the spec.

Zaqar

When thinking about Zaqar and Kilo, you need to keep 3 things in mind:

Notifications is something we've wanted to work on since Icehouse. We talked about them back in Hong Kong, then in Atlanta and we finally have a good plan for them now. The team will put lots of efforts on this feature and we'd love to get as much feedback as possible on the implementation, use cases and targets. In order to implement notifications and mark a fresh start, the team has also decided to bump the API version number to 2 and use this chance to clean up the technical debt from previous versions. Some of the things that will go away from the API are:

Get messages by id
FIFO will become optional
Queue's will be removed from the API, instead we'll start talking about topics. Some notes on this here.

One of the projects goal is to be easily consumed regardless of the device you're using. Moreover, the project wants to allow users to integrate with it. Therefore, the team is planning to start working on a persistent Transport in order to define a message-based protocol that is both stateless and persistent as far as the communication between the peers goes. The first target is websocket, which will allow users to consume Zaqar's API from a browser and even using a library without having to go down to raw TCP connections, which was highly discouraged at the summit. This falls perfectly in the projects goals to be easily consumable and to reuse existing technologies and solutions as much as possible.

Although the above two features sound exciting, the ultimate goal is to integrate with other projects in the community. The team has long waited for this opportunity and now that it has a stable API, it is the perfect time for this integration to happen. At our integration session folks from Barbican, Trove, Heat, Horizon showed up - THANKS - and they all shared use-cases, ideas and interesting opinions about what they need and about what they'd like to see happening for Kilo with regards to this integration. Based on the results of this session Heat and Horizon are likely to be the first targets. The team is thrilled about this and we're all looking forward for this collaboration to happen.

Oslo

No matter what I work on, I'll always have time for Oslo. Just like for the other projects I mentioned, there will be exciting things happening in Oslo as well.

Let me start by saying that new libraries will be released but not many of them. This will give the team the time needed to focus on the existing ones and also to work on the other, perhaps equally important, items in the list. For example, we'll be moving away from using namespaces - YAY!, which means we'll be updating all the already released libraries. Something that's worth mentioning is that the already released libraries won't be renamed and the ones to be released will follow the same standard for names. The difference is that they won't be using namespaces internally at all.

Also related to the libraries maintenance, the team has decided to stop using alpha versions for the libraries. One of the points against this is that we currently don't put caps on stable branches, however this will change in Kilo. We will pin to MAJOR.MINOR+1 in stable, allowing bug fixes in MAJOR.MINOR.PATCH+1.

I unfortunately couldn't attend all the Oslo sessions and I missed one that I really wanted to attend about oslo.messaging. By reading the etherpad, it looks like great things will happen in the library during kilo that will help with growing its community. Drivers will be kept in tree, zmq won't be deprecated, yet. Some code de-duplication will happen and both the rabbit and qpid driver will be merged into a single one now that kombu has support for qpid. Just like other projects throughout OpenStack, we'll be targeting full Py3K support like CRAZY!

Hopefully I didn't forget anything or even worse said something stupid. Now, if you may excuse me, I gotta go offline for the next 6 month. Someone has to work on these things.

Non-opinionated software can't exist

2014-10-31T22:50:00+01:00

Here's a thing. I don't believe there's such a thing like "non opinionated" software and I think we should all be more careful when we communicate what the goals of our projects are. The later may not be new to you, probably not even the former but yet, I keep hearing the former everywhere and I keep seeing the later being ignored.

Before I get into why I think non-opinionated software doesn't exist, I'd like to define some of the things that I'll argue about in this post. Let's start with opinions.

What's an opinion?

I'll start by emphasizing that opinions are not facts, therefore they do not represent the absolute truth and they are not verifiable. An opinion that can be verified becomes a fact, which means it's always been a fact and it was not an opinion in the first place. In fact, opinions are considered to be subjective and this prevents them from being absolute. Nonetheless, opinions can be supported by facts.

Opinions have been studied and argued about for a long time. Plato's analogy of the Divided Lines explains the difference between knowledge and belief but even before writing The Republic, Plato and other philosophers had already argued about opinions. Protagoras, for example, claimed that all men's opinions are true. The meaning of this claim and the contradiction that lies within itself were thoroughly discussed in Plato's Theaetetus dialogue. The high-level result of this dialogue is that opinions don't hold truth. I really encourage you to read the dialogue, I consider it to be enlightening.

Another common mistake with regards to opinions is that people commonly claim opinions are relative without claiming what their opinion is relative in terms of. Opinions ought to be subjective, they express something that is relative to the person providing such opinion in the context they are expressed. This subtle distinction is as important as understanding that opinions don't hold truth. The context an opinion is expressed in could affect the opinion itself.

One last thing about opinions, perhaps not so relevant for the content of this post, is that opinions have a weight. That is, depending on the source of the opinion an opinion may be more relevant than the ones coming from other less reliable sources. There are many things that can be argued about this last note. For example, if opinions don't hold truth and they are subjective, why should some opinions have more value than others? My personal opinion is that it all depends on the context where the opinion was provided. I'd probably give more value to a distributed system experts' opinion about my distributed software than I'd give to a web designer's.

What's a non-opinionated software?

Now that we've gone through some of the aspects related to opinions, lets get into what opinions mean when they're applied to software.

A quick google search returned this StackOverflow link where this same exact question was asked. Among the answers provided there, this is the one I think makes more sense:

Non-opinionated software, on the other hand, leaves lots of flexibility to the user (developer). It doesn't proscribe one method of solving a problem, but provides flexible tools that can be used to solve the problem in many ways. The downside of this can be that because the tools are so flexible, it may be relatively hard to develop any solution. Much more of the solution may have to be hand-coded by the user (developer) because the framework doesn't provide enough help. You also have to think much more about how to provide a solution and mediocre developers may end up with poorer solutions than if they had bought into some opinionated software. PERL is probably the classic example of non-opinionated software.

partial quote

Before I get into more details, I'd like to say that I'm not criticizing the answer provided on StackOverflow as such but the general misuse of the term "opinionated" in software development. That is to say that it is not, by any means, my intention to finger-point anyone.

I'm going to break the above down into several separate claims:

Non-opinionated software, on the other hand, leaves lots of flexibility to the user (developer). It doesn't proscribe one method of solving a problem, but provides flexible tools that can be used to solve the problem in many ways.

The fun thing about "non-opinionated software" is that it never claims where the opinion is not being provided by claiming it does have an opinion on something. As expressed in the StackOverflow answer, the non-opinionated software provides the necessary tools to fix a problem by leaving enough flexibility to the consumer to decide what the best thing to do is. There are 3 important things here:

The software is meant to solve a specific problem, therefore it has an opinion on what the final goal is, what the problem it aims to solve is, etc.
The software provides the tools to solve such problem, therefore the software has a very specific opinion about what the right tools to solve such problem are.
The software leaves lots of flexibility to the developer, therefore it is of the opinion the developer knows best how to use the tools provided by itself. This could also be interpreted as the software doesn't have an opinion on how the tools should be used, therefore it leaves it up to the user.

The above describes a pretty opinionated software with regards to a specific problem it aims to solve. It tells the user what problems it is meant to solve, what tools should be used and it claims the user should know best as of how these tools should be used.

Later on in his answer, tvanfosson, describes one of the downsides of non-opinionated software:

The downside of this can be that because the tools are so flexible, it may be relatively hard to develop any solution. Much more of the solution may have to be hand-coded by the user (developer) because the framework doesn't provide enough help. You also have to think much more about how to provide a solution and mediocre developers may end up with poorer solutions than if they had bought into some opinionated software

The difficulties described above are not an effect caused by the hypothetical lack of opinion but an excessively flexible abstraction that lacks of pragmatism. The absence of opinion doesn't make an implementation any more flexible. On the contrary, it is the author's opinion itself of keeping the abstraction flexible that generates such complexity. Every software reflects its authors' opinions.

While I'm aware that the above was excessively nitpicky, I still hope to have made a point with regards to the misconception about what "non-opinionated" software is or even better about why non-opinionated software can't be.

What's opinionated software then?

For the sake of consistency, I'm going to refer to the same answer I used in the previous section. In the above section, I partially quoted tvanfosson's answer and kept the part of it that refers to non-opinionated software. I'll now do the same with the part that refers to opinionated software.

Opinionated software means that there is basically one way (the right way™) to do things and trying to do it differently will be difficult and frustrating. On the other hand, doing things the right way™ can make it very easy to develop with the software as the number of decisions that you have to make is reduced and the ability of the software designers to concentrate on making the software work is increased. Opinionated software can be great to use, if done well, if your problem maps onto the solution nicely. It can be a real pain to solve those parts of your problem that don't map onto the tools provided. An example here would be Ruby on Rails.

I don't think there's anything wrong about this part of the answer. However, I'd like to highlight the quite sarcastic mention of right way™ therein. It's important to understand - I probably can't stress this enough - that opinions don't hold truth. In a software this means that opinionated software does not represent the right way of doing things - and I'm glad he used the ™ symbol there - but one way to do them, which may or may not be a good fit for the user.

Depending on the problems a software wants to solve, the choices made by the author may be the right ones. However, this does not mean the author's opinion is right. What this means is that based on external, proven, facts the choices the author made are the right ones to solve a specific problem. Remember that opinions can be supported by facts but the opinions themselves don't hold any truth.

With all the above said, I think opinionated software exists in the context of its own goals and regardless of what the opinions of the authors are, the software will be proved to be good or bad based on external facts within specific contexts. Opinionated software that follows existing principles and standards that have been proved to be good or bad carries the opinion of its author with regard to those principles and standards. Regardless of whether those principles the software has been based on are good or bad, the author certainly thinks they are valid, hence the software is being based on them. Still, the opinions of the author hold no truth and the facts these opinions are supported by are the ones that will determine the quality of the software.

Non-opinionated software unveiled

It's not my intention to go after authors that claim their software is non-opinionated but as a member of a community that supports such a claim about its own product, I'd like to take a few minutes and unveil how opinionated such product is.

I'm of course talking about OpenStack. For a long time - and I'm guilty of this myself - we claimed to be working on a non-opinionated cloud provider. The truth is that OpenStack is very opinionated in so many different areas and ways. From it's API to the kind of technologies it sits on to of. In OpenStack we don't even consider running it in something that is not Linux - besides the obvious technical limitations of other operating systems. Moreover, all the services supported by OpenStack have strong opinions on what they provide, how they provide it and what the yet-to-be-proved Right Way™ of doing things is.

To make the analysis more granular, let me go deeper into one of the services that exists within OpenStack. Zaqar, for example, claimed to be a non-opinionated messaging service akin to SQS. (Un)Fortunately, this is intrinsically wrong. A messaging service can't lack of opinion because it has to provide certain guarantees to its users, therefore it has to have an opinion on what those guarantees are. The service claimed to have a lack of opinion on what storage you could bake it with and again, this is wrong. The guarantees made by the API pretty much define what kind of storage you can or should use for this service. The fact that this service allows you to create your own driver doesn't mean the service lacks of opinion. It just means it's flexible enough to allow for a custom implementation on the storage layer. However, it has a strong opinion on how the driver should be implemented, how it should behave, etc. These opinions could make the implementation of such custom driver difficult or even impossible.

The same high-level analysis can be done on every single piece of OpenStack and I don't think this is wrong. I actually think that software that claims to lack of opinion is bad. If the author of such software does not have an opinion on what the best way to reach the goal is, then I think the result of his work has very few things that can be trusted. Note that I'm not suggesting that software shouldn't be flexible, what I'm stressing here is that flexibility should be based on opinions that are supported by facts. These opinions ought to be pragmatic and simple - I encourage you to watch this talk from Rich Hickey about simplicity - to allow for specific problems to be solved.

In case it wasn't clear, the point I wanted to make with this post is that non-opinionated software can't exists because from the moment a developer chooses a problem to solve and tries to solve it in a certain way, the developer's opinion will be reflected in the software, therefore the software will have an opinion on the way things should be done. Writing software is as important as knowing how to talk about it and it's our responsibility as authors of software to express precisely what the software is about, the opinions reflected there and what we think the best way to solve a problem is. If we fail to communicate this, we'll be simply fooling ourselves and even worse, we'll be trying to fool others.

P.S: I purposely avoided talking about strong or weak opinions. I think it goes without saying that there has to be a balance between them both and that we should all keep an opened mind all times.

Hiding unnecessary complexity

2014-10-29T00:29:00+01:00

This post does not represent a strong opinion but something I've been thinking about for a bit. The content could be completely wrong or it could even make some sense. Regardless, I'd like to throw it out there and hopefully gather some feedback from people interested in this topic.

Before I get into the details, I'd like to share why I care. Since I started programming, I've had the opportunity to work with experienced and non experienced folks in the field. This allowed me to learn from others the things I needed and to teach others the things they wanted to learn that I knew already. Lately, I've dedicated way more time to teaching others and welcoming new people to our field. Whether they already had some experience or not is not relevant. What is indeed relevant, though, is that there's something that needed to be taught, which required a base knowledge to exist.

As silly as it may sound, I believe the process of learning, or simply the steps we follow to acquire new knowledge, can be represented in a directed graph. We can't learn everything at once, we must follow an order. When we want to learn something, we need to start somewhere and dig into the topic of our interest one step at a time.

The thing I've been questioning lately is how deep does someone need to go to consider something as learned? When does the required knowledge to do/learn X ends? Furthermore, I'm most interested in what we - as developers or creators of these abstractions that will then be consumed - can do to define this.

Learning new things is fascinating, at least for me. When I'm reading about a topic I know nothing about, I'd probably read until I feel satisfied with what I've discovered whereas when I'm digging into something I need to know to do something else, I'd probably read until I hit that a-ha moment and I feel I know enough to complete my task. Whether I'll keep digging afterwards or not depends on how interesting I think the topic is. However, the important bit here is that I'll focus on what I need to know and I leave everything else aside.

I believe the same thing happens when we're consuming an API - regardless it's a library, a RESTFul API, RPC API, etc. We'll read the documentation - or just the API - and then we'll start using it. There's no need to read how it was implemented and, hopefully, no further reading will be necessary either. If we know enough and/or the API is simple enough - in terms of how it exposes the internal implementation, vocabulary, pattern, etc - we won't need to dig into any other topics that we may not know already.

Whenever we are writing an API, we tend to either expose too many things or too few things. Finding the right balance between the things that should be kept private and the ones that should be made public is a never-ending crusade. Moreover, keeping the implementation simple and yet flexible becomes harder as we move on writing the API. Should we expose all the underlying context? What is the feeling a consumer of this API should have?

By now, you are probably thinking that I just went nuts and this is all nonsense and you're probably right but I'll ignore that and I'll keep going. Let me try to explain what I mean by using some, hopefully more realistic, examples.

Imagine you're writing an API for a messaging system - you saw this example coming, didn't you? - that is supposed to be simple, intuitive and yet powerful in terms of features and semantics. Now, before thinking about the API you should think about the things you want this service to support. As a full featured messaging service, you probably want it to support several messaging patterns. For the sake of this post, lets make a short list:

These are the 2 messaging patterns - probably the most common ones - that you'd like to have support for in your API. Now, think about how you'd implement them.

For the Producer/Consumer case you'd probably expose endpoints that will allow your users to post messages and get messages. So far so good, it's quite simple and straightforward. To make things a little bit more complicated, lets say you'd like to support grouping for messages. That is, you'd like to provide a simple way to keep a set of messages separated from another set of messages. A very simple way to do that is by supporting the concept of queues. However, Queue is probably a more complex type of resource which implicitly brings some properties into your system. For example, by adding queues to your API you're implicitly saying that messages have an order, therefore it's possible to walk through it - pagination, if you will - and these messages cannot - or shouldn't - be accessed randomly. You probably know all this, which makes the implementation quite simple and intuitive for you but, does the consumer of the API know this? will consuming the API be as simple and intuitive as implementing it was for you? Should the consumer actually care about what queue is? Keep in mind the only thing you wanted to add is grouping for messages.

You may argue saying that you could use lightweight queues or just call it something else to avoid bringing all these properties in. You could, for example, call them topics or even just groups. The downside of doing this is that you'd be probably reinventing a concept that already exists and assigning to it a different name and custom properties. Nothing wrong with that, I guess.

You've a choice to make now. Are you going to expose queues through the API for what they are? Or are you going to expose them in a simpler way and keep them as queues internally? Again, should your users actually care? What is it that they really need to know to use your API?

As far as your user is concerned, the important bit of your API is that messages can be grouped, posting messages is a matter of sending data to your server and getting them is a matter of asking for messages. Nonetheless, many messaging services with support for queues would require the user to have a queue instance where messages should be posted but again: should users actually care?

Would it be better for your API to be something like:

MyClient.Queue('bucket').post('this is my message')

or would it be simpler and enough to be something like:

MyClient.post('this is my message', group='bucket')

See the difference? Am I finally making a point? Leave aside CS and OOP technicality, really, should the final user care?

Lets move onto the second messaging pattern we would like to have support for, publish/subscribe. At this point, you've some things already implemented that you could re-use. For instance, you already have a way to publish messages and the only thing you have to figure out for the publishing part of the message pattern is how to route the message being published to the right class. This shouldn't be hard to implement, the thing to resolve is how to expose it through the API. Should the user know this is a different message pattern? Should the user actually know that this is a publisher and that messages are going to be routed once they hit the server? Is there a way all these concepts can be hidden from the user?

What about the subscriber? The simplest form of subscription for an messaging API is the one that does not require a connection to persist. That is, you expose an API that allows users to subscribe an external endpoint - HTTP, APN, etc - that will receive messages as they're pushed by the messaging service.

You could implement the subscription model by exposing a subscribe endpoint that users would call to register the above-mentioned receivers. Again, should this subscriber concept be hidden from the user? What about asking the user where messages published to group G should be forwarded to instead of asking the users to register subscribers for the publish/subscribe pattern?

Think about how emails - I hate myself for bringing emails as a comparison - work. You've an inbox where all your emails are organized. Your inbox will normally be presented as a list. You can also send an email to some user - or group of users - and they'll receive that email as you receive other's emails. In addition to this, your email service also provides a way to forward email, filter email and re-organize it. Do you see where I'm going with this? have you ever dug into how your email service works? have you ever wondered how all these things are implemented server side? Is your email provider using a queue or just a simple database? You may have wondered all these things but, were they essential for you to understand how to use your email client? I'd bet they weren't.

Does the above make any sense? Depending on how you read the above it may seem like a silly and unnecessary way of reinventing concepts, theories and things that already exist or it may be a way to just ask the users to know what they really need to know to use your service as opposed to forcing them to dig into things they don't really need - or even care about. The more you adapt your service - or API - to what the user is expected to know, the easier it'll be for them to actually use it.

If you got to this point, I'm impressed. I'm kinda feeling I may be really going nuts but I think this post has got me to sort of a fair conclusion and probably an open question.

As much as purists may hate this, I think there's no need to force 'knowledge' into users just for the sake of it. People curious enough will dig into problems, concepts, implementations, etc. The rest of the people will do what you expect them to do, they'll use your API - for better or for worse - and they shouldn't care about the underlying implementation, theories or complexities. All these things should be hidden from the user.

Think about newcomers and how difficult it could be for a person not familiar at all with messaging system to consume a library that requires Producers and Consumers to be instantiated separately. Think about this newcomer trying to understand why there are are producers, consumers, publishers and subscribers. What if this newcomer just wanted to send a message?

As a final note, I'd probably add that the whole point here is not to use silly names for every well-known concept just to make lazy people happy. If that were the case, we wouldn't have sets and everything would be an array with random properties attached to it. The point being made here is that we tend to expose through our APIs lots of unnecessary theories and concepts that users couldn't care less about. When working on the APIs our users will consume, we should probably ask ourselves how likely it is for them to know all this already and how we can hide unnecessary concepts from them without preventing them for digging deeper into it.

Although all this may sound like "Writing APIs 101", I don't believe it is as obvious for everyone as it seems.

Mentoring others and yourself

2014-10-09T22:04:00+02:00

Mentoring is one of the things I enjoy the most doing. I don't consider myself the ultimate expert on things but I've definitely gone through enough things that had led me to become a mentor on different technical areas.

The reason I enjoy this process so much is because it allows me to relate with other great people, cultures and minds. Every interaction is full of joy, gratitude, knowledge and collaboration. It helps me understand how to interact with different people, how to work together with different cultures and it allows me to integrate more with the environment I live in. Everyday that goes by, either by being part of a program or in my daily actions, I try to give back to people more than what I've received in my life.

I believe, like me, many other people feel the same and/or are interested in this topic. Therefore, I'm writing this post with some of the experiences I've lived and the things I've learned so far by mentoring others.

Mentoring is more than just teaching

When you agree to be someone's mentor, you're agreeing to be this person's guide during a finite period of time. You're agreeing to do more than just teaching things. You're agreeing to live by example, to lead your mentees to whatever knowledge they are seeking, to trust your mentees' passion as they trust yours.

You cannot teach people to trust themselves if you don't trust yourself. You cannot help someone to become more secure if you are not secure enough yourself. Nonetheless, you also need to be careful on how much you think you know and how much you think your meentee doesn't know. One of the worst mistakes you can make as a mentor is to underestimate your mentee's knowledge. The more you think you know about this process, the less likely you'll succeed. You need to be alert, you need to listen as if you didn't know what to do. Everyday new day as a mentor brings new experiences that will make you think differently, embrace them.

Nevertheless, in order to succeed as a mentor, you need to first understand what your tasks are. It is extremely important for you to revisit these goals everyday and make sure you stand by them. Here's a list of questions which, although not exhaustive, I think is a good place to begin with:

What is it that you want to share?
How much time are you willing to dedicate?
How much time are you actually going to be able to dedicate?
How much are you willing to stop and listen to what your mentee has to say?
Are you ready to become part for someone's growth and let that person become part of yours?

Although the above questions may sound a bit philosophical, I believe they build the ground for a good start as a mentor. Understanding that you're not the only one who's going to teach something is mandatory and you, as a mentor, need to keep your mind open to this. These questions are valid regardless of the project you're mentoring on, they are the bases of what you will - or won't - do as a mentor.

Don't give your mentee fishes

This, obviously, comes from the proverb:

Give a man a fish and you feed him for a day. Teach a man to fish and you feed him for a lifetime

When there's lack of trust on one's knowledge, it is very likely that we'll wait and hope for someone to give us what we're looking for. As a mentor, depending on the time you've available, your patience, your dogs' urges to go for a walk, you'll be tempted to give the answer to get this over with. As the aforementioned proverb implies, this is not good at all and it'll work just once, so don't.

Giving the answer won't help neither you nor your mentee to grow. Your mentee won't learn how to find the answers he/she needs and you won't know how to teach your mentee to get to the answer without giving it up. Be smart, work on different ways that will lead your mentee to the answer. Teach your mentee how to ask the right questions and reply with more constructive questions that will "light the bulb" in your mentee's brain.

Avoid conversations like:

mentee: I think X is what I was looking for
mentor: Not exactly but you're almost there

Instead, look for more constructive answers that will bug your mentee's mind:

mentee: I think X is what I was looking for
mentor: If X is what you're looking for, could Y be sufficient as well? Have you thought about how X and Y may be related?

Teach your mentee how to switch perspective, how to think out of the box. Perspective is one of the most important things in emotional intelligence and you must lead your mentees towards a complete view of the problem in a way that they'll also learn the importance of this process for problem solving.

As hard as it can be, you need to be patient. Your mentees are smart, probably smarter than you, but they may not know it yet. Make sure your mentees know how smart they are at the end of the journey. Give your mentees the required tools to answer the questions they may find, because the more tools you teach them to use, the more independent they'll be at the end.

Agreeing feels nice, disagreeing feels better

One thing that many mentees find very difficult to do is to disagree with other people. There are several reasons for that to happen. Mentees may feel they don't know enough or that others with more experience are always right, which is far far far from being true. Mentees need to be told that disagreeing is good and good discussions are, most of the times, the source of great ideas and epiphanies.

Teaching mentees how to trust themselves is not enough to help them understand that disagreeing is usually better than agreeing. They also need to be taught how to disagree. It's not enough to say "I disagree". When you're teaching someone to disagree, you must let that person know that disagreeing is the first step towards an argument and that there's nothing bad about arguing.

I often challenge mentees when they agree with me on something. I'd ask them the reasons why they think I'm right and force them to think through possible different scenarios and whether what I'm saying is actually correct or not. It's really important to encourage mentees to think things through. The more they do this exercise, the easier it'll be for them to spot things in the future. Collaboration is not just about doing things together but about reviewing each others' work as well. I trust the team I work with on reviewing my work and providing feedback if something doesn't feel right. I want people to disagree with me and challenge me to think things through. I believe this is an important thing to learn that many communities with long time leaders lack of. Therefore, I believe this is a key thing to teach to your mentees.

Disagreeing is not just about discussing on someone's argument. It's also about communicating, humbly, one's opinions on that matter. I believe that what actually defines a community is not the group of people willing to work on a common goal but the ability of those people to communicate and discuss things that will then lead the work towards that goal. Two people can work on the same thing without even talking - it'd be hard, yes, but it's still possible. It's our duty, as mentors, to encourage mentees to communicate with the rest of the community regardless of its size and how scary it could be.

Co-mentoring is even better

Remember what we said about perspective? Perspective is one of the things that matters the most in our daily tasks. Depending on our perspective, things could go in many different ways. It's been said thousands of times that thinking out of the box is important and that changing perspective allows for a better and stronger personal growth.

If you enjoy mentoring other people like I do, you probably want to do it often and more importantly you always want to be there. Despite this being a great thing, you need to understand that mentees are not yours and that other people, with similar passions like yours and different perspective, may also help the mentee to grow. People with different experiences see the world in different ways and you want your mentee to know that. You want your mentee to learn from different people, you want your mentee to think out of the box and to do that, your mentee needs to change perspective.

I often encourage mentees to work on features with other experienced folks in the community that could guide them through different roads. This helps them to improve their communication skills, to improve their ability to work with distributed teams that may not be in the same TZ and it also helps them to learn from folks that may have different points of views.

Keep an eye on your mentee

Mentees need to be left alone as much as they need to be guided. Excessive hand-holding will hurt your mentee as much as excessive independence. Note that I'm not saying mentees shouldn't be proactive or do things by themselves. What I'm saying is that you need to be ready to provide guidance when it's required and you cannot know when your mentees need guidance if you're not keeping an eye on the things they are doing.

There's a huge difference between controlling mentees and keeping an eye on them. By keeping an eye on them you're just making sure you'll be ready to act when needed and you wont wait until it's too late to do so. You want your mentees to be proactive, to seek the answers to their questions, to experiment with different technologies and ideas. You just need to be there, just be there.

As part of the guidance I like to provide, I'm always looking forward to my mentees' patches. I encourage them to publish their work even if it's not ready yet. That helps them to share ideas that are still on the works and it helps me to see what path they're going down through. If there are things I don't agree with, I always try to understand why they think those things are a good idea and then provide any guidance needed if I still disagree.

Jerks are deprecated but still around

If you've read my 'Jerks are deprecated' post, I'm pretty sure you expected me to say something along those lines in this one. Despite my big wish to have jerks treated with some magical 'be-nice' cure, they are still around and you need to make sure your mentee knows how to interact with them as well.

For some people, one of the hardest things to do when it comes to become part of a community, team or work environment is to speak up. Unfortunately, jerks have no mercy. It doesn't matter whether you're a newbie, a young person making your way through this world or a very experienced person. For people afraid of speaking up, Jerks are probably the worst thing they have to face.

I believe jerks are like Trolls. The more you're 'afraid' of them the jerker they'll be. The more you feed them, the more they'll chase you. Therefore, the best way to interact jerks is not necessarily ignoring them - although that works - but treating them nicely.

Teach your mentee how to reply to jerks by keeping their reply in context, nice and direct, completely ignoring the fact that they're talking to a jerk. That's not going to change the fact that this person is a jerk but it'll help the mentee to understand that being nice is free and by replying nicely they'll feel good and still get their job done.

One thing that I also think is a great exercise is to talk at conferences. You'll never know who'll attend your talk and standing up in front of a diverse audience is not an easy task. By encouraging your mentees to talk at conferences, you're encouraging them to trust themselves and also be nice about it. You can guide your mentees through what talking in front of people means, you can give them some tips and tricks and you can also walk them through ways for sharing their knowledge with a nice and trustworthy attitude. Make sure they understand that many questions could be asked by many different people. There are nice ways to reply or even avoid jerk's questions without feeding the jerk.

Thanking is for free

This is a quote from a conversation between myself and an OPW applicant. I've nothing else to add here:

> flaper87 exploreshaifali: please, do keep digging. The more questions you ask, the clearer it will be for you :)
> exploreshaifali flaper87, sure :)
> exploreshaifali Thanks!!
> flaper87 exploreshaifali: thank *you* ;)
> exploreshaifali flaper87, thanking me for what?
> flaper87 exploreshaifali: for your time, interest and perseverance. I appreciate you willing to work on this and I thank you for that. I look forward to see you around long enough to do way more.

Come to the bright side

By mentoring someone, you're improving yourself as well. You'll learn how to interact with different people from different cultures. You'll understand that emotions matter more than people think and that people's past, present and future are important for this learning process. You'll learn that you can't teach things you don't believe yourself - these contradictions will come up and you'll have to admit failure. Furthermore, you'll learn that you're your mentee's mentee and that this learning process goes both ways. I've learned as much from my mentees as they have, I hope, learned from me.

Really, mentoring is one of the things I like the most. I enjoy good conversations, laughs and sharing. It's a very thankful - yes, thankful - job to do and it's also emotionally rewarding. If you haven't done it before, I encourage you to do so. Even if you may not think so, I'm sure you've many things to share. Just make sure you understand how critical this work is and how much responsibility you'll have in your hands.

Last but not least, I'd like to make sure you understand you don't need to be in an 'official' program to mentor people. By living as an example and helping others, you're already doing so. Nonetheless, volunteering for mentoring other people is both needed and nice. Don't hesitate to do so.

OPW
GSoC

Zaqar's pools explained

2014-09-24T07:28:00+02:00

Now that I've dedicated time to explain what Zaqar's path going forward is (Zaqar being a messaging service akin to SQS/SNS), I can move on and spend some time diving into some of Zaqar's internals. For this post, I'd like to explain how Zaqar's pools work.

Zaqar's scalability is more than just adding web heads or scaling the storage backend. Although both sides can scale horizontally to support big scales, there's still a chance for the storage backend to hit a limit where it needs to offload traffic to other clusters in order to keep scaling. This is where pools come handy.

Essentially, pools are storage clusters[0]. You could think about pools as you'd think about shards. They are independent and isolated storage nodes that contain part of the data stored in the system. You can add as many pools as you need, although it is recommended to scale each pool as much as possible before adding new ones.

Pools can be more than just a way for scaling Zaqar's storage but for the sake of this post, I'll just explain how they work.

Let me start by explaining how data is split across pools.

Zaqar balances data across pools in a per-queue basis. That means, message distribution happens within the storage cluster and it's not done by Zaqar itself - there are some reasons to it (some of them explained here) that I won't go through in this post.

As I've already mention in the past, distributing queues - buckets, containers, whatever - is not as effective as distributing messages. Doing distribution at a queue level has intrinsic limitations - like hard to balance storage nodes - that could be overcome by pushing distribution down to a message level. The later, however, brings in a whole lot of other issues that Zaqar is not willing to support just yet.

When pools were added, the team considered a set of algorithms that could be used to help balancing queues. Some of those algorithms didn't require much intervention from the operator side - like a hashring - whereas others - like a weight-based algorithm - require the operator to know it's loads, clusters distribution and capabilities. After having considered the available algorithms and the feedback from operators, the team chose to start with a weighted algorithm - we've been discussing supporting more algorithms in the future, but as of now there's just one - that would give deployers enough control over how data is distributed across pools and that would also make it easier to change the results of the algorithm easily and cheap. For example, if a pool wants to be dismissed, it's possible to set its weight to 0 and prevent it to get new queues.

The current weighted algorithm looks like this:

def weighted(objs, key='weight', generator=random.randint):
    """Perform a weighted select given a list of objects.

    :param objs: a list of objects containing at least the field `key`
    :type objs: [dict]
    :param key: the field in each obj that corresponds to weight
    :type key: six.text_type
    :param generator: a number generator taking two ints
    :type generator: function(int, int) -> int
    :return: an object
    :rtype: dict
    """
    acc = 0
    lookup = []

    # construct weighted spectrum
    for o in objs:
        # NOTE(cpp-cabrera): skip objs with 0 weight
        if o[key] <= 0:
            continue
        acc += o[key]
        lookup.append((o, acc))

    # no objects were found
    if not lookup:
        return None

    # NOTE(cpp-cabrera): select an object from the lookup table. If
    # the selector lands in the interval [lower, upper), then choose
    # it.
    gen = generator
    selector = gen(0, acc - 1)
    lower = 0
    for obj, upper in lookup:
        if lower <= selector < upper:
            return obj
        lower = upper

NOTE: Something to note about the current algorithm is that it doesn't take into account the number of queues that exist in each pool, which is something that could be added to it. Also, if you've any feedback as to how this algorithm can be improved, please, let us know - #openstack-zaqar @ freenode.

The above algorithm is used just once per queue. When a queue is created, the pooling driver looks up for an available pool and then registers the queue there. A registry that maps queues and pools is kept in a catalogue that is then queried to lookup the pool a queue has been registered into.

Right after the queue is registered in a pool, all the operations on that queue will happen in that specific pool. However, global operations like getting statistics, examine cluster's health or even listing queues will happen across all the available pools.

Pools' concept is very simple and the implementation has lots of room for improvements that we'd love to explore. In the future, it'd be useful to have support for queue's migration with 0 downtime and obviously no data loss. Moreover, we'd also like to have support for other algorithms that would help balancing queue's as even as possible without depending on the operator.

This is all I've to say about Zaqar's pools. If there's anything that looks broken or could be improved, please let us know or even better, contribute ;)

[0] Note that cluster refers to a replicated, fully reliable storage deployment. For example, a mongodb cluster could be either a replica set or a sharded mongodb environment.

Zaqar's path going forward

2014-09-23T09:46:00+02:00

Long time since I wrote my last post about Zaqar (ex Marconi) and I thought this one should be a summary of what has happened and what will, probably, happen going forward.

Let me start by sharing where Zaqar is in OpenStack.

At every end of the cycle - ~6 weeks before it ends, to be precise - every incubated project goes through a review process where things like API stability, integration with the OpenStack ecosystem, community etc - full list - are revisited in order to evaluate the project and determine if it's ready to be part of the OpenStack integrated release. Despite Zaqar having met all those requirements, it was not accepted into the integrated release. The story is long and it's not the intent of this post to walk you through it. However, if you're interested in getting more details about what happened exactly please go here: 1st meeting 2nd meeting 3rd meeting 1st thread 2nd thread Review.

One thing to get from the last review process, and definitely keep in mind, is that Zaqar is ready to be used in production environments. Technically speaking it met all the requirements imposed by the TC and as a project it's had a stable API for quite a bit already.

One of the discussions that happened during the last graduation review was related to whether Zaqar is a queuing service or a messaging service. To me, and as Gordon Sim mentioned in this email there's no real difference between those 2 besides the later being a more generic term than the former. This discussion led to other discussions like whether things like get-message-by-id makes sense, whether keeping queuing semantics is necessary or even whether guarantees like FIFO should be kept.

All the above discussions have been interesting but I'd like to take a step back and walk you through a perhaps less technical topic but not less important. It's clear to me that not everyone knows what the project's vision is. So far, we've made clear what Zaqar's API goals are, what kind of service Zaqar is and the use-cases it tries to address but we haven't neither explicitly explained nor documented well-enough what Zaqar's scalability goals are, what guarantees from a storage perspective it gives nor how much value the project is putting on things like interoperability.

Zaqar has quite a set of features that give operators enough flexibility to achieve different scales and/or adapt it to their know-how and very specific needs. Something we've - or at least I have - always said about Zaqar - for better or for worse - is that you can play with its layers as if they were Lego bricks. I still think this is true and it doesn't mean Zaqar is trying to address all the use cases or making everyone happy. We want to give them flexibility to add functionality for additional use cases that aren't supported out of the box.. I know this has lots of implications, I'll dig into it a bit more later.

Zaqar's vision is to provide a cross-cloud interoperable, fully-reliable messaging service at scale that is both, easy and not invasive, for deployers and users.

It goes with no saying that the service (and team) has strived to achieve the above since the very beginning and I believe it does that, modulo bugs/improvements, right now.

Reliability

Zaqar aims to be a fully-reliable service, therefore messages should never be lost under any circumstances except for when the message's expiration time (ttl) is reached - messages will not be around for ever (unless you explicitly request that). As of now, Zaqar's reliability guarantee relies on the storage ability to do so and on the service to be properly configured.

For example, if Zaqar is deployed on top of MongoDB - the current recommended storage for production - you'd likely do it by configuring a replica set or even a sharded cluster so that every message is replicated but if you use a single mongod instance, there's nothing the service can do to guarantee reliability. Well, there actually is, Zaqar could force deployers to configure either a replica set or a sharded cluster and die if they don't - we will likely force deployers.

Scalability

Zaqar's design was thought at scale. Not all storage technologies will be able to perform the same way under massive loads, hence it's really important to choose a storage backend capable of supporting the expected user base.

That said, Zaqar also has some built-in scaling capabilities that aim to make scaling storage technologies easier and push their limits farther away. Zaqar's pools allow users to scale their storage layer by adding new storage clusters to it and balancing queues across them.

For example, if you have a zaqar+mongodb deployment and your mongodb cluster (regardless it is a replica set or a sharded cluster) reaches it's scaling limits, it'd be possible to setup a new mongodb cluster and add it as a pool in Zaqar. Zaqar will then balance queues based on the pools' weight across your storage clusters.

Although the above may sound like a quite naive feature, it is not. The team is aware of the limitations related to pools and the things left to do to make it less so. Let me walk you through some of these things.

One thing that you may have spotted from the above is that pools work in a per-queue basis, which means there's no way to split queues across several storage clusters. This could be an issue for huge queues and it could make it more difficult to keep pools load balanced. Nonetheless, I still think it makes sense to keep it this way and here's why.

By balancing on queues and not messages, we're leaving the work of replicating and balancing messages to the technologies that have been doing it for years. This falls perfectly into Zaqar's will of relying as much as possible on the storage technology without re-inventing the wheel (nothing bad about the later, though). Though, I'd like to go a bit further than just "the service wants to rely on existing technologies".

Messages (data) distribution is not an easy task. I had the pleasure (?) to work on the core of these algorithms in the past and thankfully I know enough to want to be away from this while I can. For the sake of the argument, lets assume we add built-in message distribution in Zaqar. The way I think it would work is that we'd require a set of pools to exist so we can distribute messages across them. Then, the storage cluster itself will take care of the messages' replication. What this means is that deployers life would get more complicated since they'll be forced to create several storage clusters even for very basic Zaqar deployments in order to have messages replicated.

Now, to avoid forcing deployers to create several clusters, lets assume we implement message replication within Zaqar as well. This removes the need for deployers to create several clusters since even a single mongod instance - neither a replica set nor a sharded cluster is needed - would work perfectly as a pool since Zaqar would take care of replicating messages. Without getting into the technical details of how much logic we would need to move into Zaqar and the fact that we would be re-inventing things that had already been done elsewhere, I'd like us to ask ourselves why we should depend on external storage technologies if we already have everything needed to balance and replicate data ourselves? Lets not focus on the many tiny details but the overall picture. The service would be doing most of what's needed so why wouldn't we add the missing part and stop relying on external technologies?

All the above is to say that I'd rather spend time working on a swift driver - which we've discussed since the Icehouse summit - than working on having per-message balancing capabilities in Zaqar. Swift knows how to do this very well and it'd make perfectly sense to have zaqar on top of 1 swift pool and just scale that one. I'm not saying mongodb is not good for this job, although we (Zaqar team) should work on documenting better how to use a sharded mongodb cluster with Zaqar.

In other words, Zaqar's scaling focus most be balanced between the API, the guarantees it provides and the storage technology. I believe most of the focus should be invested in the later. The more pools you add, the more resources you'll need and the more complicated your deployment becomes.

There are definitely some issues related to balancing queues - a.k.a buckets, containers, toy's boxes, drawers, etc - and there's a very good reason why swift doesn't do it for containers. One of the things I'd like to see improved is the balancing algorithm Zaqar uses. As of now, it has a very naive and simple algorithm based on weights. The thing I like about this algorithm is that it gives the deployer enough control over the available clusters and the thing I don't like is that it gives the deployer enough control over the available clusters ;). I'd like to also have an algorithm that would make this balancing easier and that doesn't require the deployer to do anything besides adding new pools.

Again, I think pools are great but we should strive to scale the storage layer as much as possible before adding new pools.

Interoperability

Probably hard to believe but I think this is the most difficult task we have had and we will ever have. The project aims to preserve interoperability across clouds and to be able to do so, the features exposed through the API must be guaranteed to work on every cloud regardless of the storage technologies. As much as I'd like this to be true and possible, I think it's not and I also think this applies to every OpenStack service.

We cannot force deployers to deploy Zaqar in a way it'll preserve interoperability across clouds. Deployers are free to configure Zaqar (and any service) as they wish and install whatever driver they want (even non-official ones). If a deployer configures Zaqar on top of a single mongod instance or simply changes the write concern, Zaqar won't be reliable and message could be lost if the node goes down, hence the guarantee will broken.

In addition to the above, optional features, third-party drivers and custom settings - smaller maximum message size, for example - are neither part of this guarantee nor the team can do anything about them.

What we can do, though, is to make sure the features exposed through the API are supported by all the drivers we maintain, work on a set of deployment scenarios that would guarantee interoperability across clouds and make sure the default values of our configuration options are sane enough not to require any deployment to change them.

I'm sure there's a lot more to interoperability than what I'm stating here. What I want to get to is that we strive to make it easier for the service and deployers to preserve interoperability but I believe it cannot be guaranteed at 100%

As you may have noticed, Zaqar has been under a fire storm for the last ~4 weeks, which has been both exciting and stressful - nonetheless, please keep the fire coming.

Many people have many different expectations about Zaqar and it's impossible to make everyone happy so, if you're part of the unhappy group of people (or end up there), I'm sorry. The team has a clear vision of what this service has to provide and a mission to make that happen. I'm sure the service is not perfect and that you don't need to dig deep to find things that should work differently. If you do, please let us know, we're always looking forward to constructive feedback and making the service better.

Messaging is a broader enough field to cover tons of different tones of grey. While Zaqar is not trying to hold them all, it is definitely trying to provide enough to make the service worth it and suffice the use cases it has.

Juno preview for Glance and Marconi

2014-07-10T14:26:00+02:00

Yo!

You may probably know that I spend most of my time on OpenStack in general, I love tackling many things but I'm mostly focused on storage and queuing technologies - you can't do it all - so, I thought about giving you a heads up of what's being baked in the 2 projects I spend lot of my time.

Glance

Glance's team will focus on working on glance Artifacts. The plan for juno is to implemented models, API and everything needed to implement this feature without changing anything in the images API. That means images will remain the same during Juno and they'll be migrated later on during K or L depending on the status of the artifacts implementation. The artifacts work means Glance will move away from being a simply image registry to something more generic like a catalog of various data assets. In fact, the mission statement has already been changed.

Another thing that will happen in Glance during Juno is that the code for store libraries will be pulled out from the code base into its own library. This work started during Icehouse and it's now almost complete. The new library - glance.store - contains the old, already supported, store drivers with a slightly different API to support random access to image data, remove the dependencies on global configuration objects and a couple of more things.

The goal behind this library is to remove from Glance part of the code that is reusable, and to allow external consumers to better support direct access to image data by using the same library Glance uses to manage such data.

There's one more thing that is worth mentioning about Glance's plans for Juno. The async workers work is still moving forward. There's some support for it already - tasks base has been merged - and in the upcoming month the project will adopt taskflow as much as possible. There's still some work to do here and the feature is, unfortunately, moving slowly. An interesting thing about this new feature is that it'll allow Glance to do more things with the resources it has. For example, it'd be possible to do image introspection, convert and resize images without blocking requests.

Marconi

As of Marconi, the plans are to complete the API v1.1. This version of the API is just like the previous one but it addresses some of the feedbacks gotten from the community. Some of the new things that will change are:

Support for pop endpoints (get and delete)
Queues are now lazy resources, which means they don't have to be created in advance.

On the storage side of Marconi, the team will add one new storage driver to support redis and the support for storage engines is on the works. With storage engines (flavors) it'll be possible to create and tag clusters of storage and then use them based on their capabilities. This allows for a more granular billable and scalable deployments.

On top of the aforementioned storage engines, the team will add support for queues migrations between pools of the same type (flavor). It should be possible to do cross-type migrations but the team prefers to go with a more conservative approach and test the algorithm first and then improve it as needed.

Hope you find the above useful, any feedback is very welcome.

Marconi to AMQP: See you later

2014-06-17T19:22:00+02:00

In the last couple of weeks, Marconi's team has been doing lots of research around AMQP and the possibility of supporting traditional queuing systems. I have wanted this myself for quite some time. However, after digging more into what's needed and what supporting traditional brokers means for Marconi, I don't believe supporting such systems makes much sense anymore. Here's why.

Why is it important?

It is, right?

One of Marconi's goals since the very beginning has been to fit into your stack. What that means is that you shouldn't need to change your application tier in order to use Marconi. If you've deployed MongoDB, you can simply install Marconi and point it to your MongoDB replica set. The same thing should happen if you've a traditional broker installed. We wanted you to be able to put Marconi on top of that broker.

The above boils down to adoption, we want people with "Messaging Needs" to adopt Marconi if it is a good option for their use case. This is, in my opinion, the main reason we decided to go down this road and work on the support for traditional queuing systems.

Another motivation is performance. Traditional queuing technologies are known and designed to be fast. This doesn't mean Marconi isn't, what it means is that depending on the storage technology in use, Marconi will perform differently.

The third motivation is something that Marconi brings to traditional brokers. With Marconi, it's possible to add per-queue sharding capabilities to traditional brokers. By using Marconi's pools and flavors it is possible to create separate clusters of storage that will be used cooperatively based on the pool settings.

Current API

When the team started working on Marconi, it was decided to work on a unified api that would eventually support several message patterns. The API was developed based on feed semantics, which is very similar to the way other services like Azure Queues and SQS do it.

The API was designed to be browsable but without sacrificing the "messaging" feel of every queuing system. New things are being added to make it more message based without sacrificing browsability. Note that it is important for the API to be HTTP-friendly.

Even though Marconi has been designed to support multiple transports, we consider the HTTP API the main product and that's were most of the focus has been put on in the last year and a half. So far, the feedback about the API has been good. Nevertheless, the team knows the API is not perfect and there will be changes.

What's up with AMQP?

Let me start by saying there's no such thing as fully-non-opinionated software. Whatever it is the software is selling, it needs to have an opinion. As for Marconi, the API is the product, which means we care about its form, semantics and features. Moreover, we care about it being consistent, simple and interoperable.

After discussing the unified API plans and digging into both AMQP 1.0 and AMQP 0.9 it's quite clear that many things would need to be changed to support queuing technologies that are based in any of these protocols. Moreover, these changes would be needed even to support technologies that have a streamed API.

SIDE NOTE: I don't think these API changes are the real problem, though. The real problem is how those changes will affect the existing API, where they would live and how they fit into Marconi's goals. Kurt Griffiths offered 3 possible plans in this thread and I believe our only option as of now is C. Option A basically breaks the interoperability bit of the API whereas option B is basically a different product. With C, the number of storage technologies that Marconi will be able to support won't be high but I don't think this is a bad thing.

Back to AMQP. We wanted Marconi to be able to support brokers capable of speaking AMQP 1.0 - I won't go into the reasoning about this but I'll definitely say that the fact it's a standard played a big role here. Before I go into the details explaining whether AMQP 1.0 is a good fit or not for Marconi, I'd like to highlight that most of this research was done by Victoria Martínez de la Cruz. Tomasz Janczuk did some experiments with AMQP 0.9, the results were published in this thread.

Lets get to it.

Store and forward

Marconi relies on a store-forward message delivery. This means Marconi has no support for peer-to-peer communications. You send a message to Marconi, the message is then stored in the storage layer and it will then be available for consumption.

AMQP 1.0 is a protocol for message exchange between processes. Although the specification covers the presence of an intermediate process (a broker), it doesn't specify what functionalities such a broker should offer. Store and forward is also covered by the AMQP specification. For instance, Marconi could connect to an AMQP 1.0 capable broker to have store-forward message delivery. Since the AMQP 1.0 specification covers some of this areas but doesn't explicitly define how each one of them should be implemented, there's room for non-standard implementations on each broker, which means the interoperability of this storage driver could be broken.

While you go through the remaining points, remember that AMQP is just the protocol being used to talk to the broker, which means we need all the features to be supported not just by the protocol but the broker as well.

Message access by ID

In AMQP 1.0, the message id is an optional field and it must be set by the producer. The producer is also responsible for enforcing the id uniqueness. This is actually fine, it'd be sad to have to depend on an external service to generate ids but it's probably not a big deal.

Unfortunately, though, in AMQP1.0 it's not possible to access messages directly by using the message-id or any other field. This makes it impossible to support random access to messages in the queue, which is one of Marconi's features. I won't discuss the usefulness of this feature here, although there was a recent discussion arguing it.

Different Acknowledgement models

AMQP's acknowledgement does not need to be immediate. However, it does need to go through the same session used to get the message. Since Marconi is not the final consumer of the message, it can't acknowledge it until the user does it through Marconi's API. This is an issue already because Marconi doesn't have support for persistent connections, which means a message may be pulled from one node and acknowledged on a different node.

A message that has been pulled out the queue is in an acquired state, which means no other consumer should get it. This is good except for the part where the protocol doesn't allow you to unlock a message based on its id. This doesn't play nice with Marconi's claim workflow. In Marconi, a message - or a set of messages - can be claimed and then deleted or released.

No support for queues

AMQP 1.0 does not explicitly define what a queue is. It defines the state model for sending and acquiring messages but it doesn't mandates how it should be defined . Depending on how the broker implementation of AMQP 1.0 works, this feature may or may not be supported.

As of the time of this writing, queue's are a first-citizen resource in Marconi and I don't think that'll change in the near future. There have been some discussions and plans around getting rid of queues. Nonetheless, the team decided to keep them around at the last summit based on the feedback provided by the members of the community attending the sessions.

Conclusion

To summarize, the points we've mentioned are:

Store and forward
Message access by ID
Different Acknowledgement models
No support for queues

There are other very valid points that have not been mentioned in the above list. In order to keep this post readable, I just highlighted the ones that I considered more relevant and that have a bigger impact in the current API. Based on the above mentioned points I don't believe the trade-off between changes required and things that would be gained by supporting traditional messaging systems is fair at all. This all had me thinking that we're probably trying to support AMQP in the wrong layer. What happens if instead of having support for traditional brokers we just add a new AMQP transport?

We'll definitely reconsider this at the K summit once we'll figure out what the API 2.0 should look like. Although this doesn't mean it'll be supported nor that the API will be overhauled, it does mean the team is always revisiting existing technologies and open to expand the project to the best of its possibilities.

One more thing. This post doesn't claim the Marconi API is perfect, what it states is that based on the current supported API, it's not possible to support traditional messaging systems. Whether it'll be possible to do so in the future or not remains to be seen.

I'd like to give a final thank to Victoria Martínez de la Cruz and Tomasz Janczuk for the work, passion and time dedicated to this analysis.

MongoDB 2.6 is out, Marconi will benefit from it

2014-04-08T14:40:00+02:00

Those of you following closely MongoDB's development know that the new stable version (2.6) is out and that it brings lots of improvements and new features.

Since there are already presentations, documentation and general information about this new release, I wanted to take a chance and evaluate those changes from Marconi's perspective. Specifically, I wanted to evaluate which of the changes of this new version will help improving Marconi's MongoDB storage driver.

Index Intersection

For a long time, the only way to have queries that would use an index for 2 or more fields was using compound indexes. Although compound indexes still exist and they are recommended for several scenarios, it is now possible to intersect 2 indexes per query, which means that queries like this one are now possible:

> db.post.ensureIndex({a: 1})
> db.post.ensureIndex({t: 1})
> db.post.insert({t: "yasdasdasdasdaso", a: 673453})
> db.post.find({t: "mmh", a: {"$lt": 5}}).explain() // Complex Plan

If you've followed Marconi's development, you may know that it depends heavily on compound indexes in order to have fully indexed-covered queries. With the addition of index intersection, it is now possible to relax some of the compound indexes. For example, these two indexes ACTIVE_INDEX_FIELDS and the COUNTING_INDEX_FIELDS could be simplified into:

ACTIVE_INDEX_FIELDS = [
    ('k', 1), # Used for sorting and paging, must come before range queries
]

COUNTING_INDEX_FIELDS = [
    (PROJ_QUEUE, 1), # Project will be unique, so put first
    ('c.e', 1), # Used for filtering out claimed messages
]

Note that the index {p_q: 1, 'c.e': 1} is one of the most used ones in Marconi right now.

New Bulk Semantics

Marconi supports posting several messages at the same time. Bulk post, that is. Depending on the storage driver this has to be implemented differently. In the case of MongoDB's driver, Marconi relies on MongoDB's bulk inserts. Although we could have used continueOnError on previous MongoDB versions, we came up with a 2-step insert process that would ensure that either all or none of the messages are posted. This was done for several reasons, one of those being not having great semantics for bulk inserts and those not being extended to updates too.

In version 2.6, MongoDB introduced ordered bulk inserts. For Marconi, this means it can rely on a more deterministic behaviour when doing bulk-inserts. The determinism comes from the fact that with ordered bulk-inserts it'll be now possible to know the exact status of the insert in case of failures.

There are more things to analyse before being able to remove the 2-step inserts but this new feature definitely solves one of them.

$min / $max

This is another very cool feature to have. As of now, Marconi relies on ttl collections to delete expired messages and claims. Unfortunately, when creating a claim, there wasn't a way to update the message expiration date if it would've expired before the claim did. With the new $min/$max operators, it'll be now possible to do all this in a single operation.

new_values = {'e': message_expiration, 't': message_ttl}
collection = msg_ctrl._collection(queue, project)
        updated = collection.update({'_id': {'$in': ids},
                                     'c.e': {'$lte': now}},
                                    {'$set': {'c': meta,
                                              '$max': new_values}},
                                    upsert=False,
                                    multi=True)['n']

In other words, we'll be able to simplify this piece of code

Other things

There are several other new features and improvements that I'm very exited about. For instance, MongoDB 2.6 brings in RBAC (Role Based Access Control) down to a collection level. Although Marconi allows users to secure their databases, it doesn't directly rely on MongoDB's auth features. However, the new RBAC allows for a more secured distribution of messages throughout the database instance. It could be possible to create roles based on keystone roles and let the database enforce that for us. Whether or not this is a good idea, is out of the scope of this post, though.

MongoDB 2.6 also brings a brand new write protocol that integrates write operations with write concerns. The default write concern is safe-writes, which means that write failures are reported immediately.

Overall, this is a really exiting MongoDB release for me and for Marconi's team. Besides bringing several fixes and enhancements, it also brings new features that will make the storage driver simpler and safer. Please, read the full release notes for more information.

People don't like to queue up

2014-03-09T20:56:00+01:00

Lately, I've been thinking a lot about what queues represent in Marconi and how they could be improved. It's not a secret that I'm not a fan of the way we're currently thinking about queues. Throughout Marconi, a queue is a first-citizen resource that is used to group messages under a specific context. Other than that, a queue has no other responsibility and it doesn't do anything useful from a user perspective.

In addition to the aforementioned, queues are not as lazy as I would like them to be. In order to post a message, a queue must exist, which means it's necessary to create the queue before posting a message.

If we look at how some messaging tools and protocols work nowadays, we'll see that many of them have shift away from this concept to a more message oriented one. For instance, qpid-proton, which is implemented on top of amqp 1.0, needs no queue to send a message. Another good example is zmq, which doesn't have the concept of queue either.

At this point, you could already guess what my proposal is:

Let's get rid of queues

In my opinion, queues have reached the end of they're lives. The idea of having a first-citizen resource that piles up a bunch of messages and allows clients to subscribe to them is not affordable nor necessary anymore.

From a user perspective, the most important resource in a queuing system is the message. The user expects the queuing system to be reliable in terms of message delivery not queue's existence. Whether the queue plays an important role in this whole process is not as important to the user as being able to send and receive messages fast and reliably.

This is what keeping a queue means in Marconi:

API: Queue's have their own set of RESTFul endpoints. In order to create a queue, you need to send a POST request to the queue's endpoint. The same thing is necessary to get the metadata of that queue.

Storage: As of now, Marconi has just database like storage drivers. Regardless of what the driver is, Marconi can't simply trust that the queue exists. Therefore, it is necessary to verify the queue existence before doing any operation on the message. Depending on the storage, this may require either an extra query, an extra join or an extra call.

What If we just think about Topics?

I'd like to introduce the knowledge of Topic in Marconi. A topic describes the context a message belongs to - basically the way queues did. The difference between a topic like the one proposed here and queues is that the former ought not to be a first-citizen resource - instead it should be an attribute of the message - whereas the later is a first-citizen resource that acts as a categorizer of messages.

Note that this does not mean that drivers for 'queue-aware' system can't be implemented. The difference is that the concept of queue would be lazily hidden behind a topic.

What about queue's metadata?

When queue's metadata was first introduced in Marconi, the idea was to allow users to add custom properties for that queue. Behind that generosity hides the idea of re-using the queue metadata to add private properties like per-queue limits, flavors etc.

I don't think this necessarily needs to go away. The idea would be to make the queue resource - now called topic - completely optional. It would still be possible to query it, create it and delete it but it won't play such an important role as it does now.

What about existing code?

To be fully honest, I don't think this change would require any code changes in the client side. The existing library requires the user to have a queue instance to which the messages are posted. Those Queues instances could be made lazy without impacting the way the user code works.

For instance, this example won't need to be changed:

```python cli = client.Client(URL) queue = cli.queue(queue_name) queue.post(messages)

for msg in queue.messages(echo=True):
    print(msg.body)
    msg.delete()

```

Although it would also be possible to write it as:

```python cli = client.Client(URL) cli.post(messages, topic=queue_name)

for msg in cli.messages(topic=queue_name, echo=True):
    print(msg.body)
    msg.delete()

```

Conclusion

This post doesn't introduce anything that hasn't been heard before. The concept of topics has been around for quite some time and it's already been adopted by various of the existing queuing systems - it may mean something different in some cases. Many of those systems are still bound to queues but others - like the ones mentioned in this post - have abandoned it. I think this is the right moment for Marconi to make such a choice without brutally disrupting existing deployments.

I'm sorry for making this post short, rough and not as detailed as I'd have liked. This is just a proposal and it requires way more discussion and thoughts but I wanted to throw it out here.

Thoughts?

Real-Time Systems: High level introduction

2014-01-25T22:49:00+01:00

I've been working with real-time distributed systems for quite some time now. This is one of the topics I like the most. In this post I'd like to spend some time explaining at a high level what real-time systems are, where they're used, some of their requirements and I'd also like to conclude the post with a small section explaining why I think the term real-time is neither accurate nor correct to describe these systems.

Real Time Systems

A system is said to be real-time , when it's subjected to execution time constraints. What this means is that whatever it takes to complete the execution of a specific task will determine whether the task ended successfully or not. There are different groups of real-time systems. The first group is composed by systems that are fully dependent of a specific deadline. Any deviation from the time constraint will be considered a total failure. This systems are said to be hard real time systems. The second group is composed by systems whose quality may be affected by missed deadlines but those misses won't be considered failures. Nonetheless, missed deadlines will invalidate the usefulness of the result provided. This group is usually called firm and it's not so common. The third group, though, is said to be soft real time because the deadlines misses are not considered failures and the usefulness of their results will decrease when a deadline is missed but it won't be invalidated.

The different levels of constraints exist to satisfy multiple scenarios. For instance, hard real-time systems are commonly used for stock quotas transactions, airplane systems, car systems etc, whereas soft real-time systems are used when the availability of the result is important but not as much to make it mission critical. In other words, hard real-time systems main goal is to meet all deadlines, whereas soft real-time systems just need to meet a subset.

Real time systems implications

The hardest thing about real-time systems are not the systems themselves but the things they imply. A real-time system, for instance, implies some level of determinism, scale, fault-tolerance etc. This all depends on the level of strictness the system has. A system that needs to meet all the deadlines will need to have fault-tolerance as well. In the event of a node failure, the system has to send the result back before the deadline, otherwise it will be considered a total failure.

Lets dive into some of those implications.

Time Constraint

At this point it is clear that real-time systems are not the same thing as really-fast systems. A system is said to be real-time when its results are tight to an execution time constraint. Therefore, it is necessary to establish what that time constraint is and how strict it is. There's not just 1 time constraint that has to be established, though. If your real-time system is composed by more than 1 component, you'll need to establish a time constraint for the inter-communications of your system. Each one of the inner constraints have to be smaller than the constraint applied to the whole system. What that time constraint is, depends on the system itself and its purpose.

Besides defining the time constraint, it's also necessary to establish how it will be enforced, measured and what actions will be taken in case of failure. Furthermore, it is necessary to determine where this enforcement will happen. More about this later.

Integration Requirements

This pretty much falls into what systems integration is. However, since a real-time system is not necessarily distributed, this will also apply to non-distributed systems.

Integration requirements refer to the semantics, technology and methods used by the system to enforce both the execution and the distribution of the task. Things like whether the execution needs to be synchronous or asynchronous need to be sorted out in this step. Therefore, it is also necessary to establish what the components of the system are, what those components do and how they interact with the rest of the environment.

This section has different applications. Depending on the system, it may be implemented differently. An example of this are non distributed real-time system - perhaps applications would be more accurate here - where the integration with other systems through a non-deterministic environment is not required. However, it is necessary to integrate the different components of that single, most likely multi-threaded, system. Although this may seem obvious, and perhaps implicit in every system aiming to run concurrently.

Determinism

I won't go deep into Determinism, the concepts and rules behind this topic are big and out of the scope of this post. Visit wikipedia for more information.

Determinism is not a strict requirement for every real-time system. It is possible to have real-time systems that don't behave deterministically. Although this is certainly the least common case for real-time systems, the previous statement could be argued as a whole given the fact that these systems would benefit from a deterministic behavior.

Determinism brings predictability to the system, which allows it to be more reliable and lower the difficulties of meeting the goals of the tasks being executed.

Fault Tolerance

Just as with Determinism I won't go deep in the concepts behind fault-tolerance, for more information check wikipedia out.

Unlike determinism, fault-tolerance is applicable just to multi-component systems. That is, in most of the cases, a distributed system.

Fault tolerance is perhaps a stronger requirement for real-time systems that what determinism itself is. A system willing to respect the imposed time constraint has to survive possible failures and complete the task.

It is also worth mentioning that deterministic systems have to be fault-tolerant, which is not necessarily true the other way around. Failing to survive failures will introduce non-deterministic behaviors throughout the system, therefore making the whole system behave non-deterministically.

Requirements Enforcement

We've made it clear that every real-time system has intrinsic requirements that should be met in order for it to meet its goals. The list of requirements is far from being complete but it introduces some of the most relevant ones.

Some of the requirements described above need to be enforced at some point in time during the execution of every task and the life of the system. In order to do that, it is necessary to determine where this enforcement will happen and when.

This enforcement is usually implemented along side with the system itself. That means, if the system is distributed, the enforcement of the time constraint, support for determinism and support for fault-tolerance will be distributed as well.

This step adds more complexity to the system. For instance, determining whether the system is behaving deterministically, whether the system nodes' health is fine or even whether the goals are being met is often the most critical task of a successful real-time system.

`Real-Time` applicability

This post has been mostly focused on distributed systems. However, the term real-time is not bounded to those systems. Here's a list of other type of applications for this term:

Programming languages
Operating Systems
Network protocols

Real time systems misconception

At this point, I don't expect you to be an expert on this field. In fact, I think some of the topics explained above could certainly have been explained more in detail. However, I do expect you to know that real-time does not mean fast nor it means immediately. A real-time system is a system tight to time constraints, which in most cases are very low. Therefore, I believe the term is wrong and doesn't describe the real goal of the system.

Given the fact that there's no such thing as real-time and that computers are governed by the laws of physics, it'd be accurate to say that unless the point of reference for real-time systems is explicitly defined, the measurement could be relative to any point of the system since the task was executed. Common sense leads us to use the time when the task started as a reference point to measure the success of the execution. This, though, implies that no matter how fast the system is, the result won't ever be immediate and the execution time is not actually real.

In my humble opinion, a more accurate term for this kind of systems would be one that explicitly specifies the time constraints of the system itself. For example: time-bound, time-constrained, etc.

Unfortunately, the misconception around the term has led people to use it erroneously to describe things that are supposed to be fast as real-time.

Some References

How and When - OpenStack Services Integration

2013-12-08T21:31:00+01:00

As many of you know, OpenStack's is a fully distributed system. As such, it keeps its services as decoupled as possible and tries to stick to most of the distribution paradigms, deployments strategies and architectures. For example, one of the main tenets throughout OpenStack is that every module should be using 'Shared Nothing Architecture'. The Shared Nothing Architecture principle states that each node should be independent and self-sufficient. In other words, all nodes in a SNA are completely isolated from each other in terms of space and memory.

There are other distribution principles that are also part of OpenStack's tenets, however, this post is not about what principles OpenStack as a whole tries to follow. What I would like to discuss in this post is how OpenStack sticks together such a heavily distributed architecture and makes it work as one. The first thing we need to do is evaluate some of the integration methods that exist out there and how they're being used within OpenStack.

The integration of a distributed system relies on the capability of its components to communicate between them. This communication can happen in several ways and could go through different protocols. For example, we could integrate 2 services by sharing a file that contains the data we want to send from one service - source - to the other - destination. Although it seems obvious that there are many drawbacks from using this approach, it is still useful in scenarios where things like databases, rpc libraries and messaging, which are good replacements for the file-based method, can't be used.

There are many things besides communication that have to be taken under consideration. Nonetheless, establishing a channel between distributed systems is the first step to towards an integrated system.

Just like everything else, each one of this integration methods have some benefits and drawbacks. Some of them are consistent whereas others are more reliable and scalable.

I mentioned above that databases, rpc libraries and messaging could be good replacements for the file-based integration approach. Lets dig a bit more into those.

Databases

I'd dare to say databases are the most common way of integration nowadays. As mentioned before, integrating 2 or more services is about making those nodes communicate with each other, regardless the scope of that communication. The services able to communicate with each other will use that ability to send data from point A to B, regardless what that data is considered to be.

The data travelling between those services could be anything: events, messages, notifications, database records etc. It doesn't matter, at the very end it's data and what matters is how it is being generated, how it's being sent, how it's being serialized and how it's being consumed.

The reason I mentioned all that is because databases conceptually are not message brokers, although they allow 2 or more services communicate and share data between them. Databases are collections of structured - and don't play the unstructured NoSQL card here - data. This data is usually organized and can be created, read, updated and deleted at any time.

It is pretty common to see different services relying on a database as a way to share data. It is even more common to see that happening for a horizontally scaled service. Some people would argue saying that nodes of the same service that rely on a database are not actually being integrated by it, although they are. If a service relies on a database, it means it uses it to store information that will be useful to other nodes of the same service running in parallel. Otherwise, each node of that service would be a rogue, isolated instance and that would break the consistency of the whole distributed system.

This is the first level of integration that services throughout OpenStack use. Most of the services depend on a database. However, none of them share the database access with services that doesn't belong to their cluster. For example, Nova instances won't ever access Cinder's database directly, instead, they'll go through Cinder's API service. The motivation for this is pretty simple, Nova doesn't have any knowledge of how volumes should be handled, nor what state they are in or when their state should be changed. That's something Cinder has to take care of.

Nevertheless, different nodes of the same service do share access to the same database. For example, cinder-volume, cinder-scheduler and cinder-api access the same database concurrently to operate on volumes' data, among other things.

Remote Procedure Calls

As I already mentioned, I believe databases are the most common way to integrate services nowadays, many people do it without even knowing they are. However, as far as OpenStack is concerned, I believe RPC is the most used method. Almost 90% of the integrated projects - those that are part of OpenStack's release cycle - rely on RPC calls to communicate with other services.

But, what is RPC exactly?

Remote Procedure Call is a way of inter-process communication that allows clients to trigger the execution of subroutines in remote locations. There are many different implementations of it, although there's an - expired - RFC for it. Different patterns have been created based on those implementations.

RPC subroutines can do anything. In many cases, they're responsible for storing and retrieving data. In other cases, though, they just execute code and return the result to the caller. Likewise, the channel through which RPC calls are sent could be anything - message brokers, raw TCP connections etc - as long as it's possible to pack and send the message through it[0].

RPC has many benefits that databases don't. For example, it's possible to make synchronous calls and wait for responses whereas with databases the data transmission is always asynchronous. RPC also makes it easier for services to isolate operations and communications. However, just like databases, RPC is a very tightly coupled protocol that requires both endpoints to agree on a structure. This brings consistency and 'predictability' to the protocol at the cost of making it less flexible and obviously less decoupled.

Throughout OpenStack, modules using RPC rely on a messaging library - oslo.messaging - that takes care of the message serialization, transport communication and message acknowledgement. As for the time of this writing, the library has support for 3 different drivers - rabbit, qpid and zmq - that handle the underlying message bus this library sits on top of.

Some projects like Nova, Cinder and Neutron use RPC calls heavily. Almost everything that happens in those projects is triggered by RPC calls. For example, a volume creation request in Cinder is first sent from the cinder-api service to the cinder-scheduler which will then pick one of the cinder-volume nodes that are available and forward the request to it. All this happens in an asynchronous fashion.

Throughout OpenStack, asynchronous calls (cast) are used most of the time. One of the benefits behind this is that services can handle more load since they're not blocking on every call, therefore *-api services can return a response back to the client in less time.

[0] Notice we're not taking reliability, scalability under consideration here but just the ability to of the message channel to take the message from the sender to the receiver

Messaging

NOTE: This section is not about message brokers. It refers to the use of messages as an atomic unit for sharing data between services, regardless they're homogeneous or not.

Messaging is perhaps the most decoupled method of integration. It's based on messages that travel through a channel which gets them from the producer to the consumer. Services sending and consuming messages don't necessarily need to agreed on a structure to use. Furthermore, consumers are not suppose to consume all messages, it all depends on the messaging pattern being used.

Messaging - loose coupling to be more precise - gives flexibility and scalability to the cost of being more complex in terms of implementations. Although services being integrated are not required to agreed on a structure, they do expect to get a message they can read the data from. Furthermore, the message channel - whatever it is - may need to have support for message routing, message transformation and message filtering among other things. All this is mostly coupled to the architecture but not to the services that are part of it.

Something interesting about messaging is that it can be the base for other integration methods. For example, it is possible to send messages containing RPC requests. In order to do this, though, it is necessary to apply all RPC requirements to messaging, for instance, both parts - producer and consumer - will need to agreed on the message structure and type.

Within OpenStack, messages are mostly used by the RPC library - oslo.messaging - itself. It packages all RPC requests into messages that are sent to a message broker which then forwards them to the consumers. There are cases, though, where messaging is used in a almost fully decoupled and asynchronous way. Notifications, for instance, are messages sent by OpenStack services hoping there'll be a consumer ready to process them. These notifications contain information about events that had happened at a very specific moment. In Glance, for example, when an image is downloaded a image.download notification is sent and it'll hopefully be consumed. Whatever it is that happens to that notification, Glance simply doesn't necessarily care about it. However, Ceilometer is a good example - perhaps the only one at the time of this writing - of a service interested in those notifications. It consumes all these events to meter resources usage and to allow users to bill based on that information.

It is now clear that messaging is heavily used in OpenStack and that most of the messages sent between homogeneous services are RPC calls. This calls travel through the message broker picked in the form of atomic messages and they are processed - at least most of them - asynchronously. The asynchronous nature of OpenStack's interoperability helps keeping developers focused on making distributed nodes faster, more scalable and more reliable.

Wrapping up

This post covered 3 different integration methods. It also showed how they're used throughout OpenStack and how they're mixed together to reach great levels of integration. The post also covered some of the benefits of each one of the methods over the others and touched some of the drawbacks of each method as well.

One thing to keep in mind, though, is that OpenStack couldn't have been implemented in any other way. The fact that it relies on such a heavily distributed architecture has helped the project to succeed. The mixed integration styles it supports allow the project to have logically services - as opposed to functional ones - distributed in several nodes. Furthermore, it allows OpenStack to scale massively and dynamically. OpenStack's limits are set by other areas.

However, it's true that not everything is perfect and that there's definitely room for improvement. For instance, the fact that most of the operations throughout OpenStack rely on a message broker is not funny. We all know message brokers are hard - impossible - to scale. It is easy for message brokers to become a SPOF in the architecture, which means that a big part of your deployment will be fucked up if one of them would go down. I believe this is something that per-to-per protocols - amqp 1.0, for example - could alleviate.

I'll maybe cover this in one of my next posts