Céondo's Blog - Embrace Constraints To Evolve www.ceondo.com/ecte/feed/

They use Indefero: Elveos www.ceondo.com/ecte/2012/01/they-use-indefero-elveos 2012-01-31 13:17:10 GMT

A small notice to inform you that Elveos.org is using Indefero:

Elveos.org is a crowdfunding website for open source software. You are a free software developer? Elveos gives you a way to get paid for your work. You are a free software user? Elveos lets you fund the features you need.

It reminds me of Kickstarter, but more open. It is very nice to see offers targeting the OSS community.

Firefox is gone, Chrome is here, who is next? www.ceondo.com/ecte/2012/01/firefox-gone-chrome-here-whos-next 2012-01-31 09:44:47 GMT

Can you remember the old days? Life was simple: the only browser worth using was Netscape Navigator. Then Microsoft decided that the web was important, took over the world and took a nap. Reborn from the ashes of Netscape, Mozilla brought us Firefox. But then Google decided that the web was too important not to have its own browser, and here we are: Chrome is the new leader.

  • Chrome: 50.92%
  • Firefox/Gecko: 22.60%
  • Safari: 17.54%
  • Android: 6.12%
  • Opera: 1.64%
  • Internet Explorer, less than 1%...

Chrome is the new leader

These are the results of a traffic burst this weekend, nearly 100% of it from Hacker News. These visitors are early adopters and trend setters: they will jump on the next Chrome, Firefox, IE or Netscape Navigator when it comes. But here, for the first time, I wonder: who could be next? Could it be that we have reached the Mozilla Foundation's goal of a competitive field, where several players lead on their merits? Could it be that the fragmentation resulting from the different form factors is the reason we will never again see an 80% market share for a single vendor? I hope so!

A router crashed, the websites have been slowing down www.ceondo.com/ecte/2012/01/router-crashed-card 2012-01-27 12:05:57 GMT

If you noticed a slowdown in the past few minutes, it was because one of our provider's routers had some issues, slowing down the services for a short period of time. As you can see on the following graph, the response time of the GET requests we use to monitor the services suddenly went through the roof. A 20-second response time is the equivalent of dead...

Router down

But I must say, this is where I am really pleased with our provider, OVH. They immediately explained what was going on: one of the router's cards had crashed hard or died.

Instability of Cheméo www.ceondo.com/ecte/2012/01/instability-chemeo 2012-01-18 21:01:46 GMT

At the moment, we are moving around some of the VMs (Virtual Machines) powering our PaaS (Platform as a Service). The platform has one virtual machine receiving the code updates, from which the Cheméo application is then updated. The application itself runs on another VM behind the web server. The work we are doing is to separate the web server from the application. So, in the end we get:

  • 1 VM with the web server answering your requests;
  • 2 VMs running the Cheméo application servers: they do the actual work, like pulling the data out of the database;
  • 1 VM hosting the code and building the new application code, which is then deployed on the 2 application VMs.

This seems a bit complicated, but the advantage is that it is very easy to add a new application server VM to handle more requests.
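The post does not say how the web server spreads requests over the application servers. Purely as an illustration, assuming a simple round-robin policy, the dispatching logic could look like the PHP sketch below (PHP 8.0 or later). The `RoundRobin` class and the `app1:9000`-style addresses are hypothetical, not part of Céondo's platform.

```php
<?php
// Hypothetical dispatcher: requests go to the application servers in turn,
// so scaling out is just a matter of registering one more backend.
final class RoundRobin
{
    private int $next = 0;

    /** @param string[] $backends addresses of the application server VMs */
    public function __construct(private array $backends)
    {
        if ($backends === []) {
            throw new InvalidArgumentException('At least one backend is required');
        }
    }

    // Register a newly started application server VM.
    public function add(string $backend): void
    {
        $this->backends[] = $backend;
    }

    // Return the backend that should receive the next request.
    public function pick(): string
    {
        $backend = $this->backends[$this->next % count($this->backends)];
        $this->next++;
        return $backend;
    }
}
```

Adding capacity is then a single `add()` call with the new VM's address. In practice this job would more likely be handled by the web server's own load balancing; the sketch only shows why the split makes scaling out cheap.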

Anyway, the split requires moving components here and there and synchronizing the data. This does not always work as expected, which is why you can expect a bit of downtime during the next few days.

Migration of Indefero's Backup Server www.ceondo.com/ecte/2012/01/indefero-backup-server-migration 2012-01-13 17:03:04 GMT

For your information, we are in the process of migrating Indefero's main backup server to our new infrastructure. The new infrastructure has been running for a while and we are satisfied with its stability.

At the same time, we are going to upgrade the server, moving away from Ubuntu and back to the roots, that is, Debian. Once the backup server is up and running smoothly, the main server will follow.

Update: We got a bit of instability at the same time... patching an old server here and there is difficult. Time to complete the migration to a better system!

Cheméo Now on our PaaS www.ceondo.com/ecte/2011/12/chemeo-on-pass 2011-12-31 13:31:03 GMT

As a nice end-of-year present, Cheméo is now running on Céondo's private Platform as a Service (PaaS). This is a huge change compared to the previous system. After the usual period of ironing out the kinks, the infrastructure will be more robust and flexible, allowing fast iterative development. The goal is simple: Cheméo must become the reference for chemical engineering data. We are not targeting biological activity, only chemical and process engineers. 2012 is going to be fun.

Launching Cheméo's Labs www.ceondo.com/ecte/2011/12/labs-chemeo 2011-12-21 08:21:50 GMT

A few days ago, Cheméo's laboratories went live. The labs run software experiments in the field of chemical and physical properties. They are a kind of sandbox where ideas can be tried without disturbing the main Cheméo website.

The labs are running on top of Céondo's private Platform as a Service (PaaS). This platform will soon host all the services we deliver, from our products Cheméo and Indefero to simpler websites like ceondo.com. Just in case, a status website will be kept independent, running on another technology with a different provider. I will soon write a bit more about this private PaaS.

These are exciting times, the best to close 2011 and start 2012.

Use the DNS to Announce Your ZeroMQ Services www.ceondo.com/ecte/2011/12/dns-zeromq-services 2011-12-13 11:29:01 GMT

If you are building your own private PaaS and use ZeroMQ, you will end up with the issue of distributing the ZMQ endpoints of your services to your applications. You can use a broker approach, but then, what if the broker is moved to another node? Whatever you do, at least a minimal set of service entries needs to be distributed to your applications and workers.

The simplest approach is rsync and a text file: put everything in a text file and copy it to all the servers. But then you need to manage the list of servers and make sure that each server gets the latest version, even a new server you started while the new version was being distributed... This quickly gets complicated, which is why a simpler approach is to use the DNS infrastructure with TXT records.

If you are using djbdns, you can simply add lines like these to your data file:

+*.s.myapp.net:192.168.1.100:86400
'myservice.s.myapp.net:192.168.1.123\0729976;192.168.1.124\0729976:3600
'otherservice.s.myapp.net:192.168.1.123\0729976;192.168.1.124\0729976:3600  
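In these lines, `'fqdn:text:ttl` declares a TXT record, and `\072` is the octal escape for a colon: tinydns-data uses `:` as its field separator, so colons inside the text must be escaped. A small PHP helper can generate such lines from a list of endpoints. The function name `tinydns_txt_line` is hypothetical, and this sketch only escapes backslashes (as `\134`) and colons, which is enough for the `ip:port` lists above and for simple JSON payloads.

```php
<?php
// Build a tinydns-data TXT line ('fqdn:text:ttl) announcing a service.
// Backslashes are escaped first (\134), then colons (\072), so that
// tinydns-data does not read the colons as field separators.
function tinydns_txt_line(string $fqdn, array $endpoints, int $ttl): string
{
    $text = implode(';', $endpoints);
    $escaped = str_replace(['\\', ':'], ['\\134', '\\072'], $text);
    return sprintf("'%s:%s:%d", $fqdn, $escaped, $ttl);
}
```

For example, `tinydns_txt_line('myservice.s.myapp.net', ['192.168.1.123:9976', '192.168.1.124:9976'], 3600)` produces the `myservice` line shown above, so the data file can be regenerated from whatever inventory you keep.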

Now, when you want to access myservice from your application, you simply fetch the DNS TXT record of the corresponding service:

print_r(dns_get_record('myservice.s.myapp.net', DNS_TXT));
Array
(
    [0] => Array
        (
            [host] => myservice.s.myapp.net
            [type] => TXT
            [txt] => 192.168.1.123:9976;192.168.1.124:9976
            [entries] => Array
                (
                    [0] => 192.168.1.123:9976;192.168.1.124:9976
                )

            [class] => IN
            [ttl] => 86400
        )
)
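The `txt` field still has to be turned into addresses ZeroMQ can connect to. A minimal PHP sketch, assuming the plain `ip:port;ip:port` payload shown above (not the JSON payload mentioned in the update below) and TCP transport; both function names are hypothetical:

```php
<?php
// Turn a TXT payload like "ip:port;ip:port" into ZeroMQ TCP endpoints.
function txt_to_endpoints(string $txt): array
{
    $endpoints = [];
    foreach (explode(';', $txt) as $item) {
        $item = trim($item);
        if ($item === '') {
            continue; // tolerate empty items, e.g. a trailing ';'
        }
        // Split on the last colon so the port is always the final field.
        $pos = strrpos($item, ':');
        if ($pos === false) {
            throw new InvalidArgumentException("Missing port in '$item'");
        }
        $host = substr($item, 0, $pos);
        $port = (int) substr($item, $pos + 1);
        $endpoints[] = sprintf('tcp://%s:%d', $host, $port);
    }
    return $endpoints;
}

// Resolve a service name; returns [] when no TXT record is found.
function resolve_service(string $name): array
{
    $records = dns_get_record($name, DNS_TXT);
    if ($records === false || count($records) === 0) {
        return [];
    }
    // 'txt' holds the full payload (the 'entries' chunks concatenated).
    return txt_to_endpoints($records[0]['txt']);
}
```

With the php-zmq extension, each returned URI can be passed to `ZMQSocket::connect()`. A single REQ or DEALER socket may be connected to several endpoints, and ZeroMQ then spreads the requests across them.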

It is simple and robust, and it lets you benefit from all the work done on DNS, for example:

  • you can access it from any language (PHP, Python, Java, etc.), and the libraries are usually very robust;
  • you can announce a different endpoint depending on the source IP; for example, I use this to announce the testing endpoints for the services running on the testing subnet;
  • your DNS server is nice: you can usually extract a lot of stats from it;
  • with several DNS servers, your system is robust.

Of course, this is not meant to be used as a high-speed distributed database, but for distributing service information it works perfectly.

Update: Someone kindly pointed out that an SRV record can be used too. Totally right, but in my particular case, I store a JSON string with more than just the endpoint definition, which is why I need the flexibility of the TXT record. But if SRV fits your requirements, you should go for it.
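If SRV is enough for you, PHP's `dns_get_record()` called with `DNS_SRV` returns entries with `target`, `port`, `pri` and `weight` keys. A minimal sketch ordering them into ZeroMQ endpoints; the function name is hypothetical, and it simplifies RFC 2782 (which picks randomly among equal priorities in proportion to their weights) by simply listing heavier weights first:

```php
<?php
// Order SRV answers (lowest priority first, then heaviest weight first)
// and turn them into ZeroMQ TCP endpoints.
function srv_to_endpoints(array $records): array
{
    usort($records, fn(array $a, array $b): int =>
        [$a['pri'], -$a['weight']] <=> [$b['pri'], -$b['weight']]);
    return array_map(
        fn(array $r): string => sprintf('tcp://%s:%d', $r['target'], $r['port']),
        $records
    );
}
```

Usage would look like `srv_to_endpoints(dns_get_record('_myservice._tcp.myapp.net', DNS_SRV))`, where the record name is hypothetical.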

Update: For your information, this is actually used to power Cheméo.

Improving the Response Time of Indefero www.ceondo.com/ecte/2011/12/indefero-performance 2011-12-02 17:35:14 GMT

Improving the speed of Indefero is challenging, as it requires managing a lot of moving parts, from the git/subversion backends to the database. This week, I have been working on setting up Graphite for the infrastructure. This is working pretty well and provides graphs like the following one.

Current response time of Indefero

This graph is extracted from a special Nginx log format which includes the time needed for Nginx to send the response back to the client. The only thing missing is a way, when I see a spike, to directly access the corresponding logs to figure out why. At the moment, there is no integration between these metrics and the logs.

To improve a system, one needs to know its current state. Graphite is a bit hard to set up, but afterwards, it is really easy to push data into it. A really nice tool.
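The post does not give the exact log format. As a sketch, assume a hypothetical Nginx format such as `log_format timing '$msec $request_time $status';` (`$msec` is the log write time and `$request_time` the request processing time, both in seconds with millisecond resolution), and Carbon accepting Graphite's plaintext protocol (`path value timestamp`, one sample per line) on its usual port 2003. The metric path `indefero.nginx.response_time` and both function names are hypothetical.

```php
<?php
// Assumed Nginx format: log_format timing '$msec $request_time $status';
// Returns one Graphite plaintext line, or null for a malformed log line.
function nginx_line_to_graphite(string $logLine, string $path): ?string
{
    $fields = preg_split('/\s+/', trim($logLine));
    if (count($fields) < 2 || !is_numeric($fields[0]) || !is_numeric($fields[1])) {
        return null;
    }
    $timestamp = (int) $fields[0];       // $msec truncated to whole seconds
    $responseTime = (float) $fields[1];  // $request_time in seconds
    return sprintf("%s %.3F %d\n", $path, $responseTime, $timestamp);
}

// Push a batch of plaintext lines to Carbon over TCP.
function graphite_send(array $lines, string $host = '127.0.0.1', int $port = 2003): void
{
    $socket = fsockopen($host, $port, $errno, $errstr, 1.0);
    if ($socket === false) {
        throw new RuntimeException("Cannot reach Carbon: $errstr ($errno)");
    }
    fwrite($socket, implode('', $lines));
    fclose($socket);
}
```

Batching the lines and sending them in one connection keeps the overhead low; this is the "easy to push data in" part, while the Graphite/Carbon setup itself remains the hard part.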

The Problem with Performance Logging www.ceondo.com/ecte/2011/11/performance-logging-debugging 2011-11-30 08:32:02 GMT

To run a service like Indefero, you need to log a long list of metrics to follow the load on the system, find the bottlenecks and predict the capacity needed in the future. A very powerful system for this is Graphite. Its only limitation is that it only stores and graphs numerical values. Of course, it cannot do otherwise, but this raises a problem: correlation.

Basically: once I see that every now and then a component is not performing well, how can I drill down into my data to find the reason?

Graphite tells you: on this day, from 14:05 to 14:07, the rendering of a git tree view was slow. Good to know; the next question is of course: why? If you store more metrics, you may find that I/O was slow on server X; you can graph many metrics together and visually correlate them. But then, why was I/O slow?

At this point, you need to go one level deeper and take a look at the logs coming from server X from 14:05 to 14:07. This can bring you up to the application level where you figure out that a client repeatedly accessed a page which triggered a git command with a large output, thus loading the server. But to do that you need to access the logs too.

So, Graphite is wonderful, but what I need, after identifying the subsystem and time range where we have an issue, is to be able to simply scan through all the corresponding logs in that time range. This would be a kind of integration between Graphite and Graylog2.

My problem now is that Graylog2 is overkill. It tries to provide full-text search on the logs, so it requires a lot of heavy machinery, whereas I just need log aggregation and the equivalent of a time-range search with filtering by component, for example webapp.backend.git.
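The query described here is small: a time window plus a component prefix. A minimal in-memory PHP sketch of it, assuming each aggregated log entry is an array with hypothetical `ts` (Unix time), `component` and `msg` keys; a real tool would run the same filter against an aggregated log store rather than PHP arrays.

```php
<?php
// Keep the entries inside [$from, $to] whose component is $prefix itself
// or one of its children (e.g. webapp.backend.git.tree for webapp.backend.git).
function logs_in_range(array $entries, int $from, int $to, string $prefix): array
{
    $matches = array_filter($entries, function (array $e) use ($from, $to, $prefix): bool {
        if ($e['ts'] < $from || $e['ts'] > $to) {
            return false;
        }
        return $e['component'] === $prefix
            || strncmp($e['component'], $prefix . '.', strlen($prefix) + 1) === 0;
    });
    // Sort chronologically so logs from several servers read as one timeline.
    usort($matches, fn(array $a, array $b): int => $a['ts'] <=> $b['ts']);
    return $matches;
}
```

For the 14:05 to 14:07 window of today, the bounds can come from `strtotime('14:05')` and `strtotime('14:07')`. Matching on the prefix followed by a dot keeps `webapp.backend.git.tree` but excludes a sibling such as `webapp.backend.gitweb`.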

This annoys me: I do not want to build such a system myself.