Hacking the Intel fan, for fun

An alternative headline is: “how to show your wife how much you love her, the geek way”.

From September 17 to 22 I was in New Orleans participating in the discussions of the Linux Plumbers Conference, which has already turned into one of my favorite conferences. Lots of fun, talking to great people and good discussions about systemd, containers, cgroups, kernel modules, etc. However, as the headline indicates, this blog post is not about the conference but rather about a toy the Intel booth was giving out: a fan with 7 LEDs in its propeller. See below:


Fan distributed to attendees during LPC

When turned on it shows a text message: “We’re hiring!”, “01.org/jobs”. So, if you are looking for a job and want to come work with me, you already know where to apply ;-). The fun part is that its box says “programmable message fan”. The guys from the booth told me that the first question people were asking was how to change the message, but they had no idea. This post shows how I did it.

Some days after arriving in Brazil I saw a post from Steven Rostedt on G+ regarding this fan and a blog post he found: http://hackingwithgum.com/2009/10/06/hacking-the-cenzic-pov-fan/. Disassembling our fan showed that it’s a little bit different from that one, with a different EEPROM and 1 extra pin.


Disassembling the fan

However, looking carefully at the board we can see it’s pretty similar: it’s a T24C04A EEPROM that is programmable via I2C. I’m not sure if the extra pin is for the write-protect feature present in this EEPROM or if it’s to select the address (in which case we would just have another address on our side); either way we are safe connecting it to ground. From the T24C04A’s datasheet we learn it can work in the range 1.8V to 5.5V. So, instead of using a serial connector like in the other blog post, we can use any development board that has an I2C bus available to play with it, in particular the BeagleBone Black, which has a 3.3V I2C bus and is what I’m using here. From the picture below you can notice that a) I didn’t have many HW components available and b) my drawing skills are horrible. I just did a quick hack to get it to work, i.e. connect GND, VCC and the pull-up resistors (since in I2C the bus is high when nobody is transmitting) [see UPDATE 2].


Wiring the fan to beaglebone

For reading from and writing to the EEPROM I’m using i2cdump and i2cset respectively, and i2cdetect to discover the bus the device was plugged into and its address. Beware that on the BeagleBone the I2C device numbers in /dev don’t match the ones in the HW schematics (thanks Koen for pointing this out to me).

Now the software part. Like in the other fan we have a column of 7 LEDs and each letter is “rendered” using 5 columns. However the way the strings are stored is different. I tried to use the python script that was provided, but after some tests I figured I’d need to make some modifications. Below is how the strings are stored in our fan’s EEPROM:

[diagram: layout of the strings in the fan’s EEPROM]

The first byte is the number of strings present in the EEPROM. Each string then has the string length as its first byte. Then we have 5 bytes for each char, in reverse order. They encode the state of the LEDs in each column: 0 means ON and 1 means OFF. After some trial and error we realize that not only is the string reversed, but also the columns: the first byte of a character encodes its right-most column. In the end we have a 7×5 matrix for each char. I started to draw all the chars and change the python script to use them, but I got lazy and just finished the letters I was interested in (see UPDATE 1). The final result is the video shown above that says “Talita, I love you”, in Portuguese :-).
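To make the storage format concrete, here is a minimal sketch in C of how one string would be serialized under the rules above. The function and names are hypothetical and the glyph data is assumed to be pre-reversed (right-most column first, bit clear = LED on); the real encoding lives in the ascii2fan Python script.

#include <stdint.h>
#include <stddef.h>

#define COLS_PER_CHAR 5

/* Serialize one string following the (assumed) format described above:
 * first the string length, then 5 column-bytes per char, chars stored in
 * reverse order. Returns the number of bytes written into out. */
static size_t encode_string(uint8_t *out,
                            const uint8_t glyphs[][COLS_PER_CHAR],
                            size_t nchars)
{
        size_t n = 0;

        out[n++] = (uint8_t)nchars;             /* string length */
        for (size_t i = nchars; i-- > 0; )      /* chars in reverse order */
                for (size_t c = 0; c < COLS_PER_CHAR; c++)
                        out[n++] = glyphs[i][c];
        return n;
}

The very first byte of the whole EEPROM image (the number of strings) would be written once, before serializing each string this way.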

I used the following commands to dump the EEPROM, encode the text and write to it.

root@beaglebone:~# # dump what's in address 0x50 on bus 1 (use i2cdetect to find out the bus and address of your device)
root@beaglebone:~# i2cdump 1 0x50
root@beaglebone:~# # encode the message given as args
root@beaglebone:~# /tmp/ascii2fan "string1" "string 2 with space" "string 3" > ~/message.bin
root@beaglebone:~# # write the content of ~/message.bin into the EEPROM
root@beaglebone:~# i=0; od -An -t x1 ~/message.bin | while read line; do \
                   for c in $line; do \
                       cmd=$(printf "i2cset -y 1 0x50 0x%x 0x$c b" $i); $cmd; ((i++));
                   done;
               done

You can download my modified ascii2fan. I was editing it in /tmp and lost it after a power cycle, so I had to redo the changes and didn’t confirm it’s still working. It’s almost the same as the one provided on hackingwithgum.com; it’s really just the table that changes.

UPDATES:

  1. I uploaded a new version of ascii2fan, containing all the letters.
  2. As Matt Ranostay pointed out, the BeagleBone Black already has internal pull-up resistors, so the external ones are not really necessary.

Optimizing hash table with kmod as testbed

One thing that caught my interest lately was the implementation of hash tables, particularly the algorithms we are currently using for calculating the hash value. In kmod we use Paul Hsieh’s hash function, self-titled “super fast hash”. I feel troubled by anything that calls itself super fast, especially given that the benchmarks provided are from some years ago, run on older CPUs.

After some time spent benchmarking and researching I realized there was much more to look at than just the hash function. With this post I try to summarize my findings, showing some numbers. However, do take them with a grain of salt.

The hash functions I’m using here for comparison are: DJB, Murmur3, Paul Hsieh’s and CRC32c (using the crc32c instruction present in SSE4.2). My goal is to benchmark hash functions when strings are used as keys (though some of the algorithms above can be adapted for blobs). Also, these hash tables are used only for fast lookups, so any cryptography-safe property of the functions, or lack thereof, is irrelevant here. For all the benchmarks I’m using depmod’s hash tables. There are 3 hash tables:

  1. module names: keys are very small, 2 to 20 chars
  2. symbol names: keys are mid-range, 10 to 40 chars
  3. module path names: keys are the largest: 10 to ~60 chars

In my benchmarks I identified the following as the important points to look at when optimizing:

  • Hash functions: how do they spread the items across the table?
  • Time to find the bucket
  • Time to find the item inside the bucket
  • Hash functions: time to calculate the hash value

Whilst the first and the last depend on the algorithm chosen for calculating the hash value, the second and third are more related to the data structure used to accommodate the items and the size of the table.

Hash functions: how do they spread the items across the table?

It’s useless to find a hash function that is as fast as it can be if it doesn’t fulfill its role: spreading the items across the table. Ideally each bucket would have the same number of items. This ideal is what we call a perfect hash, something that is possible only if we know all the items a priori, and can be accomplished by tools like gperf. If we are instead looking for a generic function that does its best at spreading the items, we need a function like the ones mentioned above: given any random string, it calculates a 32-bit hash value used to find the bucket in which the value we are interested in is located. It’s not only desirable that we are able to calculate the hash value very fast, but also that this value is in fact useful. A naive person could say the function below is a super fast constant-time hash function.

uint32_t naive_hashval(const char *key, int keylen)
{
        /* ignore the key entirely: every item lands in the same bucket */
        return 1;
}

However, as one can note, it isn’t used anywhere because it fails the very first goal of a hash function: to spread the items across the table. Before starting the benchmarks I had read in several places that the crc32c instruction is like the function above (though not that bad) and couldn’t be used as a real hash function. Using kmod as a test bed, that isn’t what I observed. See the figures below.

[figures: number of items per bucket for the Paul Hsieh, DJB, Murmur3 and CRC32c hash functions]

We used all 3 hash tables in depmod, dumping all the items just before destroying them. The graph shows how many items were in each bucket. For all algorithms we have almost the same average and standard deviation. So, which functions should we use for the next benchmarks? The answer is clearly all of them, since they provide almost the same results and are all good contenders.

Time to find the bucket

Once we have calculated the hash value, it’s used to find the bucket the item lies in. For starters, a bucket is nothing more than a row in the table as depicted below:

b0 it1 → it2 → it3
b1 it4
b2 it5 → it6 → it7 → it8
b3 it9 → it10 → it11

The hash table above has size 4, which is also the number of buckets. So we need a way to convert the 32-bit hash value into a 2-bit position. The most common way used in hash table implementations is to just take the value’s modulo:

uint32_t hashval = hashfunc(h, key, keylen);
int pos = hashval % h->size;

The size above is set when the hash table is created (some hash table implementations use a grow-able approach, which is not treated in this post). The modulo above is usually implemented by taking the remainder of a division, which uses a DIV instruction. Even if nowadays this instruction is fast, we can optimize it away if we pay attention to the fact that the hash table size is usually a power of 2. Since kmod’s inception we use size=512 for module names and paths, and size=2048 for symbols. If size is always a power of 2, we can use the code below to derive the position, which leads to the same result but is much faster.

uint32_t hashval = hashfunc(h, key, keylen);
int pos = hashval & (h->size - 1);

The DEC and AND instructions above are an order of magnitude faster than the DIV on today’s processors. However the compiler is not able to optimize the DIV away and use DEC + AND, since it can’t ensure size is a power of 2. Using depmod as a test bed we have the following clock cycle measurements for calculating the hash value + finding the bucket in the table:

keylen      before   after
2-10          79.0    61.9 (-21.65%)
11-17         81.0    64.4 (-20.48%)
18-25         90.0    73.2 (-18.69%)
26-32        104.7    87.0 (-16.82%)
33-40        108.4    89.6 (-17.37%)
41-48        111.2    91.9 (-17.38%)
49-55        120.1   102.1 (-15.04%)
56-63        134.4   115.7 (-13.91%)

As expected, the absolute gain is roughly constant regardless of the key length. The time to calculate the hash value varies with the key length, which explains the bigger relative gains for short keys. In kmod, to ensure the size is a power of 2, we round it up in hash_init() to the next power of 2 with the following function:

/* Round up to the next power of 2, e.g. 300 -> 512; powers of 2 map to
 * themselves. */
static _always_inline_ unsigned int ALIGN_POWER2(unsigned int u)
{
	return 1 << ((sizeof(u) * 8) - __builtin_clz(u - 1));
}

There are other ways to calculate it, refer to kmod’s commit as to why this one is used.
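For reference, one well-known portable alternative (from the classic bit-twiddling collections) avoids compiler builtins entirely. It’s shown here only for comparison; kmod uses the __builtin_clz version above.

static unsigned int align_power2_portable(unsigned int u)
{
        /* smear the highest set bit into every lower position... */
        u--;
        u |= u >> 1;
        u |= u >> 2;
        u |= u >> 4;
        u |= u >> 8;
        u |= u >> 16;
        /* ...so the increment carries into the next power of 2 */
        return u + 1;
}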

Time to find the item inside the bucket

As noticed in the previous section, we use a MOD operation (or a variation thereof) to find the bucket in the table. When there are collisions and a bucket is storing more than 1 item, hash table implementations usually resort to a linked list or an array to store the items. Then the lookup ends up being:

  1. Calculate the hash value
  2. Use the hash value to find the bucket
  3. Iterate through the items in the bucket comparing the key in order to find the item we are interested in.
  4. Return the value stored.

So often the item is a struct like below:

struct hash_entry {
        const char *key;
        void *value;
};

In the struct above I’m assuming the key is a string, but it’s also possible to have other types as keys.

So once we have the bucket, we need to go through each item and strcmp() the key. In kmod, since we use an array to store the items, we have a slightly better approach: the array is kept sorted and during lookup it’s possible to use bsearch(). However, as one can imagine, keeping the array sorted doesn’t come for free: we are speeding up lookup at the cost of slowing down insertion.

Thinking about this problem, the following came to mind: we use and benchmark complicated functions that do their best to give a good 32-bit value, and then with the modulo operation we just throw most of it away. What if we could continue using the value? If we don’t mind the extra memory used to store one more value in the struct hash_entry above, we can. We store the hash value of each entry and then compare them when searching inside the bucket. Since comparing uint32 values is very fast, there’s not much point in keeping the entries sorted anymore: we can just iterate very fast through all items in the bucket, checking first if the hash values match and only then strcmp()’ing the key. With this we drastically reduce the number of string comparisons in a lookup-intensive path, the time to add an item (since the array doesn’t need to be kept sorted anymore) and also the complexity of the code. The downside is the memory usage: one extra 32-bit value per entry. The table below shows the results, and a sketch of the resulting lookup follows it.

keylen      before   after
2-10         222.8   127.7 (-42.68%)
11-17        231.2   139.1 (-39.85%)
18-25        273.8   181.3 (-33.78%)
26-32        328.7   236.2 (-28.13%)
33-40        366.0   306.1 (-16.34%)
41-48        354.0   341.7 (-3.48%)
49-55        385.1   390.5 (1.40%)
56-63        405.8   404.9 (-0.21%)
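Here is a minimal sketch of that lookup (names are hypothetical; kmod’s real code differs in the details): the cached 32-bit value filters out almost all mismatches before any string comparison happens.

#include <stdint.h>
#include <stddef.h>
#include <string.h>

struct hash_entry {
        uint32_t hashval;    /* cached full hash value of the key */
        const char *key;
        void *value;
};

static void *bucket_lookup(const struct hash_entry *entries, size_t n,
                           uint32_t hashval, const char *key)
{
        for (size_t i = 0; i < n; i++) {
                /* cheap uint32 comparison first... */
                if (entries[i].hashval != hashval)
                        continue;
                /* ...strcmp() only on a probable match */
                if (strcmp(entries[i].key, key) == 0)
                        return entries[i].value;
        }
        return NULL;
}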

Hash functions: time to calculate the hash value

This was my original intention: to benchmark several hash functions and choose the best one based on real-world data, not random or fabricated strings. I also wanted to take as much advantage as possible of modern processors. So if there’s a new instruction, let’s put it to good use.

Based on that I came to know the crc32c instruction in SSE4.2. By using it we have a blazing fast way to calculate CRC32c, capable of a throughput of around 1 char per clock cycle. I was also optimistic about it when I checked that its distribution was as good as the other contenders’. The figure below shows the time to calculate the hash value for each of the algorithms.

[plot: time to calculate the hash value vs key length, for each hash function]

As can be seen in the benchmark above, the winner is indeed crc32c. It’s much faster than the others. It’s also noteworthy that its time increases much less than the others’ as the key length goes up.

Among the other contenders, DJB is the worst one, reaching much higher times. It’s the simplest one to implement, but in my opinion the others are small enough (though not that simple) to take its place.
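For the curious, a hash function on top of the crc32c instruction can be as small as the sketch below, using the SSE4.2 intrinsics (compile with -msse4.2). This is an illustration, not the benchmarked code: real implementations process 4 or 8 bytes per iteration with _mm_crc32_u32/_mm_crc32_u64 to reach the throughput mentioned above.

#include <stdint.h>
#include <nmmintrin.h>      /* SSE4.2 intrinsics */

static uint32_t crc32c_hashval(const char *key, int keylen)
{
        uint32_t crc = 0xffffffff;      /* customary CRC32c seed */

        /* one byte per iteration for clarity; wider loads are faster */
        for (int i = 0; i < keylen; i++)
                crc = _mm_crc32_u8(crc, (unsigned char)key[i]);
        return crc;
}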

How much does the hash function affect the hash table lookup time? By lookup time I mean:

total_time = time to calculate the hash value + time to find the bucket + time to find the item inside the bucket

The figure below answers this question:

[plot: total lookup time vs key length, for each hash function]

We can see from the figure above that the crc32c implementation is the fastest one. However, the gain is not as big as when we consider only the time to calculate the hash value. This is because the other operations take much more time.

Conclusion

Although the CRC32c implementation is faster than its contenders and I was tempted to switch to it in kmod, I’m not really doing it. There would be a significant gain only if the keys were big enough. As noted above, the time to calculate the hash value with CRC32c grows much more slowly than with the others. If the keys were big enough, like 500 chars, then it could be a change worth making. It’s important to note that if we were to change to this hash function we would need to add an implementation for architectures other than x86, as well as introduce the boilerplate code to detect at runtime whether the crc32c instruction is supported and fall back to a generic implementation otherwise.

The other 2 optimizations, although not groundbreaking, are easy to make and unintrusive. Nonetheless, if people want to check the hash functions against their own workload, I will make the code and benchmarks available in a repository other than kmod’s.

Speeding up build on autofoo projects

First of all, a little digression about build systems.

I’d like to clarify that I’m no lover of autotools. But I’m no lover of any other build system either, and autotools is used by several open source projects, including most of the ones I participate in. It’s easily copied and pasted by new projects, and distro maintainers know how to hack on it to handle different scenarios: cross-compilation, distributing the build across machines, different compilers, rpath, etc. Moreover, from my experience, project maintainers usually dislike changing the build system because it causes several headaches not related to the raison d’être of the project. So in general I prefer to modernize the build system rather than radically change it to another one.

The Enlightenment window manager is about to be released and I was really bothered by the amount of time it was taking to compile. I use icecream to spread the build across the machines on the local network, so I usually do things like “make -j40”. No matter how many jobs I put there to parallelize the build, it was still painfully slow, particularly while compiling the modules. Taking a look at the Makefile.am files, my suspicion was confirmed: it was using recursive makefiles, and since each module has only a few source files (circa 3 ~ 6), the build was parallelized only among the files in the same directory, because the directories themselves are built serially. There are plenty of links on the web about why projects should use non-recursive automake.

So I decided to convert E17’s automake files to non-recursive ones, at least the modules part. After hours of repetitive work converting it, fixing bugs, building out-of-tree, in-tree, distcheck, etc, I committed it and the build time improved as below:

                                     Before     After
autogen.sh + configure               0m47.6s    0m36.2s
make -j31                            3m1.9s     0m49s
make -j31 with dirty modules only    2m38s      0m28.2s

So, after configuring it we can build E17 in roughly 1/4 of the previous time.

After the commit introducing the change there were several others to improve it even more, prettify the output and fix some other bugs. It also got reverted once for causing problems for other developers, but in the end it was applied again. The worst bug I found was related to the subdir-objects option to Automake and Gettext’s Automake macros. That option means that objects are kept in the same directory as the corresponding source file. This is needed, particularly in a non-recursive automake scenario, so the objects from different modules don’t conflict by being put in the same directory. However, leaving this option in configure.ac made “make distcheck” fail in some obscure ways, and I later tracked it down to be gettext’s fault. A simple “fix” was to remove it from configure.ac and set it in the “AUTOMAKE_OPTIONS” variable of the modules’ Makefile.am, as in the snippet below. I really hope someone has the time and will to fix the gettext macros: they are a horrible mess and I don’t want to play with them.
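In practice the workaround amounts to something like this in each module’s Makefile.am (a sketch, not the exact E17 change):

# enable subdir-objects per-directory instead of globally in
# configure.ac, sidestepping the interaction with gettext's macros
AUTOMAKE_OPTIONS = subdir-objects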

Analyzing chess.com tournaments

This year at ProFUSION we started creating chess tournaments on chess.com, so we have fun not only coding but also playing this game. However, it’s even more fun when we put both together: chess and code :-)!

During the second tournament we realized chess.com was missing a nice feature: allowing participants to predict who will be the champion based on the current championship status and future results. To show the current state, chess.com presents us with a table like this:

Note the missing games, marked with a “_”. What we would like is to predict who can still win the tournament based on these missing games. One trick here is how to calculate the tie break, but it’s really straightforward to implement once we understand the formula:

tie_break = Σ over the player’s games of (result of game × current score of the opponent)

So, for each player, sum up the result of each game multiplied by the current score of the opponent (“opp” in the formula above) the game was played against.
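As a sketch in C (with a hypothetical data layout; the tool itself is implemented differently), the computation boils down to a loop like this:

#include <stddef.h>

/* Hedged sketch: results[p * nplayers + o] holds player p's summed game
 * results against opponent o; score[o] is o's current score. */
static double tie_break(size_t p, size_t nplayers,
                        const double *results, const double *score)
{
        double tb = 0.0;

        for (size_t o = 0; o < nplayers; o++) {
                if (o == p)
                        continue;
                /* result of each game times the opponent's current score */
                tb += results[p * nplayers + o] * score[o];
        }
        return tb;
}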

With that in mind I wrote a small tool, ccs, to allow you to predict the next results:

$ ./ccs.py data/example2.html 
ccs> state
Number of players: 8

                  1.      2.      3.      4.      5.      6.      7.      8.      |  Score | Tie Break
-------------------------------------------------------------------------------------------------------
1. demarchi        X      1 1     1 0     1 0     0       1 1     1 1     1 1     |     10 |        54
2. ulissesf       0 0      X      1 1     0 1     1       1 1     1 1     1 1     |     10 |        52
3. lfelipe        1 0     0 0      X      1 1     1 1     1 0     1 1     1 1     |     10 |        52
4. hdante         1 0     0 1     0 0      X      1 0     0 1     1 1     1 1     |      8 |        38
5. Gastal         1       0       0 0     1 0      X      1 1     1 1     1 1     |      8 |        34
6. marinatwp      0 0     0 0     1 0     0 1     0 0      X      1 1     1 1     |      6 |        22
7. yanwm          0 0     0 0     0 0     0 0     0 0     0 0      X      1 1     |      2 |         0
8. antognolli     0 0     0 0     0 0     0 0     0 0     0 0     0 0      X      |      0 |         0
ccs> push 1x5=1
Simulation added: demarchi beats Gastal
Number of players: 8

                  1.      2.      3.      4.      5.      6.      7.      8.      |  Score | Tie Break
-------------------------------------------------------------------------------------------------------
1. demarchi        X      1 0     1 1     1 0     0 1     1 1     1 1     1 1     |     11 |        62
2. lfelipe        1 0      X      0 0     1 1     1 1     1 0     1 1     1 1     |     10 |        53
3. ulissesf       0 0     1 1      X      0 1     1       1 1     1 1     1 1     |     10 |        52
4. hdante         1 0     0 0     0 1      X      1 0     0 1     1 1     1 1     |      8 |        39
5. Gastal         1 0     0 0     0       1 0      X      1 1     1 1     1 1     |      8 |        35
6. marinatwp      0 0     1 0     0 0     0 1     0 0      X      1 1     1 1     |      6 |        22
7. yanwm          0 0     0 0     0 0     0 0     0 0     0 0      X      1 1     |      2 |         0
8. antognolli     0 0     0 0     0 0     0 0     0 0     0 0     0 0      X      |      0 |         0
ccs>

Ccs parses the pairings table created by chess.com (given that you saved it somewhere and passed it as an argument to the tool) and then gives you a “ccs> ” prompt, waiting for commands. Type ‘help’ to see the list of available commands. Basically it allows you to a) see the current state of the tournament (‘state’ command) and b) push and pop result simulations (sigh! ‘push’ and ‘pop’ commands).

A nice feature that I’d like to introduce soon is exporting a big SVG with all the state transitions, marking leaf nodes where there’s a champion. I’m also releasing the source as open source, so anybody wanting to help can implement it :-). Code is available on my github: https://github.com/lucasdemarchi/ccs. GPL-2 as usual.

What can we predict in the example above?

  1. No matter the pending games, Gastal cannot win anymore, since he will reach at most 54 as tie break, leaving lfelipe with 56. That also implies lfelipe would be the champion if Gastal wins all his pending games;
  2. If demarchi wins his last game he wins the tournament, with score=11 and tie-break=62. If ulissesf also wins, he will have the same score, but his tie-break will be 60, pushing demarchi’s tie-break to 64.
  3. If ulissesf wins and demarchi loses, ulissesf is the champion.

Since I am demarchi on the table above, now what I have to do is either win the last game or convince Gastal to give up his pending games :-).

Back from Linux Plumbers

I’m back from the USA after one week attending the Linux Plumbers Conference. This was my first time at LPC, where I was part of the Core OS track, talking about “From libabc to libkmod: writing core libraries”.

It was a very good experience and I was glad to meet so many developers, both kernel and userspace hackers. Some of them I only knew from IRC, mailing lists, etc, and it was a great time to share our experiences, discuss the current problems in Linux and even fix bugs :-). We finally seem to have reached a consensus on how module signing should be done: the outcome of Rusty Russell’s talk is that he will now start applying some pending patches. There will be no required changes to kmod, except a cosmetic one in modinfo to show whether a module is signed or not.

Rusty was also very helpful in fixing a long-standing bug in the Linux kernel: a call to init_module() returns that a module is already loaded even if it hasn’t finished its initialization yet. This used to be “fixed” in module-init-tools by a nasty hack adding a “sleep(10000)” if the module state (checked on sysfs) is “coming”. I say “fixed” because this approach is still racy, even though the race window is much shorter than without it. So we finally sat down and wrote a draft patch to fix it. This will probably reach Linus’ tree in the next merge window.

The above example only seconds what Paul McKenney said on his blog yesterday: “A number of the people I informally polled called out the hallway track as the most valuable part of the conference, which I believe to be a very good thing indeed!” I was one of the people he informally polled ;-). I’d like to thank the committee and everyone involved in organizing this conference: it was a great experience.

Finally, you can find my slides below (or download them from Google Docs). I think the audio will be published soon. Meanwhile you may enjoy Lennart’s picture as a child in slide #5 (during the talk he claimed it’s not him, but I don’t believe it: they are too similar :-)).

ConnMan in Archlinux

For some time (I think almost 2 years) I maintained the ConnMan package in AUR, the user repository of Archlinux.

After talking to Dave Falconindy and Daniel Wallace, the latter agreed to maintain it in the community repository. As a result I’m dropping the package from AUR. All of you who were using my package should upgrade to the latest version from the official Archlinux repository.

There’s great news coming for Enlightenment users, too: a new ConnMan module, written from scratch, that works properly with recent versions. This is reaching e-svn very soon. Stay tuned. Thanks to Michael Blumenkrantz, too, who declared me a “hero” for doing this :-).

BlueZ to move to standard D-Bus interfaces

During the last week I was in Recife, Brazil, together with Henrique Dante, Ulisses Furquim and other fellows attending the BlueZ meeting. We discussed several topics, among them the upcoming BlueZ 5 API. That discussion started with Johan and Marcel saying BlueZ would not move to DBus.Properties and DBus.ObjectManager anymore for the next version, the main reason being that BlueZ will now have more frequent releases than it had in the past and therefore there wasn’t enough time to convert the API. However, Luiz von Dentz and I already had an almost working solution: I implemented DBus.Properties and he did DBus.ObjectManager, so we received the news with much regret.

Since these changes break the API, not accepting them now means we’d need to wait for BlueZ 6 in order to have these interfaces, which is expected to happen only late next year. Thus we argued our case and Marcel Holtmann challenged us to present a demo on Wednesday. Challenge accepted! After working one night to put the implementations together and finish the DBus.PropertiesChanged signal, we presented such a demo. And it worked without any issues. To tell the truth, we only converted 2 interfaces (Adapter and Manager), but that was enough to convince them we can finish this in time for BlueZ 5.0. Or at least that’s what we hope. Final consensus: this change is back on track for the upcoming major release of BlueZ.

Now we are polishing the implementation and starting to send the patches. The first patch set has already been sent and hopefully soon all the others will reach BlueZ’s mailing list.

So, why is this important? By implementing these interfaces, which are part of the D-Bus specification, it becomes much easier to write client code to talk to bluetoothd, and since D-Bus bindings already have implementations for them it’s much less error prone, too. At the same time we are aiming to simplify the code needed in bluetoothd to support our use-cases, so this will be beneficial both for people who write bluetoothd and for those who write client applications.

ELC 2012

Hey, this is my feedback on ELC 2012. If you didn’t read the first part, about ABS 2012, you may want to read the previous post first.

ELC is one of my favorite conferences, as I can meet several talented people and have good talks about Linux in embedded devices. This time was no exception and I enjoyed it very much. The main reason I was there was that I was going to present kmod, the new tool to manage kernel modules. But that would happen only on the last day of the conference. Let’s start from the beginning.

To open the conference Jon Corbet gave his usual kernel report, starting from January 2011 and going through the events of each month: the mess in ARM, the death of the big kernel lock, userspace code in the kernel tree (should we put libreoffice there, too?) and so on. Following this keynote I went to see Saving the Power Consumption of the Unused Memory. Loïc Pallardy from ST-Ericsson talked about how memory is increasingly important to the total power consumption of embedded devices. We are going from the usual 512 MB on smartphones to 2 ~ 4 GB of DDR RAM. There are some techniques to reduce the power drained, and he presented the PASR framework, which allows the kernel to turn specific banks/dies of memory on and off, since not all of them are used all the time. Later on, talking to the guys from Chromium OS, I realized this is especially true when the device is sleeping: we may want to discard caches (and therefore use much less memory in sleep mode) and then turn off the unused banks. In my opinion battery consumption is one of the most important issues today for embedded Linux devices: I’m tired of having to charge my smartphone every day or every X hours. I hope we can improve the current state by using techniques like the one presented in this talk.

In Embedded Linux Pitfalls, Sean Hudson from Mentor Graphics shared his experience coming from closed embedded solutions to open source ones. Nice talk! I think people doing closed development should see presentations like this: some of the main reasons for failing in open source are not being able to talk to each other (HW guys not talking to SW guys), NIH, and not playing by the rules of the communities and therefore having to carry a lot of patches. I’ve always been involved with open source, so I don’t know very well how things work for companies doing closed development, but I do know that more often than not we see those companies trying to participate in communities/open source and failing miserably. In my opinion one of the main reasons is that they fail to talk, discuss and agree on the right solution with the communities.

One of the best talks at ELC 2012 was Making RCU Safe for Battery-Powered Devices. Paul McKenney is one of the well-known hackers of the Linux kernel, maintaining the RCU subsystem. Prior to this talk I had no idea RCU had anything to do with power consumption. He went through a series of slides showing how and why RCU got rewritten several times in the past years, how he solved the problems reported by the community and how things get much more complicated with preemption and RT. He finished his presentation saying that the last decade was the most important of his career, and that is because of the feedback he got from RCU being used in real life. I’d really love to see more people from academia realizing this.

The next day Mike Anderson gave a great keynote about the Internet of things. Devices on the Internet are surpassing the number of people connected, and soon they will be much more important. It’s a great opportunity for embedded companies and for Linux to become the most important operating system in the world. Recent news already tells us that 51% of Internet traffic is non-human (although we can’t classify all of that as “good traffic”). Following his keynote I went to see Thomas Petazzoni from Free Electrons talk about Buildroot. I like Buildroot’s simplicity, and from what Thomas said this is one thing they care about: Buildroot is a rootfs generator and not a meta-distro like openembedded. There were at least 3 people asking if Buildroot could support binary packages and he emphasized that it was a design decision not to support them. I like this: use the right tool for each job. I had already used Buildroot to create a rootfs using uClibc, and it was great to see that it was already packaging the latest version of kmod before I went to ELC.

At the end of the second day I participated in the Real-Time BoFs with Frank Rowand. It was great to have Steven Rostedt and Paul McKenney there, as they contributed a lot to the discussion, pointing out the difficulties in RT, the current status of the RT_PREEMPT patches regarding mainline and forecasts of when they will be completely merged. There were some discussions along the lines of “can we really trust RT Linux? How does it compare with having an external processor doing the RT tasks?”. In the end people seemed to agree that it all boils down to what you have in your kernel (you probably don’t want to enable crappy drivers), how you tolerate failures (hard-RT vs soft-RT), and that RT is not a magic flag that you turn on and you’re done: it demands profiling, kernel and application tuning, and expertise in the field. People gave several examples of devices using the RT_PREEMPT patches: from robots and aircraft in space to cameras (the Sony cameras given away on the last day were one of the examples).

On Friday, the last day of the conference, I was much more worried about my presentation at the end of the day than about the other talks. Nonetheless I couldn’t miss Koen Kooi from Texas Instruments talking about the Beaglebone. It’s a very interesting device for those who like to DIY: it’s much smaller than its brothers like the Beagleboard and Pandaboard and still has enough processing power for lots of applications. Koen was displaying his slides using node.js running on a Beaglebone. What I’d like to see, though, is barebox replacing u-boot as the bootloader. If you attended Koen’s talk at ELCE last year, you know u-boot is one of the culprits for longer boot times. Jason from TI kindly gave me a Beaglebone so I can use it for testing kmod; when I have some spare time I’ll also take a look at what’s missing to use barebox on it.

The last talk of the conference was mine: Managing Kernel Modules With kmod. I received good feedback from people there: they liked the idea behind kmod of designing a library and then the tools on top of it. I had some issues with my laptop in the middle of my presentation, but it all went well. I could show how kmod works, the details behind the scenes, the short history of the project and how it’s replacing a well-known piece of Linux userspace tooling in all major desktop and embedded distros. When I was showing the timeline of the project I remember Mike Anderson saying: “tell us when it will be done”. I can’t really say it’s done now, but after the conference we already had versions 6 and 7, and contrary to earlier releases the number of commits in the latest versions is very small. After 3~4 months the project is reaching a maintenance phase, as I said it would. If you would like to see my slides, download them here or see them online below. You can also watch the video of my talk, as well as all the others, on LF’s video website.


Android Builders Summit 2012

Four weeks ago, from February 13th to February 17th, I was at the Android Builders Summit and the Embedded Linux Conference in San Francisco. I was a bit busy these last weeks, so I didn’t have an opportunity to write about the conferences as I usually do. I was going to do a single post about both conferences, but after writing a little I realized it would be very big. So I split it in two; here is the one for ABS 2012, and the other is coming soon.

This was my first time at the Android Builders Summit. Since at the end of last year I participated in a project modifying Android internals, I felt it would be really good to be in touch with people doing the same things and learn from them. Before going there I was already surprised that Google was not sponsoring the conference, but once there I was astonished that there was nobody from the Android team, and I don’t remember talking to any Googlers either. I don’t know what happened, but it would be really good for the next edition if Google could be part of the conference, since by the very nature of how they manage the project they are the people pushing the platform forward.

On the first day of the conference Greg Kroah-Hartman, Tim Bird and Zach Pfeffer answered the question “Android and the Linux Kernel Mainline: Where Are We”: it’s done. Well, not totally done, but most of the code needed in the kernel is already in mainline: except for some pieces that render your device useful, it’s already possible to boot the Android userspace with a mainline kernel. I think the main point of this effort is to allow companies and enthusiasts to use features from the mainline kernel and newer versions than the ones available in AOSP. As the diff between mainline and Android’s kernel decreases, it’s much easier to deploy an Android device with a different kernel. More details can be found in http://lwn.net/Articles/481661/.

From the other talks I attended on the first day, the one that caught my eye was the USB Device Support Workshop. Bernie Thompson from Plugable talked a bit about the lack of proper support in Android for dealing with kernel modules: it’s really hard for device-maker companies like his own to have products working on Android. And it’s not because they aren’t committed to developing Linux device drivers, but because of the lack of support in Android to easily deal with kernel drivers: either the external device is supported by the company shipping the Android product or there’s no way, for example, to plug in an external camera and get it to work. The audience was a bit confused, saying that that was a Linux problem, with some voices claiming that in Windows land that doesn’t happen. Not true. Linux supports more devices than any other operating system in the world; however, Android is currently missing some tools to profit from it. After some discussion Bernie prepared some tables with USB devices that people could hack on, get supported in Linux/Android, etc.

On the second day I attended Real-Time Android, particularly because of my involvement with real-time since I graduated from university, and because I was curious about applying it to Android. As I said, one of the benefits of having the Android kernel closer to mainline is that it’s easier to do things like this. Wolfgang Mauerer from Siemens applied the RT_PREEMPT patches to Android’s kernel so you could have a real-time embedded system and still use Android apps. As I was expecting, RT applies to native applications, not Java-based ones.

Designing An Android Sensor Subsystem: Pitfalls and Considerations was an advanced talk about sensors in Android: how one would choose one strategy over another and the tradeoffs between battery life, sample rate, an external co-processor, and DIY versus licensing the algorithms used. It was not a talk for the regular Joe doing an app that uses the Android Sensors API (which was all I knew about the subject), but more for people creating devices that would like to use sensors.

It was a conference different from the ones I’m used to attending, like ELC/LinuxCon: there were very few people I already knew, and I had the feeling that we were talking about someone else’s product, not one we were helping to develop: we were having talks about how to hack a platform we do not own. In general I liked the talks I attended and the conversations in the corridors. They even gave me some insights for my talk about kmod, later on Friday at ELC. I’ll talk more about it in the next post.

For those wanting to see the slides/videos, the Linux Foundation made them available on their site: go and see for yourselves.

ANNOUNCE: codespell 1.4

codespell 1.4 is out! Nothing really new, just a maintenance release: 1 bug fix and some new entries in the dictionary. See the full announcement on the mailing list.

Judging by the patches I’m receiving, it seems codespell is being successfully used by open source projects. I’m glad codespell can help those projects, particularly people who, like me, don’t have English as their mother tongue. It’s also an opportunity for people starting on a project, as I said at the last LinuxCon Brazil.

I’m not submitting patches to the Linux kernel with codespell myself anymore, because after doing that twice I started to receive a lot of emails from people using the get_maintainer script. It’s very annoying to filter the good emails (the ones indeed addressed to me) from those where I was CC’ed just because there was a misspelling my patch had fixed. Since that patch touched 2463 files, it’s very common to have my name in the output of get_maintainer :-(. I’m still trying to figure out how to properly filter that without losing important emails. Any tips (I’m a GMail user)?

Back to the codespell announcement: the only missing item in the TODO is to be able to separate comments, strings and source code, in order to fix misspellings only in the first two. Nonetheless codespell seems to be doing a good job without that feature, and unless someone steps in to implement it without impacting the parse time too much, my plan is to keep it as is.

Go get it (and package for your distro) while it’s fresh!

by Lucas De Marchi