-
ANNOUNCE: codespell 1.0
I’m glad to announce codespell 1.0! After 3 RCs and patches submitted to several projects, I thought it was stable enough to call it 1.0. You can download the 1.0 version below:
http://packages.profusion.mobi/codespell/codespell-1.0.tar.bz2
See my previous post if you want to know what codespell is, or read the README file inside the package.
I have already filled a TODO file with ideas for the next version. They came to mind after I generated a giant patch for the Linux kernel. It is the biggest patch I’ve ever produced with codespell. Really, I think it’s the biggest patch I’ve ever produced. I hope Linus accepts that patch as is because, without the changes I’m planning for codespell 1.1, it’s a pain to fix some corner cases.
Some people have been emailing me suggestions and additional misspellings. I appreciate those emails, and it is good to see codespell being used successfully in other projects. If you want a faster way to have your changes incorporated into codespell, you may also send patches through git-send-email or just use my repository on github to send me pull requests.
UPDATE (04/11/2011): Linus accepted the patch. There’s also a discussion with further improvements for codespell on LKML.
-
On typos and misspellings
This week I’m in a release mood, so I’m releasing several projects I’m involved with. If you missed the first two, check out dietsplash 0.3 and genslide 0.3 (though the announcement was in Portuguese).
After developing for several projects I’ve noticed most of them contain typos and misspellings. Even if this does not directly affect the source quality (unless the misspellings are in documentation), if we left the comment there, we’ve left it for a reason: because we want the reader of that code to stop and read it. It’s particularly good to have the correct spelling of each word when there are people from several parts of the world who may not have English as their mother tongue (as I don’t). This way we can be more sure the correct message is being given through code, comments and documentation.
Thinking a bit about this, I wrote some bash and awk scripts to fix misspellings based on the list of common misspellings available on Wikipedia. I’ve successfully sent patches to projects like the Linux kernel, ConnMan, oFono and EFL. After some of them were accepted and I decided to run the scripts again, I noticed how slow they were (if you are curious about what they did, search the oFono mailing list, where I explain the scripts). So I started a new, very short project: codespell. Measured against the Linux kernel tree, it runs circa 20x faster than the previous scripts. Its current version is 1.0-rc1 and I’d like to have some more testers before I release the final 1.0.
Codespell is designed to fix misspellings in source code, but it can be applied to any type of text files. When possible, codespell will automatically fix the misspelling. Otherwise it will give some suggestions about possible changes. For example, running against the Linux kernel tree, it gives me several lines like below:
drivers/target/target_core_transport.c:2528: competion ==> competition, completion
drivers/edac/cpc925_edac.c:186: MEAR ==> wear, mere, mare
WARNING: Decoding file drivers/hid/hid-pl.c
WARNING: using encoding=utf-8 failed.
WARNING: Trying next encoding: iso-8859-1
drivers/net/niu.c:3276: clas ==> class | disabled because of name clash in c++
FIXED: ../kernel/drivers/scsi/aacraid/aacraid.h
FIXED: ../kernel/drivers/scsi/lpfc/lpfc_sli.c
(This is all in beautiful colored lines! Test it to see the true output)
The first two lines show changes that cannot be made automatically because the misspelling is a common one for more than one word. So, codespell gives you the file and line where they occur.
The WARNINGs are related to the encoding of the file. Codespell defaults to parsing files in UTF-8 encoding, which handles ‘ascii’ as well. If it fails to decode any line, it will try the next available encoding, i.e. ISO-8859-1. Using these two encodings I have successfully run codespell on all the projects I care about.
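The fallback described above can be sketched in Python roughly as follows. This is only an illustration and not codespell’s actual code: the names decode_with_fallback and ENCODINGS are made up for the example.

```python
# Hypothetical sketch of the "try UTF-8, then ISO-8859-1" fallback.
# ENCODINGS and decode_with_fallback are illustrative names, not codespell's API.
ENCODINGS = ("utf-8", "iso-8859-1")


def decode_with_fallback(raw, encodings=ENCODINGS):
    """Return (text, encoding) for the first encoding that decodes raw."""
    for enc in encodings:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            # This encoding does not fit the data; try the next one.
            continue
    raise ValueError("none of the encodings could decode the data")


if __name__ == "__main__":
    # 0xE9 alone is not valid UTF-8, so the second encoding is used.
    print(decode_with_fallback(b"caf\xe9"))
```

Since ISO-8859-1 assigns a character to every possible byte value, it works as a last resort that always decodes, even if the result is not what the author of the file intended.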
Codespell allows some changes to be disabled. This is shown by the “clas ==> class” fix, which is not always safe to apply because of a name clash with C++ code.
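To make the three cases above concrete (automatic fix, several suggestions, disabled fix), here is a minimal Python sketch. It is an assumption about how such a lookup could work, not a copy of codespell: the Entry type, the MISSPELLINGS table and check_word are hypothetical, and the table holds only a few words taken from this post.

```python
from typing import NamedTuple, Optional, Tuple


class Entry(NamedTuple):
    """Hypothetical dictionary entry: candidate fixes plus an optional reason
    explaining why the fix must not be applied automatically."""
    candidates: Tuple[str, ...]
    reason: Optional[str] = None


# Tiny illustrative table; the real list of misspellings is much larger.
MISSPELLINGS = {
    "compatability": Entry(("compatibility",)),
    "competion": Entry(("competition", "completion")),
    "clas": Entry(("class",), "disabled because of name clash in c++"),
}


def check_word(word):
    """Return ('fix', replacement), ('report', message) or None."""
    entry = MISSPELLINGS.get(word.lower())
    if entry is None:
        return None  # not a known misspelling
    if len(entry.candidates) == 1 and entry.reason is None:
        # Exactly one candidate and nothing disabling it: safe to fix.
        return ("fix", entry.candidates[0])
    message = f"{word} ==> {', '.join(entry.candidates)}"
    if entry.reason:
        message += f" | {entry.reason}"
    return ("report", message)


if __name__ == "__main__":
    for w in ("compatability", "competion", "clas", "kernel"):
        print(w, check_word(w))
```

Only the first word is fixed in place; the other two are reported and left for a human to decide.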
The lines prefixed with “FIXED” show the files that were automatically fixed. In Linus’ current master branch, this resulted in:
2545 files changed, 5007 insertions(+), 5007 deletions(-)
These were the automatic fixes, which may contain some false positives. The funniest one is the one found in Documentation/DocBook/kernel-hacking.tmpl:
/*
 * Sun people can’t spell worth damn. “compatability” indeed.
 * At least we *know* we can’t spell, and use a spell-checker.
 */
As can be seen by the number above, this is not really true.
So, there it is: codespell 1.0-rc1. Get it. Test it. Report problems. Tell me about projects that were successfully patched.
-
git repository is down
I’m shutting down my repository at politreco.com. All my projects are now available either on ProFUSION’s repo or on github.
-
Creating custom images of email addresses
Email addresses are often masked on the web so that they are not easily harvested by programs whose purpose is to send spam. It is relatively easy to write a small program that crawls the web looking for email addresses, stores them in a database and later uses them however it likes. To prevent this, we often come across email addresses written as “joao AT gmail DOT com” or something similar. Another widely used technique is to turn the address into an image. That way, the malicious little program that was easy to write at first becomes much harder to write.
I think we have good anti-spam programs nowadays, which makes the methods above less relevant. However, I believe converting an email address into an image makes it look nice. See my contact page, for example. Making this kind of image for well-known domains such as “gmail.com”, “hotmail.com” and “yahoo.com” is quite easy: several sites let you choose one of these domains and type in your email. One I liked was the one at freetechjournal, since it offers several domains from the most popular email services. It is easy to find similar services that allow some customization.
I searched the web a lot for a site that would let me create an email image with a custom domain image, but found none. Since it is a very simple operation, I decided to write a Python script for it. I am making iconifymail available, which also includes the image used for the company I work for, ProFUSION, and which can serve as a base for making your own image (thanks, Marina). Usage is simple: pass the image to be used as the domain as the first argument and your email address (without the domain) as the second. I left some things hard-coded in the script, but I believe it is easy to adapt it to whatever image size you want. The final result can be seen below and also on my contact page:
This is version 0.1 of iconifymail, open source, released under the GPLv3. If you want to contribute, you can use the git repositories: official or mirror.
-
WebKit
After some time working with the EFL port of WebKit, I’ve been nominated as an official WebKit developer. Now I have super powers in the official repository, but I swear I intend to use them with caution and responsibility. I’ll not forget Uncle Ben’s advice: “with great power comes great responsibility”.
I’m preparing a post to talk about WebKit, EFL, eve (a new web browser based on WebKit + EFL) and how to easily embed a browser in your application. Stay tuned.
-
TinyOS
As I did with previous projects I had at my university, I’d like to share another one: it’s a project using TinyOS. It’s mainly intended for educational purposes, so if you are trying to learn TinyOS, it’s a good example to look at.
What does it do?
It’s a platform to monitor the temperature and humidity of various rooms in a house. All sensors must collect data every X seconds and send them to the sink, a predefined node. The sink can also change the value of X, and sensors may not be able to reach the sink in a single hop, so nodes have to use their own routing protocol to forward data from other nodes to the sink.
If you are interested in this project, send me an email so I can help you understand it. I think it’s a good exercise to look at the source code and discover which routing protocol I was talking about (before reading the README file).
As I think I won’t modify this code anymore, rather than put it in my git repository I made a .tar.bz2 which is available here: House Monitor source code.
-
Compiler’s compiler version
Today I was just wondering… what’s the version of the compiler which compiled my compiler? Quite a strange question to ask myself, and I really don’t know where this curiosity came from.
Looking in Wikipedia:
Early compilers were written in assembly language. The first self-hosting compiler — capable of compiling its own source code in a high-level language — was created for Lisp by Tim Hart and Mike Levin at MIT in 1962.[2] Since the 1970s it has become common practice to implement a compiler in the language it compiles, although both Pascal and C have been popular choices for implementation language. Building a self-hosting compiler is a bootstrapping problem — the first such compiler for a language must be compiled either by a compiler written in a different language, or (as in Hart and Levin’s Lisp compiler) compiled by running the compiler in an interpreter.
Interesting, don’t you think? So let’s see the version of your compiler’s compiler. If you use GCC, it will put a comment in a section named (surprise!) .comment. Generate the assembly corresponding to a C source file and you are going to see at the end of the file an entry like this:
.size main, .-main
.ident "GCC: (GNU) 4.4.1"
.section .note.GNU-stack,"",@progbits
So, let’s play with our already compiled compiler. First we have to check the compiler version:
[lucas@skywalker tmp]$ gcc -v
Using built-in specs.
Target: i686-pc-linux-gnu
Configured with: ../configure --prefix=/usr --enable-shared --enable-languages=c,c++,fortran,objc,obj-c++ --enable-threads=posix --mandir=/usr/share/man --infodir=/usr/share/info --enable-__cxa_atexit --disable-multilib --libdir=/usr/lib --libexecdir=/usr/lib --enable-clocale=gnu --disable-libstdcxx-pch --with-tune=generic
Thread model: posix
gcc version 4.4.1 (GCC)
Ok! Version 4.4.1. Let’s use the readelf command to see the content of the .comment section:
[lucas@skywalker tmp]$ readelf -p .comment /usr/bin/gcc
String dump of section '.comment':
[ 1] GCC: (GNU) 4.4.0 20090630 (prerelease)
[ 29] GCC: (GNU) 4.4.0 20090630 (prerelease)
[ 51] GCC: (GNU) 4.4.1
[ 63] GCC: (GNU) 4.4.1
(…)
[ 237] GCC: (GNU) 4.4.1
[ 249] GCC: (GNU) 4.4.1
[ 25b] GCC: (GNU) 4.4.0 20090630 (prerelease)
[ 283] GCC: (GNU) 4.4.1
[ 295] GCC: (GNU) 4.4.0 20090630 (prerelease)
I couldn’t tell whether it’s 4.4.1 or 4.4.0, i.e. whether a prior version was used to compile the current version or whether the current one was recompiled afterwards with the newly produced compiler. Testing random binaries in /usr/bin seems to produce similar results, with more than one version showing up.
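To compare binaries more easily, the readelf output can be tallied per version string. The Python sketch below is just a helper I would assume for that purpose: tally_comment_strings and LINE_RE are made-up names, and the regular expression assumes the “[ offset] string” layout shown above.

```python
import re
from collections import Counter

# Matches lines like "[ 25b] GCC: (GNU) 4.4.1": a hex offset in brackets,
# followed by the string stored at that offset (assumed layout).
LINE_RE = re.compile(r"^\s*\[\s*[0-9a-fA-F]+\]\s+(.*\S)\s*$")


def tally_comment_strings(readelf_output):
    """Count how many times each string appears in a 'readelf -p' dump."""
    counts = Counter()
    for line in readelf_output.splitlines():
        match = LINE_RE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage: readelf -p .comment /usr/bin/gcc | python tally.py
    for version, count in tally_comment_strings(sys.stdin.read()).most_common():
        print(count, version)
```

Header lines and the “(…)” placeholder are skipped because they do not start with a bracketed offset.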
So… no answers yet. Any clues?

