Monthly Archives: July 2009

Trissa 3D is arriving

I’m proud to release version 0.98 of one of my projects, Trissa. Although this version is not intended for end users, I’d like more people to try it and find bugs before I release 1.0. Be warned: it is intended for developers only. There’s no package yet, so you have to compile it yourself. Some screenshots:

First screen

About menu

Options menu

Start of a 4x4x4 Game

Choosing first position - Human Player

After some time

Minmax Player wins

I’m also putting online a presentation I gave about this project at my university, Politecnico di Milano. Trissa is the Italian name for Tic-tac-toe, and I made the game as part of the course Laboratory of Artificial Intelligence and Robotics. Thanks to Professor Francesco Amigoni, who helped me. [ .odp / .pdf ]

Source code is available on GitHub. To get the latest version with git:

git clone git://github.com/lucasdemarchi/trissa.git

You will need the following tools and libraries (the build system may or may not point out the missing ones):

  • scons
  • OIS
  • Ogre3D
  • CEGUI
  • Boost

Let me know if you have any trouble compiling it or if you find any bugs. Some known bugs are listed in the “TODO” file.

Sharing some homework

So, I’ve spent some time in the last few weeks doing homework for courses at Politecnico di Milano. One of them was for the course “Image analysis and synthesis”, for which a friend of mine and I made some animations in Blender.

The goal was to make animations of sports scenes and then apply a kind of “filter” in Matlab to simulate a long camera exposure or, if you prefer, to draw the trail of the ball. Since the ball was the only moving object in the scene, these two are roughly the same thing. The filter works by taking the mean of W frames and building a new frame from them, where W is the “window” of the algorithm.

We picked four sports: basketball, table tennis, golf and bowling. Everything we produced adds up to ~400 MB, so I’m not putting it all online. If you are interested in the Blender models, ask me privately (you can find my email via the “About” link above). I put some animations on YouTube so you can see them:
Bowling: http://www.youtube.com/watch?v=YrgkLCTrsVg
Table tennis: http://www.youtube.com/watch?v=HmrDPdDevRM
Basketball: http://www.youtube.com/watch?v=TjeOXhxceWA

3D animation is an area I admire, but it definitely isn’t one I’m good at. Also, this was the first time I did something serious with Blender, so don’t expect superb results (and keep reading this post even if you gave up after watching the videos ;-) ).

After thinking for a while about how long Matlab took to produce the final results, I decided it could be done better… Two things could be improved: (i) using more threads to compute the new images and (ii) using CPU-specific optimized instructions. I think it’s a shame to have a dual-core CPU with lots of specialized instructions for doing things faster and not use them. These two goals led me to write the algorithm in C.

I used pthreads to divide the work among threads (you give the number of threads on the command line, or tell it to use the number of available processors) and FreeImage to load and save images. I also made another version that uses gdk-pixbuf to load and save images, and it left me very disappointed: even to load and save DIFFERENT images (that is, completely unrelated data) it needs synchronization mechanisms, so the algorithm doesn’t scale as well as it could. I also tested a version with CImg… and the result was just as disappointing. I guess not every developer is as performance-minded as I am, or not everyone cares about performance as long as the library is easy to use. So, enough talking, here are the results:

Time to compute in each implementation

The best version ran in less than half the time Matlab required! I could optimize the C algorithm further to get even better results, but I was already happy with it (read: I have other things to do). The improvement from optimizing the code for my CPU is also worth noting: with one thread I got a boost of ~20%, and with two threads ~18%. In this type of algorithm in particular, it’s important to use SSE/SSE2 instructions to compute the mean of the images faster. In fact, looking at the assembly code generated in this case, we find instructions like the ones below (xorps and movaps are SSE instructions that operate on the xmmN registers).

 8049517:       0f 57 f6                xorps  %xmm6,%xmm6
 804951a:       8b 41 04                mov    0x4(%ecx),%eax
 804951d:       8b 4d 08                mov    0x8(%ebp),%ecx
 8049520:       39 41 08                cmp    %eax,0x8(%ecx)
 8049523:       0f 82 36 06 00 00       jb     8049b5f <worker_thread+0x6ff>
 8049529:       8d 7d cc                lea    -0x34(%ebp),%edi
 804952c:       0f 29 b5 08 ff ff ff    movaps %xmm6,-0xf8(%ebp)
 8049533:       89 bd 04 ff ff ff       mov    %edi,-0xfc(%ebp)
 8049539:       89 3c 24                mov    %edi,(%esp)
 804953c:       89 44 24 08             mov    %eax,0x8(%esp)
 8049540:       c7 44 24 04 64 a2 04    movl   $0x804a264,0x4(%esp)

Awesome!! GCC automatically generated code that uses SSE instructions. If you are interested, see the Wikipedia article about SSE.

The code is available under the GPLv2 license on GitHub: http://github.com/lucasdemarchi/imgmean. Let me know if it’s useful to you.

The browser war continues…

A month ago I wrote a post about the new Chrome for Linux or, more precisely, about Chromium (the open-source version) finally working well, and fast, under Linux.

Well, while the Chromium developers are working hard on a decent browser, the Firefox developers haven’t stopped either, and last week Mozilla released the long-awaited Firefox 3.5.