Playing with your cache

Other possible titles could be “how to really slow down your computer” or “why caches are so important in computer architectures”. Last week I started playing with turning my cache on and off under Linux, because there are situations in which you need to know why a piece of software is not performing as expected. One possible culprit is the well-known cache thrashing, in which the contents of the cache are evicted very often. Turning the cache off can tell you whether this is really the problem: everything will run very slowly, but the cache is eliminated as a variable.

For example, during my internship we had some tests showing that in certain scenarios a quad-core machine is much slower than an equivalent single- or dual-core one. In the next post I’ll show how to play with your cores, enabling and disabling them, so you can create your own programs and test them on a machine with 1, 2, 3, … cores.

I also think it’s a good exercise for computer science/engineering students enrolled in courses such as “computer architecture” and “operating systems”. For them, two good books are “Modern Operating Systems” by Tanenbaum and “Computer Architecture: A Quantitative Approach” by Hennessy and Patterson.

In an Intel processor there are 3 ways to disable the cache (see section 10.5 of [1] for more details):

  1. Setting and clearing the CD bit in register CR0;
  2. Using page-level cache control flags;
  3. Using MTRRs (Memory Type Range Registers).

Since the caches can only be enabled and disabled from kernel mode, and Linux already exports the MTRRs to user space through the /proc/mtrr pseudo-file, the third method is the easiest one and the one I chose to demonstrate here.

Verifying the availability of MTRRs

Before using MTRRs it’s important to check whether the target processor supports them. You can confirm this by searching for the mtrr flag in the /proc/cpuinfo pseudo-file, whose flags the kernel obtains from the processor through the CPUID instruction. For example:

[lucas@skywalker]$ cat /proc/cpuinfo | grep flags
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm constant_tsc arch_perfmon pebs bts pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm lahf_lm ida tpr_shadow vnmi flexpriority
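grep prints the whole flags line, and a plain substring search for mtrr could in principle match inside a longer flag name. If you want the same check inside a program, here is a minimal C sketch that looks for a whole-word flag in such a line. has_cpu_flag is a hypothetical helper, not a library function, and it takes the line as a string rather than reading /proc/cpuinfo itself.

```c
#include <stdbool.h>
#include <string.h>

/* Returns true if `flag` appears as a whole word in a "flags : ..." line
 * from /proc/cpuinfo (hypothetical helper, not a library call). */
bool has_cpu_flag(const char *flags_line, const char *flag)
{
    const char *colon = strchr(flags_line, ':');
    const char *p = colon ? colon + 1 : flags_line;  /* skip the "flags" label */
    size_t n = strlen(flag);

    if (n == 0)
        return false;

    while (*p) {
        /* Skip the separators between flags. */
        while (*p == ' ' || *p == '\t' || *p == '\n')
            p++;
        /* Compare the current token as a whole word, not a substring. */
        size_t len = strcspn(p, " \t\n");
        if (len == n && strncmp(p, flag, n) == 0)
            return true;
        p += len;
    }
    return false;
}
```

Called with the flags line above and "mtrr", it returns true.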

Two other tools can be used to see details of the processor: dmidecode [3] (must be run as root) and the stand-alone cpuid [4] program. Neither comes installed by default on most distributions, but they give a lot of information about the processor, which makes them worth installing. The first parses a table named SMBIOS (sometimes called DMI) reported by the BIOS, while the second uses the CPUID instruction to query the processor.

Reading MTRRs

After verifying that you have a supported CPU, you can see how each memory region is currently cached by reading the file /proc/mtrr. Example:

[lucas@skywalker ~]$ cat /proc/mtrr
reg00: base=0x000000000 (    0MB), size= 2048MB, count=1: write-back
reg01: base=0x0e0000000 ( 3584MB), size=  128MB, count=1: write-combining

The first field indicates the rule’s number, which is used afterwards to delete or change the rule. Base is the starting physical address, size is (sic) the size of the region in memory, count is a usage counter (how many times that same region has been added), and the last field is the region’s memory type, i.e. the caching “policy” applied to it.
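To use these fields from a program, a line of /proc/mtrr can be parsed with sscanf. The C sketch below assumes the exact layout shown above; as far as I know the kernel prints sizes in MB, or in KB for regions below 1 MB, so both units are accepted. struct mtrr_rule and parse_mtrr_line are hypothetical names.

```c
#include <stdio.h>
#include <string.h>

/* One rule as printed in /proc/mtrr. */
struct mtrr_rule {
    unsigned int reg;          /* register number, used by "disable=N" */
    unsigned long long base;   /* physical start address */
    unsigned long long size;   /* region size in bytes */
    unsigned int count;        /* the "count" field */
    char type[32];             /* e.g. "write-back", "uncachable" */
};

/* Parses a line such as
 *   reg00: base=0x000000000 (    0MB), size= 2048MB, count=1: write-back
 * Returns 0 on success, -1 if the line does not have the expected shape. */
int parse_mtrr_line(const char *line, struct mtrr_rule *r)
{
    unsigned long long size;
    char unit;

    memset(r, 0, sizeof *r);
    /* The "(    0MB)" column is redundant with base, so it is skipped. */
    if (sscanf(line, "reg%u: base=0x%llx (%*lluMB), size=%llu%cB, count=%u: %31s",
               &r->reg, &r->base, &size, &unit, &r->count, r->type) != 6)
        return -1;

    if (unit == 'M')
        r->size = size << 20;      /* MB -> bytes */
    else if (unit == 'K')
        r->size = size << 10;      /* KB -> bytes */
    else
        return -1;
    return 0;
}
```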

In this example all the physical RAM (2 GB) is accessed with a write-back caching policy. The second rule refers to video memory and is automatically configured by the X server. The following code is a piece of <linux_source>/arch/x86/include/asm/mtrr.h listing the available memory types. They can also be checked against table 10-8 of [1].

#define MTRR_TYPE_UNCACHABLE 0
#define MTRR_TYPE_WRCOMB     1
/*#define MTRR_TYPE_         2*/
/*#define MTRR_TYPE_         3*/
#define MTRR_TYPE_WRTHROUGH  4
#define MTRR_TYPE_WRPROT     5
#define MTRR_TYPE_WRBACK     6
#define MTRR_NUM_TYPES       7
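In a program it’s handy to map between these numeric codes and the names /proc/mtrr prints. A small C sketch follows; the name strings are the ones I believe the kernel uses (note the spelling “uncachable”), and mtrr_type_name and mtrr_type_from_name are hypothetical helpers. The macros are repeated from the header so the block compiles on its own.

```c
#include <stddef.h>
#include <string.h>

/* Memory type codes, copied from mtrr.h. */
#define MTRR_TYPE_UNCACHABLE 0
#define MTRR_TYPE_WRCOMB     1
#define MTRR_TYPE_WRTHROUGH  4
#define MTRR_TYPE_WRPROT     5
#define MTRR_TYPE_WRBACK     6
#define MTRR_NUM_TYPES       7

/* Names as printed in /proc/mtrr; codes 2 and 3 are unused (NULL). */
static const char *const mtrr_type_names[MTRR_NUM_TYPES] = {
    [MTRR_TYPE_UNCACHABLE] = "uncachable",
    [MTRR_TYPE_WRCOMB]     = "write-combining",
    [MTRR_TYPE_WRTHROUGH]  = "write-through",
    [MTRR_TYPE_WRPROT]     = "write-protect",
    [MTRR_TYPE_WRBACK]     = "write-back",
};

/* Code -> name; returns "?" for unused or out-of-range codes. */
const char *mtrr_type_name(unsigned int type)
{
    if (type >= MTRR_NUM_TYPES || mtrr_type_names[type] == NULL)
        return "?";
    return mtrr_type_names[type];
}

/* Name -> code; returns -1 if the name is unknown. */
int mtrr_type_from_name(const char *name)
{
    for (int i = 0; i < MTRR_NUM_TYPES; i++)
        if (mtrr_type_names[i] != NULL && strcmp(mtrr_type_names[i], name) == 0)
            return i;
    return -1;
}
```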

Besides this way of obtaining MTRR information, the /proc/mtrr pseudo-file also offers an ioctl interface (the one used above is called the ASCII interface), which defines calls that can be used from inside a program. [2] has some examples of how this can be done. I’ll stick with the ASCII interface for the rest of this short tutorial.

Writing MTRRs

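You can create a new rule by writing to /proc/mtrr as if it were a regular file, giving the base, size and type fields. New rules are allowed to overlap existing ones, and the resulting memory type depends on the processor. For Intel processors [1] defines a precedence for some combinations: uncachable wins over any other type, and write-through wins over write-back:

Uncachable > Write-through > Write-back.

Other overlaps, such as write-combining with write-back, are undefined according to [1], so it’s better to avoid them.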
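To make the overlap rules concrete, here is a small C sketch that computes the effective type of a set of overlapping variable-range MTRRs. It encodes the rules as I read them in [1]: identical types keep that type, uncachable wins over anything, write-through wins over write-back, and any other mix is treated as undefined (returned as -1). The enum values are the codes from mtrr.h; the MT_* names and mtrr_effective_type are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Memory type codes from mtrr.h (short hypothetical names). */
enum { MT_UC = 0, MT_WC = 1, MT_WT = 4, MT_WP = 5, MT_WB = 6 };

/* Effective type when `n` variable-range MTRRs cover the same address.
 * Returns -1 when the combination is undefined (or n == 0, where the
 * default memory type would apply; not modelled here). */
int mtrr_effective_type(const int *types, size_t n)
{
    bool all_same = true, any_uc = false, only_wt_wb = true;

    if (n == 0)
        return -1;
    for (size_t i = 0; i < n; i++) {
        if (types[i] != types[0])
            all_same = false;
        if (types[i] == MT_UC)
            any_uc = true;
        if (types[i] != MT_WT && types[i] != MT_WB)
            only_wt_wb = false;
    }
    if (all_same)
        return types[0];  /* identical types */
    if (any_uc)
        return MT_UC;     /* uncachable always wins */
    if (only_wt_wb)
        return MT_WT;     /* write-through beats write-back */
    return -1;            /* undefined overlap: avoid it */
}
```

The example in this post (write-back for all RAM plus an uncachable rule for the same range) resolves to MT_UC.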

Turning the caches off for a region of memory is just a matter of changing that region’s type to uncachable, which you do by writing the desired fields to /proc/mtrr. The following example creates a third rule that sets all the memory to uncachable. Even though a rule setting all memory to write-back still exists, the precedence mentioned above makes this new uncachable rule win.

[root@skywalker ~]# echo "base=0x00000000 size=0x80000000 type=uncachable" > /proc/mtrr
[root@skywalker ~]# cat /proc/mtrr
reg00: base=0x000000000 (    0MB), size= 2048MB, count=1: write-back
reg01: base=0x0e0000000 ( 3584MB), size=  128MB, count=1: write-combining
reg02: base=0x000000000 (    0MB), size= 2048MB, count=1: uncachable
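When generating rules from a program, it can help to validate the region before writing it. The sketch below builds the string to write to /proc/mtrr. Its checks (a power-of-two size of at least 4 KiB, and a base aligned to that size) are my understanding of what a single variable-range MTRR can describe; the kernel does its own validation and rejects rules it cannot program anyway, so this only catches mistakes early. format_mtrr_rule is a hypothetical helper.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Builds e.g. "base=0x00000000 size=0x80000000 type=uncachable\n" into buf.
 * Returns the string length, or -1 if the region or type looks invalid
 * (assumed checks) or if buf is too small. */
int format_mtrr_rule(char *buf, size_t len, uint64_t base, uint64_t size,
                     const char *type)
{
    if (size == 0 || (size & (size - 1)) != 0)
        return -1;                    /* size must be a power of two */
    if (size < 4096)
        return -1;                    /* 4 KiB granularity */
    if ((base & (size - 1)) != 0)
        return -1;                    /* base aligned to size (so also to 4 KiB) */
    if (strpbrk(type, " \t\n") != NULL)
        return -1;                    /* keep the command on one line */

    int n = snprintf(buf, len, "base=0x%08llx size=0x%08llx type=%s\n",
                     (unsigned long long)base, (unsigned long long)size, type);
    if (n < 0 || (size_t)n >= len)
        return -1;                    /* output truncated */
    return n;
}
```

As root, the resulting string can be written to /proc/mtrr just like the echo above.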

Congratulations!! Now your system will start to run really, really slowly and will feel somewhat unresponsive, even for a task as simple as typing in the terminal to enable the caches again. Be patient and issue the following command to get your system back:
[root@skywalker ~]# echo "disable=2" > /proc/mtrr
The command takes the number of the rule you want to delete; in our case, that is the one we have just created (reg02). The new set of rules:

[root@skywalker ~]# cat /proc/mtrr
reg00: base=0x000000000 (    0MB), size= 2048MB, count=1: write-back
reg01: base=0x0e0000000 ( 3584MB), size=  128MB, count=1: write-combining
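The register number passed to disable= depends on which rules already exist, so on another machine the new rule may not be reg02. A hypothetical helper, find_disable_cmd, that looks the number up in the text read from /proc/mtrr (matching on base and type) and builds the command; it assumes the line format shown above.

```c
#include <stdio.h>
#include <string.h>

/* Scans the text of /proc/mtrr for the first rule with the given base and
 * type, and writes "disable=N\n" into cmd. Returns N, or -1 if none matches. */
int find_disable_cmd(const char *mtrr_text, unsigned long long base,
                     const char *type, char *cmd, size_t len)
{
    const char *line = mtrr_text;

    while (line != NULL && *line != '\0') {
        unsigned int reg;
        unsigned long long b;
        char t[32];

        /* Skip the "(...MB), size=..., count=..." part up to the last ':'. */
        if (sscanf(line, "reg%u: base=0x%llx %*[^:\n]: %31[^ \n]", &reg, &b, t) == 3
            && b == base && strcmp(t, type) == 0) {
            snprintf(cmd, len, "disable=%u\n", reg);
            return (int)reg;
        }
        line = strchr(line, '\n');    /* advance to the next rule */
        if (line != NULL)
            line++;
    }
    return -1;
}
```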

Conclusion

Note that you don’t have to set all of memory to uncachable. You may choose other base and size values so that only part of it becomes uncachable; then only code and data that happen to live in that physical region would run very slowly.
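Since each rule describes a power-of-two-sized block aligned to its own size, an arbitrary region may need several rules, and a processor has only a small number of variable-range MTRRs. Here is a sketch that splits a range into such blocks, one per rule. split_mtrr_range and struct mtrr_chunk are hypothetical, and start and end are assumed to be multiples of 4 KiB.

```c
#include <stddef.h>
#include <stdint.h>

struct mtrr_chunk {
    uint64_t base;
    uint64_t size;
};

/* Splits [start, end) into the largest blocks that are powers of two and
 * aligned to their own size. Returns the number of chunks written to out,
 * or -1 if more than max would be needed. */
int split_mtrr_range(uint64_t start, uint64_t end, struct mtrr_chunk *out, int max)
{
    int n = 0;

    while (start < end) {
        /* Largest block the alignment of start allows (lowest set bit)... */
        uint64_t size = start ? (start & -start) : (UINT64_C(1) << 63);
        /* ...shrunk until it fits in what is left of the range. */
        while (size > end - start)
            size >>= 1;
        if (n == max)
            return -1;
        out[n].base = start;
        out[n].size = size;
        n++;
        start += size;
    }
    return n;
}
```

Splitting the first 1.5 GB (0x0 to 0x60000000), for instance, yields two rules: 1 GB at 0x0 and 512 MB at 0x40000000.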

Although the practical applications of turning the caches off are few, I think it’s very cool. There is one limitation, however, that I’d like to overcome but that doesn’t seem possible on x86 (at least I didn’t find a way to do it): turning caches on and off independently for each level, that is, having the level 1 cache off while the level 2 cache is still on, and vice versa.


[1] Intel 64 and IA-32 Architectures Software Developer’s Manual Volume 3A: System Programming Guide, Part 1. Intel. Available at: http://www.intel.com/Assets/PDF/manual/253668.pdf

[2] <linux_source>/Documentation/x86/mtrr.txt

[3] Dmidecode. Available at: http://www.nongnu.org/dmidecode/

[4] Cpuid. Available at: http://etallen.com/cpuid.html