Hello everyone. I have decided to continue discussing my lab 6 results in a separate post, so today I will talk only about my project. In the last post I mentioned that I was going to do something related to video; however, as another classmate is covering the same topic, I decided to focus on image compression instead.
The open source project I opted to study is Guetzli, Google’s image compression routine, which converts PNG images to JPEG with minimal loss in quality (the compression is lossy). What I found interesting about it was not that it came from Google, or even that it produced good results. I was actually interested in why the code is so processor intensive and slow compared to other solutions like mozjpeg. I also wanted to see how feasible it would be on low-powered machines with limited memory, similar to a cellphone. That applies mainly to the AArch64 computers; the other computer I used does not really qualify as low powered, but it should still be fun to test on. More on those machines below.
With that in mind, the developers are quite transparent about the cost:
Note: Guetzli uses a large amount of memory. You should provide 300MB of memory per 1MPix of the input image.
Note: Guetzli uses a significant amount of CPU time. You should count on using about 1 minute of CPU per 1 MPix of input image.
Scary stuff, but I was not daunted and proceeded to test the code on three of the school’s servers: “archie” (AArch64), “charlie” (AArch64), and “xerxes” (x86_64). Here is a very basic look at the specifications for these machines:
archie:
memory 4GB
cpu 0 1GHz 24 cores (don’t let the number of cores fool you, they are slow)
display GeForce GT 710 (may not be a factor in this as everything is done through the command line)
charlie:
memory 8GB
cpu 0 2.4GHz 8 cores, 8 enabled
xerxes:
memory 32GB
cpu 4GHz, 8 cores *2 threads
Just looking at this you can already tell that xerxes is hardly low powered, and will be able to brute force its way through the code. Therefore, I decided to progressively increase the image resolution until a run took several minutes to complete. In xerxes’ case that limit was a 1080p image, which at a 16:9 aspect ratio is roughly a 2 MP (megapixel) image. For the other machines, I kept the limit at 480p. As for compiler optimizations, I used levels 1, 2, 3, and “fast”, in addition to the default build, which specifies no level. Finally, I ran the program at least twice per test configuration.
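To put the README’s rule of thumb in concrete terms, here is a rough Python sketch that turns a resolution into megapixels and applies the 300 MB and 1 minute per megapixel figures quoted above. The function name and the 16:9 pixel dimensions for 720p and 1080p are my own assumptions for illustration; they are not part of Guetzli, and the actual test images may have different dimensions.

```python
# Rule-of-thumb figures quoted from the Guetzli README.
MB_PER_MPIX = 300        # memory per megapixel of input
CPU_MIN_PER_MPIX = 1.0   # CPU minutes per megapixel of input


def estimate_cost(width, height):
    """Return (megapixels, estimated MB of memory, estimated CPU minutes)."""
    mpix = width * height / 1_000_000
    return mpix, mpix * MB_PER_MPIX, mpix * CPU_MIN_PER_MPIX


# Assumed 16:9 dimensions (hypothetical; the real test files may differ).
resolutions = {"720p": (1280, 720), "1080p": (1920, 1080)}

for name, (w, h) in resolutions.items():
    mpix, mem_mb, cpu_min = estimate_cost(w, h)
    print(f"{name}: {mpix:.2f} MPix, ~{mem_mb:.0f} MB, ~{cpu_min:.1f} CPU min")
```

Under these assumptions a 1080p image works out to about 2.07 MPix, which is where the 2 MP figure comes from, and to roughly 620 MB of memory by the README’s estimate.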
That out of the way, here are the initial results with the stock build settings:
archie:
$ time ./guetzli bees.png b.jpg
real 3m18.660s  user 3m14.108s  sys 0m3.885s
$ time ./guetzli bees.png b2.jpg
real 3m18.665s  user 3m14.488s  sys 0m3.506s
$ time ./guetzli 480.png 480_t.jpg
real 16m42.537s  user 16m21.184s  sys 0m18.052s
$ time ./guetzli 480.png 480_t2.jpg
real 16m45.095s  user 16m24.027s  sys 0m17.824s

charlie:
$ time ./guetzli bees.png b_c1.jpg
real 1m8.532s  user 1m6.669s  sys 0m1.747s
$ time ./guetzli bees.png b_c2.jpg
real 1m8.508s  user 1m6.903s  sys 0m1.497s
$ time ./guetzli 480.png 480_c1.jpg
real 5m42.873s  user 5m35.678s  sys 0m6.626s
$ time ./guetzli 480.png 480_c2.jpg
real 5m42.419s  user 5m35.049s  sys 0m6.857s

xerxes:
$ time ./guetzli bees.png bx1.jpg
real 0m15.298s  user 0m14.718s  sys 0m0.552s
$ time ./guetzli bees.png bx2.jpg
real 0m15.211s  user 0m14.663s  sys 0m0.518s
$ time ./guetzli 720.png 720x1.jpg
real 3m24.511s  user 3m16.453s  sys 0m7.637s
$ time ./guetzli 720.png 720x2.jpg
real 3m19.899s  user 3m12.027s  sys 0m7.483s
$ time ./guetzli 1080.png 1080x1.jpg
real 7m27.438s  user 7m11.859s  sys 0m14.645s
$ time ./guetzli 1080.png 1080x2.jpg
real 7m28.931s  user 7m13.267s  sys 0m14.730s
All tests started with the image “bees.png,” which ships with the source files for testing. It’s not a particularly large file, with a resolution of 444 x 258, but it served as a good baseline for what to expect. With that in mind, I used this file for most of the archie tests, and planned to move up to 480p only once the bees image showed a noticeable difference in conversion time. In the case of charlie, I felt that I could proceed with the 480p image. Likewise for xerxes, I believed 720p would be enough, as anything lower would not provide meaningful results.
On to the results:
For archie, the default total execution time hovered around 3 minutes and 18.6 seconds, of which about 3 minutes and 14 seconds was user time, meaning CPU time spent running the program’s own code. This is slow to be sure, but what’s more surprising is that it got slightly slower at levels 1 and 2, adding roughly 0.1 seconds and at worst about 0.3 seconds. Level 3 was a bit different: it was slower than level 1 on the first attempt, but a bit faster than the default build on its second attempt, at around 3 minutes and 18.5 seconds. With fast, I actually ran three attempts because the results were peculiar: the first attempt was comparable to the default build at 3 minutes and 18.6 seconds, but the second was closer to the slower levels 1 and 2 at 3 minutes and 18.8 seconds. The third attempt was again comparable to the default. One other thing to note is that with fast, the sys time (CPU time spent in the kernel on the program’s behalf, for things like memory management and file I/O) increased, though only by a few tenths of a second.
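For reference, the three figures that `time` prints are wall-clock time (real), CPU time spent in the program’s own code (user), and CPU time spent in the kernel on the program’s behalf (sys). Here is a minimal Python sketch of the same kind of measurement. The `time_command` helper is my own name, and the workload is a stand-in Python one-liner rather than ./guetzli, so treat it as an illustration only. The child CPU counters from `os.times()` are only meaningful on Unix-like systems.

```python
import os
import subprocess
import sys
import time


def time_command(argv):
    """Run argv and return (real, user, sys) in seconds, like the shell's `time`."""
    before = os.times()
    start = time.perf_counter()
    subprocess.run(argv, check=True)
    real = time.perf_counter() - start
    after = os.times()
    # children_user / children_system accumulate the CPU time of child
    # processes that have finished and been waited for.
    user = after.children_user - before.children_user
    system = after.children_system - before.children_system
    return real, user, system


# Stand-in workload (assumption): a short Python loop instead of ./guetzli.
real, user, system = time_command(
    [sys.executable, "-c", "sum(i * i for i in range(10**6))"]
)
print(f"real {real:.3f}s  user {user:.3f}s  sys {system:.3f}s")
```

For a single-threaded, CPU-bound program, user plus sys comes out close to real, which is the pattern in the Guetzli numbers here.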
Here are the results for archie:
archie:
---default---
$ time ./guetzli bees.png b.jpg
real 3m18.660s  user 3m14.108s  sys 0m3.885s
$ time ./guetzli bees.png b2.jpg
real 3m18.665s  user 3m14.488s  sys 0m3.506s
---level 1---
$ time ./guetzli bees.png bo1b.jpg
real 3m18.751s  user 3m14.541s  sys 0m3.579s
$ time ./guetzli bees.png bo1b.jpg
real 3m18.780s  user 3m13.978s  sys 0m4.186s
---level 2---
$ time ./guetzli bees.png bo2a.jpg
real 3m18.919s  user 3m14.145s  sys 0m4.116s
$ time ./guetzli bees.png bo2b.jpg
real 3m18.787s  user 3m14.241s  sys 0m3.896s
---level 3---
$ time ./guetzli bees.png bo31.jpg
real 3m18.912s  user 3m14.498s  sys 0m3.746s
$ time ./guetzli bees.png bo32.jpg
real 3m18.540s  user 3m13.948s  sys 0m3.945s
---fast---
$ time ./guetzli bees.png bof1.jpg
real 3m18.612s  user 3m13.916s  sys 0m4.045s
$ time ./guetzli bees.png bof2.jpg
real 3m18.846s  user 3m13.784s  sys 0m4.393s
$ time ./guetzli bees.png bof3.jpg
real 3m18.640s  user 3m14.023s  sys 0m3.957s
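Since the differences between levels are fractions of a second, it helps to average the runs instead of eyeballing them. Here is a small Python sketch that converts the real times from the archie listing above into seconds and compares each level’s mean to the default build. The `to_seconds` helper and the dictionary layout are my own choices for illustration.

```python
import re
from statistics import mean


def to_seconds(stamp):
    """Convert a `time` value such as '3m18.660s' to seconds."""
    minutes, seconds = re.fullmatch(r"(\d+)m([\d.]+)s", stamp).groups()
    return int(minutes) * 60 + float(seconds)


# Real times for bees.png on archie, copied from the listing above.
archie_real = {
    "default": ["3m18.660s", "3m18.665s"],
    "level 1": ["3m18.751s", "3m18.780s"],
    "level 2": ["3m18.919s", "3m18.787s"],
    "level 3": ["3m18.912s", "3m18.540s"],
    "fast":    ["3m18.612s", "3m18.846s", "3m18.640s"],
}

baseline = mean(to_seconds(t) for t in archie_real["default"])
for level, runs in archie_real.items():
    avg = mean(to_seconds(t) for t in runs)
    print(f"{level:8s} mean {avg:8.3f}s  ({avg - baseline:+.3f}s vs default)")
```

Every optimized build averages within about 0.2 seconds of the default, which is roughly 0.1% of a run that takes nearly 199 seconds.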
For charlie, the results were mostly similar to archie’s in that the optimized builds showed a slight slowdown compared to the default configuration. Two observations, however: level 2 produced the slowest results on average, while level 3 was faster than level 2, averaging around 5 minutes and 42.99 seconds. With the fast setting we begin to see a glimmer of hope, as its first attempt is slightly faster than the default build’s first attempt. However, the default build’s second result was still the fastest overall. This is definitely a start, but I believe more testing, and optimizations that override the compiler’s choices, may be required to get better performance on AArch64 hardware.
Here are test results for charlie:
charlie:
---default---
$ time ./guetzli 480.png 480_c1.jpg
real 5m42.873s  user 5m35.678s  sys 0m6.626s
$ time ./guetzli 480.png 480_c2.jpg
real 5m42.419s  user 5m35.049s  sys 0m6.857s
---level 1---
$ time ./guetzli 480.png 480_o1a.jpg
real 5m42.741s  user 5m35.375s  sys 0m6.857s
$ time ./guetzli 480.png 480_o1b.jpg
real 5m42.751s  user 5m35.635s  sys 0m6.606s
$ time ./guetzli 480.png 480_o1c.jpg
real 5m44.614s  user 5m37.364s  sys 0m6.706s
$ time ./guetzli 480.png 480_o1d.jpg
real 5m42.441s  user 5m35.278s  sys 0m6.615s
---level 2---
$ time ./guetzli 480.png 480_o2a.jpg
real 5m43.741s  user 5m36.731s  sys 0m6.446s
$ time ./guetzli 480.png 480_o2b.jpg
real 5m43.106s  user 5m34.911s  sys 0m7.654s
---level 3---
$ time ./guetzli 480.png 480_o3a.jpg
real 5m42.989s  user 5m35.897s  sys 0m6.556s
$ time ./guetzli 480.png 480_o3b.jpg
real 5m42.994s  user 5m35.282s  sys 0m7.144s
---fast---
$ time ./guetzli 480.png 480_ofa.jpg
real 5m42.625s  user 5m35.075s  sys 0m7.046s
$ time ./guetzli 480.png 480_ofb.jpg
real 5m42.589s  user 5m35.429s  sys 0m6.647s
$ time ./guetzli 480.png 480_ofc.jpg
real 5m42.818s  user 5m35.515s  sys 0m6.766s
In the case of xerxes, the machine gave the results I expected: the higher the optimization level, the faster the program executed.
If you recall, the two default 720p runs ranged from 3 minutes and 24.5 seconds down to 3 minutes and 19.9 seconds. With optimization level 1, the program dropped about 5 seconds from the slower of those runs, for an average of 3 minutes and 19.5 seconds! This was a good start, but if you think the rest will play out as expected, you might want to keep reading:
xerxes:
---default---
$ time ./guetzli 720.png 720x1.jpg
real 3m24.511s  user 3m16.453s  sys 0m7.637s
$ time ./guetzli 720.png 720x2.jpg
real 3m19.899s  user 3m12.027s  sys 0m7.483s
---level 1---
$ time ./guetzli 720.png 720o1a.jpg
real 3m19.412s  user 3m11.553s  sys 0m7.464s
$ time ./guetzli 720.png 720o1b.jpg
real 3m19.522s  user 3m11.643s  sys 0m7.488s
---level 2---
$ time ./guetzli 720.png 720o2a.jpg
real 3m19.162s  user 3m11.306s  sys 0m7.468s
$ time ./guetzli 720.png 720o2b.jpg
real 3m19.973s  user 3m12.013s  sys 0m7.561s
$ time ./guetzli 720.png 720o2c.jpg
real 3m19.247s  user 3m11.198s  sys 0m7.654s
$ time ./guetzli 720.png 720o2d.jpg
real 3m19.248s  user 3m11.365s  sys 0m7.495s
---level 3---
$ time ./guetzli 720.png 720o3a.jpg
real 3m19.703s  user 3m11.747s  sys 0m7.564s
$ time ./guetzli 720.png 720o3b.jpg
real 3m20.014s  user 3m12.140s  sys 0m7.481s
---fast---
$ time ./guetzli 720.png 720ofa.jpg
real 3m19.145s  user 3m11.188s  sys 0m7.566s
$ time ./guetzli 720.png 720ofb.jpg
real 3m19.255s  user 3m11.308s  sys 0m7.559s
As you can see, setting the optimization level did improve performance at levels 1 and 2 (level 2 did have one outlier, though). Interestingly, however, level 3 produced consistently slower results than the first two levels. Perhaps some of the more aggressive optimizations in level 3 affected the program negatively. The same cannot be said for the last setting, which lived up to its name and produced the fastest results on average. It did, however, add about 2 MB to the program’s file size (the default build weighs in at about 4 MB).
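One way to sanity-check these comparisons is to look at how much the runs of a single setting differ from each other, and set that against the gap between settings. Here is a rough Python sketch using the xerxes real times from the listing above, converted to seconds by hand. The `spread` helper is my own name, and with only two to four runs per setting this is an indication rather than a proper statistical test.

```python
# Real times in seconds for 720.png on xerxes, converted by hand from the
# listing above (for example, 3m19.412s -> 199.412).
xerxes_real = {
    "default": [204.511, 199.899],
    "level 1": [199.412, 199.522],
    "level 2": [199.162, 199.973, 199.247, 199.248],
    "level 3": [199.703, 200.014],
    "fast":    [199.145, 199.255],
}


def spread(runs):
    """Difference between the slowest and fastest run of one setting."""
    return max(runs) - min(runs)


for level, runs in xerxes_real.items():
    print(f"{level:8s} best {min(runs):.3f}s  worst {max(runs):.3f}s  "
          f"spread {spread(runs):.3f}s")

# Gap between the best run of the fastest setting and of the slowest setting.
best = {level: min(runs) for level, runs in xerxes_real.items()}
gap = max(best.values()) - min(best.values())
print(f"gap between best runs: {gap:.3f}s")
```

The best runs of all five settings sit within about 0.75 seconds of each other, while the two default runs alone differ by about 4.6 seconds, so more repetitions would be needed to say with confidence how much each level helps.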
Well, that was the first stage of my optimization research. I look forward to updating you all in the next project post. See you then!