SPO600 Project Stage 1

Hello everyone. I decided to continue discussing my lab 6 results in a separate post, so today I will talk only about my project. In the last post I mentioned that I was going to do something related to video; however, as another classmate is covering the same topic, I decided to focus on image compression instead.

The open source project I opted to study is Guetzli, Google’s image compression routine, which converts PNG images to JPEG with minimal loss in quality (JPEG being a lossy format). What I found interesting about it was not that it came from Google, or even that it produced good results. Rather, I was interested in why the code is so processor intensive and slow compared to other solutions like mozjpeg. I also wanted to see how feasible the compression solution might be on low-powered machines with limited memory, similar to a cellphone. That applies mainly to the AArch64 computers; the other computer I will be using does not really qualify as low powered, but it still might be fun to test on. More on those machines down below.

With that in mind, the developer is quite transparent about the resource usage:

Note: Guetzli uses a large amount of memory. You should provide 300MB of memory per 1MPix of the input image.

Note: Guetzli uses a significant amount of CPU time. You should count on using about 1 minute of CPU per 1 MPix of input image.

Scary stuff, but I was not daunted and proceeded to test the code on three of the school’s servers: “archie” (AArch64), “charlie” (AArch64), and “xerxes” (x86_64). Here is a very basic look at some of the specifications for these machines:

archie:
memory 4GB
cpu 0 1GHz 24 cores (don’t let the number of cores fool you, they are slow)
display GeForce GT 710 (may not be a factor in this as everything is done through the command line)

charlie:
memory 8GB
cpu 0 2.4GHz 8 cores, 8 enabled

xerxes:
memory 32GB
cpu 4GHz, 8 cores *2 threads

Just looking at this you can already tell that xerxes is hardly low powered and will be able to brute force its way through the code execution. Therefore, I decided to progressively increase the image resolution until I reached a run that took several minutes to complete. In xerxes’ case that limit was a 1080p image, which at a 16:9 aspect ratio is roughly a 2 MP (megapixel) image. For the other machines, I kept the limit at 480p. For compiler optimizations, I used levels 1, 2, 3, and “fast”, in addition to the default build, which had no specified level. Finally, I ran the program at least twice for each test case.
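To put the 2 MP figure and the developer's rule of thumb together, here is a minimal Python sketch of the arithmetic. The 300 MB and 1 minute per megapixel constants come from the notes quoted earlier. The 1280 x 720 and 1920 x 1080 dimensions are my assumption of standard 16:9 sizes and may not match the actual test files, and the function names are purely illustrative.

```python
# Rule-of-thumb figures quoted from the Guetzli notes above.
MB_PER_MPIX = 300        # memory, in MB per megapixel
CPU_MIN_PER_MPIX = 1.0   # CPU time, in minutes per megapixel


def megapixels(width, height):
    """Return the pixel count of an image in megapixels (10^6 pixels)."""
    return width * height / 1_000_000


def estimate(width, height):
    """Return (megapixels, memory in MB, CPU minutes) for one image."""
    mpix = megapixels(width, height)
    return mpix, mpix * MB_PER_MPIX, mpix * CPU_MIN_PER_MPIX


# Assumed 16:9 dimensions; the real test images may differ.
sizes = {"720p": (1280, 720), "1080p": (1920, 1080)}

for name, (w, h) in sizes.items():
    mpix, mem_mb, cpu_min = estimate(w, h)
    print(f"{name}: {mpix:.2f} MPix, ~{mem_mb:.0f} MB, ~{cpu_min:.1f} CPU min")
```

Under these assumptions, 1080p comes out to roughly 2.07 MPix and a little over 600 MB of memory. These are ballpark figures only, since they are no better than the rule of thumb they are based on.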

With that out of the way, here are the initial results with the stock build settings:

archie

$ time ./guetzli bees.png b.jpg
real 3m18.660s
user 3m14.108s
sys 0m3.885s
$ time ./guetzli bees.png b2.jpg
real 3m18.665s
user 3m14.488s
sys 0m3.506s
$ time ./guetzli 480.png 480_t.jpg
real 16m42.537s
user 16m21.184s
sys 0m18.052s
$ time ./guetzli 480.png 480_t2.jpg
real 16m45.095s
user 16m24.027s
sys 0m17.824s

charlie:

$ time ./guetzli bees.png b_c1.jpg
real 1m8.532s
user 1m6.669s
sys 0m1.747s
$
$ time ./guetzli bees.png b_c2.jpg
real 1m8.508s
user 1m6.903s
sys 0m1.497s
$
$ time ./guetzli 480.png 480_c1.jpg
real 5m42.873s
user 5m35.678s
sys 0m6.626s

$ time ./guetzli 480.png 480_c2.jpg
real 5m42.419s
user 5m35.049s
sys 0m6.857s

xerxes

$ time ./guetzli bees.png bx1.jpg
real 0m15.298s
user 0m14.718s
sys 0m0.552s
$ time ./guetzli bees.png bx2.jpg
real 0m15.211s
user 0m14.663s
sys 0m0.518s
$ time ./guetzli 720.png 720x1.jpg
real 3m24.511s
user 3m16.453s
sys 0m7.637s
$ time ./guetzli 720.png 720x2.jpg
real 3m19.899s
user 3m12.027s
sys 0m7.483s
$ time ./guetzli 1080.png 1080x1.jpg
real 7m27.438s
user 7m11.859s
sys 0m14.645s
$ time ./guetzli 1080.png 1080x2.jpg
real 7m28.931s
user 7m13.267s
sys 0m14.730s
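One thing these timings hint at is how much of the wall-clock time is spent on the CPU. As a quick check (a sketch only, with helper names of my own choosing), the snippet below converts the time output to seconds and computes (user + sys) / real for the first bees.png run on each machine. A ratio close to 1.0 would suggest that the run kept roughly one core busy, in which case a higher core count would not help much.

```python
import re


def to_seconds(stamp):
    """Convert a time stamp such as '3m18.660s' into seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", stamp)
    if match is None:
        raise ValueError(f"unrecognised time stamp: {stamp!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


def cpu_share(real, user, sys_time):
    """Return (user + sys) / real for one run."""
    return (to_seconds(user) + to_seconds(sys_time)) / to_seconds(real)


# First bees.png run on each machine, copied from the results above.
runs = {
    "archie":  ("3m18.660s", "3m14.108s", "0m3.885s"),
    "charlie": ("1m8.532s", "1m6.669s", "0m1.747s"),
    "xerxes":  ("0m15.298s", "0m14.718s", "0m0.552s"),
}

for machine, (real, user, sys_time) in runs.items():
    print(f"{machine}: {cpu_share(real, user, sys_time):.3f}")
```

For these three runs the ratio stays just under 1.0, which is what you would expect from a program doing its work on a single core, though this check alone does not prove it.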

All tests started with the image “bees.png”, which ships with the source files for testing. It is not a particularly large file, with a resolution of 444 x 258, but it served as a good baseline for what to expect. With that in mind, I opted to use mostly this file on archie, and would only move up to 480p if there was a noticeable difference in conversion time for the bees image. In the case of charlie, I felt that I could proceed with the 480p image. Likewise, for xerxes I believed working with 720p would be enough, as anything lower would not provide meaningful results.

On to the results:

For archie, the default total execution time hovered around 3 minutes and 18.6 seconds, of which about 3 minutes and 14 seconds was user time, meaning CPU time spent in the program’s own code rather than in the kernel. This is slow to be sure, but what is even more surprising is that it got progressively slower between levels 1 and 2, adding roughly 0.1 seconds, and at worst about 0.3 seconds. Level 3 was a bit different in that it was slower than level 1 on the first attempt, but slightly faster than the default compile settings on its second attempt, at around 3 minutes and 18.5 seconds. When using fast, I actually did three test runs as the results were peculiar: the first attempt was comparable to the default settings at 3 minutes and 18.6 seconds, but the second attempt was closer to the slower performance of levels 1 and 2 at 3 minutes and 18.8 seconds. The third attempt was again comparable to the default settings. One other thing to note is that when using fast, the sys time (CPU time spent in the kernel on the program’s behalf, for things like system calls and memory management) was slightly higher in every run than in the default build, although the difference is only a few tenths of a second.

Here are the results for archie:

archie

---default---

$ time ./guetzli bees.png b.jpg
real 3m18.660s
user 3m14.108s
sys 0m3.885s
$ time ./guetzli bees.png b2.jpg
real 3m18.665s
user 3m14.488s
sys 0m3.506s

---level 1---

$ time ./guetzli bees.png bo1b.jpg
real 3m18.751s
user 3m14.541s
sys 0m3.579s
$ time ./guetzli bees.png bo1b.jpg
real 3m18.780s
user 3m13.978s
sys 0m4.186s

---level 2---

$ time ./guetzli bees.png bo2a.jpg
real 3m18.919s
user 3m14.145s
sys 0m4.116s
$ time ./guetzli bees.png bo2b.jpg
real 3m18.787s
user 3m14.241s
sys 0m3.896s

---level 3---

$ time ./guetzli bees.png bo31.jpg
real 3m18.912s
user 3m14.498s
sys 0m3.746s
$ time ./guetzli bees.png bo32.jpg
real 3m18.540s
user 3m13.948s
sys 0m3.945s

---fast---

$ time ./guetzli bees.png bof1.jpg
real 3m18.612s
user 3m13.916s
sys 0m4.045s
$ time ./guetzli bees.png bof2.jpg
real 3m18.846s
user 3m13.784s
sys 0m4.393s
$ time ./guetzli bees.png bof3.jpg
real 3m18.640s
user 3m14.023s
sys 0m3.957s
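To make the comparison between settings a little easier to eyeball, here is a small sketch that averages the "real" times for each setting on archie and expresses each average as a percent change from the default build. The helper names are my own, and with only two or three runs per setting this is a rough summary rather than a proper statistical test.

```python
from statistics import mean


def to_seconds(stamp):
    """Convert a time stamp such as '3m18.660s' into seconds."""
    minutes, seconds = stamp.rstrip("s").split("m")
    return int(minutes) * 60 + float(seconds)


# "real" times for bees.png on archie, copied from the listing above.
archie_real = {
    "default": ["3m18.660s", "3m18.665s"],
    "level 1": ["3m18.751s", "3m18.780s"],
    "level 2": ["3m18.919s", "3m18.787s"],
    "level 3": ["3m18.912s", "3m18.540s"],
    "fast":    ["3m18.612s", "3m18.846s", "3m18.640s"],
}


def summarize(results, baseline="default"):
    """Return {setting: (mean seconds, percent change vs baseline)}."""
    means = {k: mean(to_seconds(t) for t in v) for k, v in results.items()}
    base = means[baseline]
    return {k: (m, (m - base) / base * 100) for k, m in means.items()}


for setting, (avg, pct) in summarize(archie_real).items():
    print(f"{setting:8s} {avg:8.3f} s  {pct:+.3f}%")
```

Every setting lands within about a tenth of a percent of the default average, so more runs per setting would probably be needed to separate a real effect from ordinary run-to-run variation.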

For charlie, the results were mostly similar to archie in that they showed a small slowdown when compared to the default configuration. Two observations, however: level 2 produced the slowest results on average, while level 3 managed to be faster than level 2, averaging around 5 minutes and 42.99 seconds. Looking at the fast setting, we begin to see a glimmer of hope, as the first test attempt is slightly faster than the first run of the default compile configuration. However, the default setting’s second test result was still the fastest overall. This is definitely a start, but I believe more testing, and optimizations that go beyond what the compiler does on its own, may be required to get better performance on AArch64 hardware.

Here are the test results for charlie:

charlie

---default---

$ time ./guetzli 480.png 480_c1.jpg
real 5m42.873s
user 5m35.678s
sys 0m6.626s

$ time ./guetzli 480.png 480_c2.jpg
real 5m42.419s
user 5m35.049s
sys 0m6.857s

---level 1---

$ time ./guetzli 480.png 480_o1a.jpg
real 5m42.741s
user 5m35.375s
sys 0m6.857s
$ time ./guetzli 480.png 480_o1b.jpg
real 5m42.751s
user 5m35.635s
sys 0m6.606s
$ time ./guetzli 480.png 480_o1c.jpg
real 5m44.614s
user 5m37.364s
sys 0m6.706s
$ time ./guetzli 480.png 480_o1d.jpg
real 5m42.441s
user 5m35.278s
sys 0m6.615s

---level 2---

$ time ./guetzli 480.png 480_o2a.jpg
real 5m43.741s
user 5m36.731s
sys 0m6.446s
$ time ./guetzli 480.png 480_o2b.jpg
real 5m43.106s
user 5m34.911s
sys 0m7.654s

---level 3---

$ time ./guetzli 480.png 480_o3a.jpg
real 5m42.989s
user 5m35.897s
sys 0m6.556s
$ time ./guetzli 480.png 480_o3b.jpg
real 5m42.994s
user 5m35.282s
sys 0m7.144s

---fast---

$ time ./guetzli 480.png 480_ofa.jpg
real 5m42.625s
user 5m35.075s
sys 0m7.046s
$ time ./guetzli 480.png 480_ofb.jpg
real 5m42.589s
user 5m35.429s
sys 0m6.647s
$ time ./guetzli 480.png 480_ofc.jpg
real 5m42.818s
user 5m35.515s
sys 0m6.766s

In the case of xerxes, the machine initially gave the results I expected: as the optimization level increased, the program executed faster.

Now if you recall, the two 720p test results with the default build ranged between 3 minutes and 24.5 seconds and 3 minutes and 19.9 seconds. With optimization level 1, however, the program managed to drop around 5 seconds from the slower of those runs, resulting in an average of about 3 minutes and 19.5 seconds! This was a good start, but if you think this will play out like you expect, then you might want to keep reading:

xerxes

---default---

$ time ./guetzli 720.png 720x1.jpg
real 3m24.511s
user 3m16.453s
sys 0m7.637s
$ time ./guetzli 720.png 720x2.jpg
real 3m19.899s
user 3m12.027s
sys 0m7.483s

---level 1---

$ time ./guetzli 720.png 720o1a.jpg
real 3m19.412s
user 3m11.553s
sys 0m7.464s
$ time ./guetzli 720.png 720o1b.jpg
real 3m19.522s
user 3m11.643s
sys 0m7.488s

---level 2---

$ time ./guetzli 720.png 720o2a.jpg
real 3m19.162s
user 3m11.306s
sys 0m7.468s
$ time ./guetzli 720.png 720o2b.jpg
real 3m19.973s
user 3m12.013s
sys 0m7.561s
$ time ./guetzli 720.png 720o2c.jpg
real 3m19.247s
user 3m11.198s
sys 0m7.654s
$ time ./guetzli 720.png 720o2d.jpg
real 3m19.248s
user 3m11.365s
sys 0m7.495s

---level 3---

$ time ./guetzli 720.png 720o3a.jpg
real 3m19.703s
user 3m11.747s
sys 0m7.564s
$ time ./guetzli 720.png 720o3b.jpg
real 3m20.014s
user 3m12.140s
sys 0m7.481s

---fast---

$ time ./guetzli 720.png 720ofa.jpg
real 3m19.145s
user 3m11.188s
sys 0m7.566s
$ time ./guetzli 720.png 720ofb.jpg
real 3m19.255s
user 3m11.308s
sys 0m7.559s

As you can see, setting the optimization level did improve performance at levels 1 and 2 (level 2 did have one outlier though). Interestingly enough, however, level 3 produced consistently slower results than the first two levels. Perhaps some of the more aggressive optimizations in level 3 impacted the program in a negative way. The same cannot be said for the last setting, as it lived up to its name and produced results that were, on average, faster than all the other settings. It did however result in a 2 MB increase to the program file size (the default settings weigh in at about 4 MB).
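Since level 2 has that one outlier, the median is arguably a fairer summary than the mean for these runs. As a sketch (again with helper names of my own), the snippet below ranks the xerxes settings by the median of their "real" times for the 720p image.

```python
from statistics import median


def to_seconds(stamp):
    """Convert a time stamp such as '3m19.145s' into seconds."""
    minutes, seconds = stamp.rstrip("s").split("m")
    return int(minutes) * 60 + float(seconds)


# "real" times for 720.png on xerxes, copied from the listing above.
xerxes_real = {
    "default": ["3m24.511s", "3m19.899s"],
    "level 1": ["3m19.412s", "3m19.522s"],
    "level 2": ["3m19.162s", "3m19.973s", "3m19.247s", "3m19.248s"],
    "level 3": ["3m19.703s", "3m20.014s"],
    "fast":    ["3m19.145s", "3m19.255s"],
}


def rank_by_median(results):
    """Return (setting, median seconds) pairs, fastest first."""
    medians = {k: median(to_seconds(t) for t in v) for k, v in results.items()}
    return sorted(medians.items(), key=lambda item: item[1])


for setting, seconds in rank_by_median(xerxes_real):
    print(f"{setting:8s} {seconds:.3f} s")
```

With so few runs the gaps between fast, level 2, and level 1 are small, so the ordering should be read as suggestive rather than conclusive.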

Well, this was the first stage of my optimization research. I look forward to updating you all in the next project post. See you then!
