Benchmarking: Graphics

Posted: December 21st, 2008 under ActionScript 3.0, Optimization.

These trials are concerned with rendering speed as well as common manipulations and methods of art assets. Graphics benchmarks are much harder to perform than tests on data and operations. In order to force Flash to redraw an asset in an easily controlled and measurable manner I resorted to using the BitmapData.draw() function. There is an assumption that drawing into a bitmap is comparable to how Flash draws all of its art. The Flash API documentation is not always 100% accurate or as detailed as we would like (hence all this benchmarking), but it does say that the vector renderer is used to draw pixel information into a bitmap.

If you are unfamiliar with Flash's rendering systems, there are two main types: a vector renderer for vector art and a bitmap renderer for image files like .png and .jpg; any object that has filter effects or is set to cache as a bitmap uses the bitmap renderer, which is actually the faster of the two. The advantage of the vector renderer, and why it is the default, is that it is much more memory efficient (and artwork is always the most memory-intensive part of any project). Cached objects and bitmaps are stored in RAM, and if you flood a computer's RAM with artwork it will not matter how fast your code is; forcing the operating system to page memory out to disk will bring everything to a halt. Adobe's livedocs have more extensive comments on when to enable caching. Usually I will only cache large background objects. If you are ever curious about how much memory you are consuming you can use Flash's System.totalMemory property.
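
The harness for these trials boils down to timing repeated draws into a BitmapData. Here is a minimal sketch of that approach; the function name and defaults are illustrative, not the exact test code used for the numbers below:

    import flash.display.BitmapData;
    import flash.display.DisplayObject;
    import flash.system.System;
    import flash.utils.getTimer;

    // Time how long it takes to rasterize one asset many times.
    function timeDraws(asset:DisplayObject, iterations:int = 1000):Number {
        var canvas:BitmapData = new BitmapData(
            int(Math.ceil(asset.width)), int(Math.ceil(asset.height)),
            true, 0x00000000);
        var start:int = getTimer();
        for (var i:int = 0; i < iterations; i++) {
            canvas.draw(asset); // forces the renderer to redraw the asset
        }
        var elapsed:Number = (getTimer() - start) / 1000; // seconds
        canvas.dispose(); // release the pixel memory immediately
        trace("memory in use: " + System.totalMemory + " bytes");
        return elapsed;
    }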

Comparing Opaque and Alpha Draw Speed of Vector Graphics, 1,000 computations per trial. Run the benchmarks: Opaque, Alpha.

             10×10 Vector Square    Relative Speed   Average   Std. Dev.
Computer A   alpha                                   0.1165    0.0097
             opaque                                  0.1165    0.0080
Computer D   alpha                  +17%             0.0374    0.0078
             opaque                                  0.032     0

             50×50 Vector Square    Relative Speed   Average   Std. Dev.
Computer A   alpha                  +7%              0.147     0.0110
             opaque                                  0.137     0.0060
Computer D   alpha                  +2%              0.0481    0.004
             opaque                                  0.047     0

             250×250 Vector Square  Relative Speed   Average   Std. Dev.
Computer A   alpha                  +87%             0.6489    0.0205
             opaque                                  0.3478    0.0119
Computer D   alpha                  +82%             0.1677    0.0071
             opaque                                  0.092     0.0051

I have to point out the relative speed of this trial: only 1,000 computations per trial. We could do 10,000 numerical operations in the time it takes to draw one opaque 10×10 square. There is also a very strange relationship between drawing time and object size; you would assume it would be linear in area. In the 25-fold area increase from 10×10 to 50×50, execution time on computer A grew by only about 20%, yet in the next 25-fold increase to 250×250, draw time grew roughly 4.4 times for alpha draws and 2.5 times for opaque draws. Computer D showed similar increases in time to render the different object sizes. The range of shape bytes for these 6 objects was consistent; values ranged from 23 to 28. Even more baffling is the inconsistency of alpha draw speed. Regardless, it is always going to take longer to draw objects that have an alpha channel. I had previously heard that it takes about 50% longer to draw an object with an alpha compared to the same object without one, which seems plausible based on my results. Relatively, that 50% is a big deal, since drawing is so processor intensive. Before you go and redo your entire project's art assets, there is a simple way to test if alpha drawing is crippling your performance: setting the opaqueBackground property to a color will render your objects without an alpha (if they had one), so just write a function to recursively set this property on every asset attached to the stage, as sketched below.
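
A minimal sketch of that diagnostic; the function name and fill color are illustrative, and you would typically call it from your document class:

    import flash.display.DisplayObject;
    import flash.display.DisplayObjectContainer;

    // Recursively force an opaque background so nothing renders with alpha.
    function setOpaque(target:DisplayObject, color:uint = 0xFFFFFF):void {
        target.opaqueBackground = color;
        var container:DisplayObjectContainer = target as DisplayObjectContainer;
        if (container != null) {
            for (var i:int = 0; i < container.numChildren; i++) {
                setOpaque(container.getChildAt(i), color);
            }
        }
    }

    // e.g. setOpaque(this); // then compare frame rates with and without it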

Comparing vector object draw speed to jpg and png files, 1,000 computations per trial. Run the benchmarks: jpg images, png images.

             10×10 Square    Relative Speed   Average   Std. Dev.
Computer A   Alpha Vector                     0.1165    0.0097
             PNG             +123%            0.2706    0.0074
             Opaque Vector                    0.1165    0.0080
             JPG             +126%            0.2638    0.0141
Computer D   Alpha Vector                     0.0374    0.0078
             PNG             +211%            0.1165    0.008
             Opaque Vector                    0.032     0
             JPG             +237%            0.1081    0.0038

             50×50 Square    Relative Speed   Average   Std. Dev.
Computer A   Alpha Vector                     0.147     0.0110
             PNG             +126%            0.3321    0.0088
             Opaque Vector                    0.137     0.0060
             JPG             +121%            0.3033    0.0113
Computer D   Alpha Vector                     0.0481    0.004
             PNG             +160%            0.125     0
             Opaque Vector                    0.047     0
             JPG             +141%            0.1135    0.007

             250×250 Square  Relative Speed   Average   Std. Dev.
Computer A   Alpha Vector                     0.6489    0.0205
             PNG             +112%            1.3749    0.0098
             Opaque Vector                    0.3478    0.0119
             JPG             +122%            0.7711    0.0122
Computer D   Alpha Vector                     0.1677    0.0071
             PNG             +78%             0.299     0.0051
             Opaque Vector                    0.092     0.0051
             JPG             +120%            0.2021    0.007

I specifically chose to contrast computer A and computer D because they are respectively the oldest and newest computers in the trial and, as you would expect, represent the two extremes of this experiment. It is important to keep in mind that these are the simplest possible vector objects; since render time for vector objects increases with the object's complexity, png and jpg renders are still going to be faster for most real-world objects. Image files are much larger and will have a higher impact on your swf file size, but they are particularly appealing as replacements for vector art rendered with alpha. If you are curious to see your per-item swf memory costs go to: File -> Publish Settings -> Flash tab -> Generate size report, and the report will be traced to the output when you run the application from the Flash editor (you might need to do some scrolling; there is an egregious amount of white space at its end). The more complicated the vector object, the longer it takes to draw; how much longer is the subject of the next test.

Complex vector versus PNG speed, 1,000 computations per trial. Run the benchmark: complex vector speed comparison.

             Object           Relative Speed   Average    Std. Dev.
Computer A   Vector Complex   +1,123%          11.5323    0.4161
             PNG Complex                       0.9427     0.0413
             Vector Square                     0.4969     0.0270

I had a spare vector avatar from a Semiotic Technologies project that measured 100×300; the png file (not that it should matter, since all pngs of the same size should take the same amount of time to render) was a snapshot of that vector avatar. The vector soldier was 5,700 shape bytes, the same-size vector square was 27 shape bytes, and the png file compressed down to 3,800 bytes. So not only does this object burn CPU time, it uses extra memory as well. Results across the other computers were similar; at the low bound the vector asset took 11 times longer (+1,100%) to draw than the png. Perhaps the most frightening thing about vector art is that you could have a very small but complex vector object that takes as much time to render as an 800×600 image (or conceivably worse). Rendering a complex vector graphic and rendering very large graphics (irrespective of type) are the most expensive things you can do. However, image files are usually memory intensive compared to their vector counterparts. It can be a tough trade-off. Generally, I feel that it is easier to transition from vector art to pngs than the opposite, and it is definitely easier to manage the library without all of the extra image file references. Making wise decisions about what type of art assets to use will be your most effective optimization strategy.
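
If re-exporting art is impractical, one alternative (not part of these trials, so treat it as a sketch) is to rasterize a complex vector asset at runtime and display the bitmap copy instead. This is similar in spirit to enabling cacheAsBitmap, except you control the bitmap's lifetime explicitly:

    import flash.display.Bitmap;
    import flash.display.BitmapData;
    import flash.display.DisplayObject;

    // Pay the expensive vector render once, then display the cheap bitmap.
    function rasterize(source:DisplayObject):Bitmap {
        var data:BitmapData = new BitmapData(
            int(Math.ceil(source.width)), int(Math.ceil(source.height)),
            true, 0x00000000);
        data.draw(source);
        return new Bitmap(data);
    }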

Creating a procedurally drawn object versus an object in the library, 10,000 computations per trial. Run the benchmark: library vs. procedural.

             Object              Relative Speed   Average   Std. Dev.
Computer A   Library Star                         0.5367    0.1156
             Library Square                       0.5909    0.1421
             Procedural Square                    0.4952    0.0734

This test did not involve any drawing, just creating the objects, and the results concurred across computers. I have always drawn simple objects for menus procedurally, and whenever I have been working with artists the first pass for positioning and sizing was usually done procedurally. I am simply faster at editing code than artwork, and fewer objects in the library will decrease the overall size of the swf file. I had always assumed that using the graphics class would be slower than using the same object from the library; if anything it is faster for simple objects. I actually had to run this test several times, as the results for the library objects varied widely. It would be interesting to know whether the number of objects in the library affects creation time, but that is a test someone else can do.
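
For reference, the procedural approach amounts to a few graphics calls. A minimal sketch, with arbitrary dimensions and color:

    import flash.display.Sprite;

    // Build the benchmark's "procedural square" with the graphics class.
    function makeSquare(size:Number, color:uint):Sprite {
        var square:Sprite = new Sprite();
        square.graphics.beginFill(color);
        square.graphics.drawRect(0, 0, size, size);
        square.graphics.endFill();
        return square;
    }

    // e.g. addChild(makeSquare(50, 0xFF0000)); // a 50×50 red square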

getBounds() compared to getRect(), 100,000 computations per trial. Run the benchmark: rectangle bound functions.

             Object                Relative Speed   Average   Std. Dev.
Computer A   50×50 getRect()                        0.3628    0.0628
             50×50 getBounds()                      0.3627    0.0116
             250×250 getBounds()                    0.3618    0.0077

The reason for this test is pretty simple: Flash gives us two options for finding the rectangular bounds of an object (getBounds() includes strokes and getRect() does not), so either one had to be faster or they were equally fast. If they are not exactly equal they are close enough in speed not to worry about the difference (getRect() showed a slight improvement in performance on other computers), and while these are relatively expensive calls, the size of the object has no effect on their speed.
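
A quick illustration of the difference between the two calls, not taken from the benchmark itself; the 4-pixel stroke is arbitrary:

    import flash.display.Sprite;
    import flash.geom.Rectangle;

    var box:Sprite = new Sprite();
    box.graphics.lineStyle(4, 0x000000); // a 4px stroke widens the bounds
    box.graphics.beginFill(0xFF0000);
    box.graphics.drawRect(0, 0, 50, 50);
    box.graphics.endFill();

    var withStroke:Rectangle = box.getBounds(box); // stroke included
    var fillOnly:Rectangle = box.getRect(box);     // stroke excluded
    trace(withStroke.width, fillOnly.width);       // 54 vs. 50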

Sprite properties and methods, 1,000,000 computations per trial. Run the benchmarks: sprite properties, more sprite properties, sprite methods.

             Operation            Relative Speed   Average   Std. Dev.
Computer A   Sprite.x                              0.0280    0.0084
             Sprite.width         +2,106%          0.5515    0.0075
             Sprite.numChildren                    0.0162    0.0040
             Sprite.alpha                          0.0259    0.0073
             Sprite.visible                        0.0259    0.0073
             Sprite.name          +1,186%          0.3215    0.1331
             dispatchEvent()      +3,068%          0.7920    0.0552
             contains()                            0.0275    0.0120
             swapChildren()       +1,854%          0.4887    0.0437

I expected the graphical operations, like width (even though it is a property), to be more expensive than getting a simple numerical value. I did not expect getting the name or swapping children to be as expensive as they are, and I was hoping that generating an event would be cheaper. The only operation that was pleasantly fast was contains(). Unlike Flash's Math class there are not any savvy alternatives to these properties and methods, but the results do show how important it is not to generate extraneous events and other expensive calls. Since these are all methods intrinsic to Flash, results were comparable across all computers.
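
One cheap mitigation that follows from these numbers: read an expensive property once and cache it in a local variable, rather than re-reading it inside a loop. A sketch with illustrative names:

    import flash.display.Sprite;

    function layoutRow(items:Array, template:Sprite):void {
        var w:Number = template.width; // one expensive read, not one per item
        for (var i:int = 0; i < items.length; i++) {
            Sprite(items[i]).x = i * w; // reuse the cached value
        }
    }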

hitTestPoint() versus using a bitmap and getPixel32(), 1,000,000 computations per trial. Run the benchmark: hitTestPoint vs getPixel.

             Operation                            Relative Speed   Average   Std. Dev.
Computer A   hitTestPoint(x,y, true)              +8,395%          2.141     0.1251
             hitTestPoint(x,y, true) w/ caching   +8,395%          2.141     0.3277
             BitmapData.getPixel32()                               0.0255    0.0077

Flash only provides us with two ways to analyze collisions, hitTestObject() and hitTestPoint(). hitTestObject() returns true if the compared objects' axis-aligned bounding boxes overlap; hitTestPoint() has some robustness issues but can give pixel accuracy. As an alternative, you can draw the object into a BitmapData object and look at the pixels yourself, which is significantly faster. Computer A had the largest relative time difference; even on the other extreme, computer D, observing the pixels yourself was still 44 times faster. This test does not include the initial draw into the BitmapData, which is clearly expensive, and the memory of all of those pixels adds up quickly too. In comparison, hitTestPoint() does not cost any memory. However, most of the time that we are concerned with pixel accuracy, we are testing more than just one pixel, and the same object repeatedly. If you are testing against the mouse position I would highly suggest switching to mouse events. Personally, after I conducted this test I vowed never to use hitTestPoint() again.
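
A minimal sketch of the getPixel32() alternative, assuming the object is rasterized once up front; the names are illustrative and the coordinates are in the snapshot's local space:

    import flash.display.BitmapData;
    import flash.display.DisplayObject;

    var snapshot:BitmapData; // drawn once, reused for every test

    function cacheHitArea(target:DisplayObject):void {
        snapshot = new BitmapData(
            int(Math.ceil(target.width)), int(Math.ceil(target.height)),
            true, 0x00000000);
        snapshot.draw(target); // the expensive step, paid once
    }

    function hitsPixel(localX:int, localY:int):Boolean {
        // a nonzero alpha channel means the point covers visible pixels
        return (snapshot.getPixel32(localX, localY) >>> 24) > 0;
    }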

There are some other optimizations that I have found in practice but cannot easily quantify. In my experience the most significant of these is setting mouseEnabled and mouseChildren: every object that inherits from InteractiveObject has both of these properties defaulted to true, so all of your objects are generating mouse events regardless of whether you are listening for mouse events on them. Furthermore, these events flood the event system with irrelevant data for the objects that do have listeners. A close second is overuse of the MovieClip type; if an object only has one frame it can be a Sprite. Also in the top three is overusing ENTER_FRAME handlers: they are very intensive to begin with, and the issue is exacerbated when people do not realize that you have to explicitly remove the listener for the handler to stop receiving events. A sketch of the mouse-event fix follows.
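
For example, a purely decorative container can opt out of the mouse event flow entirely; the variable name here is illustrative:

    import flash.display.Sprite;

    var background:Sprite = new Sprite();
    background.mouseEnabled = false;  // stop dispatching mouse events itself
    background.mouseChildren = false; // and stop routing them to its children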

Lessons learned:

  • Limit art on the stage, limit art with alpha rendering, limit animations, be careful with caching art.
  • Replace complex vector art, and any vector art with transparency, with images, keeping the image files' memory cost in mind.
  • Draw procedurally when you can.
  • hitTestPoint() is terrible, try to avoid it.
  • When you can, set mouseEnabled and mouseChildren to false.
  • Only use the MovieClip type if the object has more than one frame; otherwise use Sprite, both in your ActionScript code and as the base class when exporting a library object.

Drawing and related functions are so intensive that they will always be the most taxing processes on the CPU, and they are the likeliest source of performance problems. Without a doubt, drawing and manipulating your artwork should be your primary target when optimizing. Thankfully, a few simple changes to your library or caching some vector graphics can drastically increase your performance. The remaining challenge is deciding how much artwork and image quality to sacrifice for performance. Where you strike the balance on artwork will determine how much ActionScript you will need to optimize, as well as how many computers can play your application without experiencing lag.

Continue to Results

This post is part of a series:
Introduction
Methodology
Operation Speed
Data Structure Speed
Graphics Speed <– You are here
Results

1 Comment

  • Comment by james — August 19, 2009 @ 5:33 am

    Hi Stephen,

Thanks a lot for posting this article, it’s been a great help.

One thing I was wondering was whether there’s a better alternative for when an enterFrame handler is needed? The obvious one that springs to mind is a Timer set to replicate the frame rate, but this is also generating an object on each frame and going through the event flow, so would it actually offer any improvements? I suppose at least it won’t be bubbling? Anyway, just wondered if you had an opinion on this?

    Thanks again,

James

