What optimizations can I expect from Dalvik and the Android toolchain?

I’m working on a high-performance Android application (a game), and though I try to code for readability first, I like to keep in the back of my mind a picture of what is happening under the hood. With C++, I’ve developed a fairly good intuition about what the compiler will and won’t do for me. I’m trying to do the same for Java/Android.

Hence this question. I could find very little about this topic on the web. Will the Java compiler, Dalvik converter (dx) and/or JITter (on Android 2.2+) perform optimizations like the following?

  • Method inlining. Under what conditions? private methods can always safely be inlined; will this be done? How about public final methods? Methods on objects of other classes? static methods? What if the runtime type of the object can easily be deduced by the compiler? Should I declare methods as final or static wherever possible?

  • Common subexpression elimination. For example, if I access someObject.someField twice, will the lookup be done only once? What if it’s a call to a getter? What if I use some arithmetic expression twice; will it be evaluated only once? What if I use the result of some expression, whose value I know not to change, as the upper bound of a for loop?

  • Bounds checking on array lookups. Will the toolchain eliminate this in certain conditions, like the archetypical for loop?

  • Value inlining. Will accesses to some public static final int always be inlined? Even if they’re in another class? Even if they’re in another package?

  • Branch prediction. How big an issue is this even? Is branching a large performance hit on a typical Android device?

  • Simple arithmetic. Will someInt * 2 be replaced by someInt << 1?

Etcetera…

Here is Solutions:

We have many solutions to this problem, But we recommend you to use the first solution because it is tested & true solution that will 100% work for you.

Solution 1

This is Ben, one of the engineers working on the JIT @ Google. When Bill and I started on this project, the goal was to deliver a working JIT as soon as possible with minimal impact to resource contention (eg memory footprint, CPU hijacked by the compiler thread) so that it can run on low-end devices as well. Therefore we used a very primitive trace based model. That is, the compilation entity passed to the JIT compiler is a basic block, sometimes as short as a single instruction. Such traces will be stitched together at runtime through a technique called chaining so that the interpreter and code cache lookup won’t be invoked often. To some degree the major source of speedup comes from eliminating the repeated interpreter parsing overhead on frequently executed code paths.

That said, we do have quite a few local optimizations implemented with the Froyo JIT:

  • Register allocation (8 registers for v5te target since the JIT produces Thumb code / 16 registers for v7)
  • Scheduling (eg redundant ld/st elimination for Dalvik registers, load hoisting, store sinking)
  • Redundant null check elimination (if such redundancy can be found in a basic block).
  • Loop formation and optimization for simple counted loops (ie no side-exit in the loop body). For such loops, array accesses based on extended induction variables are optimized so that null and range checks are only performed in the loop prologue.
  • One entry inline cache per virtual callsite w/ dynamic patching at runtime.
  • Peephole optimizations like power-reduction on literal operands for mul/div.

In Gingerbread we added simple inlining for getters/setters. Since the underlying JIT frontend is still simple trace based, if the callee has branches in there it won’t be inlined. But the inline cache mechanism is implemented so that virtual getters/setters can be inlined without problems.

We are currently working on enlarging the compilation scope beyond a simple trace so that the compiler has a larger window for code analysis and optimization. Stay tuned.

Solution 2

I’m sure that my answer will not answer all of your questions but I guess it is a win if it answers even one.

You seem to have a profound knowledge on the subject and know what you want so you might want to do the following. Build an example application containing the aspects you want to investigate on.

Take the APK you get and run it through the APK Tool. Reverse engineering your own code to do just what you are intending is perfectly fine as we know.

The APK Tool will extract and decode your resources and will reverse engineer .dex files to .smali files. You might want to look up the smali project too to get more information about how to read the .smali files and about its limitations.

Again I’m pretty sure that this is not going to answer all of your questions but it might be a good start.

Solution 3

First, let me preface this by saying that I’m not an expert on dalvik, and some of my responses may be wrong. But I have dug into the JIT code in dalvik, and I’m quite familiar with the bytecode that dalvik runs.

  1. Method inlining – as far as I know, this never happens. I’m almost positive it never happens at the bytecode level, and I don’t think it happens at the JIT level currently – although it might in the future.

  2. Common subexpression elimination – I believe this would only be done for subexpressions that don’t use any non-final variables/fields. I’m not entirely positive if it would happen even then. If it is done, I would expect it to be done at the bytecode level, probably not the JIT level.

  3. Bounds checking on array lookups – no clue

  4. Value inlining – As far as I know, yes – they will be inlined in all of those scenarios.

  5. Branch prediction – not sure

  6. Simple arithmetic – not as far as I know

Also, I’d like to mention another avenue of approach to you – dx and dalvik are both open source, so you can dig into them all you like. Although, they’re obviously not small codebases, so would take a fair bit of effort to dig into them at that level

Note: Use and implement solution 1 because this method fully tested our system.
Thank you 🙂

All methods was sourced from stackoverflow.com or stackexchange.com, is licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0

Leave a Reply