The Dirty Little Secret About Mobile Benchmarks
September 29, 2012
Last updated: February 28, 2013
Mobile benchmarks are supposed to make it easier to compare smartphones and tablets. In theory, the higher the score, the better the performance. You might have heard that the iPhone 5 beats the Samsung Galaxy S III in some benchmarks. That’s true. It’s also true that the Galaxy S III beats the iPhone 5 in other benchmarks. So what does this really mean? And more importantly, can benchmarks really tell us which phone is better?
8 Important Reasons Why Mobile Benchmarks Don’t Mean Much
- Benchmarks can be gamed — Manufacturers want the highest possible benchmark scores and are sometimes willing to cheat to get them. One approach is optimizing code so it favors a certain benchmark: the optimization raises the benchmark score but has no impact on real-world performance. Another is tweaking drivers to skip certain operations, lower rendering quality, or offload processing elsewhere whenever a benchmark is detected. The bottom line is that almost all benchmarks can be gamed. PC graphics card makers discovered this long ago, and there are many well-documented accounts of Nvidia, AMD and Intel cheating to improve their scores.
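To make that first point concrete, here is a toy Python simulation of the benchmark-detection trick. Everything in it is made up for illustration — the app names, work units and quality factors do not come from any real driver — but it shows how recognizing a benchmark by name and quietly lowering quality inflates a score without helping real apps at all:

```python
# Toy model: a "driver" that cuts rendering quality only when it detects a
# known benchmark, inflating the measured frame count. Names and numbers
# here are invented for illustration; this is not real driver code.

KNOWN_BENCHMARKS = {"GLBenchmark", "Quadrant"}  # hypothetical detection list

def frames_rendered(app_name: str, workload_units: int) -> int:
    """Frames completed in a fixed time slice of `workload_units` work."""
    cost_per_frame = 10          # work units per frame at full quality
    if app_name in KNOWN_BENCHMARKS:
        cost_per_frame = 6       # quality quietly lowered for benchmarks only
    return workload_units // cost_per_frame

honest_score = frames_rendered("SomeGame", 1000)    # what users experience
gamed_score = frames_rendered("GLBenchmark", 1000)  # what reviewers measure
print(honest_score, gamed_score)  # 100 166
```

The "improvement" exists only while the benchmark is running, which is exactly why such scores say nothing about everyday performance.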
- Not all mobile benchmarks are cross-platform — Many mobile benchmarks are Android-only and can’t help you compare an Android phone to the iPhone 5. A few popular mobile benchmarks that are not available for iOS and other mobile platforms: AnTuTu Benchmark, Neocore, NenaMark, Quadrant Standard and Vellamo.
- Benchmarks don’t measure real-world performance — Many benchmarks favor graphics performance and have little bearing on the things real consumers do with their phones. For example, no one watches hundreds of polygons being drawn on their screen, but that’s exactly the kind of thing benchmarks measure. Even mobile gamers are unlikely to see better performance on devices that score higher, because most popular games don’t stress the CPU and GPU the way benchmarks do. Benchmarks like GLBenchmark 2.5 focus on things like high-end 3D animation. One reviewer recently said, “Apple’s A6 has an edge in polygon performance and that may be important for ultra-high resolution games, but I have yet to see many of those. Most games that I’ve tried on both platforms run in lower resolution with an up-scaling.”
- Mobile benchmarks are not time-tested — Most mobile benchmarks are relatively new and not as mature as benchmarks like Futuremark’s PCMark or 3DMark, which are used to test Macs and PCs. The best benchmarks are real-world, relevant and produce repeatable scores. There is some encouraging news in this area, however. I recently read that one of the well-known companies that makes PC benchmark software will be releasing a version for mobile devices. It would be nice if someone ported other time-tested software like SPECint to iOS as well.
- Benchmark scores can change after OS updates — Believe it or not, benchmark scores can change after you upgrade your phone to a new operating system. For example, the Samsung Galaxy S III running Android 4.0 gets a Geekbench score of 1560, while the same exact phone running Android 4.1 gets a Geekbench score of 1781. That’s a 14% increase.
- Benchmark scores are not always repeatable — In theory, you should be able to run the same benchmark on the same phone and get the same results over and over, but this rarely happens. If you run a benchmark immediately after a reboot and then run the same benchmark during heavy use, you’ll get different results. Even if you reboot every time before you benchmark, you’ll still get different scores due to memory allocation, caching, memory fragmentation, OS housekeeping and other factors. In this video, the person testing several phones gets a Quadrant Standard score on the Nexus 4 of 4569 on the first run and 4826 on a second run (skip to 14:25 to view).
- Different devices have different apps running in the background — A Nexus phone typically has fewer apps running in the background than a non-Nexus carrier-issued phone. Even after you close all running apps, there are still apps running in the background that you can’t see. Some apps run automatically to perform housekeeping for a short period and then close. The number and types of apps vary greatly from phone to phone and platform to platform, which makes objective testing of one phone against another difficult. It’s even possible to fake benchmark scores, as in this example.
Case Study: Is the iPhone 5 Really Twice as Fast?
Apple and most tech writers believe the iPhone 5’s A6 processor is twice as fast as the chip in the iPhone 4S. Benchmarks like the one in the above chart support these claims. This video tests these claims.
Results of side-by-side comparisons between the iPhone 5 and the iPhone 4S:
- Opening the Facebook app is faster on the iPhone 4S (skip to 7:49 to see this).
- The iPhone 4S also recognizes speech much faster, although the iPhone 5 returns the results of a query faster (skip to 8:43 to see this). In a second test, the iPhone 4S once again beats the iPhone 5 in speech recognition and almost ties it in returning the answer to a math problem (skip to 9:01 to see this).
- App launch times vary; in some cases the iPhone 5 wins, in others the iPhone 4S wins.
- The iPhone 4S beats the iPhone 5 easily when SpeedTest is run (skip to 10:32 to see this).
- The iPhone 5 does load web pages and games faster than the iPhone 4S, but it’s nowhere near twice as fast (skip to 12:56 to see this).
I found a few other comparison videos like this one, which show similar results. As the video says, “Even with games like ‘Wild Blood’ (shown in the video at 5:01) which are optimized for the iPhone 5’s screen size, looking closely doesn’t really reveal anything significant in terms of improved detail, highlighting, aliasing or smoother frame-rates.” He goes on to say, “the real gains seem to be in the system RAM, which does contribute to improved day-to-day performance of the OS and apps.”
So the bottom line is: although benchmarks predict the iPhone 5 should be twice as fast as the iPhone 4S, in real-world tests the difference between the two is not that large, and it is partially due to the fact that the iPhone 5 has twice as much memory. In some cases the iPhone 4S is actually faster, because it has fewer pixels to display on its screen. I want to stress that I’m not saying the iPhone 5 isn’t a great phone. I’m also not questioning whether the iPhone 5’s processors are faster. They clearly are, but they are nowhere near as fast as the benchmarks say they should be.
The same is true of tests on the iPad 4, which many reviewers say performs at least twice as fast as the iPad 3. However, when it comes to actual gameplay, the same reviewer says, “I couldn’t detect any difference at all. Slices, parries and stabs against the monstrous rivals in Infinity Blade II were fast and responsive on both iPads. Blasting pirates in Galaxy on Fire HD 2 was a pixel-perfect exercise on the two tablets, even at maximum resolution. And zombie brains from The Walking Dead spattered just as well on the iPad 3 as the iPad 4.”
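Claims like “twice as fast” are just relative changes, and they are easy to check against published scores. As a quick sanity check, here is the arithmetic behind the Galaxy S III Geekbench jump quoted earlier (1560 on Android 4.0, 1781 on Android 4.1), sketched in Python:

```python
# Relative change between two benchmark scores, in percent.
def pct_change(old: float, new: float) -> float:
    """Percentage change from `old` to `new`."""
    return 100 * (new - old) / old

# The Galaxy S III Geekbench scores quoted above: 1560 -> 1781.
print(round(pct_change(1560, 1781), 1))  # 14.2, the ~14% cited earlier
```

A genuine doubling would be a 100% change; a 14% bump after an OS update shows how much headroom sits between headline claims and measured scores.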
You should never make a purchasing decision based on benchmarks alone. Benchmarks do have their place, however: even though they are not perfect, they can still be useful if you understand their limitations. Just don’t read too much into them. They are only one indicator, alongside product specs and side-by-side comparisons between two different phones.
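Given the repeatability problems described above, one practical habit is to run a benchmark several times and report the median and the run-to-run spread rather than trusting a single number. A minimal Python sketch, using the two Nexus 4 Quadrant scores quoted earlier (4569 and 4826) purely as illustration — a real comparison would use many more runs:

```python
# Summarize repeated benchmark runs on one device instead of quoting a
# single score. Scores below are the two Nexus 4 Quadrant runs cited above.
from statistics import median

def summarize(scores: list[int]) -> dict:
    """Median score and min-to-max spread (percent) across repeated runs."""
    return {
        "median": median(scores),
        "spread_pct": round(100 * (max(scores) - min(scores)) / min(scores), 1),
    }

runs = [4569, 4826]
print(summarize(runs))  # {'median': 4697.5, 'spread_pct': 5.6}
```

A spread of several percent between back-to-back runs means small score differences between two phones are essentially noise.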
Copyright 2013 Rick Schwartz. All rights reserved. Linking to this article is encouraged.
Follow me on Twitter @mostlytech1