Author |
Message |
eudoxie
Maniac
Joined: 17 Sep 2012 13:36 Posts: 277 Location: 81.170.128.52
|
It isn't really that strange, since modern Java code is compiled to native machine code at runtime by the JIT compiler, which can use all the optimizations available for the host processor. A lot of Java's slowness comes from the rather clunky object and garbage collection system (which my code avoids as much as possible in the speed-critical parts).
I do have a multicore processor, but (and I've checked to make sure) the Java VM only runs on one core, so it isn't doing some sort of sneaky parallel optimization.
Vectorization is done automatically with -O3 (I compiled the C test code with '-O3 -march=amdfam10'). With some further tweaking, I did manage to cut it down to roughly 3m 20s, but that's still almost a minute slower than Java.
|
19 Jun 2009 19:13 |
|
|
Shaos
Admin
Joined: 08 Jan 2003 23:22 Posts: 22821 Location: Silicon Valley
|
Could you please send me the sources of your Java and C benchmarks? I will run them on my Intel Core 2 Duo machine. I don't believe in miracles.
|
19 Jun 2009 21:39 |
|
|
eudoxie
Maniac
Joined: 17 Sep 2012 13:36 Posts: 277 Location: 81.170.128.52
|
Actually, I found the problem. A 'volatile' had snuck into the C code from an experiment I did with an inline assembly hack that wasn't worth the added complexity (this volatile of course screwed up optimization).
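For readers unfamiliar with the effect: `volatile` tells the compiler that every access to a variable must really happen, so it cannot keep the value in a register or restructure the loop around it. The sketch below shows the analogous effect in Java rather than C (Java's `volatile` has different memory-model semantics, but it likewise forces each access to go to memory). `VolatileDemo` and its methods are hypothetical names, not part of the benchmark archive, and the timings it prints will vary by machine and JVM.

```java
// Illustrative only: VolatileDemo and its members are hypothetical names,
// not code from the benchmark archive.
final class VolatileDemo {
    // Every read and write of this field must go to memory.
    private static volatile long volatileSum;

    // Accumulates in a local variable, which the JIT is free to keep in a register.
    static long sumPlain(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += i;
        }
        return sum;
    }

    // Accumulates in the volatile field: each iteration performs a real load and store.
    static long sumVolatile(int n) {
        volatileSum = 0;
        for (int i = 0; i < n; i++) {
            volatileSum += i;
        }
        return volatileSum;
    }

    public static void main(String[] args) {
        int n = 100_000_000;
        long t0 = System.nanoTime();
        long a = sumPlain(n);
        long t1 = System.nanoTime();
        long b = sumVolatile(n);
        long t2 = System.nanoTime();
        System.out.println("plain:    " + a + " in " + (t1 - t0) / 1_000_000 + " ms");
        System.out.println("volatile: " + b + " in " + (t2 - t1) / 1_000_000 + " ms");
    }
}
```

Both methods compute the same sum; only the storage class of the accumulator differs, which is the kind of one-word change that is easy to leave behind after an experiment.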
Now C runs in 2 minutes (Java runs in 2m 30s).
Still uploaded the benchmarking code if you want to have a look: http://www.nedopc.org/ternary/bench.tar.gz
On a completely unrelated side note, I tried compiling the Java code with gcj (again with -O3). Interestingly, that was slower than JIT-compiled Java (3m 15s).
|
19 Jun 2009 22:03 |
|
|
Shaos
Admin
Joined: 08 Jan 2003 23:22 Posts: 22821 Location: Silicon Valley
|
OK, thanks
My marks on Intel Core 2 Duo E4700 2.6GHz (JDK 1.6.0_14 and GCC 4.2.4):
Java 3m 10s
C-O2 2m 32s
C-O3 2m 03s
C-O3+ 1m 47s (-march=native -funroll-loops -fomit-frame-pointer)
C-O3++ 1m 45s (-march=native -funroll-loops -fomit-frame-pointer -fprefetch-loop-arrays)
C-O3+v 1m 43s (-march=native -funroll-loops -fomit-frame-pointer -fprefetch-loop-arrays -ftree-vectorize)
In all cases only 1 core was utilized
P.S. The modern proprietary JIT compiler is much better than gcj; it's even better than most commercial native Java compilers
|
20 Jun 2009 00:33 |
|
|
eudoxie
Maniac
Joined: 17 Sep 2012 13:36 Posts: 277 Location: 81.170.128.52
|
Hmm, gcc manual says that -fomit-frame-pointer and -ftree-vectorize are both enabled at -O3 automatically, and -fprefetch-loop-arrays should be enabled at all levels but -Os, so they shouldn't really make any difference.
It's peculiar that your Java benchmark is so slow. You get roughly the same C speeds as me, but much slower Java speeds (by almost a minute). Did you try running it in the server VM as well as the client VM?
|
20 Jun 2009 08:06 |
|
|
Shaos
Admin
Joined: 08 Jan 2003 23:22 Posts: 22821 Location: Silicon Valley
|
I have Slackware 12.2 and the standard Java distribution
And you are right about -ftree-vectorize and -fprefetch-loop-arrays, but for some reason they gained me a couple of seconds...
|
20 Jun 2009 08:23 |
|
|
eudoxie
Maniac
Joined: 17 Sep 2012 13:36 Posts: 277 Location: 81.170.128.52
|
Heh, we run the same operating system version
A 4 second speed difference is under 4% on a test that runs for almost 2 minutes. The engineer in me tells me that a difference that small is well within the size of the random measurement errors caused by the operating system.
|
20 Jun 2009 10:24 |
|
|
Shaos
Admin
Joined: 08 Jan 2003 23:22 Posts: 22821 Location: Silicon Valley
|
I repeated the C-O3+ test 3 times and every run took exactly 1m 47s
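Repeating a run like this is the usual way to tell a real speedup from timing noise. A minimal sketch of the idea in Java follows, assuming the workload can be wrapped in a `Runnable`; `RepeatTimer` and its methods are hypothetical names, and the busy loop in `main` is only a stand-in for the actual benchmark.

```java
// Illustrative only: RepeatTimer is a hypothetical helper, not part of the benchmark.
final class RepeatTimer {
    // Runs the task `runs` times and returns each wall-clock duration in nanoseconds.
    static long[] time(Runnable task, int runs) {
        long[] samples = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            samples[i] = System.nanoTime() - start;
        }
        return samples;
    }

    // Gap between the slowest and fastest run, as a percentage of the fastest.
    static double spreadPercent(long[] samples) {
        long min = samples[0];
        long max = samples[0];
        for (long s : samples) {
            if (s < min) min = s;
            if (s > max) max = s;
        }
        return 100.0 * (max - min) / min;
    }

    public static void main(String[] args) {
        // Stand-in workload; replace with the code being measured.
        Runnable work = () -> {
            long x = 0;
            for (int i = 0; i < 50_000_000; i++) {
                x += i ^ (x >>> 3);
            }
            if (x == 42) System.out.println(); // keeps the loop from being discarded
        };
        long[] samples = time(work, 5);
        for (long s : samples) {
            System.out.println(s / 1_000_000 + " ms");
        }
        System.out.println("spread: " + spreadPercent(samples) + " %");
    }
}
```

If the spread between identical runs is much smaller than the gap between two configurations, the gap is probably real. On a JVM the first run or two also include JIT warm-up, so they are often discarded.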
|
20 Jun 2009 14:43 |
|
|
eudoxie
Maniac
Joined: 17 Sep 2012 13:36 Posts: 277 Location: 81.170.128.52
|
Odd...
I've packaged the stuff I've written so far on the Java version of Tunguska, if anyone wants to poke around in the sources. It's not quite functional yet (I have only implemented around half the instruction set), but it's getting there...
Tunguska.zip
|
20 Jun 2009 15:21 |
|
|
Shaos
Admin
Joined: 08 Jan 2003 23:22 Posts: 22821 Location: Silicon Valley
|
Shaos wrote:
OK, thanks
My marks on Intel Core 2 Duo E4700 2.6GHz (JDK 1.6.0_14 and GCC 4.2.4):
Java 3m 10s
C-O2 2m 32s
C-O3 2m 03s
C-O3+ 1m 47s (-march=native -funroll-loops -fomit-frame-pointer)
C-O3++ 1m 45s (-march=native -funroll-loops -fomit-frame-pointer -fprefetch-loop-arrays)
C-O3+v 1m 43s (-march=native -funroll-loops -fomit-frame-pointer -fprefetch-loop-arrays -ftree-vectorize)
PowerBook G4 1.67GHz MacOS X 10.4.11 with Java 1.5.0_13 and GCC 4.0.0:
Java 7m 55s
C-O2 3m 06s
C-O3 2m 32s
Additional options didn't help at all (including -mcpu=G4)
|
20 Jun 2009 19:38 |
|
|
hemuman
|
I would love to work on those, but only next month.
I'm away from my system
|
22 Jun 2009 13:29 |
|
|
eudoxie
Maniac
Joined: 17 Sep 2012 13:36 Posts: 277 Location: 81.170.128.52
|
No hurry. It won't be operational for a few more weeks. Right now I'm building a makeshift assembler so that I can begin to actually test the instruction code.
I've got a lot of free time now, though, so I'm getting a lot of work done. It's already almost 2500 lines of code.
|
23 Jun 2009 15:36 |
|
|
Shaos
Admin
Joined: 08 Jan 2003 23:22 Posts: 22821 Location: Silicon Valley
|
Is it possible to keep the input and output interfaces as abstract as possible, so that it's easily portable to Android? Google's Java doesn't have AWT or the regular keyboard/mouse events...
|
23 Jun 2009 18:10 |
|
|
eudoxie
Maniac
Joined: 17 Sep 2012 13:36 Posts: 277 Location: 81.170.128.52
|
That will not be a problem, since I'm writing it to be as modular as possible. The only packages I'm importing are java.util and java.lang.reflect.
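One way such a separation could look is sketched below, purely as an illustration: the emulator core talks to two small interfaces, and each platform (AWT, Android, or a headless test) supplies its own implementations. Every name here (`KeySource`, `PixelSink`, `EmulatorCore`, `QueueKeySource`, `IoDemo`) is hypothetical and not taken from the Tunguska sources, which may be organized quite differently.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// All names here are hypothetical; they are not taken from the Tunguska sources.

// What the emulator core needs from a keyboard-like device.
interface KeySource {
    boolean hasKey();
    int nextKey();
}

// What the emulator core needs from a display-like device.
interface PixelSink {
    void setPixel(int x, int y, int value);
}

// The core depends only on the two interfaces, never on AWT or Android classes.
final class EmulatorCore {
    private final KeySource keys;
    private final PixelSink screen;
    private int cursor = 0;

    EmulatorCore(KeySource keys, PixelSink screen) {
        this.keys = keys;
        this.screen = screen;
    }

    // Toy behavior: write each pending key code to the next pixel on row 0.
    void poll() {
        while (keys.hasKey()) {
            screen.setPixel(cursor++, 0, keys.nextKey());
        }
    }
}

// A headless key source backed by a queue, usable on any platform or in tests.
final class QueueKeySource implements KeySource {
    private final Queue<Integer> pending = new ArrayDeque<>();

    void press(int keyCode) {
        pending.add(keyCode);
    }

    public boolean hasKey() {
        return !pending.isEmpty();
    }

    public int nextKey() {
        return pending.remove();
    }
}

final class IoDemo {
    public static void main(String[] args) {
        QueueKeySource keys = new QueueKeySource();
        keys.press(65);
        keys.press(66);
        // A console "display"; an AWT or Android front end would implement PixelSink instead.
        PixelSink console = (x, y, value) ->
                System.out.println("pixel(" + x + "," + y + ") = " + value);
        new EmulatorCore(keys, console).poll();
    }
}
```

With this shape, porting means writing new `KeySource` and `PixelSink` implementations while the core stays untouched, and the core only needs java.util.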
|
23 Jun 2009 18:32 |
|
|
Shaos
Admin
Joined: 08 Jan 2003 23:22 Posts: 22821 Location: Silicon Valley
|
|
23 Jun 2009 18:57 |
|
|