nedoPC.org

Electronics hobbyists community established in 2002
Tunguska II 
Maniac

Joined: 17 Sep 2012 13:36
Posts: 277
Location: 81.170.128.52
It isn't really that strange, since modern Java bytecode is compiled to native machine code at runtime by the JIT, which can use the optimizations available for the host processor. A lot of Java's slowness comes from the rather clunky object and garbage collection system (which my code avoids as much as possible in the speed-critical parts).
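
To illustrate the point about keeping objects and garbage collection out of hot paths, here is a minimal sketch. It is not taken from the Tunguska sources or the benchmark; the class and method names are made up for illustration. It contrasts a loop that boxes every value on the heap with one that works on a single preallocated primitive array.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; not code from Tunguska or the benchmark.
class HotLoopSketch {
    // Allocation-heavy: every element is stored as a boxed Integer.
    static long sumBoxed(int n) {
        List<Integer> values = new ArrayList<Integer>();
        for (int i = 0; i < n; i++) {
            values.add(i);          // autoboxing; allocates outside the small Integer cache
        }
        long sum = 0;
        for (Integer v : values) {
            sum += v;               // unboxing on every read
        }
        return sum;
    }

    // Allocation-free inner loops: one primitive array, supplied by the caller.
    static long sumPrimitive(int[] buffer) {
        for (int i = 0; i < buffer.length; i++) {
            buffer[i] = i;
        }
        long sum = 0;
        for (int i = 0; i < buffer.length; i++) {
            sum += buffer[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        int n = 1000000;
        int[] buffer = new int[n];  // allocated once, up front
        System.out.println("same result: " + (sumBoxed(n) == sumPrimitive(buffer)));
    }
}
```

Both methods compute the same sum; the difference is only in how much work they leave for the allocator and the garbage collector.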

I do have a multicore processor, but (and I've checked to make sure) the Java VM only runs on one core, so it isn't doing some sort of sneaky parallel optimization.

Vectorization is done automatically with -O3 (I compiled the C test code with '-O3 -march=amdfam10'). With some further tweaking, I did manage to cut it down to roughly 3m 20s, but that's still almost a minute slower than Java.


19 Jun 2009 19:13
Admin

Joined: 08 Jan 2003 23:22
Posts: 22412
Location: Silicon Valley
Could you please send me the sources of your Java and C benchmarks? I will run them on my Intel Core 2 Duo machine - I don't believe in miracles ;)


19 Jun 2009 21:39
Maniac

Joined: 17 Sep 2012 13:36
Posts: 277
Location: 81.170.128.52
Actually, I found the problem. A 'volatile' had snuck into the C code from an experiment I did with an inline assembly hack that wasn't worth the added complexity (this volatile of course screwed up optimization).

Now C runs in 2 minutes (Java runs in 2m 30s).

I've still uploaded the benchmarking code if you want to have a look: http://www.nedopc.org/ternary/bench.tar.gz

On a completely unrelated side note, I tried compiling the Java code with gcj (again with -O3). Interestingly, that was slower than JIT-compiled Java (3m 15s).


19 Jun 2009 22:03
Admin

Joined: 08 Jan 2003 23:22
Posts: 22412
Location: Silicon Valley
OK, thanks :)
My results on an Intel Core 2 Duo E4700 2.6GHz (JDK 1.6.0_14 and GCC 4.2.4):

Java 3m 10s
C-O2 2m 32s
C-O3 2m 03s
C-O3+ 1m 47s (-march=native -funroll-loops -fomit-frame-pointer)
C-O3++ 1m 45s (-march=native -funroll-loops -fomit-frame-pointer -fprefetch-loop-arrays)
C-O3+v 1m 43s (-march=native -funroll-loops -fomit-frame-pointer -fprefetch-loop-arrays -ftree-vectorize)

In all cases only 1 core was utilized.

P.S. The modern proprietary JIT compiler is much better than gcj; it's even better than most commercial Java native compilers ;)


20 Jun 2009 00:33
Maniac

Joined: 17 Sep 2012 13:36
Posts: 277
Location: 81.170.128.52
Hmm, the GCC manual says that -fomit-frame-pointer and -ftree-vectorize are both enabled automatically at -O3, and that -fprefetch-loop-arrays should be enabled at all levels but -Os, so they shouldn't really make any difference.


It's peculiar that your Java benchmark is so slow. You get roughly the same C speeds as me, but much slower Java speeds (by almost a minute). Did you try running in the server VM as well as the client VM?
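
One quick way to see which VM a run actually used is to print the standard `java.vm.name` system property from inside the program. A minimal sketch follows; the class name is made up for illustration, and `java.vm.info` is a HotSpot-specific property that may come back as null on other VMs.

```java
// Hypothetical helper: reports which VM the current process is running on.
class VmInfo {
    // java.vm.name is one of the standard system properties.
    static String vmName() {
        return System.getProperty("java.vm.name");
    }

    public static void main(String[] args) {
        System.out.println("VM name:    " + vmName());
        System.out.println("VM version: " + System.getProperty("java.vm.version"));
        // HotSpot-specific; prints "null" where the property is not defined.
        System.out.println("VM info:    " + System.getProperty("java.vm.info"));
    }
}
```

On Sun's JDK 6 the VM could be chosen on the command line with `java -server` or `java -client`; whether both are present depends on the platform and build.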


20 Jun 2009 08:06
Admin

Joined: 08 Jan 2003 23:22
Posts: 22412
Location: Silicon Valley
I have Slackware 12.2 and the standard Java distribution.

Quote:
-O also turns on -fomit-frame-pointer on machines where doing so does not interfere with debugging.


And you are right about -ftree-vectorize and -fprefetch-loop-arrays, but for some reason they gained me a couple of seconds...


20 Jun 2009 08:23
Maniac

Joined: 17 Sep 2012 13:36
Posts: 277
Location: 81.170.128.52
Heh, we run the same operating system version :-)

A 4-second difference is only around 4% on a test that runs for almost 2 minutes. The engineer in me tells me that a difference that small is well within the random measurement error caused by the operating system.
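
As a quick check of the arithmetic, assuming the 4 seconds refers to the 1m 47s (C-O3+) and 1m 43s (C-O3+v) results quoted earlier in the thread, here is a small sketch. The class and method names are made up for illustration.

```java
// Hypothetical helper for comparing two wall-clock timings.
class RelativeDifference {
    // Percentage by which fasterSeconds improves on baselineSeconds.
    static double percentFaster(double baselineSeconds, double fasterSeconds) {
        return (baselineSeconds - fasterSeconds) / baselineSeconds * 100.0;
    }

    public static void main(String[] args) {
        double o3Plus = 1 * 60 + 47;    // C-O3+  : 1m 47s
        double o3PlusV = 1 * 60 + 43;   // C-O3+v : 1m 43s
        System.out.printf("difference: %.1f%%%n", percentFaster(o3Plus, o3PlusV));
    }
}
```

On those figures the difference works out to a little under 4% of the run time.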


20 Jun 2009 10:24
Admin

Joined: 08 Jan 2003 23:22
Posts: 22412
Location: Silicon Valley
I repeated the C-O3+ test 3 times and all of the runs took exactly 1m 47s :)
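
Repeating a run and looking at the spread is the usual way to judge whether a few seconds are noise. A minimal sketch of that idea in Java follows, under loud assumptions: the workload is a stand-in rather than the real benchmark, and all names are hypothetical.

```java
// Hypothetical harness: times a stand-in workload several times and reports the spread.
class RepeatTiming {
    static long sink;   // keeps the workload result live so the JIT cannot discard the loop

    // Stand-in workload (an assumption); replace with the real benchmark body.
    static long workload() {
        long acc = 0;
        for (int i = 0; i < 5000000; i++) {
            acc += (i % 3) - 1;     // values -1, 0, 1
        }
        return acc;
    }

    // Runs the workload 'runs' times; returns {minNanos, maxNanos}.
    static long[] minMaxNanos(int runs) {
        long min = Long.MAX_VALUE;
        long max = Long.MIN_VALUE;
        for (int r = 0; r < runs; r++) {
            long start = System.nanoTime();
            sink += workload();
            long elapsed = System.nanoTime() - start;
            min = Math.min(min, elapsed);
            max = Math.max(max, elapsed);
        }
        return new long[] { min, max };
    }

    public static void main(String[] args) {
        long[] mm = minMaxNanos(3);
        System.out.println("min ns: " + mm[0] + ", max ns: " + mm[1]
                + ", spread ns: " + (mm[1] - mm[0]));
    }
}
```

If the spread between runs is much smaller than the difference between two configurations, the difference is probably real; in Java the first run is often slower because the JIT has not warmed up yet.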


20 Jun 2009 14:43
Maniac

Joined: 17 Sep 2012 13:36
Posts: 277
Location: 81.170.128.52
Odd...

I've packaged the stuff I've written so far on the Java version of Tunguska, if anyone wants to poke around in the sources. It's not quite functional yet (I have only implemented around half the instruction set), but it's getting there...

Tunguska.zip


20 Jun 2009 15:21
Admin

Joined: 08 Jan 2003 23:22
Posts: 22412
Location: Silicon Valley
Shaos wrote:
OK, thanks :)
My marks on Intel Core 2 Duo E4700 2.6GHz (JDK 1.6.0_14 and GCC 4.2.4):

Java 3m 10s
C-O2 2m 32s
C-O3 2m 03s
C-O3+ 1m 47s (-march=native -funroll-loops -fomit-frame-pointer)
C-O3++ 1m 45s (-march=native -funroll-loops -fomit-frame-pointer -fprefetch-loop-arrays)
C-O3+v 1m 43s (-march=native -funroll-loops -fomit-frame-pointer -fprefetch-loop-arrays -ftree-vectorize)



PowerBook G4 1.67GHz MacOS X 10.4.11 with Java 1.5.0_13 and GCC 4.0.0:

Java 7m 55s
C-O2 3m 06s
C-O3 2m 32s

No additional options helped at all (including -mcpu=G4).


20 Jun 2009 19:38
Quote:
by eudoxie on 2009/6/21 3:21:27

Odd...

I've packaged the stuff I've written so far on the Java version of Tunguska, if anyone wants to poke around in the sources. It's not quite functional yet (I have only implemented around half the instruction set), but it's getting there...

Tunguska.zip



I would love to work on those, but only next month :(
I'm away from my system :(


22 Jun 2009 13:29
Maniac

Joined: 17 Sep 2012 13:36
Posts: 277
Location: 81.170.128.52
No hurry. It won't be operational for a few more weeks. Right now I'm building a makeshift assembler so that I can begin to actually test the instruction code.

I've got a lot of free time now, though, so I get a lot of work done. It's already almost 2500 lines of code :-)


23 Jun 2009 15:36
Admin

Joined: 08 Jan 2003 23:22
Posts: 22412
Location: Silicon Valley
eudoxie wrote:
No hurry. It won't be operational for a few more weeks. Right now I'm building a makeshift assembler so that I can begin to actually test the instruction code.

I've got a lot of free time now, though, so I get a lot of work done. It's already almost 2500 lines of code :-)


Is it possible to make the input and output interfaces as abstract as possible, so that it can be easily ported to Android? Google's Java doesn't have AWT or the regular keyboard/mouse events... ;)


23 Jun 2009 18:10
Maniac

Joined: 17 Sep 2012 13:36
Posts: 277
Location: 81.170.128.52
That will not be a problem, since I'm making it as modular as possible. The only packages I'm importing are from java.util and java.lang.reflect.
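
A rough sketch of what such a separation could look like is given below. Everything here is hypothetical: the interface and class names are invented for illustration and are not from the Tunguska sources. The core only talks to two small interfaces, so an AWT front end and an Android front end could each supply their own implementations.

```java
// Hypothetical sketch of platform-neutral I/O; not the actual Tunguska design.
class PortableIoSketch {
    // Implemented once per platform (AWT, Android, plain console, ...).
    interface TextOutput {
        void putChar(char c);
    }

    interface KeyInput {
        int pollKey();   // returns -1 when no key is waiting
    }

    // The core depends only on the interfaces, never on a GUI toolkit.
    static class Machine {
        private final TextOutput out;
        private final KeyInput in;

        Machine(TextOutput out, KeyInput in) {
            this.out = out;
            this.in = in;
        }

        // Toy "instruction": echo one pending key to the output.
        boolean step() {
            int key = in.pollKey();
            if (key < 0) {
                return false;
            }
            out.putChar((char) key);
            return true;
        }
    }

    // Headless back end built from java.lang only, handy for testing.
    static String echo(final String typed) {
        final StringBuilder screen = new StringBuilder();
        TextOutput out = new TextOutput() {
            public void putChar(char c) {
                screen.append(c);
            }
        };
        KeyInput in = new KeyInput() {
            private int pos = 0;
            public int pollKey() {
                return pos < typed.length() ? typed.charAt(pos++) : -1;
            }
        };
        Machine machine = new Machine(out, in);
        while (machine.step()) {
            // run until the input is exhausted
        }
        return screen.toString();
    }

    public static void main(String[] args) {
        System.out.println(echo("tunguska"));
    }
}
```

The headless back end doubles as a test harness, since the core can be driven without any windowing code at all.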


23 Jun 2009 18:32
Admin

Joined: 08 Jan 2003 23:22
Posts: 22412
Location: Silicon Valley
eudoxie wrote:
That will not be a problem, since I'm writing it as modularized as possible. The only packages I'm importing are from java.util and java.lang.reflect.


http://developer.android.com/reference/ ... mmary.html
http://developer.android.com/reference/ ... mmary.html


23 Jun 2009 18:57