Tunguska II
Moderator: haqreu
- 
				eudoxie
- Maniac
- Posts: 277
- Joined: 17 Sep 2012 13:36
- Location: 81.170.128.52
Re: Tunguska II
It isn't really that strange, since modern Java code is compiled to native machine code at runtime by the JIT, which can use all the optimizations available for the host processor. A lot of Java's slowness comes from the rather clunky object and garbage collection system (which my code avoids as much as possible in the speed-critical parts).
I do have a multicore processor, but (and I've checked to make sure) the Java VM only runs on one core, so it isn't doing some sort of sneaky parallel optimization.
Vectorization is done automatically with -O3 (I compiled the C test code with '-O3 -march=amdfam10'). With some further tweaking, I did manage to cut it down to roughly 3m 20s, but that's still almost a minute slower than Java.
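To illustrate the point about the object system, here is a minimal sketch. It is not the benchmark code from this thread, and all names in it are made up. It sums the same numbers once from a primitive int[] and once from a List<Integer>. On a typical JVM the boxed version tends to be slower because every element is a separate heap object, though the exact gap depends on the VM and the machine.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; not taken from Tunguska or from the benchmark in this thread.
final class BoxingSketch {
    // Sums a primitive array: no objects are allocated or dereferenced in the loop.
    static long sumPrimitive(int[] values) {
        long sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Sums boxed Integers: each element is a heap object that has to be unboxed.
    static long sumBoxed(List<Integer> values) {
        long sum = 0;
        for (Integer v : values) {
            sum += v;
        }
        return sum;
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        int[] primitive = new int[n];
        List<Integer> boxed = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            primitive[i] = i;
            boxed.add(i);
        }
        // Repeat a few rounds so the JIT has a chance to compile the hot loops.
        for (int round = 0; round < 5; round++) {
            long t0 = System.nanoTime();
            long a = sumPrimitive(primitive);
            long t1 = System.nanoTime();
            long b = sumBoxed(boxed);
            long t2 = System.nanoTime();
            System.out.printf("round %d: primitive %d us, boxed %d us (sums %d / %d)%n",
                    round, (t1 - t0) / 1000, (t2 - t1) / 1000, a, b);
        }
    }
}
```

The first round or two usually look slower than the later ones, which is the JIT warming up.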
- 
				Shaos  
- Admin
- Posts: 24379
- Joined: 08 Jan 2003 23:22
- Location: Silicon Valley
Re: Tunguska II
Could you please send me the sources of your Java and C benchmarks? I will run them on my Intel Core 2 Duo machine - I don't believe in miracles
- 
				eudoxie
- Maniac
- Posts: 277
- Joined: 17 Sep 2012 13:36
- Location: 81.170.128.52
Re: Tunguska II
Actually, I found the problem. A 'volatile' had snuck into the C code from an experiment I did with an inline assembly hack that wasn't worth the added complexity (this volatile of course screwed up optimization).
Now C runs in 2 minutes (Java runs in 2m 30s).
I still uploaded the benchmarking code if you want to have a look: http://www.nedopc.org/ternary/bench.tar.gz
On a completely unrelated side note, I tried compiling the Java code with gcj (again -O3). Interestingly, that was slower than JIT-compiled Java (3 minutes 15 seconds).
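The problem above was in the C code, and C's volatile is not the same thing as Java's. Still, Java's volatile has a comparable effect on optimization, so here is a rough Java analogue rather than the actual benchmark code (the class and field names are hypothetical). Both loops compute the same sum; the volatile one forces a real read and write of the field on every iteration, while the plain one leaves the JIT free to keep the value in a register.

```java
// Hypothetical Java analogue of the C 'volatile' slowdown; not the code from bench.tar.gz.
final class VolatileSketch {
    // Reads and writes of a volatile field may not be cached in a register or optimized away.
    static volatile long volatileCounter;
    // A plain field gives the JIT freedom to keep the running value in a register.
    static long plainCounter;

    static long countVolatile(int n) {
        volatileCounter = 0;
        for (int i = 0; i < n; i++) {
            volatileCounter += i;
        }
        return volatileCounter;
    }

    static long countPlain(int n) {
        plainCounter = 0;
        for (int i = 0; i < n; i++) {
            plainCounter += i;
        }
        return plainCounter;
    }

    public static void main(String[] args) {
        int n = 50_000_000;
        // A few rounds so that both loops get JIT-compiled before comparing timings.
        for (int round = 0; round < 3; round++) {
            long t0 = System.nanoTime();
            long a = countVolatile(n);
            long t1 = System.nanoTime();
            long b = countPlain(n);
            long t2 = System.nanoTime();
            System.out.printf("round %d: volatile %d ms, plain %d ms (results %d / %d)%n",
                    round, (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000, a, b);
        }
    }
}
```

This is single-threaded on purpose; the point is only the cost of the forced memory accesses, not thread safety.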
- 
				Shaos  
- Admin
- Posts: 24379
- Joined: 08 Jan 2003 23:22
- Location: Silicon Valley
Re: Tunguska II
OK, thanks 
My marks on Intel Core 2 Duo E4700 2.6GHz (JDK 1.6.0_14 and GCC 4.2.4):
Java 3m 10s
C-O2 2m 32s
C-O3 2m 03s
C-O3+ 1m 47s (-march=native -funroll-loops -fomit-frame-pointer)
C-O3++ 1m 45s (-march=native -funroll-loops -fomit-frame-pointer -fprefetch-loop-arrays)
C-O3+v 1m 43s (-march=native -funroll-loops -fomit-frame-pointer -fprefetch-loop-arrays -ftree-vectorize)
In all cases only 1 core was utilized
P.S. The modern proprietary JIT compiler is much better than gcj; it's even better than most commercial Java native compilers.
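For anyone who wants the ratios rather than the raw times, a small sketch that converts the timings listed above to seconds and prints each C build's speed relative to Java. The helper names are made up for this example; only the timing strings come from the post.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for comparing the timings posted above.
final class TimingRatios {
    // Parses a string of the form "3m 10s" into a number of seconds.
    static int toSeconds(String text) {
        String[] parts = text.trim().split("\\s+");
        int minutes = Integer.parseInt(parts[0].replace("m", ""));
        int seconds = Integer.parseInt(parts[1].replace("s", ""));
        return minutes * 60 + seconds;
    }

    // How many times faster the candidate is than the baseline.
    static double speedup(String baseline, String candidate) {
        return (double) toSeconds(baseline) / toSeconds(candidate);
    }

    public static void main(String[] args) {
        String java = "3m 10s";
        // Insertion-ordered map so the output follows the order of the post.
        Map<String, String> cRuns = new LinkedHashMap<>();
        cRuns.put("C-O2", "2m 32s");
        cRuns.put("C-O3", "2m 03s");
        cRuns.put("C-O3+", "1m 47s");
        cRuns.put("C-O3++", "1m 45s");
        cRuns.put("C-O3+v", "1m 43s");
        for (Map.Entry<String, String> run : cRuns.entrySet()) {
            System.out.printf("%-7s %3d s, %.2fx the speed of Java%n",
                    run.getKey(), toSeconds(run.getValue()), speedup(java, run.getValue()));
        }
    }
}
```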

- 
				eudoxie
- Maniac
- Posts: 277
- Joined: 17 Sep 2012 13:36
- Location: 81.170.128.52
Re: Tunguska II
Hmm, the gcc manual says that -fomit-frame-pointer and -ftree-vectorize are both enabled at -O3 automatically, and -fprefetch-loop-arrays should be enabled at all levels but -Os, so they shouldn't really make any difference.
It's peculiar that your Java benchmark is so slow. You get roughly the same C speeds as me, but much slower Java speeds (by almost a minute). Did you try running in the server VM as well as the client VM?
- 
				Shaos  
- Admin
- Posts: 24379
- Joined: 08 Jan 2003 23:22
- Location: Silicon Valley
Re: Tunguska II
I have Slackware 12.2 and the standard Java distribution.
And you are right about -ftree-vectorize and -fprefetch-loop-arrays, but for some reason they gave me a couple of seconds...
As for -fomit-frame-pointer, the gcc manual says: "-O also turns on -fomit-frame-pointer on machines where doing so does not interfere with debugging."
- 
				eudoxie
- Maniac
- Posts: 277
- Joined: 17 Sep 2012 13:36
- Location: 81.170.128.52
Re: Tunguska II
Heh, we run the same operating system version.
A 4 second difference is only around 4% on a test that runs for almost 2 minutes. The engineer in me tells me that a difference that small is well within the size of the random measurement errors caused by the operating system.
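A quick way to check this kind of claim is to compare the difference between two variants with the spread of repeated runs of the same variant. The sketch below uses 1m 47s (107 s) and 1m 43s (103 s) from the results earlier in the thread; the three repeated-run values in main are invented purely for illustration, and the method names are my own.

```java
// Hypothetical noise check; the repeated-run numbers are made up, not measured.
final class NoiseCheck {
    // Difference between two timings, as a fraction of the slower one.
    static double relativeDifference(double a, double b) {
        return Math.abs(a - b) / Math.max(a, b);
    }

    // Spread (max - min) of repeated runs, as a fraction of the fastest run.
    static double relativeSpread(double[] runs) {
        double min = runs[0];
        double max = runs[0];
        for (double r : runs) {
            min = Math.min(min, r);
            max = Math.max(max, r);
        }
        return (max - min) / min;
    }

    public static void main(String[] args) {
        // 1m 47s versus 1m 43s, taken from the results posted above.
        double diff = relativeDifference(107.0, 103.0);
        // Invented timings of three runs of one and the same binary.
        double[] repeatedRuns = {107.2, 106.8, 107.5};
        double spread = relativeSpread(repeatedRuns);
        System.out.printf("difference between variants: %.1f%%%n", diff * 100);
        System.out.printf("spread of repeated runs:     %.1f%%%n", spread * 100);
        System.out.println(diff > spread
                ? "difference exceeds the run-to-run spread"
                : "difference is within the run-to-run spread");
    }
}
```

If the difference between variants is clearly larger than the run-to-run spread, it is probably not just operating system noise.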
- 
				Shaos  
- Admin
- Posts: 24379
- Joined: 08 Jan 2003 23:22
- Location: Silicon Valley
Re: Tunguska II
I repeated the C-O3+ test 3 times and all of the runs were exactly 1m 47s
- 
				eudoxie
- Maniac
- Posts: 277
- Joined: 17 Sep 2012 13:36
- Location: 81.170.128.52
Re: Tunguska II
Odd...
I've packaged what I've written so far of the Java version of Tunguska, if anyone wants to poke around in the sources. It's not quite functional yet (I have only implemented around half the instruction set), but it's getting there...
Tunguska.zip
- 
				Shaos  
- Admin
- Posts: 24379
- Joined: 08 Jan 2003 23:22
- Location: Silicon Valley
Re: Tunguska II
Shaos wrote: OK, thanks
My marks on Intel Core 2 Duo E4700 2.6GHz (JDK 1.6.0_14 and GCC 4.2.4):
Java 3m 10s
C-O2 2m 32s
C-O3 2m 03s
C-O3+ 1m 47s (-march=native -funroll-loops -fomit-frame-pointer)
C-O3++ 1m 45s (-march=native -funroll-loops -fomit-frame-pointer -fprefetch-loop-arrays)
C-O3+v 1m 43s (-march=native -funroll-loops -fomit-frame-pointer -fprefetch-loop-arrays -ftree-vectorize)

PowerBook G4 1.67GHz MacOS X 10.4.11 with Java 1.5.0_13 and GCC 4.0.0:
Java 7m 55s
C-O2 3m 06s
C-O3 2m 32s
Any additional options didn't help at all (including -mcpu=G4)
- 
				hemuman
Re: Tunguska II
eudoxie wrote on 2009/6/21 3:21:27: Odd...
I've packaged the stuff I've written so far on the Java version of Tunguska, if anyone wants to poke around in the sources. It's not quite functional yet (I have only implemented around half the instruction set), but it's getting there...
Tunguska.zip

I would love to work on those, but only next month; I'm away from my system.

- 
				eudoxie
- Maniac
- Posts: 277
- Joined: 17 Sep 2012 13:36
- Location: 81.170.128.52
Re: Tunguska II
No hurry. It won't be operational for a few more weeks. Right now I'm building a makeshift assembler so that I can begin to actually test the instruction code.
I've got a lot of free time now, though, so I get a lot of work done. It's already almost 2500 lines of code.
- 
				Shaos  
- Admin
- Posts: 24379
- Joined: 08 Jan 2003 23:22
- Location: Silicon Valley
Re: Tunguska II
eudoxie wrote: No hurry. It won't be operational for a few more weeks. Right now I'm building a makeshift assembler so that I can begin to actually test the instruction code.
I've got a lot of free time now, though, so I get a lot of work done. It's already almost 2500 lines of code

Is it possible to make the input and output interfaces as abstract as possible, so that it is easily portable to Android? Google's Java doesn't have AWT or the regular keyboard/mouse events...

- 
				eudoxie
- Maniac
- Posts: 277
- Joined: 17 Sep 2012 13:36
- Location: 81.170.128.52
Re: Tunguska II
That will not be a problem, since I'm writing it as modularized as possible. The only packages I'm importing are from java.util and java.lang.reflect.
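Roughly the kind of separation I mean, as a sketch only: the interface and class names below are made up for this post and are not taken from the Tunguska sources. The core talks to two small interfaces, and each platform (AWT on the desktop, something else on Android) supplies its own implementation. The headless implementation here needs nothing beyond java.util.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical interfaces; the names are illustrative, not from the Tunguska sources.
interface InputSource {
    boolean hasInput();
    int nextInput();
}

interface OutputSink {
    void write(int value);
}

// Headless back end built only on java.util, useful for tests or as a porting template.
final class BufferedIo implements InputSource, OutputSink {
    private final Deque<Integer> pending = new ArrayDeque<>();
    private final StringBuilder written = new StringBuilder();

    // Queues a value as if it had come from a keyboard or similar device.
    void feed(int value) {
        pending.addLast(value);
    }

    @Override
    public boolean hasInput() {
        return !pending.isEmpty();
    }

    @Override
    public int nextInput() {
        return pending.removeFirst();
    }

    @Override
    public void write(int value) {
        written.append((char) value);
    }

    // Everything the core has written so far, as text.
    String output() {
        return written.toString();
    }
}

// Stand-in for the machine core: it only sees the two interfaces, never AWT or Android classes.
final class EchoCore {
    static void run(InputSource in, OutputSink out) {
        while (in.hasInput()) {
            out.write(in.nextInput());
        }
    }

    public static void main(String[] args) {
        BufferedIo io = new BufferedIo();
        for (char c : "ternary".toCharArray()) {
            io.feed(c);
        }
        run(io, io);
        System.out.println(io.output());
    }
}
```

A desktop front end would implement the same two interfaces on top of AWT events, and an Android one on top of whatever Android provides, without touching the core.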
- 
				Shaos  
- Admin
- Posts: 24379
- Joined: 08 Jan 2003 23:22
- Location: Silicon Valley
Re: Tunguska II
eudoxie wrote: That will not be a problem, since I'm writing it as modularized as possible. The only packages I'm importing are from java.util and java.lang.reflect.

http://developer.android.com/reference/ ... mmary.html