The site where the client was made:
http://episteme.arstechni.../122097561/m/766004683831

EDIT : Totally unoptimized... Linux only...!!!
The start is there; now only the optimizations remain.

I purchased an 8800 GTX a little while back (after watching Ian Buck's presentation on CUDA {Stanford Univ. EE380 video}).
I initially focused my efforts toward accelerating xvid. I implemented the half-pel and quarter-pel interpolation algorithms and found that the overhead of moving data to and from the GPU was killing the performance gain.
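To make the transfer-overhead point concrete, here is a rough back-of-the-envelope model. Every number in it is made up for illustration (the post gives no timings for the xvid experiment), and `effective_speedup` is a hypothetical helper, not anything from xvid or CUDA.

```python
def effective_speedup(cpu_ms, gpu_kernel_ms, bytes_moved, bus_bytes_per_ms):
    """Speedup of a GPU offload once host<->device copies are counted.

    cpu_ms           time the CPU needs for the job
    gpu_kernel_ms    time the GPU kernel itself needs
    bytes_moved      total bytes copied to and from the GPU
    bus_bytes_per_ms sustained bus throughput
    """
    transfer_ms = bytes_moved / bus_bytes_per_ms
    return cpu_ms / (gpu_kernel_ms + transfer_ms)


# Hypothetical figures, NOT measurements from the post:
CPU_MS = 2.0            # CPU interpolation time per frame
KERNEL_MS = 0.2         # GPU kernel time per frame (10x faster on paper)
BYTES_MOVED = 2_000_000 # frame in, interpolated planes out
BUS = 1_000_000         # 1 GB/s expressed as bytes per millisecond

kernel_only = CPU_MS / KERNEL_MS
with_copies = effective_speedup(CPU_MS, KERNEL_MS, BYTES_MOVED, BUS)
print(f"kernel only: {kernel_only:.1f}x, with copies: {with_copies:.2f}x")
```

With these assumed numbers the kernel alone looks 10x faster, but once the copies are counted the GPU path ends up slower than the CPU. A kernel that moves little data per unit of work does not suffer from this.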
So, I started looking at the motion compensation routines (where xvid spends most of its time). The current MC code has a large number of conditionals and I was wary of attempting any kind of implementation without having a reasonably good idea of what all of the conditional paths are for.
I decided to look for an algorithm that has a relatively small kernel and is seriously compute-bound. RC5 fit the bill. I started hacking a CUDA core into dnetc on Saturday afternoon and finally got things working smoothly an hour ago.
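For anyone wondering why RC5 counts as a small, compute-bound kernel, here is a minimal sketch of RC5-32/12/9 (32-bit words, 12 rounds, 9-byte key, which is the RC5-72 parameter set) in plain Python. It is not the dnetc or CUDA core code from the repo. The `find_last_byte` helper is a hypothetical toy that only searches the last key byte, to show the shape of a known-plaintext key search.

```python
MASK = 0xFFFFFFFF
P32, Q32 = 0xB7E15163, 0x9E3779B9  # RC5 magic constants for 32-bit words
ROUNDS = 12
T = 2 * (ROUNDS + 1)               # size of the expanded key table S


def rotl(x, n):
    n &= 31
    return ((x << n) | (x >> (32 - n))) & MASK


def rotr(x, n):
    n &= 31
    return ((x >> n) | (x << (32 - n))) & MASK


def expand_key(key: bytes):
    """RC5 key schedule: turn the secret key into the table S."""
    c = max(1, (len(key) + 3) // 4)
    L = [0] * c
    for i in range(len(key) - 1, -1, -1):   # little-endian key words
        L[i // 4] = ((L[i // 4] << 8) + key[i]) & MASK
    S = [(P32 + i * Q32) & MASK for i in range(T)]
    a = b = i = j = 0
    for _ in range(3 * max(T, c)):          # three mixing passes
        a = S[i] = rotl((S[i] + a + b) & MASK, 3)
        b = L[j] = rotl((L[j] + a + b) & MASK, a + b)
        i = (i + 1) % T
        j = (j + 1) % c
    return S


def encrypt_block(S, a, b):
    a = (a + S[0]) & MASK
    b = (b + S[1]) & MASK
    for r in range(1, ROUNDS + 1):
        a = (rotl(a ^ b, b) + S[2 * r]) & MASK
        b = (rotl(b ^ a, a) + S[2 * r + 1]) & MASK
    return a, b


def decrypt_block(S, a, b):
    for r in range(ROUNDS, 0, -1):
        b = rotr((b - S[2 * r + 1]) & MASK, a) ^ a
        a = rotr((a - S[2 * r]) & MASK, b) ^ b
    return (a - S[0]) & MASK, (b - S[1]) & MASK


def find_last_byte(prefix: bytes, plain, cipher):
    """Toy search: try all 256 values of the final key byte."""
    for k in range(256):
        if encrypt_block(expand_key(prefix + bytes([k])), *plain) == cipher:
            return k
    return None
```

Each candidate key needs only a few dozen words of state and a fixed run of adds, XORs and rotates, with no data to fetch beyond the key itself. That is why the work per key is almost pure arithmetic and every candidate can be tested independently of the others.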
The CUDA core is totally UN-optimized and still manages roughly 12.8x the performance of the next fastest core on my E4300 at stock speed (84.3 Mkeys/sec versus 6.6 Mkeys/sec for the SGP 3-pipe core in the benchmark below).
I have posted the sources in my mercurial repo.
http://dungeon.darktech.org/hg/dnetc_cuda/
Because I used the public dnetc snapshot, it is not possible to build an official client with this code. Also, I hacked up the configure script, so I doubt it is even sane on any archs other than x86-linux with CUDA and nvcc present. But, if you are an enterprising hacker, have fun with the code.
paul@sr71 ~/code/dnetc_cuda $ ./dnetc -test RC5-72 10
distributed.net client for Linux Copyright 1997-2006, distributed.net
Please visit http://www.distributed.net/ for up-to-date contest information.
dnetc v2.9012-497-CFR-06032022 for Linux (Linux 2.6.20).
Please provide the *entire* version descriptor when submitting bug reports.
The distributed.net bug report pages are at http://www.distributed.net/bugs/
[Mar 12 04:15:42 UTC] Automatic processor type detection did not
recognize the processor (tag: "6547:06F2")
[Mar 12 04:15:42 UTC] RC5-72: using core #10 (CUDA 1-pipe).
[Mar 12 04:15:42 UTC] RC5-72: Test 01 passed: C9:0C0353C04E1FE85-C9:0C0353C04E1FE85
[Mar 12 04:15:42 UTC] RC5-72: Test 02 passed: DE:EE0C6279:BF66F898-DE:EE0C6279:BF66F898
[Mar 12 04:15:42 UTC] RC5-72: Test 03 passed: 0F:556979E7:6C009260-0F:556979E7:6C009260
[Mar 12 04:15:42 UTC] RC5-72: Test 04 passed: 9E8B648C6:00003A3C-9E8B648C6:00003A3C
[Mar 12 04:15:42 UTC] RC5-72: Test 05 passed: C8:B3631100:0000EAF0-C8:B3631100:0000EAF0
[Mar 12 04:15:42 UTC] RC5-72: Test 06 passed: FE:40080000:00006F64-FE:40080000:00006F64
[Mar 12 04:15:42 UTC] RC5-72: Test 07 passed: 28:69000000:0000204D-28:69000000:0000204D
[Mar 12 04:15:42 UTC] RC5-72: Test 08 passed: 6E:00000000:0000172F-6E:00000000:0000172F
[Mar 12 04:15:42 UTC] RC5-72: Test 09 passed: C6:E9386A44:C0F9D107-C6:E9386A44:C0F9D107
[Mar 12 04:15:42 UTC] RC5-72: Test 10 passed: 2B:E01C5B9D65CCAD7-2B:E01C5B9D65CCAD7
[Mar 12 04:15:42 UTC] RC5-72: Test 11 passed: 97:2C0F244D:EFC54E4F-97:2C0F244D:EFC54E4F
[Mar 12 04:15:42 UTC] RC5-72: Test 12 passed: A8:8960B40B:1F46AD1F-A8:8960B40B:1F46AD1F
[Mar 12 04:15:42 UTC] RC5-72: Test 13 passed: B1:FFE95917:B38E4396-B1:FFE95917:B38E4396
[Mar 12 04:15:42 UTC] RC5-72: Test 14 passed: C6:46E7E19D:9CD65C85-C6:46E7E19D:9CD65C85
[Mar 12 04:15:42 UTC] RC5-72: Test 15 passed: E3686400B:7EFB2180-E3686400B:7EFB2180
[Mar 12 04:15:42 UTC] RC5-72: Test 16 passed: 85:EA3678CF:91DB0D2C-85:EA3678CF:91DB0D2C
[Mar 12 04:15:42 UTC] RC5-72: Test 17 passed: D6:BE71026E:348165EE-D6:BE71026E:348165EE
[Mar 12 04:15:42 UTC] RC5-72: Test 18 passed: 5F:71AD1E37:82BC4D50-5F:71AD1E37:82BC4D50
[Mar 12 04:15:42 UTC] RC5-72: Test 19 passed: 11:4134BDB0:175A077F-11:4134BDB0:175A077F
[Mar 12 04:15:42 UTC] RC5-72: Test 20 passed: 94:888FF8CB:282E6E5F-94:888FF8CB:282E6E5F
[Mar 12 04:15:42 UTC] RC5-72: Test 21 passed: D9:48A2E6E4:CD610000-D9:48A2E6E4:CD610000
[Mar 12 04:15:42 UTC] RC5-72: Test 22 passed: E5:71448E830860001-E5:71448E830860001
[Mar 12 04:15:42 UTC] RC5-72: Test 23 passed: 3E:ED6D9F85:A6D70002-3E:ED6D9F85:A6D70002
[Mar 12 04:15:42 UTC] RC5-72: Test 24 passed: 2504F6B0E:16AD0003-2504F6B0E:16AD0003
[Mar 12 04:15:42 UTC] RC5-72: Test 25 passed: 05:45C2E10D:273D0000-05:45C2E10D:273D0000
[Mar 12 04:15:42 UTC] RC5-72: Test 26 passed: 56:30E19DF4:8C460000-56:30E19DF4:8C460000
[Mar 12 04:15:42 UTC] RC5-72: Test 27 passed: 85:3B37FFD3:9F140000-85:3B37FFD3:9F140000
[Mar 12 04:15:42 UTC] RC5-72: Test 28 passed: 80:B75263C5:41660000-80:B75263C5:41660000
[Mar 12 04:15:42 UTC] RC5-72: Test 29 passed: 03:52A1DF428A30000-03:52A1DF428A30000
[Mar 12 04:15:42 UTC] RC5-72: Test 30 passed: 87:23A58F8F5940000-87:23A58F8F5940000
[Mar 12 04:15:42 UTC] RC5-72: Test 31 passed: CC:9661BA34:7604002A-CC:9661BA34:7604002A
[Mar 12 04:15:42 UTC] RC5-72: Test 32 passed: 21:E765D2F6:C6110000-21:E765D2F6:C6110000
[Mar 12 04:15:42 UTC] RC5-72: 32/32 Tests Passed (0.064004 seconds)
paul@sr71 ~/code/dnetc_cuda $ ./dnetc -bench RC5-72
distributed.net client for Linux Copyright 1997-2006, distributed.net
Please visit http://www.distributed.net/ for up-to-date contest information.
dnetc v2.9012-497-CFR-06032022 for Linux (Linux 2.6.20).
Please provide the *entire* version descriptor when submitting bug reports.
The distributed.net bug report pages are at http://www.distributed.net/bugs/
[Mar 12 04:11:47 UTC] Automatic processor type detection did not
recognize the processor (tag: "6547:06F2")
[Mar 12 04:11:47 UTC] RC5-72: using core #0 (SES 1-pipe).
[Mar 12 04:12:07 UTC] RC5-72: Benchmark for core #0 (SES 1-pipe)
0.00:00:17.08 [3,716,277 keys/sec]
[Mar 12 04:12:07 UTC] RC5-72: using core #1 (SES 2-pipe).
[Mar 12 04:12:27 UTC] RC5-72: Benchmark for core #1 (SES 2-pipe)
0.00:00:17.25 [6,228,036 keys/sec]
[Mar 12 04:12:27 UTC] RC5-72: using core #2 (DG 2-pipe).
[Mar 12 04:12:45 UTC] RC5-72: Benchmark for core #2 (DG 2-pipe)
0.00:00:16.59 [4,967,345 keys/sec]
[Mar 12 04:12:45 UTC] RC5-72: using core #3 (DG 3-pipe).
[Mar 12 04:13:05 UTC] RC5-72: Benchmark for core #3 (DG 3-pipe)
0.00:00:16.57 [6,231,719 keys/sec]
[Mar 12 04:13:05 UTC] RC5-72: using core #4 (DG 3-pipe alt).
[Mar 12 04:13:24 UTC] RC5-72: Benchmark for core #4 (DG 3-pipe alt)
0.00:00:17.46 [5,665,622 keys/sec]
[Mar 12 04:13:24 UTC] RC5-72: using core #5 (SS 2-pipe).
[Mar 12 04:13:43 UTC] RC5-72: Benchmark for core #5 (SS 2-pipe)
0.00:00:16.30 [5,274,208 keys/sec]
[Mar 12 04:13:43 UTC] RC5-72: using core #6 (GO 2-pipe).
[Mar 12 04:14:03 UTC] RC5-72: Benchmark for core #6 (GO 2-pipe)
0.00:00:17.11 [6,207,954 keys/sec]
[Mar 12 04:14:03 UTC] RC5-72: using core #7 (SGP 3-pipe).
[Mar 12 04:14:22 UTC] RC5-72: Benchmark for core #7 (SGP 3-pipe)
0.00:00:16.63 [6,567,384 keys/sec]
[Mar 12 04:14:22 UTC] RC5-72: using core #8 (MA 4-pipe).
[Mar 12 04:14:42 UTC] RC5-72: Benchmark for core #8 (MA 4-pipe)
0.00:00:16.95 [5,364,069 keys/sec]
[Mar 12 04:14:42 UTC] RC5-72: using core #9 (MMX 4-pipe).
[Mar 12 04:15:01 UTC] RC5-72: Benchmark for core #9 (MMX 4-pipe)
0.00:00:16.64 [4,298,758 keys/sec]
[Mar 12 04:15:01 UTC] RC5-72: using core #10 (CUDA 1-pipe).
[Mar 12 04:15:19 UTC] RC5-72: Benchmark for core #10 (CUDA 1-pipe)
0.00:00:16.28 [84,343,980 keys/sec]
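The speedup can be checked directly from the keys/sec figures in the benchmark run above. A small sketch, with the numbers copied by hand from that log:

```python
# keys/sec per core, copied from the -bench RC5-72 output above
benchmarks = {
    "SES 1-pipe": 3_716_277,
    "SES 2-pipe": 6_228_036,
    "DG 2-pipe": 4_967_345,
    "DG 3-pipe": 6_231_719,
    "DG 3-pipe alt": 5_665_622,
    "SS 2-pipe": 5_274_208,
    "GO 2-pipe": 6_207_954,
    "SGP 3-pipe": 6_567_384,
    "MA 4-pipe": 5_364_069,
    "MMX 4-pipe": 4_298_758,
    "CUDA 1-pipe": 84_343_980,
}

cuda_rate = benchmarks["CUDA 1-pipe"]
cpu_cores = {name: rate for name, rate in benchmarks.items() if name != "CUDA 1-pipe"}

# Fastest CPU core and the CUDA core's speedup over it
best_cpu_name, best_cpu_rate = max(cpu_cores.items(), key=lambda kv: kv[1])
speedup = cuda_rate / best_cpu_rate

print(f"fastest CPU core: {best_cpu_name} at {best_cpu_rate:,} keys/sec")
print(f"CUDA speedup over it: {speedup:.2f}x")
```

The fastest CPU core in this run is SGP 3-pipe, and the CUDA core comes out at a bit under 13 times its rate.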
[ Edited 94% by Cpt00kirk on 12-03-2007 08:53 ]