Navigation menu
Personal tools
Not logged in
Talk
Contributions
Create account
Log in
Namespaces
WoW
Talk
English
Views
Read
Edit
History
More
Search
Navigation
Home
Random page
Help using wiki
Editions
for WoW
for WildStar
for Solar2D
Documentation
for WoW
for WildStar
Reference
WoW
⦁ FrameXML
⦁ AddOns
⦁ API
⦁ WoW Lua
WildStar
⦁ AddOns
⦁ API
⦁ WildStar Lua
Engine
Tools
What links here
Related changes
Special pages
Page information
Site
Recent Changes
Editing
WoW:StringHash/Analysis
(section)
Jump to navigation
Jump to search
Warning:
You are not logged in. Your IP address will be publicly visible if you make any edits. If you
log in
or
create an account
, your edits will be attributed to your username, along with other benefits.
Anti-spam check. Do
not
fill this in!
= Tests performed and result analysis = <div style="margin-left: 3%;"> Below is an array of tests that both algorithms have been submitted to. For a hash analysis, these tests are fairly basic and the result analysis is somewhat unscientific, but they will do to illustrate the difference between the Java hash and StringHash, which is all this document set out to do. * [[#Performance test|Performance test]] - raw measurement of relative hash speed * [[#Bit distribution test|Bit distribution test]] - how often is each bit of the 32-bit output used? * [[#Collision frequency test|Collision frequency test]] - compare observed collision statistics to expected numbers * [[#Bit transmutation test|Bit transmutation test]] - determine how many bits are changed as a result of 1 input bit changing * [[#Collision finding test|Collision finding test]] - find and count collisions for exhaustive input sets == Performance test == <div style="margin-left: 3%;"> This test simply hashes the string "abcdefghijklmnopqrstuvwxyz" in a loop 20 000 times and clocks the start and stop time. The loop is included in the resulting time. * Java hash execution time: 1.719s * StringHash execution time: 1.313s === Result analysis === <div style="margin-left: 3%;"> StringHash is slightly faster than the Java hash; likely not enough to matter, but certainly enough that the perceived "increased size" of StringHash can be discounted as an argument against it - it's actually faster! The reason is of course that the math.mod() call is relatively expensive compared to extracting characters from a string and performing multiplications. StringHash does 3 extract + multiply per mod() call. The Java hash in its standard implementation has to do one mod() call per character. On the testing computer, a P4 at ~3Ghz, StringHash hashes in the neighborhood of half a megabyte per second; certainly not bad for a scripted language using only standard math library operations! </div> </div> == Bit distribution test == <div style="margin-left: 3%;"> This test measures how many times each bit is 1 for a given set of inputs. The target value is for each bit to be 1 50% of the time, and 0 50% of the time, though there will obviously be deviations from this, which are expected and accepted, up to a certain point. In this test, we feed each hash the same set of inputs: 10000 random strings consisting of "aaaaaaaaa" followed by 15 random bytes. The results show the absolute difference between 50% and the result, i.e. 49.4 and 50.6 will both be reported as "0.6". Only differences of 0.5% or greater are shown (0.5% is in no way a ceiling of the expected difference; it just cuts down on the amount of output). === Inputs 0-255 === <div style="margin-left: 3%;"> {| |- ||Java||StringHash |- valign=top | 2: 0.8% 4: 0.6% 9: 0.6% 11: 0.5% 13: 0.5% 14: 0.5% 15: 1.0% 17: 1.1% 20: 0.5% 21: 0.9% 23: 0.8% 25: 0.9% 28: 0.6% 31: 0.7% 32: 1.2% | 2: 0.5% 3: 0.5% 7: 0.7% 12: 0.6% 15: 0.9% 16: 1.0% 17: 0.9% 20: 0.7% 21: 1.1% 25: 0.6% 29: 0.8% 30: 0.6% 31: 0.7% |} </div> === Inputs 32-127 === <div style="margin-left: 3%;"> {| |- ||Java||StringHash |- valign=top | 2: 0.9% 7: 0.7% 8: 0.6% 9: 0.7% 10: 0.6% 11: 0.6% 12: 1.5% 15: 0.8% 16: 0.5% 18: 0.7% 22: 0.8% 27: 0.5% 29: 0.5% 30: 0.6% 32: 0.5% | 1: 1.0% 3: 0.5% 6: 0.8% 14: 0.6% 19: 0.5% 23: 0.5% 25: 0.7% 28: 0.5% |} </div> === Result analysis === <div style="margin-left: 3%;"> Both algorithms perform roughly as expected. With these samples, StringHash performs a little bit better, but that is to be expected as it mixes more bits; either way, the differences are not really meaningful. The test only attempts to evaluate if all possible result bits are used efficiently without unexpected "holes" in the output. A more thorough test would attempt to find biases in different sets of bits, but that is really more meaningful for [[wikipedia:PRNG|PRNG]] testing and [[wikipedia:cryptographic hash function|secure hashes]] than it is for this type of light string hash. </div> </div> == Collision frequency test == <div style="margin-left: 3%;"> This test takes the result of 20000 random strings consisting of ASCII characters 0-255, and selects 15 bits for testing. The resulting value becomes 0--32767, which, for anything other than a [[wikipedia:perfect hash|perfect hash]], will result in a fair amount of collisions. We expect the frequency of collisions to match a [[wikipedia:Poisson distribution|Poisson distribution]] and compare the test results to it. === Long random string 32-127 bits 1-15 === <div style="margin-left:3%;"> Java SH Expected 1: 10940 10981 10863 2: 6552 6554 6630 3: 2043 1998 2023 4: 388 396 411 5: 65 65 62 6: 12 6 7 </div> === Long random string 0-255 bits 1-15 === <div style="margin-left:3%;"> Java SH Expected 1: 10717 10728 10863 2: 6796 6724 6630 3: 1995 2073 2023 4: 376 420 411 5: 110 55 62 6: 6 7 </div> === Long random string 0-255 bits 6-20 === <div style="margin-left:3%;"> Java SH Expected 1: 10788 10875 10863 2: 6768 6628 6630 3: 1989 1971 2023 4: 400 460 411 5: 55 60 62 6: 6 7 </div> === Long random string 0-255 bits 12-26 === <div style="margin-left:3%;"> Java SH Expected 1: 10917 10791 10863 2: 6630 6612 6630 3: 1881 2049 2023 4: 492 452 411 5: 80 90 62 6: 6 7 </div> === Long random string 0-255 bits 18-32 === <div style="margin-left:3%;"> Java SH Expected 1: 10828 10795 10863 2: 6590 6860 6630 3: 2127 1932 2023 4: 380 360 411 5: 75 35 62 6: 18 7 </div> === Result analysis === <div style="margin-left:3%;"> Both hashes perform roughly as expected without showing too great variations over different bits; there are one or two aberrations in the higher collision counts, but that may just be the result of a too-small sample size. I will not investigate further as this test is only a rough indication of performance. A more complete analysis would include testing smaller modulii, and certainly ones that are not a power of 2. A modulus constructed around series involving 31 and 32 will most likely produce "interesting" results for the Java hash, but that is an expected side effect of its construction. </div> </div> == Bit transmutation test == <div style="margin-left: 3%;"> This test measures how many bits of the hash changes as a result of a single input bit changing, i.e. how close to an [[wikipedia:avalanche effect|avalanche effect]] we're getting. The result is described as the number of times a given number of bits have been seen to change for the entire run. The input starts out as a string of 0s (ASCII NULs), flipping bits on and off one at a time. Then a number of runs are executed starting out with random values. All the below tests work with 13-byte strings, and execute for 50 runs, i.e. one run based on ASCII NULs, and 49 runs with fresh random numbers each time. A good generic hash algorithm will tend toward flipping 50% of the bits for each input bit changed, though that is obviously unattainable. We do however expect a fair hump around 13-17 bits changed, and very low counts toward the extremes. 0 bits flipping may indicate a broken hash algorithm (the hash result didn't change at all as a result of changing the input). === Java, flip all bits in bytes 1 through 13 === <div style="margin-left: 3%;"> 0: 0 1: 173 2: 208 3: 194 4: 188 5: 159 6: 160 7: 150 8: 200 9: 231 10: 228 11: 292 12: 343 13: 476 14: 503 15: 469 16: 418 17: 318 18: 210 19: 143 20: 69 21: 45 22: 11 23: 9 24: 2 25: 1 26: 0 27: 0 28: 0 29: 0 30: 0 31: 0 32: 0 This test does not show 0 bits changing, but the changes we do see are spread quite flat over the spectrum. There is a hump around 13-17 but not as marked as it should be. We get quite a few single bit changes, which is quite expected given the algorithm. The last few bytes in the string do not end up affecting very much of the end result. </div> === StringHash, flip all bits in bytes 1 through 13 === <div style="margin-left: 3%;"> 0: 0 1: 0 2: 0 3: 45 4: 70 5: 77 6: 71 7: 56 8: 45 9: 53 10: 115 11: 203 12: 327 13: 405 14: 533 15: 588 16: 620 17: 593 18: 499 19: 371 20: 259 21: 146 22: 76 23: 32 24: 12 25: 3 26: 1 27: 0 28: 0 29: 0 30: 0 31: 0 32: 0 StringHash has a much more pronounced hump around the 16 bits changed mark. There are no single bit changes, and no two-bit changes in the tests. This is expected as each bit changed is guaranteed to affect at least three bits of the hash, even for the last few bytes of the input. </div> === Java, flip all bits in bytes 12 through 13 === <div style="margin-left: 3%;"> 0: 0 1: 173 2: 208 3: 154 4: 108 5: 75 6: 53 7: 17 8: 6 9: 1 10: 3 11: 2 12: 0 (rest 0) Here is where the weakness of the hash shows the most: changes in the last few bytes affect relatively few bits, thereby increasing chances for collisions. </div> === StringHash, flip all bits in bytes 12 through 13 === <div style="margin-left: 3%;"> 0: 0 1: 0 2: 0 3: 45 4: 70 5: 77 6: 71 7: 56 8: 36 9: 26 10: 36 11: 40 12: 54 13: 51 14: 57 15: 49 16: 42 17: 39 18: 23 19: 15 20: 9 21: 3 22: 1 23: 0 24: 0 (rest 0) While not precisely shining compared to strong hashes, StringHash stirs quite a few more bits over a much larger spectrum. </div> === Result analysis === <div style="margin-left: 3%;"> StringHash outperforms the Java hash by distributing bit changes over more result bits, thus reducing the collision chance. </div> </div> == Collision finding test == <div style="margin-left: 3%;"> This test exhaustively computes all possible values of a given input set and tests for collisions. The size of the input sets is unfortunately constrained by RAM and CPU, but it is believed to be a fair indication of hash strength. === ASCII codes 64-89 (upper case A-Z), 3 bytes === <div style="margin-left: 3%;"> JavaHash 17576 non-colliding values StringHash 17576 non-colliding values </div> === ASCII codes 64-89 (upper case A-Z), 4 bytes === <div style="margin-left: 3%;"> JavaHash 456976 non-colliding values StringHash 456976 non-colliding values </div> === ASCII codes 32-127 (all printable), 2 bytes === <div style="margin-left: 3%;"> JavaHash 62 non-colliding values 62 instances of 2 inputs producing the same hash (124 strings) 2638 instances of 3 inputs producing the same hash (7914 strings) 279 instances of 4 inputs producing the same hash (1116 strings) StringHash 9216 non-colliding values Already here, with only searching all possible combinations of 2 letters of printable ASCII, the Java hash is showing poor results. </div> === ASCII codes 32-127 (all printable), 3 bytes === <div style="margin-left: 3%;"> This is about as exhaustive we can go in Lua without resulting to data compression techniques for the result sets: all possible combinations of 3 letters of printable ASCII for a total of 884736 outputs. JavaHash 62 non-colliding values 62 instances of 2 inputs producing the same hash (124 strings) 1630 instances of 3 inputs producing the same hash (4890 strings) 224 instances of 4 inputs producing the same hash (896 strings) 62 instances of 5 inputs producing the same hash (310 strings) 1630 instances of 6 inputs producing the same hash (9780 strings) 62 instances of 7 inputs producing the same hash (434 strings) 224 instances of 8 inputs producing the same hash (1792 strings) 68606 instances of 9 inputs producing the same hash (617454 strings) 5214 instances of 10 inputs producing the same hash (52140 strings) 5214 instances of 11 inputs producing the same hash (57354 strings) 9672 instances of 12 inputs producing the same hash (116064 strings) 558 instances of 13 inputs producing the same hash (7254 strings) 558 instances of 14 inputs producing the same hash (7812 strings) 558 instances of 15 inputs producing the same hash (8370 strings) StringHash 884736 non-colliding values </div> === ASCII codes 32-127 (all printable), 3 bytes, with prefix === <div style="margin-left: 3%;"> The same input values as the 32-127 x 3 test, except all prefixed with the string "01234567890123456789". This is to test for loss of accuracy. JavaHash 62 non-colliding values 62 instances of 2 inputs producing the same hash (124 strings) 1630 instances of 3 inputs producing the same hash (4890 strings) 224 instances of 4 inputs producing the same hash (896 strings) 62 instances of 5 inputs producing the same hash (310 strings) 1630 instances of 6 inputs producing the same hash (9780 strings) 62 instances of 7 inputs producing the same hash (434 strings) 224 instances of 8 inputs producing the same hash (1792 strings) 68606 instances of 9 inputs producing the same hash (617454 strings) 5214 instances of 10 inputs producing the same hash (52140 strings) 5214 instances of 11 inputs producing the same hash (57354 strings) 9672 instances of 12 inputs producing the same hash (116064 strings) 558 instances of 13 inputs producing the same hash (7254 strings) 558 instances of 14 inputs producing the same hash (7812 strings) 558 instances of 15 inputs producing the same hash (8370 strings) StringHash 884736 non-colliding values </div> === ASCII codes 32-127 (all printable), 3 bytes, with suffix === <div style="margin-left: 3%;"> The same input values as the 32-127 x 3 test, except all suffixed with the string "01234567890123456789". This is to test for loss of "old" information (data from earlier on in the string) in the mixing function. JavaHash 62 non-colliding values 62 instances of 2 inputs producing the same hash (124 strings) 1630 instances of 3 inputs producing the same hash (4890 strings) 224 instances of 4 inputs producing the same hash (896 strings) 62 instances of 5 inputs producing the same hash (310 strings) 1630 instances of 6 inputs producing the same hash (9780 strings) 62 instances of 7 inputs producing the same hash (434 strings) 224 instances of 8 inputs producing the same hash (1792 strings) 68606 instances of 9 inputs producing the same hash (617454 strings) 5214 instances of 10 inputs producing the same hash (52140 strings) 5214 instances of 11 inputs producing the same hash (57354 strings) 9672 instances of 12 inputs producing the same hash (116064 strings) 558 instances of 13 inputs producing the same hash (7254 strings) 558 instances of 14 inputs producing the same hash (7812 strings) 558 instances of 15 inputs producing the same hash (8370 strings) StringHash 884736 non-colliding values </div> === ASCII codes 64-82 (A-S), 5 bytes === <div style="margin-left: 3%;"> The Lua memory churn becomes somewhat ugly here, consuming several hundred megabytes. Even attempting A-Z with the same length brings a machine with 1GB RAM to its knees. JavaHash 2476099 non-colliding values StringHash 2476099 non-colliding values </div> === Result analysis === <div style="margin-left: 3%;"> The Java hash is shown to produce many more collisions even for rather simple inputs. The only cases where it behaves well are the A-Z cases; as long as the input subset is constricted to 32 unique values, it is collision free for 6 bytes (6 * 5 bits = 30 bits), in other words, it is a perfect hash for e.g. short strings consisting of only upper case ASCII letters A-Z. This is an expected property of the algorithm. Though taken with the rather abysmal results for more generic strings above (or, indeed, any sort of string longer than 6 bytes), the Java hash demonstrably makes for a poor choice for general input. </div> </div> </div>
Summary:
Please note that all contributions to AddOn Studio are considered to be released under the Creative Commons Attribution-NonCommercial-ShareAlike (see
AddOn Studio Wiki:Copyrights
for details).
Submissions must be written by you, or copied from a public domain or similar free resource (see
AddOn Studio Wiki:Copyrights
for details).
Cancel
Editing help
(opens in new window)