1
0
Fork 0
mirror of https://github.com/ruby/ruby.git synced 2022-11-09 12:17:21 -05:00

hash.c (rb_num_hash_start): avoid pathological behavior

The OR-ing itself is bad for a hash function, and shifting 3 bits
left was not enough to undo the damage done by shifting
(RUBY_SPECIAL_SHIFT+3) bits right.  Experimentally, shifting 16-17
bits seemed to work well in preparing the number for murmur hash.

Add a few more benchmarks to based on bm_hash_shift to ensure
we don't hurt performance too much with tweaks.

I'm pretty confident about this change and commiting it now;
especially since we're still using Murmur behind it (but perhaps
we can update to a newer hash from Murmur...)

[ruby-core:72028] [Feature #11405]

target 0: a (ruby 2.3.0dev (2015-12-11 trunk 53027) [x86_64-linux]) at "/home/ew/rrrr/b/ruby"
target 1: b (ruby 2.3.0dev (2015-12-11 master 53027) [x86_64-linux]) at "/home/ew/ruby/b/ruby"

benchmark results:
minimum results in each 5 measurements.
Execution time (sec)
name	a	b
hash_aref_dsym	0.279	0.276
hash_aref_dsym_long	4.951	4.936
hash_aref_fix	0.281	0.283
hash_aref_flo	0.060	0.060
hash_aref_miss	0.409	0.410
hash_aref_str	0.387	0.385
hash_aref_sym	0.275	0.270
hash_aref_sym_long	0.410	0.411
hash_flatten	0.252	0.237
hash_ident_flo	0.035	0.032
hash_ident_num	0.254	0.251
hash_ident_obj	0.252	0.256
hash_ident_str	0.250	0.252
hash_ident_sym	0.259	0.270
hash_keys	0.267	0.267
hash_shift	0.016	0.015
hash_shift_u16	0.074	0.072
hash_shift_u24	0.071	0.071
hash_shift_u32	0.073	0.072
hash_to_proc	0.008	0.008
hash_values	0.263	0.264

Speedup ratio: compare with the result of `a' (greater is better)
name	b
hash_aref_dsym	1.009
hash_aref_dsym_long	1.003
hash_aref_fix	0.993
hash_aref_flo	1.001
hash_aref_miss	0.996
hash_aref_str	1.006
hash_aref_sym	1.017
hash_aref_sym_long	0.998
hash_flatten	1.061
hash_ident_flo	1.072
hash_ident_num	1.012
hash_ident_obj	0.987
hash_ident_str	0.993
hash_ident_sym	0.959
hash_keys	0.997
hash_shift	1.036
hash_shift_u16	1.039
hash_shift_u24	1.001
hash_shift_u32	1.017
hash_to_proc	1.001
hash_values	0.995

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@53037 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
This commit is contained in:
normal 2015-12-11 08:23:46 +00:00
parent a26d09da85
commit e52b3a0e2c
5 changed files with 37 additions and 2 deletions

View file

@ -1,3 +1,8 @@
Fri Dec 11 16:48:57 2015 Eric Wong <e@80x24.org>
* hash.c (rb_num_hash_start): avoid pathological behavior
[ruby-core:72028] [Feature #11405]
Fri Dec 11 11:58:46 2015 SHIBATA Hiroshi <hsbt@ruby-lang.org>
* NEWS: Mentioned rubygems-2.5.1

View file

@ -0,0 +1,10 @@
h = {}
(16384..65536).each do |i|
h[i] = nil
end
300000.times do
k, v = h.shift
h[k] = v
end

View file

@ -0,0 +1,10 @@
h = {}
(0xff4000..0xffffff).each do |i|
h[i] = nil
end
300000.times do
k, v = h.shift
h[k] = v
end

View file

@ -0,0 +1,10 @@
h = {}
(0xffff4000..0xffffffff).each do |i|
h[i] = nil
end
300000.times do
k, v = h.shift
h[k] = v
end

4
hash.c
View file

@ -191,11 +191,11 @@ rb_num_hash_start(st_index_t n)
* - (n >> (RUBY_SPECIAL_SHIFT+3)) was added to make symbols hash well,
* n.b.: +3 to remove most ID scope, +1 worked well initially, too
* n.b.: +1 (instead of 3) worked well initially, too
* - (n << 3) was finally added to avoid losing bits for fixnums
* - (n << 16) was finally added to avoid losing bits for fixnums
* - avoid expensive modulo instructions, it is currently only
* shifts and bitmask operations.
*/
return (n >> (RUBY_SPECIAL_SHIFT + 3) | (n << 3)) ^ (n >> 3);
return (n >> (RUBY_SPECIAL_SHIFT + 3) ^ (n << 16)) ^ (n >> 3);
}
long