mirror of https://github.com/ruby/ruby.git synced 2022-11-09 12:17:21 -05:00
ruby--ruby/test/ruby/test_jit.rb

1064 lines
29 KiB
Ruby

# frozen_string_literal: true
require 'test/unit'
require 'tmpdir'
require_relative '../lib/jit_support'
return if RbConfig::CONFIG["MJIT_SUPPORT"] == 'no'
# Test for --jit option
class TestJIT < Test::Unit::TestCase
  include JITSupport
  IGNORABLE_PATTERNS = [
    /\AJIT recompile: .+\n\z/,
    /\AJIT inline: .+\n\z/,
    /\ASuccessful MJIT finish\n\z/,
  ]

  # trace_* insns are not compiled for now...
  TEST_PENDING_INSNS = RubyVM::INSTRUCTION_NAMES.select { |n| n.start_with?('trace_') }.map(&:to_sym) + [
    # not supported yet
    :defineclass,
    :opt_call_c_function,
  ].each do |insn|
    if !RubyVM::INSTRUCTION_NAMES.include?(insn.to_s)
      warn "instruction #{insn.inspect} is not defined but included in TestJIT::TEST_PENDING_INSNS"
    end
  end

  def self.untested_insns
    @untested_insns ||= (RubyVM::INSTRUCTION_NAMES.map(&:to_sym) - TEST_PENDING_INSNS)
  end

  def setup
    unless JITSupport.supported?
      skip 'JIT seems not supported on this platform'
    end

    # ruby -w -Itest/lib test/ruby/test_jit.rb
    if $VERBOSE && !defined?(@@at_exit_hooked)
      at_exit do
        unless TestJIT.untested_insns.empty?
          warn "you may want to add tests for following insns, when you have a chance: #{TestJIT.untested_insns.join(' ')}"
        end
      end
      @@at_exit_hooked = true
    end
  end

  def test_compile_insn_nop
    assert_compile_once('nil rescue true', result_inspect: 'nil', insns: %i[nop])
  end

  def test_compile_insn_local
    assert_compile_once("#{<<~"begin;"}\n#{<<~"end;"}", result_inspect: '1', insns: %i[setlocal_WC_0 getlocal_WC_0])
    begin;
      foo = 1
      foo
    end;

    insns = %i[setlocal getlocal setlocal_WC_0 getlocal_WC_0 setlocal_WC_1 getlocal_WC_1]
    assert_eval_with_jit("#{<<~"begin;"}\n#{<<~"end;"}", success_count: 3, stdout: '168', insns: insns)
    begin;
      def foo
        a = 0
        [1, 2].each do |i|
          a += i
          [3, 4].each do |j|
            a *= j
          end
        end
        a
      end
      print foo
    end;
  end

  def test_compile_insn_blockparam
    assert_eval_with_jit("#{<<~"begin;"}\n#{<<~"end;"}", stdout: '3', success_count: 2, insns: %i[getblockparam setblockparam])
    begin;
      def foo(&b)
        a = b
        b = 2
        a.call + 2
      end
      print foo { 1 }
    end;
  end

  def test_compile_insn_getblockparamproxy
    assert_eval_with_jit("#{<<~"begin;"}\n#{<<~"end;"}", stdout: '4', success_count: 3, insns: %i[getblockparamproxy])
    begin;
      def bar(&b)
        b.call
      end
      def foo(&b)
        bar(&b) * bar(&b)
      end
      print foo { 2 }
    end;
  end

  def test_compile_insn_getspecial
    assert_compile_once('$1', result_inspect: 'nil', insns: %i[getspecial])
  end

  def test_compile_insn_setspecial
    assert_compile_once("#{<<~"begin;"}\n#{<<~"end;"}", result_inspect: 'true', insns: %i[setspecial])
    begin;
      true if nil.nil?..nil.nil?
    end;
  end

  def test_compile_insn_instancevariable
    assert_compile_once("#{<<~"begin;"}\n#{<<~"end;"}", result_inspect: '1', insns: %i[getinstancevariable setinstancevariable])
    begin;
      @foo = 1
      @foo
    end;

    # optimized getinstancevariable call
    assert_eval_with_jit("#{<<~"begin;"}\n#{<<~"end;"}", stdout: '33', success_count: 1, min_calls: 2)
    begin;
      class A
        def initialize
          @a = 1
          @b = 2
        end
        def three
          @a + @b
        end
      end
      a = A.new
      print(a.three) # set ic
      print(a.three) # inlined ic
    end;
  end

  def test_compile_insn_classvariable
    assert_eval_with_jit("#{<<~"begin;"}\n#{<<~"end;"}", stdout: '1', success_count: 1, insns: %i[getclassvariable setclassvariable])
    begin;
      class Foo
        def self.foo
          @@foo = 1
          @@foo
        end
      end
      print Foo.foo
    end;
  end

  def test_compile_insn_constant
    assert_compile_once("#{<<~"begin;"}\n#{<<~"end;"}", result_inspect: '1', insns: %i[getconstant setconstant])
    begin;
      FOO = 1
      FOO
    end;
  end

  def test_compile_insn_global
    assert_compile_once("#{<<~"begin;"}\n#{<<~"end;"}", result_inspect: '1', insns: %i[getglobal setglobal])
    begin;
      $foo = 1
      $foo
    end;
  end

  def test_compile_insn_putnil
    assert_compile_once('nil', result_inspect: 'nil', insns: %i[putnil])
  end

  def test_compile_insn_putself
    assert_eval_with_jit("#{<<~"begin;"}\n#{<<~"end;"}", stdout: 'hello', success_count: 1, insns: %i[putself])
    begin;
      proc { print "hello" }.call
    end;
  end

  def test_compile_insn_putobject
    assert_compile_once('0', result_inspect: '0', insns: %i[putobject_INT2FIX_0_])
    assert_compile_once('1', result_inspect: '1', insns: %i[putobject_INT2FIX_1_])
    assert_compile_once('2', result_inspect: '2', insns: %i[putobject])
  end

  def test_compile_insn_definemethod_definesmethod
    assert_eval_with_jit("#{<<~"begin;"}\n#{<<~"end;"}", stdout: 'helloworld', success_count: 3, insns: %i[definemethod definesmethod])
    begin;
      print 1.times.map {
        def method_definition
          'hello'
        end
        def self.smethod_definition
          'world'
        end
        method_definition + smethod_definition
      }.join
    end;
  end

  def test_compile_insn_putspecialobject
    assert_eval_with_jit("#{<<~"begin;"}\n#{<<~"end;"}", stdout: 'a', success_count: 2, insns: %i[putspecialobject])
    begin;
      print 1.times.map {
        def a
          'a'
        end
        alias :b :a
        b
      }.join
    end;
  end

  def test_compile_insn_putstring_concatstrings_tostring
    assert_compile_once('"a#{}b" + "c"', result_inspect: '"abc"', insns: %i[putstring concatstrings tostring])
  end

  def test_compile_insn_freezestring
    assert_eval_with_jit("#{<<~"begin;"}\n#{<<~'end;'}", stdout: 'true', success_count: 1, insns: %i[freezestring])
    begin;
      # frozen_string_literal: true
      print proc { "#{true}".frozen? }.call
    end;
  end

  def test_compile_insn_toregexp
    assert_compile_once('/#{true}/ =~ "true"', result_inspect: '0', insns: %i[toregexp])
  end

  def test_compile_insn_newarray
    assert_compile_once("#{<<~"begin;"}\n#{<<~"end;"}", result_inspect: '[1, 2, 3]', insns: %i[newarray])
    begin;
      a, b, c = 1, 2, 3
      [a, b, c]
    end;
  end

  def test_compile_insn_newarraykwsplat
    assert_compile_once('[**{ x: 1 }]', result_inspect: '[{:x=>1}]', insns: %i[newarraykwsplat])
  end

  def test_compile_insn_intern_duparray
    assert_compile_once('[:"#{0}"] + [1,2,3]', result_inspect: '[:"0", 1, 2, 3]', insns: %i[intern duparray])
  end

  def test_compile_insn_expandarray
    assert_compile_once('y = [ true, false, nil ]; x, = y; x', result_inspect: 'true', insns: %i[expandarray])
  end

  def test_compile_insn_concatarray
    assert_compile_once('["t", "r", *x = "u", "e"].join', result_inspect: '"true"', insns: %i[concatarray])
  end

  def test_compile_insn_splatarray
    assert_compile_once('[*(1..2)]', result_inspect: '[1, 2]', insns: %i[splatarray])
  end

  def test_compile_insn_newhash
    assert_compile_once('a = 1; { a: a }', result_inspect: '{:a=>1}', insns: %i[newhash])
  end

  def test_compile_insn_duphash
    assert_compile_once('{ a: 1 }', result_inspect: '{:a=>1}', insns: %i[duphash])
  end

  def test_compile_insn_newrange
    assert_compile_once('a = 1; 0..a', result_inspect: '0..1', insns: %i[newrange])
  end

  def test_compile_insn_pop
    assert_compile_once("#{<<~"begin;"}\n#{<<~"end;"}", result_inspect: '1', insns: %i[pop])
    begin;
      a = false
      b = 1
      a || b
    end;
  end

  def test_compile_insn_dup
    assert_compile_once("#{<<~"begin;"}\n#{<<~"end;"}", result_inspect: '3', insns: %i[dup])
    begin;
      a = 1
      a&.+(2)
    end;
  end

  def test_compile_insn_dupn
    assert_compile_once("#{<<~"begin;"}\n#{<<~"end;"}", result_inspect: 'true', insns: %i[dupn])
    begin;
      klass = Class.new
      klass::X ||= true
    end;
  end

  def test_compile_insn_swap_topn
    assert_compile_once('{}["true"] = true', result_inspect: 'true', insns: %i[swap topn])
  end

  def test_compile_insn_reverse
    assert_compile_once('q, (w, e), r = 1, [2, 3], 4; [q, w, e, r]', result_inspect: '[1, 2, 3, 4]', insns: %i[reverse])
  end

  def test_compile_insn_reput
    skip "write test"
  end

  def test_compile_insn_setn
    assert_compile_once('[nil][0] = 1', result_inspect: '1', insns: %i[setn])
  end

  def test_compile_insn_adjuststack
    assert_compile_once("#{<<~"begin;"}\n#{<<~"end;"}", result_inspect: 'true', insns: %i[adjuststack])
    begin;
      x = [true]
      x[0] ||= nil
      x[0]
    end;
  end

  def test_compile_insn_defined
    assert_compile_once('defined?(a)', result_inspect: 'nil', insns: %i[defined])
  end

  def test_compile_insn_checkkeyword
    assert_eval_with_jit("#{<<~"begin;"}\n#{<<~"end;"}", stdout: 'true', success_count: 1, insns: %i[checkkeyword])
    begin;
      def test(x: rand)
        x
      end
      print test(x: true)
    end;
  end

  def test_compile_insn_tracecoverage
    skip "write test"
  end

  def test_compile_insn_defineclass
    skip "support this in mjit_compile (low priority)"
  end

  def test_compile_insn_send
    assert_eval_with_jit("#{<<~"begin;"}\n#{<<~"end;"}", stdout: '1', success_count: 2, insns: %i[send])
    begin;
      print proc { yield_self { 1 } }.call
    end;
  end

  def test_compile_insn_opt_str_freeze
    assert_compile_once("#{<<~"begin;"}\n#{<<~"end;"}", result_inspect: '"foo"', insns: %i[opt_str_freeze])
    begin;
      'foo'.freeze
end;
end
def test_compile_insn_opt_nil_p
assert_compile_once("#{<<~"begin;"}\n#{<<~"end;"}", result_inspect: 'false', insns: %i[opt_nil_p])
begin;
nil.nil?.nil?
end;
end
def test_compile_insn_opt_str_uminus
assert_compile_once("#{<<~"begin;"}\n#{<<~"end;"}", result_inspect: '"bar"', insns: %i[opt_str_uminus])
begin;
-'bar'
end;
end
def test_compile_insn_opt_newarray_max
assert_compile_once("#{<<~"begin;"}\n#{<<~"end;"}", result_inspect: '2', insns: %i[opt_newarray_max])
begin;
a = 1
b = 2
[a, b].max
end;
end
def test_compile_insn_opt_newarray_min
assert_compile_once("#{<<~"begin;"}\n#{<<~"end;"}", result_inspect: '1', insns: %i[opt_newarray_min])
begin;
a = 1
b = 2
[a, b].min
end;
end
def test_compile_insn_opt_send_without_block
assert_compile_once('print', result_inspect: 'nil', insns: %i[opt_send_without_block])
end
def test_compile_insn_invokesuper
assert_eval_with_jit("#{<<~"begin;"}\n#{<<~"end;"}", stdout: '3', success_count: 4, insns: %i[invokesuper])
begin;
mod = Module.new {
def test
super + 2
end
}
klass = Class.new {
prepend mod
def test
1
end
}
print klass.new.test
end;
end
def test_compile_insn_invokeblock_leave
assert_eval_with_jit("#{<<~"begin;"}\n#{<<~"end;"}", stdout: '2', success_count: 2, insns: %i[invokeblock leave])
begin;
def foo
yield
end
print foo { 2 }
end;
end
def test_compile_insn_throw
assert_eval_with_jit("#{<<~"begin;"}\n#{<<~"end;"}", stdout: '4', success_count: 2, insns: %i[throw])
begin;
def test
proc do
if 1+1 == 1
return 3
else
return 4
end
5
end.call
end
print test
end;
end
def test_compile_insn_jump_branchif
assert_compile_once("#{<<~"begin;"}\n#{<<~'end;'}", result_inspect: 'nil', insns: %i[jump branchif])
begin;
a = false
1 + 1 while a
end;
end
def test_compile_insn_branchunless
assert_compile_once("#{<<~"begin;"}\n#{<<~'end;'}", result_inspect: '1', insns: %i[branchunless])
begin;
a = true
if a
1
else
2
end
end;
end
def test_compile_insn_branchnil
assert_compile_once("#{<<~"begin;"}\n#{<<~'end;'}", result_inspect: '3', insns: %i[branchnil])
begin;
a = 2
a&.+(1)
end;
end
def test_compile_insn_checktype
assert_compile_once("#{<<~"begin;"}\n#{<<~'end;'}", result_inspect: '"42"', insns: %i[checktype])
begin;
a = '2'
"4#{a}"
end;
end
def test_compile_insn_methodref
assert_compile_once("#{<<~"begin;"}\n#{<<~'end;'}", result_inspect: '"main"', insns: %i[methodref])
begin;
self.:inspect.call
end;
end
def test_compile_insn_inlinecache
assert_compile_once('Struct', result_inspect: 'Struct', insns: %i[opt_getinlinecache opt_setinlinecache])
end
def test_compile_insn_once
assert_compile_once('/#{true}/o =~ "true" && $~.to_a', result_inspect: '["true"]', insns: %i[once])
end
def test_compile_insn_checkmatch_opt_case_dispatch
assert_compile_once("#{<<~"begin;"}\n#{<<~"end;"}", result_inspect: '"world"', insns: %i[checkmatch opt_case_dispatch])
begin;
case 'hello'
when 'hello'
'world'
end
end;
end
def test_compile_insn_opt_calc
assert_compile_once('4 + 2 - ((2 * 3 / 2) % 2)', result_inspect: '5', insns: %i[opt_plus opt_minus opt_mult opt_div opt_mod])
assert_compile_once('4.0 + 2.0 - ((2.0 * 3.0 / 2.0) % 2.0)', result_inspect: '5.0', insns: %i[opt_plus opt_minus opt_mult opt_div opt_mod])
assert_compile_once('4 + 2', result_inspect: '6')
end
def test_compile_insn_opt_cmp
assert_compile_once('(1 == 1) && (1 != 2)', result_inspect: 'true', insns: %i[opt_eq opt_neq])
end
def test_compile_insn_opt_rel
assert_compile_once('1 < 2 && 1 <= 1 && 2 > 1 && 1 >= 1', result_inspect: 'true', insns: %i[opt_lt opt_le opt_gt opt_ge])
end
def test_compile_insn_opt_ltlt
assert_compile_once('[1] << 2', result_inspect: '[1, 2]', insns: %i[opt_ltlt])
end
def test_compile_insn_opt_and
assert_compile_once('1 & 3', result_inspect: '1', insns: %i[opt_and])
end
def test_compile_insn_opt_or
assert_compile_once('1 | 3', result_inspect: '3', insns: %i[opt_or])
end
def test_compile_insn_opt_aref
# optimized call (optimized JIT) -> send call
assert_eval_with_jit("#{<<~"begin;"}\n#{<<~"end;"}", stdout: '21', success_count: 2, min_calls: 1, insns: %i[opt_aref])
begin;
obj = Object.new
def obj.[](h)
h
end
block = proc { |h| h[1] }
print block.call({ 1 => 2 })
print block.call(obj)
end;
# send call -> optimized call (send JIT) -> optimized call
assert_eval_with_jit("#{<<~"begin;"}\n#{<<~"end;"}", stdout: '122', success_count: 2, min_calls: 2)
begin;
obj = Object.new
def obj.[](h)
h
end
block = proc { |h| h[1] }
print block.call(obj)
print block.call({ 1 => 2 })
print block.call({ 1 => 2 })
end;
end
def test_compile_insn_opt_aref_with
assert_compile_once("{ '1' => 2 }['1']", result_inspect: '2', insns: %i[opt_aref_with])
end
def test_compile_insn_opt_aset
assert_compile_once("#{<<~"begin;"}\n#{<<~"end;"}", result_inspect: '5', insns: %i[opt_aset opt_aset_with])
begin;
hash = { '1' => 2 }
(hash['2'] = 2) + (hash[1.to_s] = 3)
end;
end
def test_compile_insn_opt_length_size
assert_compile_once("#{<<~"begin;"}\n#{<<~"end;"}", result_inspect: '4', insns: %i[opt_length opt_size])
begin;
array = [1, 2]
array.length + array.size
end;
end
def test_compile_insn_opt_empty_p
assert_compile_once('[].empty?', result_inspect: 'true', insns: %i[opt_empty_p])
end
def test_compile_insn_opt_succ
assert_compile_once('1.succ', result_inspect: '2', insns: %i[opt_succ])
end
def test_compile_insn_opt_not
assert_compile_once('!!true', result_inspect: 'true', insns: %i[opt_not])
end
def test_compile_insn_opt_regexpmatch2
assert_compile_once("/true/ =~ 'true'", result_inspect: '0', insns: %i[opt_regexpmatch2])
assert_compile_once("'true' =~ /true/", result_inspect: '0', insns: %i[opt_regexpmatch2])
end
def test_compile_insn_opt_call_c_function
skip "support this in opt_call_c_function (low priority)"
end
def test_jit_output
out, err = eval_with_jit('5.times { puts "MJIT" }', verbose: 1, min_calls: 5)
assert_equal("MJIT\n" * 5, out)
assert_match(/^#{JIT_SUCCESS_PREFIX}: block in <main>@-e:1 -> .+_ruby_mjit_p\d+u\d+\.c$/, err)
assert_match(/^Successful MJIT finish$/, err)
end
def test_unload_units_and_compaction
Dir.mktmpdir("jit_test_unload_units_") do |dir|
# MIN_CACHE_SIZE is 10
out, err = eval_with_jit({"TMPDIR"=>dir}, "#{<<~"begin;"}\n#{<<~'end;'}", verbose: 1, min_calls: 1, max_cache: 10)
begin;
i = 0
while i < 11
eval(<<-EOS)
def mjit#{i}
print #{i}
end
mjit#{i}
EOS
i += 1
end
if defined?(fork)
# Check that the child does not try to delete files that the parent deletes,
# and exercise a possible deadlock on fork during MJIT unload and JIT compaction in the child.
Process.waitpid(Process.fork {})
end
end;
debug_info = %Q[stdout:\n"""\n#{out}\n"""\n\nstderr:\n"""\n#{err}"""\n]
assert_equal('012345678910', out, debug_info)
compactions, errs = err.lines.partition do |l|
l.match?(/\AJIT compaction \(\d+\.\dms\): Compacted \d+ methods ->/)
end
10.times do |i|
assert_match(/\A#{JIT_SUCCESS_PREFIX}: mjit#{i}@\(eval\):/, errs[i], debug_info)
end
assert_equal("Too many JIT code -- 1 units unloaded\n", errs[10], debug_info)
assert_match(/\A#{JIT_SUCCESS_PREFIX}: mjit10@\(eval\):/, errs[11], debug_info)
# With --jit-wait, compaction should be triggered when the amount of
# JIT-ed code reaches --jit-max-cache.
unless RUBY_PLATFORM.match?(/mswin|mingw/) # compaction is not supported on Windows yet
assert_equal(3, compactions.size, debug_info)
end
if RUBY_PLATFORM.match?(/mswin/)
# A "Permission Denied" error prevents removing the .so file on AppVeyor/RubyCI.
skip 'Removing so file is randomly failing on AppVeyor/RubyCI mswin due to Permission Denied.'
else
# verify .o files are deleted on unload_units
assert_send([Dir, :empty?, dir], debug_info)
end
end
end
def test_local_stack_on_exception
assert_eval_with_jit("#{<<~"begin;"}\n#{<<~"end;"}", stdout: '3', success_count: 2)
begin;
def b
raise
rescue
2
end
def a
# Calling #b should go through vm_exec, not a direct mjit_exec.
# Otherwise the `1` held in a local variable would be lost.
1 + b
end
print a
end;
end
def test_local_stack_with_sp_motion_by_blockargs
assert_eval_with_jit("#{<<~"begin;"}\n#{<<~"end;"}", stdout: '1', success_count: 2)
begin;
def b(base)
1
end
# This method is simple enough that catch_except_p is false,
# so local_stack_p would be true in the JIT compiler.
def a
m = method(:b)
# ci->flag has VM_CALL_ARGS_BLOCKARG and cfp->sp is moved in vm_caller_setup_arg_block.
# So, for this send insn, JIT-ed code should use cfp->sp instead of local variables for stack.
Module.module_eval(&m)
end
print a
end;
end
def test_catching_deep_exception
assert_eval_with_jit("#{<<~"begin;"}\n#{<<~"end;"}", stdout: '1', success_count: 4)
begin;
def catch_true(paths, prefixes) # catch_except_p: TRUE
prefixes.each do |prefix| # catch_except_p: TRUE
paths.each do |path| # catch_except_p: FALSE
return path
end
end
end
def wrapper(paths, prefixes)
catch_true(paths, prefixes)
end
print wrapper(['1'], ['2'])
end;
end
def test_inlined_undefined_ivar
assert_eval_with_jit("#{<<~"begin;"}\n#{<<~"end;"}", stdout: "bbb", success_count: 3, min_calls: 3)
begin;
class Foo
def initialize
@a = :a
end
def bar
if @b.nil?
@b = :b
end
end
end
verbose, $VERBOSE = $VERBOSE, false # suppress "instance variable @b not initialized"
print(Foo.new.bar)
print(Foo.new.bar)
print(Foo.new.bar)
$VERBOSE = verbose
end;
end
def test_inlined_setivar_frozen
assert_eval_with_jit("#{<<~"begin;"}\n#{<<~"end;"}", stdout: "FrozenError\n", success_count: 2, min_calls: 3)
begin;
class A
def a
@a = 1
end
end
a = A.new
a.a
a.a
a.a
a.freeze
begin
a.a
rescue FrozenError => e
p e.class
end
end;
end
def test_attr_reader
assert_eval_with_jit("#{<<~"begin;"}\n#{<<~"end;"}", stdout: "4nil\nnil\n6", success_count: 2, min_calls: 2)
begin;
class A
attr_reader :a, :b
def initialize
@a = 2
end
def test
a
end
def undefined
b
end
end
a = A.new
print(a.test * a.test)
p(a.undefined)
p(a.undefined)
# redefinition
def a.test
3
end
print(2 * a.test)
end;
assert_eval_with_jit("#{<<~"begin;"}\n#{<<~"end;"}", stdout: "true", success_count: 1, min_calls: 2)
begin;
class Hoge
attr_reader :foo
def initialize
@foo = []
@bar = nil
end
end
class Fuga < Hoge
def initialize
@bar = nil
@foo = []
end
end
def test(recv)
recv.foo.empty?
end
hoge = Hoge.new
fuga = Fuga.new
test(hoge) # VM: cc set index=1
test(hoge) # JIT: compile with index=1
test(fuga) # JIT -> VM: cc set index=2
print test(hoge) # JIT: should use index=1, not index=2 in cc
end;
end
def test_clean_so
if RUBY_PLATFORM.match?(/mswin/)
skip 'Removing so file is randomly failing on AppVeyor/RubyCI mswin due to Permission Denied.'
end
Dir.mktmpdir("jit_test_clean_so_") do |dir|
code = "x = 0; 10.times {|i|x+=i}"
eval_with_jit({"TMPDIR"=>dir}, code)
assert_send([Dir, :empty?, dir])
eval_with_jit({"TMPDIR"=>dir}, code, save_temps: true)
assert_not_send([Dir, :empty?, dir])
end
end
def test_clean_objects_on_exec
if /mswin|mingw/ =~ RUBY_PLATFORM
# TODO: check call stack and close handle of code which is not on stack, and remove objects on best-effort basis
skip 'Removing so file being used does not work on Windows'
end
Dir.mktmpdir("jit_test_clean_objects_on_exec_") do |dir|
eval_with_jit({"TMPDIR"=>dir}, "#{<<~"begin;"}\n#{<<~"end;"}", min_calls: 1)
begin;
def a; end; a
exec "true"
end;
error_message = "Undeleted files:\n #{Dir.glob("#{dir}/*").join("\n ")}\n"
assert_send([Dir, :empty?, dir], error_message)
end
end
def test_lambda_longjmp
assert_eval_with_jit("#{<<~"begin;"}\n#{<<~"end;"}", stdout: '5', success_count: 1)
begin;
fib = lambda do |x|
return x if x == 0 || x == 1
fib.call(x-1) + fib.call(x-2)
end
print fib.call(5)
end;
end
def test_stack_pointer_with_assignment
assert_eval_with_jit("#{<<~"begin;"}\n#{<<~"end;"}", stdout: "nil\nnil\n", success_count: 1)
begin;
2.times do
a, _ = nil
p a
end
end;
end
def test_frame_omitted_inlining
assert_eval_with_jit("#{<<~"begin;"}\n#{<<~"end;"}", stdout: "true\ntrue\ntrue\n", success_count: 1, min_calls: 2)
begin;
class Numeric
remove_method :zero?
def zero?
self == 0
end
end
3.times do
p 0.zero?
end
end;
end
def test_block_handler_with_possible_frame_omitted_inlining
assert_eval_with_jit("#{<<~"begin;"}\n#{<<~"end;"}", stdout: "70.0\n70.0\n70.0\n", success_count: 2, min_calls: 2)
begin;
def multiply(a, b)
a *= b
end
3.times do
p multiply(7.0, 10.0)
end
end;
end
def test_program_counter_with_regexpmatch
assert_eval_with_jit("#{<<~"begin;"}\n#{<<~"end;"}", stdout: "aa", success_count: 1)
begin;
2.times do
break if /a/ =~ "ab" && !$~[0]
print $~[0]
end
end;
end
def test_pushed_values_with_opt_aset_with
assert_eval_with_jit("#{<<~"begin;"}\n#{<<~"end;"}", stdout: "{}{}", success_count: 1)
begin;
2.times do
print(Thread.current["a"] = {})
end
end;
end
def test_pushed_values_with_opt_aref_with
assert_eval_with_jit("#{<<~"begin;"}\n#{<<~"end;"}", stdout: "nil\nnil\n", success_count: 1)
begin;
2.times do
p(Thread.current["a"])
end
end;
end
def test_caller_locations_without_catch_table
out, _ = eval_with_jit("#{<<~"begin;"}\n#{<<~"end;"}", min_calls: 1)
begin;
def b # 2
caller_locations.first # 3
end # 4
# 5
def a # 6
print # <-- don't leave PC here # 7
b # 8
end
puts a
puts a
end;
lines = out.lines
assert_equal("-e:8:in `a'\n", lines[0])
assert_equal("-e:8:in `a'\n", lines[1])
end
def test_fork_with_mjit_worker_thread
Dir.mktmpdir("jit_test_fork_with_mjit_worker_thread_") do |dir|
# min_calls: 2 to skip fork block
out, err = eval_with_jit({ "TMPDIR" => dir }, "#{<<~"begin;"}\n#{<<~"end;"}", min_calls: 2, verbose: 1)
begin;
def before_fork; end
def after_fork; end
before_fork; before_fork # the child should not delete this .o file
pid = Process.fork do # this child should not delete shared .pch file
sleep 0.5 # to prevent mixing outputs on Solaris
after_fork; after_fork # this child does not share JIT-ed after_fork with parent
end
after_fork; after_fork # this parent does not share JIT-ed after_fork with child
Process.waitpid(pid)
end;
success_count = err.scan(/^#{JIT_SUCCESS_PREFIX}:/).size
debug_info = "stdout:\n```\n#{out}\n```\n\nstderr:\n```\n#{err}```\n"
assert_equal(3, success_count, debug_info)
# assert no remove error
assert_equal("Successful MJIT finish\n" * 2, err.gsub(/^#{JIT_SUCCESS_PREFIX}:[^\n]+\n/, ''), debug_info)
# ensure objects are deleted
assert_send([Dir, :empty?, dir], debug_info)
end
end if defined?(fork)
private
# The shortest way to test one proc
def assert_compile_once(script, result_inspect:, insns: [], uplevel: 1)
if script.match?(/\A\n.+\n\z/m)
script = script.gsub(/^/, ' ')
else
script = " #{script} "
end
assert_eval_with_jit("p proc {#{script}}.call", stdout: "#{result_inspect}\n", success_count: 1, insns: insns, uplevel: uplevel + 1)
end
# Shorthand for normal test cases
def assert_eval_with_jit(script, stdout: nil, success_count:, min_calls: 1, insns: [], uplevel: 1)
out, err = eval_with_jit(script, verbose: 1, min_calls: min_calls)
actual = err.scan(/^#{JIT_SUCCESS_PREFIX}:/).size
# Add --jit-verbose=2 logs for cl.exe because compiler's error message is suppressed
# for cl.exe with --jit-verbose=1. See `start_process` in mjit_worker.c.
if RUBY_PLATFORM.match?(/mswin/) && success_count != actual
out2, err2 = eval_with_jit(script, verbose: 2, min_calls: min_calls)
end
# Make sure that the script has insns expected to be tested
used_insns = method_insns(script)
insns.each do |insn|
unless used_insns.include?(insn)
$stderr.puts
warn "'#{insn}' insn is not included in the script. Actual insns are: #{used_insns.join(' ')}\n", uplevel: uplevel+2
end
TestJIT.untested_insns.delete(insn)
end
assert_equal(
success_count, actual,
"Expected #{success_count} times of JIT success, but succeeded #{actual} times.\n\n"\
"script:\n#{code_block(script)}\nstderr:\n#{code_block(err)}#{(
"\nstdout(verbose=2 retry):\n#{code_block(out2)}\nstderr(verbose=2 retry):\n#{code_block(err2)}" if out2 || err2
)}",
)
if stdout
assert_equal(stdout, out, "Expected stdout #{out.inspect} to match #{stdout.inspect} with script:\n#{code_block(script)}")
end
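# Use non-bang reject: reject! returns nil when no line is removed, which would break the empty? check below.
err_lines = err.lines.reject do |l|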
l.chomp.empty? || l.match?(/\A#{JIT_SUCCESS_PREFIX}/) || IGNORABLE_PATTERNS.any? { |pat| pat.match?(l) }
end
unless err_lines.empty?
warn err_lines.join(''), uplevel: uplevel
end
end
# Collect block's insns or defined method's insns, which are expected to be JIT-ed.
# Note that this intentionally excludes insns in script's toplevel because they are not JIT-ed.
def method_insns(script)
insns = []
RubyVM::InstructionSequence.compile(script).to_a.last.each do |(insn, *args)|
case insn
when :send
insns += collect_insns(args.last)
when :definemethod, :definesmethod
insns += collect_insns(args[1])
when :defineclass
insns += collect_insns(args[1])
end
end
insns.uniq
end
# Recursively collect insns in iseq_array
def collect_insns(iseq_array)
return [] if iseq_array.nil?
insns = iseq_array.last.select { |x| x.is_a?(Array) }.map(&:first)
iseq_array.last.each do |(insn, *args)|
case insn
when :definemethod, :definesmethod, :send
insns += collect_insns(args.last)
end
end
insns
end
end
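# Illustrative sketch, not part of the upstream test suite. The helpers above
# read instruction names from the last element of
# RubyVM::InstructionSequence#to_a. The names JITInsnSketch and insn_names are
# hypothetical and exist only for this example.
# Assumption: the body array mixes line numbers, labels and instruction arrays,
# so only Array entries are treated as instructions, as collect_insns does.
# Unlike method_insns, this sketch lists the toplevel insns and does not
# descend into nested iseqs.
module JITInsnSketch
  # Returns the unique instruction names found at the toplevel of +source+.
  def self.insn_names(source)
    body = RubyVM::InstructionSequence.compile(source).to_a.last
    body.select { |entry| entry.is_a?(Array) }.map(&:first).uniq
  end
end
# Usage (hypothetical): JITInsnSketch.insn_names("1 + 2") returns an Array of
# Symbols naming the insns of the toplevel iseq.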