This updates the trace instructions to directly dispatch to
opt_send_without_block. So this should cause no slowdown in
non-trace mode.
To enable the tracing of the optimized methods, RUBY_EVENT_C_CALL
and RUBY_EVENT_C_RETURN are added as events to the specialized
instructions.
Fixes [Bug #14870]
Co-authored-by: Takashi Kokubun <takashikkbn@gmail.com>
I'm investigating SEGVs like https://github.com/ruby/ruby/runs/2715166621?check_suite_focus=true.
Because a lot of things are going on on this line, it's hard to identify
the cause, especially because we can't get the core file of the failures.
Therefore I intentionally increased the number of lines for
investigation.
on interruption.
The cancellation code was originally written for leave insn, but re-entering
opt_invokebuiltin_delegate_leave insn on a cancellation is not safe, because
a builtin function is executed twice.
e7fc353f04 reverted vm_ic_hit_p's signature change made in 53babf35ef,
which broke JIT compilation of getinlinecache.
To make sure it doesn't happen again, I separated vm_inlined_ic_hit_p to
make the intention clear.
constant cache `IC` is accessed by non-atomic manner and there are
thread-safety issues, so Ruby 3.0 disables to use const cache on
non-main ractors.
This patch enables it by introducing `imemo_constcache` and allocates
it by every re-fill of const cache like `imemo_callcache`.
[Bug #17510]
Now `IC` only has one entry `IC::entry` and it points to
`iseq_inline_constant_cache_entry`, managed by T_IMEMO object.
`IC` is atomic data structure so `rb_mjit_before_vm_ic_update()` and
`rb_mjit_after_vm_ic_update()` is not needed.
when we already check ROBJECT_NUMIV(self) is larger than
ROBJECT_EMBED_LEN_MAX at the beginning of the method, because the number
of instance variables for the same object doesn't decrease.
```
$ benchmark-driver -v --rbenv 'before --jit;after --jit' --repeat-count=4 --alternate --output=all benchmark_3000.yml
before --jit: ruby 3.0.0dev (2020-12-23T06:32:19Z master dbb4f19969) +JIT [x86_64-linux]
after --jit: ruby 3.0.0dev (2020-12-23T07:45:42Z master 95e866c098) +JIT [x86_64-linux]
last_commit=Skip checking ROBJECT_EMBED
Calculating -------------------------------------
before --jit after --jit
Optcarrot 3000 frames 102.34091772397872 102.77738408379015 fps
103.37784821624231 105.46530219076179
104.39567016876369 106.43712452152215
105.31782092252713 106.54986150067481
```
`cd` is passed to method call functions to method invocation
functions, but `cd` can be manipulated by other ractors simultaneously
so it contains thread-safety issue.
To solve this issue, this patch stores `ci` and found `cc` to `calling`
and stops to pass `cd`.
Performance is probably improved?
$ benchmark-driver -v --rbenv 'before --jit;after --jit' --repeat-count=12 --alternate --output=all benchmark.yml
before --jit: ruby 3.0.0dev (2020-11-27T04:37:47Z master 69e77e81dc) +JIT [x86_64-linux]
after --jit: ruby 3.0.0dev (2020-11-27T05:28:19Z master df6b05c6dd) +JIT [x86_64-linux]
last_commit=Set VM_FRAME_FLAG_FINISH at once
Calculating -------------------------------------
before --jit after --jit
Optcarrot Lan_Master.nes 80.89292998533379 82.19497327502751 fps
80.93130641142331 85.13943315260148
81.06214830270119 87.43757879797808
82.29172808453910 87.89942441487113
84.61206450455929 87.91309779491075
85.44545883567997 87.98026086648694
86.02923132404449 88.03081060383973
86.07411817365879 88.14650206137341
86.34348799602836 88.32791633649961
87.90257338977324 88.57599644892220
88.58006509876580 88.67426384743277
89.26611118140011 88.81669430874207
This should have no bad impact on VM because this function is ALWAYS_INLINE.
iv_index_tbl manages instance variable indexes (ID -> index).
This data structure should be synchronized with other ractors
so introduce some VM locks.
This patch also introduced atomic ivar cache used by
set/getinlinecache instructions. To make updating ivar cache (IVC),
we changed iv_index_tbl data structure to manage (ID -> entry)
and an entry points serial and index. IVC points to this entry so
that cache update becomes atomically.
generic_ivtbl is a process global table to maintain instance variables
for non T_OBJECT/T_CLASS/... objects. So we need to protect them
for multi-Ractor exection.
Hint: we can make them Ractor local for unshareable objects, but
now it is premature optimization.
This commit introduces Ractor mechanism to run Ruby program in
parallel. See doc/ractor.md for more details about Ractor.
See ticket [Feature #17100] to see the implementation details
and discussions.
[Feature #17100]
This commit does not complete the implementation. You can find
many bugs on using Ractor. Also the specification will be changed
so that this feature is experimental. You will see a warning when
you make the first Ractor with `Ractor.new`.
I hope this feature can help programmers from thread-safety issues.
A single quote "is representable either by itself or by the escape
sequence", according to ISO/IEC 9899 (checked all versions). So this is
not a bug fix. But the generated output is a bit readable without
backslashes.
Noticed that struct rb_builtin_function is a purely compile-time
constant. MJIT can eliminate some runtime calculations by statically
generate dedicated C code generator for each builtin functions.
which is checked by the first guard. When JIT-inlined cc and operand
cd->cc are different, the JIT-ed code might wrongly dispatch cd->cc even
while class check is done with another cc inlined by JIT.
This fixes SEGV on railsbench.
Use ID instead of GENTRY for gvars.
Global variables are compiled into GENTRY (a pointer to struct
rb_global_entry). This patch replace this GENTRY to ID and
make the code simple.
We need to search GENTRY from ID every time (st_lookup), so
additional overhead will be introduced.
However, the performance of accessing global variables is not
important now a day and this simplicity helps Ractor development.
for opt_* insns.
opt_eq handles rb_obj_equal inside opt_eq, and all other cfunc is
handled by opt_send_without_block. Therefore we can't decide which insn
should be generated by checking whether it's cfunc cc or not.
```
$ benchmark-driver -v --rbenv 'before --jit;after --jit' benchmark/mjit_opt_cc_insns.yml --repeat-count=4
before --jit: ruby 2.8.0dev (2020-06-26T05:21:43Z master 9dbc2294a6) +JIT [x86_64-linux]
after --jit: ruby 2.8.0dev (2020-06-26T06:30:18Z master 75cece1b0b) +JIT [x86_64-linux]
last_commit=Decide JIT-ed insn based on cached cfunc
Calculating -------------------------------------
before --jit after --jit
mjit_nil?(1) 73.878M 74.021M i/s - 40.000M times in 0.541432s 0.540391s
mjit_not(1) 72.635M 74.601M i/s - 40.000M times in 0.550702s 0.536187s
mjit_eq(1, nil) 7.331M 7.445M i/s - 8.000M times in 1.091211s 1.074596s
mjit_eq(nil, 1) 49.450M 64.711M i/s - 8.000M times in 0.161781s 0.123627s
Comparison:
mjit_nil?(1)
after --jit: 74020528.4 i/s
before --jit: 73878185.9 i/s - 1.00x slower
mjit_not(1)
after --jit: 74600882.0 i/s
before --jit: 72634507.6 i/s - 1.03x slower
mjit_eq(1, nil)
after --jit: 7444657.4 i/s
before --jit: 7331304.3 i/s - 1.02x slower
mjit_eq(nil, 1)
after --jit: 64710790.6 i/s
before --jit: 49449507.4 i/s - 1.31x slower
```
only for opt_nil_p and opt_not.
While vm_method_cfunc_is is used for opt_eq too, many fast paths of it
don't call it. So if it's populated, it should generate opt_send,
regardless of cfunc or not. And again, opt_neq isn't relevant due to the
difference in operands.
So opt_nil_p and opt_not are the only variants using vm_method_cfunc_is
like they use.
```
$ benchmark-driver -v --rbenv 'before2 --jit::ruby --jit;before --jit;after --jit' benchmark/mjit_opt_cc_insns.yml --repeat-count=4
before2 --jit: ruby 2.8.0dev (2020-06-22T08:37:37Z master 3238641750) +JIT [x86_64-linux]
before --jit: ruby 2.8.0dev (2020-06-23T01:01:24Z master 9ce2066209) +JIT [x86_64-linux]
after --jit: ruby 2.8.0dev (2020-06-23T06:58:37Z master 17e9df3157) +JIT [x86_64-linux]
last_commit=Avoid generating opt_send with cfunc cc with JIT
Calculating -------------------------------------
before2 --jit before --jit after --jit
mjit_nil?(1) 54.204M 75.536M 75.031M i/s - 40.000M times in 0.737947s 0.529548s 0.533110s
mjit_not(1) 53.822M 70.921M 71.920M i/s - 40.000M times in 0.743195s 0.564007s 0.556171s
mjit_eq(1, nil) 7.367M 6.496M 7.331M i/s - 8.000M times in 1.085882s 1.231470s 1.091327s
Comparison:
mjit_nil?(1)
before --jit: 75536059.3 i/s
after --jit: 75031409.4 i/s - 1.01x slower
before2 --jit: 54204431.6 i/s - 1.39x slower
mjit_not(1)
after --jit: 71920324.1 i/s
before --jit: 70921063.1 i/s - 1.01x slower
before2 --jit: 53821697.6 i/s - 1.34x slower
mjit_eq(1, nil)
before2 --jit: 7367280.0 i/s
after --jit: 7330527.4 i/s - 1.01x slower
before --jit: 6496302.8 i/s - 1.13x slower
```
because opt_nil/opt_not/opt_eq populates cc even when it doesn't
fallback to opt_send_without_block because of vm_method_cfunc_is.
```
$ benchmark-driver -v --rbenv 'before --jit;after --jit' benchmark/mjit_opt_cc_insns.yml --repeat-count=4
before --jit: ruby 2.8.0dev (2020-06-22T08:11:24Z master d231b8f95b) +JIT [x86_64-linux]
after --jit: ruby 2.8.0dev (2020-06-22T08:53:27Z master e1125879ed) +JIT [x86_64-linux]
last_commit=Compile opt_send for opt_* only when cc has ISeq
Calculating -------------------------------------
before --jit after --jit
mjit_nil?(1) 54.106M 73.693M i/s - 40.000M times in 0.739288s 0.542795s
mjit_not(1) 53.398M 74.477M i/s - 40.000M times in 0.749090s 0.537075s
mjit_eq(1, nil) 7.427M 6.497M i/s - 8.000M times in 1.077136s 1.231326s
Comparison:
mjit_nil?(1)
after --jit: 73692594.3 i/s
before --jit: 54106108.4 i/s - 1.36x slower
mjit_not(1)
after --jit: 74477487.9 i/s
before --jit: 53398125.0 i/s - 1.39x slower
mjit_eq(1, nil)
before --jit: 7427105.9 i/s
after --jit: 6497063.0 i/s - 1.14x slower
```
Actually opt_eq becomes slower by this. Maybe it's indeed using
opt_send_without_block, but I'll approach that one in another commit.
Even if local stack optimization is not used and values are written to
VM stack, the stack pointer itself may not be moved properly. So this
should be always moved on JIT cancellation.
By the way it's hard to write a test for this because if we try to
generate an interrupt, it will be a method call and it consumes the
interrupt by itself on popping a frame.