/* -*- C -*-
  insns.def - YARV instruction definitions

  $Author: $
  created at: 04/01/01 01:17:55 JST

* blockinlining.c, compile.c, compile.h, debug.c, debug.h,
id.c, insnhelper.h, insns.def, thread.c, thread_pthread.ci,
thread_pthread.h, thread_win32.ci, thread_win32.h, vm.h,
vm_dump.c, vm_evalbody.ci, vm_opts.h: fix comments and
copyright year.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@13920 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2007-11-13 17:13:04 -05:00

  Copyright (C) 2004-2007 Koichi Sasada
  Massive rewrite by @shyouhei in 2017.
 */

/* Some comments about this file's contents:

   - The new format aims to be editable in the C editor of your choice;
     your mileage may vary, of course.

   - Each instruction has the following format:

         DEFINE_INSN
         instruction_name
         (type operand, type operand, ..)
         (pop_values, ..)
         (return values ..)
         // attr type name contents..
         {
           .. // insn body
         }

   - Unlike the old format, which was line-oriented, you can now place
     newlines and comments at liberal positions.

   - `DEFINE_INSN` is a keyword.

   - An instruction name must be a valid C identifier.

   - Operands, pop values, and return values are each a series of either
     variable declarations, the keyword `void`, or the keyword `...`.  They
     are much like C function declarations.

   - Attribute pragmas are optional, and can include arbitrary C
     expressions.  You can write anything there, but as of this writing
     the supported attributes are:

       * sp_inc: Used to dynamically calculate the sp increase in
         `insn_stack_increase`.

       * handles_sp: If it is true, the VM deals with sp in the insn.
         It defaults to true if the instruction takes an ISEQ operand,
         and to false otherwise.

       * leaf: Indicates that the instruction is a "leaf", i.e. it does
         not introduce a new stack frame on top of it.
         An instruction that handles sp can never be a leaf.

   - Attributes can access operands, but not stack (push/pop) variables.

   - An instruction's body is a pure C block, copied verbatim into
     the generated C source code.
 */

/* nop */
DEFINE_INSN
nop
()
()
()
{
    /* none */
}

/**********************************************************/
/* deal with variables                                    */
/**********************************************************/

/* Get a local variable (pointed to by `idx' and `level').
     'level' indicates the nesting depth from the current block.
 */
DEFINE_INSN
getlocal
(lindex_t idx, rb_num_t level)
()
(VALUE val)
{
split insns.def into functions
Contemporary C compilers are good at function inlining. They fold
multiple functions into one. However they are not yet smart enough to
unfold a function into several ones. So generally speaking, it is
wiser for a C programmer to manually split C functions whenever
possible. That should make room for compilers to optimize at will.
Before this changeset insns.def was converted into single HUGE
function called vm_exec_core(). By moving each instruction's core
into individual functions, generated C source code is reduced from
3,428 lines to 2,847 lines. Looking at the generated assembly
however, it seems my compiler (gcc 6.2) is extraordinarily smart so that
it inlines almost all functions I introduced in this changeset back
into that vm_exec_core. On my machine compiled machine binary of the
function does not shrink very much in size (28,432 bytes to 26,816
bytes, according to nm(1)).
I believe this change is zero-cost. Several benchmarks I exercised
showed no significant difference beyond error margin. For instance
3 repeated runs of optcarrot benchmark on my machine resulted in:
before this: 28.330329285707490, 27.513378371065920, 29.40420215754537
after this: 27.107195867280414, 25.549324021385907, 30.31581919050884
in fps (greater==faster).
----
* internal.h (rb_obj_not_equal): used from vm_insnhelper.c
* insns.def: move vast majority of lines into vm_insnhelper.c
* vm_insnhelper.c: moved here.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58390 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-04-18 06:58:49 -04:00
    val = *(vm_get_ep(GET_EP(), level) - idx);
    RB_DEBUG_COUNTER_INC(lvar_get);
    (void)RB_DEBUG_COUNTER_INC_IF(lvar_get_dynamic, level > 0);
}
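
/* Illustration only, not an instruction: the pointer arithmetic used by
   getlocal and setlocal can be modelled with a toy environment chain.
   The sketch assumes a simplified layout in which ep[-1] holds the parent
   scope's ep and locals sit at ep[-2], ep[-3], and so on.  `value_t`,
   `toy_get_ep`, `toy_getlocal` and `toy_setlocal` are hypothetical names;
   the real vm_get_ep() and the real environment layout differ in detail.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uintptr_t value_t;   // stand-in for Ruby's VALUE (assumption)

// Walk `level` links up the chain.  In this toy model the parent scope's
// ep is stored in the slot just below the current ep, i.e. ep[-1].
static const value_t *toy_get_ep(const value_t *ep, size_t level)
{
    while (level-- > 0) {
        ep = (const value_t *)ep[-1];
    }
    return ep;
}

// Read the local that lives `idx` slots below the selected ep.
static value_t toy_getlocal(const value_t *ep, size_t idx, size_t level)
{
    return *(toy_get_ep(ep, level) - idx);
}

// Write the local that lives `idx` slots below the selected ep.
static void toy_setlocal(value_t *ep, size_t idx, size_t level, value_t val)
{
    value_t *target = (value_t *)toy_get_ep(ep, level);
    *(target - idx) = val;
}
```

   With an inner scope whose ep[-1] points at an outer scope, level 0
   addresses the inner scope's slots and level 1 the outer scope's.
 */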

/* Set a local variable (pointed to by 'idx') as val.
     'level' indicates the nesting depth from the current block.
 */
DEFINE_INSN
setlocal
(lindex_t idx, rb_num_t level)
(VALUE val)
()
{
    vm_env_write(vm_get_ep(GET_EP(), level), -(int)idx, val);
    RB_DEBUG_COUNTER_INC(lvar_set);
    (void)RB_DEBUG_COUNTER_INC_IF(lvar_set_dynamic, level > 0);
}

/* Get a block parameter. */
DEFINE_INSN
getblockparam
(lindex_t idx, rb_num_t level)
()
(VALUE val)
{
    const VALUE *ep = vm_get_ep(GET_EP(), level);
    VM_ASSERT(VM_ENV_LOCAL_P(ep));

    if (!VM_ENV_FLAGS(ep, VM_FRAME_FLAG_MODIFIED_BLOCK_PARAM)) {
        val = rb_vm_bh_to_procval(ec, VM_ENV_BLOCK_HANDLER(ep));
        vm_env_write(ep, -(int)idx, val);
        VM_ENV_FLAGS_SET(ep, VM_FRAME_FLAG_MODIFIED_BLOCK_PARAM);
    }
    else {
        val = *(ep - idx);
        RB_DEBUG_COUNTER_INC(lvar_get);
        (void)RB_DEBUG_COUNTER_INC_IF(lvar_get_dynamic, level > 0);
    }
}
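
/* Illustration only, not an instruction: getblockparam converts the block
   handler lazily and remembers that it did so in a flag, so the
   conversion runs at most once per environment.  The sketch below models
   just that control flow with plain ints.  `struct toy_env` and the
   `toy_*` functions are hypothetical stand-ins, not CRuby types or APIs.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_env {
    int block_handler;   // raw handler passed by the caller
    int slot;            // local-variable slot backing the block parameter
    bool modified;       // plays the role of VM_FRAME_FLAG_MODIFIED_BLOCK_PARAM
    int conversions;     // counts how often the conversion ran
};

// Pretend conversion from a raw handler to a Proc-like value.
static int toy_handler_to_proc(struct toy_env *env)
{
    env->conversions++;
    return env->block_handler + 1000;
}

// First read materializes the value and caches it in the slot;
// later reads return the cached slot.
static int toy_getblockparam(struct toy_env *env)
{
    if (!env->modified) {
        env->slot = toy_handler_to_proc(env);
        env->modified = true;
    }
    return env->slot;
}

// An explicit write also sets the flag, so later reads never reconvert.
static void toy_setblockparam(struct toy_env *env, int val)
{
    env->slot = val;
    env->modified = true;
}
```

   Reading twice triggers one conversion; writing before the first read
   triggers none.
 */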

/* Set block parameter. */
DEFINE_INSN
setblockparam
(lindex_t idx, rb_num_t level)
(VALUE val)
()
{
    const VALUE *ep = vm_get_ep(GET_EP(), level);
    VM_ASSERT(VM_ENV_LOCAL_P(ep));

    vm_env_write(ep, -(int)idx, val);
    RB_DEBUG_COUNTER_INC(lvar_set);
    (void)RB_DEBUG_COUNTER_INC_IF(lvar_set_dynamic, level > 0);

    VM_ENV_FLAGS_SET(ep, VM_FRAME_FLAG_MODIFIED_BLOCK_PARAM);
}

/* Get a special proxy object, which only responds to the `call` method,
   if the block parameter represents an iseq/ifunc block.  Otherwise,
   same as `getblockparam`.
 */
DEFINE_INSN
getblockparamproxy
(lindex_t idx, rb_num_t level)
()
(VALUE val)
{
    const VALUE *ep = vm_get_ep(GET_EP(), level);
    VM_ASSERT(VM_ENV_LOCAL_P(ep));

    if (!VM_ENV_FLAGS(ep, VM_FRAME_FLAG_MODIFIED_BLOCK_PARAM)) {
        VALUE block_handler = VM_ENV_BLOCK_HANDLER(ep);

        if (block_handler) {
            switch (vm_block_handler_type(block_handler)) {
              case block_handler_type_iseq:
              case block_handler_type_ifunc:
                val = rb_block_param_proxy;
                break;
              case block_handler_type_symbol:
                val = rb_sym_to_proc(VM_BH_TO_SYMBOL(block_handler));
                goto INSN_LABEL(set);
              case block_handler_type_proc:
                val = VM_BH_TO_PROC(block_handler);
                goto INSN_LABEL(set);
              default:
                VM_UNREACHABLE(getblockparamproxy);
            }
        }
        else {
            val = Qnil;
          INSN_LABEL(set):
            vm_env_write(ep, -(int)idx, val);
            VM_ENV_FLAGS_SET(ep, VM_FRAME_FLAG_MODIFIED_BLOCK_PARAM);
        }
    }
    else {
        val = *(ep - idx);
        RB_DEBUG_COUNTER_INC(lvar_get);
        (void)RB_DEBUG_COUNTER_INC_IF(lvar_get_dynamic, level > 0);
    }
}

/* Get value of special local variable ($~, $_, ..). */
DEFINE_INSN
getspecial
(rb_num_t key, rb_num_t type)
()
(VALUE val)
{
    val = vm_getspecial(ec, GET_LEP(), key, type);
}

/* Set value of special local variable ($~, $_, ...) to obj. */
DEFINE_INSN
setspecial
(rb_num_t key)
(VALUE obj)
()
{
    lep_svar_set(ec, GET_LEP(), key, obj);
}

/* Get value of instance variable id of self. */
DEFINE_INSN
getinstancevariable
(ID id, IC ic)
()
(VALUE val)
/* "instance variable not initialized" warning can be hooked. */
// attr bool leaf = false; /* has rb_warning() */
{
    val = vm_getinstancevariable(GET_SELF(), id, ic);
}

/* Set value of instance variable id of self to val. */
DEFINE_INSN
setinstancevariable
(ID id, IC ic)
(VALUE val)
()
{
    vm_setinstancevariable(GET_SELF(), id, val, ic);
}

/* Get value of class variable id of klass as val. */
DEFINE_INSN
getclassvariable
(ID id)
()
(VALUE val)
/* "class variable access from toplevel" warning can be hooked. */
// attr bool leaf = false; /* has rb_warning() */
{
    val = rb_cvar_get(vm_get_cvar_base(rb_vm_get_cref(GET_EP()), GET_CFP()), id);
}

/* Set value of class variable id of klass as val. */
DEFINE_INSN
setclassvariable
(ID id)
(VALUE val)
()
/* "class variable access from toplevel" warning can be hooked. */
// attr bool leaf = false; /* has rb_warning() */
{
    vm_ensure_not_refinement_module(GET_SELF());
    rb_cvar_set(vm_get_cvar_base(rb_vm_get_cref(GET_EP()), GET_CFP()), id, val);
}

/* Get constant variable id. If klass is Qnil, constants
   are searched in the current scope. Otherwise, get constant under klass
   class or module.
 */
DEFINE_INSN
getconstant
(ID id)
(VALUE klass)
(VALUE val)
/* getconstant can kick autoload */
// attr bool leaf = false; /* has rb_autoload_load() */
{
    val = vm_get_ev_const(ec, klass, id, 0);
}

/* Set constant variable id under cbase class or module.
 */
DEFINE_INSN
setconstant
(ID id)
(VALUE val, VALUE cbase)
()
/* Assigning an object to a constant is basically a leaf operation.
 * The problem is, assigning a Module instance to a constant _names_
 * that module.  Naming involves string manipulations, which are
 * method calls. */
// attr bool leaf = false; /* has StringValue() */
{
    vm_check_if_namespace(cbase);
    vm_ensure_not_refinement_module(GET_SELF());
    rb_const_set(cbase, id, val);
}

/* get global variable id. */
DEFINE_INSN
getglobal
(GENTRY entry)
()
(VALUE val)
// attr bool leaf = leafness_of_getglobal(entry);
{
    struct rb_global_entry *gentry = (void *)entry;
    val = rb_gvar_get(gentry);
}

/* set global variable id as val. */
DEFINE_INSN
setglobal
(GENTRY entry)
(VALUE val)
()
// attr bool leaf = leafness_of_setglobal(entry);
{
    struct rb_global_entry *gentry = (void *)entry;
    rb_gvar_set(gentry, val);
}

/**********************************************************/
/* deal with values                                       */
/**********************************************************/

/* put nil to stack. */
DEFINE_INSN
putnil
()
()
(VALUE val)
{
    val = Qnil;
}

/* put self. */
DEFINE_INSN
putself
()
()
(VALUE val)
{
    val = GET_SELF();
}

/* put some object.
     i.e. Fixnum, true, false, nil, and so on.
 */
DEFINE_INSN
putobject
(VALUE val)
()
(VALUE val)
{
    /* */
}

/* put special object.  "value_type" is for expansion. */
DEFINE_INSN
putspecialobject
(rb_num_t value_type)
()
(VALUE val)
{
    enum vm_special_object_type type;

    type = (enum vm_special_object_type)value_type;
    val = vm_get_special_object(GET_EP(), type);
}

/* put iseq value. */
DEFINE_INSN
putiseq
(ISEQ iseq)
()
(VALUE ret)
// attr bool handles_sp = false; /* of course it doesn't */
{
    ret = (VALUE)iseq;
}

/* put string val. string will be copied. */
DEFINE_INSN
putstring
(VALUE str)
()
(VALUE val)
{
    val = rb_str_resurrect(str);
}
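
/* Illustration only, not an instruction: putstring pushes a copy because
   the literal stored in the instruction sequence is reused every time the
   instruction runs, so a caller mutating the pushed string must not
   alter it.  The sketch below shows the same idea with plain C strings.
   `toy_putstring` is a hypothetical name and says nothing about how
   rb_str_resurrect() is implemented.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

// Return a fresh heap copy of `literal`, or NULL if allocation fails.
// The caller owns the copy and must free() it.
static char *toy_putstring(const char *literal)
{
    size_t len = strlen(literal) + 1;   // include the terminating NUL
    char *copy = malloc(len);
    if (copy != NULL) {
        memcpy(copy, literal, len);
    }
    return copy;
}
```

   Two calls with the same literal yield two independent buffers, so
   changing one leaves both the literal and the other copy untouched.
 */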

/* put concatenated strings */
DEFINE_INSN
concatstrings
(rb_num_t num)
(...)
(VALUE val)
// attr rb_snum_t sp_inc = 1 - (rb_snum_t)num;
{
    val = rb_str_concat_literals(num, STACK_ADDR_FROM_TOP(num));
}
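
/* Illustration only, not an instruction: an instruction that takes `...`
   pops a variable number of values, which is why concatstrings declares
   sp_inc = 1 - num (num values popped, one pushed).  The sketch below
   checks that bookkeeping on a toy int stack, with addition standing in
   for concatenation.  `struct toy_stack`, `toy_snum_t` and the `toy_*`
   functions are hypothetical; `toy_addr_from_top` only mimics the idea
   behind STACK_ADDR_FROM_TOP.

```c
#include <assert.h>
#include <stddef.h>

typedef long toy_snum_t;   // stand-in for rb_snum_t (assumption)

#define TOY_STACK_MAX 16

struct toy_stack {
    int slots[TOY_STACK_MAX];
    size_t sp;             // index of the next free slot
};

// Address of the n-th element counted from the top of the stack.
static int *toy_addr_from_top(struct toy_stack *s, size_t n)
{
    return &s->slots[s->sp - n];
}

// Net stack-pointer change of an instruction that pops num and pushes 1.
static toy_snum_t toy_sp_inc(size_t num)
{
    return 1 - (toy_snum_t)num;
}

// Pop `num` ints and push their sum.  Requires num <= s->sp.
static void toy_concat(struct toy_stack *s, size_t num)
{
    const int *args = toy_addr_from_top(s, num);
    int sum = 0;
    for (size_t i = 0; i < num; i++) {
        sum += args[i];
    }
    s->sp -= num;
    s->slots[s->sp++] = sum;
}
```

   After toy_concat(&s, 3) the stack pointer has moved by toy_sp_inc(3),
   that is, by -2.
 */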

/* push the result of to_s. */
DEFINE_INSN
tostring
()
(VALUE val, VALUE str)
(VALUE val)
{
    val = rb_obj_as_string_result(str, val);
}

/* Freeze (dynamically) created strings. If debug_info is given, set it. */
DEFINE_INSN
freezestring
(VALUE debug_info)
(VALUE str)
(VALUE str)
{
    vm_freezestring(str, debug_info);
}

/* compile str to Regexp and push it.
     opt is the option for the Regexp.
 */
DEFINE_INSN
toregexp
(rb_num_t opt, rb_num_t cnt)
(...)
(VALUE val)
/* This instruction has StringValue(), which is a method call.  But it
 * seems that path is never covered. */
// attr bool leaf = true; /* yes it is */
// attr rb_snum_t sp_inc = 1 - (rb_snum_t)cnt;
{
    const VALUE ary = rb_ary_tmp_new_from_values(0, cnt, STACK_ADDR_FROM_TOP(cnt));
    val = rb_reg_new_ary(ary, (int)opt);
    rb_ary_clear(ary);
}

/* intern str to Symbol and push it. */
DEFINE_INSN
intern
()
(VALUE str)
(VALUE sym)
{
    sym = rb_str_intern(str);
}

/* put new array initialized with num values on the stack. */
DEFINE_INSN
newarray
(rb_num_t num)
(...)
(VALUE val)
// attr rb_snum_t sp_inc = 1 - (rb_snum_t)num;
{
    val = rb_ary_new4(num, STACK_ADDR_FROM_TOP(num));
}

/* dup array */
DEFINE_INSN
duparray
(VALUE ary)
()
(VALUE val)
{
    RUBY_DTRACE_CREATE_HOOK(ARRAY, RARRAY_LEN(ary));
    val = rb_ary_resurrect(ary);
}

Speed up hash literals by duping
This commit replaces the `newhashfromarray` instruction with a `duphash`
instruction. Instead of allocating a new hash from an array stored in
the Instruction Sequences, store a hash directly in the instruction
sequences and dup it on execution.
== Instruction sequence changes ==
```ruby
code = <<-eorby
{ "foo" => "bar", "baz" => "lol" }
eorby
insns = RubyVM::InstructionSequence.compile(code, __FILE__, nil, 0, frozen_string_literal: true)
puts insns.disasm
```
On Ruby 2.5:
```
== disasm: #<ISeq:<compiled>@test.rb:0 (0,0)-(0,36)>====================
0000 putobject "foo"
0002 putobject "bar"
0004 putobject "baz"
0006 putobject "lol"
0008 newhash 4
0010 leave
```
Ruby 2.6@r66174 3b6321083a2e3525da3b34d08a0b68bac094bd7f:
```
$ ./ruby test.rb
== disasm: #<ISeq:<compiled>@test.rb:0 (0,0)-(0,36)> (catch: FALSE)
0000 newhashfromarray 2, ["foo", "bar", "baz", "lol"]
0003 leave
```
Ruby 2.6 + This commit:
```
$ ./ruby test.rb
== disasm: #<ISeq:<compiled>@test.rb:0 (0,0)-(0,36)> (catch: FALSE)
0000 duphash {"foo"=>"bar", "baz"=>"lol"}
0002 leave
```
== Benchmark Results ==
Compared to 2.5.3:
```
$ make benchmark ITEM=hash_literal_small COMPARE_RUBY=/Users/aaron/.rbenv/versions/2.5.3/bin/ruby
generating known_errors.inc
known_errors.inc unchanged
./revision.h unchanged
/Users/aaron/.rbenv/shims/ruby --disable=gems -rrubygems -I./benchmark/lib ./benchmark/benchmark-driver/exe/benchmark-driver \
--executables="compare-ruby::/Users/aaron/.rbenv/versions/2.5.3/bin/ruby -I.ext/common --disable-gem" \
--executables="built-ruby::./miniruby -I./lib -I. -I.ext/common -r./prelude --disable-gem" \
$(find ./benchmark -maxdepth 1 -name '*hash_literal_small*.yml' -o -name '*hash_literal_small*.rb' | sort)
Calculating -------------------------------------
compare-ruby built-ruby
hash_literal_small2 1.498 1.877 i/s - 1.000 times in 0.667581s 0.532656s
hash_literal_small4 1.197 1.642 i/s - 1.000 times in 0.835375s 0.609160s
hash_literal_small8 0.620 1.215 i/s - 1.000 times in 1.611638s 0.823090s
Comparison:
hash_literal_small2
built-ruby: 1.9 i/s
compare-ruby: 1.5 i/s - 1.25x slower
hash_literal_small4
built-ruby: 1.6 i/s
compare-ruby: 1.2 i/s - 1.37x slower
hash_literal_small8
built-ruby: 1.2 i/s
compare-ruby: 0.6 i/s - 1.96x slower
```
Compared to r66255
```
$ make benchmark ITEM=hash_literal_small COMPARE_RUBY=/Users/aaron/.rbenv/versions/ruby-trunk/bin/ruby
generating known_errors.inc
known_errors.inc unchanged
./revision.h unchanged
/Users/aaron/.rbenv/shims/ruby --disable=gems -rrubygems -I./benchmark/lib ./benchmark/benchmark-driver/exe/benchmark-driver \
--executables="compare-ruby::/Users/aaron/.rbenv/versions/ruby-trunk/bin/ruby -I.ext/common --disable-gem" \
--executables="built-ruby::./miniruby -I./lib -I. -I.ext/common -r./prelude --disable-gem" \
$(find ./benchmark -maxdepth 1 -name '*hash_literal_small*.yml' -o -name '*hash_literal_small*.rb' | sort)
Calculating -------------------------------------
compare-ruby built-ruby
hash_literal_small2 1.567 1.831 i/s - 1.000 times in 0.638056s 0.546039s
hash_literal_small4 1.298 1.652 i/s - 1.000 times in 0.770214s 0.605182s
hash_literal_small8 0.873 1.216 i/s - 1.000 times in 1.145304s 0.822047s
Comparison:
hash_literal_small2
built-ruby: 1.8 i/s
compare-ruby: 1.6 i/s - 1.17x slower
hash_literal_small4
built-ruby: 1.7 i/s
compare-ruby: 1.3 i/s - 1.27x slower
hash_literal_small8
built-ruby: 1.2 i/s
compare-ruby: 0.9 i/s - 1.39x slower
```
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66258 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-12-06 13:28:21 -05:00

/* dup hash */
DEFINE_INSN
duphash
(VALUE hash)
()
(VALUE val)
{
    RUBY_DTRACE_CREATE_HOOK(HASH, RHASH_SIZE(hash) << 1);
    val = rb_hash_resurrect(hash);
2018-12-06 13:28:21 -05:00
|
|
|
}
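The `duphash` body above copies a hash that already exists instead of assembling one from separate keys and values. A minimal sketch of that contrast follows, assuming a toy fixed-size pair table; `toy_hash`, `build_from_flat`, and `dup_template` are hypothetical names for illustration and are not the VM's `rb_hash_resurrect`.

```c
#include <assert.h>
#include <string.h>

enum { MAX_PAIRS = 8 };

typedef struct { const char *key; const char *val; } toy_pair;
typedef struct { toy_pair pairs[MAX_PAIRS]; int len; } toy_hash;

/* Rebuild a table from a flat key/value list, one pair at a time.
 * (Illustrative only; the real VM uses Ruby Hash objects.) */
static toy_hash build_from_flat(const char *flat[], int num)
{
    assert(num % 2 == 0 && num / 2 <= MAX_PAIRS);
    toy_hash h = { .len = 0 };
    for (int i = 0; i < num; i += 2) {
        h.pairs[h.len].key = flat[i];
        h.pairs[h.len].val = flat[i + 1];
        h.len++;
    }
    return h;
}

/* Copy a prebuilt template in one step; the template itself is left
 * untouched, so every execution gets a fresh, independent table. */
static toy_hash dup_template(const toy_hash *tmpl)
{
    toy_hash copy;
    memcpy(&copy, tmpl, sizeof copy);
    return copy;
}
```

In this toy model the template is built once and each later use pays only for the copy, which is the general idea the instruction relies on.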
|
|
|
|
|
2018-01-12 03:38:07 -05:00
|
|
|
/* if TOS is an array, expand it to num objects.
|
2016-01-09 21:07:00 -05:00
|
|
|
if the array has fewer than num elements, push nils to fill.
|
|
|
|
if it has more than num elements, the excess elements are dropped.
|
|
|
|
unless TOS is an array, push num - 1 nils.
|
|
|
|
if flag is non-zero, push an array of the remaining elements.
|
|
|
|
flag: 0x01 - rest args array
|
|
|
|
flag: 0x02 - for postarg
|
|
|
|
flag: 0x04 - reverse?
|
2007-01-16 03:52:22 -05:00
|
|
|
*/
|
|
|
|
DEFINE_INSN
|
|
|
|
expandarray
|
2007-05-03 05:09:14 -04:00
|
|
|
(rb_num_t num, rb_num_t flag)
|
2007-01-16 03:52:22 -05:00
|
|
|
(..., VALUE ary)
|
2018-01-12 03:38:07 -05:00
|
|
|
(...)
|
2018-09-11 05:48:58 -04:00
|
|
|
// attr bool leaf = false; /* has rb_check_array_type() */
|
2018-11-07 03:03:10 -05:00
|
|
|
// attr rb_snum_t sp_inc = (rb_snum_t)num - 1 + (flag & 1 ? 1 : 0);
|
2007-01-16 03:52:22 -05:00
|
|
|
{
|
2018-07-19 09:25:22 -04:00
|
|
|
vm_expandarray(GET_SP(), ary, num, (int)flag);
|
2007-01-16 03:52:22 -05:00
|
|
|
}
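The padding and truncation rules in the `expandarray` comment, together with its `sp_inc` attribute, can be sketched in plain C. This is an illustration under stated assumptions: `NIL_SENTINEL`, `expand_to_num`, and `expandarray_sp_inc` are hypothetical names, integers stand in for `VALUE`, and the order in which the real `vm_expandarray` writes values onto the stack is not modelled.

```c
#include <assert.h>
#include <stddef.h>

#define NIL_SENTINEL (-1)  /* hypothetical stand-in for Ruby's nil */

/* Net stack-pointer change, transcribed from the sp_inc attribute:
 * one array is popped, num values are pushed, and one more value is
 * pushed when the 0x01 (rest args array) bit of flag is set. */
static long expandarray_sp_inc(unsigned long num, unsigned long flag)
{
    return (long)num - 1 + ((flag & 1) ? 1 : 0);
}

/* Fill out[0..num) from ary[0..len): pad with NIL_SENTINEL when the
 * array is too short, drop the excess when it is too long. */
static void expand_to_num(const int *ary, size_t len, size_t num, int *out)
{
    assert(out != NULL || num == 0);
    for (size_t i = 0; i < num; i++) {
        out[i] = (i < len) ? ary[i] : NIL_SENTINEL;
    }
}
```

For a three-element array and `num` of 5, the last two output slots hold the sentinel; for `num` of 2, the third element is dropped.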
|
|
|
|
|
2018-01-12 03:38:07 -05:00
|
|
|
/* concat two arrays */
|
2007-01-16 03:52:22 -05:00
|
|
|
DEFINE_INSN
|
|
|
|
concatarray
|
|
|
|
()
|
split insns.def into functions
Contemporary C compilers are good at function inlining. They fold
multiple functions into one. However they are not yet smart enough to
unfold a function into several ones. So generally speaking, it is
wiser for a C programmer to manually split C functions whenever
possible. That should make room for compilers to optimize at will.
Before this changeset insns.def was converted into a single HUGE
function called vm_exec_core(). By moving each instruction's core
into individual functions, generated C source code is reduced from
3,428 lines to 2,847 lines. Looking at the generated assembly
however, it seems my compiler (gcc 6.2) is extraordinarily smart so that
it inlines almost all functions I introduced in this changeset back
into that vm_exec_core. On my machine compiled machine binary of the
function does not shrink very much in size (28,432 bytes to 26,816
bytes, according to nm(1)).
I believe this change is zero-cost. Several benchmarks I exercised
showed no significant difference beyond error margin. For instance
3 repeated runs of optcarrot benchmark on my machine resulted in:
before this: 28.330329285707490, 27.513378371065920, 29.40420215754537
after this: 27.107195867280414, 25.549324021385907, 30.31581919050884
in fps (greater==faster).
----
* internal.h (rb_obj_not_equal): used from vm_insnhelper.c
* insns.def: move vast majority of lines into vm_insnhelper.c
* vm_insnhelper.c: moved here.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58390 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-04-18 06:58:49 -04:00
|
|
|
(VALUE ary1, VALUE ary2)
|
2007-01-16 03:52:22 -05:00
|
|
|
(VALUE ary)
|
2018-09-11 05:48:58 -04:00
|
|
|
// attr bool leaf = false; /* has rb_check_array_type() */
|
2007-01-16 03:52:22 -05:00
|
|
|
{
|
2017-04-18 06:58:49 -04:00
|
|
|
ary = vm_concat_array(ary1, ary2);
|
2007-01-16 03:52:22 -05:00
|
|
|
}
|
|
|
|
|
2018-01-12 03:38:07 -05:00
|
|
|
/* call to_a on array ary to splat */
|
2007-01-16 03:52:22 -05:00
|
|
|
DEFINE_INSN
|
|
|
|
splatarray
|
|
|
|
(VALUE flag)
|
|
|
|
(VALUE ary)
|
|
|
|
(VALUE obj)
|
2018-09-11 05:48:58 -04:00
|
|
|
// attr bool leaf = false; /* has rb_check_array_type() */
|
2007-01-16 03:52:22 -05:00
|
|
|
{
|
2017-04-18 06:58:49 -04:00
|
|
|
obj = vm_splat_array(flag, ary);
|
2007-01-16 03:52:22 -05:00
|
|
|
}
|
|
|
|
|
2018-01-12 03:38:07 -05:00
|
|
|
/* put new Hash from n elements. n must be an even number. */
|
2007-01-16 03:52:22 -05:00
|
|
|
DEFINE_INSN
|
|
|
|
newhash
|
2007-05-03 05:09:14 -04:00
|
|
|
(rb_num_t num)
|
2007-01-16 03:52:22 -05:00
|
|
|
(...)
|
2018-01-12 03:38:07 -05:00
|
|
|
(VALUE val)
|
2018-09-11 05:48:58 -04:00
|
|
|
// attr bool leaf = false; /* has rb_hash_key_str() */
|
2018-11-07 02:16:50 -05:00
|
|
|
// attr rb_snum_t sp_inc = 1 - (rb_snum_t)num;
|
2007-01-16 03:52:22 -05:00
|
|
|
{
|
2015-10-29 01:32:57 -04:00
|
|
|
RUBY_DTRACE_CREATE_HOOK(HASH, num);
|
* probes.d: add DTrace probe declarations. [ruby-core:27448]
* array.c (empty_ary_alloc, ary_new): added array create DTrace probe.
* compile.c (rb_insns_name): allowing DTrace probes to access
instruction sequence name.
* Makefile.in: translate probes.d file to appropriate header file.
* common.mk: declare dependencies on the DTrace header.
* configure.in: add a test for existence of DTrace.
* eval.c (setup_exception): add a probe for when an exception is
raised.
* gc.c: Add DTrace probes for mark begin and end, and sweep begin and
end.
* hash.c (empty_hash_alloc): Add a probe for hash allocation.
* insns.def: Add probes for function entry and return.
* internal.h: function declaration for compile.c change.
* load.c (rb_f_load): add probes for `load` entry and exit, require
entry and exit, and wrapping search_required for load path search.
* object.c (rb_obj_alloc): added a probe for general object creation.
* parse.y (yycompile0): added a probe around parse and compile phase.
* string.c (empty_str_alloc, str_new): DTrace probes for string
allocation.
* test/dtrace/*: tests for DTrace probes.
* vm.c (vm_invoke_proc): add probes for function return on exception
raise, hash create, and instruction sequence execution.
* vm_core.h: add probe declarations for function entry and exit.
* vm_dump.c: add probes header file.
* vm_eval.c (vm_call0_cfunc, vm_call0_cfunc_with_frame): add probe on
function entry and return.
* vm_exec.c: expose instruction number to instruction name function.
* vm_insnshelper.c: add function entry and exit probes for cfunc
methods.
* vm_insnhelper.h: vm usage information is always collected, so
uncomment the functions.
12 19:14:50 2012 Akinori MUSHA <knu@iDaemons.org>
* configure.in (isinf, isnan): isinf() and isnan() are macros on
DragonFly which cannot be found by AC_REPLACE_FUNCS(). This
workaround enforces the fact that they exist on DragonFly.
12 15:59:38 2012 Shugo Maeda <shugo@ruby-lang.org>
* vm_core.h (rb_call_info_t::refinements), compile.c (new_callinfo),
vm_insnhelper.c (vm_search_method): revert r37616 because it's too
slow. [ruby-dev:46477]
* test/ruby/test_refinement.rb (test_inline_method_cache): skip
the test until the bug is fixed efficiently.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@37631 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2012-11-12 16:52:12 -05:00
|
|
|
|
2017-09-05 00:48:19 -04:00
|
|
|
val = rb_hash_new_with_size(num / 2);
|
2017-04-23 21:40:51 -04:00
|
|
|
|
2017-04-27 00:21:04 -04:00
|
|
|
if (num) {
|
|
|
|
rb_hash_bulk_insert(num, STACK_ADDR_FROM_TOP(num), val);
|
2007-01-16 03:52:22 -05:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2018-01-12 03:38:07 -05:00
|
|
|
/* put new Range object (Range.new(low, high, flag)). */
|
2007-01-16 03:52:22 -05:00
|
|
|
DEFINE_INSN
|
|
|
|
newrange
|
2007-05-03 05:09:14 -04:00
|
|
|
(rb_num_t flag)
|
2007-01-16 03:52:22 -05:00
|
|
|
(VALUE low, VALUE high)
|
|
|
|
(VALUE val)
|
2018-09-11 05:48:58 -04:00
|
|
|
/* rb_range_new() exercises "bad value for range" check. */
|
|
|
|
// attr bool leaf = false; /* see also: range.c:range_init() */
|
2007-01-16 03:52:22 -05:00
|
|
|
{
|
2009-06-30 03:46:44 -04:00
|
|
|
val = rb_range_new(low, high, (int)flag);
|
2007-01-16 03:52:22 -05:00
|
|
|
}
|
|
|
|
|
|
|
|
/**********************************************************/
|
|
|
|
/* deal with stack operation */
|
|
|
|
/**********************************************************/
|
|
|
|
|
2018-01-12 03:38:07 -05:00
|
|
|
/* pop from stack. */
|
2007-01-16 03:52:22 -05:00
|
|
|
DEFINE_INSN
|
|
|
|
pop
|
|
|
|
()
|
|
|
|
(VALUE val)
|
|
|
|
()
|
|
|
|
{
|
2011-11-27 03:24:19 -05:00
|
|
|
(void)val;
|
2007-01-16 03:52:22 -05:00
|
|
|
/* none */
|
|
|
|
}
|
|
|
|
|
2018-01-12 03:38:07 -05:00
|
|
|
/* duplicate stack top. */
|
2007-01-16 03:52:22 -05:00
|
|
|
DEFINE_INSN
|
|
|
|
dup
|
|
|
|
()
|
|
|
|
(VALUE val)
|
|
|
|
(VALUE val1, VALUE val2)
|
|
|
|
{
|
|
|
|
val1 = val2 = val;
|
|
|
|
}
|
|
|
|
|
2018-01-12 03:38:07 -05:00
|
|
|
/* duplicate stack top n elements */
|
2007-01-16 03:52:22 -05:00
|
|
|
DEFINE_INSN
|
|
|
|
dupn
|
2007-05-03 05:09:14 -04:00
|
|
|
(rb_num_t n)
|
2007-01-16 03:52:22 -05:00
|
|
|
(...)
|
2018-01-12 03:38:07 -05:00
|
|
|
(...)
|
|
|
|
// attr rb_snum_t sp_inc = n;
|
2007-01-16 03:52:22 -05:00
|
|
|
{
|
2017-04-18 06:58:49 -04:00
|
|
|
void *dst = GET_SP();
|
|
|
|
void *src = STACK_ADDR_FROM_TOP(n);
|
|
|
|
|
|
|
|
MEMCPY(dst, src, VALUE, n);
|
2007-01-16 03:52:22 -05:00
|
|
|
}
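The `dupn` body copies the top `n` slots to the space just above them, and the `sp_inc = n` attribute then moves the stack pointer past the copies. A minimal sketch on a toy array stack follows; `toy_stack`, `toy_push`, and `toy_dupn` are hypothetical names, and the mapping of `GET_SP()` and `STACK_ADDR_FROM_TOP(n)` to array indices is an assumption about those macros.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { STACK_MAX = 16 };

typedef struct {
    long slots[STACK_MAX];
    size_t sp;            /* index of the next free slot */
} toy_stack;

static void toy_push(toy_stack *st, long v)
{
    assert(st->sp < STACK_MAX);
    st->slots[st->sp++] = v;
}

/* Duplicate the top n slots, keeping their order. */
static void toy_dupn(toy_stack *st, size_t n)
{
    assert(n <= st->sp && st->sp + n <= STACK_MAX);
    long *dst = &st->slots[st->sp];       /* assumed analogue of GET_SP() */
    long *src = &st->slots[st->sp - n];   /* assumed analogue of STACK_ADDR_FROM_TOP(n) */
    memcpy(dst, src, n * sizeof(long));   /* source and destination never overlap */
    st->sp += n;                          /* mirrors sp_inc = n */
}
```

With 1, 2, 3 on the stack, `toy_dupn(&st, 2)` leaves 1, 2, 3, 2, 3.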
|
|
|
|
|
2018-01-12 03:38:07 -05:00
|
|
|
/* swap top 2 vals */
|
2007-01-16 03:52:22 -05:00
|
|
|
DEFINE_INSN
|
|
|
|
swap
|
|
|
|
()
|
|
|
|
(VALUE val, VALUE obj)
|
|
|
|
(VALUE obj, VALUE val)
|
|
|
|
{
|
|
|
|
/* none */
|
|
|
|
}
|
|
|
|
|
2018-01-12 03:38:07 -05:00
|
|
|
/* reverse the order of the top N stack elements. */
|
2015-02-24 19:20:39 -05:00
|
|
|
DEFINE_INSN
|
|
|
|
reverse
|
|
|
|
(rb_num_t n)
|
|
|
|
(...)
|
2018-01-12 03:38:07 -05:00
|
|
|
(...)
|
|
|
|
// attr rb_snum_t sp_inc = 0;
|
2015-02-24 19:20:39 -05:00
|
|
|
{
|
|
|
|
rb_num_t i;
|
|
|
|
VALUE *sp = STACK_ADDR_FROM_TOP(n);
|
|
|
|
|
|
|
|
for (i=0; i<n/2; i++) {
|
|
|
|
VALUE v0 = sp[i];
|
|
|
|
VALUE v1 = TOPN(i);
|
|
|
|
sp[i] = v1;
|
|
|
|
TOPN(i) = v0;
|
|
|
|
}
|
|
|
|
}
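The swap loop in `reverse` walks inward from both ends of the top-`n` window. The same loop on a plain array is sketched below, assuming `sp` points one past the topmost slot and that `TOPN(i)` addresses `*(sp - i - 1)`; `reverse_top_n` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Reverse the top n slots in place. sp points one past the topmost
 * slot, which is how this sketch models the VM's stack pointer. */
static void reverse_top_n(long *sp, size_t n)
{
    assert(sp != NULL);
    long *base = sp - n;                 /* assumed analogue of STACK_ADDR_FROM_TOP(n) */
    for (size_t i = 0; i < n / 2; i++) {
        long v0 = base[i];               /* i-th slot from the bottom of the window */
        long v1 = *(sp - i - 1);         /* assumed analogue of TOPN(i) */
        base[i] = v1;
        *(sp - i - 1) = v0;
    }
}
```

Because only `n / 2` swaps are made, the middle element of an odd-sized window stays where it is, and slots below the window are never touched.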
|
|
|
|
|
2018-01-12 03:38:07 -05:00
|
|
|
/* for stack caching. */
|
2007-01-16 03:52:22 -05:00
|
|
|
DEFINE_INSN
|
|
|
|
reput
|
|
|
|
()
|
|
|
|
(..., VALUE val)
|
2018-01-12 03:38:07 -05:00
|
|
|
(VALUE val)
|
|
|
|
// attr rb_snum_t sp_inc = 0;
|
2007-01-16 03:52:22 -05:00
|
|
|
{
|
|
|
|
/* none */
|
|
|
|
}
|
|
|
|
|
2018-01-12 03:38:07 -05:00
|
|
|
/* get nth stack value from stack top */
|
2007-01-16 03:52:22 -05:00
|
|
|
DEFINE_INSN
|
|
|
|
topn
|
2007-05-03 05:09:14 -04:00
|
|
|
(rb_num_t n)
|
2007-01-16 03:52:22 -05:00
|
|
|
(...)
|
2018-01-12 03:38:07 -05:00
|
|
|
(VALUE val)
|
|
|
|
// attr rb_snum_t sp_inc = 1;
|
2007-01-16 03:52:22 -05:00
|
|
|
{
|
|
|
|
val = TOPN(n);
|
|
|
|
}
|
|
|
|
|
2018-01-12 03:38:07 -05:00
|
|
|
/* set Nth stack entry to stack top */
|
2007-01-16 03:52:22 -05:00
|
|
|
DEFINE_INSN
|
|
|
|
setn
|
2007-05-03 05:09:14 -04:00
|
|
|
(rb_num_t n)
|
2007-01-16 03:52:22 -05:00
|
|
|
(..., VALUE val)
|
2018-01-12 03:38:07 -05:00
|
|
|
(VALUE val)
|
|
|
|
// attr rb_snum_t sp_inc = 0;
|
2007-01-16 03:52:22 -05:00
|
|
|
{
|
2018-07-19 09:25:22 -04:00
|
|
|
TOPN(n) = val;
|
2007-01-16 03:52:22 -05:00
|
|
|
}
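`topn` and `setn` both address a slot counted down from the top of the stack: `topn` pushes a copy of that slot, while `setn` overwrites it with the current top. A sketch under the assumption that `TOPN(n)` means `*(sp - n - 1)`, with `sp` pointing one past the top, follows; `slot_from_top`, `toy_topn`, and `toy_setn` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Address of the slot n below the top; n == 0 is the top itself.
 * sp points one past the topmost slot. */
static long *slot_from_top(long *sp, size_t n)
{
    assert(sp != NULL);
    return sp - n - 1;
}

/* topn: push a copy of the n-th slot and return the new stack pointer.
 * The caller must guarantee room for one more slot. */
static long *toy_topn(long *sp, size_t n)
{
    long val = *slot_from_top(sp, n);
    *sp = val;
    return sp + 1;                        /* mirrors sp_inc = 1 */
}

/* setn: overwrite the n-th slot with the current top.
 * The stack depth is unchanged, mirroring sp_inc = 0. */
static void toy_setn(long *sp, size_t n)
{
    long val = *slot_from_top(sp, 0);
    *slot_from_top(sp, n) = val;
}
```

Starting from 10, 20, 30, `toy_topn(sp, 2)` yields 10, 20, 30, 10; a following `toy_setn(sp, 2)` then writes the top value into the slot that held 20.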
|
|
|
|
|
2018-01-12 03:38:07 -05:00
|
|
|
/* empty current stack */
|
2007-01-16 03:52:22 -05:00
|
|
|
DEFINE_INSN
|
2008-01-25 13:02:01 -05:00
|
|
|
adjuststack
|
|
|
|
(rb_num_t n)
|
2007-01-16 03:52:22 -05:00
|
|
|
(...)
|
2018-01-12 03:38:07 -05:00
|
|
|
(...)
|
2018-01-12 08:25:03 -05:00
|
|
|
// attr rb_snum_t sp_inc = -(rb_snum_t)n;
|
2007-01-16 03:52:22 -05:00
|
|
|
{
|
2018-07-19 09:25:22 -04:00
|
|
|
/* none */
|
2007-01-16 03:52:22 -05:00
|
|
|
}
|
|
|
|
|
|
|
|
/**********************************************************/
|
|
|
|
/* deal with setting */
|
|
|
|
/**********************************************************/
|
|
|
|
|
2018-01-12 03:38:07 -05:00
|
|
|
/* defined? */
|
2007-01-16 03:52:22 -05:00
|
|
|
DEFINE_INSN
|
|
|
|
defined
|
2010-10-30 21:42:54 -04:00
|
|
|
(rb_num_t op_type, VALUE obj, VALUE needstr)
|
2007-01-16 03:52:22 -05:00
|
|
|
(VALUE v)
|
|
|
|
(VALUE val)
|
2018-09-11 05:48:58 -04:00
|
|
|
// attr bool leaf = leafness_of_defined(op_type);
|
2007-01-16 03:52:22 -05:00
|
|
|
{
|
2017-10-27 02:21:50 -04:00
|
|
|
val = vm_defined(ec, GET_CFP(), op_type, obj, needstr, v);
|
2007-01-16 03:52:22 -05:00
|
|
|
}
|
|
|
|
|
2018-01-12 03:38:07 -05:00
|
|
|
/* check `target' matches `pattern'.
|
2012-08-08 03:52:19 -04:00
|
|
|
`flag & VM_CHECKMATCH_TYPE_MASK' describes how to check pattern.
|
|
|
|
VM_CHECKMATCH_TYPE_WHEN: ignore target and check pattern is truthy.
|
|
|
|
VM_CHECKMATCH_TYPE_CASE: check `pattern === target'.
|
2018-07-15 05:48:09 -04:00
|
|
|
VM_CHECKMATCH_TYPE_RESCUE: check `pattern.kind_of?(Module) && pattern === target'.
|
2012-08-08 03:52:19 -04:00
|
|
|
if `flag & VM_CHECKMATCH_ARRAY' is not 0, then `pattern' is an array of patterns.
|
|
|
|
*/
|
|
|
|
DEFINE_INSN
|
|
|
|
checkmatch
|
|
|
|
(rb_num_t flag)
|
|
|
|
(VALUE target, VALUE pattern)
|
|
|
|
(VALUE result)
|
2018-09-11 05:48:58 -04:00
|
|
|
// attr bool leaf = leafness_of_checkmatch(flag);
|
2012-08-08 03:52:19 -04:00
|
|
|
{
|
2017-11-16 01:10:31 -05:00
|
|
|
result = vm_check_match(ec, target, pattern, flag);
|
2012-08-08 03:52:19 -04:00
|
|
|
}
|
|
|
|
|
2018-01-12 03:38:07 -05:00
|
|
|
/* check whether keywords are specified or not. */
|
* rewrite method/block parameter fitting logic to optimize
keyword arguments/parameters and a splat argument.
[Feature #10440] (Details are described in this ticket)
Most of complex part is moved to vm_args.c.
Now, ISeq#to_a does not catch up new instruction format.
* vm_core.h: change iseq data structures.
* introduce rb_call_info_kw_arg_t to represent keyword arguments.
* add rb_call_info_t::kw_arg.
* rename rb_iseq_t::arg_post_len to rb_iseq_t::arg_post_num.
* rename rb_iseq_t::arg_keywords to arg_keyword_num.
* rename rb_iseq_t::arg_keyword to rb_iseq_t::arg_keyword_bits.
to represent keyword bitmap parameter index.
This bitmap parameter shows which keyword parameters are given
or not given (0 for given).
It is referred to by the `checkkeyword' instruction described below.
* rename rb_iseq_t::arg_keyword_check to rb_iseq_t::arg_keyword_rest
to represent keyword rest parameter index.
* add rb_iseq_t::arg_keyword_default_values to represent default
keyword values.
* rename VM_CALL_ARGS_SKIP_SETUP to VM_CALL_ARGS_SIMPLE
to represent
(ci->flag & (SPLAT|BLOCKARG)) &&
ci->blockiseq == NULL &&
ci->kw_arg == NULL.
* vm_insnhelper.c, vm_args.c: rewrite with refactoring.
* rewrite splat argument code.
* rewrite keyword arguments/parameters code.
* merge method and block parameter fitting code into one code base.
* vm.c, vm_eval.c: catch up these changes.
* compile.c (new_callinfo): callinfo requires kw_arg parameter.
* compile.c (compile_array_): check the last argument Hash object or
not. If Hash object and all keys are Symbol literals, they are
compiled to keyword arguments.
* insns.def (checkkeyword): add new instruction.
This instruction checks the availability of the corresponding keyword.
For example, a method "def foo k1: 'v1'; end" is compiled to the
following instructions.
0000 checkkeyword 2, 0 # check k1 is given.
0003 branchif 9 # if given, jump to address #9
0005 putstring "v1"
0007 setlocal_OP__WC__0 3 # k1 = 'v1'
0009 trace 8
0011 putnil
0012 trace 16
0014 leave
* insns.def (opt_send_simple): removed and add new instruction
"opt_send_without_block".
* parse.y (new_args_tail_gen): reorder variables.
Before this patch, a method "def foo(k1: 1, kr1:, k2: 2, **krest, &b)"
has parameter variables "k1, kr1, k2, &b, internal_id, krest",
but this patch reorders to "kr1, k1, k2, internal_id, krest, &b".
(locate a block variable at last)
* parse.y (vtable_pop): added.
This function remove latest `n' variables from vtable.
* iseq.c: catch up iseq data changes.
* proc.c: ditto.
* class.c (keyword_error): export as rb_keyword_error().
* common.mk: depend vm_args.c for vm.o.
* hash.c (rb_hash_has_key): export.
* internal.h: ditto.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@48239 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2014-11-02 13:02:55 -05:00
|
|
|
DEFINE_INSN
|
|
|
|
checkkeyword
|
2017-12-22 19:51:36 -05:00
|
|
|
(lindex_t kw_bits_index, lindex_t keyword_index)
|
2014-11-02 13:02:55 -05:00
|
|
|
()
|
|
|
|
(VALUE ret)
|
|
|
|
{
|
2017-04-18 06:58:49 -04:00
|
|
|
ret = vm_check_keyword(kw_bits_index, keyword_index, GET_EP());
|
2014-11-02 13:02:55 -05:00
|
|
|
}
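`checkkeyword` reports whether one keyword was supplied by consulting a bitmap in which, per the commit message, a 0 bit means "given". The bit test alone can be sketched as below. This is an assumption-laden illustration: `keyword_given` is a hypothetical name, the bitmap is a plain `unsigned int`, and the real `vm_check_keyword` helper (not shown in this file) reads its bitmap from the environment pointer and may handle further cases.

```c
#include <assert.h>
#include <stdbool.h>

/* Returns true when the keyword at keyword_index was supplied.
 * Assumption: a set bit marks a keyword that was NOT given, so the
 * caller's default-value code must run for it. */
static bool keyword_given(unsigned int kw_bits, unsigned int keyword_index)
{
    assert(keyword_index < sizeof(kw_bits) * 8);
    return (kw_bits & (1u << keyword_index)) == 0;
}
```

In the disassembly quoted in the commit message, a true result is what lets `branchif` skip the default-value assignment for `k1`.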
|
|
|
|
|
2018-04-21 06:52:52 -04:00
|
|
|
/* check if val is type. */
|
|
|
|
DEFINE_INSN
|
|
|
|
checktype
|
|
|
|
(rb_num_t type)
|
|
|
|
(VALUE val)
|
|
|
|
(VALUE ret)
|
|
|
|
{
|
|
|
|
ret = (TYPE(val) == (int)type) ? Qtrue : Qfalse;
|
|
|
|
}
|
|
|
|
|
2018-12-31 10:00:37 -05:00
|
|
|
/* get method reference. */
|
|
|
|
DEFINE_INSN
|
|
|
|
methodref
|
|
|
|
(ID id)
|
|
|
|
(VALUE val)
|
|
|
|
(VALUE ret)
|
|
|
|
{
|
|
|
|
ret = rb_obj_method(val, ID2SYM(id));
|
|
|
|
}
|
|
|
|
|
2007-01-16 03:52:22 -05:00
|
|
|
/**********************************************************/
|
|
|
|
/* deal with control flow 1: class/module */
|
|
|
|
/**********************************************************/
|
|
|
|
|
2018-01-12 03:38:07 -05:00
|
|
|
/* enter class definition scope. if super is Qfalse, and class
|
2007-01-16 03:52:22 -05:00
|
|
|
"klass" is defined, it is redefined. otherwise, define "klass" class.
|
|
|
|
*/
|
|
|
|
DEFINE_INSN
|
|
|
|
defineclass
|
2012-12-20 03:13:53 -05:00
|
|
|
(ID id, ISEQ class_iseq, rb_num_t flags)
|
2007-01-16 03:52:22 -05:00
|
|
|
(VALUE cbase, VALUE super)
|
|
|
|
(VALUE val)
|
|
|
|
{
|
split insns.def into functions
Contemporary C compilers are good at function inlining. They fold
multiple functions into one. However they are not yet smart enough to
unfold a function into several ones. So generally speaking, it is
wiser for a C programmer to manually split C functions whenever
possible. That should make room for compilers to optimize at will.
Before this changeset insns.def was converted into single HUGE
function called vm_exec_core(). By moving each instruction's core
into individual functions, generated C source code is reduced from
3,428 lines to 2,847 lines. Looking at the generated assembly
however, it seems my compiler (gcc 6.2) is extraordinarily smart so that
it inlines almost all functions I introduced in this changeset back
into that vm_exec_core. On my machine compiled machine binary of the
function does not shrink very much in size (28,432 bytes to 26,816
bytes, according to nm(1)).
I believe this change is zero-cost. Several benchmarks I exercised
showed no significant difference beyond error margin. For instance
3 repeated runs of optcarrot benchmark on my machine resulted in:
before this: 28.330329285707490, 27.513378371065920, 29.40420215754537
after this: 27.107195867280414, 25.549324021385907, 30.31581919050884
in fps (greater==faster).
----
* internal.h (rb_obj_not_equal): used from vm_insnhelper.c
* insns.def: move vast majority of lines into vm_insnhelper.c
* vm_insnhelper.c: moved here.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58390 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-04-18 06:58:49 -04:00
|
|
|
VALUE klass = vm_find_or_create_class_by_id(id, flags, cbase, super);
|
2007-08-12 15:09:15 -04:00
|
|
|
|
2015-12-08 08:58:50 -05:00
|
|
|
rb_iseq_check(class_iseq);
|
|
|
|
|
2007-01-16 03:52:22 -05:00
|
|
|
/* enter scope */
|
2017-10-27 02:21:50 -04:00
|
|
|
vm_push_frame(ec, class_iseq, VM_FRAME_MAGIC_CLASS | VM_ENV_FLAG_LOCAL, klass,
|
2016-07-28 07:02:30 -04:00
|
|
|
GET_BLOCK_HANDLER(),
|
2017-10-27 02:21:50 -04:00
|
|
|
(VALUE)vm_cref_push(ec, klass, NULL, FALSE),
|
2015-07-21 18:52:59 -04:00
|
|
|
class_iseq->body->iseq_encoded, GET_SP(),
|
2016-07-28 07:02:30 -04:00
|
|
|
class_iseq->body->local_table_size,
|
2015-12-08 08:58:50 -05:00
|
|
|
class_iseq->body->stack_max);
|
mjit_compile.c: use local variables for stack
if catch_except_p is FALSE. If catch_except_p is TRUE, stack values
should be on VM's stack when exception is thrown and the JIT-ed frame
is re-executed by VM's exception handler. If it's FALSE, the JIT-ed
frame won't be re-executed and doesn't need to keep values on VM's stack.
Using local variables allows us to reduce cfp->sp motion. Moving cfp->sp
is needed only for insns whose handles_frame? is false. So it improves
performance.
_mjit_compile_insn.erb: Prepare `stack_size` variable for GET_SP,
STACK_ADDR_FROM_TOP, TOPN macros. Share pc and sp motion partial view.
Use cancel handler created in mjit_compile.c.
_mjit_compile_send.erb: ditto. Also, when iseq->body->catch_except_p is
TRUE, this stops calling mjit_exec directly. I described the reason in
vm_insnhelper.h's comment for EXEC_EC_CFP.
_mjit_compile_pc_and_sp.erb: Shared logic for moving sp and pc. As you
can see from this file, when status->local_stack_p is TRUE and
insn.handles_frame? is false, moving sp is skipped. But if
insn.handles_frame? is true, values should be rolled back to VM's stack.
common.mk: add dependency for the file
_mjit_compile_insn_body.erb: Set sp value before canceling JIT on
DISPATCH_ORIGINAL_INSN. Replace GET_SP, STACK_ADDR_FROM_TOP, TOPN macros
for the case local_stack_p is TRUE and insn.handles_frame? is false.
In that case, values are not available on VM's stack and those macros
should be replaced.
mjit_compile.inc.erb: updated comments of macros which are supported by
JIT compiler. All references to `cfp->sp` should be replaced and thus
INC_SP, SET_SV, PUSH are no longer supported for now, because they are
not used now.
vm_exec.h: moved EXEC_EC_CFP definition to vm_insnhelper.h because it's
tightly coupled to CALL_METHOD.
vm_insnhelper.h: Have revised EXEC_EC_CFP definition moved from vm_exec.h.
Now it triggers mjit_exec for VM, and has the guard for catch_except_p
on JIT-ed code. See comments for details. CALL_METHOD delegates
triggering mjit_exec to EXEC_EC_CFP.
insns.def: Stopped using EXEC_EC_CFP for the case we don't want to
trigger mjit_exec. Those insns (defineclass, opt_call_c_function) are
not supported by JIT and it's safe to use RESTORE_REGS(), NEXT_INSN().
expandarray is changed to pass GET_SP() to replace the macro in
_mjit_compile_insn_body.erb.
vm_insnhelper.c: change to take sp for the above reason.
[close https://github.com/ruby/ruby/pull/1828]
This patch resurrects the performance which was attached in
[Feature #14235].
* Benchmark
Optcarrot (with configuration for benchmark_driver.gem)
https://github.com/benchmark-driver/optcarrot
$ benchmark-driver benchmark.yml --verbose 1 --rbenv 'before;before+JIT::before,--jit;after;after+JIT::after,--jit' --repeat-count 10
before: ruby 2.6.0dev (2018-03-04 trunk 62652) [x86_64-linux]
before+JIT: ruby 2.6.0dev (2018-03-04 trunk 62652) +JIT [x86_64-linux]
after: ruby 2.6.0dev (2018-03-04 local-variable.. 62652) [x86_64-linux]
last_commit=mjit_compile.c: use local variables for stack
after+JIT: ruby 2.6.0dev (2018-03-04 local-variable.. 62652) +JIT [x86_64-linux]
last_commit=mjit_compile.c: use local variables for stack
Calculating -------------------------------------
before before+JIT after after+JIT
optcarrot 53.552 59.680 53.697 63.358 fps
Comparison:
optcarrot
after+JIT: 63.4 fps
before+JIT: 59.7 fps - 1.06x slower
after: 53.7 fps - 1.18x slower
before: 53.6 fps - 1.18x slower
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@62655 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-03-04 02:04:40 -05:00
|
|
|
RESTORE_REGS();
|
|
|
|
NEXT_INSN();
|
2007-01-16 03:52:22 -05:00
|
|
|
}
|
|
|
|
|
|
|
|
/**********************************************************/
|
|
|
|
/* deal with control flow 2: method/iterator */
|
|
|
|
/**********************************************************/
|
|
|
|
|
2018-01-12 03:38:07 -05:00
|
|
|
/* invoke method. */
|
2007-01-16 03:52:22 -05:00
|
|
|
DEFINE_INSN
|
|
|
|
send
|
2015-09-19 13:59:58 -04:00
|
|
|
(CALL_INFO ci, CALL_CACHE cc, ISEQ blockiseq)
|
2007-01-16 03:52:22 -05:00
|
|
|
(...)
|
2018-01-12 03:38:07 -05:00
|
|
|
(VALUE val)
|
2018-12-25 19:58:26 -05:00
|
|
|
// attr rb_snum_t sp_inc = sp_inc_of_sendish(ci);
|
2007-01-16 03:52:22 -05:00
|
|
|
{
|
2018-12-25 19:59:37 -05:00
|
|
|
VALUE bh = vm_caller_setup_arg_block(ec, GET_CFP(), ci, blockiseq, false);
|
|
|
|
val = vm_sendish(ec, GET_CFP(), ci, cc, bh, vm_search_method_wrap);
|
2015-10-01 06:50:49 -04:00
|
|
|
|
2018-12-25 19:59:37 -05:00
|
|
|
if (val == Qundef) {
|
|
|
|
RESTORE_REGS();
|
|
|
|
NEXT_INSN();
|
|
|
|
}
|
2007-01-16 03:52:22 -05:00
|
|
|
}
|
|
|
|
|
2018-09-26 04:11:05 -04:00
|
|
|
/* Invoke method without block */
|
|
|
|
DEFINE_INSN
|
|
|
|
opt_send_without_block
|
|
|
|
(CALL_INFO ci, CALL_CACHE cc)
|
|
|
|
(...)
|
|
|
|
(VALUE val)
|
|
|
|
// attr bool handles_sp = true;
|
2018-12-25 19:58:26 -05:00
|
|
|
// attr rb_snum_t sp_inc = sp_inc_of_sendish(ci);
|
2018-09-26 04:11:05 -04:00
|
|
|
{
|
2018-12-25 19:59:37 -05:00
|
|
|
VALUE bh = VM_BLOCK_HANDLER_NONE;
|
|
|
|
val = vm_sendish(ec, GET_CFP(), ci, cc, bh, vm_search_method_wrap);
|
|
|
|
|
|
|
|
if (val == Qundef) {
|
|
|
|
RESTORE_REGS();
|
|
|
|
NEXT_INSN();
|
|
|
|
}
|
2018-09-26 04:11:05 -04:00
|
|
|
}
|
|
|
|
|
2013-11-09 16:17:06 -05:00
|
|
|
DEFINE_INSN
|
|
|
|
opt_str_freeze
|
2018-09-11 23:39:36 -04:00
|
|
|
(VALUE str, CALL_INFO ci, CALL_CACHE cc)
|
2013-11-09 16:17:06 -05:00
|
|
|
()
|
|
|
|
(VALUE val)
|
|
|
|
{
|
2018-06-26 21:10:02 -04:00
|
|
|
val = vm_opt_str_freeze(str, BOP_FREEZE, idFreeze);
|
2018-09-11 23:39:36 -04:00
|
|
|
|
|
|
|
if (val == Qundef) {
|
2018-09-12 00:04:31 -04:00
|
|
|
PUSH(rb_str_resurrect(str));
|
2018-09-14 03:44:44 -04:00
|
|
|
CALL_SIMPLE_METHOD();
|
2018-09-11 23:39:36 -04:00
|
|
|
}
|
2013-11-09 16:17:06 -05:00
|
|
|
}
|
|
|
|
|
2017-03-27 02:12:37 -04:00
|
|
|
DEFINE_INSN
|
|
|
|
opt_str_uminus
|
2018-09-11 23:39:36 -04:00
|
|
|
(VALUE str, CALL_INFO ci, CALL_CACHE cc)
|
2017-03-27 02:12:37 -04:00
|
|
|
()
|
|
|
|
(VALUE val)
|
|
|
|
{
|
2018-06-26 21:10:02 -04:00
|
|
|
val = vm_opt_str_freeze(str, BOP_UMINUS, idUMinus);
|
2018-09-11 23:39:36 -04:00
|
|
|
|
|
|
|
if (val == Qundef) {
|
2018-09-12 00:04:31 -04:00
|
|
|
PUSH(rb_str_resurrect(str));
|
2018-09-14 03:44:44 -04:00
|
|
|
CALL_SIMPLE_METHOD();
|
2018-09-11 23:39:36 -04:00
|
|
|
}
|
2017-03-27 02:12:37 -04:00
|
|
|
}
|
|
|
|
|
2016-03-17 08:47:31 -04:00
|
|
|
DEFINE_INSN
|
|
|
|
opt_newarray_max
|
|
|
|
(rb_num_t num)
|
|
|
|
(...)
|
2018-01-12 03:38:07 -05:00
|
|
|
(VALUE val)
|
2018-09-11 05:48:58 -04:00
|
|
|
/* This instruction typically has no funcalls. But it compares array
|
|
|
|
* contents with each other by nature. That part could call methods when
|
|
|
|
* necessary. No way to detect such method calls beforehand. We
|
|
|
|
* cannot but mark it being not leaf. */
|
|
|
|
// attr bool leaf = false; /* has rb_funcall() */
|
2018-11-07 02:16:50 -05:00
|
|
|
// attr rb_snum_t sp_inc = 1 - (rb_snum_t)num;
|
2016-03-17 08:47:31 -04:00
|
|
|
{
|
2017-04-18 06:58:49 -04:00
|
|
|
val = vm_opt_newarray_max(num, STACK_ADDR_FROM_TOP(num));
|
2016-03-17 08:47:31 -04:00
|
|
|
}
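The `sp_inc = 1 - (rb_snum_t)num` attribute above says that the instruction consumes `num` stack slots and leaves one result. A minimal sketch of that bookkeeping on a toy stack of `long` values follows; `toy_stack`, `toy_push`, and `toy_newarray_max` are hypothetical names for illustration, not the VM's API, and the sketch assumes `num >= 1`.

```c
#include <assert.h>
#include <stddef.h>

/* Toy operand stack; a hypothetical stand-in for the VM stack. */
enum { TOY_STACK_MAX = 16 };

struct toy_stack {
    long slots[TOY_STACK_MAX];
    size_t sp;                  /* number of live slots */
};

static void toy_push(struct toy_stack *s, long v)
{
    assert(s->sp < TOY_STACK_MAX);
    s->slots[s->sp++] = v;
}

/* Pop `num` values, push their maximum, and return the net change in
 * stack depth, which is 1 - num. Assumes num >= 1. */
static long toy_newarray_max(struct toy_stack *s, size_t num)
{
    assert(num >= 1 && num <= s->sp);
    /* Address of the lowest of the top `num` slots. */
    const long *base = &s->slots[s->sp - num];
    long max = base[0];
    for (size_t i = 1; i < num; i++) {
        if (base[i] > max) max = base[i];
    }
    s->sp -= num;       /* pop the operands */
    toy_push(s, max);   /* push the single result */
    return 1 - (long)num;
}
```

With three operands on the stack the depth shrinks by two, matching `1 - num`.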
|
|
|
|
|
|
|
|
DEFINE_INSN
|
|
|
|
opt_newarray_min
|
|
|
|
(rb_num_t num)
|
|
|
|
(...)
|
2018-01-12 03:38:07 -05:00
|
|
|
(VALUE val)
|
2018-09-11 05:48:58 -04:00
|
|
|
/* Same discussion as opt_newarray_max. */
|
|
|
|
// attr bool leaf = false; /* has rb_funcall() */
|
2018-11-07 02:16:50 -05:00
|
|
|
// attr rb_snum_t sp_inc = 1 - (rb_snum_t)num;
|
2016-03-17 08:47:31 -04:00
|
|
|
{
|
2017-04-18 06:58:49 -04:00
|
|
|
val = vm_opt_newarray_min(num, STACK_ADDR_FROM_TOP(num));
|
2016-03-17 08:47:31 -04:00
|
|
|
}
|
|
|
|
|
2018-01-12 03:38:07 -05:00
|
|
|
/* super(args) # args.size => num */
|
2007-01-16 03:52:22 -05:00
|
|
|
DEFINE_INSN
|
|
|
|
invokesuper
|
2015-09-19 13:59:58 -04:00
|
|
|
(CALL_INFO ci, CALL_CACHE cc, ISEQ blockiseq)
|
2007-01-16 03:52:22 -05:00
|
|
|
(...)
|
2018-01-12 03:38:07 -05:00
|
|
|
(VALUE val)
|
2018-12-25 19:58:26 -05:00
|
|
|
// attr rb_snum_t sp_inc = sp_inc_of_sendish(ci);
|
2007-01-16 03:52:22 -05:00
|
|
|
{
|
2018-12-25 19:59:37 -05:00
|
|
|
VALUE bh = vm_caller_setup_arg_block(ec, GET_CFP(), ci, blockiseq, true);
|
|
|
|
val = vm_sendish(ec, GET_CFP(), ci, cc, bh, vm_search_super_method);
|
2015-09-19 13:59:58 -04:00
|
|
|
|
2018-12-25 19:59:37 -05:00
|
|
|
if (val == Qundef) {
|
|
|
|
RESTORE_REGS();
|
|
|
|
NEXT_INSN();
|
|
|
|
}
|
2007-01-16 03:52:22 -05:00
|
|
|
}
|
|
|
|
|
2018-01-12 03:38:07 -05:00
|
|
|
/* yield(args) */
|
2007-01-16 03:52:22 -05:00
|
|
|
DEFINE_INSN
|
|
|
|
invokeblock
|
* insns.def (send, invokesuper, invokeblock, opt_*), vm_core.h:
use only a `ci' (rb_call_info_t) parameter instead of using
parameters such as `op_id', 'op_argc', `blockiseq' and flag.
This information is stored in rb_call_info_t at compile
time.
This technique simplifies parameter passings at related
function calls (~10% speedup for simple method invocation at
my machine).
`rb_call_info_t' also has new function pointer variable `call'.
This `call' variable makes it possible to customize the method (block)
invocation process at each call site. However, it always calls
`vm_call_general()' as of this change.
`rb_call_info_t' also has temporary variables for method
(block) invocation.
* vm_core.h, compile.c, insns.def: introduce VM_CALL_ARGS_SKIP_SETUP
VM_CALL macro. This flag indicates that this call can skip
caller_setup (block arg and splat arg).
* compile.c: catch up above changes.
* iseq.c: catch up above changes (especially for TS_CALLINFO).
* tool/instruction.rb: catch up above changes.
* vm_insnhelper.c, vm_insnhelper.h: ditto. Macros and functions
parameters are changed.
* vm_eval.c (vm_call0): ditto (it will be rewritten soon).
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@37180 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2012-10-14 12:59:05 -04:00
|
|
|
(CALL_INFO ci)
|
2007-01-16 03:52:22 -05:00
|
|
|
(...)
|
2018-01-12 03:38:07 -05:00
|
|
|
(VALUE val)
|
2018-07-25 10:55:43 -04:00
|
|
|
// attr bool handles_sp = true;
|
2018-12-25 19:58:26 -05:00
|
|
|
// attr rb_snum_t sp_inc = sp_inc_of_invokeblock(ci);
|
2007-01-16 03:52:22 -05:00
|
|
|
{
|
2018-12-25 19:59:37 -05:00
|
|
|
static struct rb_call_cache cc = {
|
|
|
|
0, 0, NULL, vm_invokeblock_i,
|
|
|
|
};
|
2018-01-05 12:51:10 -05:00
|
|
|
|
2018-12-25 19:59:37 -05:00
|
|
|
VALUE bh = VM_BLOCK_HANDLER_NONE;
|
|
|
|
val = vm_sendish(ec, GET_CFP(), ci, &cc, bh, vm_search_invokeblock);
|
2018-01-05 12:51:10 -05:00
|
|
|
|
2018-03-04 02:04:40 -05:00
|
|
|
if (val == Qundef) {
|
2018-12-25 19:59:37 -05:00
|
|
|
RESTORE_REGS();
|
|
|
|
NEXT_INSN();
|
2007-01-16 03:52:22 -05:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2018-01-12 03:38:07 -05:00
|
|
|
/* return from this scope. */
|
2007-01-16 03:52:22 -05:00
|
|
|
DEFINE_INSN
|
|
|
|
leave
|
|
|
|
()
|
|
|
|
(VALUE val)
|
|
|
|
(VALUE val)
|
2018-09-11 05:48:58 -04:00
|
|
|
/* This is super surprising but when leaving from a frame, we check
|
|
|
|
* for interrupts. If any, that should be executed on top of the
|
|
|
|
* current execution context. This is a method call. */
|
|
|
|
// attr bool leaf = false; /* has rb_threadptr_execute_interrupts() */
|
2018-07-25 10:55:43 -04:00
|
|
|
// attr bool handles_sp = true;
|
2007-01-16 03:52:22 -05:00
|
|
|
{
|
|
|
|
if (OPT_CHECKED_RUN) {
|
2015-08-05 01:43:58 -04:00
|
|
|
const VALUE *const bp = vm_base_ptr(reg_cfp);
|
|
|
|
if (reg_cfp->sp != bp) {
|
2017-10-27 02:21:50 -04:00
|
|
|
vm_stack_consistency_error(ec, reg_cfp, bp);
|
2007-01-16 03:52:22 -05:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2017-11-06 02:44:28 -05:00
|
|
|
RUBY_VM_CHECK_INTS(ec);
|
2007-01-16 03:52:22 -05:00
|
|
|
|
2017-10-27 02:21:50 -04:00
|
|
|
if (vm_pop_frame(ec, GET_CFP(), GET_EP())) {
|
2007-06-27 04:21:21 -04:00
|
|
|
#if OPT_CALL_THREADED_CODE
|
2017-10-27 15:16:51 -04:00
|
|
|
rb_ec_thread_ptr(ec)->retval = val;
|
2012-08-07 07:13:57 -04:00
|
|
|
return 0;
|
2007-06-27 04:21:21 -04:00
|
|
|
#else
|
* vm_core.h: remove VM_FRAME_MAGIC_FINISH (finish frame type).
Before this commit:
`finish frame' was place holder which indicates that VM loop
needs to return function.
If a C method calls a Ruby method (a method written in Ruby),
then VM loop will be (re-)invoked. When the Ruby method returns,
then also VM loop should be escaped. `finish frame' has only
one instruction `finish', which returns VM loop function.
VM loop function executes `finish' instruction, then VM loop
function returns itself.
With such mechanism, `leave' instruction (which returns one
frame from current scope) doesn't need to check that this `leave'
should also return from VM loop function.
Strictly, one branch can be removed from `leave' instruction.
Consideration:
However, pushing the `finish frame' needs costs because
it needs several memory accesses. The number of pushing
`finish frame' is greater than I had assumed. Of course,
pushing `finish frame' consumes additional control frame.
Moreover, recent processors have good branch prediction,
with which we can ignore such trivial checking.
After this commit:
Finally, I decided to remove `finish frame' and `finish'
instruction. Some parts of VM depend on `finish frame',
so the new frame flag VM_FRAME_FLAG_FINISH is introduced.
If this frame should escape from VM function loop, then
the result of VM_FRAME_TYPE_FINISH_P(cfp) is true.
`leave' instruction checks this flag every time.
I measured performance on it. However on my environments,
it improves some benchmarks and slows some benchmarks down.
Maybe it is because of C compiler optimization parameters.
I'll re-visit here if this causes problems.
* insns.def (leave, finish): remove finish instruction.
* vm.c, vm_eval.c, vm_exec.c, vm_backtrace.c, vm_dump.c:
apply above changes.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@36099 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2012-06-15 06:22:34 -04:00
|
|
|
return val;
|
2007-06-27 04:21:21 -04:00
|
|
|
#endif
|
2012-06-15 06:22:34 -04:00
|
|
|
}
|
|
|
|
else {
|
|
|
|
RESTORE_REGS();
|
|
|
|
}
|
2007-01-16 03:52:22 -05:00
|
|
|
}
|
|
|
|
|
|
|
|
/**********************************************************/
|
|
|
|
/* deal with control flow 3: exception */
|
|
|
|
/**********************************************************/
|
|
|
|
|
2018-01-12 03:38:07 -05:00
|
|
|
/* longjump */
|
2007-01-16 03:52:22 -05:00
|
|
|
DEFINE_INSN
|
|
|
|
throw
|
2007-05-03 05:09:14 -04:00
|
|
|
(rb_num_t throw_state)
|
2007-01-16 03:52:22 -05:00
|
|
|
(VALUE throwobj)
|
|
|
|
(VALUE val)
|
2018-09-11 05:48:58 -04:00
|
|
|
/* Same discussion as leave. */
|
|
|
|
// attr bool leaf = false; /* has rb_threadptr_execute_interrupts() */
|
2007-01-16 03:52:22 -05:00
|
|
|
{
|
2017-11-06 02:44:28 -05:00
|
|
|
RUBY_VM_CHECK_INTS(ec);
|
2017-10-27 02:21:50 -04:00
|
|
|
val = vm_throw(ec, GET_CFP(), throw_state, throwobj);
|
2007-08-06 07:36:30 -04:00
|
|
|
THROW_EXCEPTION(val);
|
2007-01-16 03:52:22 -05:00
|
|
|
/* unreachable */
|
|
|
|
}
|
|
|
|
|
|
|
|
/**********************************************************/
|
|
|
|
/* deal with control flow 4: local jump */
|
|
|
|
/**********************************************************/
|
|
|
|
|
2018-01-12 03:38:07 -05:00
|
|
|
/* set PC to (PC + dst). */
|
2007-01-16 03:52:22 -05:00
|
|
|
DEFINE_INSN
|
|
|
|
jump
|
|
|
|
(OFFSET dst)
|
|
|
|
()
|
|
|
|
()
|
2018-09-11 05:48:58 -04:00
|
|
|
/* Same discussion as leave. */
|
|
|
|
// attr bool leaf = false; /* has rb_threadptr_execute_interrupts() */
|
2007-01-16 03:52:22 -05:00
|
|
|
{
|
2017-11-06 02:44:28 -05:00
|
|
|
RUBY_VM_CHECK_INTS(ec);
|
2007-01-16 03:52:22 -05:00
|
|
|
JUMP(dst);
|
|
|
|
}
|
|
|
|
|
2018-01-12 03:38:07 -05:00
|
|
|
/* if val is not false or nil, set PC to (PC + dst). */
|
2007-01-16 03:52:22 -05:00
|
|
|
DEFINE_INSN
|
|
|
|
branchif
|
|
|
|
(OFFSET dst)
|
|
|
|
(VALUE val)
|
|
|
|
()
|
2018-09-11 05:48:58 -04:00
|
|
|
/* Same discussion as jump. */
|
|
|
|
// attr bool leaf = false; /* has rb_threadptr_execute_interrupts() */
|
2007-01-16 03:52:22 -05:00
|
|
|
{
|
|
|
|
if (RTEST(val)) {
|
2017-11-06 02:44:28 -05:00
|
|
|
RUBY_VM_CHECK_INTS(ec);
|
2007-01-16 03:52:22 -05:00
|
|
|
JUMP(dst);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2018-01-12 03:38:07 -05:00
|
|
|
/* if val is false or nil, set PC to (PC + dst). */
|
2007-01-16 03:52:22 -05:00
|
|
|
DEFINE_INSN
|
|
|
|
branchunless
|
|
|
|
(OFFSET dst)
|
|
|
|
(VALUE val)
|
|
|
|
()
|
2018-09-11 05:48:58 -04:00
|
|
|
/* Same discussion as jump. */
|
|
|
|
// attr bool leaf = false; /* has rb_threadptr_execute_interrupts() */
|
2007-01-16 03:52:22 -05:00
|
|
|
{
|
|
|
|
if (!RTEST(val)) {
|
2017-11-06 02:44:28 -05:00
|
|
|
RUBY_VM_CHECK_INTS(ec);
|
2007-01-16 03:52:22 -05:00
|
|
|
JUMP(dst);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2018-01-12 03:38:07 -05:00
|
|
|
/* if val is nil, set PC to (PC + dst). */
|
2015-10-22 02:30:12 -04:00
|
|
|
DEFINE_INSN
|
|
|
|
branchnil
|
|
|
|
(OFFSET dst)
|
|
|
|
(VALUE val)
|
|
|
|
()
|
2018-09-11 05:48:58 -04:00
|
|
|
/* Same discussion as jump. */
|
|
|
|
// attr bool leaf = false; /* has rb_threadptr_execute_interrupts() */
|
2015-10-22 02:30:12 -04:00
|
|
|
{
|
|
|
|
if (NIL_P(val)) {
|
2017-11-06 02:44:28 -05:00
|
|
|
RUBY_VM_CHECK_INTS(ec);
|
2015-10-22 02:30:12 -04:00
|
|
|
JUMP(dst);
|
|
|
|
}
|
|
|
|
}
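The three conditional branches above differ only in the predicate applied to `val`. The sketch below restates them over a toy value type; all `toy_*` names are hypothetical, and it assumes the offset `dst` is applied to a PC that already points at the next instruction, which the instruction comments do not spell out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical value tags; the real VM encodes these in VALUE. */
enum toy_value { TOY_FALSE, TOY_NIL, TOY_TRUE, TOY_OTHER };

enum toy_branch { TOY_BRANCHIF, TOY_BRANCHUNLESS, TOY_BRANCHNIL };

/* Truthiness in the spirit of RTEST(): everything except false and nil. */
static bool toy_rtest(enum toy_value v)
{
    return v != TOY_FALSE && v != TOY_NIL;
}

/* Return the PC after a branch whose fall-through target is `next_pc`;
 * a taken branch adds `dst` to it. */
static long toy_branch_pc(enum toy_branch op, enum toy_value val,
                          long next_pc, long dst)
{
    bool taken = false;
    switch (op) {
      case TOY_BRANCHIF:     taken = toy_rtest(val);   break;
      case TOY_BRANCHUNLESS: taken = !toy_rtest(val);  break;
      case TOY_BRANCHNIL:    taken = (val == TOY_NIL); break;
    }
    return taken ? next_pc + dst : next_pc;
}
```

Note that `branchunless` is taken for both false and nil, while `branchnil` is taken for nil only.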
|
|
|
|
|
2007-01-16 03:52:22 -05:00
|
|
|
/**********************************************************/
|
|
|
|
/* for optimization */
|
|
|
|
/**********************************************************/
|
|
|
|
|
2018-01-12 03:38:07 -05:00
|
|
|
/* push inline-cached value and go to dst if it is valid */
|
2007-01-16 03:52:22 -05:00
|
|
|
DEFINE_INSN
|
2018-11-07 03:13:20 -05:00
|
|
|
opt_getinlinecache
|
2009-07-13 00:44:20 -04:00
|
|
|
(OFFSET dst, IC ic)
|
2007-01-16 03:52:22 -05:00
|
|
|
()
|
|
|
|
(VALUE val)
|
|
|
|
{
|
2018-02-10 11:54:47 -05:00
|
|
|
if (vm_ic_hit_p(ic, GET_EP())) {
|
|
|
|
val = ic->ic_value.value;
|
2007-01-16 03:52:22 -05:00
|
|
|
JUMP(dst);
|
|
|
|
}
|
2018-02-10 11:54:47 -05:00
|
|
|
else {
|
|
|
|
val = Qnil;
|
|
|
|
}
|
2007-01-16 03:52:22 -05:00
|
|
|
}
|
|
|
|
|
2018-01-12 03:38:07 -05:00
|
|
|
/* set inline cache */
|
2007-01-16 03:52:22 -05:00
|
|
|
DEFINE_INSN
|
2018-11-07 03:13:20 -05:00
|
|
|
opt_setinlinecache
|
2010-02-24 12:06:15 -05:00
|
|
|
(IC ic)
|
2007-01-16 03:52:22 -05:00
|
|
|
(VALUE val)
|
|
|
|
(VALUE val)
|
|
|
|
{
|
2017-04-18 06:58:49 -04:00
|
|
|
vm_ic_update(ic, val, GET_EP());
|
2007-01-16 03:52:22 -05:00
|
|
|
}
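`opt_getinlinecache` and `opt_setinlinecache` form a pair: the first skips the slow path on a cache hit, and the second fills the cache after the slow path has run. The sketch below illustrates that shape only. It assumes a global serial number is used for invalidation, which is a guess for illustration rather than a description of `vm_ic_hit_p` or `vm_ic_update`; every `toy_*` name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical global counter, bumped whenever cached values may be stale. */
static unsigned long toy_global_serial = 1;

struct toy_inline_cache {
    unsigned long serial;   /* serial at fill time; 0 means never filled */
    long value;
};

/* A hit requires that nothing was invalidated since the cache was filled. */
static bool toy_ic_hit_p(const struct toy_inline_cache *ic)
{
    return ic->serial == toy_global_serial;
}

static void toy_ic_update(struct toy_inline_cache *ic, long value)
{
    ic->value = value;
    ic->serial = toy_global_serial;
}

/* Counts slow-path executions so the caching effect is observable. */
static long toy_slow_lookups = 0;

static long toy_slow_lookup(void)
{
    toy_slow_lookups++;
    return 42;
}

/* get-cache, then slow path, then set-cache. */
static long toy_cached_get(struct toy_inline_cache *ic)
{
    if (toy_ic_hit_p(ic)) {
        return ic->value;           /* hit: skip the slow path */
    }
    long v = toy_slow_lookup();     /* miss: fall through to the slow path */
    toy_ic_update(ic, v);           /* fill the cache for next time */
    return v;
}
```

Bumping `toy_global_serial` forces the next `toy_cached_get` call back onto the slow path.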
|
|
|
|
|
2018-01-12 03:38:07 -05:00
|
|
|
/* run iseq only once */
|
2013-08-20 13:41:13 -04:00
|
|
|
DEFINE_INSN
|
|
|
|
once
|
2018-03-19 14:21:54 -04:00
|
|
|
(ISEQ iseq, ISE ise)
|
2013-08-20 13:41:13 -04:00
|
|
|
()
|
|
|
|
(VALUE val)
|
|
|
|
{
|
2018-03-19 14:21:54 -04:00
|
|
|
val = vm_once_dispatch(ec, iseq, ise);
|
2013-08-20 13:41:13 -04:00
|
|
|
}
|
|
|
|
|
2018-01-12 03:38:07 -05:00
|
|
|
/* case dispatcher, jump by table if possible */
|
2007-01-16 03:52:22 -05:00
|
|
|
DEFINE_INSN
|
|
|
|
opt_case_dispatch
|
|
|
|
(CDHASH hash, OFFSET else_offset)
|
|
|
|
(..., VALUE key)
|
2018-01-12 03:38:07 -05:00
|
|
|
()
|
|
|
|
// attr rb_snum_t sp_inc = -1;
|
2007-01-16 03:52:22 -05:00
|
|
|
{
|
2017-04-18 06:58:49 -04:00
|
|
|
OFFSET dst = vm_case_dispatch(hash, else_offset, key);
|
|
|
|
|
|
|
|
if (dst) {
|
|
|
|
JUMP(dst);
|
2009-08-12 01:55:06 -04:00
|
|
|
}
|
2007-01-16 03:52:22 -05:00
|
|
|
}
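The body above jumps only when the helper returns a non-zero offset. A small sketch of that contract follows, under stated assumptions: `case_dispatch` returns the matching branch offset, `else_offset` when the key is absent, and 0 when the table cannot be used at all (so the caller falls through to the ordinary comparisons). All names are hypothetical, and the linear scan stands in for whatever lookup the real `vm_case_dispatch` performs.

```c
#include <assert.h>
#include <stddef.h>

typedef long offset_t;               /* hypothetical stand-in for OFFSET */

struct case_entry { int key; offset_t dst; };

struct case_table {
    const struct case_entry *entries;
    size_t len;
};

/* Returns the branch offset for `key`, `else_offset` if the key is not in
 * the table, or 0 if the key is not eligible for table dispatch. */
static offset_t
case_dispatch(const struct case_table *tbl, offset_t else_offset,
              int key, int key_is_optimizable)
{
    /* In this sketch 0 is reserved for "no jump". */
    assert(else_offset != 0);
    if (!key_is_optimizable) {
        return 0;
    }
    for (size_t i = 0; i < tbl->len; i++) {
        if (tbl->entries[i].key == key) {
            return tbl->entries[i].dst;
        }
    }
    return else_offset;
}

/* Mirrors the instruction body: jump only when dst is non-zero. */
static offset_t
next_pc(offset_t pc, const struct case_table *tbl, offset_t else_offset,
        int key, int key_is_optimizable)
{
    offset_t dst = case_dispatch(tbl, else_offset, key, key_is_optimizable);

    if (dst) {
        pc += dst;
    }
    return pc;
}
```

Reserving 0 for "no jump" lets one return value cover all three outcomes without a separate status flag.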

/** simple functions */

/* optimized X+Y. */
DEFINE_INSN
opt_plus
(CALL_INFO ci, CALL_CACHE cc)
(VALUE recv, VALUE obj)
(VALUE val)
/* Array + anything can be handled inside of opt_plus, and that
 * anything is converted into array using #to_ary. */
// attr bool leaf = false; /* has rb_to_array_type() */
{
    val = vm_opt_plus(recv, obj);

    if (val == Qundef) {
        CALL_SIMPLE_METHOD();
    }
}
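The `opt_*` instructions in this part of the file share one shape: try a specialised helper, and fall back to a regular method call when the helper reports `Qundef`. A self-contained sketch of that pattern is below. `value_t`, `UNDEF`, `fast_plus` and `slow_plus` are hypothetical stand-ins rather than CRuby's `VALUE`, `Qundef`, `vm_opt_plus` or `CALL_SIMPLE_METHOD`, and the fast path here only covers non-overflowing integer addition.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical tagged value; T_UNDEF plays the role of a sentinel. */
typedef struct {
    enum { T_INT, T_OTHER, T_UNDEF } tag;
    long i;
} value_t;

static const value_t UNDEF = { T_UNDEF, 0 };

static bool
is_undef(value_t v)
{
    return v.tag == T_UNDEF;
}

/* Fast path: int + int, declined (UNDEF) if the sum would overflow. */
static value_t
fast_plus(value_t recv, value_t obj)
{
    if (recv.tag == T_INT && obj.tag == T_INT) {
        long a = recv.i, b = obj.i;
        if ((b > 0 && a > LONG_MAX - b) || (b < 0 && a < LONG_MIN - b)) {
            return UNDEF;
        }
        value_t r = { T_INT, a + b };
        return r;
    }
    return UNDEF;
}

/* Slow path placeholder standing in for a generic method call. */
static value_t
slow_plus(value_t recv, value_t obj)
{
    (void)recv;
    (void)obj;
    value_t r = { T_OTHER, 0 };
    return r;
}

/* Same shape as the instruction body: helper first, fallback on UNDEF. */
static value_t
plus(value_t recv, value_t obj)
{
    value_t val = fast_plus(recv, obj);

    if (is_undef(val)) {
        val = slow_plus(recv, obj);
    }
    assert(!is_undef(val));
    return val;
}
```

Keeping the fast path in its own function means the dispatcher only has to know about the sentinel, not about which operand types the helper can handle.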

/* optimized X-Y. */
DEFINE_INSN
opt_minus
(CALL_INFO ci, CALL_CACHE cc)
(VALUE recv, VALUE obj)
(VALUE val)
{
    val = vm_opt_minus(recv, obj);

    if (val == Qundef) {
        CALL_SIMPLE_METHOD();
    }
}

/* optimized X*Y. */
DEFINE_INSN
opt_mult
(CALL_INFO ci, CALL_CACHE cc)
(VALUE recv, VALUE obj)
(VALUE val)
{
    val = vm_opt_mult(recv, obj);

    if (val == Qundef) {
        CALL_SIMPLE_METHOD();
    }
}

/* optimized X/Y. */
DEFINE_INSN
opt_div
(CALL_INFO ci, CALL_CACHE cc)
(VALUE recv, VALUE obj)
(VALUE val)
{
    val = vm_opt_div(recv, obj);

    if (val == Qundef) {
        CALL_SIMPLE_METHOD();
    }
}

/* optimized X%Y. */
DEFINE_INSN
opt_mod
(CALL_INFO ci, CALL_CACHE cc)
(VALUE recv, VALUE obj)
(VALUE val)
{
    val = vm_opt_mod(recv, obj);

    if (val == Qundef) {
        CALL_SIMPLE_METHOD();
    }
}

/* optimized X==Y. */
DEFINE_INSN
opt_eq
(CALL_INFO ci, CALL_CACHE cc)
(VALUE recv, VALUE obj)
(VALUE val)
/* This instruction can compare a string with non-string.  This
 * (somewhat) coerces the non-string into a string, via a method
 * call. */
// attr bool leaf = false; /* has rb_str_equal() */
{
    val = opt_eq_func(recv, obj, ci, cc);

    if (val == Qundef) {
        CALL_SIMPLE_METHOD();
    }
}

/* optimized X!=Y. */
DEFINE_INSN
opt_neq
(CALL_INFO ci_eq, CALL_CACHE cc_eq, CALL_INFO ci, CALL_CACHE cc)
(VALUE recv, VALUE obj)
(VALUE val)
/* Same discussion as opt_eq. */
// attr bool leaf = false; /* has rb_str_equal() */
{
    val = vm_opt_neq(ci, cc, ci_eq, cc_eq, recv, obj);

    if (val == Qundef) {
        CALL_SIMPLE_METHOD();
    }
}
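`opt_neq` receives call info for both `==` (`ci_eq`, `cc_eq`) and `!=` (`ci`, `cc`), which suggests that `!=` can be answered by negating a fast `==` result. The sketch below illustrates that idea under an explicit assumption: the negation is only valid while `!=` still has its default definition, modelled here by the hypothetical flag `neq_is_default`. The types and function names are illustrative and are not taken from `vm_insnhelper.c`.

```c
#include <assert.h>
#include <stdbool.h>

/* Tri-state result: R_UNDEF means "no fast path, call the method". */
typedef enum { R_FALSE, R_TRUE, R_UNDEF } result_t;

/* Hypothetical value type: either an integer or something opaque. */
typedef struct {
    bool is_int;
    long i;
} value_t;

/* Fast equality: only integers are compared directly. */
static result_t
fast_eq(value_t a, value_t b)
{
    if (a.is_int && b.is_int) {
        return a.i == b.i ? R_TRUE : R_FALSE;
    }
    return R_UNDEF;
}

/* Fast inequality built on fast_eq; declines if != was redefined. */
static result_t
fast_neq(value_t a, value_t b, bool neq_is_default)
{
    if (!neq_is_default) {
        return R_UNDEF;
    }
    result_t eq = fast_eq(a, b);
    if (eq == R_UNDEF) {
        return R_UNDEF;
    }
    assert(eq == R_TRUE || eq == R_FALSE);
    return eq == R_TRUE ? R_FALSE : R_TRUE;
}
```

As in the instruction body, a caller would treat `R_UNDEF` as the signal to perform the ordinary method call.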

/* optimized X<Y. */
DEFINE_INSN
opt_lt
(CALL_INFO ci, CALL_CACHE cc)
(VALUE recv, VALUE obj)
(VALUE val)
{
    val = vm_opt_lt(recv, obj);

    if (val == Qundef) {
        CALL_SIMPLE_METHOD();
    }
}

/* optimized X<=Y. */
DEFINE_INSN
opt_le
(CALL_INFO ci, CALL_CACHE cc)
(VALUE recv, VALUE obj)
(VALUE val)
{
    val = vm_opt_le(recv, obj);

    if (val == Qundef) {
        CALL_SIMPLE_METHOD();
    }
}

/* optimized X>Y. */
DEFINE_INSN
opt_gt
(CALL_INFO ci, CALL_CACHE cc)
(VALUE recv, VALUE obj)
(VALUE val)
{
    val = vm_opt_gt(recv, obj);

    if (val == Qundef) {
        CALL_SIMPLE_METHOD();
    }
}

/* optimized X>=Y. */
DEFINE_INSN
opt_ge
(CALL_INFO ci, CALL_CACHE cc)
(VALUE recv, VALUE obj)
(VALUE val)
{
    val = vm_opt_ge(recv, obj);

    if (val == Qundef) {
        CALL_SIMPLE_METHOD();
    }
}

/* << */
DEFINE_INSN
opt_ltlt
(CALL_INFO ci, CALL_CACHE cc)
(VALUE recv, VALUE obj)
(VALUE val)
{
    val = vm_opt_ltlt(recv, obj);

    if (val == Qundef) {
        CALL_SIMPLE_METHOD();
    }
}

/* optimized X&Y. */
DEFINE_INSN
opt_and
(CALL_INFO ci, CALL_CACHE cc)
(VALUE recv, VALUE obj)
(VALUE val)
{
    val = vm_opt_and(recv, obj);

    if (val == Qundef) {
        CALL_SIMPLE_METHOD();
    }
}

/* optimized X|Y. */
DEFINE_INSN
opt_or
(CALL_INFO ci, CALL_CACHE cc)
(VALUE recv, VALUE obj)
(VALUE val)
{
    val = vm_opt_or(recv, obj);

    if (val == Qundef) {
        CALL_SIMPLE_METHOD();
    }
}
|
|
|
|
|
2018-01-12 03:38:07 -05:00
|
|
|
/* [] */
|
2007-01-16 03:52:22 -05:00
|
|
|
DEFINE_INSN
|
|
|
|
opt_aref
|
2015-09-19 13:59:58 -04:00
|
|
|
(CALL_INFO ci, CALL_CACHE cc)
|
2007-01-16 03:52:22 -05:00
|
|
|
(VALUE recv, VALUE obj)
|
|
|
|
(VALUE val)
|
2018-09-11 05:48:58 -04:00
|
|
|
/* This is complicated. In case of hash, vm_opt_aref() resorts to
|
|
|
|
* rb_hash_aref(). If `recv` has no `obj`, this function then yields
|
|
|
|
* default_proc. This is a method call. So opt_aref is
|
|
|
|
* (surprisingly) not leaf. */
|
|
|
|
// attr bool leaf = false; /* has rb_funcall() */ /* calls #yield */
|
2007-01-16 03:52:22 -05:00
|
|
|
{
|
split insns.def into functions
Contemporary C compilers are good at function inlining. They fold
multiple functions into one. However, they are not yet smart enough to
unfold a function into several ones. So generally speaking, it is
wiser for a C programmer to manually split C functions whenever
possible. That should make room for compilers to optimize at will.
Before this changeset, insns.def was converted into a single HUGE
function called vm_exec_core(). By moving each instruction's core
into individual functions, the generated C source code is reduced from
3,428 lines to 2,847 lines. Looking at the generated assembly,
however, it seems my compiler (gcc 6.2) is extraordinarily smart, so that
it inlines almost all functions I introduced in this changeset back
into that vm_exec_core. On my machine the compiled machine binary of the
function does not shrink very much in size (28,432 bytes to 26,816
bytes, according to nm(1)).
I believe this change is zero-cost. Several benchmarks I exercised
showed no significant difference beyond the error margin. For instance,
3 repeated runs of the optcarrot benchmark on my machine resulted in:
before this: 28.330329285707490, 27.513378371065920, 29.40420215754537
after this: 27.107195867280414, 25.549324021385907, 30.31581919050884
in fps (greater==faster).
----
* internal.h (rb_obj_not_equal): used from vm_insnhelper.c
* insns.def: move vast majority of lines into vm_insnhelper.c
* vm_insnhelper.c: moved here.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58390 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-04-18 06:58:49 -04:00
|
|
|
val = vm_opt_aref(recv, obj);
|
|
|
|
|
|
|
|
if (val == Qundef) {
|
2018-09-14 03:44:44 -04:00
|
|
|
CALL_SIMPLE_METHOD();
|
2007-01-16 03:52:22 -05:00
|
|
|
}
|
|
|
|
}
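The comment on opt_aref explains why a plain-looking lookup is not a leaf: on a missing key, the hash may run a user-supplied default_proc, which is an arbitrary method call. The sketch below illustrates that idea with a tiny table in standalone C. It is an illustration under stated assumptions and not Ruby's hash implementation; `table_t`, `default_fn`, `table_aref`, and `count_and_return_zero` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a hash's default_proc. */
typedef long (*default_fn)(const char *key);

typedef struct {
    const char *keys[4];
    long vals[4];
    size_t len;
    default_fn on_missing;   /* may be NULL */
} table_t;

/* Counts how often the callback ran, for illustration only. */
static int callbacks_run = 0;

/* Example callback: records the call and supplies 0 as the default. */
static long count_and_return_zero(const char *key)
{
    (void)key;
    callbacks_run++;
    return 0;
}

/* Lookup that is NOT a leaf: a miss can invoke arbitrary user code. */
static long table_aref(const table_t *t, const char *key)
{
    for (size_t i = 0; i < t->len; i++) {
        if (strcmp(t->keys[i], key) == 0) {
            return t->vals[i];
        }
    }
    return t->on_missing ? t->on_missing(key) : -1;
}
```

A hit returns without calling anything else, but a caller cannot know in advance whether a lookup will miss, so the whole function has to be treated as one that may call back into user code.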
|
|
|
|
|
2018-01-12 03:38:07 -05:00
|
|
|
/* recv[obj] = set */
|
2007-01-16 03:52:22 -05:00
|
|
|
DEFINE_INSN
|
|
|
|
opt_aset
|
2015-09-19 13:59:58 -04:00
|
|
|
(CALL_INFO ci, CALL_CACHE cc)
|
2007-01-16 03:52:22 -05:00
|
|
|
(VALUE recv, VALUE obj, VALUE set)
|
|
|
|
(VALUE val)
|
2018-09-11 05:48:58 -04:00
|
|
|
/* This is a different story from opt_aref. Here vm_opt_aset() resorts
|
|
|
|
* to rb_hash_aset(), which should call #hash for `obj`. */
|
|
|
|
// attr bool leaf = false; /* has rb_funcall() */ /* calls #hash */
|
2007-01-16 03:52:22 -05:00
|
|
|
{
|
2017-04-18 06:58:49 -04:00
|
|
|
val = vm_opt_aset(recv, obj, set);
|
|
|
|
|
|
|
|
if (val == Qundef) {
|
2018-09-14 03:44:44 -04:00
|
|
|
CALL_SIMPLE_METHOD();
|
2007-01-16 03:52:22 -05:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2018-01-12 03:38:07 -05:00
|
|
|
/* recv[str] = set */
|
2014-01-09 23:54:08 -05:00
|
|
|
DEFINE_INSN
|
|
|
|
opt_aset_with
|
2018-01-29 02:15:08 -05:00
|
|
|
(VALUE key, CALL_INFO ci, CALL_CACHE cc)
|
2014-01-09 23:54:08 -05:00
|
|
|
(VALUE recv, VALUE val)
|
|
|
|
(VALUE val)
|
2018-09-11 05:48:58 -04:00
|
|
|
/* Same discussion as opt_aset. */
|
|
|
|
// attr bool leaf = false; /* has rb_funcall() */ /* calls #hash */
|
2014-01-09 23:54:08 -05:00
|
|
|
{
|
2017-04-18 06:58:49 -04:00
|
|
|
VALUE tmp = vm_opt_aset_with(recv, key, val);
|
|
|
|
|
|
|
|
if (tmp != Qundef) {
|
2017-04-18 07:06:58 -04:00
|
|
|
val = tmp;
|
2014-01-24 22:15:30 -05:00
|
|
|
}
|
|
|
|
else {
|
2018-07-17 12:20:15 -04:00
|
|
|
#ifndef MJIT_HEADER
|
2018-07-19 09:25:22 -04:00
|
|
|
TOPN(0) = rb_str_resurrect(key);
|
|
|
|
PUSH(val);
|
mjit_compile.c: merge initial JIT compiler
which has been developed by Takashi Kokubun <takashikkbn@gmail> as
YARV-MJIT. Many of its bugs are fixed by wanabe <s.wanabe@gmail.com>.
This JIT compiler is designed to be a safe migration path to introduce
JIT compiler to MRI. So this commit does not include any bytecode
changes or dynamic instruction modifications, which are done in original
MJIT.
This commit even strips off some aggressive optimizations from
YARV-MJIT, and thus it's slower than YARV-MJIT too. But it's still
fairly faster than Ruby 2.5 in some benchmarks (attached below).
Note that this JIT compiler passes `make test`, `make test-all`, `make
test-spec` without JIT, and even with JIT. Not only it's perfectly safe
with JIT disabled because it does not replace VM instructions unlike
MJIT, but also with JIT enabled it stably runs Ruby applications
including Rails applications.
I'm expecting this version as just "initial" JIT compiler. I have many
optimization ideas which are skipped for initial merging, and you may
easily replace this JIT compiler with a faster one by just replacing
mjit_compile.c. `mjit_compile` interface is designed for the purpose.
common.mk: update dependencies for mjit_compile.c.
internal.h: declare `rb_vm_insn_addr2insn` for MJIT.
vm.c: exclude some definitions if `-DMJIT_HEADER` is provided to
compiler. This avoids including some functions which take a long time
to compile, e.g. vm_exec_core. Some of the purpose is achieved in
transform_mjit_header.rb (see `IGNORED_FUNCTIONS`) but others are
manually resolved for now. Load mjit_helper.h for MJIT header.
mjit_helper.h: New. This is a file used only by JIT-ed code. I'll
refactor `mjit_call_cfunc` later.
vm_eval.c: add some #ifdef switches to skip compiling some functions
like Init_vm_eval.
win32/mkexports.rb: export thread/ec functions, which are used by MJIT.
include/ruby/defines.h: add MJIT_FUNC_EXPORTED macro alias to clarify
that a function is exported only for MJIT.
array.c: export a function used by MJIT.
bignum.c: ditto.
class.c: ditto.
compile.c: ditto.
error.c: ditto.
gc.c: ditto.
hash.c: ditto.
iseq.c: ditto.
numeric.c: ditto.
object.c: ditto.
proc.c: ditto.
re.c: ditto.
st.c: ditto.
string.c: ditto.
thread.c: ditto.
variable.c: ditto.
vm_backtrace.c: ditto.
vm_insnhelper.c: ditto.
vm_method.c: ditto.
I would like to improve maintainability of function exports, but I
believe this way is acceptable as initial merging if we clarify the
new exports are for MJIT (so that we can use them as TODO list to fix)
and add unit tests to detect unresolved symbols.
I'll add unit tests of JIT compilations in succeeding commits.
Author: Takashi Kokubun <takashikkbn@gmail.com>
Contributor: wanabe <s.wanabe@gmail.com>
Part of [Feature #14235]
---
* Known issues
* Code generated by gcc is faster than clang. The benchmark may be worse
in macOS. Following benchmark result is provided by gcc w/ Linux.
* Performance is decreased when Google Chrome is running
* JIT can work on MinGW, but it doesn't improve performance at least
in short running benchmark.
* Currently it doesn't perform well with Rails. We'll try to fix this
before release.
---
* Benchmark results
Benchmarked with:
Intel 4.0GHz i7-4790K with 16GB memory under x86-64 Ubuntu 8 Cores
- 2.0.0-p0: Ruby 2.0.0-p0
- r62186: Ruby trunk (early 2.6.0), before MJIT changes
- JIT off: On this commit, but without `--jit` option
- JIT on: On this commit, and with `--jit` option
** Optcarrot fps
Benchmark: https://github.com/mame/optcarrot
| |2.0.0-p0 |r62186 |JIT off |JIT on |
|:--------|:--------|:--------|:--------|:--------|
|fps |37.32 |51.46 |51.31 |58.88 |
|vs 2.0.0 |1.00x |1.38x |1.37x |1.58x |
** MJIT benchmarks
Benchmark: https://github.com/benchmark-driver/mjit-benchmarks
(Original: https://github.com/vnmakarov/ruby/tree/rtl_mjit_branch/MJIT-benchmarks)
| |2.0.0-p0 |r62186 |JIT off |JIT on |
|:----------|:--------|:--------|:--------|:--------|
|aread |1.00 |1.09 |1.07 |2.19 |
|aref |1.00 |1.13 |1.11 |2.22 |
|aset |1.00 |1.50 |1.45 |2.64 |
|awrite |1.00 |1.17 |1.13 |2.20 |
|call |1.00 |1.29 |1.26 |2.02 |
|const2 |1.00 |1.10 |1.10 |2.19 |
|const |1.00 |1.11 |1.10 |2.19 |
|fannk |1.00 |1.04 |1.02 |1.00 |
|fib |1.00 |1.32 |1.31 |1.84 |
|ivread |1.00 |1.13 |1.12 |2.43 |
|ivwrite |1.00 |1.23 |1.21 |2.40 |
|mandelbrot |1.00 |1.13 |1.16 |1.28 |
|meteor |1.00 |2.97 |2.92 |3.17 |
|nbody |1.00 |1.17 |1.15 |1.49 |
|nest-ntimes|1.00 |1.22 |1.20 |1.39 |
|nest-while |1.00 |1.10 |1.10 |1.37 |
|norm |1.00 |1.18 |1.16 |1.24 |
|nsvb |1.00 |1.16 |1.16 |1.17 |
|red-black |1.00 |1.02 |0.99 |1.12 |
|sieve |1.00 |1.30 |1.28 |1.62 |
|trees |1.00 |1.14 |1.13 |1.19 |
|while |1.00 |1.12 |1.11 |2.41 |
** Discourse's script/bench.rb
Benchmark: https://github.com/discourse/discourse/blob/v1.8.7/script/bench.rb
NOTE: Rails performance was somehow a little degraded with JIT for now.
We should fix this.
(At least I know opt_aref is performing badly in JIT and I have an idea
to fix it. Please wait for the fix.)
*** JIT off
Your Results: (note for timings- percentile is first, duration is second in millisecs)
categories_admin:
50: 17
75: 18
90: 22
99: 29
home_admin:
50: 21
75: 21
90: 27
99: 40
topic_admin:
50: 17
75: 18
90: 22
99: 32
categories:
50: 35
75: 41
90: 43
99: 77
home:
50: 39
75: 46
90: 49
99: 95
topic:
50: 46
75: 52
90: 56
99: 101
*** JIT on
Your Results: (note for timings- percentile is first, duration is second in millisecs)
categories_admin:
50: 19
75: 21
90: 25
99: 33
home_admin:
50: 24
75: 26
90: 30
99: 35
topic_admin:
50: 19
75: 20
90: 25
99: 30
categories:
50: 40
75: 44
90: 48
99: 76
home:
50: 42
75: 48
90: 51
99: 89
topic:
50: 49
75: 55
90: 58
99: 99
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@62197 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-02-04 06:22:28 -05:00
|
|
|
#endif
|
2018-09-14 03:44:44 -04:00
|
|
|
CALL_SIMPLE_METHOD();
|
2014-01-09 23:54:08 -05:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2018-01-12 03:38:07 -05:00
|
|
|
/* recv[str] */
|
2014-01-09 23:54:08 -05:00
|
|
|
DEFINE_INSN
|
|
|
|
opt_aref_with
|
2018-01-29 02:15:08 -05:00
|
|
|
(VALUE key, CALL_INFO ci, CALL_CACHE cc)
|
2014-01-09 23:54:08 -05:00
|
|
|
(VALUE recv)
|
|
|
|
(VALUE val)
|
2018-09-11 05:48:58 -04:00
|
|
|
/* Same discussion as opt_aref. */
|
|
|
|
// attr bool leaf = false; /* has rb_funcall() */ /* calls #yield */
|
2014-01-09 23:54:08 -05:00
|
|
|
{
|
2017-04-18 06:58:49 -04:00
|
|
|
val = vm_opt_aref_with(recv, key);
|
|
|
|
|
|
|
|
if (val == Qundef) {
|
2018-07-17 12:20:15 -04:00
|
|
|
#ifndef MJIT_HEADER
|
2018-07-19 09:25:22 -04:00
|
|
|
PUSH(rb_str_resurrect(key));
|
2018-02-04 06:22:28 -05:00
|
|
|
#endif
|
2018-09-14 03:57:19 -04:00
|
|
|
CALL_SIMPLE_METHOD();
|
2014-01-09 23:54:08 -05:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2018-01-12 03:38:07 -05:00
|
|
|
/* optimized length */
|
2007-01-16 03:52:22 -05:00
|
|
|
DEFINE_INSN
|
|
|
|
opt_length
|
2015-09-19 13:59:58 -04:00
|
|
|
(CALL_INFO ci, CALL_CACHE cc)
|
2007-01-16 03:52:22 -05:00
|
|
|
(VALUE recv)
|
|
|
|
(VALUE val)
|
|
|
|
{
|
2017-04-18 06:58:49 -04:00
|
|
|
val = vm_opt_length(recv, BOP_LENGTH);
|
|
|
|
|
|
|
|
if (val == Qundef) {
|
2018-09-14 03:44:44 -04:00
|
|
|
CALL_SIMPLE_METHOD();
|
2007-01-16 03:52:22 -05:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2018-01-12 03:38:07 -05:00
|
|
|
/* optimized size */
|
2009-09-06 04:39:57 -04:00
|
|
|
DEFINE_INSN
|
|
|
|
opt_size
|
2015-09-19 13:59:58 -04:00
|
|
|
(CALL_INFO ci, CALL_CACHE cc)
|
2009-09-06 04:39:57 -04:00
|
|
|
(VALUE recv)
|
|
|
|
(VALUE val)
|
|
|
|
{
|
2017-04-18 06:58:49 -04:00
|
|
|
val = vm_opt_length(recv, BOP_SIZE);
|
|
|
|
|
|
|
|
if (val == Qundef) {
|
2018-09-14 03:44:44 -04:00
|
|
|
CALL_SIMPLE_METHOD();
|
2009-09-06 04:39:57 -04:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2018-01-12 03:38:07 -05:00
|
|
|
/* optimized empty? */
|
2012-09-26 05:34:46 -04:00
|
|
|
DEFINE_INSN
|
|
|
|
opt_empty_p
|
2015-09-19 13:59:58 -04:00
|
|
|
(CALL_INFO ci, CALL_CACHE cc)
|
2012-09-26 05:34:46 -04:00
|
|
|
(VALUE recv)
|
|
|
|
(VALUE val)
|
|
|
|
{
|
2017-04-18 06:58:49 -04:00
|
|
|
val = vm_opt_empty_p(recv);
|
|
|
|
|
|
|
|
if (val == Qundef) {
|
2018-09-14 03:44:44 -04:00
|
|
|
CALL_SIMPLE_METHOD();
|
2012-09-26 05:34:46 -04:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2018-01-12 03:38:07 -05:00
|
|
|
/* optimized succ */
|
2007-01-16 03:52:22 -05:00
|
|
|
DEFINE_INSN
|
|
|
|
opt_succ
|
2015-09-19 13:59:58 -04:00
|
|
|
(CALL_INFO ci, CALL_CACHE cc)
|
2007-01-16 03:52:22 -05:00
|
|
|
(VALUE recv)
|
|
|
|
(VALUE val)
|
|
|
|
{
|
2017-04-18 06:58:49 -04:00
|
|
|
val = vm_opt_succ(recv);
|
|
|
|
|
|
|
|
if (val == Qundef) {
|
2018-09-14 03:44:44 -04:00
|
|
|
CALL_SIMPLE_METHOD();
|
2007-01-16 03:52:22 -05:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2018-01-12 03:38:07 -05:00
|
|
|
/* optimized not */
|
2007-12-18 07:07:51 -05:00
|
|
|
DEFINE_INSN
|
|
|
|
opt_not
|
2015-09-19 13:59:58 -04:00
|
|
|
(CALL_INFO ci, CALL_CACHE cc)
|
2007-12-18 07:07:51 -05:00
|
|
|
(VALUE recv)
|
|
|
|
(VALUE val)
|
|
|
|
{
|
2017-04-18 06:58:49 -04:00
|
|
|
val = vm_opt_not(ci, cc, recv);
|
2015-09-19 13:59:58 -04:00
|
|
|
|
2017-04-18 06:58:49 -04:00
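The idea in the commit message above (keep the dispatch loop thin and move each instruction body into its own small function, leaving inlining to the compiler) can be sketched roughly as follows. This is a toy stack machine, not YARV: the `toy_*` names, the opcode set, and the fixed 16-slot stack are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opcodes for a toy stack machine; not YARV's instruction set. */
enum toy_op { TOY_PUSH, TOY_ADD, TOY_NOT, TOY_HALT };

/* Each instruction body lives in its own small function. A compiler is
 * free to inline these back into the loop, but it can also keep them
 * separate, which it could not do if they were written inline by hand. */
static inline long toy_add(long a, long b) { return a + b; }
static inline long toy_not(long a) { return !a; }

/* The dispatch loop only moves operands around and calls the helpers. */
static long toy_exec(const long *code)
{
    long stack[16];
    size_t sp = 0;
    size_t pc = 0;

    for (;;) {
        switch ((enum toy_op)code[pc++]) {
          case TOY_PUSH:
            assert(sp < 16);
            stack[sp++] = code[pc++];   /* operand follows the opcode */
            break;
          case TOY_ADD: {
            assert(sp >= 2);
            long b = stack[--sp];
            long a = stack[--sp];
            stack[sp++] = toy_add(a, b);
            break;
          }
          case TOY_NOT:
            assert(sp >= 1);
            stack[sp - 1] = toy_not(stack[sp - 1]);
            break;
          case TOY_HALT:
            assert(sp >= 1);
            return stack[sp - 1];       /* result is the top of the stack */
          default:
            assert(0 && "unknown opcode");
            return 0;
        }
    }
}
```

Calling `toy_exec` on the program `{ TOY_PUSH, 40, TOY_PUSH, 2, TOY_ADD, TOY_HALT }` evaluates 40 + 2. Whether the helpers end up inlined depends on the compiler and optimization level, which is the point the commit message makes about gcc 6.2.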
|
|
|
if (val == Qundef) {
|
2018-09-14 03:44:44 -04:00
|
|
|
CALL_SIMPLE_METHOD();
|
2007-12-18 07:07:51 -05:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2018-01-12 03:38:07 -05:00
|
|
|
/* optimized regexp match */
|
2007-01-16 03:52:22 -05:00
|
|
|
DEFINE_INSN
|
|
|
|
opt_regexpmatch1
|
2017-04-18 06:58:49 -04:00
|
|
|
(VALUE recv)
|
2007-01-16 03:52:22 -05:00
|
|
|
(VALUE obj)
|
|
|
|
(VALUE val)
|
2018-09-11 05:48:58 -04:00
|
|
|
// attr bool leaf = BASIC_OP_UNREDEFINED_P(BOP_MATCH, REGEXP_REDEFINED_OP_FLAG);
|
2007-01-16 03:52:22 -05:00
|
|
|
{
|
2017-04-18 06:58:49 -04:00
|
|
|
val = vm_opt_regexpmatch1(recv, obj);
|
2007-01-16 03:52:22 -05:00
|
|
|
}
|
|
|
|
|
2018-01-12 03:38:07 -05:00
|
|
|
/* optimized regexp match 2 */
|
2007-01-16 03:52:22 -05:00
|
|
|
DEFINE_INSN
|
|
|
|
opt_regexpmatch2
|
2015-09-19 13:59:58 -04:00
|
|
|
(CALL_INFO ci, CALL_CACHE cc)
|
2007-01-16 03:52:22 -05:00
|
|
|
(VALUE obj2, VALUE obj1)
|
|
|
|
(VALUE val)
|
|
|
|
{
|
2017-04-18 06:58:49 -04:00
|
|
|
val = vm_opt_regexpmatch2(obj2, obj1);
|
|
|
|
|
|
|
|
if (val == Qundef) {
|
2018-09-14 03:44:44 -04:00
|
|
|
CALL_SIMPLE_METHOD();
|
2007-01-16 03:52:22 -05:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2018-01-12 03:38:07 -05:00
|
|
|
/* call native compiled method */
|
2007-01-16 03:52:22 -05:00
|
|
|
DEFINE_INSN
|
2007-06-30 14:02:24 -04:00
|
|
|
opt_call_c_function
|
2007-08-12 15:09:15 -04:00
|
|
|
(rb_insn_func_t funcptr)
|
2007-01-16 03:52:22 -05:00
|
|
|
()
|
|
|
|
()
|
2018-09-11 05:48:58 -04:00
|
|
|
// attr bool leaf = false; /* anything can happen inside */
|
2018-07-25 10:55:43 -04:00
|
|
|
// attr bool handles_sp = true;
|
2007-01-16 03:52:22 -05:00
|
|
|
{
|
2017-10-27 15:08:31 -04:00
|
|
|
reg_cfp = (funcptr)(ec, reg_cfp);
|
2007-01-16 03:52:22 -05:00
|
|
|
|
2007-06-30 14:02:24 -04:00
|
|
|
if (reg_cfp == 0) {
|
2017-10-27 02:21:50 -04:00
|
|
|
VALUE err = ec->errinfo;
|
|
|
|
ec->errinfo = Qnil;
|
2007-07-01 22:59:37 -04:00
|
|
|
THROW_EXCEPTION(err);
|
2007-06-30 14:02:24 -04:00
|
|
|
}
|
|
|
|
|
mjit_compile.c: use local variables for stack
if catch_except_p is FALSE. If catch_except_p is TRUE, stack values
should be on VM's stack when exception is thrown and the JIT-ed frame
is re-executed by VM's exception handler. If it's FALSE, the JIT-ed
frame won't be re-executed and doesn't need to keep values on VM's stack.
Using local variables allows us to reduce cfp->sp motion. Moving cfp->sp
is needed only for insns whose handles_frame? is false. So it improves
performance.
_mjit_compile_insn.erb: Prepare `stack_size` variable for GET_SP,
STACK_ADDR_FROM_TOP, TOPN macros. Share pc and sp motion partial view.
Use cancel handler created in mjit_compile.c.
_mjit_compile_send.erb: ditto. Also, when iseq->body->catch_except_p is
TRUE, this stops calling mjit_exec directly. I described the reason in
vm_insnhelper.h's comment for EXEC_EC_CFP.
_mjit_compile_pc_and_sp.erb: Shared logic for moving sp and pc. As you
can see from this file, when status->local_stack_p is TRUE and
insn.handles_frame? is false, moving sp is skipped. But if
insn.handles_frame? is true, values should be rolled back to VM's stack.
common.mk: add dependency for the file
_mjit_compile_insn_body.erb: Set sp value before canceling JIT on
DISPATCH_ORIGINAL_INSN. Replace GET_SP, STACK_ADDR_FROM_TOP, TOPN macros
for the case local_stack_p is TRUE and insn.handles_frame? is false.
In that case, values are not available on VM's stack and those macros
should be replaced.
mjit_compile.inc.erb: updated comments of macros which are supported by
JIT compiler. All references to `cfp->sp` should be replaced and thus
INC_SP, SET_SV, PUSH are no longer supported for now, because they are
not used now.
vm_exec.h: moved EXEC_EC_CFP definition to vm_insnhelper.h because it's
tightly coupled to CALL_METHOD.
vm_insnhelper.h: Have revised EXEC_EC_CFP definition moved from vm_exec.h.
Now it triggers mjit_exec for VM, and has the guard for catch_except_p
on JIT-ed code. See comments for details. CALL_METHOD delegates
triggering mjit_exec to EXEC_EC_CFP.
insns.def: Stopped using EXEC_EC_CFP for the case we don't want to
trigger mjit_exec. Those insns (defineclass, opt_call_c_function) are
not supported by JIT and it's safe to use RESTORE_REGS(), NEXT_INSN().
expandarray is changed to pass GET_SP() to replace the macro in
_mjit_compile_insn_body.erb.
vm_insnhelper.c: change to take sp for the above reason.
[close https://github.com/ruby/ruby/pull/1828]
This patch resurrects the performance which was attached in
[Feature #14235].
* Benchmark
Optcarrot (with configuration for benchmark_driver.gem)
https://github.com/benchmark-driver/optcarrot
$ benchmark-driver benchmark.yml --verbose 1 --rbenv 'before;before+JIT::before,--jit;after;after+JIT::after,--jit' --repeat-count 10
before: ruby 2.6.0dev (2018-03-04 trunk 62652) [x86_64-linux]
before+JIT: ruby 2.6.0dev (2018-03-04 trunk 62652) +JIT [x86_64-linux]
after: ruby 2.6.0dev (2018-03-04 local-variable.. 62652) [x86_64-linux]
last_commit=mjit_compile.c: use local variables for stack
after+JIT: ruby 2.6.0dev (2018-03-04 local-variable.. 62652) +JIT [x86_64-linux]
last_commit=mjit_compile.c: use local variables for stack
Calculating -------------------------------------
before before+JIT after after+JIT
optcarrot 53.552 59.680 53.697 63.358 fps
Comparison:
optcarrot
after+JIT: 63.4 fps
before+JIT: 59.7 fps - 1.06x slower
after: 53.7 fps - 1.18x slower
before: 53.6 fps - 1.18x slower
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@62655 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-03-04 02:04:40 -05:00
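The difference the commit message above describes, between keeping intermediate values on the VM's stack and keeping them in C local variables, can be illustrated with a rough sketch. Everything here is hypothetical: `struct toy_frame`, the `toy_*` functions, and the 8-slot stack stand in for the real control frame and generated JIT code, which are considerably more involved.

```c
#include <assert.h>

/* Hypothetical frame; stands in for a control frame and its value stack. */
struct toy_frame {
    long stack[8];
    long *sp;
};

/* VM-stack style: every intermediate value is written through frame->sp,
 * so the stack pointer moves on each push and pop. */
static long toy_mul_add_on_vm_stack(struct toy_frame *f, long a, long b, long c)
{
    f->sp = f->stack;
    *f->sp++ = a;
    *f->sp++ = b;
    f->sp -= 2;
    *f->sp++ = f->sp[0] + f->sp[1];     /* (a + b) */
    *f->sp++ = c;
    f->sp -= 2;
    *f->sp++ = f->sp[0] * f->sp[1];     /* (a + b) * c */
    return *--f->sp;
}

/* Local-variable style: the same slots become C locals and the frame is
 * never touched, so there is no sp motion at all. */
static long toy_mul_add_in_locals(long a, long b, long c)
{
    long stack0 = a;
    long stack1 = b;
    stack0 = stack0 + stack1;
    stack1 = c;
    stack0 = stack0 * stack1;
    return stack0;
}

/* Rolling locals back onto the frame's stack, as would be needed before
 * handing control to code that expects the values to be there. */
static void toy_spill(struct toy_frame *f, const long *locals, int stack_size)
{
    assert(stack_size >= 0 && stack_size <= 8);
    f->sp = f->stack;
    for (int i = 0; i < stack_size; i++) {
        *f->sp++ = locals[i];
    }
}
```

Both `toy_mul_add_*` functions compute the same result; the second simply gives the C compiler locals it can keep in registers. `toy_spill` corresponds loosely to the "rolled back to VM's stack" step the message mentions, under the assumption that the local slots are available as an array.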
|
|
|
RESTORE_REGS();
|
|
|
|
NEXT_INSN();
|
2007-01-16 03:52:22 -05:00
|
|
|
}
|
|
|
|
|
2018-09-25 21:11:20 -04:00
|
|
|
/* BLT */
|
|
|
|
DEFINE_INSN
|
|
|
|
bitblt
|
|
|
|
()
|
|
|
|
()
|
|
|
|
(VALUE ret)
|
|
|
|
{
|
|
|
|
ret = rb_str_new2("a bit of bacon, lettuce and tomato");
|
|
|
|
}
|
|
|
|
|
2018-01-12 03:38:07 -05:00
|
|
|
/* The Answer to Life, the Universe, and Everything */
|
2007-01-16 03:52:22 -05:00
|
|
|
DEFINE_INSN
|
|
|
|
answer
|
|
|
|
()
|
|
|
|
()
|
|
|
|
(VALUE ret)
|
|
|
|
{
|
|
|
|
ret = INT2FIX(42);
|
|
|
|
}
|