2018-01-12 03:38:08 -05:00
|
|
|
/* -*- mode:c; style:ruby; coding: utf-8 -*-
|
2007-01-16 03:52:22 -05:00
|
|
|
insns.def - YARV instruction definitions
|
|
|
|
|
|
|
|
$Author: $
|
|
|
|
created at: 04/01/01 01:17:55 JST
|
|
|
|
|
* blockinlining.c, compile.c, compile.h, debug.c, debug.h,
id.c, insnhelper.h, insns.def, thread.c, thread_pthread.ci,
thread_pthread.h, thread_win32.ci, thread_win32.h, vm.h,
vm_dump.c, vm_evalbody.ci, vm_opts.h: fix comments and
copyright year.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@13920 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2007-11-13 17:13:04 -05:00
|
|
|
Copyright (C) 2004-2007 Koichi Sasada
|
2018-01-12 03:38:08 -05:00
|
|
|
Massive rewrite by @shyouhei in 2017.
|
2018-01-12 03:38:07 -05:00
|
|
|
*/
|
2018-01-09 20:53:24 -05:00
|
|
|
|
2018-01-12 03:38:08 -05:00
|
|
|
/* Some comments about this file's contents:
|
|
|
|
|
|
|
|
- The new format aims to be editable by the C editor of your choice;
|
|
|
|
your mileage might vary of course.
|
|
|
|
|
|
|
|
- Each instruction is in the following format:
|
|
|
|
|
|
|
|
DEFINE_INSN
|
|
|
|
instruction_name
|
|
|
|
(type operand, type operand, ..)
|
|
|
|
(pop_values, ..)
|
|
|
|
(return values ..)
|
|
|
|
// attr type name contents..
|
|
|
|
{
|
|
|
|
.. // insn body
|
|
|
|
}
|
|
|
|
|
|
|
|
- Unlike the old format which was line-oriented, you can now place
|
|
|
|
newlines and comments at liberal positions.
|
|
|
|
|
|
|
|
- `DEFINE_INSN` is a keyword.
|
|
|
|
|
|
|
|
- An instruction name must be a valid C identifier.
|
|
|
|
|
|
|
|
- Operands, pop values, and return values are each a series of either variable
|
|
|
|
declarations, keyword `void`, or keyword `...`. They are much
|
|
|
|
like C function declarations.
|
|
|
|
|
|
|
|
- Attribute pragmas are optional, and can include arbitrary C
|
|
|
|
expressions. You can write anything there but as of writing,
|
2018-01-27 08:50:28 -05:00
|
|
|
supported attributes are:
|
|
|
|
|
|
|
|
* sp_inc: Used to dynamically calculate sp increase in
|
|
|
|
`insn_stack_increase`.
|
|
|
|
|
|
|
|
* handles_frame: If it is true, VM moves pc before insn body.
|
2018-01-12 03:38:08 -05:00
|
|
|
|
|
|
|
- Attributes can access operands, but not stack (push/pop) variables.
|
|
|
|
|
|
|
|
- An instruction's body is a pure C block, copied verbatim into
|
|
|
|
the generated C source code.
|
2018-01-09 20:53:24 -05:00
|
|
|
*/
|
2018-01-09 08:30:29 -05:00
|
|
|
|
2018-01-12 03:38:07 -05:00
|
|
|
/* nop */
|
2007-01-16 03:52:22 -05:00
|
|
|
DEFINE_INSN
|
|
|
|
nop
|
|
|
|
()
|
|
|
|
()
|
|
|
|
()
|
|
|
|
{
|
|
|
|
/* none */
|
|
|
|
}
|
|
|
|
|
|
|
|
/**********************************************************/
|
|
|
|
/* deal with variables */
|
|
|
|
/**********************************************************/
|
|
|
|
|
2018-01-12 03:38:07 -05:00
|
|
|
/* Get a local variable (pointed to by `idx' and `level').
|
2012-10-04 09:52:20 -04:00
|
|
|
'level' indicates the nesting depth from the current block.
|
2007-01-16 03:52:22 -05:00
|
|
|
*/
|
|
|
|
DEFINE_INSN
|
|
|
|
getlocal
|
2012-10-04 09:52:20 -04:00
|
|
|
(lindex_t idx, rb_num_t level)
|
2007-01-16 03:52:22 -05:00
|
|
|
()
|
|
|
|
(VALUE val)
|
|
|
|
{
|
split insns.def into functions
Contemporary C compilers are good at function inlining. They fold
multiple functions into one. However they are not yet smart enough to
unfold a function into several ones. So generally speaking, it is
wiser for a C programmer to manually split C functions whenever
possible. That should make room for compilers to optimize at will.
Before this changeset insns.def was converted into single HUGE
function called vm_exec_core(). By moving each instruction's core
into individual functions, generated C source code is reduced from
3,428 lines to 2,847 lines. Looking at the generated assembly
however, it seems my compiler (gcc 6.2) is so extraordinarily smart that
it inlines almost all functions I introduced in this changeset back
into that vm_exec_core. On my machine compiled machine binary of the
function does not shrink very much in size (28,432 bytes to 26,816
bytes, according to nm(1)).
I believe this change is zero-cost. Several benchmarks I exercised
showed no significant difference beyond error margin. For instance
3 repeated runs of optcarrot benchmark on my machine resulted in:
before this: 28.330329285707490, 27.513378371065920, 29.40420215754537
after this: 27.107195867280414, 25.549324021385907, 30.31581919050884
in fps (greater==faster).
----
* internal.h (rb_obj_not_equal): used from vm_insnhelper.c
* insns.def: move vast majority of lines into vm_insnhelper.c
* vm_insnhelper.c: moved here.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58390 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-04-18 06:58:49 -04:00
|
|
|
val = *(vm_get_ep(GET_EP(), level) - idx);
|
2017-05-31 02:46:57 -04:00
|
|
|
RB_DEBUG_COUNTER_INC(lvar_get);
|
|
|
|
(void)RB_DEBUG_COUNTER_INC_IF(lvar_get_dynamic, level > 0);
|
2007-01-16 03:52:22 -05:00
|
|
|
}
|
|
|
|
|
2018-01-12 03:38:07 -05:00
|
|
|
/* Set a local variable (pointed to by 'idx') as val.
|
2012-10-04 09:52:20 -04:00
|
|
|
'level' indicates the nesting depth from the current block.
|
2007-01-16 03:52:22 -05:00
|
|
|
*/
|
|
|
|
DEFINE_INSN
|
|
|
|
setlocal
|
2012-10-04 09:52:20 -04:00
|
|
|
(lindex_t idx, rb_num_t level)
|
2007-01-16 03:52:22 -05:00
|
|
|
(VALUE val)
|
|
|
|
()
|
|
|
|
{
|
2017-04-18 06:58:49 -04:00
|
|
|
vm_env_write(vm_get_ep(GET_EP(), level), -(int)idx, val);
|
2017-05-31 02:46:57 -04:00
|
|
|
RB_DEBUG_COUNTER_INC(lvar_set);
|
|
|
|
(void)RB_DEBUG_COUNTER_INC_IF(lvar_set_dynamic, level > 0);
|
2007-01-16 03:52:22 -05:00
|
|
|
}
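The `idx`/`level` addressing used by `getlocal` and `setlocal` above can be illustrated with a small standalone model. This is only a sketch under stated assumptions: `toy_env`, `toy_ep`, `toy_get_env`, `toy_getlocal`, and `toy_setlocal` are hypothetical names, and the real VM stores the link to the enclosing environment differently. What it mirrors is the shape of the instruction bodies: walk `level` scopes outward, then address the variable at `ep - idx`.

```c
#include <assert.h>
#include <stddef.h>

enum { TOY_SLOTS = 4 };

/* Hypothetical scope record: a link to the enclosing scope plus local slots. */
struct toy_env {
    struct toy_env *parent;   /* enclosing scope, NULL for the outermost one */
    long slots[TOY_SLOTS];    /* locals are addressed downward from the last slot */
};

/* The toy "environment pointer" is the address of the last slot. */
static long *toy_ep(struct toy_env *env)
{
    return &env->slots[TOY_SLOTS - 1];
}

/* Walk `level` scopes outward; level 0 is the current scope. */
static struct toy_env *toy_get_env(struct toy_env *env, unsigned level)
{
    while (level-- > 0) {
        assert(env->parent != NULL);
        env = env->parent;
    }
    return env;
}

/* Read the variable at `ep - idx` in the scope `level` steps out. */
static long toy_getlocal(struct toy_env *env, unsigned idx, unsigned level)
{
    assert(idx < TOY_SLOTS);
    return *(toy_ep(toy_get_env(env, level)) - idx);
}

/* Write the variable at `ep - idx` in the scope `level` steps out. */
static void toy_setlocal(struct toy_env *env, unsigned idx, unsigned level, long val)
{
    assert(idx < TOY_SLOTS);
    *(toy_ep(toy_get_env(env, level)) - idx) = val;
}
```

In this model a write through an inner scope with `level == 1` is visible from the outer scope with `level == 0`, which is the behaviour a block needs when it assigns to a variable of its enclosing method.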
|
|
|
|
|
2018-01-12 03:38:07 -05:00
|
|
|
/* Get a block parameter. */
|
2017-10-24 07:13:49 -04:00
|
|
|
DEFINE_INSN
|
|
|
|
getblockparam
|
|
|
|
(lindex_t idx, rb_num_t level)
|
|
|
|
()
|
|
|
|
(VALUE val)
|
|
|
|
{
|
|
|
|
const VALUE *ep = vm_get_ep(GET_EP(), level);
|
|
|
|
VM_ASSERT(VM_ENV_LOCAL_P(ep));
|
|
|
|
|
|
|
|
if (!VM_ENV_FLAGS(ep, VM_FRAME_FLAG_MODIFIED_BLOCK_PARAM)) {
|
2017-10-27 02:21:50 -04:00
|
|
|
val = rb_vm_bh_to_procval(ec, VM_ENV_BLOCK_HANDLER(ep));
|
2017-10-24 07:13:49 -04:00
|
|
|
vm_env_write(ep, -(int)idx, val);
|
|
|
|
VM_ENV_FLAGS_SET(ep, VM_FRAME_FLAG_MODIFIED_BLOCK_PARAM);
|
|
|
|
}
|
|
|
|
else {
|
|
|
|
val = *(ep - idx);
|
|
|
|
RB_DEBUG_COUNTER_INC(lvar_get);
|
|
|
|
(void)RB_DEBUG_COUNTER_INC_IF(lvar_get_dynamic, level > 0);
|
|
|
|
}
|
|
|
|
}
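The body of `getblockparam` above converts the block handler only on first access and then records that fact in a frame flag. A minimal sketch of that lazy pattern follows. All names are hypothetical: `toy_frame` stands in for the environment, `modified` for `VM_FRAME_FLAG_MODIFIED_BLOCK_PARAM`, and `toy_handler_to_proc` for the real conversion, whose actual work is not modelled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical frame: integers stand in for VM values. */
struct toy_frame {
    int  block_handler;  /* raw block handler */
    int  slot;           /* local variable slot for the block parameter */
    bool modified;       /* set once the slot holds a real value */
    int  conversions;    /* how many times the handler was converted */
};

/* Stand-in for the handler-to-Proc conversion; counts its invocations. */
static int toy_handler_to_proc(struct toy_frame *f)
{
    f->conversions++;
    return f->block_handler + 1000;
}

/* First read converts and caches; later reads just return the slot. */
static int toy_getblockparam(struct toy_frame *f)
{
    assert(f != NULL);
    if (!f->modified) {
        f->slot = toy_handler_to_proc(f);
        f->modified = true;
    }
    return f->slot;
}

/* An explicit assignment also marks the slot as authoritative. */
static void toy_setblockparam(struct toy_frame *f, int val)
{
    assert(f != NULL);
    f->slot = val;
    f->modified = true;
}
```

Reading twice triggers a single conversion, and an assignment replaces the cached value without converting again.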
|
|
|
|
|
2018-01-12 03:38:07 -05:00
|
|
|
/* Set block parameter. */
|
2017-10-24 07:13:49 -04:00
|
|
|
DEFINE_INSN
|
|
|
|
setblockparam
|
|
|
|
(lindex_t idx, rb_num_t level)
|
|
|
|
(VALUE val)
|
|
|
|
()
|
|
|
|
{
|
|
|
|
const VALUE *ep = vm_get_ep(GET_EP(), level);
|
|
|
|
VM_ASSERT(VM_ENV_LOCAL_P(ep));
|
|
|
|
|
|
|
|
vm_env_write(ep, -(int)idx, val);
|
|
|
|
RB_DEBUG_COUNTER_INC(lvar_set);
|
|
|
|
(void)RB_DEBUG_COUNTER_INC_IF(lvar_set_dynamic, level > 0);
|
|
|
|
|
|
|
|
VM_ENV_FLAGS_SET(ep, VM_FRAME_FLAG_MODIFIED_BLOCK_PARAM);
|
|
|
|
}
|
|
|
|
|
2018-01-12 03:38:07 -05:00
|
|
|
/* Get special proxy object which only responds to `call` method if the block parameter
|
2018-01-07 14:18:49 -05:00
|
|
|
represents an iseq/ifunc block. Otherwise, same as `getblockparam`.
|
|
|
|
*/
|
|
|
|
DEFINE_INSN
|
|
|
|
getblockparamproxy
|
|
|
|
(lindex_t idx, rb_num_t level)
|
|
|
|
()
|
|
|
|
(VALUE val)
|
|
|
|
{
|
|
|
|
const VALUE *ep = vm_get_ep(GET_EP(), level);
|
|
|
|
VM_ASSERT(VM_ENV_LOCAL_P(ep));
|
|
|
|
|
|
|
|
if (!VM_ENV_FLAGS(ep, VM_FRAME_FLAG_MODIFIED_BLOCK_PARAM)) {
|
|
|
|
VALUE block_handler = VM_ENV_BLOCK_HANDLER(ep);
|
|
|
|
|
|
|
|
if (block_handler) {
|
|
|
|
switch (vm_block_handler_type(block_handler)) {
|
|
|
|
case block_handler_type_iseq:
|
|
|
|
case block_handler_type_ifunc:
|
|
|
|
val = rb_block_param_proxy;
|
|
|
|
break;
|
|
|
|
case block_handler_type_symbol:
|
|
|
|
val = rb_sym_to_proc(VM_BH_TO_SYMBOL(block_handler));
|
|
|
|
goto INSN_LABEL(set);
|
|
|
|
case block_handler_type_proc:
|
|
|
|
val = VM_BH_TO_PROC(block_handler);
|
|
|
|
goto INSN_LABEL(set);
|
|
|
|
default:
|
|
|
|
VM_UNREACHABLE(getblockparamproxy);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
else {
|
|
|
|
val = Qnil;
|
|
|
|
INSN_LABEL(set):
|
|
|
|
vm_env_write(ep, -(int)idx, val);
|
|
|
|
VM_ENV_FLAGS_SET(ep, VM_FRAME_FLAG_MODIFIED_BLOCK_PARAM);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
else {
|
|
|
|
val = *(ep - idx);
|
|
|
|
RB_DEBUG_COUNTER_INC(lvar_get);
|
|
|
|
(void)RB_DEBUG_COUNTER_INC_IF(lvar_get_dynamic, level > 0);
|
|
|
|
}
|
|
|
|
}
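The control flow of `getblockparamproxy` above can be summarised in a standalone model. This is a sketch with hypothetical names (`toy_bh_type`, `toy_frame`, `TOY_PROXY`, `TOY_NIL`), and integers stand in for VM values. It assumes, as the body above suggests, that the iseq/ifunc case returns the shared proxy without touching the slot or the flag, while every other path stores the value and sets the flag. The missing-handler case is folded into `default` here, whereas the original treats an unknown handler type as unreachable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum toy_bh_type { TOY_BH_NONE, TOY_BH_ISEQ, TOY_BH_IFUNC, TOY_BH_SYMBOL, TOY_BH_PROC };
enum { TOY_NIL = 0, TOY_PROXY = -1 };

/* Hypothetical frame: integers stand in for VM values. */
struct toy_frame {
    enum toy_bh_type type;  /* kind of block handler, TOY_BH_NONE if absent */
    int  handler;           /* value the handler converts to */
    int  slot;              /* local variable slot for the block parameter */
    bool modified;          /* set once the slot holds a real value */
};

static int toy_getblockparamproxy(struct toy_frame *f)
{
    int val;

    assert(f != NULL);
    if (f->modified) {
        return f->slot;          /* slot is already authoritative */
    }
    switch (f->type) {
      case TOY_BH_ISEQ:
      case TOY_BH_IFUNC:
        return TOY_PROXY;        /* no conversion, no write, flag stays clear */
      case TOY_BH_SYMBOL:
      case TOY_BH_PROC:
        val = f->handler;        /* a real object is produced */
        break;
      default:
        val = TOY_NIL;           /* no block was given */
        break;
    }
    f->slot = val;               /* corresponds to the shared "set" label */
    f->modified = true;
    return val;
}
```

The point of the proxy path is that a plain block passed straight through need not be materialised at all.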
|
|
|
|
|
2018-01-12 03:38:07 -05:00
|
|
|
/* Get value of special local variable ($~, $_, ..). */
|
2007-01-16 03:52:22 -05:00
|
|
|
DEFINE_INSN
|
|
|
|
getspecial
|
2012-12-10 01:11:16 -05:00
|
|
|
(rb_num_t key, rb_num_t type)
|
2007-01-16 03:52:22 -05:00
|
|
|
()
|
|
|
|
(VALUE val)
|
|
|
|
{
|
2017-10-27 02:21:50 -04:00
|
|
|
val = vm_getspecial(ec, GET_LEP(), key, type);
|
2007-01-16 03:52:22 -05:00
|
|
|
}
|
|
|
|
|
2018-01-12 03:38:07 -05:00
|
|
|
/* Set value of special local variable ($~, $_, ...) to obj. */
|
2007-01-16 03:52:22 -05:00
|
|
|
DEFINE_INSN
|
|
|
|
setspecial
|
2012-12-10 01:11:16 -05:00
|
|
|
(rb_num_t key)
|
2007-01-16 03:52:22 -05:00
|
|
|
(VALUE obj)
|
|
|
|
()
|
|
|
|
{
|
2017-10-27 02:21:50 -04:00
|
|
|
lep_svar_set(ec, GET_LEP(), key, obj);
|
2007-01-16 03:52:22 -05:00
|
|
|
}
|
|
|
|
|
2018-01-12 03:38:07 -05:00
|
|
|
/* Get value of instance variable id of self. */
|
2007-01-16 03:52:22 -05:00
|
|
|
DEFINE_INSN
|
|
|
|
getinstancevariable
|
2009-07-13 00:44:20 -04:00
|
|
|
(ID id, IC ic)
|
2007-01-16 03:52:22 -05:00
|
|
|
()
|
|
|
|
(VALUE val)
|
|
|
|
{
|
2012-10-16 13:07:23 -04:00
|
|
|
val = vm_getinstancevariable(GET_SELF(), id, ic);
|
2007-02-04 14:17:33 -05:00
|
|
|
}
|
|
|
|
|
2018-01-12 03:38:07 -05:00
|
|
|
/* Set value of instance variable id of self to val. */
|
2007-01-16 03:52:22 -05:00
|
|
|
DEFINE_INSN
|
|
|
|
setinstancevariable
|
2009-09-06 03:40:24 -04:00
|
|
|
(ID id, IC ic)
|
2007-01-16 03:52:22 -05:00
|
|
|
(VALUE val)
|
|
|
|
()
|
|
|
|
{
|
2012-10-16 13:07:23 -04:00
|
|
|
vm_setinstancevariable(GET_SELF(), id, val, ic);
|
2007-02-04 14:17:33 -05:00
|
|
|
}
|
|
|
|
|
2018-01-12 03:38:07 -05:00
|
|
|
/* Get value of class variable id of klass as val. */
|
2007-01-16 03:52:22 -05:00
|
|
|
DEFINE_INSN
|
|
|
|
getclassvariable
|
|
|
|
(ID id)
|
|
|
|
()
|
|
|
|
(VALUE val)
|
|
|
|
{
|
2015-03-08 17:22:43 -04:00
|
|
|
val = rb_cvar_get(vm_get_cvar_base(rb_vm_get_cref(GET_EP()), GET_CFP()), id);
|
2007-01-16 03:52:22 -05:00
|
|
|
}
|
|
|
|
|
2018-01-12 03:38:07 -05:00
|
|
|
/* Set value of class variable id of klass as val. */
|
2007-01-16 03:52:22 -05:00
|
|
|
DEFINE_INSN
|
|
|
|
setclassvariable
|
2007-02-04 14:15:38 -05:00
|
|
|
(ID id)
|
2007-01-16 03:52:22 -05:00
|
|
|
(VALUE val)
|
|
|
|
()
|
|
|
|
{
|
2016-09-08 00:44:51 -04:00
|
|
|
vm_ensure_not_refinement_module(GET_SELF());
|
2015-03-08 17:22:43 -04:00
|
|
|
rb_cvar_set(vm_get_cvar_base(rb_vm_get_cref(GET_EP()), GET_CFP()), id, val);
|
2007-01-16 03:52:22 -05:00
|
|
|
}
|
|
|
|
|
2018-01-12 03:38:07 -05:00
|
|
|
/* Get constant variable id. If klass is Qnil, constants
|
2011-11-05 07:30:51 -04:00
|
|
|
are searched in the current scope. If klass is Qfalse, constants
|
|
|
|
are searched as top level constants. Otherwise, get constant under klass
|
2007-01-16 03:52:22 -05:00
|
|
|
class or module.
|
|
|
|
*/
|
|
|
|
DEFINE_INSN
|
|
|
|
getconstant
|
|
|
|
(ID id)
|
|
|
|
(VALUE klass)
|
|
|
|
(VALUE val)
|
|
|
|
{
|
2017-10-27 02:21:50 -04:00
|
|
|
val = vm_get_ev_const(ec, klass, id, 0);
|
2007-01-16 03:52:22 -05:00
|
|
|
}
|
|
|
|
|
2018-01-12 03:38:07 -05:00
|
|
|
/* Set constant variable id. If klass is Qfalse, constant
|
2007-01-16 03:52:22 -05:00
|
|
|
is accessible in this scope. If klass is Qnil, set
|
|
|
|
top level constant. Otherwise, set constant under klass
|
|
|
|
class or module.
|
|
|
|
*/
|
|
|
|
DEFINE_INSN
|
|
|
|
setconstant
|
|
|
|
(ID id)
|
2008-05-13 22:31:28 -04:00
|
|
|
(VALUE val, VALUE cbase)
|
2007-01-16 03:52:22 -05:00
|
|
|
()
|
|
|
|
{
|
2008-05-13 22:31:28 -04:00
|
|
|
vm_check_if_namespace(cbase);
|
2016-09-08 00:44:51 -04:00
|
|
|
vm_ensure_not_refinement_module(GET_SELF());
|
2008-05-13 22:31:28 -04:00
|
|
|
rb_const_set(cbase, id, val);
|
2007-01-16 03:52:22 -05:00
|
|
|
}
|
|
|
|
|
2018-01-12 03:38:07 -05:00
|
|
|
/* get global variable id. */
|
2007-01-16 03:52:22 -05:00
|
|
|
DEFINE_INSN
|
|
|
|
getglobal
|
|
|
|
(GENTRY entry)
|
|
|
|
()
|
|
|
|
(VALUE val)
|
|
|
|
{
|
2010-10-12 10:35:40 -04:00
|
|
|
val = GET_GLOBAL((VALUE)entry);
|
2007-01-16 03:52:22 -05:00
|
|
|
}
|
|
|
|
|
2018-01-12 03:38:07 -05:00
|
|
|
/* set global variable id as val. */
|
2007-01-16 03:52:22 -05:00
|
|
|
DEFINE_INSN
|
|
|
|
setglobal
|
|
|
|
(GENTRY entry)
|
|
|
|
(VALUE val)
|
|
|
|
()
|
|
|
|
{
|
2010-10-12 10:35:40 -04:00
|
|
|
SET_GLOBAL((VALUE)entry, val);
|
2007-01-16 03:52:22 -05:00
|
|
|
}
|
|
|
|
|
|
|
|
/**********************************************************/
|
|
|
|
/* deal with values */
|
|
|
|
/**********************************************************/
|
|
|
|
|
2018-01-12 03:38:07 -05:00
|
|
|
/* put nil to stack. */
|
2007-01-16 03:52:22 -05:00
|
|
|
DEFINE_INSN
|
|
|
|
putnil
|
|
|
|
()
|
|
|
|
()
|
|
|
|
(VALUE val)
|
|
|
|
{
|
|
|
|
val = Qnil;
|
|
|
|
}
|
|
|
|
|
2018-01-12 03:38:07 -05:00
|
|
|
/* put self. */
|
2007-01-16 03:52:22 -05:00
|
|
|
DEFINE_INSN
|
|
|
|
putself
|
|
|
|
()
|
|
|
|
()
|
|
|
|
(VALUE val)
|
|
|
|
{
|
|
|
|
val = GET_SELF();
|
|
|
|
}
|
|
|
|
|
2018-01-12 03:38:07 -05:00
|
|
|
/* put some object.
|
2008-06-30 23:05:58 -04:00
|
|
|
e.g. Fixnum, true, false, nil, and so on.
|
2008-05-13 22:31:28 -04:00
|
|
|
*/
|
|
|
|
DEFINE_INSN
|
2008-06-30 23:05:58 -04:00
|
|
|
putobject
|
|
|
|
(VALUE val)
|
2008-05-13 22:31:28 -04:00
|
|
|
()
|
|
|
|
(VALUE val)
|
|
|
|
{
|
2008-06-30 23:05:58 -04:00
|
|
|
/* */
|
2008-05-13 22:31:28 -04:00
|
|
|
}
|
|
|
|
|
2018-01-12 03:38:07 -05:00
|
|
|
/* put special object. "value_type" is for expansion. */
|
2007-01-16 03:52:22 -05:00
|
|
|
DEFINE_INSN
|
2008-06-30 23:05:58 -04:00
|
|
|
putspecialobject
|
|
|
|
(rb_num_t value_type)
|
2007-01-16 03:52:22 -05:00
|
|
|
()
|
|
|
|
(VALUE val)
|
|
|
|
{
|
2017-04-18 06:58:49 -04:00
|
|
|
enum vm_special_object_type type;
|
|
|
|
|
|
|
|
type = (enum vm_special_object_type)value_type;
|
|
|
|
val = vm_get_special_object(GET_EP(), type);
|
2008-06-30 23:05:58 -04:00
|
|
|
}
|
|
|
|
|
2018-01-12 03:38:07 -05:00
|
|
|
/* put iseq value. */
|
2008-06-30 23:05:58 -04:00
|
|
|
DEFINE_INSN
|
|
|
|
putiseq
|
|
|
|
(ISEQ iseq)
|
|
|
|
()
|
|
|
|
(VALUE ret)
|
|
|
|
{
|
2015-07-21 18:52:59 -04:00
|
|
|
ret = (VALUE)iseq;
|
2007-01-16 03:52:22 -05:00
|
|
|
}
|
|
|
|
|
2018-01-12 03:38:07 -05:00
|
|
|
/* put string val. string will be copied. */
|
2007-01-16 03:52:22 -05:00
|
|
|
DEFINE_INSN
|
|
|
|
putstring
|
2007-07-02 08:49:35 -04:00
|
|
|
(VALUE str)
|
2007-01-16 03:52:22 -05:00
|
|
|
()
|
|
|
|
(VALUE val)
|
|
|
|
{
|
2009-02-18 00:33:36 -05:00
|
|
|
val = rb_str_resurrect(str);
|
2007-01-16 03:52:22 -05:00
|
|
|
}
|
|
|
|
|
2018-01-12 03:38:07 -05:00
|
|
|
/* put concatenated strings */
|
2007-01-16 03:52:22 -05:00
|
|
|
DEFINE_INSN
|
|
|
|
concatstrings
|
2007-05-03 05:09:14 -04:00
|
|
|
(rb_num_t num)
|
2007-01-16 03:52:22 -05:00
|
|
|
(...)
|
2018-01-12 03:38:07 -05:00
|
|
|
(VALUE val)
|
|
|
|
// attr rb_snum_t sp_inc = 1 - num;
|
2007-01-16 03:52:22 -05:00
|
|
|
{
|
2017-04-18 06:58:49 -04:00
|
|
|
val = rb_str_concat_literals(num, STACK_ADDR_FROM_TOP(num));
|
2007-01-16 03:52:22 -05:00
|
|
|
}
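Instructions such as `concatstrings` take `(...)` as pop values and read their `num` inputs directly from the top of the value stack. The sketch below models that on a hypothetical miniature stack. `toy_stack`, `toy_push`, `toy_addr_from_top`, and `toy_sum_top` are illustrative names, and summing integers stands in for the real string concatenation. It assumes the address-from-top helper means "stack pointer minus n".

```c
#include <assert.h>
#include <stddef.h>

enum { TOY_STACK_MAX = 16 };

/* Hypothetical miniature value stack; sp counts the live slots. */
struct toy_stack {
    long slots[TOY_STACK_MAX];
    size_t sp;
};

static void toy_push(struct toy_stack *s, long v)
{
    assert(s->sp < TOY_STACK_MAX);
    s->slots[s->sp++] = v;
}

/* Address of the slot n positions below the stack pointer (sp - n). */
static long *toy_addr_from_top(struct toy_stack *s, size_t n)
{
    assert(n <= s->sp);
    return &s->slots[s->sp - n];
}

/* Consume the top `num` values and push one result (their sum). */
static void toy_sum_top(struct toy_stack *s, size_t num)
{
    const long *base = toy_addr_from_top(s, num);
    long sum = 0;
    size_t i;

    for (i = 0; i < num; i++) {
        sum += base[i];      /* inputs are read in push order */
    }
    s->sp -= num;            /* pop the inputs */
    toy_push(s, sum);        /* push the single result */
}
```

After the call the stack depth has changed by `1 - num`, which is the quantity the `sp_inc` attribute on this instruction declares.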
|
|
|
|
|
2018-01-12 03:38:07 -05:00
|
|
|
/* push the result of to_s. */
|
2007-01-16 03:52:22 -05:00
|
|
|
DEFINE_INSN
|
|
|
|
tostring
|
|
|
|
()
|
2017-09-17 22:27:13 -04:00
|
|
|
(VALUE val, VALUE str)
|
2007-01-16 03:52:22 -05:00
|
|
|
(VALUE val)
|
|
|
|
{
|
2017-09-17 22:27:13 -04:00
|
|
|
VALUE rb_obj_as_string_result(VALUE str, VALUE obj);
|
|
|
|
val = rb_obj_as_string_result(str, val);
|
2007-01-16 03:52:22 -05:00
|
|
|
}
|
|
|
|
|
2018-01-12 03:38:07 -05:00
|
|
|
/* Freeze (dynamically) created strings. If debug_info is given, set it. */
|
2015-11-20 18:49:31 -05:00
|
|
|
DEFINE_INSN
|
|
|
|
freezestring
|
|
|
|
(VALUE debug_info)
|
|
|
|
(VALUE str)
|
|
|
|
(VALUE str)
|
|
|
|
{
|
2017-04-18 06:58:49 -04:00
|
|
|
vm_freezestring(str, debug_info);
|
2015-11-20 18:49:31 -05:00
|
|
|
}
|
|
|
|
|
2018-01-12 03:38:07 -05:00
|
|
|
/* compile str to Regexp and push it.
|
2016-01-09 21:07:00 -05:00
|
|
|
opt is the option for the Regexp.
|
2007-01-16 03:52:22 -05:00
|
|
|
*/
|
|
|
|
DEFINE_INSN
|
|
|
|
toregexp
|
2008-01-29 03:03:51 -05:00
|
|
|
(rb_num_t opt, rb_num_t cnt)
|
|
|
|
(...)
|
2018-01-12 03:38:07 -05:00
|
|
|
(VALUE val)
|
|
|
|
// attr rb_snum_t sp_inc = 1 - cnt;
|
2007-01-16 03:52:22 -05:00
|
|
|
{
|
2008-01-29 03:03:51 -05:00
|
|
|
VALUE rb_reg_new_ary(VALUE ary, int options);
|
2017-04-20 06:32:08 -04:00
|
|
|
VALUE rb_ary_tmp_new_from_values(VALUE, long, const VALUE *);
|
|
|
|
const VALUE ary = rb_ary_tmp_new_from_values(0, cnt, STACK_ADDR_FROM_TOP(cnt));
|
2009-06-30 03:46:44 -04:00
|
|
|
val = rb_reg_new_ary(ary, (int)opt);
|
2009-02-11 00:46:17 -05:00
|
|
|
rb_ary_clear(ary);
|
2007-01-16 03:52:22 -05:00
|
|
|
}
|
|
|
|
|
2018-01-12 03:38:07 -05:00
|
|
|
/* intern str to Symbol and push it. */
|
2017-09-18 01:16:37 -04:00
|
|
|
DEFINE_INSN
|
|
|
|
intern
|
|
|
|
()
|
|
|
|
(VALUE str)
|
|
|
|
(VALUE sym)
|
|
|
|
{
|
|
|
|
sym = rb_str_intern(str);
|
|
|
|
}
|
|
|
|
|
2018-01-12 03:38:07 -05:00
|
|
|
/* put new array initialized with num values on the stack. */
|
2007-01-16 03:52:22 -05:00
|
|
|
DEFINE_INSN
|
|
|
|
newarray
|
2007-05-03 05:09:14 -04:00
|
|
|
(rb_num_t num)
|
2007-01-16 03:52:22 -05:00
|
|
|
(...)
|
2018-01-12 03:38:07 -05:00
|
|
|
(VALUE val)
|
|
|
|
// attr rb_snum_t sp_inc = 1 - num;
|
2007-01-16 03:52:22 -05:00
|
|
|
{
|
2017-04-18 06:58:49 -04:00
|
|
|
val = rb_ary_new4(num, STACK_ADDR_FROM_TOP(num));
|
2007-01-16 03:52:22 -05:00
|
|
|
}
|
|
|
|
|
2018-01-12 03:38:07 -05:00
|
|
|
/* dup array */
|
2007-01-16 03:52:22 -05:00
|
|
|
DEFINE_INSN
|
|
|
|
duparray
|
|
|
|
(VALUE ary)
|
|
|
|
()
|
|
|
|
(VALUE val)
|
|
|
|
{
|
2009-02-18 00:33:36 -05:00
|
|
|
val = rb_ary_resurrect(ary);
|
2007-01-16 03:52:22 -05:00
|
|
|
}
|
|
|
|
|
2018-01-12 03:38:07 -05:00
|
|
|
/* if TOS is an array, expand it to num objects.
|
2016-01-09 21:07:00 -05:00
|
|
|
if the number of elements in the array is less than num, push nils to fill.
|
|
|
|
if it is greater than num, excess elements are dropped.
|
|
|
|
unless TOS is an array, push num - 1 nils.
|
|
|
|
if flag is non-zero, push an array of the remaining elements.
|
|
|
|
flag: 0x01 - rest args array
|
|
|
|
flag: 0x02 - for postarg
|
|
|
|
flag: 0x04 - reverse?
|
2007-01-16 03:52:22 -05:00
|
|
|
*/
|
|
|
|
DEFINE_INSN
|
|
|
|
expandarray
|
2007-05-03 05:09:14 -04:00
|
|
|
(rb_num_t num, rb_num_t flag)
|
2007-01-16 03:52:22 -05:00
|
|
|
(..., VALUE ary)
|
2018-01-12 03:38:07 -05:00
|
|
|
(...)
|
|
|
|
// attr rb_snum_t sp_inc = num - 1 + (flag & 1 ? 1 : 0);
|
2007-01-16 03:52:22 -05:00
|
|
|
{
|
2009-06-30 03:46:44 -04:00
|
|
|
vm_expandarray(GET_CFP(), ary, num, (int)flag);
|
2007-01-16 03:52:22 -05:00
|
|
|
}
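The `sp_inc` attribute expressions seen so far can be evaluated as ordinary C. The following sketch restates two of them as functions so the arithmetic can be checked by hand. The function names are hypothetical and `unsigned long` merely stands in for `rb_num_t`. The expressions themselves are copied from the `// attr` lines of `concatstrings`/`newarray` and `expandarray`.

```c
#include <assert.h>
#include <limits.h>

/* `sp_inc = 1 - num`: num values are consumed, one result is pushed. */
static long toy_sp_inc_collapse(unsigned long num)
{
    assert(num <= (unsigned long)LONG_MAX);
    return 1 - (long)num;
}

/* `sp_inc = num - 1 + (flag & 1 ? 1 : 0)`: one array is consumed,
   num values are pushed, plus one more when the rest-array bit is set. */
static long toy_sp_inc_expandarray(unsigned long num, unsigned long flag)
{
    assert(num <= (unsigned long)LONG_MAX);
    return (long)num - 1 + ((flag & 1) ? 1 : 0);
}
```

For example, collapsing three values yields a net change of -2, and expanding into two values with the rest-array bit set yields +2.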
|
|
|
|
|
2018-01-12 03:38:07 -05:00
|
|
|
/* concat two arrays */
|
2007-01-16 03:52:22 -05:00
|
|
|
DEFINE_INSN
|
|
|
|
concatarray
|
|
|
|
()
|
2017-04-18 06:58:49 -04:00
|
|
|
(VALUE ary1, VALUE ary2)
|
2007-01-16 03:52:22 -05:00
|
|
|
(VALUE ary)
|
|
|
|
{
|
2017-04-18 06:58:49 -04:00
|
|
|
ary = vm_concat_array(ary1, ary2);
|
2007-01-16 03:52:22 -05:00
|
|
|
}
|
|
|
|
|
2018-01-12 03:38:07 -05:00
|
|
|
/* call to_a on array ary to splat */
|
2007-01-16 03:52:22 -05:00
|
|
|
DEFINE_INSN
|
|
|
|
splatarray
|
|
|
|
(VALUE flag)
|
|
|
|
(VALUE ary)
|
|
|
|
(VALUE obj)
|
|
|
|
{
|
2017-04-18 06:58:49 -04:00
|
|
|
obj = vm_splat_array(flag, ary);
|
2007-01-16 03:52:22 -05:00
|
|
|
}
|
|
|
|
|
2018-01-12 03:38:07 -05:00
|
|
|
/* put new Hash from n elements. n must be an even number. */
|
2007-01-16 03:52:22 -05:00
|
|
|
DEFINE_INSN
|
|
|
|
newhash
|
2007-05-03 05:09:14 -04:00
|
|
|
(rb_num_t num)
|
2007-01-16 03:52:22 -05:00
|
|
|
(...)
|
2018-01-12 03:38:07 -05:00
|
|
|
(VALUE val)
|
|
|
|
// attr rb_snum_t sp_inc = 1 - num;
|
2007-01-16 03:52:22 -05:00
|
|
|
{
|
2015-10-29 01:32:57 -04:00
|
|
|
RUBY_DTRACE_CREATE_HOOK(HASH, num);
|
* probes.d: add DTrace probe declarations. [ruby-core:27448]
* array.c (empty_ary_alloc, ary_new): added array create DTrace probe.
* compile.c (rb_insns_name): allowing DTrace probes to access
instruction sequence name.
* Makefile.in: translate probes.d file to appropriate header file.
* common.mk: declare dependencies on the DTrace header.
* configure.in: add a test for existence of DTrace.
* eval.c (setup_exception): add a probe for when an exception is
raised.
* gc.c: Add DTrace probes for mark begin and end, and sweep begin and
end.
* hash.c (empty_hash_alloc): Add a probe for hash allocation.
* insns.def: Add probes for function entry and return.
* internal.h: function declaration for compile.c change.
* load.c (rb_f_load): add probes for `load` entry and exit, require
entry and exit, and wrapping search_required for load path search.
* object.c (rb_obj_alloc): added a probe for general object creation.
* parse.y (yycompile0): added a probe around parse and compile phase.
* string.c (empty_str_alloc, str_new): DTrace probes for string
allocation.
* test/dtrace/*: tests for DTrace probes.
* vm.c (vm_invoke_proc): add probes for function return on exception
raise, hash create, and instruction sequence execution.
* vm_core.h: add probe declarations for function entry and exit.
* vm_dump.c: add probes header file.
* vm_eval.c (vm_call0_cfunc, vm_call0_cfunc_with_frame): add probe on
function entry and return.
* vm_exec.c: expose instruction number to instruction name function.
* vm_insnshelper.c: add function entry and exit probes for cfunc
methods.
* vm_insnhelper.h: vm usage information is always collected, so
uncomment the functions.
12 19:14:50 2012 Akinori MUSHA <knu@iDaemons.org>
* configure.in (isinf, isnan): isinf() and isnan() are macros on
DragonFly which cannot be found by AC_REPLACE_FUNCS(). This
workaround enforces the fact that they exist on DragonFly.
12 15:59:38 2012 Shugo Maeda <shugo@ruby-lang.org>
* vm_core.h (rb_call_info_t::refinements), compile.c (new_callinfo),
vm_insnhelper.c (vm_search_method): revert r37616 because it's too
slow. [ruby-dev:46477]
* test/ruby/test_refinement.rb (test_inline_method_cache): skip
the test until the bug is fixed efficiently.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@37631 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2012-11-12 16:52:12 -05:00
|
|
|
|
2017-09-05 00:48:19 -04:00
|
|
|
val = rb_hash_new_with_size(num / 2);
|
2017-04-23 21:40:51 -04:00
|
|
|
|
2017-04-27 00:21:04 -04:00
|
|
|
if (num) {
|
|
|
|
rb_hash_bulk_insert(num, STACK_ADDR_FROM_TOP(num), val);
|
2007-01-16 03:52:22 -05:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2018-01-12 03:38:07 -05:00
|
|
|
/* put new Range object. (Range.new(low, high, flag)) */
|
2007-01-16 03:52:22 -05:00
|
|
|
DEFINE_INSN
|
|
|
|
newrange
|
2007-05-03 05:09:14 -04:00
|
|
|
(rb_num_t flag)
|
2007-01-16 03:52:22 -05:00
|
|
|
(VALUE low, VALUE high)
|
|
|
|
(VALUE val)
|
|
|
|
{
|
2009-06-30 03:46:44 -04:00
|
|
|
val = rb_range_new(low, high, (int)flag);
|
2007-01-16 03:52:22 -05:00
|
|
|
}
|
|
|
|
|
|
|
|
/**********************************************************/
|
|
|
|
/* deal with stack operation */
|
|
|
|
/**********************************************************/
|
|
|
|
|
2018-01-12 03:38:07 -05:00
|
|
|
/* pop from stack. */
|
2007-01-16 03:52:22 -05:00
|
|
|
DEFINE_INSN
|
|
|
|
pop
|
|
|
|
()
|
|
|
|
(VALUE val)
|
|
|
|
()
|
|
|
|
{
|
2011-11-27 03:24:19 -05:00
|
|
|
(void)val;
|
2007-01-16 03:52:22 -05:00
|
|
|
/* none */
|
|
|
|
}
|
|
|
|
|
2018-01-12 03:38:07 -05:00
|
|
|
/* duplicate stack top. */
|
2007-01-16 03:52:22 -05:00
|
|
|
DEFINE_INSN
|
|
|
|
dup
|
|
|
|
()
|
|
|
|
(VALUE val)
|
|
|
|
(VALUE val1, VALUE val2)
|
|
|
|
{
|
|
|
|
val1 = val2 = val;
|
|
|
|
}
|
|
|
|
|
2018-01-12 03:38:07 -05:00
|
|
|
/* duplicate stack top n elements */
|
2007-01-16 03:52:22 -05:00
|
|
|
DEFINE_INSN
|
|
|
|
dupn
|
2007-05-03 05:09:14 -04:00
|
|
|
(rb_num_t n)
|
2007-01-16 03:52:22 -05:00
|
|
|
(...)
|
2018-01-12 03:38:07 -05:00
|
|
|
(...)
|
|
|
|
// attr rb_snum_t sp_inc = n;
|
2007-01-16 03:52:22 -05:00
|
|
|
{
|
2017-04-18 06:58:49 -04:00
|
|
|
void *dst = GET_SP();
|
|
|
|
void *src = STACK_ADDR_FROM_TOP(n);
|
|
|
|
|
|
|
|
MEMCPY(dst, src, VALUE, n);
|
2007-01-16 03:52:22 -05:00
|
|
|
}
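The copy performed by dupn can be sketched outside the VM as follows. This is a minimal model, not the VM's code: `toy_stack`, `value_t`, and `toy_dupn` are hypothetical names, the stack is a fixed array, and `sp` is assumed to index the next free slot, so `&slots[sp]` plays the role of GET_SP() and `&slots[sp - n]` the role of STACK_ADDR_FROM_TOP(n).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef unsigned long value_t;      /* hypothetical stand-in for VALUE */

typedef struct {
    value_t slots[16];
    size_t sp;                      /* index of the next free slot */
} toy_stack;

/* Copy the top n entries and push the copies, preserving their order. */
static void toy_dupn(toy_stack *s, size_t n)
{
    assert(n <= s->sp && s->sp + n <= 16);
    value_t *dst = &s->slots[s->sp];        /* role of GET_SP() */
    value_t *src = &s->slots[s->sp - n];    /* role of STACK_ADDR_FROM_TOP(n) */
    memcpy(dst, src, n * sizeof(value_t));
    s->sp += n;                             /* the growth declared by sp_inc = n */
}
```

Source and destination never overlap in this model because the copies land strictly above the current top.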
|
|
|
|
|
2018-01-12 03:38:07 -05:00
|
|
|
/* swap top 2 vals */
|
2007-01-16 03:52:22 -05:00
|
|
|
DEFINE_INSN
|
|
|
|
swap
|
|
|
|
()
|
|
|
|
(VALUE val, VALUE obj)
|
|
|
|
(VALUE obj, VALUE val)
|
|
|
|
{
|
|
|
|
/* none */
|
|
|
|
}
|
|
|
|
|
2018-01-12 03:38:07 -05:00
|
|
|
/* reverse the order of the top N stack entries. */
|
2015-02-24 19:20:39 -05:00
|
|
|
DEFINE_INSN
|
|
|
|
reverse
|
|
|
|
(rb_num_t n)
|
|
|
|
(...)
|
2018-01-12 03:38:07 -05:00
|
|
|
(...)
|
|
|
|
// attr rb_snum_t sp_inc = 0;
|
2015-02-24 19:20:39 -05:00
|
|
|
{
|
|
|
|
rb_num_t i;
|
|
|
|
VALUE *sp = STACK_ADDR_FROM_TOP(n);
|
|
|
|
|
|
|
|
for (i=0; i<n/2; i++) {
|
|
|
|
VALUE v0 = sp[i];
|
|
|
|
VALUE v1 = TOPN(i);
|
|
|
|
sp[i] = v1;
|
|
|
|
TOPN(i) = v0;
|
|
|
|
}
|
|
|
|
}
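The swap loop of reverse can be tried in isolation with a plain array. The sketch below is illustrative only: `value_t` and `toy_reverse` are hypothetical names, and `sp` is assumed to point one past the top entry, so `sp - n` stands in for STACK_ADDR_FROM_TOP(n) and `*(sp - i - 1)` for TOPN(i).

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned long value_t;      /* hypothetical stand-in for VALUE */

/* Reverse the n entries just below sp in place; the stack depth is unchanged. */
static void toy_reverse(value_t *sp, size_t n)
{
    assert(sp != NULL);
    value_t *base = sp - n;                 /* role of STACK_ADDR_FROM_TOP(n) */
    for (size_t i = 0; i < n / 2; i++) {
        value_t v0 = base[i];
        value_t v1 = *(sp - i - 1);         /* role of TOPN(i) */
        base[i] = v1;
        *(sp - i - 1) = v0;
    }
}
```

Only n/2 swaps are needed; with an odd n the middle entry stays where it is.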
|
|
|
|
|
2018-01-12 03:38:07 -05:00
|
|
|
/* for stack caching. */
|
2007-01-16 03:52:22 -05:00
|
|
|
DEFINE_INSN
|
|
|
|
reput
|
|
|
|
()
|
|
|
|
(..., VALUE val)
|
2018-01-12 03:38:07 -05:00
|
|
|
(VALUE val)
|
|
|
|
// attr rb_snum_t sp_inc = 0;
|
2007-01-16 03:52:22 -05:00
|
|
|
{
|
|
|
|
/* none */
|
|
|
|
}
|
|
|
|
|
2018-01-12 03:38:07 -05:00
|
|
|
/* get nth stack value from stack top */
|
2007-01-16 03:52:22 -05:00
|
|
|
DEFINE_INSN
|
|
|
|
topn
|
2007-05-03 05:09:14 -04:00
|
|
|
(rb_num_t n)
|
2007-01-16 03:52:22 -05:00
|
|
|
(...)
|
2018-01-12 03:38:07 -05:00
|
|
|
(VALUE val)
|
|
|
|
// attr rb_snum_t sp_inc = 1;
|
2007-01-16 03:52:22 -05:00
|
|
|
{
|
|
|
|
val = TOPN(n);
|
|
|
|
}
|
|
|
|
|
2018-01-12 03:38:07 -05:00
|
|
|
/* set Nth stack entry to stack top */
|
2007-01-16 03:52:22 -05:00
|
|
|
DEFINE_INSN
|
|
|
|
setn
|
2007-05-03 05:09:14 -04:00
|
|
|
(rb_num_t n)
|
2007-01-16 03:52:22 -05:00
|
|
|
(..., VALUE val)
|
2018-01-12 03:38:07 -05:00
|
|
|
(VALUE val)
|
|
|
|
// attr rb_snum_t sp_inc = 0;
|
2007-01-16 03:52:22 -05:00
|
|
|
{
|
2018-01-29 01:56:56 -05:00
|
|
|
TOPN(n) = val;
|
2007-01-16 03:52:22 -05:00
|
|
|
}
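The indexing used by topn and setn can be sketched on a toy stack. The names `toy_stack`, `toy_topn`, and `toy_setn` are hypothetical, and the model assumes that index 0 from the top means the top entry itself, with `sp` indexing the next free slot.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned long value_t;      /* hypothetical stand-in for VALUE */

typedef struct {
    value_t slots[16];
    size_t sp;                      /* index of the next free slot */
} toy_stack;

/* topn: push a copy of the entry n slots below the top (sp_inc = 1). */
static void toy_topn(toy_stack *s, size_t n)
{
    assert(n < s->sp && s->sp < 16);
    value_t val = s->slots[s->sp - n - 1];  /* role of TOPN(n) */
    s->slots[s->sp++] = val;
}

/* setn: overwrite the entry n slots below the top with the top (sp_inc = 0). */
static void toy_setn(toy_stack *s, size_t n)
{
    assert(n < s->sp);
    s->slots[s->sp - n - 1] = s->slots[s->sp - 1];
}
```

In this model setn leaves the top value in place, which matches an instruction that takes `val` and returns it again.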
|
|
|
|
|
2018-01-12 03:38:07 -05:00
|
|
|
/* empty current stack */
|
2007-01-16 03:52:22 -05:00
|
|
|
DEFINE_INSN
|
2008-01-25 13:02:01 -05:00
|
|
|
adjuststack
|
|
|
|
(rb_num_t n)
|
2007-01-16 03:52:22 -05:00
|
|
|
(...)
|
2018-01-12 03:38:07 -05:00
|
|
|
(...)
|
2018-01-12 08:25:03 -05:00
|
|
|
// attr rb_snum_t sp_inc = -(rb_snum_t)n;
|
2007-01-16 03:52:22 -05:00
|
|
|
{
|
2018-01-29 01:56:56 -05:00
|
|
|
/* none */
|
2007-01-16 03:52:22 -05:00
|
|
|
}
|
|
|
|
|
|
|
|
/**********************************************************/
|
|
|
|
/* deal with setting */
|
|
|
|
/**********************************************************/
|
|
|
|
|
2018-01-12 03:38:07 -05:00
|
|
|
/* defined? */
|
2007-01-16 03:52:22 -05:00
|
|
|
DEFINE_INSN
|
|
|
|
defined
|
2010-10-30 21:42:54 -04:00
|
|
|
(rb_num_t op_type, VALUE obj, VALUE needstr)
|
2007-01-16 03:52:22 -05:00
|
|
|
(VALUE v)
|
|
|
|
(VALUE val)
|
|
|
|
{
|
2017-10-27 02:21:50 -04:00
|
|
|
val = vm_defined(ec, GET_CFP(), op_type, obj, needstr, v);
|
2007-01-16 03:52:22 -05:00
|
|
|
}
|
|
|
|
|
2018-01-12 03:38:07 -05:00
|
|
|
/* check `target' matches `pattern'.
|
2012-08-08 03:52:19 -04:00
|
|
|
`flag & VM_CHECKMATCH_TYPE_MASK' describe how to check pattern.
|
|
|
|
VM_CHECKMATCH_TYPE_WHEN: ignore target and check pattern is truthy.
|
|
|
|
VM_CHECKMATCH_TYPE_CASE: check `pattern === target'.
|
|
|
|
VM_CHECKMATCH_TYPE_RESCUE: check `pattern.kind_of?(Module) && pattern == target'.
|
|
|
|
if `flag & VM_CHECKMATCH_ARRAY' is not 0, then `pattern' is array of patterns.
|
|
|
|
*/
|
|
|
|
DEFINE_INSN
|
|
|
|
checkmatch
|
|
|
|
(rb_num_t flag)
|
|
|
|
(VALUE target, VALUE pattern)
|
|
|
|
(VALUE result)
|
|
|
|
{
|
2017-11-16 01:10:31 -05:00
|
|
|
result = vm_check_match(ec, target, pattern, flag);
|
2012-08-08 03:52:19 -04:00
|
|
|
}
|
|
|
|
|
2018-01-12 03:38:07 -05:00
|
|
|
/* check whether a keyword is specified or not. */
|
* rewrite method/block parameter fitting logic to optimize
keyword arguments/parameters and a splat argument.
[Feature #10440] (Details are described in this ticket)
Most of complex part is moved to vm_args.c.
Now, ISeq#to_a does not catch up new instruction format.
* vm_core.h: change iseq data structures.
* introduce rb_call_info_kw_arg_t to represent keyword arguments.
* add rb_call_info_t::kw_arg.
* rename rb_iseq_t::arg_post_len to rb_iseq_t::arg_post_num.
* rename rb_iseq_t::arg_keywords to arg_keyword_num.
* rename rb_iseq_t::arg_keyword to rb_iseq_t::arg_keyword_bits.
to represent keyword bitmap parameter index.
This bitmap parameter shows which keyword parameters are given
or not given (0 for given).
It is referred to by the `checkkeyword' instruction described below.
* rename rb_iseq_t::arg_keyword_check to rb_iseq_t::arg_keyword_rest
to represent keyword rest parameter index.
* add rb_iseq_t::arg_keyword_default_values to represent default
keyword values.
* rename VM_CALL_ARGS_SKIP_SETUP to VM_CALL_ARGS_SIMPLE
to represent
(ci->flag & (SPLAT|BLOCKARG)) &&
ci->blockiseq == NULL &&
ci->kw_arg == NULL.
* vm_insnhelper.c, vm_args.c: rewrite with refactoring.
* rewrite splat argument code.
* rewrite keyword arguments/parameters code.
* merge method and block parameter fitting code into one code base.
* vm.c, vm_eval.c: catch up these changes.
* compile.c (new_callinfo): callinfo requires kw_arg parameter.
* compile.c (compile_array_): check the last argument Hash object or
not. If Hash object and all keys are Symbol literals, they are
compiled to keyword arguments.
* insns.def (checkkeyword): add new instruction.
This instruction checks the availability of the corresponding keyword.
For example, a method "def foo k1: 'v1'; end" is compiled to the
following instructions.
0000 checkkeyword 2, 0 # check k1 is given.
0003 branchif 9 # if given, jump to address #9
0005 putstring "v1"
0007 setlocal_OP__WC__0 3 # k1 = 'v1'
0009 trace 8
0011 putnil
0012 trace 16
0014 leave
* insns.def (opt_send_simple): removed and add new instruction
"opt_send_without_block".
* parse.y (new_args_tail_gen): reorder variables.
Before this patch, a method "def foo(k1: 1, kr1:, k2: 2, **krest, &b)"
has parameter variables "k1, kr1, k2, &b, internal_id, krest",
but this patch reorders to "kr1, k1, k2, internal_id, krest, &b".
(locate a block variable at last)
* parse.y (vtable_pop): added.
This function remove latest `n' variables from vtable.
* iseq.c: catch up iseq data changes.
* proc.c: ditto.
* class.c (keyword_error): export as rb_keyword_error().
* common.mk: depend vm_args.c for vm.o.
* hash.c (rb_hash_has_key): export.
* internal.h: ditto.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@48239 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2014-11-02 13:02:55 -05:00
|
|
|
DEFINE_INSN
|
|
|
|
checkkeyword
|
2017-12-22 19:51:36 -05:00
|
|
|
(lindex_t kw_bits_index, lindex_t keyword_index)
|
2014-11-02 13:02:55 -05:00
|
|
|
()
|
|
|
|
(VALUE ret)
|
|
|
|
{
|
2017-04-18 06:58:49 -04:00
|
|
|
ret = vm_check_keyword(kw_bits_index, keyword_index, GET_EP());
|
2014-11-02 13:02:55 -05:00
|
|
|
}
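The keyword bitmap convention (a 0 bit means the keyword was given) can be illustrated with a small predicate. This is a simplified sketch and not the body of vm_check_keyword: `toy_keyword_given` is a hypothetical name, and the bitmap is assumed to fit in a single unsigned int that has already been read from its local variable slot.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Return true when the keyword at index idx was passed by the caller.
 * Assumed convention: a set bit marks a keyword that was NOT given. */
static bool toy_keyword_given(unsigned int kw_bits, unsigned int idx)
{
    assert(idx < sizeof(kw_bits) * CHAR_BIT);
    return (kw_bits & (1u << idx)) == 0;
}
```

A true result corresponds to the case where a following branchif skips the code that assigns the default value.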
|
|
|
|
|
2018-01-12 03:38:07 -05:00
|
|
|
/* fire a coverage event (currently, this is used for line coverage and branch coverage) */
|
2017-09-13 21:55:30 -04:00
|
|
|
DEFINE_INSN
|
2017-12-19 23:24:14 -05:00
|
|
|
tracecoverage
|
2017-09-13 21:55:30 -04:00
|
|
|
(rb_num_t nf, VALUE data)
|
|
|
|
()
|
|
|
|
()
|
|
|
|
{
|
|
|
|
rb_event_flag_t flag = (rb_event_flag_t)nf;
|
|
|
|
|
2017-11-07 03:19:25 -05:00
|
|
|
vm_dtrace(flag, ec);
|
2017-10-29 09:19:14 -04:00
|
|
|
EXEC_EVENT_HOOK(ec, flag, GET_SELF(), 0, 0, 0 /* id and klass are resolved at callee */, data);
|
2017-09-13 21:55:30 -04:00
|
|
|
}
|
|
|
|
|
2007-01-16 03:52:22 -05:00
|
|
|
/**********************************************************/
|
|
|
|
/* deal with control flow 1: class/module */
|
|
|
|
/**********************************************************/
|
|
|
|
|
2018-01-12 03:38:07 -05:00
|
|
|
/* enter class definition scope. if super is Qfalse, and class
|
2007-01-16 03:52:22 -05:00
|
|
|
"klass" is defined, it is redefined. Otherwise, define the "klass" class.
|
|
|
|
*/
|
|
|
|
DEFINE_INSN
|
|
|
|
defineclass
|
2012-12-20 03:13:53 -05:00
|
|
|
(ID id, ISEQ class_iseq, rb_num_t flags)
|
2007-01-16 03:52:22 -05:00
|
|
|
(VALUE cbase, VALUE super)
|
|
|
|
(VALUE val)
|
2018-01-26 01:30:58 -05:00
|
|
|
// attr bool handles_frame = true;
|
2007-01-16 03:52:22 -05:00
|
|
|
{
|
2017-04-18 06:58:49 -04:00
|
|
|
VALUE klass = vm_find_or_create_class_by_id(id, flags, cbase, super);
|
2007-08-12 15:09:15 -04:00
|
|
|
|
2015-12-08 08:58:50 -05:00
|
|
|
rb_iseq_check(class_iseq);
|
|
|
|
|
2007-01-16 03:52:22 -05:00
|
|
|
/* enter scope */
|
2017-10-27 02:21:50 -04:00
|
|
|
vm_push_frame(ec, class_iseq, VM_FRAME_MAGIC_CLASS | VM_ENV_FLAG_LOCAL, klass,
|
2016-07-28 07:02:30 -04:00
|
|
|
GET_BLOCK_HANDLER(),
|
2017-10-27 02:21:50 -04:00
|
|
|
(VALUE)vm_cref_push(ec, klass, NULL, FALSE),
|
2015-07-21 18:52:59 -04:00
|
|
|
class_iseq->body->iseq_encoded, GET_SP(),
|
2016-07-28 07:02:30 -04:00
|
|
|
class_iseq->body->local_table_size,
|
2015-12-08 08:58:50 -05:00
|
|
|
class_iseq->body->stack_max);
|
mjit_compile.c: merge initial JIT compiler
which has been developed by Takashi Kokubun <takashikkbn@gmail> as
YARV-MJIT. Many of its bugs are fixed by wanabe <s.wanabe@gmail.com>.
This JIT compiler is designed to be a safe migration path to introduce
JIT compiler to MRI. So this commit does not include any bytecode
changes or dynamic instruction modifications, which are done in original
MJIT.
This commit even strips off some aggressive optimizations from
YARV-MJIT, and thus it's slower than YARV-MJIT too. But it's still
fairly faster than Ruby 2.5 in some benchmarks (attached below).
Note that this JIT compiler passes `make test`, `make test-all`, `make
test-spec` without JIT, and even with JIT. Not only it's perfectly safe
with JIT disabled because it does not replace VM instructions unlike
MJIT, but also with JIT enabled it stably runs Ruby applications
including Rails applications.
I'm expecting this version as just "initial" JIT compiler. I have many
optimization ideas which are skipped for initial merging, and you may
easily replace this JIT compiler with a faster one by just replacing
mjit_compile.c. `mjit_compile` interface is designed for the purpose.
common.mk: update dependencies for mjit_compile.c.
internal.h: declare `rb_vm_insn_addr2insn` for MJIT.
vm.c: exclude some definitions if `-DMJIT_HEADER` is provided to
compiler. This avoids to include some functions which take a long time
to compile, e.g. vm_exec_core. Some of the purpose is achieved in
transform_mjit_header.rb (see `IGNORED_FUNCTIONS`) but others are
manually resolved for now. Load mjit_helper.h for MJIT header.
mjit_helper.h: New. This is a file used only by JIT-ed code. I'll
refactor `mjit_call_cfunc` later.
vm_eval.c: add some #ifdef switches to skip compiling some functions
like Init_vm_eval.
win32/mkexports.rb: export thread/ec functions, which are used by MJIT.
include/ruby/defines.h: add MJIT_FUNC_EXPORTED macro alias to clarify
that a function is exported only for MJIT.
array.c: export a function used by MJIT.
bignum.c: ditto.
class.c: ditto.
compile.c: ditto.
error.c: ditto.
gc.c: ditto.
hash.c: ditto.
iseq.c: ditto.
numeric.c: ditto.
object.c: ditto.
proc.c: ditto.
re.c: ditto.
st.c: ditto.
string.c: ditto.
thread.c: ditto.
variable.c: ditto.
vm_backtrace.c: ditto.
vm_insnhelper.c: ditto.
vm_method.c: ditto.
I would like to improve maintainability of function exports, but I
believe this way is acceptable as initial merging if we clarify the
new exports are for MJIT (so that we can use them as TODO list to fix)
and add unit tests to detect unresolved symbols.
I'll add unit tests of JIT compilations in succeeding commits.
Author: Takashi Kokubun <takashikkbn@gmail.com>
Contributor: wanabe <s.wanabe@gmail.com>
Part of [Feature #14235]
---
* Known issues
* Code generated by gcc is faster than clang. The benchmark may be worse
in macOS. Following benchmark result is provided by gcc w/ Linux.
* Performance is decreased when Google Chrome is running
* JIT can work on MinGW, but it doesn't improve performance at least
in short running benchmark.
* Currently it doesn't perform well with Rails. We'll try to fix this
before release.
---
* Benchmark results
Benchmarked with:
Intel 4.0GHz i7-4790K with 16GB memory under x86-64 Ubuntu 8 Cores
- 2.0.0-p0: Ruby 2.0.0-p0
- r62186: Ruby trunk (early 2.6.0), before MJIT changes
- JIT off: On this commit, but without `--jit` option
- JIT on: On this commit, and with `--jit` option
** Optcarrot fps
Benchmark: https://github.com/mame/optcarrot
| |2.0.0-p0 |r62186 |JIT off |JIT on |
|:--------|:--------|:--------|:--------|:--------|
|fps |37.32 |51.46 |51.31 |58.88 |
|vs 2.0.0 |1.00x |1.38x |1.37x |1.58x |
** MJIT benchmarks
Benchmark: https://github.com/benchmark-driver/mjit-benchmarks
(Original: https://github.com/vnmakarov/ruby/tree/rtl_mjit_branch/MJIT-benchmarks)
| |2.0.0-p0 |r62186 |JIT off |JIT on |
|:----------|:--------|:--------|:--------|:--------|
|aread |1.00 |1.09 |1.07 |2.19 |
|aref |1.00 |1.13 |1.11 |2.22 |
|aset |1.00 |1.50 |1.45 |2.64 |
|awrite |1.00 |1.17 |1.13 |2.20 |
|call |1.00 |1.29 |1.26 |2.02 |
|const2 |1.00 |1.10 |1.10 |2.19 |
|const |1.00 |1.11 |1.10 |2.19 |
|fannk |1.00 |1.04 |1.02 |1.00 |
|fib |1.00 |1.32 |1.31 |1.84 |
|ivread |1.00 |1.13 |1.12 |2.43 |
|ivwrite |1.00 |1.23 |1.21 |2.40 |
|mandelbrot |1.00 |1.13 |1.16 |1.28 |
|meteor |1.00 |2.97 |2.92 |3.17 |
|nbody |1.00 |1.17 |1.15 |1.49 |
|nest-ntimes|1.00 |1.22 |1.20 |1.39 |
|nest-while |1.00 |1.10 |1.10 |1.37 |
|norm |1.00 |1.18 |1.16 |1.24 |
|nsvb |1.00 |1.16 |1.16 |1.17 |
|red-black |1.00 |1.02 |0.99 |1.12 |
|sieve |1.00 |1.30 |1.28 |1.62 |
|trees |1.00 |1.14 |1.13 |1.19 |
|while |1.00 |1.12 |1.11 |2.41 |
** Discourse's script/bench.rb
Benchmark: https://github.com/discourse/discourse/blob/v1.8.7/script/bench.rb
NOTE: Rails performance was somehow a little degraded with JIT for now.
We should fix this.
(At least I know opt_aref is performing badly in JIT and I have an idea
to fix it. Please wait for the fix.)
*** JIT off
Your Results: (note for timings- percentile is first, duration is second in millisecs)
categories_admin:
50: 17
75: 18
90: 22
99: 29
home_admin:
50: 21
75: 21
90: 27
99: 40
topic_admin:
50: 17
75: 18
90: 22
99: 32
categories:
50: 35
75: 41
90: 43
99: 77
home:
50: 39
75: 46
90: 49
99: 95
topic:
50: 46
75: 52
90: 56
99: 101
*** JIT on
Your Results: (note for timings- percentile is first, duration is second in millisecs)
categories_admin:
50: 19
75: 21
90: 25
99: 33
home_admin:
50: 24
75: 26
90: 30
99: 35
topic_admin:
50: 19
75: 20
90: 25
99: 30
categories:
50: 40
75: 44
90: 48
99: 76
home:
50: 42
75: 48
90: 51
99: 89
topic:
50: 49
75: 55
90: 58
99: 99
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@62197 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-02-04 06:22:28 -05:00
|
|
|
EXEC_EC_CFP();
|
2007-01-16 03:52:22 -05:00
|
|
|
}
|
|
|
|
|
|
|
|
/**********************************************************/
|
|
|
|
/* deal with control flow 2: method/iterator */
|
|
|
|
/**********************************************************/
|
|
|
|
|
2018-01-12 03:38:07 -05:00
|
|
|
/* invoke method. */
|
2007-01-16 03:52:22 -05:00
|
|
|
DEFINE_INSN
|
|
|
|
send
|
2015-09-19 13:59:58 -04:00
|
|
|
(CALL_INFO ci, CALL_CACHE cc, ISEQ blockiseq)
|
2007-01-16 03:52:22 -05:00
|
|
|
(...)
|
2018-01-12 03:38:07 -05:00
|
|
|
(VALUE val)
|
2018-01-29 02:04:50 -05:00
|
|
|
// attr bool handles_frame = true;
|
2018-01-12 03:38:07 -05:00
|
|
|
// attr rb_snum_t sp_inc = - (int)(ci->orig_argc + ((ci->flag & VM_CALL_ARGS_BLOCKARG) ? 1 : 0));
|
2007-01-16 03:52:22 -05:00
|
|
|
{
|
2015-09-19 13:59:58 -04:00
|
|
|
struct rb_calling_info calling;
|
2015-10-01 06:50:49 -04:00
|
|
|
|
2017-10-27 02:21:50 -04:00
|
|
|
vm_caller_setup_arg_block(ec, reg_cfp, &calling, ci, blockiseq, FALSE);
|
2015-09-19 13:59:58 -04:00
|
|
|
vm_search_method(ci, cc, calling.recv = TOPN(calling.argc = ci->orig_argc));
|
|
|
|
CALL_METHOD(&calling, ci, cc);
|
2007-01-16 03:52:22 -05:00
|
|
|
}
|
|
|
|
|
2013-11-09 16:17:06 -05:00
|
|
|
DEFINE_INSN
|
|
|
|
opt_str_freeze
|
|
|
|
(VALUE str)
|
|
|
|
()
|
|
|
|
(VALUE val)
|
|
|
|
{
|
|
|
|
if (BASIC_OP_UNREDEFINED_P(BOP_FREEZE, STRING_REDEFINED_OP_FLAG)) {
|
|
|
|
val = str;
|
|
|
|
}
|
|
|
|
else {
|
|
|
|
val = rb_funcall(rb_str_resurrect(str), idFreeze, 0);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2017-03-27 02:12:37 -04:00
|
|
|
DEFINE_INSN
|
|
|
|
opt_str_uminus
|
|
|
|
(VALUE str)
|
|
|
|
()
|
|
|
|
(VALUE val)
|
|
|
|
{
|
|
|
|
if (BASIC_OP_UNREDEFINED_P(BOP_UMINUS, STRING_REDEFINED_OP_FLAG)) {
|
|
|
|
val = str;
|
|
|
|
}
|
|
|
|
else {
|
|
|
|
val = rb_funcall(rb_str_resurrect(str), idUMinus, 0);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2016-03-17 08:47:31 -04:00
|
|
|
DEFINE_INSN
|
|
|
|
opt_newarray_max
|
|
|
|
(rb_num_t num)
|
|
|
|
(...)
|
2018-01-12 03:38:07 -05:00
|
|
|
(VALUE val)
|
|
|
|
// attr rb_snum_t sp_inc = 1 - num;
|
2016-03-17 08:47:31 -04:00
|
|
|
{
|
split insns.def into functions
Contemporary C compilers are good at function inlining. They fold
multiple functions into one. However they are not yet smart enough to
unfold a function into several ones. So generally speaking, it is
wiser for a C programmer to manually split C functions whenever
possible. That should make room for compilers to optimize at will.
Before this changeset insns.def was converted into a single HUGE
function called vm_exec_core(). By moving each instruction's core
into individual functions, generated C source code is reduced from
3,428 lines to 2,847 lines. Looking at the generated assembly
however, it seems my compiler (gcc 6.2) is extraordinarily smart so that
it inlines almost all functions I introduced in this changeset back
into that vm_exec_core. On my machine the compiled machine binary of the
function does not shrink very much in size (28,432 bytes to 26,816
bytes, according to nm(1)).
I believe this change is zero-cost. Several benchmarks I exercised
showed no significant difference beyond error margin. For instance
3 repeated runs of optcarrot benchmark on my machine resulted in:
before this: 28.330329285707490, 27.513378371065920, 29.40420215754537
after this: 27.107195867280414, 25.549324021385907, 30.31581919050884
in fps (greater==faster).
----
* internal.h (rb_obj_not_equal): used from vm_insnhelper.c
* insns.def: move vast majority of lines into vm_insnhelper.c
* vm_insnhelper.c: moved here.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58390 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-04-18 06:58:49 -04:00
|
|
|
val = vm_opt_newarray_max(num, STACK_ADDR_FROM_TOP(num));
|
2016-03-17 08:47:31 -04:00
|
|
|
}
|
|
|
|
|
|
|
|
DEFINE_INSN
|
|
|
|
opt_newarray_min
|
|
|
|
(rb_num_t num)
|
|
|
|
(...)
|
2018-01-12 03:38:07 -05:00
|
|
|
(VALUE val)
|
|
|
|
// attr rb_snum_t sp_inc = 1 - num;
|
2016-03-17 08:47:31 -04:00
|
|
|
{
|
2017-04-18 06:58:49 -04:00
|
|
|
val = vm_opt_newarray_min(num, STACK_ADDR_FROM_TOP(num));
|
2016-03-17 08:47:31 -04:00
|
|
|
}
|
|
|
|
|
2018-01-12 03:38:07 -05:00
|
|
|
/* Invoke method without block */
|
2012-10-18 05:44:19 -04:00
|
|
|
DEFINE_INSN
|
* rewrite method/block parameter fitting logic to optimize
keyword arguments/parameters and a splat argument.
[Feature #10440] (Details are described in this ticket)
Most of the complex part is moved to vm_args.c.
For now, ISeq#to_a has not caught up with the new instruction format.
* vm_core.h: change iseq data structures.
* introduce rb_call_info_kw_arg_t to represent keyword arguments.
* add rb_call_info_t::kw_arg.
* rename rb_iseq_t::arg_post_len to rb_iseq_t::arg_post_num.
* rename rb_iseq_t::arg_keywords to arg_keyword_num.
* rename rb_iseq_t::arg_keyword to rb_iseq_t::arg_keyword_bits.
to represent keyword bitmap parameter index.
This bitmap parameter shows which keyword parameters are given
or not given (0 for given).
It is referred to by the `checkkeyword' instruction described below.
* rename rb_iseq_t::arg_keyword_check to rb_iseq_t::arg_keyword_rest
to represent keyword rest parameter index.
* add rb_iseq_t::arg_keyword_default_values to represent default
keyword values.
* rename VM_CALL_ARGS_SKIP_SETUP to VM_CALL_ARGS_SIMPLE
to represent
!(ci->flag & (SPLAT|BLOCKARG)) &&
ci->blockiseq == NULL &&
ci->kw_arg == NULL.
* vm_insnhelper.c, vm_args.c: rewrite with refactoring.
* rewrite splat argument code.
* rewrite keyword arguments/parameters code.
* merge method and block parameter fitting code into one code base.
* vm.c, vm_eval.c: catch up these changes.
* compile.c (new_callinfo): callinfo requires kw_arg parameter.
* compile.c (compile_array_): check the last argument Hash object or
not. If Hash object and all keys are Symbol literals, they are
compiled to keyword arguments.
* insns.def (checkkeyword): add new instruction.
This instruction checks the availability of the corresponding keyword.
For example, a method "def foo k1: 'v1'; end" is compiled to the
following instructions.
0000 checkkeyword 2, 0 # check k1 is given.
0003 branchif 9 # if given, jump to address #9
0005 putstring "v1"
0007 setlocal_OP__WC__0 3 # k1 = 'v1'
0009 trace 8
0011 putnil
0012 trace 16
0014 leave
* insns.def (opt_send_simple): removed and add new instruction
"opt_send_without_block".
* parse.y (new_args_tail_gen): reorder variables.
Before this patch, a method "def foo(k1: 1, kr1:, k2: 2, **krest, &b)"
has parameter variables "k1, kr1, k2, &b, internal_id, krest",
but this patch reorders to "kr1, k1, k2, internal_id, krest, &b".
(locate a block variable at last)
* parse.y (vtable_pop): added.
This function removes the latest `n' variables from vtable.
* iseq.c: catch up iseq data changes.
* proc.c: ditto.
* class.c (keyword_error): export as rb_keyword_error().
* common.mk: depend vm_args.c for vm.o.
* hash.c (rb_hash_has_key): export.
* internal.h: ditto.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@48239 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2014-11-02 13:02:55 -05:00
|
|
|
opt_send_without_block
|
2015-09-19 13:59:58 -04:00
|
|
|
(CALL_INFO ci, CALL_CACHE cc)
|
2012-10-18 05:44:19 -04:00
|
|
|
(...)
|
2018-01-12 03:38:07 -05:00
|
|
|
(VALUE val)
|
2018-01-29 02:04:50 -05:00
|
|
|
// attr bool handles_frame = true;
|
2018-01-12 03:38:07 -05:00
|
|
|
// attr rb_snum_t sp_inc = -ci->orig_argc;
|
2012-10-18 05:44:19 -04:00
|
|
|
{
|
2015-09-19 13:59:58 -04:00
|
|
|
struct rb_calling_info calling;
|
2016-07-28 07:02:30 -04:00
|
|
|
calling.block_handler = VM_BLOCK_HANDLER_NONE;
|
2015-09-19 13:59:58 -04:00
|
|
|
vm_search_method(ci, cc, calling.recv = TOPN(calling.argc = ci->orig_argc));
|
|
|
|
CALL_METHOD(&calling, ci, cc);
|
2012-10-18 05:44:19 -04:00
|
|
|
}
|
|
|
|
|
2018-01-12 03:38:07 -05:00
|
|
|
/* super(args) # args.size => ci->orig_argc */
|
2007-01-16 03:52:22 -05:00
|
|
|
DEFINE_INSN
|
|
|
|
invokesuper
|
2015-09-19 13:59:58 -04:00
|
|
|
(CALL_INFO ci, CALL_CACHE cc, ISEQ blockiseq)
|
2007-01-16 03:52:22 -05:00
|
|
|
(...)
|
2018-01-12 03:38:07 -05:00
|
|
|
(VALUE val)
|
2018-01-29 02:04:50 -05:00
|
|
|
// attr bool handles_frame = true;
|
2018-01-12 03:38:07 -05:00
|
|
|
// attr rb_snum_t sp_inc = - (int)(ci->orig_argc + ((ci->flag & VM_CALL_ARGS_BLOCKARG) ? 1 : 0));
|
2007-01-16 03:52:22 -05:00
|
|
|
{
|
2015-09-19 13:59:58 -04:00
|
|
|
struct rb_calling_info calling;
|
|
|
|
calling.argc = ci->orig_argc;
|
|
|
|
|
2017-10-27 02:21:50 -04:00
|
|
|
vm_caller_setup_arg_block(ec, reg_cfp, &calling, ci, blockiseq, TRUE);
|
2015-09-19 13:59:58 -04:00
|
|
|
calling.recv = GET_SELF();
|
2017-10-27 02:21:50 -04:00
|
|
|
vm_search_super_method(ec, GET_CFP(), &calling, ci, cc);
|
2015-09-19 13:59:58 -04:00
|
|
|
CALL_METHOD(&calling, ci, cc);
|
2007-01-16 03:52:22 -05:00
|
|
|
}
|
|
|
|
|
2018-01-12 03:38:07 -05:00
|
|
|
/* yield(args) */
|
2007-01-16 03:52:22 -05:00
|
|
|
DEFINE_INSN
|
|
|
|
invokeblock
|
* insns.def (send, invokesuper, invokeblock, opt_*), vm_core.h:
use only a `ci' (rb_call_info_t) parameter instead of using
parameters such as `op_id', 'op_argc', `blockiseq' and flag.
This information is stored in rb_call_info_t at compile
time.
This technique simplifies parameter passing at related
function calls (~10% speedup for simple method invocation on
my machine).
`rb_call_info_t' also has a new function pointer variable `call'.
This `call' variable makes it possible to customize the method (block)
invocation process for each call site. However, it always calls
`vm_call_general()' as of this change.
`rb_call_info_t' also has temporary variables for method
(block) invocation.
* vm_core.h, compile.c, insns.def: introduce VM_CALL_ARGS_SKIP_SETUP
VM_CALL macro. This flag indicates that this call can skip
caller_setup (block arg and splat arg).
* compile.c: catch up above changes.
* iseq.c: catch up above changes (especially for TS_CALLINFO).
* tool/instruction.rb: catch up above changes.
* vm_insnhelper.c, vm_insnhelper.h: ditto. Macros and functions
parameters are changed.
* vm_eval.c (vm_call0): ditto (it will be rewritten soon).
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@37180 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2012-10-14 12:59:05 -04:00
|
|
|
(CALL_INFO ci)
|
2007-01-16 03:52:22 -05:00
|
|
|
(...)
|
2018-01-12 03:38:07 -05:00
|
|
|
(VALUE val)
|
2018-01-29 02:04:50 -05:00
|
|
|
// attr bool handles_frame = true;
|
2018-01-12 03:38:07 -05:00
|
|
|
// attr rb_snum_t sp_inc = 1 - ci->orig_argc;
|
2007-01-16 03:52:22 -05:00
|
|
|
{
|
2015-09-19 13:59:58 -04:00
|
|
|
struct rb_calling_info calling;
|
2018-01-05 12:51:10 -05:00
|
|
|
VALUE block_handler;
|
|
|
|
|
2015-09-19 13:59:58 -04:00
|
|
|
calling.argc = ci->orig_argc;
|
2016-07-28 07:02:30 -04:00
|
|
|
calling.block_handler = VM_BLOCK_HANDLER_NONE;
|
2018-01-05 02:25:55 -05:00
|
|
|
calling.recv = Qundef; /* should not be used */
|
2015-09-19 13:59:58 -04:00
|
|
|
|
2018-01-05 12:51:10 -05:00
|
|
|
block_handler = VM_CF_BLOCK_HANDLER(GET_CFP());
|
|
|
|
if (block_handler == VM_BLOCK_HANDLER_NONE) {
|
|
|
|
rb_vm_localjump_error("no block given (yield)", Qnil, 0);
|
|
|
|
}
|
|
|
|
|
|
|
|
val = vm_invoke_block(ec, GET_CFP(), &calling, ci, block_handler);
|
2007-08-06 07:36:30 -04:00
|
|
|
if (val == Qundef) {
|
mjit_compile.c: merge initial JIT compiler
which has been developed by Takashi Kokubun <takashikkbn@gmail.com> as
YARV-MJIT. Many of its bugs are fixed by wanabe <s.wanabe@gmail.com>.
This JIT compiler is designed to be a safe migration path to introduce
JIT compiler to MRI. So this commit does not include any bytecode
changes or dynamic instruction modifications, which are done in original
MJIT.
This commit even strips off some aggressive optimizations from
YARV-MJIT, and thus it's slower than YARV-MJIT too. But it's still
fairly faster than Ruby 2.5 in some benchmarks (attached below).
Note that this JIT compiler passes `make test`, `make test-all`, `make
test-spec` without JIT, and even with JIT. Not only it's perfectly safe
with JIT disabled because it does not replace VM instructions unlike
MJIT, but also with JIT enabled it stably runs Ruby applications
including Rails applications.
I'm expecting this version as just "initial" JIT compiler. I have many
optimization ideas which are skipped for initial merging, and you may
easily replace this JIT compiler with a faster one by just replacing
mjit_compile.c. `mjit_compile` interface is designed for the purpose.
common.mk: update dependencies for mjit_compile.c.
internal.h: declare `rb_vm_insn_addr2insn` for MJIT.
vm.c: exclude some definitions if `-DMJIT_HEADER` is provided to
compiler. This avoids including some functions which take a long time
to compile, e.g. vm_exec_core. Some of the purpose is achieved in
transform_mjit_header.rb (see `IGNORED_FUNCTIONS`) but others are
manually resolved for now. Load mjit_helper.h for MJIT header.
mjit_helper.h: New. This is a file used only by JIT-ed code. I'll
refactor `mjit_call_cfunc` later.
vm_eval.c: add some #ifdef switches to skip compiling some functions
like Init_vm_eval.
win32/mkexports.rb: export thread/ec functions, which are used by MJIT.
include/ruby/defines.h: add MJIT_FUNC_EXPORTED macro alias to clarify
that a function is exported only for MJIT.
array.c: export a function used by MJIT.
bignum.c: ditto.
class.c: ditto.
compile.c: ditto.
error.c: ditto.
gc.c: ditto.
hash.c: ditto.
iseq.c: ditto.
numeric.c: ditto.
object.c: ditto.
proc.c: ditto.
re.c: ditto.
st.c: ditto.
string.c: ditto.
thread.c: ditto.
variable.c: ditto.
vm_backtrace.c: ditto.
vm_insnhelper.c: ditto.
vm_method.c: ditto.
I would like to improve maintainability of function exports, but I
believe this way is acceptable as initial merging if we clarify the
new exports are for MJIT (so that we can use them as TODO list to fix)
and add unit tests to detect unresolved symbols.
I'll add unit tests of JIT compilations in succeeding commits.
Author: Takashi Kokubun <takashikkbn@gmail.com>
Contributor: wanabe <s.wanabe@gmail.com>
Part of [Feature #14235]
---
* Known issues
* Code generated by gcc is faster than clang. The benchmark may be worse
in macOS. Following benchmark result is provided by gcc w/ Linux.
* Performance is decreased when Google Chrome is running
* JIT can work on MinGW, but it doesn't improve performance at least
in short running benchmark.
* Currently it doesn't perform well with Rails. We'll try to fix this
before release.
---
* Benchmark results
Benchmarked with:
Intel 4.0GHz i7-4790K with 16GB memory under x86-64 Ubuntu 8 Cores
- 2.0.0-p0: Ruby 2.0.0-p0
- r62186: Ruby trunk (early 2.6.0), before MJIT changes
- JIT off: On this commit, but without `--jit` option
- JIT on: On this commit, and with `--jit` option
** Optcarrot fps
Benchmark: https://github.com/mame/optcarrot
| |2.0.0-p0 |r62186 |JIT off |JIT on |
|:--------|:--------|:--------|:--------|:--------|
|fps |37.32 |51.46 |51.31 |58.88 |
|vs 2.0.0 |1.00x |1.38x |1.37x |1.58x |
** MJIT benchmarks
Benchmark: https://github.com/benchmark-driver/mjit-benchmarks
(Original: https://github.com/vnmakarov/ruby/tree/rtl_mjit_branch/MJIT-benchmarks)
| |2.0.0-p0 |r62186 |JIT off |JIT on |
|:----------|:--------|:--------|:--------|:--------|
|aread |1.00 |1.09 |1.07 |2.19 |
|aref |1.00 |1.13 |1.11 |2.22 |
|aset |1.00 |1.50 |1.45 |2.64 |
|awrite |1.00 |1.17 |1.13 |2.20 |
|call |1.00 |1.29 |1.26 |2.02 |
|const2 |1.00 |1.10 |1.10 |2.19 |
|const |1.00 |1.11 |1.10 |2.19 |
|fannk |1.00 |1.04 |1.02 |1.00 |
|fib |1.00 |1.32 |1.31 |1.84 |
|ivread |1.00 |1.13 |1.12 |2.43 |
|ivwrite |1.00 |1.23 |1.21 |2.40 |
|mandelbrot |1.00 |1.13 |1.16 |1.28 |
|meteor |1.00 |2.97 |2.92 |3.17 |
|nbody |1.00 |1.17 |1.15 |1.49 |
|nest-ntimes|1.00 |1.22 |1.20 |1.39 |
|nest-while |1.00 |1.10 |1.10 |1.37 |
|norm |1.00 |1.18 |1.16 |1.24 |
|nsvb |1.00 |1.16 |1.16 |1.17 |
|red-black |1.00 |1.02 |0.99 |1.12 |
|sieve |1.00 |1.30 |1.28 |1.62 |
|trees |1.00 |1.14 |1.13 |1.19 |
|while |1.00 |1.12 |1.11 |2.41 |
** Discourse's script/bench.rb
Benchmark: https://github.com/discourse/discourse/blob/v1.8.7/script/bench.rb
NOTE: Rails performance was somehow a little degraded with JIT for now.
We should fix this.
(At least I know opt_aref is performing badly in JIT and I have an idea
to fix it. Please wait for the fix.)
*** JIT off
Your Results: (note for timings- percentile is first, duration is second in millisecs)
categories_admin:
50: 17
75: 18
90: 22
99: 29
home_admin:
50: 21
75: 21
90: 27
99: 40
topic_admin:
50: 17
75: 18
90: 22
99: 32
categories:
50: 35
75: 41
90: 43
99: 77
home:
50: 39
75: 46
90: 49
99: 95
topic:
50: 46
75: 52
90: 56
99: 101
*** JIT on
Your Results: (note for timings- percentile is first, duration is second in millisecs)
categories_admin:
50: 19
75: 21
90: 25
99: 33
home_admin:
50: 24
75: 26
90: 30
99: 35
topic_admin:
50: 19
75: 20
90: 25
99: 30
categories:
50: 40
75: 44
90: 48
99: 76
home:
50: 42
75: 48
90: 51
99: 89
topic:
50: 49
75: 55
90: 58
99: 99
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@62197 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-02-04 06:22:28 -05:00
|
|
|
EXEC_EC_CFP();
|
2007-01-16 03:52:22 -05:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2018-01-12 03:38:07 -05:00
|
|
|
/* return from this scope. */
|
2007-01-16 03:52:22 -05:00
|
|
|
DEFINE_INSN
|
|
|
|
leave
|
|
|
|
()
|
|
|
|
(VALUE val)
|
|
|
|
(VALUE val)
|
2018-01-26 01:30:58 -05:00
|
|
|
// attr bool handles_frame = true;
|
2007-01-16 03:52:22 -05:00
|
|
|
{
|
|
|
|
if (OPT_CHECKED_RUN) {
|
2015-08-05 01:43:58 -04:00
|
|
|
const VALUE *const bp = vm_base_ptr(reg_cfp);
|
|
|
|
if (reg_cfp->sp != bp) {
|
2017-10-27 02:21:50 -04:00
|
|
|
vm_stack_consistency_error(ec, reg_cfp, bp);
|
2007-01-16 03:52:22 -05:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2017-11-06 02:44:28 -05:00
|
|
|
RUBY_VM_CHECK_INTS(ec);
|
2007-01-16 03:52:22 -05:00
|
|
|
|
2017-10-27 02:21:50 -04:00
|
|
|
if (vm_pop_frame(ec, GET_CFP(), GET_EP())) {
|
2007-06-27 04:21:21 -04:00
|
|
|
#if OPT_CALL_THREADED_CODE
|
2017-10-27 15:16:51 -04:00
|
|
|
rb_ec_thread_ptr(ec)->retval = val;
|
2012-08-07 07:13:57 -04:00
|
|
|
return 0;
|
2007-06-27 04:21:21 -04:00
|
|
|
#else
|
* vm_core.h: remove VM_FRAME_MAGIC_FINISH (finish frame type).
Before this commit:
`finish frame' was place holder which indicates that VM loop
needs to return function.
If a C method calls a Ruby methods (a method written by Ruby),
then VM loop will be (re-)invoked. When the Ruby method returns,
then also VM loop should be escaped. `finish frame' has only
one instruction `finish', which returns VM loop function.
VM loop function executes `finish' instruction, then VM loop
function returns itself.
With such mechanism, `leave' instruction (which returns one
frame from current scope) doesn't need to check that this `leave'
should also return from VM loop function.
Strictly, one branch can be removed from `leave' instruction.
Consideration:
However, pushing the `finish frame' needs costs because
it needs several memory accesses. The number of pushing
`finish frame' is greater than I had assumed. Of course,
pushing `finish frame' consumes additional control frame.
Moreover, recent processors have good branch prediction,
with which we can ignore such trivial checking.
After this commit:
Finally, I decided to remove `finish frame' and `finish'
instruction. Some parts of VM depend on `finish frame',
so the new frame flag VM_FRAME_FLAG_FINISH is introduced.
If this frame should escape from VM function loop, then
the result of VM_FRAME_TYPE_FINISH_P(cfp) is true.
`leave' instruction checks this flag every time.
I measured performance on it. However on my environments,
it improves some benchmarks and slows some benchmarks down.
Maybe it is because of C compiler optimization parameters.
I'll revisit this if it causes problems.
* insns.def (leave, finish): remove finish instruction.
* vm.c, vm_eval.c, vm_exec.c, vm_backtrace.c, vm_dump.c:
apply above changes.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@36099 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2012-06-15 06:22:34 -04:00
|
|
|
return val;
|
2007-06-27 04:21:21 -04:00
|
|
|
#endif
|
2012-06-15 06:22:34 -04:00
|
|
|
}
|
|
|
|
else {
|
|
|
|
RESTORE_REGS();
|
|
|
|
}
|
2007-01-16 03:52:22 -05:00
|
|
|
}
|
|
|
|
|
|
|
|
/**********************************************************/
|
|
|
|
/* deal with control flow 3: exception */
|
|
|
|
/**********************************************************/
|
|
|
|
|
2018-01-12 03:38:07 -05:00
|
|
|
/* longjump */
|
2007-01-16 03:52:22 -05:00
|
|
|
DEFINE_INSN
|
|
|
|
throw
|
2007-05-03 05:09:14 -04:00
|
|
|
(rb_num_t throw_state)
|
2007-01-16 03:52:22 -05:00
|
|
|
(VALUE throwobj)
|
|
|
|
(VALUE val)
|
|
|
|
{
|
2017-11-06 02:44:28 -05:00
|
|
|
RUBY_VM_CHECK_INTS(ec);
|
2017-10-27 02:21:50 -04:00
|
|
|
val = vm_throw(ec, GET_CFP(), throw_state, throwobj);
|
2007-08-06 07:36:30 -04:00
|
|
|
THROW_EXCEPTION(val);
|
2007-01-16 03:52:22 -05:00
|
|
|
/* unreachable */
|
|
|
|
}
|
|
|
|
|
|
|
|
/**********************************************************/
|
|
|
|
/* deal with control flow 4: local jump */
|
|
|
|
/**********************************************************/
|
|
|
|
|
2018-01-12 03:38:07 -05:00
|
|
|
/* set PC to (PC + dst). */
|
2007-01-16 03:52:22 -05:00
|
|
|
DEFINE_INSN
|
|
|
|
jump
|
|
|
|
(OFFSET dst)
|
|
|
|
()
|
|
|
|
()
|
|
|
|
{
|
2017-11-06 02:44:28 -05:00
|
|
|
RUBY_VM_CHECK_INTS(ec);
|
2007-01-16 03:52:22 -05:00
|
|
|
JUMP(dst);
|
|
|
|
}
|
|
|
|
|
2018-01-12 03:38:07 -05:00
|
|
|
/* if val is not false or nil, set PC to (PC + dst). */
|
2007-01-16 03:52:22 -05:00
|
|
|
DEFINE_INSN
|
|
|
|
branchif
|
|
|
|
(OFFSET dst)
|
|
|
|
(VALUE val)
|
|
|
|
()
|
|
|
|
{
|
|
|
|
if (RTEST(val)) {
|
2017-11-06 02:44:28 -05:00
|
|
|
RUBY_VM_CHECK_INTS(ec);
|
2007-01-16 03:52:22 -05:00
|
|
|
JUMP(dst);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2018-01-12 03:38:07 -05:00
|
|
|
/* if val is false or nil, set PC to (PC + dst). */
|
2007-01-16 03:52:22 -05:00
|
|
|
DEFINE_INSN
|
|
|
|
branchunless
|
|
|
|
(OFFSET dst)
|
|
|
|
(VALUE val)
|
|
|
|
()
|
|
|
|
{
|
|
|
|
if (!RTEST(val)) {
|
2017-11-06 02:44:28 -05:00
|
|
|
RUBY_VM_CHECK_INTS(ec);
|
2007-01-16 03:52:22 -05:00
|
|
|
JUMP(dst);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2018-01-12 03:38:07 -05:00
|
|
|
/* if val is nil, set PC to (PC + dst). */
|
2015-10-22 02:30:12 -04:00
|
|
|
DEFINE_INSN
|
|
|
|
branchnil
|
|
|
|
(OFFSET dst)
|
|
|
|
(VALUE val)
|
|
|
|
()
|
|
|
|
{
|
|
|
|
if (NIL_P(val)) {
|
2017-11-06 02:44:28 -05:00
|
|
|
RUBY_VM_CHECK_INTS(ec);
|
2015-10-22 02:30:12 -04:00
|
|
|
JUMP(dst);
|
|
|
|
}
|
|
|
|
}
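/* A minimal sketch of the conditions tested by branchif, branchunless and
 * branchnil above. The constant values assume the flonum-enabled VALUE
 * layout (Qfalse = 0x00, Qnil = 0x08, Qtrue = 0x14); all sketch_* names are
 * hypothetical, and pc is assumed to already point past the instruction. */

```c
#include <assert.h>
#include <stdint.h>

typedef uintptr_t sketch_value;

/* Assumed special constants (flonum-enabled layout). */
#define SKETCH_QFALSE ((sketch_value)0x00)
#define SKETCH_QNIL   ((sketch_value)0x08)
#define SKETCH_QTRUE  ((sketch_value)0x14)

/* Truthy unless the value is false or nil: masking out the nil bit
 * leaves zero only for those two constants. */
static int sketch_rtest(sketch_value v) { return (v & ~SKETCH_QNIL) != 0; }
static int sketch_nil_p(sketch_value v) { return v == SKETCH_QNIL; }

enum sketch_branch { SKETCH_BRANCHIF, SKETCH_BRANCHUNLESS, SKETCH_BRANCHNIL };

/* Returns the next pc: pc + dst when the branch is taken, pc otherwise. */
static long
sketch_branch_target(enum sketch_branch insn, sketch_value val, long pc, long dst)
{
    int taken = 0;
    assert(pc >= 0);
    switch (insn) {
      case SKETCH_BRANCHIF:     taken = sketch_rtest(val);  break;
      case SKETCH_BRANCHUNLESS: taken = !sketch_rtest(val); break;
      case SKETCH_BRANCHNIL:    taken = sketch_nil_p(val);  break;
    }
    return taken ? pc + dst : pc;
}
```

/* Note that branchnil is stricter than branchunless: false takes the
 * branchunless jump but not the branchnil jump. */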
|
|
|
|
|
2018-01-12 03:38:07 -05:00
|
|
|
/* if val is of the given type, set PC to (PC + dst). */
|
2017-09-17 22:27:13 -04:00
|
|
|
DEFINE_INSN
|
|
|
|
branchiftype
|
|
|
|
(rb_num_t type, OFFSET dst)
|
|
|
|
(VALUE val)
|
|
|
|
()
|
|
|
|
{
|
|
|
|
if (TYPE(val) == (int)type) {
|
2017-11-06 02:44:28 -05:00
|
|
|
RUBY_VM_CHECK_INTS(ec);
|
2017-09-17 22:27:13 -04:00
|
|
|
JUMP(dst);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2007-01-16 03:52:22 -05:00
|
|
|
/**********************************************************/
|
|
|
|
/* for optimization */
|
|
|
|
/**********************************************************/
|
|
|
|
|
2018-01-12 03:38:07 -05:00
|
|
|
/* push inline-cached value and go to dst if it is valid */
|
2007-01-16 03:52:22 -05:00
|
|
|
DEFINE_INSN
|
|
|
|
getinlinecache
|
2009-07-13 00:44:20 -04:00
|
|
|
(OFFSET dst, IC ic)
|
2007-01-16 03:52:22 -05:00
|
|
|
()
|
|
|
|
(VALUE val)
|
|
|
|
{
|
2018-02-10 11:54:47 -05:00
|
|
|
if (vm_ic_hit_p(ic, GET_EP())) {
|
|
|
|
val = ic->ic_value.value;
|
2007-01-16 03:52:22 -05:00
|
|
|
JUMP(dst);
|
|
|
|
}
|
2018-02-10 11:54:47 -05:00
|
|
|
else {
|
|
|
|
val = Qnil;
|
|
|
|
}
|
2007-01-16 03:52:22 -05:00
|
|
|
}
|
|
|
|
|
2018-01-12 03:38:07 -05:00
|
|
|
/* set inline cache */
|
2007-01-16 03:52:22 -05:00
|
|
|
DEFINE_INSN
|
|
|
|
setinlinecache
|
2010-02-24 12:06:15 -05:00
|
|
|
(IC ic)
|
2007-01-16 03:52:22 -05:00
|
|
|
(VALUE val)
|
|
|
|
(VALUE val)
|
|
|
|
{
|
2017-04-18 06:58:49 -04:00
|
|
|
vm_ic_update(ic, val, GET_EP());
|
2007-01-16 03:52:22 -05:00
|
|
|
}
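/* A minimal sketch of the getinlinecache / setinlinecache pair above,
 * assuming the cache is validated only by a global serial number. This is a
 * simplification: the real vm_ic_hit_p() also receives the environment
 * pointer. All sketch_* names are hypothetical. */

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned long sketch_serial;

/* Bumped whenever any constant is (re)defined, which invalidates all caches. */
static sketch_serial sketch_global_constant_state = 1;

struct sketch_ic {
    sketch_serial serial;  /* global state at the time the value was cached */
    long value;            /* cached value (a VALUE in the real VM) */
};

static int
sketch_ic_hit_p(const struct sketch_ic *ic)
{
    return ic->serial == sketch_global_constant_state;
}

/* setinlinecache: remember the value together with the current serial. */
static void
sketch_ic_update(struct sketch_ic *ic, long val)
{
    ic->value = val;
    ic->serial = sketch_global_constant_state;
}

/* getinlinecache: on a hit, yield the cached value (the VM would JUMP past
 * the slow lookup); on a miss, report failure so the slow path runs. */
static int
sketch_ic_get(const struct sketch_ic *ic, long *out)
{
    assert(out != NULL);
    if (sketch_ic_hit_p(ic)) {
        *out = ic->value;
        return 1;
    }
    return 0;
}

static void
sketch_invalidate_constants(void)
{
    sketch_global_constant_state++;
}
```

/* In this sketch, a fresh cache misses, hits after one update, and misses
 * again once the global serial is bumped. */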
|
|
|
|
|
2018-01-12 03:38:07 -05:00
|
|
|
/* run iseq only once */
|
2013-08-20 13:41:13 -04:00
|
|
|
DEFINE_INSN
|
|
|
|
once
|
|
|
|
(ISEQ iseq, IC ic)
|
|
|
|
()
|
|
|
|
(VALUE val)
|
|
|
|
{
|
2017-11-07 01:14:00 -05:00
|
|
|
val = vm_once_dispatch(ec, iseq, ic);
|
2013-08-20 13:41:13 -04:00
|
|
|
}
|
|
|
|
|
2018-01-12 03:38:07 -05:00
|
|
|
/* case dispatcher, jump by table if possible */
|
2007-01-16 03:52:22 -05:00
|
|
|
DEFINE_INSN
|
|
|
|
opt_case_dispatch
|
|
|
|
(CDHASH hash, OFFSET else_offset)
|
|
|
|
(..., VALUE key)
|
2018-01-12 03:38:07 -05:00
|
|
|
()
|
|
|
|
// attr rb_snum_t sp_inc = -1;
|
2007-01-16 03:52:22 -05:00
|
|
|
{
|
2017-04-18 06:58:49 -04:00
|
|
|
OFFSET dst = vm_case_dispatch(hash, else_offset, key);
|
|
|
|
|
|
|
|
if (dst) {
|
|
|
|
JUMP(dst);
|
2009-08-12 01:55:06 -04:00
|
|
|
}
|
2007-01-16 03:52:22 -05:00
|
|
|
}
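/* A minimal sketch of the three outcomes opt_case_dispatch above relies on:
 * a table hit, the else offset, or 0 meaning "cannot use the table, fall
 * through to the sequential === checks". The linear scan and the
 * key_is_dispatchable flag are assumptions made for illustration; the
 * operand in the real instruction is a hash (CDHASH). */

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical table entry: a literal `when' key and its branch offset. */
struct sketch_case_entry {
    long key;
    long offset;
};

static long
sketch_case_dispatch(const struct sketch_case_entry *table, size_t n,
                     long else_offset, int key_is_dispatchable, long key)
{
    size_t i;
    assert(table != NULL || n == 0);
    if (!key_is_dispatchable) {
        return 0;  /* caller does not JUMP; the following insns test each `when' */
    }
    for (i = 0; i < n; i++) {
        if (table[i].key == key) {
            return table[i].offset;
        }
    }
    return else_offset;
}
```

/* A zero result matches the `if (dst)' guard in the instruction body:
 * no jump is performed and execution continues with the next instruction. */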
|
|
|
|
|
|
|
|
/** simple functions */
|
|
|
|
|
2018-01-12 03:38:07 -05:00
|
|
|
/* optimized X+Y. */
|
2007-01-16 03:52:22 -05:00
|
|
|
DEFINE_INSN
|
|
|
|
opt_plus
|
2015-09-19 13:59:58 -04:00
|
|
|
(CALL_INFO ci, CALL_CACHE cc)
|
2007-01-16 03:52:22 -05:00
|
|
|
(VALUE recv, VALUE obj)
|
|
|
|
(VALUE val)
|
|
|
|
{
|
2017-04-18 06:58:49 -04:00
|
|
|
val = vm_opt_plus(recv, obj);
|
|
|
|
|
|
|
|
if (val == Qundef) {
|
2018-01-29 02:04:50 -05:00
|
|
|
DISPATCH_ORIGINAL_INSN(opt_send_without_block);
|
2007-01-16 03:52:22 -05:00
|
|
|
}
|
|
|
|
}
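/* A minimal sketch of the fast-path / fallback pattern used by opt_plus
 * above: try a specialized addition, and signal "undef" when it does not
 * apply so the caller dispatches a regular method call instead. The operand
 * struct, the redefinition flag, and all sketch_* names are hypothetical;
 * overflow handling is left out of the sketch. */

```c
#include <assert.h>
#include <stddef.h>

enum sketch_kind { SKETCH_FIXNUM, SKETCH_OTHER };

struct sketch_operand {
    enum sketch_kind kind;
    long n;  /* payload, meaningful only for SKETCH_FIXNUM */
};

/* Set to nonzero when Integer#+ has been redefined by user code. */
static int sketch_integer_plus_redefined = 0;

/* Returns 1 and stores the sum when the fast path applies.
 * Returns 0 (the analogue of Qundef) when the caller must fall back
 * to a generic send. Assumes operands small enough not to overflow. */
static int
sketch_opt_plus(struct sketch_operand recv, struct sketch_operand obj, long *val)
{
    assert(val != NULL);
    if (recv.kind == SKETCH_FIXNUM && obj.kind == SKETCH_FIXNUM &&
        !sketch_integer_plus_redefined) {
        *val = recv.n + obj.n;
        return 1;
    }
    return 0;
}
```

/* The same shape applies to opt_minus and opt_mult: the specialized helper
 * either produces a value or declines, and declining costs one extra branch
 * before the ordinary dispatch. */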
|
|
|
|
|
2018-01-12 03:38:07 -05:00
|
|
|
/* optimized X-Y. */
|
2007-01-16 03:52:22 -05:00
|
|
|
DEFINE_INSN
|
|
|
|
opt_minus
|
2015-09-19 13:59:58 -04:00
|
|
|
(CALL_INFO ci, CALL_CACHE cc)
|
2007-01-16 03:52:22 -05:00
|
|
|
(VALUE recv, VALUE obj)
|
|
|
|
(VALUE val)
|
|
|
|
{
|
2017-04-18 06:58:49 -04:00
|
|
|
val = vm_opt_minus(recv, obj);
|
|
|
|
|
|
|
|
if (val == Qundef) {
|
2018-01-29 02:04:50 -05:00
|
|
|
DISPATCH_ORIGINAL_INSN(opt_send_without_block);
|
2007-01-16 03:52:22 -05:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2018-01-12 03:38:07 -05:00
|
|
|
/* optimized X*Y. */
|
2007-01-16 03:52:22 -05:00
|
|
|
DEFINE_INSN
|
|
|
|
opt_mult
|
2015-09-19 13:59:58 -04:00
|
|
|
(CALL_INFO ci, CALL_CACHE cc)
|
2007-01-16 03:52:22 -05:00
|
|
|
(VALUE recv, VALUE obj)
|
|
|
|
(VALUE val)
|
|
|
|
{
|
2017-04-18 06:58:49 -04:00
|
|
|
val = vm_opt_mult(recv, obj);
|
|
|
|
|
|
|
|
if (val == Qundef) {
|
2018-01-29 02:04:50 -05:00
|
|
|
DISPATCH_ORIGINAL_INSN(opt_send_without_block);
|
2007-01-16 03:52:22 -05:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2018-01-12 03:38:07 -05:00
|
|
|
/* optimized X/Y. */
|
2007-01-16 03:52:22 -05:00
|
|
|
DEFINE_INSN
|
|
|
|
opt_div
|
2015-09-19 13:59:58 -04:00
|
|
|
(CALL_INFO ci, CALL_CACHE cc)
|
2007-01-16 03:52:22 -05:00
|
|
|
(VALUE recv, VALUE obj)
|
|
|
|
(VALUE val)
|
|
|
|
{
|
2017-04-18 06:58:49 -04:00
|
|
|
val = vm_opt_div(recv, obj);
|
|
|
|
|
|
|
|
if (val == Qundef) {
|
2018-01-29 02:04:50 -05:00
|
|
|
DISPATCH_ORIGINAL_INSN(opt_send_without_block);
|
2007-01-16 03:52:22 -05:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2018-01-12 03:38:07 -05:00
|
|
|
/* optimized X%Y. */
|
2007-01-16 03:52:22 -05:00
|
|
|
DEFINE_INSN
|
|
|
|
opt_mod
|
2015-09-19 13:59:58 -04:00
|
|
|
(CALL_INFO ci, CALL_CACHE cc)
|
2007-01-16 03:52:22 -05:00
|
|
|
(VALUE recv, VALUE obj)
|
|
|
|
(VALUE val)
|
|
|
|
{
|
2017-04-18 06:58:49 -04:00
|
|
|
val = vm_opt_mod(recv, obj);
|
|
|
|
|
|
|
|
if (val == Qundef) {
|
2018-01-29 02:04:50 -05:00
|
|
|
DISPATCH_ORIGINAL_INSN(opt_send_without_block);
|
2007-01-16 03:52:22 -05:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2018-01-12 03:38:07 -05:00
|
|
|
/* optimized X==Y. */
|
2007-01-16 03:52:22 -05:00
|
|
|
DEFINE_INSN
|
|
|
|
opt_eq
|
2015-09-19 13:59:58 -04:00
|
|
|
(CALL_INFO ci, CALL_CACHE cc)
|
2007-01-16 03:52:22 -05:00
|
|
|
(VALUE recv, VALUE obj)
|
|
|
|
(VALUE val)
|
|
|
|
{
|
2015-09-19 13:59:58 -04:00
|
|
|
val = opt_eq_func(recv, obj, ci, cc);
|
2007-01-16 03:52:22 -05:00
|
|
|
|
2007-12-18 07:07:51 -05:00
|
|
|
if (val == Qundef) {
|
2018-01-29 02:04:50 -05:00
|
|
|
DISPATCH_ORIGINAL_INSN(opt_send_without_block);
|
2007-01-16 03:52:22 -05:00
|
|
|
}
|
2007-12-18 07:07:51 -05:00
|
|
|
}
|
2007-01-16 03:52:22 -05:00
|
|
|
|
2018-01-12 03:38:07 -05:00
|
|
|
/* optimized X!=Y. */
|
2007-12-18 07:07:51 -05:00
|
|
|
DEFINE_INSN
|
|
|
|
opt_neq
|
2018-01-29 02:15:08 -05:00
|
|
|
(CALL_INFO ci_eq, CALL_CACHE cc_eq, CALL_INFO ci, CALL_CACHE cc)
|
2007-12-18 07:07:51 -05:00
|
|
|
(VALUE recv, VALUE obj)
|
|
|
|
(VALUE val)
|
|
|
|
{
|
2017-04-18 06:58:49 -04:00
|
|
|
val = vm_opt_neq(ci, cc, ci_eq, cc_eq, recv, obj);
|
2007-12-18 07:07:51 -05:00
|
|
|
|
|
|
|
if (val == Qundef) {
|
mjit_compile.c: merge initial JIT compiler
which has been developed by Takashi Kokubun <takashikkbn@gmail> as
YARV-MJIT. Many of its bugs are fixed by wanabe <s.wanabe@gmail.com>.
This JIT compiler is designed to be a safe migration path to introduce
JIT compiler to MRI. So this commit does not include any bytecode
changes or dynamic instruction modifications, which are done in original
MJIT.
This commit even strips off some aggressive optimizations from
YARV-MJIT, and thus it's slower than YARV-MJIT too. But it's still
fairly faster than Ruby 2.5 in some benchmarks (attached below).
Note that this JIT compiler passes `make test`, `make test-all`, and `make
test-spec` both without JIT and with JIT. It is perfectly safe
with JIT disabled because, unlike MJIT, it does not replace VM
instructions, and with JIT enabled it stably runs Ruby applications,
including Rails applications.
I'm expecting this version as just "initial" JIT compiler. I have many
optimization ideas which are skipped for initial merging, and you may
easily replace this JIT compiler with a faster one by just replacing
mjit_compile.c. `mjit_compile` interface is designed for the purpose.
common.mk: update dependencies for mjit_compile.c.
internal.h: declare `rb_vm_insn_addr2insn` for MJIT.
vm.c: exclude some definitions if `-DMJIT_HEADER` is provided to
compiler. This avoids including some functions which take a long time
to compile, e.g. vm_exec_core. Some of the purpose is achieved in
transform_mjit_header.rb (see `IGNORED_FUNCTIONS`) but others are
manually resolved for now. Load mjit_helper.h for MJIT header.
mjit_helper.h: New. This is a file used only by JIT-ed code. I'll
refactor `mjit_call_cfunc` later.
vm_eval.c: add some #ifdef switches to skip compiling some functions
like Init_vm_eval.
win32/mkexports.rb: export thread/ec functions, which are used by MJIT.
include/ruby/defines.h: add MJIT_FUNC_EXPORTED macro alias to clarify
that a function is exported only for MJIT.
array.c: export a function used by MJIT.
bignum.c: ditto.
class.c: ditto.
compile.c: ditto.
error.c: ditto.
gc.c: ditto.
hash.c: ditto.
iseq.c: ditto.
numeric.c: ditto.
object.c: ditto.
proc.c: ditto.
re.c: ditto.
st.c: ditto.
string.c: ditto.
thread.c: ditto.
variable.c: ditto.
vm_backtrace.c: ditto.
vm_insnhelper.c: ditto.
vm_method.c: ditto.
I would like to improve maintainability of function exports, but I
believe this way is acceptable as initial merging if we clarify the
new exports are for MJIT (so that we can use them as TODO list to fix)
and add unit tests to detect unresolved symbols.
I'll add unit tests of JIT compilations in succeeding commits.
Author: Takashi Kokubun <takashikkbn@gmail.com>
Contributor: wanabe <s.wanabe@gmail.com>
Part of [Feature #14235]
---
* Known issues
* Code generated by gcc is faster than clang. The benchmark may be worse
in macOS. Following benchmark result is provided by gcc w/ Linux.
* Performance is decreased when Google Chrome is running
* JIT can work on MinGW, but it doesn't improve performance at least
in short running benchmark.
* Currently it doesn't perform well with Rails. We'll try to fix this
before release.
---
* Benchmark results
Benchmarked with:
Intel 4.0GHz i7-4790K with 16GB memory under x86-64 Ubuntu 8 Cores
- 2.0.0-p0: Ruby 2.0.0-p0
- r62186: Ruby trunk (early 2.6.0), before MJIT changes
- JIT off: On this commit, but without `--jit` option
- JIT on: On this commit, and with `--jit` option
** Optcarrot fps
Benchmark: https://github.com/mame/optcarrot
| |2.0.0-p0 |r62186 |JIT off |JIT on |
|:--------|:--------|:--------|:--------|:--------|
|fps |37.32 |51.46 |51.31 |58.88 |
|vs 2.0.0 |1.00x |1.38x |1.37x |1.58x |
** MJIT benchmarks
Benchmark: https://github.com/benchmark-driver/mjit-benchmarks
(Original: https://github.com/vnmakarov/ruby/tree/rtl_mjit_branch/MJIT-benchmarks)
| |2.0.0-p0 |r62186 |JIT off |JIT on |
|:----------|:--------|:--------|:--------|:--------|
|aread |1.00 |1.09 |1.07 |2.19 |
|aref |1.00 |1.13 |1.11 |2.22 |
|aset |1.00 |1.50 |1.45 |2.64 |
|awrite |1.00 |1.17 |1.13 |2.20 |
|call |1.00 |1.29 |1.26 |2.02 |
|const2 |1.00 |1.10 |1.10 |2.19 |
|const |1.00 |1.11 |1.10 |2.19 |
|fannk |1.00 |1.04 |1.02 |1.00 |
|fib |1.00 |1.32 |1.31 |1.84 |
|ivread |1.00 |1.13 |1.12 |2.43 |
|ivwrite |1.00 |1.23 |1.21 |2.40 |
|mandelbrot |1.00 |1.13 |1.16 |1.28 |
|meteor |1.00 |2.97 |2.92 |3.17 |
|nbody |1.00 |1.17 |1.15 |1.49 |
|nest-ntimes|1.00 |1.22 |1.20 |1.39 |
|nest-while |1.00 |1.10 |1.10 |1.37 |
|norm |1.00 |1.18 |1.16 |1.24 |
|nsvb |1.00 |1.16 |1.16 |1.17 |
|red-black |1.00 |1.02 |0.99 |1.12 |
|sieve |1.00 |1.30 |1.28 |1.62 |
|trees |1.00 |1.14 |1.13 |1.19 |
|while |1.00 |1.12 |1.11 |2.41 |
** Discourse's script/bench.rb
Benchmark: https://github.com/discourse/discourse/blob/v1.8.7/script/bench.rb
NOTE: Rails performance was somehow a little degraded with JIT for now.
We should fix this.
(At least I know opt_aref is performing badly in JIT and I have an idea
to fix it. Please wait for the fix.)
*** JIT off
Your Results: (note for timings- percentile is first, duration is second in millisecs)
categories_admin:
50: 17
75: 18
90: 22
99: 29
home_admin:
50: 21
75: 21
90: 27
99: 40
topic_admin:
50: 17
75: 18
90: 22
99: 32
categories:
50: 35
75: 41
90: 43
99: 77
home:
50: 39
75: 46
90: 49
99: 95
topic:
50: 46
75: 52
90: 56
99: 101
*** JIT on
Your Results: (note for timings- percentile is first, duration is second in millisecs)
categories_admin:
50: 19
75: 21
90: 25
99: 33
home_admin:
50: 24
75: 26
90: 30
99: 35
topic_admin:
50: 19
75: 20
90: 25
99: 30
categories:
50: 40
75: 44
90: 48
99: 76
home:
50: 42
75: 48
90: 51
99: 89
topic:
50: 49
75: 55
90: 58
99: 99
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@62197 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-02-04 06:22:28 -05:00
|
|
|
#ifndef MJIT_HEADER
|
2018-01-29 02:15:08 -05:00
|
|
|
ADD_PC(2); /* !!! */
|
2018-02-04 06:22:28 -05:00
|
|
|
#endif
|
2018-01-29 02:15:08 -05:00
|
|
|
DISPATCH_ORIGINAL_INSN(opt_send_without_block);
|
2007-01-16 03:52:22 -05:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2018-01-12 03:38:07 -05:00
|
|
|
/* optimized X<Y. */
|
2007-01-16 03:52:22 -05:00
|
|
|
DEFINE_INSN
|
|
|
|
opt_lt
|
2015-09-19 13:59:58 -04:00
|
|
|
(CALL_INFO ci, CALL_CACHE cc)
|
2007-01-16 03:52:22 -05:00
|
|
|
(VALUE recv, VALUE obj)
|
|
|
|
(VALUE val)
|
|
|
|
{
|
2017-04-18 06:58:49 -04:00
|
|
|
val = vm_opt_lt(recv, obj);
|
2007-01-16 03:52:22 -05:00
|
|
|
|
2017-04-18 06:58:49 -04:00
|
|
|
if (val == Qundef) {
|
2018-01-29 02:04:50 -05:00
|
|
|
DISPATCH_ORIGINAL_INSN(opt_send_without_block);
|
2007-01-16 03:52:22 -05:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2018-01-12 03:38:07 -05:00
|
|
|
/* optimized X<=Y. */
|
2007-01-16 03:52:22 -05:00
|
|
|
DEFINE_INSN
|
|
|
|
opt_le
|
2015-09-19 13:59:58 -04:00
|
|
|
(CALL_INFO ci, CALL_CACHE cc)
|
2007-01-16 03:52:22 -05:00
|
|
|
(VALUE recv, VALUE obj)
|
|
|
|
(VALUE val)
|
|
|
|
{
|
2017-04-18 06:58:49 -04:00
|
|
|
val = vm_opt_le(recv, obj);
|
2007-01-16 03:52:22 -05:00
|
|
|
|
2017-04-18 06:58:49 -04:00
|
|
|
if (val == Qundef) {
|
2018-01-29 02:04:50 -05:00
|
|
|
DISPATCH_ORIGINAL_INSN(opt_send_without_block);
|
2007-01-16 03:52:22 -05:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2018-01-12 03:38:07 -05:00
|
|
|
/* optimized X>Y. */
|
2007-05-21 00:46:51 -04:00
|
|
|
DEFINE_INSN
|
|
|
|
opt_gt
|
2015-09-19 13:59:58 -04:00
|
|
|
(CALL_INFO ci, CALL_CACHE cc)
|
2007-05-21 00:46:51 -04:00
|
|
|
(VALUE recv, VALUE obj)
|
|
|
|
(VALUE val)
|
|
|
|
{
|
2017-04-18 06:58:49 -04:00
|
|
|
val = vm_opt_gt(recv, obj);
|
2007-05-21 00:46:51 -04:00
|
|
|
|
2017-04-18 06:58:49 -04:00
|
|
|
if (val == Qundef) {
|
2018-01-29 02:04:50 -05:00
|
|
|
DISPATCH_ORIGINAL_INSN(opt_send_without_block);
|
2007-05-21 00:46:51 -04:00
|
|
|
}
|
|
|
|
}
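The commit message above argues for moving each instruction's core out of one huge function and into small helpers that the compiler may inline back. A minimal sketch of that layout follows, assuming a toy stack machine; the opcode names, `insn_add`, `insn_gt`, and `run` are hypothetical and are not part of Ruby's VM.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opcodes for a toy stack machine (not YARV's). */
enum opcode { OP_PUSH, OP_ADD, OP_GT, OP_HALT };

#define STACK_MAX 16

/* Each instruction's core is a small function of its own.  The compiler
 * is free to inline these into run(), or to keep them separate. */
static inline long insn_add(long a, long b) { return a + b; }
static inline long insn_gt(long a, long b)  { return a > b; }

/* The dispatch loop only moves operands between the stack and the
 * helpers.  Sketch only: it trusts the program to be well formed. */
long run(const long *code)
{
    long stack[STACK_MAX];
    size_t sp = 0;
    size_t pc = 0;

    for (;;) {
        long a, b;
        switch (code[pc++]) {
          case OP_PUSH:
            assert(sp < STACK_MAX);
            stack[sp++] = code[pc++];
            break;
          case OP_ADD:
            assert(sp >= 2);
            b = stack[--sp];
            a = stack[--sp];
            stack[sp++] = insn_add(a, b);
            break;
          case OP_GT:
            assert(sp >= 2);
            b = stack[--sp];
            a = stack[--sp];
            stack[sp++] = insn_gt(a, b);
            break;
          case OP_HALT:
            return sp ? stack[sp - 1] : 0;
          default:
            return -1; /* unknown opcode */
        }
    }
}
```

Whether the helpers end up inlined is the compiler's decision; the source only stops forcing everything into one function body.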
|
|
|
|
|
2018-01-12 03:38:07 -05:00
|
|
|
/* optimized X>=Y. */
|
2007-05-21 00:46:51 -04:00
|
|
|
DEFINE_INSN
|
|
|
|
opt_ge
|
2015-09-19 13:59:58 -04:00
|
|
|
(CALL_INFO ci, CALL_CACHE cc)
|
2007-05-21 00:46:51 -04:00
|
|
|
(VALUE recv, VALUE obj)
|
|
|
|
(VALUE val)
|
|
|
|
{
|
2017-04-18 06:58:49 -04:00
|
|
|
val = vm_opt_ge(recv, obj);
|
2007-05-21 00:46:51 -04:00
|
|
|
|
2017-04-18 06:58:49 -04:00
|
|
|
if (val == Qundef) {
|
2018-01-29 02:04:50 -05:00
|
|
|
DISPATCH_ORIGINAL_INSN(opt_send_without_block);
|
2007-05-21 00:46:51 -04:00
|
|
|
}
|
|
|
|
}
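The `opt_*` instructions in this part of the file share one shape: call a specialized helper, and if it returns `Qundef`, dispatch the original `opt_send_without_block` instead. A minimal sketch of that fast-path/fallback pattern follows, assuming a simplified tagged value; `value_t`, `fast_ge`, `generic_ge`, and `exec_ge` are hypothetical names, not Ruby's actual definitions.

```c
#include <assert.h>

/* Hypothetical value representation; K_UNDEF plays the role of Qundef. */
typedef enum { K_UNDEF, K_INT, K_BOOL, K_OTHER } kind_t;
typedef struct { kind_t kind; long payload; } value_t;

int slow_path_calls = 0; /* counts how often the fallback ran */

value_t make_int(long n)  { value_t v = { K_INT, n };       return v; }
value_t make_bool(int b)  { value_t v = { K_BOOL, b != 0 }; return v; }
value_t make_undef(void)  { value_t v = { K_UNDEF, 0 };     return v; }

/* Fast path: handles only the int >= int case, declines everything else
 * by returning the "undefined" sentinel. */
value_t fast_ge(value_t recv, value_t obj)
{
    if (recv.kind == K_INT && obj.kind == K_INT)
        return make_bool(recv.payload >= obj.payload);
    return make_undef();
}

/* Stand-in for the generic method call taken when the fast path declines. */
value_t generic_ge(value_t recv, value_t obj)
{
    slow_path_calls++;
    return make_bool(recv.payload >= obj.payload);
}

/* Instruction body: try the fast path, fall back on the sentinel. */
value_t exec_ge(value_t recv, value_t obj)
{
    value_t val = fast_ge(recv, obj);
    if (val.kind == K_UNDEF)
        val = generic_ge(recv, obj);
    assert(val.kind == K_BOOL);
    return val;
}
```

The sentinel keeps the helper's signature simple: it returns either a result or "not handled", so the caller needs no separate status flag.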
|
|
|
|
|
2018-01-12 03:38:07 -05:00
|
|
|
/* << */
|
2007-01-16 03:52:22 -05:00
|
|
|
DEFINE_INSN
|
|
|
|
opt_ltlt
|
2015-09-19 13:59:58 -04:00
|
|
|
(CALL_INFO ci, CALL_CACHE cc)
|
2007-01-16 03:52:22 -05:00
|
|
|
(VALUE recv, VALUE obj)
|
|
|
|
(VALUE val)
|
|
|
|
{
|
2017-04-18 06:58:49 -04:00
|
|
|
val = vm_opt_ltlt(recv, obj);
|
|
|
|
|
|
|
|
if (val == Qundef) {
|
2018-01-29 02:04:50 -05:00
|
|
|
DISPATCH_ORIGINAL_INSN(opt_send_without_block);
|
2007-01-16 03:52:22 -05:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2018-01-12 03:38:07 -05:00
|
|
|
/* [] */
|
2007-01-16 03:52:22 -05:00
|
|
|
DEFINE_INSN
|
|
|
|
opt_aref
|
2015-09-19 13:59:58 -04:00
|
|
|
(CALL_INFO ci, CALL_CACHE cc)
|
2007-01-16 03:52:22 -05:00
|
|
|
(VALUE recv, VALUE obj)
|
|
|
|
(VALUE val)
|
|
|
|
{
|
2017-04-18 06:58:49 -04:00
|
|
|
val = vm_opt_aref(recv, obj);
|
|
|
|
|
|
|
|
if (val == Qundef) {
|
2018-01-29 02:04:50 -05:00
|
|
|
DISPATCH_ORIGINAL_INSN(opt_send_without_block);
|
2007-01-16 03:52:22 -05:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2018-01-12 03:38:07 -05:00
|
|
|
/* recv[obj] = set */
|
2007-01-16 03:52:22 -05:00
|
|
|
DEFINE_INSN
|
|
|
|
opt_aset
|
2015-09-19 13:59:58 -04:00
|
|
|
(CALL_INFO ci, CALL_CACHE cc)
|
2007-01-16 03:52:22 -05:00
|
|
|
(VALUE recv, VALUE obj, VALUE set)
|
|
|
|
(VALUE val)
|
|
|
|
{
|
2017-04-18 06:58:49 -04:00
|
|
|
val = vm_opt_aset(recv, obj, set);
|
|
|
|
|
|
|
|
if (val == Qundef) {
|
2018-01-29 02:04:50 -05:00
|
|
|
DISPATCH_ORIGINAL_INSN(opt_send_without_block);
|
2007-01-16 03:52:22 -05:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2018-01-12 03:38:07 -05:00
|
|
|
/* recv[str] = set */
|
2014-01-09 23:54:08 -05:00
|
|
|
DEFINE_INSN
|
|
|
|
opt_aset_with
|
2018-01-29 02:15:08 -05:00
|
|
|
(VALUE key, CALL_INFO ci, CALL_CACHE cc)
|
2014-01-09 23:54:08 -05:00
|
|
|
(VALUE recv, VALUE val)
|
|
|
|
(VALUE val)
|
|
|
|
{
|
2017-04-18 06:58:49 -04:00
|
|
|
VALUE tmp = vm_opt_aset_with(recv, key, val);
|
|
|
|
|
|
|
|
if (tmp != Qundef) {
|
2017-04-18 07:06:58 -04:00
|
|
|
val = tmp;
|
2014-01-24 22:15:30 -05:00
|
|
|
}
|
|
|
|
else {
|
mjit_compile.c: merge initial JIT compiler
which has been developed by Takashi Kokubun <takashikkbn@gmail> as
YARV-MJIT. Many of its bugs are fixed by wanabe <s.wanabe@gmail.com>.
This JIT compiler is designed to be a safe migration path to introduce
JIT compiler to MRI. So this commit does not include any bytecode
changes or dynamic instruction modifications, which are done in original
MJIT.
This commit even strips off some aggressive optimizations from
YARV-MJIT, and thus it's slower than YARV-MJIT too. But it's still
fairly faster than Ruby 2.5 in some benchmarks (attached below).
Note that this JIT compiler passes `make test`, `make test-all`, `make
test-spec` without JIT, and even with JIT. Not only it's perfectly safe
with JIT disabled because it does not replace VM instructions unlike
MJIT, but also with JIT enabled it stably runs Ruby applications
including Rails applications.
I'm expecting this version as just "initial" JIT compiler. I have many
optimization ideas which are skipped for initial merging, and you may
easily replace this JIT compiler with a faster one by just replacing
mjit_compile.c. `mjit_compile` interface is designed for the purpose.
common.mk: update dependencies for mjit_compile.c.
internal.h: declare `rb_vm_insn_addr2insn` for MJIT.
vm.c: exclude some definitions if `-DMJIT_HEADER` is provided to
compiler. This avoids including some functions which take a long time
to compile, e.g. vm_exec_core. Some of the purpose is achieved in
transform_mjit_header.rb (see `IGNORED_FUNCTIONS`) but others are
manually resolved for now. Load mjit_helper.h for MJIT header.
mjit_helper.h: New. This is a file used only by JIT-ed code. I'll
refactor `mjit_call_cfunc` later.
vm_eval.c: add some #ifdef switches to skip compiling some functions
like Init_vm_eval.
win32/mkexports.rb: export thread/ec functions, which are used by MJIT.
include/ruby/defines.h: add MJIT_FUNC_EXPORTED macro alias to clarify
that a function is exported only for MJIT.
array.c: export a function used by MJIT.
bignum.c: ditto.
class.c: ditto.
compile.c: ditto.
error.c: ditto.
gc.c: ditto.
hash.c: ditto.
iseq.c: ditto.
numeric.c: ditto.
object.c: ditto.
proc.c: ditto.
re.c: ditto.
st.c: ditto.
string.c: ditto.
thread.c: ditto.
variable.c: ditto.
vm_backtrace.c: ditto.
vm_insnhelper.c: ditto.
vm_method.c: ditto.
I would like to improve maintainability of function exports, but I
believe this way is acceptable as initial merging if we clarify the
new exports are for MJIT (so that we can use them as TODO list to fix)
and add unit tests to detect unresolved symbols.
I'll add unit tests of JIT compilations in succeeding commits.
Author: Takashi Kokubun <takashikkbn@gmail.com>
Contributor: wanabe <s.wanabe@gmail.com>
Part of [Feature #14235]
---
* Known issues
* Code generated by gcc is faster than clang. The benchmark may be worse
in macOS. Following benchmark result is provided by gcc w/ Linux.
* Performance is decreased when Google Chrome is running
* JIT can work on MinGW, but it doesn't improve performance at least
in short running benchmark.
* Currently it doesn't perform well with Rails. We'll try to fix this
before release.
---
* Benchmark results
Benchmarked with:
Intel 4.0GHz i7-4790K with 16GB memory under x86-64 Ubuntu 8 Cores
- 2.0.0-p0: Ruby 2.0.0-p0
- r62186: Ruby trunk (early 2.6.0), before MJIT changes
- JIT off: On this commit, but without `--jit` option
- JIT on: On this commit, and with `--jit` option
** Optcarrot fps
Benchmark: https://github.com/mame/optcarrot
| |2.0.0-p0 |r62186 |JIT off |JIT on |
|:--------|:--------|:--------|:--------|:--------|
|fps |37.32 |51.46 |51.31 |58.88 |
|vs 2.0.0 |1.00x |1.38x |1.37x |1.58x |
** MJIT benchmarks
Benchmark: https://github.com/benchmark-driver/mjit-benchmarks
(Original: https://github.com/vnmakarov/ruby/tree/rtl_mjit_branch/MJIT-benchmarks)
| |2.0.0-p0 |r62186 |JIT off |JIT on |
|:----------|:--------|:--------|:--------|:--------|
|aread |1.00 |1.09 |1.07 |2.19 |
|aref |1.00 |1.13 |1.11 |2.22 |
|aset |1.00 |1.50 |1.45 |2.64 |
|awrite |1.00 |1.17 |1.13 |2.20 |
|call |1.00 |1.29 |1.26 |2.02 |
|const2 |1.00 |1.10 |1.10 |2.19 |
|const |1.00 |1.11 |1.10 |2.19 |
|fannk |1.00 |1.04 |1.02 |1.00 |
|fib |1.00 |1.32 |1.31 |1.84 |
|ivread |1.00 |1.13 |1.12 |2.43 |
|ivwrite |1.00 |1.23 |1.21 |2.40 |
|mandelbrot |1.00 |1.13 |1.16 |1.28 |
|meteor |1.00 |2.97 |2.92 |3.17 |
|nbody |1.00 |1.17 |1.15 |1.49 |
|nest-ntimes|1.00 |1.22 |1.20 |1.39 |
|nest-while |1.00 |1.10 |1.10 |1.37 |
|norm |1.00 |1.18 |1.16 |1.24 |
|nsvb |1.00 |1.16 |1.16 |1.17 |
|red-black |1.00 |1.02 |0.99 |1.12 |
|sieve |1.00 |1.30 |1.28 |1.62 |
|trees |1.00 |1.14 |1.13 |1.19 |
|while |1.00 |1.12 |1.11 |2.41 |
** Discourse's script/bench.rb
Benchmark: https://github.com/discourse/discourse/blob/v1.8.7/script/bench.rb
NOTE: Rails performance was somehow a little degraded with JIT for now.
We should fix this.
(At least I know opt_aref is performing badly in JIT and I have an idea
to fix it. Please wait for the fix.)
*** JIT off
Your Results: (note for timings- percentile is first, duration is second in millisecs)
categories_admin:
  50: 17
  75: 18
  90: 22
  99: 29
home_admin:
  50: 21
  75: 21
  90: 27
  99: 40
topic_admin:
  50: 17
  75: 18
  90: 22
  99: 32
categories:
  50: 35
  75: 41
  90: 43
  99: 77
home:
  50: 39
  75: 46
  90: 49
  99: 95
topic:
  50: 46
  75: 52
  90: 56
  99: 101
*** JIT on
Your Results: (note for timings- percentile is first, duration is second in millisecs)
categories_admin:
  50: 19
  75: 21
  90: 25
  99: 33
home_admin:
  50: 24
  75: 26
  90: 30
  99: 35
topic_admin:
  50: 19
  75: 20
  90: 25
  99: 30
categories:
  50: 40
  75: 44
  90: 48
  99: 76
home:
  50: 42
  75: 48
  90: 51
  99: 89
topic:
  50: 49
  75: 55
  90: 58
  99: 99
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@62197 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-02-04 06:22:28 -05:00
|
|
|
#ifndef MJIT_HEADER
|
2018-01-29 02:15:08 -05:00
|
|
|
TOPN(0) = rb_str_resurrect(key);
|
2014-01-09 23:54:08 -05:00
|
|
|
PUSH(val);
|
2018-01-29 02:15:08 -05:00
|
|
|
ADD_PC(1); /* !!! */
|
2018-02-04 06:22:28 -05:00
|
|
|
#endif
|
2018-01-29 02:15:08 -05:00
|
|
|
DISPATCH_ORIGINAL_INSN(opt_send_without_block);
|
2014-01-09 23:54:08 -05:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2018-01-12 03:38:07 -05:00
|
|
|
/* recv[str] */
|
2014-01-09 23:54:08 -05:00
|
|
|
DEFINE_INSN
|
|
|
|
opt_aref_with
|
2018-01-29 02:15:08 -05:00
|
|
|
(VALUE key, CALL_INFO ci, CALL_CACHE cc)
|
2014-01-09 23:54:08 -05:00
|
|
|
(VALUE recv)
|
|
|
|
(VALUE val)
|
|
|
|
{
|
2017-04-18 06:58:49 -04:00
|
|
|
val = vm_opt_aref_with(recv, key);
|
|
|
|
|
|
|
|
if (val == Qundef) {
|
2018-02-04 06:22:28 -05:00
|
|
|
#ifndef MJIT_HEADER
|
2014-01-09 23:54:08 -05:00
|
|
|
PUSH(rb_str_resurrect(key));
|
2018-01-29 02:15:08 -05:00
|
|
|
ADD_PC(1); /* !!! */
|
mjit_compile.c: merge initial JIT compiler
which has been developed by Takashi Kokubun <takashikkbn@gmail.com> as
YARV-MJIT. Many of its bugs are fixed by wanabe <s.wanabe@gmail.com>.
This JIT compiler is designed to be a safe migration path for introducing
a JIT compiler to MRI. So this commit does not include any bytecode
changes or dynamic instruction modifications, which the original MJIT
makes.
This commit even strips some aggressive optimizations from
YARV-MJIT, so it is slower than YARV-MJIT too. But it is still
faster than Ruby 2.5 in some benchmarks (attached below).
Note that this JIT compiler passes `make test`, `make test-all`, and `make
test-spec` both without JIT and with JIT. It is perfectly safe
with JIT disabled because, unlike MJIT, it does not replace VM
instructions, and with JIT enabled it stably runs Ruby applications,
including Rails applications.
I regard this version as just an "initial" JIT compiler. I have many
optimization ideas that were skipped for the initial merge, and you can
easily swap in a faster JIT compiler by just replacing
mjit_compile.c. The `mjit_compile` interface is designed for that purpose.
common.mk: update dependencies for mjit_compile.c.
internal.h: declare `rb_vm_insn_addr2insn` for MJIT.
vm.c: exclude some definitions if `-DMJIT_HEADER` is provided to the
compiler. This avoids including some functions which take a long time
to compile, e.g. vm_exec_core. Part of this is achieved in
transform_mjit_header.rb (see `IGNORED_FUNCTIONS`) but the rest is
resolved manually for now. Load mjit_helper.h for MJIT header.
mjit_helper.h: New. This is a file used only by JIT-ed code. I'll
refactor `mjit_call_cfunc` later.
vm_eval.c: add some #ifdef switches to skip compiling some functions
like Init_vm_eval.
win32/mkexports.rb: export thread/ec functions, which are used by MJIT.
include/ruby/defines.h: add MJIT_FUNC_EXPORTED macro alias to clarify
that a function is exported only for MJIT.
array.c: export a function used by MJIT.
bignum.c: ditto.
class.c: ditto.
compile.c: ditto.
error.c: ditto.
gc.c: ditto.
hash.c: ditto.
iseq.c: ditto.
numeric.c: ditto.
object.c: ditto.
proc.c: ditto.
re.c: ditto.
st.c: ditto.
string.c: ditto.
thread.c: ditto.
variable.c: ditto.
vm_backtrace.c: ditto.
vm_insnhelper.c: ditto.
vm_method.c: ditto.
I would like to improve maintainability of function exports, but I
believe this way is acceptable as initial merging if we clarify the
new exports are for MJIT (so that we can use them as TODO list to fix)
and add unit tests to detect unresolved symbols.
I'll add unit tests of JIT compilations in succeeding commits.
Author: Takashi Kokubun <takashikkbn@gmail.com>
Contributor: wanabe <s.wanabe@gmail.com>
Part of [Feature #14235]
---
* Known issues
* Code generated by gcc is faster than code generated by clang. The
benchmark may be worse on macOS. The following benchmark results were
obtained with gcc on Linux.
* Performance decreases when Google Chrome is running.
* JIT works on MinGW, but it doesn't improve performance, at least
in short-running benchmarks.
* Currently it doesn't perform well with Rails. We'll try to fix this
before release.
---
* Benchmark results
Benchmarked with:
Intel 4.0GHz i7-4790K with 16GB memory under x86-64 Ubuntu 8 Cores
- 2.0.0-p0: Ruby 2.0.0-p0
- r62186: Ruby trunk (early 2.6.0), before MJIT changes
- JIT off: On this commit, but without `--jit` option
- JIT on: On this commit, and with `--jit` option
** Optcarrot fps
Benchmark: https://github.com/mame/optcarrot
| |2.0.0-p0 |r62186 |JIT off |JIT on |
|:--------|:--------|:--------|:--------|:--------|
|fps |37.32 |51.46 |51.31 |58.88 |
|vs 2.0.0 |1.00x |1.38x |1.37x |1.58x |
** MJIT benchmarks
Benchmark: https://github.com/benchmark-driver/mjit-benchmarks
(Original: https://github.com/vnmakarov/ruby/tree/rtl_mjit_branch/MJIT-benchmarks)
| |2.0.0-p0 |r62186 |JIT off |JIT on |
|:----------|:--------|:--------|:--------|:--------|
|aread |1.00 |1.09 |1.07 |2.19 |
|aref |1.00 |1.13 |1.11 |2.22 |
|aset |1.00 |1.50 |1.45 |2.64 |
|awrite |1.00 |1.17 |1.13 |2.20 |
|call |1.00 |1.29 |1.26 |2.02 |
|const2 |1.00 |1.10 |1.10 |2.19 |
|const |1.00 |1.11 |1.10 |2.19 |
|fannk |1.00 |1.04 |1.02 |1.00 |
|fib |1.00 |1.32 |1.31 |1.84 |
|ivread |1.00 |1.13 |1.12 |2.43 |
|ivwrite |1.00 |1.23 |1.21 |2.40 |
|mandelbrot |1.00 |1.13 |1.16 |1.28 |
|meteor |1.00 |2.97 |2.92 |3.17 |
|nbody |1.00 |1.17 |1.15 |1.49 |
|nest-ntimes|1.00 |1.22 |1.20 |1.39 |
|nest-while |1.00 |1.10 |1.10 |1.37 |
|norm |1.00 |1.18 |1.16 |1.24 |
|nsvb |1.00 |1.16 |1.16 |1.17 |
|red-black |1.00 |1.02 |0.99 |1.12 |
|sieve |1.00 |1.30 |1.28 |1.62 |
|trees |1.00 |1.14 |1.13 |1.19 |
|while |1.00 |1.12 |1.11 |2.41 |
** Discourse's script/bench.rb
Benchmark: https://github.com/discourse/discourse/blob/v1.8.7/script/bench.rb
NOTE: For now, Rails performance is slightly degraded with JIT.
We should fix this.
(At least I know opt_aref is performing badly in JIT and I have an idea
to fix it. Please wait for the fix.)
*** JIT off
Your Results: (note for timings- percentile is first, duration is second in millisecs)
categories_admin:
50: 17
75: 18
90: 22
99: 29
home_admin:
50: 21
75: 21
90: 27
99: 40
topic_admin:
50: 17
75: 18
90: 22
99: 32
categories:
50: 35
75: 41
90: 43
99: 77
home:
50: 39
75: 46
90: 49
99: 95
topic:
50: 46
75: 52
90: 56
99: 101
*** JIT on
Your Results: (note for timings- percentile is first, duration is second in millisecs)
categories_admin:
50: 19
75: 21
90: 25
99: 33
home_admin:
50: 24
75: 26
90: 30
99: 35
topic_admin:
50: 19
75: 20
90: 25
99: 30
categories:
50: 40
75: 44
90: 48
99: 76
home:
50: 42
75: 48
90: 51
99: 89
topic:
50: 49
75: 55
90: 58
99: 99
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@62197 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2018-02-04 06:22:28 -05:00
|
|
|
#endif
|
2018-01-29 02:15:08 -05:00
|
|
|
DISPATCH_ORIGINAL_INSN(opt_send_without_block);
|
2014-01-09 23:54:08 -05:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2018-01-12 03:38:07 -05:00
|
|
|
/* optimized length */
|
2007-01-16 03:52:22 -05:00
|
|
|
DEFINE_INSN
|
|
|
|
opt_length
|
2015-09-19 13:59:58 -04:00
|
|
|
(CALL_INFO ci, CALL_CACHE cc)
|
2007-01-16 03:52:22 -05:00
|
|
|
(VALUE recv)
|
|
|
|
(VALUE val)
|
|
|
|
{
|
split insns.def into functions
Contemporary C compilers are good at function inlining. They fold
multiple functions into one. However they are not yet smart enough to
unfold a function into several ones. So generally speaking, it is
wiser for a C programmer to manually split C functions whenever
possible. That should make room for compilers to optimize at will.
Before this changeset insns.def was converted into a single HUGE
function called vm_exec_core(). By moving each instruction's core
into individual functions, generated C source code is reduced from
3,428 lines to 2,847 lines. Looking at the generated assembly
however, it seems my compiler (gcc 6.2) is extraordinarily smart, so
it inlines almost all functions I introduced in this changeset back
into that vm_exec_core. On my machine the compiled binary of the
function does not shrink very much in size (28,432 bytes to 26,816
bytes, according to nm(1)).
I believe this change is zero-cost. Several benchmarks I exercised
showed no significant difference beyond the margin of error. For instance
3 repeated runs of optcarrot benchmark on my machine resulted in:
before this: 28.330329285707490, 27.513378371065920, 29.40420215754537
after this: 27.107195867280414, 25.549324021385907, 30.31581919050884
in fps (greater==faster).
----
* internal.h (rb_obj_not_equal): used from vm_insnhelper.c
* insns.def: move vast majority of lines into vm_insnhelper.c
* vm_insnhelper.c: moved here.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@58390 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2017-04-18 06:58:49 -04:00
|
|
|
val = vm_opt_length(recv, BOP_LENGTH);
|
|
|
|
|
|
|
|
if (val == Qundef) {
|
2018-01-29 02:04:50 -05:00
|
|
|
DISPATCH_ORIGINAL_INSN(opt_send_without_block);
|
2007-01-16 03:52:22 -05:00
|
|
|
}
|
|
|
|
}
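The optimized instructions in this part of the file share one shape: call a fast-path helper, and if it returns `Qundef`, dispatch to the generic `opt_send_without_block` instruction instead. The following is a minimal standalone sketch of that sentinel-and-fallback pattern, not the actual VM code. Every name in it (`object_t`, `UNDEF`, `fast_length`, `generic_send_length`, `insn_length`) is a hypothetical stand-in, and the condition under which the fast path declines (a non-string receiver or a redefined method) is an assumption made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins; none of these names come from insns.def. */
typedef enum { KIND_STRING, KIND_OTHER } kind_t;

typedef struct {
    kind_t kind;
    const char *text;   /* only meaningful when kind == KIND_STRING */
    long generic_len;   /* what a full method call would report */
} object_t;

#define UNDEF (-1L)     /* sentinel playing the role of Qundef */

/* Fast path: handles only the built-in case, otherwise returns UNDEF. */
static long fast_length(const object_t *recv, int length_redefined)
{
    if (recv->kind == KIND_STRING && !length_redefined) {
        return (long)strlen(recv->text);
    }
    return UNDEF;
}

/* Slow path: stands in for a generic method dispatch. */
static long generic_send_length(const object_t *recv)
{
    return recv->generic_len;
}

/* Same shape as the instruction body: try the fast path, then fall back. */
static long insn_length(const object_t *recv, int length_redefined)
{
    long val = fast_length(recv, length_redefined);
    if (val == UNDEF) {
        val = generic_send_length(recv);
    }
    return val;
}
```

Under these assumptions, `insn_length` takes the fast path for a string whose length method is untouched and uses the generic call in every other case. The caller never needs to know which path ran.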
|
|
|
|
|
2018-01-12 03:38:07 -05:00
|
|
|
/* optimized size */
|
2009-09-06 04:39:57 -04:00
|
|
|
DEFINE_INSN
|
|
|
|
opt_size
|
2015-09-19 13:59:58 -04:00
|
|
|
(CALL_INFO ci, CALL_CACHE cc)
|
2009-09-06 04:39:57 -04:00
|
|
|
(VALUE recv)
|
|
|
|
(VALUE val)
|
|
|
|
{
|
2017-04-18 06:58:49 -04:00
|
|
|
val = vm_opt_length(recv, BOP_SIZE);
|
|
|
|
|
|
|
|
if (val == Qundef) {
|
2018-01-29 02:04:50 -05:00
|
|
|
DISPATCH_ORIGINAL_INSN(opt_send_without_block);
|
2009-09-06 04:39:57 -04:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2018-01-12 03:38:07 -05:00
|
|
|
/* optimized empty? */
|
2012-09-26 05:34:46 -04:00
|
|
|
DEFINE_INSN
|
|
|
|
opt_empty_p
|
2015-09-19 13:59:58 -04:00
|
|
|
(CALL_INFO ci, CALL_CACHE cc)
|
2012-09-26 05:34:46 -04:00
|
|
|
(VALUE recv)
|
|
|
|
(VALUE val)
|
|
|
|
{
|
2017-04-18 06:58:49 -04:00
|
|
|
val = vm_opt_empty_p(recv);
|
|
|
|
|
|
|
|
if (val == Qundef) {
|
2018-01-29 02:04:50 -05:00
|
|
|
DISPATCH_ORIGINAL_INSN(opt_send_without_block);
|
2012-09-26 05:34:46 -04:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2018-01-12 03:38:07 -05:00
|
|
|
/* optimized succ */
|
2007-01-16 03:52:22 -05:00
|
|
|
DEFINE_INSN
|
|
|
|
opt_succ
|
2015-09-19 13:59:58 -04:00
|
|
|
(CALL_INFO ci, CALL_CACHE cc)
|
2007-01-16 03:52:22 -05:00
|
|
|
(VALUE recv)
|
|
|
|
(VALUE val)
|
|
|
|
{
|
2017-04-18 06:58:49 -04:00
|
|
|
val = vm_opt_succ(recv);
|
|
|
|
|
|
|
|
if (val == Qundef) {
|
2018-01-29 02:04:50 -05:00
|
|
|
DISPATCH_ORIGINAL_INSN(opt_send_without_block);
|
2007-01-16 03:52:22 -05:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2018-01-12 03:38:07 -05:00
|
|
|
/* optimized not */
|
2007-12-18 07:07:51 -05:00
|
|
|
DEFINE_INSN
|
|
|
|
opt_not
|
2015-09-19 13:59:58 -04:00
|
|
|
(CALL_INFO ci, CALL_CACHE cc)
|
2007-12-18 07:07:51 -05:00
|
|
|
(VALUE recv)
|
|
|
|
(VALUE val)
|
|
|
|
{
|
2017-04-18 06:58:49 -04:00
|
|
|
val = vm_opt_not(ci, cc, recv);
|
2015-09-19 13:59:58 -04:00
|
|
|
|
2017-04-18 06:58:49 -04:00
|
|
|
if (val == Qundef) {
|
2018-01-29 02:04:50 -05:00
|
|
|
DISPATCH_ORIGINAL_INSN(opt_send_without_block);
|
2007-12-18 07:07:51 -05:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2018-01-12 03:38:07 -05:00
|
|
|
/* optimized regexp match */
|
2007-01-16 03:52:22 -05:00
|
|
|
DEFINE_INSN
|
|
|
|
opt_regexpmatch1
|
2017-04-18 06:58:49 -04:00
|
|
|
(VALUE recv)
|
2007-01-16 03:52:22 -05:00
|
|
|
(VALUE obj)
|
|
|
|
(VALUE val)
|
|
|
|
{
|
2017-04-18 06:58:49 -04:00
|
|
|
val = vm_opt_regexpmatch1(recv, obj);
|
2007-01-16 03:52:22 -05:00
|
|
|
}
|
|
|
|
|
2018-01-12 03:38:07 -05:00
|
|
|
/* optimized regexp match 2 */
|
2007-01-16 03:52:22 -05:00
|
|
|
DEFINE_INSN
|
|
|
|
opt_regexpmatch2
|
2015-09-19 13:59:58 -04:00
|
|
|
(CALL_INFO ci, CALL_CACHE cc)
|
2007-01-16 03:52:22 -05:00
|
|
|
(VALUE obj2, VALUE obj1)
|
|
|
|
(VALUE val)
|
|
|
|
{
|
2017-04-18 06:58:49 -04:00
|
|
|
val = vm_opt_regexpmatch2(obj2, obj1);
|
|
|
|
|
|
|
|
if (val == Qundef) {
|
2018-01-29 02:04:50 -05:00
|
|
|
DISPATCH_ORIGINAL_INSN(opt_send_without_block);
|
2007-01-16 03:52:22 -05:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2018-01-12 03:38:07 -05:00
|
|
|
/* call native compiled method */
|
2007-01-16 03:52:22 -05:00
|
|
|
DEFINE_INSN
|
2007-06-30 14:02:24 -04:00
|
|
|
opt_call_c_function
|
2007-08-12 15:09:15 -04:00
|
|
|
(rb_insn_func_t funcptr)
|
2007-01-16 03:52:22 -05:00
|
|
|
()
|
|
|
|
()
|
2018-01-26 01:30:58 -05:00
|
|
|
// attr bool handles_frame = true;
|
2007-01-16 03:52:22 -05:00
|
|
|
{
|
2017-10-27 15:08:31 -04:00
|
|
|
reg_cfp = (funcptr)(ec, reg_cfp);
|
2007-01-16 03:52:22 -05:00
|
|
|
|
2007-06-30 14:02:24 -04:00
|
|
|
if (reg_cfp == 0) {
|
2017-10-27 02:21:50 -04:00
|
|
|
VALUE err = ec->errinfo;
|
|
|
|
ec->errinfo = Qnil;
|
2007-07-01 22:59:37 -04:00
|
|
|
THROW_EXCEPTION(err);
|
2007-06-30 14:02:24 -04:00
|
|
|
}
|
|
|
|
|
2018-02-04 06:22:28 -05:00
|
|
|
EXEC_EC_CFP();
|
2007-01-16 03:52:22 -05:00
|
|
|
}
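The body of `opt_call_c_function` calls a function pointer that returns the next control frame, and treats a zero return as a signal that an error is waiting in `ec->errinfo`, which it clears before throwing. The following is a minimal standalone sketch of that calling convention, assuming simplified types. `frame_t`, `context_t`, `insn_func_t`, `call_native` and the two sample callbacks are hypothetical names, and an integer error code stands in for the exception object.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the control frame and execution context. */
typedef struct frame { int pc; } frame_t;
typedef struct context { int pending_error; } context_t;

/* A native function returns the frame to continue with, or NULL on error. */
typedef frame_t *(*insn_func_t)(context_t *, frame_t *);

/* Sample callback that succeeds: advances the frame and returns it. */
static frame_t *advance(context_t *ctx, frame_t *f)
{
    (void)ctx;
    f->pc += 1;
    return f;
}

/* Sample callback that fails: records an error and returns NULL. */
static frame_t *fail(context_t *ctx, frame_t *f)
{
    (void)f;
    ctx->pending_error = 7;
    return NULL;
}

/* Returns 0 on success and updates *fp; otherwise returns the error code
 * after clearing it from the context, as the instruction clears errinfo. */
static int call_native(insn_func_t fn, context_t *ctx, frame_t **fp)
{
    frame_t *next = fn(ctx, *fp);
    if (next == NULL) {
        int err = ctx->pending_error;
        ctx->pending_error = 0;   /* clear before reporting the error */
        return err;
    }
    *fp = next;
    return 0;
}
```

Clearing the pending error before reporting it mirrors the order in the instruction body, where the error is copied out and the slot is reset before the exception is thrown, so a stale error cannot be seen by later code.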
|
|
|
|
|
2018-01-12 03:38:07 -05:00
|
|
|
/* BLT */
|
2007-01-16 03:52:22 -05:00
|
|
|
DEFINE_INSN
|
|
|
|
bitblt
|
|
|
|
()
|
|
|
|
()
|
|
|
|
(VALUE ret)
|
|
|
|
{
|
|
|
|
ret = rb_str_new2("a bit of bacon, lettuce and tomato");
|
|
|
|
}
|
|
|
|
|
2018-01-12 03:38:07 -05:00
|
|
|
/* The Answer to Life, the Universe, and Everything */
|
2007-01-16 03:52:22 -05:00
|
|
|
DEFINE_INSN
|
|
|
|
answer
|
|
|
|
()
|
|
|
|
()
|
|
|
|
(VALUE ret)
|
|
|
|
{
|
|
|
|
ret = INT2FIX(42);
|
|
|
|
}
|