Commit Graph

21 Commits

Author SHA1 Message Date
Pandora 599158e2ee fix compiling on ARM
thanks toshi
2017-12-08 22:26:20 -05:00
Pandora 4318dbe051 fix compiling for 32-bit machines / machine without SSE2 2017-12-07 16:09:13 -05:00
Pandora 2040285ce9 revert back to better blurring behaviour 2017-12-06 13:57:07 -05:00
Chris Guillott 9f8496441c blurring stuff should work perfectly fine now 2017-12-05 22:07:38 -05:00
Chris Guillott 99842b3963 fix msse3 2017-12-05 17:47:21 -05:00
Chris Guillott 8946358b26 revert to former implementation with array size fix 2017-12-05 14:28:01 -05:00
Chris Guillott 5e0aeccbb3 first commit towards fixing this 2017-12-05 12:53:54 -05:00
Sebastian Frysztak 024dc2980e Minor style changes 2017-02-15 11:27:43 +01:00
Sebastian Frysztak 3598cf19e8 Implement generic box blur 2017-02-15 11:22:06 +01:00
Sebastian Frysztak 6029c8e0b5 Clean up a bit. 2016-11-11 18:45:20 +01:00
Sebastian Frysztak 020af692e6 Remove AVX version. 2016-11-11 17:11:31 +01:00
Sebastian Frysztak e5e6368926 Remove SSSE3 version. 2016-11-11 16:46:53 +01:00
Sebastian Frysztak b47631d785 SSE2: resize filter to 7x7. clean up a little. 2016-11-11 13:11:11 +01:00
Sebastian Frysztak ab41586b39 SSE2: switch from Gaussian to box blur 2016-11-05 16:01:40 +01:00
Sebastian Frysztak 252999f640 Slightly refactor border handling code. 2016-11-04 22:41:17 +01:00
Sebastian Frysztak f06dc6cbc4 Add AVX version.
It relies on some SSE2 instructions, so performance gain is not that
huge (about 1.4x).
I experimented with 256-bit loads, but they turned out to be slower (at
least on Sandy Bridge).
2016-11-04 22:19:29 +01:00
Sebastian Frysztak 95c333cba5 SSSE3: use 16-bit weights.
Overall, I'm very happy with performance of this code, but not so much
with resulting image. It seems like integer approximations won't do.
I might remove this code altogether, so I didn't update comments.
2016-11-03 20:16:06 +01:00
Sebastian Frysztak 72aec87047 Add SSSE3-based blur implementation.
Calculations are done on integer, rather than floating point numbers,
so this implementation is not as accurate (but when scale factor is
reasonable enough, no artifacts are visible).
It is, however, faster by a factor of ~3.
2016-10-29 14:32:49 +02:00
Sebastian Frysztak 3662b8e187 Improve border handling for larger kernels. 2016-10-28 17:36:43 +02:00
Sebastian Frysztak afe41c5754 Extend kernel size to 15x15. 2016-10-28 17:35:33 +02:00
Sebastian Frysztak fb5dbbe661 Add SSE2-optimized blur.
About 4-6 times faster than naive implementation.
2016-10-22 15:30:27 +02:00