Pandora
599158e2ee
fix compiling on ARM
...
thanks toshi
2017-12-08 22:26:20 -05:00
Pandora
4318dbe051
fix compiling for 32-bit machines / machines without SSE2
2017-12-07 16:09:13 -05:00
Pandora
2040285ce9
revert back to better blurring behaviour
2017-12-06 13:57:07 -05:00
Chris Guillott
9f8496441c
blurring stuff should work perfectly fine now
2017-12-05 22:07:38 -05:00
Chris Guillott
99842b3963
fix msse3
2017-12-05 17:47:21 -05:00
Chris Guillott
8946358b26
revert to former implementation with array size fix
2017-12-05 14:28:01 -05:00
Chris Guillott
5e0aeccbb3
first commit towards fixing this
2017-12-05 12:53:54 -05:00
Sebastian Frysztak
024dc2980e
Minor style changes
2017-02-15 11:27:43 +01:00
Sebastian Frysztak
3598cf19e8
Implement generic box blur
2017-02-15 11:22:06 +01:00
Sebastian Frysztak
6029c8e0b5
Clean up a bit.
2016-11-11 18:45:20 +01:00
Sebastian Frysztak
020af692e6
Remove AVX version.
2016-11-11 17:11:31 +01:00
Sebastian Frysztak
e5e6368926
Remove SSSE3 version.
2016-11-11 16:46:53 +01:00
Sebastian Frysztak
b47631d785
SSE2: resize filter to 7x7. clean up a little.
2016-11-11 13:11:11 +01:00
Sebastian Frysztak
ab41586b39
SSE2: switch from Gaussian to box blur
2016-11-05 16:01:40 +01:00
Sebastian Frysztak
252999f640
Slightly refactor border handling code.
2016-11-04 22:41:17 +01:00
Sebastian Frysztak
f06dc6cbc4
Add AVX version.
...
It relies on some SSE2 instructions, so performance gain is not that
huge (about 1.4x).
I experimented with 256-bit loads, but they turned out to be slower (at
least on Sandy Bridge).
2016-11-04 22:19:29 +01:00
Sebastian Frysztak
95c333cba5
SSSE3: use 16-bit weights.
...
Overall, I'm very happy with the performance of this code, but not so much
with the resulting image. It seems like integer approximations won't do.
I might remove this code altogether, so I didn't update the comments.
2016-11-03 20:16:06 +01:00
Sebastian Frysztak
72aec87047
Add SSSE3-based blur implementation.
...
Calculations are done on integers rather than floating-point numbers,
so this implementation is not as accurate (but when the scale factor is
reasonable, no artifacts are visible).
It is, however, faster by a factor of ~3.
2016-10-29 14:32:49 +02:00
Sebastian Frysztak
3662b8e187
Improve border handling for larger kernels.
2016-10-28 17:36:43 +02:00
Sebastian Frysztak
afe41c5754
Extend kernel size to 15x15.
2016-10-28 17:35:33 +02:00
Sebastian Frysztak
fb5dbbe661
Add SSE2-optimized blur.
...
About 4-6 times faster than the naive implementation.
2016-10-22 15:30:27 +02:00