Difference between revisions of "Porting/ncnn"
JeremyRand (talk | contribs) (Finished: VSX toolchains: check for SSE2 support)
JeremyRand (talk | contribs) (→Finished: python: document CMAKE_TOOLCHAIN_FILE env var)
(11 intermediate revisions by the same user not shown)
Latest revision as of 12:24, 19 December 2023
Finished
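- Translate x86_64 SSE to ppc64le VSX intrinsics (https://github.com/Tencent/ncnn/pull/4807)
- VSX toolchains: check for SSE2 support (https://github.com/Tencent/ncnn/pull/4845)
- Add POWER8 VSX toolchains (https://github.com/Tencent/ncnn/pull/4853)
- load_param_mem pybind (https://github.com/Tencent/ncnn/pull/4924)
- test_squeezenet failed under Lubuntu 16.04 PowerPC 32-bit (https://github.com/Tencent/ncnn/issues/5063)
- support big endian platform, add powerpc ci (https://github.com/Tencent/ncnn/pull/5121)
- Update POWER Clang version docs (https://github.com/Tencent/ncnn/pull/5174)
- Update Vulkan dependency docs (https://github.com/Tencent/ncnn/pull/5178)
- Document libomp-dev dependency (https://github.com/Tencent/ncnn/pull/5228)
- python: document CMAKE_TOOLCHAIN_FILE env var (https://github.com/Tencent/ncnn/pull/5229)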
In progress
- CI missing for POWER9/Clang
- Replace SSE with native VSX
VSX Targets
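When running Real-ESRGAN in ncnn on POWER9, most CPU time (over 81%) is spent inside gemm_transB_packed_tile in convolution_3x3_winograd.h (https://github.com/Tencent/ncnn/blob/575098640c254be2208095254f3e5de412751447/src/layer/x86/convolution_3x3_winograd.h#L613), which uses SSE2. This may be a good target for rewriting in native VSX.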
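
As a rough illustration of what a native VSX rewrite might look like, the sketch below computes C = A * B^T over a packed copy of B, four output columns at a time. It is not ncnn's kernel: the function names gemm_transB_ref, pack_B4 and gemm_transB_packed4, the packing layout, and the assumption that N is a multiple of 4 are all hypothetical and chosen only for this example. The VSX path is compiled only when the compiler defines __VSX__; elsewhere a plain scalar loop with the same 4-lane structure is used, so the result can be checked against the scalar reference on any machine.

```cpp
#include <cassert>

#if defined(__VSX__)
#include <altivec.h>
// altivec.h may define these as macros; undefine them so they do not clash with C++.
#undef vector
#undef pixel
#undef bool
#endif

// Scalar reference: C (M x N) = A (M x K) * B^T, with B stored row-major as N x K.
void gemm_transB_ref(const float* A, const float* B, float* C, int M, int N, int K)
{
    for (int i = 0; i < M; i++)
        for (int j = 0; j < N; j++)
        {
            float sum = 0.f;
            for (int k = 0; k < K; k++)
                sum += A[i * K + k] * B[j * K + k];
            C[i * N + j] = sum;
        }
}

// Hypothetical packing (not ncnn's layout): rows of B are grouped in blocks of 4
// so that, for each k, the values B[j..j+3][k] sit next to each other in memory.
void pack_B4(const float* B, float* Bp, int N, int K)
{
    assert(N % 4 == 0); // assumption made to keep the sketch short
    for (int jb = 0; jb < N / 4; jb++)
        for (int k = 0; k < K; k++)
            for (int lane = 0; lane < 4; lane++)
                Bp[(jb * K + k) * 4 + lane] = B[(jb * 4 + lane) * K + k];
}

// Same product as gemm_transB_ref, reading the packed B and producing
// 4 adjacent outputs of one row of C per inner loop.
void gemm_transB_packed4(const float* A, const float* Bp, float* C, int M, int N, int K)
{
    for (int i = 0; i < M; i++)
        for (int jb = 0; jb < N / 4; jb++)
        {
            const float* b = Bp + jb * K * 4;
#if defined(__VSX__)
            // Native VSX path: broadcast one element of A, multiply-add with 4 packed B values.
            __vector float acc = vec_splats(0.f);
            for (int k = 0; k < K; k++)
                acc = vec_madd(vec_splats(A[i * K + k]), vec_xl(0, b + k * 4), acc);
            vec_xst(acc, 0, C + i * N + jb * 4);
#else
            // Portable fallback with the same 4-lane structure.
            float acc[4] = {0.f, 0.f, 0.f, 0.f};
            for (int k = 0; k < K; k++)
                for (int lane = 0; lane < 4; lane++)
                    acc[lane] += A[i * K + k] * b[k * 4 + lane];
            for (int lane = 0; lane < 4; lane++)
                C[i * N + jb * 4 + lane] = acc[lane];
#endif
        }
}
```

To use it, pack B once with pack_B4 and then call gemm_transB_packed4 in place of gemm_transB_ref. Note that vec_madd is a fused multiply-add, so on real data the VSX path may differ from the scalar reference in the last bits of rounding.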