Investigating SSE Cross Product Performance

Today’s little snippet shows a variant of the usual cross product implementation found in your average SSE vector library. In pseudo-code, the cross product of two vectors a and b can be written as cross(a, b) = (a.y*b.z - a.z*b.y, a.z*b.x - a.x*b.z, a.x*b.y - a.y*b.x). This is reasonably straightforward to implement as an SSE2 function, named cross_4shuffles for reasons that will become apparent soon. At first glance, there doesn't seem to […]
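
For reference, here is a minimal sketch of what a four-shuffle cross product along the lines described above might look like. It is a reconstruction under assumptions, not the original listing: only the name cross_4shuffles comes from the text, while the parameter names, the temporary names, and the lane layout (x in lane 0, y in lane 1, z in lane 2, w in lane 3) are assumed for illustration.

```c
#include <emmintrin.h>  /* SSE2 header; also pulls in the SSE float intrinsics */

/* Cross product of the xyz lanes of a and b, using four shuffles.
 * Assumed layout: lane 0 = x, lane 1 = y, lane 2 = z, lane 3 = w. */
static inline __m128 cross_4shuffles(__m128 a, __m128 b)
{
    /* Rotate the components so that matching lanes line up. */
    __m128 a_yzx = _mm_shuffle_ps(a, a, _MM_SHUFFLE(3, 0, 2, 1)); /* (a.y, a.z, a.x, a.w) */
    __m128 b_zxy = _mm_shuffle_ps(b, b, _MM_SHUFFLE(3, 1, 0, 2)); /* (b.z, b.x, b.y, b.w) */
    __m128 a_zxy = _mm_shuffle_ps(a, a, _MM_SHUFFLE(3, 1, 0, 2)); /* (a.z, a.x, a.y, a.w) */
    __m128 b_yzx = _mm_shuffle_ps(b, b, _MM_SHUFFLE(3, 0, 2, 1)); /* (b.y, b.z, b.x, b.w) */

    /* (a.y*b.z - a.z*b.y, a.z*b.x - a.x*b.z, a.x*b.y - a.y*b.x, a.w*b.w - a.w*b.w) */
    return _mm_sub_ps(_mm_mul_ps(a_yzx, b_zxy), _mm_mul_ps(a_zxy, b_yzx));
}
```

In this sketch the w lane of the result is a.w*b.w - a.w*b.w, which is zero for finite inputs. Each of the two products needs two rotated operands, which is where the four shuffles come from.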