Cortex-A9 NEON™ Media Processing Engine

Introduction

The Cortex-A9 NEON MPE extends the Cortex-A9 functionality to provide support for the ARMv7 Advanced SIMD and Vector Floating-Point v3 (VFPv3) instruction sets. The Cortex-A9 NEON MPE supports all addressing modes and data-processing operations described in the ARM Architecture Reference Manual.

The Cortex-A9 NEON MPE features are:

SIMD and scalar single-precision floating-point computation
scalar double-precision floating-point computation
SIMD and scalar half-precision floating-point conversion
8, 16, 32, and 64-bit signed and unsigned integer SIMD computation
8 or 16-bit polynomial computation for single-bit coefficients
structured data load capabilities
dual issue with Cortex-A9 processor ARM or Thumb instructions
independent pipelines for VFPv3 and Advanced SIMD instructions
large, shared register file, addressable as:
— thirty-two 32-bit S (single) registers
— thirty-two 64-bit D (double) registers
— sixteen 128-bit Q (quad) registers.
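The three register views listed above overlap rather than being separate storage. The sketch below models that overlap in portable C, assuming the ARMv7 mapping in which S<2n> and S<2n+1> are the low and high words of D<n>, and Q<n> is the pair D<2n>, D<2n+1>. The type `neon_regfile` and the accessor names are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: 32 x 64-bit D registers are the backing store;
   the S and Q views are computed from them. */
typedef struct {
    uint64_t d[32];
} neon_regfile;

/* S<n> (n = 0..31) overlays half of D<n/2>; even n is the low word.
   Only D0..D15 are reachable through the S view. */
static uint32_t read_s(const neon_regfile *rf, unsigned n)
{
    assert(n < 32);
    return (uint32_t)(rf->d[n / 2] >> (32 * (n % 2)));
}

static void write_s(neon_regfile *rf, unsigned n, uint32_t value)
{
    assert(n < 32);
    unsigned shift = 32 * (n % 2);
    uint64_t mask = (uint64_t)0xFFFFFFFFu << shift;
    rf->d[n / 2] = (rf->d[n / 2] & ~mask) | ((uint64_t)value << shift);
}

/* Q<n> (n = 0..15) overlays D<2n> (low half) and D<2n+1> (high half). */
static uint64_t read_q_low(const neon_regfile *rf, unsigned n)
{
    assert(n < 16);
    return rf->d[2 * n];
}

static uint64_t read_q_high(const neon_regfile *rf, unsigned n)
{
    assert(n < 16);
    return rf->d[2 * n + 1];
}
```

Writing S2 and S3 in this model changes D1, which in turn is the high half of Q0, so a value written through one view is visible through the others.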

The Cortex-A9 NEON MPE provides high-performance SIMD vector operations for:

unsigned and signed integers
single-bit coefficient polynomials
single-precision floating-point values.
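To make "single-bit coefficient polynomials" concrete: each bit of an operand is a coefficient over GF(2), so multiplication uses XOR instead of addition with carry. The following is a minimal scalar sketch of an 8-lane widening polynomial multiply, in the spirit of what a widening polynomial SIMD multiply computes per lane. The function name `poly_mul8x8` is hypothetical, and the code is a model in portable C, not NEON code.

```c
#include <assert.h>
#include <stdint.h>

/* Lane-wise carry-less multiply: each 8-bit input is a polynomial over
   GF(2), each 16-bit output is the full product polynomial. */
static void poly_mul8x8(const uint8_t a[8], const uint8_t b[8], uint16_t out[8])
{
    assert(a && b && out);
    for (int lane = 0; lane < 8; ++lane) {
        uint16_t product = 0;
        for (int bit = 0; bit < 8; ++bit) {
            /* Add (XOR) a shifted copy of a for every set bit of b. */
            if (b[lane] & (1u << bit))
                product ^= (uint16_t)((uint16_t)a[lane] << bit);
        }
        out[lane] = product;
    }
}
```

For example, 0x03 times 0x03 is (x + 1)(x + 1) = x^2 + 1, which is 0x05 rather than the integer product 9, because the two middle terms cancel under XOR.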

The operations include:

addition and subtraction
multiplication with optional accumulation
maximum or minimum value driven lane selection operations
inverse square-root approximation
comprehensive data-structure load instructions, including register-bank-resident table lookup.
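The register-bank-resident table lookup in the list above can be sketched in scalar C as follows. This models the semantics of a byte-wise lookup in which out-of-range indices produce zero, and it assumes a table of at most 32 bytes (four D registers). The name `table_lookup8` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Byte-wise table lookup over 8 lanes. The table lives in registers,
   so it is limited to 32 bytes here. An index outside the table
   produces 0 for that lane. */
static void table_lookup8(const uint8_t *table, size_t table_len,
                          const uint8_t idx[8], uint8_t out[8])
{
    assert(table_len >= 1 && table_len <= 32);
    for (int lane = 0; lane < 8; ++lane)
        out[lane] = (idx[lane] < table_len) ? table[idx[lane]] : 0;
}
```

Because every lane is looked up independently, this kind of operation is useful for byte permutations and small substitution tables without touching memory.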

VFPv3

The Cortex-A9 NEON MPE hardware supports single and double-precision add, subtract, multiply, divide, multiply and accumulate, and square root operations as described in the ARM VFPv3 architecture. It provides conversions between 16-bit, 32-bit and 64-bit floating-point formats and ARM integer word formats, with special operations to perform conversions in round-towards-zero mode for high-level language support.
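The round-towards-zero conversions matter because languages such as C define a floating-point to integer cast as truncation, regardless of the current rounding mode. A minimal sketch of that behaviour is shown below. The helper name `float_to_int_trunc` is hypothetical, and the range check is an assumption of this sketch, since an out-of-range cast is undefined behaviour in C.

```c
#include <assert.h>
#include <stdint.h>

/* C semantics for float -> int: discard the fractional part, that is,
   round towards zero. The input must fit in int32_t. */
static int32_t float_to_int_trunc(float x)
{
    assert(x >= -2147483648.0f && x < 2147483648.0f);
    return (int32_t)x;
}
```

With a dedicated round-towards-zero conversion, a compiler can implement this cast directly instead of switching the rounding mode around the conversion.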

ARMv7 deprecates the use of VFP vector mode, and the Cortex-A9 NEON MPE hardware does not support VFP vector operations. In this manual, the term vector refers to Advanced SIMD integer, polynomial and single-precision vector operations. The Cortex-A9 NEON MPE provides high-speed VFP operation without support code. However, if an application requires VFP vector operation, then it must use support code. See the ARM Architecture Reference Manual for information on VFP vector operation support.
The support code mentioned here refers to adaptations made to the boot code (assembly code) for the VFP-specific architecture, so that VFP-specific instructions and exceptions can be handled. For details, see VFP Support Code.

Supported formats

Table 2-1 of the Cortex-A9 NEON Media Processing Engine Technical Reference Manual shows the formats supported for each of the Advanced SIMD and VFPv3 instruction sets implemented by the Cortex-A9 NEON MPE. All signed integers are two's complement representations.

Writing optimal VFP and Advanced SIMD code

The following guidelines can provide significant performance increases for VFP and Advanced SIMD code.

Where possible, avoid:

unnecessary accesses to the VFP control registers
transferring values between the Cortex-A9 core registers and the VFP or Advanced SIMD register file (see the ARM Architecture Reference Manual for the definition of core registers)
register dependencies between neighboring instructions
mixing Advanced SIMD only instructions with VFP only instructions.
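One common way to reduce register dependencies between neighboring instructions is to split a reduction across several independent accumulators. The sketch below shows the idea in portable C, assuming the length is a multiple of four. The function name `dot_product4` is hypothetical, and whether a given compiler maps this onto VFP or Advanced SIMD instructions is not guaranteed.

```c
#include <assert.h>
#include <stddef.h>

/* Dot product with four independent accumulators. Each multiply-add in
   the loop body depends only on its own accumulator, so consecutive
   operations do not wait on each other's results. */
static float dot_product4(const float *a, const float *b, size_t n)
{
    assert(n % 4 == 0);
    float acc0 = 0.0f, acc1 = 0.0f, acc2 = 0.0f, acc3 = 0.0f;
    for (size_t i = 0; i < n; i += 4) {
        acc0 += a[i]     * b[i];
        acc1 += a[i + 1] * b[i + 1];
        acc2 += a[i + 2] * b[i + 2];
        acc3 += a[i + 3] * b[i + 3];
    }
    /* Combine the partial sums once, after the loop. */
    return (acc0 + acc1) + (acc2 + acc3);
}
```

Note that this changes the order of floating-point additions compared with a single accumulator, so results can differ slightly for inputs that are not exactly representable.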

Be aware that:

with the exception of simultaneous loads and stores, the processor can execute VFP and Advanced SIMD instructions in parallel with ARM or Thumb instructions
using Advanced SIMD value selection operations is more efficient than using the equivalent VFP compare with conditional execution.
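The value selection point can be illustrated with a branch-free, lane-wise select: a comparison produces an all-ones or all-zeros mask per lane, and the mask picks bits from one of the two inputs. The sketch below is a scalar model in portable C of that compare-then-select pattern. The function name `select_max4` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Lane-wise maximum of four signed 32-bit values without branches. */
static void select_max4(const int32_t a[4], const int32_t b[4], int32_t out[4])
{
    assert(a && b && out);
    for (int lane = 0; lane < 4; ++lane) {
        /* Compare step: all ones if a > b, otherwise all zeros. */
        uint32_t mask = (a[lane] > b[lane]) ? 0xFFFFFFFFu : 0u;
        uint32_t ua = (uint32_t)a[lane];
        uint32_t ub = (uint32_t)b[lane];
        /* Select step: take bits of a where mask is set, else bits of b. */
        out[lane] = (int32_t)((ua & mask) | (ub & ~mask));
    }
}
```

Every lane follows the same instruction sequence regardless of the comparison result, which is what lets the selection be applied to all lanes at once instead of one conditional operation per value.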

Instruction timing tables

The timing tables are extensive and are not reproduced here. For details, see the Cortex™-A9 NEON™ Media Processing Engine Technical Reference Manual.
