Embedded Systems | DMA | Zynq-7000 | March 15, 2026

Direct Memory Access via High-Performance AXI Ports (Simple Polling Example)

Implementing high-speed data transfers on the Zynq-7000 SoC. A practical guide to architecting DMA engines, managing cache coherency, and optimizing data flow.

The Challenge of High-Speed Data

In high-performance embedded systems, the bottleneck is rarely the processing power itself, but the movement of data. Using the Zynq-7000's AXI High-Performance (HP) Slave ports provides a dedicated, low-latency pathway between Programmable Logic (PL) and DDR memory. However, this efficiency comes at a cost: the need for manual cache management.

Required Materials

  • Hardware: Cora Z7-07S (Zynq-7000)
  • Toolchain: Xilinx Design Pack 2018.3 (Vivado + SDK)
  • Monitoring: Tera Term (Serial) & HxD (Binary analysis)

DMA Polling Architecture

Figure 1: The AXI DMA acts as the "mover," bridging the gap between DDR memory and custom logic streams.

Mastering Cache Coherency

Because the HP ports are not cache-coherent, the DMA controller cannot see the CPU's L1/L2 caches. Without explicit synchronization, "stale data" bugs appear in both directions: the DMA reads outdated DDR contents while the CPU's latest writes still sit in the cache, or the CPU reads outdated cache lines while the data the DMA just wrote sits in DDR.

  • Xil_DCacheFlushRange: Essential before a DMA Read (moving RAM to Peripheral) to push cache updates to DDR.
  • Xil_DCacheInvalidateRange: Vital after a DMA Write (Peripheral to RAM) to discard stale CPU cache lines.
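Cache maintenance works on whole cache lines, so a DMA buffer that shares a line with unrelated variables can cause that neighboring data to be written back or discarded along with it. The following host-side sketch shows the arithmetic involved, assuming the 32-byte line size of the Cortex-A9 caches. The helper names (`line_floor`, `line_ceil`, `covered_bytes`, `is_line_exclusive`) are illustrative and not part of the Xilinx BSP.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE_SIZE 32u /* assumed Cortex-A9 cache line size in bytes */

/* The mask arithmetic below only works for power-of-two line sizes. */
static_assert((CACHE_LINE_SIZE & (CACHE_LINE_SIZE - 1u)) == 0,
              "cache line size must be a power of two");

/* Round an address down to the start of its cache line. */
static uintptr_t line_floor(uintptr_t addr)
{
    return addr & ~(uintptr_t)(CACHE_LINE_SIZE - 1u);
}

/* Round an address up to the next cache-line boundary. */
static uintptr_t line_ceil(uintptr_t addr)
{
    return (addr + CACHE_LINE_SIZE - 1u) & ~(uintptr_t)(CACHE_LINE_SIZE - 1u);
}

/* Bytes actually touched by a line-granular operation on [addr, addr + len). */
static size_t covered_bytes(uintptr_t addr, size_t len)
{
    if (len == 0) {
        return 0;
    }
    return (size_t)(line_ceil(addr + len) - line_floor(addr));
}

/* Nonzero when the buffer starts and ends on line boundaries,
 * i.e. it shares no cache line with other data. */
static int is_line_exclusive(uintptr_t addr, size_t len)
{
    return (addr % CACHE_LINE_SIZE == 0) && (len % CACHE_LINE_SIZE == 0);
}
```

On the target, one way to satisfy `is_line_exclusive` is to declare buffers with GCC's `__attribute__((aligned(32)))` and to keep the transfer length a multiple of 32 bytes.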

Configure the hardware

XAxiDma_Config *CfgPtr;
CfgPtr = XAxiDma_LookupConfig(DeviceId);
if (!CfgPtr) {
    xil_printf("No config found for %d\r\n", DeviceId);
    return XST_FAILURE;
}
Status = XAxiDma_CfgInitialize(&AxiDma, CfgPtr);
if (Status != XST_SUCCESS) {
    xil_printf("Initialization failed %d\r\n", Status);
    return XST_FAILURE;
}

Disable interrupts for polling mode

// s2mm_intr: Device to DMA (Memory) Interrupt
XAxiDma_IntrDisable(&AxiDma, XAXIDMA_IRQ_ALL_MASK,
                    XAXIDMA_DEVICE_TO_DMA);

// mm2s_intr: DMA (Memory) to Device Interrupt
XAxiDma_IntrDisable(&AxiDma, XAXIDMA_IRQ_ALL_MASK,
                    XAXIDMA_DMA_TO_DEVICE);

// Generate TxBuffer Data
...
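// Flush the Tx buffer so the DMA reads the freshly written data from DDR
Xil_DCacheFlushRange((UINTPTR)TxBufferPtr, MAX_PKT_LEN);
// Flush the Rx buffer so no dirty cache line is written back over incoming DMA data
Xil_DCacheFlushRange((UINTPTR)RxBufferPtr, MAX_PKT_LEN);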
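The data generation step is left open above. As one hypothetical way to produce a payload that is easy to verify later, an incrementing byte pattern can be written before the flush and checked after the receive buffer has been invalidated. This sketch assumes the PL design loops the stream back unchanged; the function names and the seed parameter are illustrative and do not come from the original code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fill buf with an incrementing byte pattern starting at seed.
 * The value wraps around after 0xFF. */
static void fill_test_pattern(uint8_t *buf, size_t len, uint8_t seed)
{
    assert(buf != NULL);
    for (size_t i = 0; i < len; i++) {
        buf[i] = (uint8_t)(seed + i);
    }
}

/* Count the bytes in buf that differ from the pattern
 * fill_test_pattern() would have produced for the same seed. */
static size_t count_pattern_mismatches(const uint8_t *buf, size_t len,
                                       uint8_t seed)
{
    assert(buf != NULL);
    size_t mismatches = 0;
    for (size_t i = 0; i < len; i++) {
        if (buf[i] != (uint8_t)(seed + i)) {
            mismatches++;
        }
    }
    return mismatches;
}
```

In a loopback setup, `fill_test_pattern` would run on the transmit buffer before the cache flush, and `count_pattern_mismatches` on the receive buffer after the cache invalidate; a nonzero count then points at either a data path problem or a missed cache operation.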

Initiate the transfer

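// S2MM: Device to DMA (Memory), started first so the receive channel is ready
Status = XAxiDma_SimpleTransfer(&AxiDma, (UINTPTR)RxBufferPtr,
                                MAX_PKT_LEN, XAXIDMA_DEVICE_TO_DMA);
if (Status != XST_SUCCESS) {
    return XST_FAILURE;
}

// MM2S: DMA (Memory) to Device
Status = XAxiDma_SimpleTransfer(&AxiDma, (UINTPTR)TxBufferPtr,
                                MAX_PKT_LEN, XAXIDMA_DMA_TO_DEVICE);
if (Status != XST_SUCCESS) {
    return XST_FAILURE;
}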

// Poll until both channels are idle
while (XAxiDma_Busy(&AxiDma, XAXIDMA_DEVICE_TO_DMA) ||
       XAxiDma_Busy(&AxiDma, XAXIDMA_DMA_TO_DEVICE)) {
    /* Wait */
}
// Invalidate the Rx buffer so the CPU reads the data the DMA wrote to DDR
Xil_DCacheInvalidateRange((UINTPTR)RxBufferPtr, MAX_PKT_LEN);
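The polling loop above spins forever if a channel never goes idle, for example when the PL side stalls the stream. A bounded variant is sketched below as a host-runnable illustration. The `busy_fn` callback type, `poll_until_idle`, and the `fake_dma` stand-in are assumptions made for this example; on the target, the callback would wrap the two `XAxiDma_Busy()` calls.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Callback reporting whether the DMA is still busy.
 * Hypothetical abstraction so the loop can be exercised off-target. */
typedef bool (*busy_fn)(void *ctx);

/* Poll until busy() reports idle or max_polls attempts are used up.
 * Returns true if idle was observed, false on timeout. */
static bool poll_until_idle(busy_fn busy, void *ctx, uint32_t max_polls)
{
    assert(busy != NULL);
    for (uint32_t i = 0; i < max_polls; i++) {
        if (!busy(ctx)) {
            return true;
        }
    }
    return false;
}

/* Host-side stand-in for the hardware: reports busy for a fixed
 * number of polls, then idle. */
struct fake_dma {
    uint32_t busy_polls_left;
};

static bool fake_dma_busy(void *ctx)
{
    struct fake_dma *dma = ctx;
    if (dma->busy_polls_left == 0) {
        return false;
    }
    dma->busy_polls_left--;
    return true;
}
```

A poll count is only a rough time bound because it depends on CPU speed; a timer-based limit would be more precise. What to do on timeout is a design decision, and resetting the engine with `XAxiDma_Reset()` before reporting a failure is one option.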