Rockchip RK3308 Smart Speaker Solution Using PSRAM as Replacement for DDR3 RAM

The DDR3 market continues to face shortages and high prices. The Rockchip RK3308 is a mainstream processor for smart speakers and voice control panels (quad-core ARM Cortex-A35, integrated hardware VAD, support for 8-microphone arrays), and it needs external memory to run Linux and complex audio applications. Traditional 128–512MB DDR3 designs are squeezed by both supply constraints and cost. PSRAM (pseudo-static RAM) offers a simple interface, stable supply, and lower cost, with capacities from 64Mbit to 512Mbit (8MB–64MB). That range suits audio decoding and voice-application workloads in smart speakers, which makes PSRAM a practical alternative during the DDR3 shortage.

PSRAM vs DDR3: Adaptability Comparison for Smart Speaker Applications

1. Core Feature Comparison (RK3308 Application Scenario)

| Comparison Item | DDR3 (Traditional Solution) | PSRAM (Alternative Solution) | Conclusion for Smart Speakers |
| --- | --- | --- | --- |
| Interface | Parallel DDR interface requiring PHY and complex clock/routing | QSPI/OPI serial interface with only 6–8 pins, no PHY needed | PSRAM enables extremely simple hardware design, lower PCB cost, and easier routing |
| Bandwidth | 12.8GB/s (DDR3-1600) | QSPI: ~532Mbps; OPI: ~1.6Gbps | Sufficient for audio decoding, microphone array buffering, and AEC/NS algorithms; meets lossless audio and multi-mic scenarios |
| Capacity | 64MB–512MB (mainstream) | 8MB–64MB (64Mbit–512Mbit) | Covers entry-level to mid-range memory requirements for smart speakers; high-end configurations can use multiple chips in parallel |
| Refresh | Requires periodic refresh by the host, high software complexity | Built-in self-refresh, transparent to the system, no software intervention | Higher development efficiency with PSRAM, enabling fast Linux porting |
| Supply & Cost | Shortages, large price fluctuations, long lead times | Stable supply, lower cost | PSRAM resolves supply shortages and significantly reduces BOM cost |
| Power Consumption | High dynamic power consumption, average standby power | Low dynamic power consumption, ultra-low standby power | Ideal for battery-powered and low-power speaker designs |
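To put the bandwidth figures in the comparison above in perspective, the following sketch compares the raw data rate of a microphone array and a high-resolution playback stream with the quoted PSRAM peak rates. The sample rates and bit depths are illustrative assumptions, not values taken from an RK3308 datasheet or from any specific product.

```python
# Peak PSRAM figures quoted in the comparison, in bits per second.
QSPI_BPS = 532e6
OPI_BPS = 1.6e9


def pcm_bitrate(channels: int, sample_rate_hz: int, bits_per_sample: int) -> int:
    """Raw (uncompressed) PCM data rate in bits per second."""
    return channels * sample_rate_hz * bits_per_sample


# Assumed workloads (illustrative only).
mic_array = pcm_bitrate(channels=8, sample_rate_hz=16_000, bits_per_sample=16)
hires_playback = pcm_bitrate(channels=2, sample_rate_hz=192_000, bits_per_sample=24)
total = mic_array + hires_playback

print(f"8-mic capture      : {mic_array / 1e6:.3f} Mbps")
print(f"Hi-res stereo      : {hires_playback / 1e6:.3f} Mbps")
print(f"Share of QSPI peak : {100 * total / QSPI_BPS:.2f}%")
print(f"Share of OPI peak  : {100 * total / OPI_BPS:.2f}%")
```

Under these assumptions the audio streams themselves use only a small share of the peak rate. The sketch counts audio data only; kernel, code-fetch, and network traffic share the same interface, and sustained throughput is normally below the peak figure, so real headroom should be measured on hardware.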


2. Adaptability to Key Smart Speaker Scenarios

  • Far-field voice capture (4/6/8-mic arrays): PSRAM can buffer multi-channel audio sampling data and meet the historical frame buffering requirements of AEC/NS and dual-wakeword algorithms.

  • Local + cloud dual wakeword / VAD: Hardware VAD with PSRAM buffering eliminates frequent Flash access, providing the same wake response speed as DDR3 solutions.

  • Multi-format audio decoding (MP3/FLAC/AAC/APE): PSRAM bandwidth supports high-spec lossless decoding, with decoding buffers resident in PSRAM for smooth playback.

  • Linux + voice SDK (Baidu / Alibaba / iFlyTek): PSRAM acts as system cache and algorithm memory pool. After Linux kernel optimization, the voice system runs stably.

  • Bluetooth / Wi-Fi / DLNA / AirPlay: Network protocol stacks and audio streams can be buffered in PSRAM, ensuring smooth multitasking and supporting wireless audio casting.
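The "historical frame buffering" mentioned for far-field capture can be pictured as a fixed-size ring of multi-channel frames. The sketch below is a minimal illustration in Python, assuming interleaved 16-bit PCM; the class name `FrameRing` and its parameters are hypothetical and do not come from any voice SDK.

```python
import array


class FrameRing:
    """Fixed-size ring of interleaved 16-bit PCM frames (one sample per mic)."""

    def __init__(self, channels: int, sample_rate_hz: int, history_ms: int):
        self.channels = channels
        self.capacity = sample_rate_hz * history_ms // 1000  # frames kept
        self.buf = array.array("h", [0] * (self.capacity * channels))
        self.write_idx = 0  # next frame slot to overwrite
        self.count = 0      # number of valid frames stored

    def size_bytes(self) -> int:
        """Memory footprint of the sample storage."""
        return len(self.buf) * self.buf.itemsize

    def push(self, frame) -> None:
        """Store one frame, overwriting the oldest once the ring is full."""
        if len(frame) != self.channels:
            raise ValueError("frame must hold one sample per channel")
        start = self.write_idx * self.channels
        self.buf[start:start + self.channels] = array.array("h", frame)
        self.write_idx = (self.write_idx + 1) % self.capacity
        self.count = min(self.count + 1, self.capacity)

    def latest(self, n: int):
        """Return up to n of the most recent frames, oldest first."""
        n = min(n, self.count)
        out = []
        for i in range(n, 0, -1):
            slot = (self.write_idx - i) % self.capacity
            start = slot * self.channels
            out.append(list(self.buf[start:start + self.channels]))
        return out


# Example sizing: 8 mics, 16 kHz, 500 ms of history (assumed values).
ring = FrameRing(channels=8, sample_rate_hz=16_000, history_ms=500)
print(ring.size_bytes(), "bytes of history buffer")
```

With these assumed values the history buffer holds 8,000 frames of 8 samples each, which is small compared with even the lowest PSRAM capacity listed above.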

RK3308 + PSRAM Replacement Solution: Hardware & Software Implementation

1. Hardware Design: RK3308 with PSRAM (QSPI/OPI)

(1) Host Selection

The solution is based on the standard RK3308 or the industrial-grade RK3308B; the RK3308G and RK3308H are not applicable. The RK3308 integrates a QSPI/OPI controller with direct PSRAM support, so no additional adapter chip is required.

(2) Key Hardware Connections

  • QSPI mode: Connect to RK3308 QSPI_CLK, CS, IO0–IO3 (6 pins total), no extra control signals required.

  • OPI mode: Connect IO0–IO7 (8 data lines, in addition to CLK and CS), doubling the data width relative to QSPI for high-bitrate audio scenarios.

  • Power supply: 1.8V, compatible with RK3308 I/O voltage, no level shifter needed.

  • PCB design: Simple serial routing allows 2-layer boards, greatly reducing cost compared to DDR3’s 4–6-layer boards.

(3) Capacity Expansion: Multi-chip Parallel Operation

For capacities of 128MB and above, multiple PSRAM chips can be used in parallel (distinguished by CS chip select). The RK3308 supports QSPI/OPI multi-chip expansion for high-end speaker applications.
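As a rough illustration of chip-select-based expansion, the sketch below computes how many chips a target capacity needs and maps a linear byte address to a chip select and an offset. It assumes the chips are stacked one after another in the address space; the source does not state how the RK3308 controller actually maps chip selects, and the function names are hypothetical.

```python
import math


def chips_needed(target_mb: int, chip_mbit: int) -> int:
    """Number of PSRAM chips needed to reach target_mb megabytes."""
    chip_mb = chip_mbit // 8  # 8 Mbit per MB
    return math.ceil(target_mb / chip_mb)


def decode(addr: int, chip_mbit: int, n_chips: int):
    """Map a linear byte address to (chip_select, offset_within_chip).

    Assumes a simple linear layout: chip 0 first, then chip 1, and so on.
    """
    chip_bytes = chip_mbit * 1024 * 1024 // 8
    if not 0 <= addr < chip_bytes * n_chips:
        raise ValueError("address outside the PSRAM array")
    return addr // chip_bytes, addr % chip_bytes


n = chips_needed(target_mb=128, chip_mbit=512)
print("chips for 128MB:", n)
print("first byte of second chip:", decode(64 * 1024 * 1024, 512, n))
```

For example, reaching 128MB with 512Mbit (64MB) parts takes two chips, and the first address past 64MB lands at offset 0 of the second chip select.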

2. Software Adaptation: Linux System & PSRAM Driver Porting

(1) Kernel Configuration

Enable the RK3308 QSPI/OPI controller driver and configure PSRAM in memory-mapped mode (MMU-mapped as system memory). Optimize the Linux kernel by disabling unnecessary services and graphics modules to reduce kernel size and reserve more PSRAM for audio/voice algorithms.

(2) Memory Allocation Strategy

  • PSRAM dedicated area: Allocate the majority of PSRAM for audio buffers, mic array buffering, and AEC/NS algorithm memory pools.

  • System runtime area: Reserve remaining space for the Linux kernel, processes, and network protocol stacks.

  • Disable swap partition: Due to limited PSRAM bandwidth, swap is disabled to avoid performance degradation.
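The allocation strategy above can be sketched as a small planning helper plus a check that no swap is active. The 60% audio share is an assumed example of "the majority", not a recommended figure, and both function names are hypothetical. The swap check relies on the standard Linux `/proc/swaps` layout, where the first line is a header and each following line is an active swap area.

```python
def plan_psram(total_mb: int, audio_share: float = 0.6) -> dict:
    """Split PSRAM into an audio/algorithm pool and a system runtime area.

    audio_share must be a majority (above 0.5) but leave room for the system.
    """
    if not 0.5 < audio_share < 1.0:
        raise ValueError("audio_share must be between 0.5 and 1.0 (exclusive)")
    audio_mb = int(total_mb * audio_share)
    return {"audio_pool_mb": audio_mb, "system_mb": total_mb - audio_mb}


def swap_active(proc_swaps_text: str) -> bool:
    """Return True if the given /proc/swaps content lists any swap area."""
    lines = [ln for ln in proc_swaps_text.splitlines() if ln.strip()]
    return len(lines) > 1  # anything beyond the header line is an active area


print(plan_psram(64))
header_only = "Filename\t\t\t\tType\t\tSize\t\tUsed\t\tPriority\n"
print("swap active:", swap_active(header_only))
```

On a target board the check could be run as `swap_active(open("/proc/swaps").read())` during bring-up to confirm that swap is really off.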

(3) Voice SDK Adaptation

Adjust memory allocation interfaces of mainstream voice SDKs to point algorithm variables and model buffers to the PSRAM area. Optimize VAD and wakeword detection: hardware VAD results are written directly to PSRAM, with CPU handling only post-processing to reduce load.
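One common way to "point" algorithm buffers at a dedicated region is to replace the SDK's allocator with a simple pool that hands out aligned slices of that region. The sketch below models the idea with a `bytearray` standing in for the PSRAM area. The `BumpPool` class is hypothetical; the actual allocation hooks differ per SDK and are not described in the source.

```python
class BumpPool:
    """Hands out aligned, non-overlapping slices of one pre-reserved region."""

    def __init__(self, size: int, align: int = 64):
        self.mem = bytearray(size)  # stand-in for the dedicated PSRAM area
        self.align = align
        self.offset = 0             # first free byte

    def alloc(self, nbytes: int) -> memoryview:
        """Reserve nbytes and return a writable view into the region."""
        start = (self.offset + self.align - 1) // self.align * self.align
        end = start + nbytes
        if nbytes <= 0 or end > len(self.mem):
            raise MemoryError("pool exhausted or invalid size")
        self.offset = end
        return memoryview(self.mem)[start:end]

    def reset(self) -> None:
        """Release everything at once (for example between sessions)."""
        self.offset = 0


pool = BumpPool(size=1024, align=64)
model_buf = pool.alloc(100)   # occupies bytes 0..99
state_buf = pool.alloc(10)    # aligned up to byte 128
print("bytes used:", pool.offset)
```

A bump allocator never frees individual blocks, which keeps it predictable for long-lived model and algorithm buffers; workloads with frequent short-lived allocations would need a different scheme.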

(4) Performance Optimization

  • Enable DMA data transfer: I2S audio streams and mic array data are written directly to PSRAM via DMA with zero CPU intervention, improving real-time performance.

  • Optimize audio decoding libraries: Decoding buffers reside in PSRAM to reduce Flash access and ensure controllable decoding latency.
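DMA-driven audio capture is usually paired with double ("ping-pong") buffering: the DMA engine fills one half while the CPU processes the other. The sketch below simulates that hand-off in plain Python. It is a conceptual model only; `PingPongBuffer` is a hypothetical name, and no real DMA or I2S driver API is involved.

```python
class PingPongBuffer:
    """Two equal halves: the producer fills one while the consumer reads the other."""

    def __init__(self, half_size: int):
        self.halves = [bytearray(half_size), bytearray(half_size)]
        self.fill_idx = 0  # half currently owned by the producer (DMA stand-in)

    def producer_fill(self, data: bytes) -> None:
        """Simulate the DMA engine writing one block into its current half."""
        half = self.halves[self.fill_idx]
        if len(data) != len(half):
            raise ValueError("block must match the half-buffer size")
        half[:] = data

    def swap(self) -> memoryview:
        """On 'transfer complete': give the filled half to the consumer and
        move the producer to the other half."""
        ready = self.fill_idx
        self.fill_idx ^= 1
        return memoryview(self.halves[ready])


pp = PingPongBuffer(half_size=4)
pp.producer_fill(b"\x01\x02\x03\x04")
block = pp.swap()                      # consumer now owns the first half
pp.producer_fill(b"\x05\x06\x07\x08")  # producer fills the second half meanwhile
print(bytes(block))
```

The consumer must finish with a half before the producer wraps around to it again, so the half size sets the processing deadline for each block.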

PSRAM Selection Guidelines for Speakers with Different Configurations

1. Entry-level Smart Speaker (2 mics, local wakeword, basic audio decoding)

  • Host: Standard RK3308

  • PSRAM capacity: Low-capacity PSRAM

  • System: Optimized Linux + lightweight voice SDK

  • Advantage: Lowest cost, most stable supply, meets basic entry-level needs

2. Mid-range Smart Speaker (4 mics, AEC/NS, local + cloud dual wakeword)

  • Host: Industrial RK3308B (with CAN, suitable for gateway scenarios)

  • PSRAM capacity: Mid-capacity PSRAM, OPI interface recommended

  • System: Full Linux + mainstream voice SDK

  • Advantage: Sufficient bandwidth, stable operation, ideal for mass-production mainstream configurations

3. High-end Smart Speaker (8 mics, lossless decoding, multi-protocol concurrency)

  • Host: Industrial RK3308B

  • PSRAM capacity: High-capacity PSRAM or multiple chips in parallel

  • System: Full Linux + full-featured voice SDK + high-definition audio decoding

  • Advantage: Performance close to DDR3 solutions, with clear supply and cost benefits

Core Advantages of PSRAM Over DDR3

  • Supply chain relief: Solves DDR3 shortages and long lead times; PSRAM supply is stable to ensure continuous mass production.

  • Significant cost reduction: Lower BOM and PCB costs, ideal for high-volume smart speaker products.

  • Perfect scenario matching: Bandwidth and capacity fully cover audio decoding and complex application needs of RK3308 smart speakers and voice control panels.

  • Low migration cost: Native QSPI/OPI PSRAM support on RK3308, mature Linux drivers, and minimal software adaptation effort.

Amid the ongoing DDR3 shortage, the RK3308/RK3308B + PSRAM combination has become a mature alternative for smart speakers and voice control products. It delivers stability, cost efficiency, and mass-producibility, and is a strong choice at this stage.