Rockchip RK3308 Smart Speaker PSRAM Replacement for DDR3 Memory Solution

The DDR3 market currently faces persistent supply shortages and high prices. The RK3308 is a mainstream main control chip for smart speakers and voice central controls (quad-core Cortex-A35, integrated hardware VAD, support for 8-microphone arrays), and it requires external memory to run Linux plus complex audio applications. The traditional 128–512MB DDR3 solution is under pressure on both supply and cost. PSRAM (Pseudo-Static RAM) offers a simple interface, stable supply, a clear cost advantage, and capacities from 64Mbit to 512Mbit (8–64MB). That range suits smart speaker audio decoding and voice application scenarios, which makes PSRAM a practical alternative while DDR3 remains in short supply.

PSRAM vs DDR3: Adaptability Comparison in Smart Speaker Scenarios

1. Core Feature Comparison (RK3308 Scenario)

| Comparison Dimension | DDR3 (Traditional Solution) | PSRAM (Alternative Solution) | Smart Speaker Adaptability Conclusion |
| --- | --- | --- | --- |
| Interface | Parallel DDR interface, requiring a PHY and complex clocking/wiring | QSPI/OPI serial interface, only 6–8 pins, no PHY required | PSRAM allows a very simple hardware design, lower PCB cost, and easier wiring |
| Bandwidth | Up to 12.8GB/s (DDR3-1600, peak on a 64-bit bus) | QSPI: ~532Mbps (~66MB/s); OPI: ~1.6Gbps (~200MB/s) | Meets the requirements of audio decoding, microphone array buffering, and AEC/NS algorithms; sufficient for lossless audio and multi-microphone scenarios |
| Capacity | 64MB–512MB (mainstream) | 8MB–64MB (64Mbit–512Mbit) | Covers the basic to mid-range memory needs of smart speakers; high-end models can use multi-chip parallel connection |
| Refresh | Requires periodic refresh commands from the main control's memory controller; controller initialization and timing configuration add complexity | Built-in self-refresh, transparent to the host, no software intervention required | PSRAM offers higher development efficiency and supports rapid Linux porting |
| Supply & Cost | Shortage, high price volatility, long delivery time | Stable supply, lower cost | PSRAM addresses the shortage and significantly reduces BOM cost |
| Power Consumption | High dynamic power consumption, moderate standby power consumption | Low dynamic power consumption, extremely low standby power consumption | Suitable for battery-powered, low-power speaker solutions |
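To put the bandwidth figures in the comparison in context, the following back-of-envelope sketch compares raw audio stream rates with the quoted QSPI and OPI rates. The capture and playback formats (16 kHz/16-bit microphones, 192 kHz/24-bit stereo playback) are illustrative assumptions, not values from this article, and the sketch counts only the audio streams themselves; CPU instruction and data traffic from running Linux out of the same memory is not modelled.

```python
def pcm_rate_bits_per_s(channels: int, sample_rate_hz: int, bits_per_sample: int) -> int:
    """Raw PCM data rate in bits per second."""
    return channels * sample_rate_hz * bits_per_sample

QSPI_BITS_PER_S = 532_000_000     # ~532 Mbit/s, as quoted in the comparison
OPI_BITS_PER_S = 1_600_000_000    # ~1.6 Gbit/s, as quoted in the comparison

# Assumed capture format: 8 microphones, 16 kHz, 16-bit samples.
mic_rate = pcm_rate_bits_per_s(8, 16_000, 16)
# Assumed playback format: stereo, 192 kHz, 24-bit decoded output.
playback_rate = pcm_rate_bits_per_s(2, 192_000, 24)
total_rate = mic_rate + playback_rate

print(f"microphone array: {mic_rate / 1e6:.3f} Mbit/s")
print(f"playback stream:  {playback_rate / 1e6:.3f} Mbit/s")
print(f"share of QSPI:    {total_rate / QSPI_BITS_PER_S:.2%}")
print(f"share of OPI:     {total_rate / OPI_BITS_PER_S:.2%}")
```

Under these assumptions the audio streams occupy only a small share of either interface, so the practical limit is more likely to come from everything else that shares the bus.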

2. Smart Speaker Key Scenario Adaptability

  • Far-field voice pickup (4/6/8-microphone array): PSRAM can carry multi-channel audio sampling buffering, meeting the historical frame buffering needs of AEC/NS/dual wake-up algorithms.

  • Local + cloud dual wake-up / VAD: Hardware VAD + PSRAM buffering eliminates the need for frequent Flash access, and the wake-up response speed is comparable to that of the DDR3 solution.

  • Multi-protocol audio decoding (MP3/FLAC/AAC/APE): PSRAM bandwidth can support high-spec lossless decoding, and the decoding buffer resides in PSRAM to ensure smooth performance.

  • Linux + voice SDK (Baidu/Alibaba/Xunfei): PSRAM can be used as system cache and algorithm memory pool; the voice system can run stably once the Linux kernel has been trimmed.

  • Bluetooth/Wi-Fi/DLNA/AirPlay: Network protocol stacks and audio stream buffers can be placed in PSRAM, enabling smooth multi-task concurrency and adapting to wireless audio casting scenarios.
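As a rough illustration of the buffering needs listed above, the sketch below estimates how much memory a microphone history buffer takes. The sample format (16 kHz, 16-bit) and the 2-second history depth are assumptions chosen for the example; actual AEC/NS and wake-up algorithms define their own requirements.

```python
MIB = 1024 * 1024

def history_buffer_bytes(channels: int, sample_rate_hz: int,
                         bytes_per_sample: int, seconds: float) -> int:
    """Bytes needed to hold `seconds` of interleaved PCM for all channels."""
    return int(channels * sample_rate_hz * bytes_per_sample * seconds)

SMALLEST_PART_BYTES = 8 * MIB  # 64Mbit, the smallest capacity mentioned in the article

# Assumed capture format: 16 kHz, 16-bit samples, 2 s of history.
for mics in (4, 6, 8):
    size = history_buffer_bytes(mics, 16_000, 2, 2.0)
    share = size / SMALLEST_PART_BYTES
    print(f"{mics} mics: {size} bytes ({share:.1%} of an 8MB part)")
```

The same function can be reused for decode buffers or network stream buffers by substituting the relevant channel count, sample rate, and depth.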

RK3308 + PSRAM Alternative Solution: Hardware + Software Practice

1. Hardware Design: RK3308 Connection to PSRAM (QSPI/OPI)

(1) Main Control Selection

Use the RK3308 standard version or the RK3308B industrial version as the main control; the RK3308G/H is not applicable.
The RK3308 has a built-in QSPI/OPI controller, which directly supports PSRAM without additional adapter chips.

(2) Key Hardware Connection Points

  • QSPI mode: Connect to QSPI_CLK, CS, IO0–IO3 of RK3308 (a total of 6 pins), no additional control signals required.

  • OPI mode: Connect to IO0–IO7 (8 data pins), doubling the data width relative to QSPI and raising bandwidth; suitable for high-bitrate audio scenarios.

  • Power supply: 1.8V power supply, compatible with RK3308 I/O level, no level conversion required.

  • PCB design: Serial wiring is simple and can be realized with a 2-layer board; compared with the 4–6 layer board of DDR3, the cost is significantly reduced.

(3) Capacity Expansion: Multi-Chip Parallel Connection

When more capacity is required than a single chip provides (64MB at 512Mbit), for example 128MB or more, multiple PSRAM chips can be connected in parallel and distinguished by their CS (chip select) lines. The RK3308 supports QSPI/OPI multi-chip expansion to meet the needs of high-end speakers.
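The chip count for a given target capacity is a simple ceiling division, sketched below. The helper names are hypothetical, and the `chip_select_for` function assumes the chips are concatenated into one linear address space in CS order, which is an assumption made for illustration rather than a documented RK3308 mapping.

```python
MIB = 1024 * 1024

def chip_bytes(density_mbit: int) -> int:
    """Capacity of one PSRAM chip in bytes, given its density in Mbit."""
    return density_mbit * MIB // 8

def chips_needed(target_mib: int, density_mbit: int) -> int:
    """Number of chips required to reach at least `target_mib` MiB."""
    per_chip = chip_bytes(density_mbit)
    return -(-(target_mib * MIB) // per_chip)  # ceiling division

def chip_select_for(offset: int, density_mbit: int) -> int:
    """Index of the chip (CS line) holding a byte offset, assuming linear concatenation."""
    return offset // chip_bytes(density_mbit)

print(chips_needed(128, 512))           # 128MB from 512Mbit (64MB) chips
print(chips_needed(128, 256))           # 128MB from 256Mbit (32MB) chips
print(chip_select_for(64 * MIB, 512))   # first byte of the second chip
```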

2. Software Adaptation: Linux System + PSRAM Driver Porting

(1) Kernel Configuration

  • Enable the RK3308 QSPI/OPI controller driver, and configure the PSRAM in memory-mapped mode so that it is mapped into the system address space.

  • Trim the Linux kernel: Turn off unnecessary services and graphics modules to reduce the kernel size and reserve more PSRAM for audio/voice algorithms.

(2) Memory Allocation Strategy

  • PSRAM dedicated area: Allocate most of the PSRAM to the audio buffer, microphone array buffer, and AEC/NS algorithm memory pool.

  • System operation area: Allocate the remaining part to the Linux kernel, processes, and network protocol stack.

  • Disable swap partition: PSRAM has limited bandwidth; disabling swap avoids performance degradation.
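The allocation strategy above can be sketched as a small planning helper that lays out the dedicated regions first and gives the remainder to the system. The region names and sizes here are hypothetical examples for a 64MB part; they are not recommended values.

```python
from dataclasses import dataclass

MIB = 1024 * 1024

@dataclass
class Region:
    name: str
    offset: int  # byte offset from the start of PSRAM
    size: int    # size in bytes

def plan_regions(total_bytes: int, dedicated: list[tuple[str, int]]) -> list[Region]:
    """Place dedicated regions back to back; whatever is left becomes the system area."""
    regions = []
    offset = 0
    for name, size in dedicated:
        regions.append(Region(name, offset, size))
        offset += size
    if offset > total_bytes:
        raise ValueError("dedicated regions exceed PSRAM capacity")
    regions.append(Region("system", offset, total_bytes - offset))
    return regions

# Hypothetical split of a 64MB part.
layout = plan_regions(64 * MIB, [
    ("audio_buffer", 8 * MIB),
    ("mic_array_buffer", 4 * MIB),
    ("aec_ns_pool", 24 * MIB),
])
for region in layout:
    print(f"{region.name:18s} offset={region.offset // MIB:3d}MB size={region.size // MIB:3d}MB")
```

Keeping the plan in one place makes it easy to check that the dedicated areas still fit when a smaller part is substituted.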

(3) Voice SDK Adaptation

  • Adjust the memory allocation interface of mainstream voice SDKs, and point algorithm temporary variables and model caches to the PSRAM area.

  • Optimize VAD and wake-up word detection processes: Hardware VAD results are directly written to PSRAM, and the CPU only performs subsequent processing to reduce load.
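One common way to point an SDK's allocations at a fixed memory area is a simple arena (bump) allocator over that range. The sketch below models the idea in Python; the class name, the base address, and the pool size are hypothetical, and a real SDK would expose its own allocation hooks, which are not described in this article.

```python
class Arena:
    """Bump allocator over a fixed address range, standing in for a PSRAM pool."""

    def __init__(self, base: int, size: int):
        self.base = base
        self.size = size
        self.used = 0  # bytes consumed so far, including alignment padding

    def alloc(self, nbytes: int, align: int = 8) -> int:
        """Return the start address of a new block, aligned to `align` (a power of two)."""
        if nbytes <= 0 or align <= 0 or align & (align - 1):
            raise ValueError("nbytes must be positive and align a power of two")
        start = (self.base + self.used + align - 1) & ~(align - 1)
        end = start + nbytes
        if end > self.base + self.size:
            raise MemoryError("PSRAM pool exhausted")
        self.used = end - self.base
        return start

    def reset(self) -> None:
        """Release everything at once, e.g. between wake-up sessions."""
        self.used = 0

# Hypothetical pool: 1 KiB at an illustrative base address.
pool = Arena(base=0x1000_0000, size=1024)
model_cache = pool.alloc(10)
scratch = pool.alloc(4, align=16)
print(hex(model_cache), hex(scratch), pool.used)
```

A bump allocator suits algorithm scratch memory that is allocated at startup and released as a whole; workloads with many independent frees would need a different scheme.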

(4) Performance Optimization

  • Enable DMA data transfer: I2S audio streams and microphone array data are written directly to PSRAM by DMA, so the CPU does not take part in the copy, which improves real-time performance.

  • Optimize the audio decoding library: The decoding buffer resides in PSRAM to reduce Flash access and ensure controllable decoding delay.
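The buffering pattern behind these optimizations is typically a ring of fixed-size frames: a producer (DMA on real hardware) fills frames while the CPU drains them. The following is a software model of that pattern only; the class and its methods are hypothetical and do not correspond to an RK3308 driver interface.

```python
class FrameRing:
    """Fixed-size ring of audio frames; a producer writes and a consumer reads in order."""

    def __init__(self, frame_bytes: int, frame_count: int):
        self.frame_bytes = frame_bytes
        self.frame_count = frame_count
        self.storage = bytearray(frame_bytes * frame_count)
        self.head = 0      # index of the next frame to write
        self.tail = 0      # index of the next frame to read
        self.filled = 0    # number of frames waiting to be read
        self.overruns = 0  # frames dropped because the ring was full

    def produce(self, frame: bytes) -> bool:
        """Store one frame; return False and count an overrun if the ring is full."""
        if len(frame) != self.frame_bytes:
            raise ValueError("frame has the wrong size")
        if self.filled == self.frame_count:
            self.overruns += 1
            return False
        start = self.head * self.frame_bytes
        self.storage[start:start + self.frame_bytes] = frame
        self.head = (self.head + 1) % self.frame_count
        self.filled += 1
        return True

    def consume(self):
        """Return the oldest frame, or None if the ring is empty."""
        if self.filled == 0:
            return None
        start = self.tail * self.frame_bytes
        frame = bytes(self.storage[start:start + self.frame_bytes])
        self.tail = (self.tail + 1) % self.frame_count
        self.filled -= 1
        return frame

ring = FrameRing(frame_bytes=4, frame_count=2)
ring.produce(b"aaaa")
ring.produce(b"bbbb")
dropped = not ring.produce(b"cccc")  # ring is full, so this frame is dropped
print(dropped, ring.overruns, ring.consume())
```

Sizing the ring so that the overrun counter stays at zero under worst-case CPU latency is what keeps the decoding and capture delay controllable.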

PSRAM Selection Ideas for Speakers with Different Configurations

1. Entry-Level Smart Speaker (2-microphone, local wake-up, basic audio decoding)

  • Main control: RK3308 standard version

  • PSRAM capacity: Small-capacity PSRAM

  • System: Trimmed Linux + lightweight voice SDK

  • Advantages: Lowest cost, most stable supply, meeting entry-level needs

2. Standard Smart Speaker (4-microphone, AEC/NS, local + cloud dual wake-up)

  • Main control: RK3308B industrial version (with CAN, suitable for gateway scenarios)

  • PSRAM capacity: Medium-capacity PSRAM, OPI interface recommended

  • System: Complete Linux + mainstream voice SDK

  • Advantages: Sufficient bandwidth, stable operation, adapting to mainstream mass production configurations

3. High-End Smart Speaker (8-microphone, lossless decoding, multi-protocol concurrency)

  • Main control: RK3308B industrial version

  • PSRAM capacity: Large-capacity PSRAM or multi-chip parallel connection

  • System: Complete Linux + full-featured voice SDK + high-definition audio decoding

  • Advantages: Performance close to the DDR3 solution, obvious supply and cost advantages

Core Advantages of PSRAM Replacing DDR3

  • Solving supply chain pain points: Addresses DDR3 shortage and long delivery time; PSRAM has stable supply to ensure continuous mass production.

  • Significant cost optimization: Reduces both BOM cost and PCB cost, suitable for large-batch smart speaker products.

  • Good scenario fit: Bandwidth and capacity cover the audio decoding and complex application needs of RK3308 smart speakers/voice central controls.

  • Low migration cost: RK3308 natively supports QSPI/OPI PSRAM, with mature Linux drivers and low software adaptation workload.

In the current market environment of persistent DDR3 shortage, RK3308/RK3308B + PSRAM has become a mature alternative solution for smart speakers and voice central control products. It combines stability, cost advantages and mass production feasibility, making it the optimal choice at this stage.