| Comparison Dimension | DDR3 (Traditional Solution) | PSRAM (Alternative Solution) | Conclusion for Smart Speakers |
|---|---|---|---|
| Interface | Parallel DDR interface; requires a PHY and complex clocking and routing | QSPI/OPI serial interface; 6 pins for QSPI (8 data lines for OPI); no PHY required | PSRAM keeps the hardware design simple, lowers PCB cost, and eases routing |
| Bandwidth | 12.8 GB/s (DDR3-1600, peak on a 64-bit bus) | QSPI: ~532 Mbps; OPI: ~1.6 Gbps | Sufficient for audio decoding, microphone array buffering, and AEC/NS algorithms, including lossless audio and multi-microphone scenarios |
| Capacity | 64 MB–512 MB (mainstream) | 8 MB–64 MB (64 Mbit–512 Mbit) | Covers the basic to mid-range memory needs of smart speakers; high-end models can use several chips in parallel |
| Refresh | Must be refreshed periodically by the host controller; higher software complexity | Built-in self-refresh, transparent to the host; no software intervention required | PSRAM shortens development and supports rapid Linux porting |
| Supply & Cost | Shortages, high price volatility, long lead times | Stable supply, lower cost | PSRAM relieves the shortage problem and reduces BOM cost |
| Power Consumption | High dynamic power, average standby power | Low dynamic power, very low standby power | Suitable for battery-powered, low-power speaker designs |
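
As a rough sanity check on the bandwidth row, the raw PCM data rates of a typical workload can be compared with the quoted PSRAM peak figures. The sketch below is only back-of-the-envelope arithmetic: the stream parameters (8 microphones at 16 kHz / 16-bit, stereo 192 kHz / 24-bit playback) are illustrative assumptions, not measurements on RK3308, and the function name is hypothetical.

```python
# Back-of-the-envelope check: raw PCM stream rates vs. the quoted PSRAM peak bandwidth.
# All stream parameters below are illustrative assumptions, not RK3308 measurements.

QSPI_PEAK_BPS = 532e6  # ~532 Mbps, figure quoted in the table
OPI_PEAK_BPS = 1.6e9   # ~1.6 Gbps, figure quoted in the table


def pcm_bits_per_second(channels: int, sample_rate_hz: int, bits_per_sample: int) -> int:
    """Raw (uncompressed) PCM data rate in bits per second."""
    return channels * sample_rate_hz * bits_per_sample


# Assumed workload: 8-microphone capture at 16 kHz / 16-bit,
# plus stereo 192 kHz / 24-bit playback after lossless decoding.
mic_array_bps = pcm_bits_per_second(channels=8, sample_rate_hz=16_000, bits_per_sample=16)
playback_bps = pcm_bits_per_second(channels=2, sample_rate_hz=192_000, bits_per_sample=24)
total_bps = mic_array_bps + playback_bps

print(f"mic array : {mic_array_bps / 1e6:.3f} Mbps")
print(f"playback  : {playback_bps / 1e6:.3f} Mbps")
print(f"share of QSPI peak: {100 * total_bps / QSPI_PEAK_BPS:.2f} %")
print(f"share of OPI peak : {100 * total_bps / OPI_PEAK_BPS:.2f} %")
```

This covers only the audio streams themselves. Instruction fetches, cache misses, and the working sets of AEC/NS algorithms add further traffic, so the result should be read as a lower bound on demand rather than a full budget.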
Far-field voice pickup (4/6/8-microphone array): PSRAM can hold the multi-channel audio sampling buffers and the historical frame buffers required by AEC/NS and dual wake-up algorithms.
Local + cloud dual wake-up / VAD: Hardware VAD combined with PSRAM buffering avoids frequent Flash access, and wake-up response is on par with the DDR3 solution.
Multi-protocol audio decoding (MP3/FLAC/AAC/APE): PSRAM bandwidth can support high-spec lossless decoding, and the decoding buffer resides in PSRAM for smooth playback.
Linux + voice SDK (Baidu/Alibaba/Xunfei): PSRAM can serve as system cache and algorithm memory pool; the voice system runs stably once the Linux kernel has been trimmed.
Bluetooth/Wi-Fi/DLNA/AirPlay: Network protocol stacks and audio stream buffers can be placed in PSRAM, which allows several tasks to run concurrently and suits wireless audio casting scenarios.
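
The "historical frame buffering" mentioned for the microphone array and wake-up scenarios is essentially a ring buffer that always holds the most recent stretch of audio. A minimal sketch follows, assuming 10 ms mono frames at 16 kHz / 16-bit and one second of history. The class name `PreRollBuffer` and these parameters are hypothetical, and a Python `deque` stands in for what would be a fixed region of PSRAM on a real device.

```python
from collections import deque


class PreRollBuffer:
    """Keeps the most recent `history_frames` audio frames; the oldest is dropped first."""

    def __init__(self, history_frames: int):
        # deque with maxlen discards the oldest entry automatically when full.
        self._frames = deque(maxlen=history_frames)

    def push(self, frame: bytes) -> None:
        """Store one newly captured frame."""
        self._frames.append(frame)

    def snapshot(self) -> bytes:
        """Return the buffered history, oldest frame first, as one byte string."""
        return b"".join(self._frames)


# Assumed framing: 10 ms of 16 kHz / 16-bit mono audio = 160 samples = 320 bytes.
FRAME_BYTES = 160 * 2
HISTORY_FRAMES = 100  # 100 frames x 10 ms = 1 s of history

buf = PreRollBuffer(HISTORY_FRAMES)
for i in range(150):
    # Fake capture: every byte of frame i carries the value i, so frames are identifiable.
    buf.push(bytes([i]) * FRAME_BYTES)

history = buf.snapshot()
print(len(history), "bytes of history; oldest frame id:", history[0])
```

When a wake-up event fires, the snapshot gives the recognizer the audio that preceded the trigger without touching Flash.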
QSPI mode: Connect to QSPI_CLK, CS, and IO0–IO3 of RK3308 (6 pins in total); no additional control signals are required.
OPI mode: Connect to IO0–IO7 (8 data pins, in addition to clock and chip select), raising peak bandwidth from about 532 Mbps to about 1.6 Gbps; suitable for high-bitrate audio scenarios.
Power supply: 1.8 V supply, compatible with the RK3308 I/O level; no level conversion is required.
PCB design: Serial routing is simple and can be done on a 2-layer board; compared with the 4–6 layer board needed for DDR3, PCB cost is significantly lower.
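
The quoted QSPI and OPI figures follow from a simple relation: peak rate = clock frequency × data lanes × transfers per clock. The sketch below reproduces them under assumed clock settings (133 MHz single data rate for QSPI, 100 MHz double data rate for OPI); the source does not state the actual clocks, and the per-burst overhead of 14 clocks is a purely hypothetical value used to show why sustained throughput sits below the peak.

```python
def peak_bits_per_second(clock_hz: float, data_lanes: int, transfers_per_clock: int = 1) -> float:
    """Peak interface rate: every clock moves `data_lanes * transfers_per_clock` bits."""
    return clock_hz * data_lanes * transfers_per_clock


def burst_efficiency(burst_bytes: int, data_lanes: int, transfers_per_clock: int,
                     overhead_clocks: int) -> float:
    """Fraction of clocks in one burst that carry data rather than command/address/latency."""
    data_clocks = burst_bytes * 8 / (data_lanes * transfers_per_clock)
    return data_clocks / (data_clocks + overhead_clocks)


# Assumed clock settings that happen to match the quoted figures.
qspi_peak = peak_bits_per_second(133e6, data_lanes=4)                        # single data rate
opi_peak = peak_bits_per_second(100e6, data_lanes=8, transfers_per_clock=2)  # double data rate

# Hypothetical overhead: 14 clocks of command, address, and latency per 32-byte burst.
qspi_eff = burst_efficiency(burst_bytes=32, data_lanes=4, transfers_per_clock=1,
                            overhead_clocks=14)

print(f"QSPI peak: {qspi_peak / 1e6:.0f} Mbps")
print(f"OPI peak : {opi_peak / 1e9:.1f} Gbps")
print(f"QSPI sustained estimate: {qspi_peak * qspi_eff / 1e6:.0f} Mbps")
```

Longer bursts amortize the fixed overhead better, which is one reason cache-line-sized or DMA-driven accesses suit serial PSRAM more than scattered single-word reads.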
Enable the RK3308 QSPI/OPI controller driver and configure the PSRAM in memory-mapped mode, so that it appears in the system address space and can be used as system memory.
Trim the Linux kernel: Turn off unnecessary services and graphics modules to reduce the kernel size and leave more PSRAM for audio/voice algorithms.
PSRAM dedicated area: Allocate most of the PSRAM to the audio buffer, microphone array buffer, and AEC/NS algorithm memory pool.
System operation area: Allocate the remaining part to the Linux kernel, processes, and network protocol stack.
Disable the swap partition: PSRAM bandwidth is limited, and disabling swap avoids the performance degradation caused by paging.
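
The split between the dedicated audio area and the system area can be planned as a simple table of offsets and sizes. The following sketch assumes an 8 MB part, 4 KiB alignment, and a 25/15/30/30 split; all of these numbers, as well as the function and region names, are hypothetical placeholders rather than recommended values.

```python
def plan_psram(total_bytes: int, shares: dict, align: int = 4096) -> dict:
    """Split `total_bytes` into contiguous, `align`-aligned regions.

    `shares` maps region name -> fraction of the total (fractions must sum to 1.0).
    Returns {name: (offset, size)}; the last region receives whatever remains.
    """
    if abs(sum(shares.values()) - 1.0) > 1e-9:
        raise ValueError("shares must sum to 1.0")
    layout = {}
    offset = 0
    names = list(shares)
    for name in names[:-1]:
        # Round each region down to a multiple of `align`.
        size = int(total_bytes * shares[name]) // align * align
        layout[name] = (offset, size)
        offset += size
    layout[names[-1]] = (offset, total_bytes - offset)
    return layout


TOTAL = 8 * 1024 * 1024  # assumed 8 MB (64 Mbit) PSRAM
layout = plan_psram(TOTAL, {
    "audio_buffer": 0.25,      # decode and playback buffers
    "mic_array_buffer": 0.15,  # multi-channel capture buffers
    "aec_ns_pool": 0.30,       # algorithm memory pool
    "system": 0.30,            # kernel, processes, network stack
})

for name, (offset, size) in layout.items():
    print(f"{name:<18} offset=0x{offset:06x} size={size / 1024:.0f} KiB")
```

Giving the remainder to the last region keeps the layout gap-free even when the rounded shares do not add up exactly to the total.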
Adjust the memory allocation interface of mainstream voice SDKs, and point algorithm temporary variables and model caches to the PSRAM area.
Optimize VAD and wake-up word detection processes: Hardware VAD results are directly written to PSRAM, and the CPU only performs subsequent processing to reduce load.
Enable DMA data transfer: I2S audio streams and microphone array data are written directly to PSRAM by DMA, without the CPU copying the data, which improves real-time performance.
Optimize the audio decoding library: The decoding buffer resides in PSRAM to reduce Flash access and ensure controllable decoding delay.
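
Pointing an SDK's temporary variables and model caches at a dedicated PSRAM area usually amounts to handing the SDK an allocator that draws from a fixed, pre-reserved region. Below is a minimal sketch of such a pool as a bump (arena) allocator. It is not the allocation interface of any particular voice SDK: the class `Arena`, the 64-byte alignment, and the 1 MiB `bytearray` standing in for the PSRAM region are all assumptions made for illustration.

```python
class Arena:
    """Bump allocator over a fixed, pre-reserved byte region."""

    def __init__(self, region: bytearray, align: int = 64):
        self._view = memoryview(region)
        self._align = align
        self._top = 0  # first free byte

    def alloc(self, size: int) -> memoryview:
        """Return a writable view of `size` bytes, aligned to `align`."""
        start = (self._top + self._align - 1) // self._align * self._align
        end = start + size
        if end > len(self._view):
            raise MemoryError("arena exhausted")
        self._top = end
        return self._view[start:end]

    def reset(self) -> None:
        """Release everything at once, e.g. between recognition sessions."""
        self._top = 0

    @property
    def used(self) -> int:
        return self._top


region = bytearray(1024 * 1024)  # stand-in for the reserved PSRAM area
pool = Arena(region)

scratch = pool.alloc(1000)      # algorithm temporaries
model_cache = pool.alloc(4096)  # cached model data
scratch[0] = 0xAB               # writes land in the backing region

print("bytes used:", pool.used)
```

A bump allocator has no per-block free operation, which keeps allocation time constant and avoids fragmentation; the trade-off is that memory is only reclaimed by resetting the whole pool.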
Main control: RK3308 standard version
PSRAM capacity: Small-capacity PSRAM
System: Trimmed Linux + lightweight voice SDK
Advantages: Lowest cost, most stable supply, meeting entry-level needs
Main control: RK3308B industrial version (with CAN, suitable for gateway scenarios)
PSRAM capacity: Medium-capacity PSRAM, OPI interface recommended
System: Complete Linux + mainstream voice SDK
Advantages: Sufficient bandwidth, stable operation, adapting to mainstream mass production configurations
Main control: RK3308B industrial version
PSRAM capacity: Large-capacity PSRAM or multi-chip parallel connection
System: Complete Linux + full-featured voice SDK + high-definition audio decoding
Advantages: Performance close to the DDR3 solution, obvious supply and cost advantages
Solving supply chain pain points: Addresses DDR3 shortage and long delivery time; PSRAM has stable supply to ensure continuous mass production.
Significant cost optimization: Reduces both BOM cost and PCB cost, suitable for large-batch smart speaker products.
Good scenario fit: Bandwidth and capacity cover the audio decoding and complex application needs of RK3308 smart speakers and voice central controls.
Low migration cost: RK3308 natively supports QSPI/OPI PSRAM, with mature Linux drivers and low software adaptation workload.
Telephone/Mobile: +86-150-1290-5940
Email: sales@manduic.com
Address: Room 618, 6th Floor, Derun Building, No. 366 Chaofeng Road, Fenghuang Street, Guangming District, Shenzhen, China