How do I optimize code for faster execution?
Optimizing code for faster execution, especially on embedded targets such as STM32 microcontrollers, means making the CPU run your program more efficiently. This involves algorithmic improvements, compiler settings, hardware-aware coding, and memory access optimization.
Here’s a breakdown of strategies you can apply:
1. Algorithmic Optimization (Most Important)
Use efficient algorithms and data structures:
- Replace O(n²) loops with O(n log n) when possible (e.g., sorting).
- Minimize nested loops and redundant computations.
- Cache reused calculations.
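A minimal sketch of the "cache reused calculations" idea, assuming a buffer of 16-bit samples that should be scaled to a percentage of the largest sample. The function names (find_max, scale_to_percent_slow, scale_to_percent_fast) are hypothetical and chosen only for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Helper: largest value in buf (returns 0 for an empty buffer). */
uint16_t find_max(const uint16_t *buf, size_t n)
{
    uint16_t max = 0;
    for (size_t i = 0; i < n; i++) {
        if (buf[i] > max) {
            max = buf[i];
        }
    }
    return max;
}

/* Slow: find_max() is re-run for every element, so the work is O(n^2). */
void scale_to_percent_slow(const uint16_t *in, uint8_t *out, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        uint16_t max = find_max(in, n);
        out[i] = (max == 0) ? 0 : (uint8_t)(((uint32_t)in[i] * 100u) / max);
    }
}

/* Faster: the maximum does not change inside the loop, so it is
 * computed once and reused, which brings the work down to O(n). */
void scale_to_percent_fast(const uint16_t *in, uint8_t *out, size_t n)
{
    const uint16_t max = find_max(in, n);
    for (size_t i = 0; i < n; i++) {
        out[i] = (max == 0) ? 0 : (uint8_t)(((uint32_t)in[i] * 100u) / max);
    }
}
```

Both functions produce the same output. An optimizing compiler may sometimes hoist such a call on its own, but it is safer not to rely on that.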
2. Use Compiler Optimization Flags
If you're using GCC (as in STM32CubeIDE):
- -O0: No optimization (default for debugging).
- -O1, -O2, -O3: Increasing levels of optimization.
- -Os: Optimize for size (useful in constrained environments).
- -Ofast: Aggressive optimizations (-O3 plus options such as -ffast-math) that may break strict IEEE 754 floating-point behavior.
Recommended: Set your Release build to -O2 or -O3 in Project > Properties > C/C++ Build > Settings > Tool Settings > Optimization.
3. Use Fixed-Point Instead of Floating-Point (If No FPU)
- Many STM32 chips (e.g., STM32F103) don’t have an FPU.
- Floating-point math (float, double) is slow if emulated in software.
- Use integers or fixed-point libraries.
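A minimal sketch contrasting a floating-point conversion with an integer-only one. The 12-bit ADC range and 3.3 V reference are assumptions made for illustration, and all names (adc_to_volts_float, adc_to_millivolts, q16_mul) are hypothetical.

```c
#include <stdint.h>

/* Assumed for illustration: 12-bit ADC, 3.3 V reference. */
#define ADC_MAX_COUNT   4095u
#define VREF_MILLIVOLTS 3300u

/* Slow without an FPU: each float operation becomes a software routine. */
float adc_to_volts_float(uint16_t raw)
{
    return ((float)raw * 3.3f) / 4095.0f;
}

/* Faster: integer-only, result in millivolts, rounded to nearest. */
uint32_t adc_to_millivolts(uint16_t raw)
{
    return ((uint32_t)raw * VREF_MILLIVOLTS + ADC_MAX_COUNT / 2u) / ADC_MAX_COUNT;
}

/* Q16.16 fixed point: 16 integer bits, 16 fractional bits. */
typedef int32_t q16_16_t;
#define Q16_ONE ((q16_16_t)1 << 16)

/* Multiply two Q16.16 values using a 64-bit intermediate.
 * Note: right-shifting a negative value is implementation-defined in C;
 * GCC performs an arithmetic shift. */
q16_16_t q16_mul(q16_16_t a, q16_16_t b)
{
    return (q16_16_t)(((int64_t)a * b) >> 16);
}
```

Working in millivolts instead of volts keeps the useful resolution without needing any fractional type.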
4. Avoid Dynamic Memory Allocation
- Avoid using malloc() and free() inside performance-critical loops.
- Prefer static or stack allocation.
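One common way to avoid the heap is a fixed-size block pool backed by static storage. The following is a minimal sketch under stated assumptions: the pool dimensions and the names pool_alloc and pool_free are hypothetical, and the code is not interrupt-safe as written.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define POOL_BLOCKS 4u   /* assumed number of blocks */
#define BLOCK_SIZE  64u  /* assumed block size in bytes */

/* All storage is reserved at link time, so there is no heap
 * fragmentation and the worst-case allocation time is bounded. */
static uint8_t pool_storage[POOL_BLOCKS][BLOCK_SIZE];
static bool pool_in_use[POOL_BLOCKS];

/* Returns a free block, or NULL when the pool is exhausted. */
void *pool_alloc(void)
{
    for (size_t i = 0; i < POOL_BLOCKS; i++) {
        if (!pool_in_use[i]) {
            pool_in_use[i] = true;
            return pool_storage[i];
        }
    }
    return NULL;
}

/* Marks a block as free again; unknown pointers are ignored. */
void pool_free(void *block)
{
    for (size_t i = 0; i < POOL_BLOCKS; i++) {
        if (block == (void *)pool_storage[i]) {
            pool_in_use[i] = false;
            return;
        }
    }
}
```

If blocks are allocated or freed from interrupt handlers as well as from the main loop, the flag updates would need to be protected, for example by briefly disabling interrupts.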
5. Inline Functions and Loop Unrolling
- Use inline for small, frequently called functions.
- Unroll loops if they are small and constant-bounded.
Example:
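A minimal sketch of both techniques, assuming a four-element dot product as the small, constant-bounded loop. The names clamp_i32, dot4, and dot4_unrolled are hypothetical.

```c
#include <stdint.h>

/* Small helper marked static inline so the compiler can place the
 * body directly at each call site and skip the call overhead. */
static inline int32_t clamp_i32(int32_t v, int32_t lo, int32_t hi)
{
    return (v < lo) ? lo : ((v > hi) ? hi : v);
}

/* Rolled version: loop counter, compare, and branch on every pass. */
int32_t dot4(const int16_t a[4], const int16_t b[4])
{
    int32_t acc = 0;
    for (int i = 0; i < 4; i++) {
        acc += (int32_t)a[i] * b[i];
    }
    return acc;
}

/* Manually unrolled version: same result, no loop bookkeeping. */
int32_t dot4_unrolled(const int16_t a[4], const int16_t b[4])
{
    return (int32_t)a[0] * b[0]
         + (int32_t)a[1] * b[1]
         + (int32_t)a[2] * b[2]
         + (int32_t)a[3] * b[3];
}
```

At higher optimization levels GCC may unroll a loop like dot4 by itself, so it is worth measuring before unrolling by hand; manual unrolling also increases code size.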
6. Memory Optimization
- Place time-critical functions in RAM (use __attribute__((section(".ramfunc")))) if flash wait states are an issue. The section name must match one that your linker script maps to RAM.
- Minimize cache misses or bus contention (if using DMA or peripherals).
- Optimize flash-to-RAM access when using constant tables.
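As a small illustration of the constant-table point: with the GCC linker scripts typically used for STM32, data declared const stays in flash, while initialized non-const data is copied to RAM at startup. Which placement is faster depends on the part's flash wait states and caches, so it is worth measuring both. The table and function names below are hypothetical.

```c
#include <stdint.h>

/* Declared const: typically placed in flash (.rodata) and costs no RAM.
 * Dropping const would typically move the table to RAM (.data), which
 * can help when flash wait states dominate, at the price of 16 bytes
 * of RAM and a startup copy. */
static const uint8_t nibble_bits[16] = {
    0, 1, 1, 2, 1, 2, 2, 3,
    1, 2, 2, 3, 2, 3, 3, 4
};

/* Counts the set bits in a byte with two table lookups
 * instead of an eight-iteration loop. */
uint8_t popcount8(uint8_t value)
{
    return (uint8_t)(nibble_bits[value & 0x0Fu] + nibble_bits[value >> 4]);
}
```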
7. Use DMA (Direct Memory Access)
Offload data transfer (e.g., ADC, UART, SPI) to DMA so the CPU can focus on processing instead of moving data.
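The usual pattern is a double ("ping-pong") buffer: DMA fills one half while the CPU processes the other. The sketch below is hardware-independent and makes several assumptions: the buffer size, the names on_dma_half_complete, on_dma_full_complete, and process_ready_block are hypothetical, and on a real STM32 the two notification functions would be called from the DMA half-transfer and transfer-complete callbacks of your driver.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define HALF_LEN 4u  /* assumed samples per half buffer */

/* In a real project DMA would fill this buffer in circular mode. */
volatile uint16_t dma_buf[2u * HALF_LEN];

static volatile bool first_half_ready;
static volatile bool second_half_ready;

/* To be called from the DMA half-transfer interrupt (assumption). */
void on_dma_half_complete(void) { first_half_ready = true; }

/* To be called from the DMA transfer-complete interrupt (assumption). */
void on_dma_full_complete(void) { second_half_ready = true; }

/* Example processing step: sum one half of the buffer. */
static uint32_t sum_half(size_t offset)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < HALF_LEN; i++) {
        sum += dma_buf[offset + i];
    }
    return sum;
}

/* Called from the main loop. Processes whichever half DMA has
 * finished writing; returns false when no new data is available. */
bool process_ready_block(uint32_t *sum_out)
{
    if (first_half_ready) {
        first_half_ready = false;
        *sum_out = sum_half(0);
        return true;
    }
    if (second_half_ready) {
        second_half_ready = false;
        *sum_out = sum_half(HALF_LEN);
        return true;
    }
    return false;
}
```

The CPU only touches a half after DMA has finished with it, and it must finish processing before DMA wraps around to that half again.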
8. Profile and Benchmark
- Use cycle counters or SysTick timer to measure execution time of functions.
- STM32CubeIDE includes SWV (Serial Wire Viewer) and ITM trace for profiling (on supported MCUs).
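A minimal, hardware-independent sketch of timing a function with a free-running 32-bit cycle counter. The counter is passed in as a function pointer so the code runs anywhere; on Cortex-M3/M4/M7 parts the reader would typically return DWT->CYCCNT after the counter has been enabled. All names here, including the fake_ stand-ins, are hypothetical.

```c
#include <stdint.h>

typedef uint32_t (*cycle_read_fn)(void);
typedef void (*work_fn)(void);

/* Returns the number of counter ticks spent in work().
 * Unsigned subtraction gives the right answer even if the
 * counter wraps around once during the measurement. */
uint32_t measure_cycles(cycle_read_fn read_cycles, work_fn work)
{
    const uint32_t start = read_cycles();
    work();
    const uint32_t end = read_cycles();
    return end - start;
}

/* Stand-ins for a hardware counter and a workload, so the sketch
 * can be run on a PC. The counter starts close to overflow on
 * purpose to exercise the wraparound case. */
uint32_t fake_counter = 0xFFFFFFF0u;
uint32_t fake_read_cycles(void) { return fake_counter; }
void fake_work(void) { fake_counter += 100u; }
```

The measured value includes the small overhead of the two counter reads and the call, so very short functions are better timed over many repetitions.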
9. Avoid Expensive Operations
Operation | Faster Alternative |
---|---|
pow(x, 2) | x * x |
Division (/) by a power of 2 | Right shift (for unsigned values) |
Modulo (%) by a power of 2 | Bitmask (for unsigned values) |
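The replacements in the table can be sketched as follows, assuming unsigned operands and a ring buffer whose size is a power of two. The names square_u32, div_by_8, and ring_next are hypothetical.

```c
#include <stdint.h>

#define RING_SIZE 16u               /* must be a power of two */
#define RING_MASK (RING_SIZE - 1u)

/* x * x instead of pow(x, 2): no floating-point library call. */
uint32_t square_u32(uint32_t x)
{
    return x * x;
}

/* Right shift by 3 instead of dividing by 8 (unsigned only). */
uint32_t div_by_8(uint32_t x)
{
    return x >> 3;
}

/* Bitmask instead of (index + 1) % RING_SIZE to wrap an index. */
uint32_t ring_next(uint32_t index)
{
    return (index + 1u) & RING_MASK;
}
```

For unsigned operands and constant powers of two, optimizing compilers usually make these substitutions themselves, so the manual form matters most at -O0 or when the divisor is not a compile-time constant. For negative signed values, a shift and a division can give different results.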
Summary
Technique | Benefit |
---|---|
Use -O2 or -O3 compiler flags | Basic speedup via compiler |
Optimize algorithms | Usually the largest speed gain |
Replace float with int | Much faster on MCUs without an FPU |
Use DMA for data movement | Frees CPU cycles |
Avoid malloc in real-time code | Avoids fragmentation and unpredictable allocation time |
Profile and benchmark time-critical code | Targets bottlenecks |