Rant incoming.
Back in my uni days, I was a professor’s assistant teaching practical courses. A student came by to ask for a second power supply (each desk had one). Instead of just handing him one, which would have been the easy thing, I asked why he needed it.
“I need split rail supplies” he answered.
Round 2. Why do you need split power rails?
“I need to supply an opamp”
(Yes, I know you can do that just fine with single supply. But I don’t think we’re there yet.)
Round 3… Why do you need an opamp?
“I want to light up an LED.”
Right.
Obviously, he walked away with the transistor he needed rather than the supply he asked for. Probably not happy with that annoying professor’s assistant and his pesky questioning. That was a lesson for him and for me: don’t expect gratitude even when you try to help people.
Fast-forward 20 years. Hello, grey hairs. Goodbye, slim waist.
What is better for [insert algorithm], a PC or a DSP?
The term DSP gets thrown around very easily because the world has gone digital and everything needs processing. We live in such a gluttony of resources that I’m sitting here with an 8-core i7 machine running at 3.6 GHz just so that I can stare at a cursor blinking at 0.5 Hz and enter some text. How this is legal still baffles me. And somehow we have even convinced ourselves this is normal. In fact, we have come to expect it, and this is why people who work on embedded systems (like me) are running into problems we never expected – unreasonably unreasonable requirements.
“So, why use a puny SigmaDSP, Blackfin, Sharc, Cortex M4/FPU, XMOS, TMS320, etc etc when you have a great PC standing on your desk?”
| | PC | embedded DSP |
|---|---|---|
| Raw clock cycles | several GHz | 1–500 MHz |
| Cores | 1–32 | 1–2, with dedicated peripherals for specialized functions |
| Memory | cheap and plenty | expensive and limited |
| Programming | almost too easy; design-abstracting tools such as Pure Data, MATLAB (R)(TM)(WTF) and Python widely available | at least knowledge of algorithmic optimization required for optimal results; knowledge of software layers and C coding for bleeding edge; intimate knowledge of the target processor and machine assembly coding |
| Timing accuracy and latency | not much below 5 ms; results vary with hardware and OS load | guaranteed sample-accurate, <1 ms, down to a few ns on specialized processors |
| Code density (aka MIPS per MHz) | mostly NOPs | medium on a framework/HAL; high in bare-metal C; insane when assembly-optimized |
| Code power (aka MIPS per mW) | ~1 GFLOPS for 100 W = 10 MFLOPS/W | 450 MFLOPS/W (ADSP-SC589, excluding HW accelerators); 1893 DMIPS/W (STM32F446) |
| Code cost (aka MIPS per $) | ~1 GFLOPS for $1000 = 1 MFLOPS/$ | 900 MFLOPS / $36 = 25 MFLOPS/$ (ADSP-SC589, excluding HW accelerators); 225 DMIPS / $6 = 37.5 DMIPS/$ (STM32F446) |
The table above should yield a bit of insight. Even though I’ve pulled some of these numbers out of thin air, it at least indicates the rough orders of magnitude. It appears that when you need an algorithm application optimized for cost, power and efficiency (aka a product), you are going to need an embedded system of some sort. If you want to quickly develop a proof of concept and you have a PC standing around (who doesn’t?), then that is the obvious way to go.
Now comes the inevitable. Fancy schmancy algorithm XYZ is quickly developed on a PC and demonstrated successfully to customers, product managers and, perhaps worst of all, investors. And the story is sold on the byline “we just need a swift porting onto a DSP to go to market” – and that is when the consultant’s phone rings.
“Can you just quickly throw this 16,000-tap FIR running at 192 kHz onto a $5 DSP? Can you have it done in two weeks? That’s what we promised our investors.”
If I haven’t hung up on you at this point, I will try to understand what you want before trying to figure out how to give you what you are asking for. Because after resolving the physical impossibilities and getting you to make up your mind whether to go over budget, spec, time planning or all of the above, you will still end up with a very sub-optimal solution.
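About those physical impossibilities: the back-of-the-envelope arithmetic is worth spelling out. A direct-form FIR costs one multiply-accumulate per tap per sample, so the request above (taking the 16,000 taps and 192 kHz at face value) works out to:

```python
# MAC budget for a direct-form FIR: one multiply-accumulate per tap per sample.
taps = 16_000
fs = 192_000  # sample rate in Hz

macs_per_second = taps * fs
print(f"{macs_per_second / 1e9:.2f} GMAC/s required")  # 3.07 GMAC/s
```

That is over 3 billion MACs per second of sustained throughput for a single channel, before counting memory bandwidth for the 16,000-sample delay line – not something a $5 DSP delivers.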
Embedded DSP designs are fundamentally different from PC algorithms.
That means there is no porting as such, but rather a redesign. Your proof of concept is just that, and as smoothly as it runs on an i9-9900K, it needs 2 GB of memory and 80% CPU, latency is anywhere from 200 to 800 ms, and why on God’s green earth do you need 16,000 FIR taps at 192 kHz to make a shelving filter at 85 Hz? The answer is, without fail, the same every time:
“We are doing something unique that we can’t disclose, so it has to be this way.”
OK, good luck building it yourself.
However, if you want, I can give you the transistor, and I will explain why that is by far the superior solution for your LED.
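In this analogy, the transistor is a single biquad. As a sketch (the function names and parameter values here are mine, using the widely cited RBJ Audio EQ Cookbook low-shelf formulas), an 85 Hz shelving filter takes exactly one second-order IIR section:

```python
import math

def low_shelf_coeffs(fs, f0, gain_db, S=1.0):
    """Low-shelf biquad per the RBJ Audio EQ Cookbook, normalized so a0 == 1."""
    A = 10 ** (gain_db / 40)
    w0 = 2 * math.pi * f0 / fs
    cosw = math.cos(w0)
    alpha = math.sin(w0) / 2 * math.sqrt((A + 1 / A) * (1 / S - 1) + 2)
    sqA2a = 2 * math.sqrt(A) * alpha
    b0 = A * ((A + 1) - (A - 1) * cosw + sqA2a)
    b1 = 2 * A * ((A - 1) - (A + 1) * cosw)
    b2 = A * ((A + 1) - (A - 1) * cosw - sqA2a)
    a0 = (A + 1) + (A - 1) * cosw + sqA2a
    a1 = -2 * ((A - 1) + (A + 1) * cosw)
    a2 = (A + 1) + (A - 1) * cosw - sqA2a
    return [b0 / a0, b1 / a0, b2 / a0], [1.0, a1 / a0, a2 / a0]

def biquad(x, b, a):
    """Direct Form I: five multiplies and four adds per sample, total."""
    out = []
    x1 = x2 = y1 = y2 = 0.0
    for xn in x:
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        out.append(yn)
    return out

# A +6 dB shelf at 85 Hz, 192 kHz sample rate (illustrative numbers).
b, a = low_shelf_coeffs(fs=192_000, f0=85.0, gain_db=6.0)
```

Five multiply-accumulates per sample instead of 16,000: a few thousand times cheaper, and it fits on the $5 DSP with room to spare. That is the difference between porting and redesigning.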
Here’s the takeaway: there is nothing wrong with running DSP on a PC. The thing is there anyway, easy to configure, no programming required, so the creative souls can run free, sip their lattes and point-and-click something together. But when the end target is an embedded product at a certain [power, price, performance, latency, project timeline] constellation, you need to involve an embedded expert from the start. A non-optimized “secret sauce” algorithm, running on hardware tweaked to the point of counting assembly instructions just to make it work, is at best sub-optimal and an affront to the hard-working DSP designer, and at worst a nightmare from hell for the developers, project managers, customers, investors and, in the end, the promising company. But hey, you get to keep your unique idea… all to yourself.