Monday, October 18, 2010

Reduce NRE, save big bucks

Hardware designers will shriek when you propose adding processors just to accelerate the software schedule. Though they know transistors have little or no cost, the EE zeitgeist is to always minimize the bill of materials. Yet since the dawn of the microprocessor age, it has been routine to add parts just to simplify the code. No one today would consider building a software UART, though it's quite easy to do and wasn't terribly uncommon decades ago. Implement asynchronous serial I/O in code and the structure of the entire program revolves around the software UART's peculiar timing requirements. Consequently, the code becomes a nightmare. So today we add a hardware UART without complaint. The same can be said about timers, pulse-width modulators, and more. The hardware and software interact as a synergistic whole orchestrated by smart designers who optimize both product and engineering costs.

Sum the hardware component prices, add labor and overhead, and you still haven't properly accounted for the product's cost. Non-recurring engineering (NRE) is just as important as the price of the printed circuit board. Detroit understands this. It can cost more than $2 billion to design and build the tooling for a new car. Sell one million units and the consumer must pay $2,000 above the component costs to amortize those NRE bills.
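The amortization arithmetic above can be sketched in a few lines. This is only an illustration: the function names are hypothetical, and the only figures taken from the text are the $2 billion NRE bill and the one million units.

```python
def amortized_nre_per_unit(nre_dollars: float, units_sold: int) -> float:
    """Share of the one-time engineering bill that each unit sold must carry."""
    if units_sold <= 0:
        raise ValueError("units_sold must be positive")
    return nre_dollars / units_sold


def unit_cost(component_cost: float, nre_dollars: float, units_sold: int) -> float:
    """Recurring (component) cost plus the amortized NRE share."""
    return component_cost + amortized_nre_per_unit(nre_dollars, units_sold)


# The car example from the text: $2 billion of NRE spread over one million units.
per_car = amortized_nre_per_unit(2_000_000_000, 1_000_000)
print(per_car)  # 2000.0
```

Because the NRE term shrinks with every dollar of engineering saved, cutting development cost lowers the per-unit figure just as surely as trimming the bill of materials does.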

Similarly, when we in the embedded world save NRE dollars by delivering products faster, we reduce the system's recurring cost. Everyone wins.

Sure, there's a cost to adding processors, especially when doing so means populating more chips on the printed circuit board. But transistors are particularly cheap inside an ASIC. A full 32-bit CPU can consume as little as 20,000 to 30,000 gates. Interestingly, users of configurable processor cores average about six 32-bit processors per ASIC, with at least one customer's chip using more than 180 processors! So if time to market is really important to your company (and when isn't it?), if the code naturally partitions well, and if CPUs are truly cheap, what happens when you break all the rules and add lots of processors? Using the COCOMO numbers, a program of one million lines of code (LoC) divided over 100 CPUs can be completed five times faster than with the traditional monolithic approach, at about one-fifth the cost.
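A rough way to see where a figure like "five times" can come from is the COCOMO-style effort law, in which effort grows as program size raised to an exponent greater than one. The sketch below is a guess at the arithmetic, not the calculation behind the number quoted above. The function names are hypothetical, the exponent is an assumption (the text does not state one), and the model assumes the pieces are fully independent, with no integration or communication overhead.

```python
def effort(kloc: float, exponent: float, coefficient: float = 1.0) -> float:
    """COCOMO-style effort estimate: coefficient * KLOC ** exponent."""
    return coefficient * kloc ** exponent


def partition_gain(total_kloc: float, n_parts: int, exponent: float) -> float:
    """Ratio of monolithic effort to the summed effort of n equal, independent parts.

    The coefficient cancels, so the ratio reduces to n_parts ** (exponent - 1).
    """
    monolithic = effort(total_kloc, exponent)
    partitioned = n_parts * effort(total_kloc / n_parts, exponent)
    return monolithic / partitioned


# One million lines (1,000 KLOC) split evenly across 100 CPUs.
# An assumed exponent of 1.35 gives a gain of about 5.
print(round(partition_gain(1000, 100, 1.35), 2))
# Basic COCOMO's embedded-mode exponent of 1.20 gives a gain of about 2.5.
print(round(partition_gain(1000, 100, 1.20), 2))
```

The gain depends only on the number of partitions and the exponent, so the payoff is very sensitive to how steeply effort really grows with size on a given project.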

Extreme partitioning is the silver bullet for the software productivity crisis.

Intel's pursuit of massive performance gains has given the industry a notion that processors are big, ungainly, transistor-consuming, heat-dissipating chips. That's simply not true; the idea only holds water in the strange byways of the desktop world. An embedded 32-bitter is only about an order of magnitude larger than the 8080, the first really useful microprocessor, which was introduced in 1974.

Obviously, not all projects partition as cleanly as those described here. Some do better, such as those that handle lots of fast data in a uniform manner, where the same code might live on many CPUs. Others aren't so orthogonal. But only a very few systems fail to benefit from clever partitioning.
