Dan Gardner's Blog

C Synthesis is actually quite easy, comparatively

August 1st, 2009

For those of you ASIC and FPGA hardware designers who have been dismissing C++ as a hardware description language (HDL), this blog post is for you.  I’ve been thinking back to some of my first hardware design projects to prove to myself that it is really much easier to learn C++ for hardware design than it was to figure out VHDL or Verilog for the first time.

Maybe it is all the gray hair showing my decade and a half of experience in hardware design, but I don’t think so.  When I came out of college in the early ’90s, hardware design was in flux.  Many designers were still using schematic entry, while some were using special languages like ABEL and PALASM for CPLDs and early FPGAs.  Cutting-edge ASIC designers moved to VHDL or Verilog depending on their geographic location or end-application requirements.  VHDL gave users more control of the hardware description by being a stricter language, while Verilog allowed more flexibility and a higher level of abstraction and was often compared to C.  I didn’t pick sides in this language battle and just learned both.

Doing stuff with schematic entry was really painful for someone like me who has trouble drawing stick figures, much less hooking up complex circuits from low-level primitives.  I tended to think of things top down, in terms of functionality.  My interests growing up may have influenced this approach.  As a kid, I didn’t play around with circuit boards, except for learning how to solder one summer in my dad’s physics lab.  I grew up wanting to write games for my TRS-80 Color Computer, so I learned BASIC.  Later, I learned Turbo Pascal while getting my BSEE at UCSD, plus a little Modula from the Math department.  Somewhere down the line, I had to learn Tcl and Perl for scripting.  C/C++ followed naturally from all these other things.  I didn’t use much C/C++ in college, but I did later when doing embedded work at Lattice, as well as when writing some console applications.

When I started my first projects for PALs, GALs and CPLDs, I struggled mightily with the tight constraints of thinking in low-level primitives and library macros.  As I mentioned, I didn’t grow up thinking in pictures, so I quickly moved to ABEL.  It was like programming, except for this darn tendency of hardware to run in parallel.  By the mid-’90s, VHDL synthesis had broken into CPLDs and FPGAs.  It allowed a much higher level of abstraction, but it was really tough to figure out what you were going to get from the synthesis tools.  A ton of the language wasn’t supported for synthesis, or had different effects in different tools.  You really had to experiment, follow examples or comb through RTL schematic viewers to figure out what the hardware was going to do.  Simulation tools were a rarity for most of us, plus there was all that extra work of writing a test bench and getting another entire tool flow to work.

The growth of ASIC designs has been slowed by verification problems; mistakes can kill startups or careers.  The growth of FPGAs has been limited by the need to learn VHDL or Verilog.  People from other disciplines, such as physicists or algorithm researchers, find the programming too painful to be worth the access to a fast, programmable hardware platform, so they just live with slow, general-purpose processors running software.  On the other hand, it is really easy to hack stuff together in C code to make things functional at the software level with MSVC++ or GNU gcc.  The early knocks on high-level synthesis (HLS) tools were that they couldn’t match hand-coded RTL results or couldn’t synthesize the entire chip.  John Cooley recently found that many engineers are interested in hearing how other engineers use HLS in real designs.  Recent announcements show how HLS tools are maturing toward full-chip synthesis while keeping the flexibility of coding in C.

While learning a new language has its challenges, C is not a new language for most of us.  Applying it to hardware requires learning some new tricks, but the gap between C++ and its synthesizable subset is much smaller than the gap between the full VHDL or Verilog languages and the register transfer level (RTL) subset supported by synthesis.  Basically, memory requirements need to be statically determinable (no malloc or new allowed), and modeling concurrency in a sequential language requires some extra care.  Still, if you can get the basic functionality working in C++, then HLS becomes a matter of refining the C code to build more efficient hardware.  In my experience, debugging the functionality in a C debugger is much easier than in VHDL or Verilog simulation, although HDL simulators are really useful for checking that the hardware works the way you expect.
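As a minimal sketch of the “no malloc or new” rule, here is a hypothetical 16-entry character buffer (the CharFifo name and the depth are made up for illustration; they are not from the original design). Its storage is sized at compile time, so a synthesis tool can know exactly how much memory to build, whereas something like new unsigned char[n] with a run-time n would not be statically determinable.

```cpp
// Hypothetical example: storage size is fixed at compile time.
const int DEPTH = 16;                 // assumed depth, chosen for illustration

struct CharFifo {
    unsigned char data[DEPTH];        // statically sized: no new or malloc
    int head = 0;                     // index of the oldest entry
    int count = 0;                    // number of valid entries

    // Returns false instead of growing when the buffer is full.
    bool push(unsigned char c) {
        if (count == DEPTH) return false;
        data[(head + count) % DEPTH] = c;
        ++count;
        return true;
    }

    // Returns false when there is nothing to read.
    bool pop(unsigned char *c) {
        if (count == 0) return false;
        *c = data[head];
        head = (head + 1) % DEPTH;
        --count;
        return true;
    }
};
```

The design choice is to handle “full” as a status result rather than allocating more memory, which is what fixed hardware would have to do anyway.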

To continue on the topic of my previous blog posts, I wanted to spend some more time on my serial communication experiment.  Getting the transmitter to send a static character string had proven pretty easy last time, but implementing the receive and transmit functions together had to be tougher.  Didn’t it?  It was a bit, but not nearly as bad as I had imagined.  I still don’t have a fully functioning UART, but it does work as a terminal now.  My goal was to echo back characters typed in HyperTerminal from the Altera NIOS II development system used in my previous blog examples (pictured below) at 115,200 baud.

Altera NIOS II board

I kept things pretty simple.  For the receive side, I waited for the start bit (a 0 on RXD).  Then I delayed half a bit period to get to the middle of the start bit.  Next, I read each of the eight data bits, one bit period apart, and packed them into a byte.  Once the stop bit was read, I transmitted the character.  I left off any error checking on RXD as an exercise for anyone with more time on their hands than me.
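The receive steps above could be sketched roughly as follows.  This is not the actual get_byte() (the full code is on the ESL Communities forum); it is a hypothetical version that matches the get_byte() call in the top-level function, assumes one call per clock cycle (as with the II=1 pipelining described later) and assumes the 434-clock bit period for 115,200 baud from a 50MHz clock.  The state names are made up, and like the original it does no error checking on RXD.

```cpp
// Hypothetical receive sketch: one call models one 50MHz clock cycle.
const int BAUD_COUNT = 434;            // clocks per bit at 115,200 baud
const int HALF_BAUD  = BAUD_COUNT / 2; // 217 clocks: half a bit period

void get_byte(bool *rxd, unsigned char *rcv_byte, bool *byte_rcvd)
{
    enum State { IDLE, START, DATA, STOP };
    static State state = IDLE;
    static int count = 0;              // clocks spent in the current step
    static int bit_idx = 0;            // next data bit to sample (0..7)
    static unsigned char shift = 0;    // byte being assembled

    *byte_rcvd = false;                // default: no new byte this cycle

    switch (state) {
    case IDLE:
        if (!*rxd) {                   // line went low: start bit
            state = START;
            count = 0;
        }
        break;
    case START:
        if (++count == HALF_BAUD) {    // now in the middle of the start bit
            state = DATA;
            count = 0;
            bit_idx = 0;
            shift = 0;
        }
        break;
    case DATA:
        if (++count == BAUD_COUNT) {   // middle of the next data bit
            count = 0;
            if (*rxd)
                shift |= static_cast<unsigned char>(1u << bit_idx); // LSB first
            if (++bit_idx == 8)
                state = STOP;
        }
        break;
    case STOP:
        if (++count == BAUD_COUNT) {   // middle of the stop bit (not checked)
            count = 0;
            state = IDLE;
            *rcv_byte = shift;         // hand the byte to the transmitter
            *byte_rcvd = true;         // one-cycle "byte ready" pulse
        }
        break;
    }
}
```

Writing byte_rcvd on every call (false by default, true for one cycle) keeps get_byte() as the only writer of the flag.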

My top-level design is just:

#pragma hls_design top
void uart(bool *txd, bool *rxd)
{
 static unsigned char rcv_byte; //8-bit storage of incoming character, with a copy sent to transmit once complete.
 static bool byte_rcvd=false; //shared variable between transmit and receive

 get_byte(rxd, &rcv_byte, &byte_rcvd);
 send_byte(txd, rcv_byte, &byte_rcvd);

 return;
}

The static declaration means the values are stored across calls to the function, which results in registers.  Initially, I had a conditional call to send_byte() at the top level based on byte_rcvd, but I found that pushing the conditional execution inside the send_byte() function gave me smaller area and more flexibility: the receive and transmit functions can be built to run sequentially or in parallel, depending on the constraints in the synthesis tool.

Notice that the byte_rcvd flag is passed by reference to both functions, so only a single shared variable is created at the top level; it is written only by get_byte() and read only by send_byte().  On the other hand, rcv_byte is passed by reference to get_byte() but by value to send_byte(), so send_byte() creates a local copy that it shifts out a bit at a time.
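A hypothetical send_byte() in the same style (again one call per clock and 434 clocks per bit; this is a sketch, not the author’s code) shows that split: it only reads byte_rcvd, latches its by-value rcv_byte argument into a static shift register when a new byte arrives, and then shifts out the start bit, eight data bits LSB first and the stop bit.  Ignoring a new byte while a transmission is still in progress is an assumption of this sketch.

```cpp
// Hypothetical transmit sketch: one call models one 50MHz clock cycle.
const int BAUD_COUNT = 434;           // clocks per bit at 115,200 baud

void send_byte(bool *txd, unsigned char rcv_byte, bool *byte_rcvd)
{
    static bool busy = false;         // currently sending a frame
    static int count = 0;             // clocks spent in the current bit
    static int bit_idx = 0;           // 0 = start, 1..8 = data, 9 = stop
    static unsigned char shift = 0;   // local copy of the byte being sent
    static bool line = true;          // TXD idles at logic 1

    if (!busy) {
        if (*byte_rcvd) {             // only read the flag; get_byte() owns it
            busy = true;
            shift = rcv_byte;         // copy of the by-value argument
            bit_idx = 0;
            count = 0;
            line = false;             // start bit
        }
    } else if (++count == BAUD_COUNT) {
        count = 0;
        ++bit_idx;
        if (bit_idx <= 8) {
            line = (shift & 0x01) != 0;  // least significant bit first
            shift = shift >> 1;
        } else if (bit_idx == 9) {
            line = true;              // stop bit
        } else {
            busy = false;             // stop bit has lasted one bit period
        }
    }
    *txd = line;
}
```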

I reused the constant transmit function from my previous design as my test bench, sending “Hello World” to the top-level function I planned to synthesize.  With some well-placed printf statements and the MSVC++ debugger (picture below), I verified that the test bench was sending the bits of each character and getting them back.

Bits in the debugger

By setting the top-level pipeline to II=1 in Catapult’s constraint editor, I evaluate the function every clock cycle, so the baud delay count I calculated for 115,200 baud from a 50MHz clock (434) corresponds to real clock cycles.  After generating the RTL, I use Catapult’s SCVerify flow to check that the timing looks right.  I launch Precision RTL Synthesis in batch mode from Catapult, and then run Altera’s Quartus II to generate the FPGA programming file.

After programming the FPGA, I launch HyperTerminal and start typing away.  Imagine that: the characters echo back to the screen as fast as I can type (picture below).

easy

For more stuff on Catapult C Synthesis, go to our product page to grab the datasheet or view more videos.

If you made it this far, take a few seconds to drop me a comment.

Thanks,
Dan

Hardware Engineers need to Communicate

July 22nd, 2009

Again, I found myself staring out the window, pondering my sophomoric attempt at a blog post.  Summer had finally arrived in Oregon with temperatures in the mid-90s.  Children from the onsite daycare wandered by looking really hot.  My own kids are spending the week in a day camp at a local park, having a really good time splashing in the water fountains and playing with other kids their age.  I had better make this good to keep all of you interested.

While my first blog post about blinking an LED on a demo board seemed to generate some interest from a few of you, I’d like to move on to the next step in explaining how to use C/C++ for hardware design.  This may be scary for some of you, since engineers are notorious introverts.  We’d rather send an email than walk to the office two doors down the hall and actually talk to someone.  Okay, maybe that is just me, but I’ve got a lot of stuff to do before I can go home and hang with the wife and kids.  Who has time for all this chit-chat?  I know your time is valuable, so I’m trying to make this informative and at least a little entertaining.  Drop me a comment and let me know what you think.  What do you want me to ponder next?

Back to what I meant to say in the first place: once you can blink an LED, you want to make sure you can communicate with the outside world.  Unless you are going to communicate via Morse code with an LED, a simple serial interface is the easiest one to build.  Of course, much more complicated interfaces like USB, Bluetooth, PCI and AMBA are more common now, but I’m only doing this in my spare time, so give me a break.

The Altera NIOS II demo board I used in the last blog post has an RS-232 serial port interface to the FPGA, including level-shifting buffers, so I could plug right into my laptop’s COM1 port.  I wondered how hard it would be to send “Hello World” from the board to my PC through a standard serial interface.  It’s been a while since I’ve played around with this stuff, so I leaned on Wikipedia to refresh what I needed to design.  I remembered that a standard protocol back in the days of dial-up modems was 115,200 baud with 8 data bits and 1 stop bit, but I couldn’t remember what that waveform was actually supposed to look like.  Wikipedia to the rescue.

Most embedded microcontrollers and microprocessors include a standard, built-in universal asynchronous receiver/transmitter (UART), but I remembered there was always something tricky about picking the correct oscillator value for an 8051 to get the exact baud rate you wanted.  Since I’m building a custom circuit with any clock divider resolution I need, this didn’t turn out to be a problem.  Once I figured out that the baud rate simply sets the time interval between bit transmissions (one bit period is 1/115,200 s, about 8.68 µs), I was well on my way.  With a 50MHz clock, I just needed a counter that counts to approximately 50e6/115200, or about 434.  Using a clock divider similar to the blinking LED example in my last blog post, I counted 434 clocks between bit transmissions.
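The divider arithmetic can be checked with a few lines of C++.  This is a sketch; the helper names are hypothetical.  Rounding 50e6/115200 (about 434.03) to the nearest integer gives 434, which produces an actual rate of about 115,207 baud.

```cpp
const long CLOCK_HZ = 50000000;   // 50MHz board oscillator
const long BAUD     = 115200;     // target baud rate

// Nearest whole number of clock cycles per bit.
constexpr long baud_divisor(long clk, long baud)
{
    return (clk + baud / 2) / baud;
}

// Percent error of the baud rate actually produced by divisor div.
double baud_error_pct(long clk, long baud, long div)
{
    double actual = static_cast<double>(clk) / div;
    return (actual - baud) * 100.0 / baud;
}

static_assert(baud_divisor(CLOCK_HZ, BAUD) == 434, "50MHz / 115200 rounds to 434");
```

The error works out to roughly +0.0064%, well within the few-percent mismatch that a receiver sampling in the middle of each bit can normally tolerate.  Half of the divisor, 217 clocks, is the half-bit delay used when receiving.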

My aging memory banks also recalled something about a master clock needing to be 16x the transmission rate.  That has to do with the asynchronous part of receiving a serial stream, so I’ll worry about it later if my loyal readers want to see me receive input as well as dish it out.  The transmission protocol is pretty simple: a start bit signals the start of a transmission, followed by 8 bits of data (least significant bit first) and terminated with a stop bit.  The start bit is active low (logic 0), while the stop bit is logic 1.  To test whether I was on the right track, I just toggled the TXD signal every 434 clock cycles and watched what came across the HyperTerminal screen in Windows.  I received a whole bunch of “U” characters.  Looking up the ASCII code (0x55) on Wikipedia, I confirmed that was what I expected: the first logic 0 was the start bit, then came the pattern “10101010”, followed by a logic 1 as the stop bit.  I might be able to pull this off.
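The “U” result can be reproduced in software.  This is a hypothetical sketch (the function names are made up): toggle_waveform() models a line that flips once per bit period, and decode_frame() reads ten bit periods as one frame with start bit, eight data bits LSB first, and stop bit.

```cpp
// Decode one frame from ten bit values (index 0 = start bit, 9 = stop bit).
// Returns the character, or -1 if the start or stop bit is wrong.
int decode_frame(const bool bits[10])
{
    if (bits[0] != false || bits[9] != true) return -1;  // start = 0, stop = 1
    int value = 0;
    for (int i = 0; i < 8; ++i)
        if (bits[1 + i]) value |= 1 << i;                 // data arrives LSB first
    return value;
}

// Model of the toggle experiment: TXD flips every bit period.
void toggle_waveform(bool bits[10])
{
    bool txd = false;              // the first low period is read as the start bit
    for (int i = 0; i < 10; ++i) {
        bits[i] = txd;
        txd = !txd;
    }
}
```

Decoding the toggled line gives 0,1,0,1,0,1,0,1,0,1: a start bit, data bits 1,0,1,0,1,0,1,0 (0x55, ASCII “U”) and a stop bit.  The next low period then starts the next frame, which is why the screen fills with U’s.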

At this point, I had little more than a blinking LED transmitting over the serial port.  Thinking back to the previous blog post title, I decided the message sent back to the PC should be “Hello World”.  To send an arbitrary character, I needed to send a start bit, followed by the 8-bit binary value of the ASCII character, and then a stop bit.  My constant send buffer is basically the characters in “Hello World” followed by a newline and a carriage return to make the HyperTerminal display look pretty.  In VHDL or Verilog, I’d have to create a lookup table to convert characters to ASCII, but in C it is as simple as the following:

#define BUF_LEN 13   // 11 characters of "Hello World" plus '\n' and '\r'
const unsigned char send_buffer[BUF_LEN]={'H', 'e', 'l', 'l', 'o', ' ', 'W', 'o', 'r', 'l', 'd', '\n', '\r'};

To send the bits from each character, I had to mask off the least significant bit and then shift after sending each bit until all eight were sent.  Again, in C, this is pretty easy:

*txd=(byte & 0x01);   // send the least significant bit
byte = byte >> 1;     // move the next bit into position
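Wrapping those two lines in a loop gives the full frame for one character.  The frame_char() helper below is hypothetical (not from the original code) and ignores timing; it only fills in the ten bit values that go out on TXD, one per 434-clock bit period.

```cpp
// Hypothetical helper: fill bits[0..9] with the frame for one character.
void frame_char(unsigned char byte, bool bits[10])
{
    bits[0] = false;                  // start bit (logic 0)
    for (int i = 1; i <= 8; ++i) {
        bits[i] = (byte & 0x01) != 0; // send the least significant bit
        byte = byte >> 1;             // move the next bit into position
    }
    bits[9] = true;                   // stop bit (logic 1)
}
```

For 'H' (0x48, binary 01001000) the frame is 0, then 0,0,0,1,0,0,1,0, then 1.  Calling it for each of the 13 entries in send_buffer gives 130 bit periods per “Hello World” line.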

Running the debugger in MSVC++ seemed to give me what I wanted (figure 1), so I took the code through Catapult C Synthesis.  I was a little worried about whether I had the timing right, so I used Catapult’s SCVerify flow to run the RTL using the same C test bench I used in MSVC++.  I received confirmation that the RTL and C matched, and a quick look at the Questa waveform convinced me I was ready to try the real thing (figure 2).

Figure 1: Printf debug

Figure 2: Questa waveform

I just needed to run through Precision RTL Synthesis and Quartus II to get the bitstream for the Altera Stratix II device on the board.  What do you know, it actually worked (figure 3). 

Figure 3: Proof in HyperTerminal

Just to prove it, I took another video (http://www.youtube.com/watch?v=9K8X__I2O2A).  If you want to check out the full C code for this circuit and the test bench, go to the public ESL Communities discussion forum thread.

For more stuff on Catapult C Synthesis, go to our product page to grab the datasheet or view more videos.

If you made it this far, take a few seconds to drop me a comment.

Thanks,
Dan

“Hello World” equivalent for hardware engineers

July 8th, 2009

I was staring out my office window at some towering Douglas fir trees, trying to figure out how to introduce non-DSP hardware engineers to how cool C synthesis is.  Okay, I did say engineers since a majority of people would not find anything to do with C or synthesis cool.  I’m sure I’d get a lot more Google hits on my blog if I talked about the Dallas Cowboys or the Portland Trailblazers, my big interests outside of work and family.

Anyway, back to my dilemma.  The first adopters of C synthesis have mostly been DSP gurus developing wireless or video hardware, so you see a lot of FIR filter demos that show how you can get many different implementations from the same C code simply by adjusting the loop constraints for latency and throughput.  What about someone coming from a pure RTL background who is writing non-DSP ASIC or FPGA hardware?  How can I show them something quick and easy that anyone can try?

I drifted back to when I was first starting at Lattice Semiconductor in the mid-90s.  VHDL was just making its way into programmable logic.  Schematics were losing out to ABEL for the simple PLDs, but the CPLD and FPGA devices needed something more abstract.  Yeah, I know, I’m dating myself.  Anyway, our basic training for new field application engineers was based around building a clock dividing counter and driving some seven segment LEDs on a demo board.  There, I have it.  The first test of a hardware synthesis tool is to make some LEDs blink on a demo board.  This would be the equivalent of writing C code to print out “Hello World” for the first time coder.

As a second check, one of my standard job-interview questions to get a baseline for someone’s hardware knowledge was to have them design a simple binary up counter with asynchronous reset.  That’s about as basic a hardware problem as you can get, but it uncovered whether someone understood synthesis or just the language semantics.  Of course, if you didn’t write VHDL every day, the library use statements for the standard IEEE synthesis libraries were hard to fake.  Modeling the reset and clock with the proper sensitivity list was pretty straightforward.  However, the counter register is both read and written, so the common mistake of declaring it as an output port caused an error (VHDL doesn’t let you read an OUT port) unless you used an internal signal.  I expected them to write something like:

LIBRARY ieee;
USE ieee.std_logic_1164.all;
USE ieee.std_logic_unsigned.all;

ENTITY cnt IS
        PORT (
                clk     : IN            STD_LOGIC;
                rst     : IN            STD_LOGIC;
                qout : OUT      STD_LOGIC);
END cnt;

ARCHITECTURE rtl OF cnt IS
        SIGNAL  cnt_reg : STD_LOGIC_VECTOR(15 downto 0);
BEGIN
        PROCESS (clk, rst)
        BEGIN
                IF rst = '0' THEN
                        cnt_reg <= (others => '0');
                ELSIF (rising_edge(clk)) THEN
                        cnt_reg <= cnt_reg + '1';
                END IF;
        END PROCESS;
                       
        qout <= cnt_reg(15);
       
END rtl;
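For comparison, a hypothetical C++ model of the same counter could look like the sketch below (the counter_msb() name is made up).  Each call stands for one rising clock edge, the static variable is the 16-bit register, and the return value plays the role of qout.  Reset is left out of the C code on the assumption that, as in the Catapult flow later in this post, the active-low asynchronous reset is enabled as a tool setting.

```cpp
#include <cstdint>

// Hypothetical model of the VHDL counter: one call = one rising clock edge.
bool counter_msb()
{
    static std::uint16_t cnt_reg = 0;                  // static => 16-bit register
    cnt_reg = static_cast<std::uint16_t>(cnt_reg + 1); // wraps from 65535 to 0
    return ((cnt_reg >> 15) & 1) != 0;                 // qout is bit 15
}
```

Note that the C version has no equivalent of the OUT-port trap: the register is just a variable that the function reads and writes.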
 

Now you can pass that part of the interview, but I digress.  Back to using C for hardware design…  I happened to have an older Altera NIOS II Development Kit with a Stratix II FPGA, some LEDs and buttons on it.  After downloading the correct reference manual, I had the pinouts for the devices on the board.  You have to love the new BlackBerry Pearl with a built-in camera for stuff just like this blog.

Altera NIOS II board

This should be pretty easy to do.  The crystal oscillator is running at 50MHz, so I decided to blink an LED every second with a 50/50 duty cycle.  The first step is to write a C program to do this function.  Basically, I just want to toggle the LED value every 25M clock cycles to have it blink once per second (on for half a second, off for half a second).

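#define MAX_COUNT 50000000   // 50MHz clock: clock cycles per second
bool clk_div()
{
        static bool toggle_val=false;   // static => register driving the LED

        // Count half a second of clock cycles, then flip the LED value
        for (int i=0; i<MAX_COUNT/2; i++)
        {
                if (i==MAX_COUNT/2-1)
                        toggle_val = !toggle_val;
        }
        return toggle_val;
}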

The static declaration creates the register that will drive the LED.  This program will compile with any C++ compiler.  On Windows, I use the free Microsoft Visual C++ Express edition.  On Linux, gcc is most common.

A simple testbench will show that it is working in the debugger:

#include <stdio.h>
bool clk_div();

int main()
{
        printf("Starting clock divider\n");
        for (int i=0; i<9; i++)
        {
            printf("Clock value = %i\n", clk_div());
        }
        return 0;
}

Screen capture of test
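In clk_div(), each call covers half a second, and the loop is what becomes the counter in hardware.  An alternative way to write the same divider, sketched here with a hypothetical clk_div_per_cycle() name (this is not the code that was synthesized), is to make each call one clock cycle and keep the counter in a static variable, much like the RTL counter in the interview question.

```cpp
const long HALF_PERIOD = 25000000;    // 25M cycles of 50MHz = 0.5 s

// Hypothetical alternative: one call per clock cycle.
bool clk_div_per_cycle()
{
    static long count = 0;            // needs 25 bits in hardware
    static bool toggle_val = false;   // register driving the LED
    if (count == HALF_PERIOD - 1) {
        count = 0;
        toggle_val = !toggle_val;     // flip once every 25M clocks
    } else {
        ++count;
    }
    return toggle_val;
}
```

The 25,000,000th call is the first one that returns true.  Calling it from a test bench takes 25M calls per LED change instead of one, which is a good reason to prefer the loop form when debugging in C.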

Now I’m ready to synthesize this down to the board and see if it works as I expect.  A common question from hardware designers is “how do I control the size of the registers?”  Remember, the VHDL for the counter used std_logic and std_logic_vector types.  In this case, the static bool toggle_val can obviously be implemented as a single bit.  The loop counter “i” needs to hold MAX_COUNT/2-1 (24,999,999), so for the 50MHz clock, this is a 25-bit unsigned number.  On typical desktop compilers the C int type is 32 bits, so it can hold this number, but I can model the exact width in C and test it using the Mentor Graphics Algorithmic C datatypes, a free download from http://www.mentor.com/products/esl/high_level_synthesis/ac_datatypes.

By changing the type of the loop variable to uint25, I’m guaranteed a 25-bit unsigned number and can see that it still works.  Better yet, I can try a uint24 and see that the loop never makes it to MAX_COUNT/2-1, because a 24-bit counter wraps back to 0 after 16,777,215.  I just have to include “ac_int.h” and change the for loop to:

for (uint25 i=0; i<MAX_COUNT/2; i++)
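The width reasoning can also be checked with plain C++ arithmetic.  The helpers below are hypothetical and are not part of ac_int.h: bits_needed() confirms that 24,999,999 needs 25 bits, and wrap_inc() mimics a fixed-width hardware counter to show why a 24-bit loop variable wraps to 0 before it can reach MAX_COUNT/2-1.

```cpp
// Number of bits needed to hold the unsigned value n (at least 1).
constexpr int bits_needed(unsigned long n)
{
    return n <= 1 ? 1 : 1 + bits_needed(n >> 1);
}

// Increment a counter that is only `width` bits wide, wrapping like hardware.
unsigned long wrap_inc(unsigned long x, int width)
{
    return (x + 1) & ((1UL << width) - 1);
}

static_assert(bits_needed(25000000UL - 1) == 25, "MAX_COUNT/2-1 needs 25 bits");
```

The loop’s exit test, i < MAX_COUNT/2, compares against 25,000,000, which also fits in 25 bits, so uint25 is enough for both the counter and the comparison.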

To synthesize this design, I bring up Catapult C Synthesis, add the input file to the project, set up the clock at 50MHz, enable an active-low asynchronous reset and set the technology to an Altera EP2S60F672C-3.  To tell Catapult that I want this design to be free running, I pipeline the top-level design with an initiation interval of 1 (II=1).  Then, I generate RTL (VHDL or Verilog).  After running through Precision RTL and Altera Quartus, I have a bitstream I can program onto the board.

Voilà: I have a blinking LED and reset from a push button.  If you have QuickTime, you can watch the 8-second video of my board blinking (blinking_led.3GP), which I again took with my BlackBerry.  Unfortunately, I can’t seem to include the video as an attachment in this post.  Here’s a link to it on YouTube: http://www.youtube.com/watch?v=mNKEvC8P3Zc.

Thanks for reading.  Comments and questions welcome.

Dan

ESL has been a key marketing term for around a decade, but Catapult C Synthesis has emerged as a real ESL product for hardware engineers designing today's cutting-edge SoCs, taking C++ to RTL for ASIC or FPGA platforms. I want to talk about high-level synthesis and all the cool things you can do with it from a hardware engineering perspective.