There's no performance difference between using SDL hardware surfaces and software surfaces!
You are probably not getting hardware surfaces. Check whether the surfaces you create actually have the SDL_HWSURFACE flag set in their surface flags after they are created. Note that most of the time you need to run your program in fullscreen mode to get hardware surfaces.
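The flag check described above can be sketched as follows. This is only an illustration: `is_hw_surface` and `surface_kind` are hypothetical helper names, not part of SDL, and the fallback `#define` is an assumption that mirrors the value used in SDL 1.2's headers so the snippet compiles without SDL installed. In a real program, include the SDL header and drop the fallback.

```c
#include <stdint.h>

/* Assumption: stand-in for the SDL 1.2 definition, used only when SDL.h
 * has not been included. With SDL available, include SDL.h instead. */
#ifndef SDL_HWSURFACE
#define SDL_HWSURFACE 0x00000001u
#endif

/* Hypothetical helper: returns 1 if the flags word of a surface has the
 * SDL_HWSURFACE bit set, 0 otherwise. */
static int is_hw_surface(uint32_t flags)
{
    return (flags & SDL_HWSURFACE) != 0;
}

/* Hypothetical helper: human-readable label for logging. */
static const char *surface_kind(uint32_t flags)
{
    return is_hw_surface(flags) ? "hardware" : "software";
}
```

With SDL 1.2 this would typically be called as `is_hw_surface(screen->flags)` on the surface returned by `SDL_SetVideoMode`, or on a surface returned by `SDL_CreateRGBSurface`. Checking the returned surface matters because the flags you request are not necessarily the flags you get.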
You could also try using software surfaces. It's counter-intuitive, but if you're directly accessing the pixels on a surface or the screen, it's actually faster to do all your work in system memory and then send the final result to the screen than it is to push individual pixels over the system bus to the video card. This is especially important when doing alpha blending, because very few drivers expose 2D alpha blending in hardware. Without hardware support, every blend has to read from the destination surface, and reading from video memory is especially slow on modern graphics hardware. If this is a serious problem for you, and you know your target audience has 3D hardware, you might consider Using_OpenGL_With_SDL for 3D acceleration.
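To make the "work in system memory, then send the final result" idea concrete, here is a minimal sketch in plain C. It does not use SDL at all: the buffers are ordinary arrays, and `blend_channel`, `blend_into_backbuffer`, and `present` are hypothetical names chosen for illustration. The destination read that alpha blending requires happens in the system-memory back buffer, and the "video" buffer only ever receives one bulk write.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Blend one 8-bit channel:
 * result = (src * alpha + dst * (255 - alpha)) / 255, rounded. */
static uint8_t blend_channel(uint8_t src, uint8_t dst, uint8_t alpha)
{
    return (uint8_t)((src * alpha + dst * (255 - alpha) + 127) / 255);
}

/* Blend src over the back buffer. The read of the destination happens
 * here, in system memory, where reads are cheap. */
static void blend_into_backbuffer(uint8_t *back, const uint8_t *src,
                                  size_t n, uint8_t alpha)
{
    for (size_t i = 0; i < n; i++)
        back[i] = blend_channel(src[i], back[i], alpha);
}

/* Stand-in for sending the finished frame to the screen: a single
 * write-only copy, with no reads from the destination. */
static void present(uint8_t *video, const uint8_t *back, size_t n)
{
    memcpy(video, back, n);
}
```

In an SDL 1.2 program the back buffer would roughly correspond to a surface created with SDL_SWSURFACE, and `present` to one `SDL_BlitSurface` onto the screen followed by `SDL_Flip` or `SDL_UpdateRect`.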
An excellent article by Bob Pendleton covering SDL hardware surfaces is available at the O'Reilly Network.