
Máximo Martín
QTH Locator IN53ui
EA1DDO@HoTMaiL.com

This is a copy of the original page, which is no longer available.

The fastest multithreaded NEC engine for antenna modeling apps

Welcome to my personal webpage. This is the home of NEC/MP, a heavily modified, refactored and hand-optimized multithreaded version of NEC (Numerical Electromagnetics Code). This ultra-fast engine is available only as binary executables, compatible with Microsoft Windows and x86/x64-class processors. It comes in two license flavors: a free downloadable version based on the NEC-2 algorithm (NEC2/MP) and a commercial one based on NEC-4 (NEC4/MP). A license must be purchased from the NEC-4 copyright holder before ordering NEC4/MP.

Project background

The Numerical Electromagnetics Code (NEC) is a computer program for analyzing the electromagnetic response of wire antennas and other arbitrary structures consisting of wires and surfaces, in free space or over a ground plane. The Fortran source code of its second edition (NEC2), developed by G. Burke and A. Poggio at Lawrence Livermore National Laboratory in the early 1980s, has been declared public domain (no license required) and is freely available.

Many commercial and free antenna modeling applications use the original or slightly modified NEC2 sources as their analysis and computational engine. Developers all around the world have ported the ancient NEC2 source code to modern Fortran dialects and other programming languages. However, the most widely used form of NEC2 remains the double-precision version of the original Fortran 77 source code.

I used this same source code as the basis for creating one of the fastest NEC2D engines ever made. Actually, I started with NEC4 in mind: I use NEC4 in my lab at work and I wanted to speed things up. You know, even though NEC4 and NEC2 are conceptually great algorithms, their code performance is a different and sad story.

My first move was to choose the right tools for the job. All I wanted was a really good native Windows Fortran compiler able to generate highly optimized code and standard Intel object files. The slow and buggy GNU compilers were, of course, out of the question, so I picked the excellent Intel XE Fortran compiler and the Windows Platform SDK.

It took me several weeks just to convert the code from fixed-form F77 to free-form F90/F2xxx, clean up the messy code and get it to compile properly under the Intel compiler. The next step was to refactor loops and branches so they could be vectorized, and that was actually the hardest part. I respect G. Burke as a scientist, but his coding style is absolutely atrocious.

Somewhere at this point I decided to make my life harder. As I mentioned, I use a NEC4 and Matlab combination at work, but NEC2 is more than enough for my hobby projects at home, especially when combined with the excellent (and free) modeling front-end application 4NEC2. Arie Voors did a really great job developing this application. Yes, I know: 4NEC2 is a bit ugly, it is coded in ancient VB, the user interface is far from ergonomic, etc. All that may be true, but 4NEC2 works fine and includes more features than its closest competitor, EZNEC; actually more than even the $500 EZNEC PRO.

Unfortunately, even though 4NEC2 itself is very fast, it suffers from subpar performance when running antenna analysis through the included NEC2DXSxxx engine. A multi-frequency sweep takes several seconds to finish even for the simplest antenna, and can take minutes with complex structures. By default, field patterns are computed and plotted at a coarse 5° resolution simply because anything finer could increase processing time by a factor of ten or even more.
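To see why a finer pattern grid hurts, count the far-field directions. A NEC RP card takes the number of theta angles (NTH) and phi angles (NPH) to evaluate, and each direction needs its own field summation over the segment currents, so the work grows roughly with NTH × NPH. The minimal sketch below assumes a full-sphere pattern with theta from 0° to 180° inclusive and phi from 0° up to 360° minus one step; 4NEC2's actual grid limits may differ, and the function names are illustrative only.

```python
def rp_grid(step_deg: int) -> tuple:
    """Return (NTH, NPH) for a full-sphere far-field pattern.

    Assumption: theta runs 0..180 deg inclusive and phi runs
    0..(360 - step) deg, so the 360 deg cut does not duplicate 0 deg.
    """
    if step_deg <= 0 or 180 % step_deg != 0:
        raise ValueError("step must be a positive divisor of 180 degrees")
    nth = 180 // step_deg + 1
    nph = 360 // step_deg
    return nth, nph


def pattern_points(step_deg: int) -> int:
    """Number of directions the engine must evaluate (NTH * NPH)."""
    nth, nph = rp_grid(step_deg)
    return nth * nph


if __name__ == "__main__":
    base = pattern_points(5)
    for step in (5, 2, 1):
        nth, nph = rp_grid(step)
        pts = nth * nph
        print(f"{step} deg: NTH={nth} NPH={nph} -> {pts} directions "
              f"({pts / base:.1f}x the 5 deg grid)")
```

Under these assumptions a 5° grid has 37 × 72 = 2,664 directions, while a 1° grid has 181 × 360 = 65,160, roughly 24 times as many, which fits the tenfold-or-worse slowdown for anything finer than 5°.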

So, even before finishing the first steps of refactoring NEC4, I started working on the NEC2 source tree in parallel. A few weeks later I had rewritten the time-consuming polynomial and cubic interpolation procedures in x86 assembly. That alone gave NEC/MP a huge performance boost! The next move was to use Intel's Math Kernel Library (MKL), specifically its LAPACK matrix routines, in place of NEC's own. Thanks to Intel's well-tuned BLAS and LAPACK implementations, that change delivered a significant performance increase.
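NEC solves the method-of-moments matrix equation Z·I = V for the segment currents I, and for large models factoring the dense complex matrix Z dominates the run time. The page does not name the routines that were replaced; the standard LAPACK pair for this job is ZGETRF (LU factorization with partial pivoting) and ZGETRS (the triangular solves). The pure-Python sketch below shows that same technique so the steps are visible. It is an illustration, not NEC/MP's code, and it is far slower than MKL.

```python
from __future__ import annotations


def lu_factor(a: list[list[complex]]) -> tuple[list[list[complex]], list[int]]:
    """LU factorization with partial pivoting (the technique ZGETRF uses).

    Returns a matrix holding L (below the diagonal, unit diagonal implied)
    and U (on and above the diagonal), plus perm, where perm[i] is the
    original row index that ended up in row i.
    """
    n = len(a)
    lu = [list(map(complex, row)) for row in a]
    perm = list(range(n))
    for k in range(n):
        # Partial pivoting: bring the largest |entry| of column k to row k.
        p = max(range(k, n), key=lambda r: abs(lu[r][k]))
        if lu[p][k] == 0:
            raise ZeroDivisionError("matrix is singular")
        if p != k:
            lu[k], lu[p] = lu[p], lu[k]
            perm[k], perm[p] = perm[p], perm[k]
        for i in range(k + 1, n):
            lu[i][k] /= lu[k][k]          # multiplier, stored in L
            f = lu[i][k]
            for j in range(k + 1, n):
                lu[i][j] -= f * lu[k][j]  # update the trailing submatrix
    return lu, perm


def lu_solve(lu: list[list[complex]], perm: list[int], b: list) -> list[complex]:
    """Solve A x = b with the factors from lu_factor (as ZGETRS would)."""
    n = len(lu)
    # Forward substitution with unit-diagonal L on the permuted right side.
    y = [complex(b[perm[i]]) for i in range(n)]
    for i in range(n):
        for j in range(i):
            y[i] -= lu[i][j] * y[j]
    # Back substitution with U.
    x = [0j] * n
    for i in reversed(range(n)):
        s = y[i] - sum(lu[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / lu[i][i]
    return x


if __name__ == "__main__":
    # Toy 3x3 "impedance" matrix and excitation: made-up numbers, not a real model.
    Z = [[50 + 30j, 10 - 5j, 2 + 1j],
         [10 - 5j, 50 + 30j, 10 - 5j],
         [2 + 1j, 10 - 5j, 50 + 30j]]
    V = [0, 1, 0]  # 1 V source on the middle segment
    lu, perm = lu_factor(Z)
    currents = lu_solve(lu, perm, V)
    residual = max(abs(sum(Z[i][j] * currents[j] for j in range(3)) - V[i])
                   for i in range(3))
    print(currents, residual)
```

Factoring costs on the order of N³ operations while each solve costs about N², so once Z is factored, additional excitations are cheap. That is why a tuned factorization routine pays off so well.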

All these efforts were made with one goal in mind: to make NEC not only faster but also able to make proper use of multithreading on today's multicore processors. However, converting legacy sequential source code to SMP proved to be an extremely hard task. I had to rewrite whole parts of the code and refactor the rest to achieve what I originally had in mind.
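The page does not say which loops NEC/MP threads. As a hedged illustration of the general idea, the sketch below fills an N×N interaction matrix in parallel. Every entry depends only on its own pair of segments, so rows can be split across worker threads without locking, as long as each worker writes only its own rows. The kernel interaction() is a hypothetical placeholder, not NEC's field integral. In CPython the global interpreter lock keeps this particular code from actually running faster; compiled Fortran or C threads (for example via OpenMP) do not have that limitation.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def interaction(i: int, j: int) -> complex:
    """HYPOTHETICAL per-pair kernel; a stand-in, not NEC's formula."""
    d = abs(i - j) + 1
    return complex(1.0 / d, -0.5 / d ** 2)


def fill_rows(z: list, n: int, rows: range) -> None:
    """Fill the given rows; each worker touches only its own rows."""
    for i in rows:
        z[i] = [interaction(i, j) for j in range(n)]


def fill_matrix(n: int, workers: int = 0) -> list:
    """Fill an n x n matrix, dealing rows round-robin to worker threads."""
    workers = workers or os.cpu_count() or 1
    z = [None] * n
    # Worker w gets rows w, w + workers, w + 2*workers, ...
    chunks = [range(w, n, workers) for w in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(fill_rows, z, n, c) for c in chunks]
        for f in futures:
            f.result()  # re-raise any exception from a worker
    return z


if __name__ == "__main__":
    z = fill_matrix(200)
    serial = [[interaction(i, j) for j in range(200)] for i in range(200)]
    print("parallel fill matches serial fill:", z == serial)
```

Rows are dealt out round-robin rather than in contiguous blocks, which keeps the load even when the cost per row varies, as it does in triangular work.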

The final step was tuning the compiler's optimizations. Intel's autovectorizing compilers are already excellent at squeezing the maximum power out of x86 processors. The only hard part was ensuring that the optimization and tuning changes did not cause any undesirable side effects, such as performance decreases or floating-point rounding issues. Thankfully all went fine, and the first version of NEC/MP was released in late 2011.
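The rounding concern is real because vectorized loops usually regroup additions (for instance into one partial sum per SIMD lane), and floating-point addition is not associative. A minimal sketch, assuming Python's IEEE 754 doubles: the two groupings below differ in the last bit, so a regression check between a reference build and an optimized build should compare results within a tolerance rather than bit for bit. The helper name results_match, its tolerances and the sample impedances are illustrative choices, not the author's test harness.

```python
import cmath

# Same sum, different grouping: mathematically equal, not bitwise equal.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)


def results_match(reference, candidate, rel_tol=1e-9, abs_tol=1e-15):
    """Compare two runs' complex outputs element by element, within tolerance."""
    if len(reference) != len(candidate):
        return False
    return all(cmath.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol)
               for a, b in zip(reference, candidate))


if __name__ == "__main__":
    print(left == right, left, right)
    ref = [complex(left, -12.5), 73.1 + 42.5j]   # HYPOTHETICAL impedances
    opt = [complex(right, -12.5), 73.1 + 42.5j]
    print(results_match(ref, opt))
```

Here left == right is False, yet results_match treats the two runs as equivalent. A tolerance that is too tight flags harmless reordering, while one that is too loose can hide real bugs.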

NEC/MP is still a work in progress, so you can expect even more performance improvements and better SMP support in the future. My future plans also include a native 64-bit version and a developer's edition in the form of a compiled dynamic-link library (DLL), but I don't have a firm schedule for these right now.

Features

As you would expect, performance is the main feature of NEC/MP. Take a look at the performance tables below to see why.

  • Speedup of up to 10x or even more
    (typically 5x to 8x on a dual- or quad-core processor)
  • True multicore, multithreaded SMP support
  • Vectorized loops across the whole program
  • Built against the highly accurate Intel Math Kernel Library (MKL)
    and its BLAS/LAPACK matrix routines
  • Double-precision floating point, supporting up to 11,000 segments
    (even more in the upcoming commercial x64 editions)
  • SIMD hardware support (SSE2, SSE3, SSSE3, SSE4.1, SSE4.2 and AVX)
  • Asynchronous buffered I/O disk writes
  • Command line interface for easy integration with third party tools and end user workflow
  • Self-contained Windows executables (Windows XP through Windows 8)
  • Fully compatible with 4NEC2 input & output file format
    (executables can replace the original NEC2DXSxxx files)
  • Fully compatible with standard NEC data files
  • FREE as in free beer (NEC2/MP x86 edition)
  • Affordable native 64-bit NEC-2 edition (due in the third quarter of 2012)
  • Commercial NEC-4 (4.1 or 4.2) versions available for purchase upon presenting proof of valid NEC-4 license ownership
  • Custom-made builds on request
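The 11,000-segment limit of the x86 edition fits a simple memory estimate, although the page does not state the reason for it. Assuming the full N×N interaction matrix is kept in RAM as double-precision complex numbers (16 bytes each), and ignoring NEC's symmetry and out-of-core options, the sketch below computes the storage for a given segment count and the largest N that fits a given budget. The 2 GiB budget is used because that is the default user address space of a 32-bit Windows process.

```python
import math

BYTES_PER_COMPLEX_DOUBLE = 16  # real and imaginary parts, 8 bytes each


def matrix_bytes(segments: int) -> int:
    """Storage for a dense segments x segments double-complex matrix."""
    return segments * segments * BYTES_PER_COMPLEX_DOUBLE


def max_segments(budget_bytes: int) -> int:
    """Largest N whose dense double-complex N x N matrix fits the budget."""
    return math.isqrt(budget_bytes // BYTES_PER_COMPLEX_DOUBLE)


if __name__ == "__main__":
    gib = 2 ** 30
    print(f"11,000 segments: {matrix_bytes(11_000) / gib:.2f} GiB")
    print(f"Largest N within 2 GiB: {max_segments(2 * gib)}")
```

matrix_bytes(11000) is 1,936,000,000 bytes (about 1.8 GiB) and max_segments(2**31) is 11,585, so 11,000 segments leaves only about 200 MB for everything else in the process. A 64-bit build removes that ceiling, which would be consistent with the larger models promised for the x64 editions.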

Performance

Although the actual numbers vary depending on system setup and workload, a massive performance increase should be expected even on a humble dual-core processor.

Performance Comparison

FAQ

Q. How to use NEC2/MP with 4NEC2?
A. Download the zip archive from the link in the right column of this page and extract its contents to a temporary folder. To replace 4NEC2's own engine with NEC/MP, copy all the extracted files to the \4nec2\exe folder, overwriting any existing NEC2DXSxxx files. Confirm overwriting when prompted.
No further action is required. You can use 4NEC2 as usual, enjoying the increased performance. Please don't forget to report any bugs you might encounter.
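The copy step can also be scripted. The sketch below is one way to do it in Python, assuming the archive was already extracted to a folder of your choice. Unlike a plain overwrite, it also keeps a backup of every file it replaces, so the original 4NEC2 engine can be restored later. The folder paths in the example are hypothetical; adjust them to your own 4NEC2 installation.

```python
import shutil
from pathlib import Path


def install_engine(extracted: Path, exe_dir: Path, backup_dir: Path) -> list:
    """Copy every extracted file into exe_dir, backing up files it overwrites."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for src in sorted(extracted.iterdir()):
        if not src.is_file():
            continue  # skip subfolders, if any
        dst = exe_dir / src.name
        if dst.exists():
            shutil.copy2(dst, backup_dir / src.name)  # keep the original
        shutil.copy2(src, dst)  # overwrite, as the FAQ instructs
        copied.append(src.name)
    return copied


# Example (hypothetical paths; adjust to your own installation):
# install_engine(Path(r"C:\Temp\nec2mp"), Path(r"C:\4nec2\exe"),
#                Path(r"C:\4nec2\exe_backup"))
```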

Q. How to use NEC/MP from command line?
A. Download the zip archive from the link in the right column of this page and extract its contents to a temporary folder. Choose the executable you want to use based on the maximum number of segments it supports. For example, NEC2DXS500.EXE supports 500 segments, NEC2DXS1K5.EXE supports 1500, and so on.
Run the executable from the command prompt with the -? option (e.g., NEC2DXS1K5 -?). This prints usage instructions, including the syntax and available options.
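Picking the right executable can be automated. This sketch makes two assumptions, labeled here and in the code: that the file names follow the pattern in the answer above (NEC2DXS500 means 500 segments, NEC2DXS1K5 means 1.5 thousand), and that a model's segment count can be approximated by summing the NS field of its GW (straight wire) cards. It ignores arcs, helices, GM/GX replication and patches, so for anything but simple wire models the count is only a lower bound. The helper names and the toy dipole model are hypothetical.

```python
import re

# ASSUMED naming pattern: NEC2DXS500 -> 500, NEC2DXS1K5 -> 1500,
# and a hypothetical NEC2DXS11K -> 11000.
_ENGINE_NAME = re.compile(r"^NEC2DXS(\d+)(K(\d)?)?\.EXE$", re.IGNORECASE)


def capacity(exe_name: str) -> int:
    """Maximum segment count encoded in an engine's file name."""
    m = _ENGINE_NAME.match(exe_name)
    if m is None:
        raise ValueError(f"unrecognized engine name: {exe_name}")
    if m.group(2):  # "xKy" form: x thousand plus y hundred
        return int(m.group(1)) * 1000 + int(m.group(3) or 0) * 100
    return int(m.group(1))


def count_gw_segments(nec_text: str) -> int:
    """Sum NS (third field) over GW cards; ignores GA, GH, GM, GX, patches."""
    total = 0
    for line in nec_text.splitlines():
        fields = [f for f in re.split(r"[\s,]+", line.strip()) if f]
        if fields and fields[0].upper() == "GW":
            total += int(fields[2])
    return total


def pick_engine(segments: int, available: list) -> str:
    """Smallest available engine whose capacity covers the model."""
    fits = [name for name in available if capacity(name) >= segments]
    if not fits:
        raise ValueError(f"no available engine supports {segments} segments")
    return min(fits, key=capacity)


if __name__ == "__main__":
    # Toy 21-segment dipole (hypothetical model).
    model = """CM toy dipole
CE
GW 1 21 0 0 -5 0 0 5 0.001
GE 0
EX 0 1 11 0 1 0
FR 0 1 0 0 14.1 0
EN
"""
    engines = ["NEC2DXS500.EXE", "NEC2DXS1K5.EXE"]
    print(pick_engine(count_gw_segments(model), engines))
```

Choosing the smallest engine that fits is a reasonable default if, as seems likely, the larger builds reserve larger arrays; the page itself only says that each executable has a maximum segment count.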

Q. How to call NEC/MP from a third party tool?
A. It depends. Some applications (e.g., 4NEC2) use their own batch file to call NEC and pass arguments to it; others let users set their own command line arguments, which are passed to an external NEC processor. In the latter case, please note that NEC/MP now supports passing file names on the command line using the following syntax:
NEC2DXSxxx inpfilename outfilename
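For a tool that launches the engine itself, the minimal Python sketch below builds the argument list for that syntax and runs it. It assumes an engine name such as NEC2DXS1K5.EXE that is on the PATH or given as a full path. Passing a list instead of a single command string avoids quoting problems with spaces in file names. The page does not document NEC/MP's exit codes, so treating a nonzero exit code as failure is an assumption, and the sketch also checks that the output file was actually written.

```python
import subprocess
from pathlib import Path


def nec_command(engine: str, inp: Path, out: Path) -> list:
    """Argument list for: NEC2DXSxxx inpfilename outfilename"""
    return [engine, str(inp), str(out)]


def run_nec(engine: str, inp: Path, out: Path, timeout_s: float = 600.0) -> Path:
    """Run the engine and return the output path.

    ASSUMPTION: a nonzero exit code means failure (check=True raises
    CalledProcessError). The 600 s default timeout is arbitrary.
    """
    subprocess.run(nec_command(engine, inp, out), check=True, timeout=timeout_s)
    if not out.exists():
        raise FileNotFoundError(f"engine produced no output file: {out}")
    return out


# Example (hypothetical file names):
# run_nec("NEC2DXS1K5.EXE", Path("yagi.nec"), Path("yagi.out"))
```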

Q. What about warranty?
A. NEC2/MP is licensed free of charge, provided "AS-IS" without any warranty of any kind, express or implied. You expressly agree that downloading the software and any use of it is at your own risk!

Q. Can I distribute my own application bundled with NEC2/MP?
A. Of course you can, as long as you follow two simple rules: (1) you keep the original NEC2/MP copyright strings intact, and (2) you send me a free copy of your application.

Q. What about NEC4/MP?
A. NEC4/MP is not a free application! It is a commercial edition of NEC/MP based on copyrighted NEC-4 code. A license must be purchased from the NEC-4 copyright holder before ordering NEC4/MP. Please contact me for pricing and more information.

Download NEC2/MP (v1.50)

Release History:

1.50 (05-12-2012): Major rewrite of factorization code. In addition to SSE2 and SSE3, NEC4/MP now fully supports SSSE3, SSE4.1, SSE4.2 and AVX streaming instructions on multicore SIMD-capable processors.
1.05 (05-12-2012): Minor performance improvement in factorization
1.04 (05-04-2012): Fixed incompatibility with older AMD processors
1.03 (05-01-2012): Close and delete output file on CTRL-C
1.02 (04-28-2012): Threading of triangular matrices
1.01 (01-12-2012): Improved factorization performance
1.00 (12-02-2011): Initial release

Contact

Please feel free to contact me if you have any questions, comments, suggestions or bugs to report.
Athan Papadimitriou (SV2HZF)