All in all, the first usable version took ten months to complete, with more or less daily work on it. Of course, the work is ongoing. It was far more than “just” taking the baseline version and recompiling it with the ARM compiler. So here are some of the necessary efforts.
Since writing an engine from scratch was out of the question, there had to be a baseline – open source, strong play, but still manageable resource requirements. For example, the chess engine Crafty is way beyond what could be ported to the target, but NG-Play qualified.
The initial static RAM usage was about 400 kB, plus dynamically allocated memory for the hash tables, plus the stack. The available RAM on the target platform is 192 kB, which was within reach, but it required a lot of work to get there.
Due to the recursive search algorithm, the stack size was a major point. Initially, it needed 35 kB for 12 plies search depth, which I got down to 26 kB for 20 plies.
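One common way to shrink the stack frame of a recursive search is to keep large per-ply buffers, such as move lists, in a static pool indexed by ply instead of declaring them as local variables. The following is a minimal sketch of that idea only; it is not taken from the CT800 source, and the names move_pool, MAX_MOVES, move_t and count_leaves as well as the dummy move generator are assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_PLY   20   /* search depth limit, as mentioned in the text */
#define MAX_MOVES 64   /* hypothetical per-ply cap, for illustration only */

typedef uint16_t move_t;   /* hypothetical compact move encoding */

/* One move list per ply, allocated statically instead of on the stack. */
static move_t move_pool[MAX_PLY][MAX_MOVES];

/* Dummy generator: always yields two placeholder moves. */
static int generate_moves(move_t *list)
{
    list[0] = 1;
    list[1] = 2;
    return 2;
}

/* Toy recursive search that only counts leaf nodes. Each stack frame
   holds a pointer and a few integers, not a whole move list. */
long count_leaves(int depth, int ply)
{
    if (depth == 0)
        return 1;

    assert(ply < MAX_PLY);
    move_t *moves = move_pool[ply];
    int n = generate_moves(moves);
    long leaves = 0;

    for (int i = 0; i < n; i++) {
        (void)moves[i];   /* a real engine would make the move here */
        leaves += count_leaves(depth - 1, ply + 1);
    }
    return leaves;
}
```

With two moves per node, count_leaves(3, 0) visits eight leaves. The price of this layout is that the pool occupies static RAM permanently, so it only moves memory from the stack to the data area; the gain is a predictable worst case.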
Replacing the dynamic memory allocation with static allocation was easy enough. For embedded systems, malloc is a no-go. The RAM on the target platform is not contiguous, but split into two areas of 64 kB and 128 kB. That required carefully placing the variables and the stack.
None of the configuration options existed, and the timekeeping only supported time per move.
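A minimal sketch of what static allocation with explicit placement can look like, assuming GCC. The macro PLACE_IN_SMALL_RAM, the section name mentioned in the comment, the table size and the entry layout are all hypothetical and not taken from the CT800 source; on a real target, the section name would have to match the linker script.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* On the target, this macro could expand to something like
   __attribute__((section(".ram_64k"))). The section name is hypothetical.
   On a PC build, it expands to nothing. */
#ifndef PLACE_IN_SMALL_RAM
#define PLACE_IN_SMALL_RAM
#endif

#define HASH_ENTRIES 4096u   /* hypothetical size; power of two for masking */

typedef struct {
    uint32_t key;     /* position signature */
    int16_t  score;   /* stored evaluation */
    uint8_t  depth;   /* search depth of the stored result */
    uint8_t  flags;   /* non-zero once the slot has been written */
} hash_entry_t;

/* Fixed-size table: its RAM cost is known at link time, no malloc needed. */
static hash_entry_t hash_table[HASH_ENTRIES] PLACE_IN_SMALL_RAM;

/* Fail the build if the table would not fit into the 64 kB area. */
_Static_assert(sizeof(hash_table) <= 64u * 1024u, "hash table too large");

/* Always-replace store: the slot is selected by the low bits of the key. */
void hash_store(uint32_t key, int16_t score, uint8_t depth)
{
    hash_entry_t *e = &hash_table[key & (HASH_ENTRIES - 1u)];
    e->key = key;
    e->score = score;
    e->depth = depth;
    e->flags = 1u;
}

/* Returns 1 and writes *score on a hit, 0 otherwise. */
int hash_probe(uint32_t key, int16_t *score)
{
    const hash_entry_t *e = &hash_table[key & (HASH_ENTRIES - 1u)];
    assert(score != NULL);
    if (e->flags != 0u && e->key == key) {
        *score = e->score;
        return 1;
    }
    return 0;
}
```

The compile-time size check is the main point: an oversized table becomes a build error instead of a runtime allocation failure.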
The user interface had to be written from scratch, including dialogue and menu system. The position editor was a particularly laborious module because it has to deal with a lot of user input.
Quite a few bugs and quirks had to be hunted down and fixed in the original source code.
Considerable parts of the software were rewritten, e.g. the static evaluation function and the whole opening book handling.
The target platform does not have a file system, and that required eliminating the file operations for the opening book and replacing them with an initialised binary array, included as a C file.
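To illustrate the idea of a book that lives in an initialised array instead of a file, here is a minimal sketch. The record layout (a 4-byte little-endian position key followed by the from and to squares), the sample keys and the function name book_lookup are assumptions for illustration; the actual CT800 book format is not described here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BOOK_RECORD_SIZE 6u   /* hypothetical: 4 key bytes + from + to */

/* The kind of table a book compiler could emit as a C file.
   Squares are numbered a1 = 0 ... h8 = 63. */
static const uint8_t opening_book[] = {
    0x78, 0x56, 0x34, 0x12, 12, 28,   /* key 0x12345678: e2-e4 */
    0xEF, 0xBE, 0xAD, 0xDE, 52, 36,   /* key 0xDEADBEEF: e7-e5 */
};

/* Linear scan over the records. Returns 1 and fills *from and *to
   if the key is found, 0 otherwise. No file I/O is involved. */
int book_lookup(uint32_t key, uint8_t *from, uint8_t *to)
{
    assert(from != NULL && to != NULL);

    for (size_t i = 0; i + BOOK_RECORD_SIZE <= sizeof(opening_book);
         i += BOOK_RECORD_SIZE) {
        const uint8_t *r = &opening_book[i];
        uint32_t k = (uint32_t)r[0]
                   | ((uint32_t)r[1] << 8)
                   | ((uint32_t)r[2] << 16)
                   | ((uint32_t)r[3] << 24);
        if (k == key) {
            *from = r[4];
            *to = r[5];
            return 1;
        }
    }
    return 0;
}
```

Because the array is const, a typical embedded toolchain can place it in flash rather than RAM. Assembling the key byte by byte keeps the lookup independent of the CPU's endianness and alignment rules.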
The opening book grew from 1,000 moves to more than 22,000 moves, all of them entered manually, checked with a professional chess program and judged on whether they would suit the playing style of the CT800.
Since the opening book was line based, but the CT800 should also recognise transpositions, an opening book compiler became necessary to do this conversion.
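The core idea behind recognising transpositions is to key book entries on the position reached rather than on the sequence of moves. The following is a minimal sketch of that principle, not the actual opening book compiler: it uses a reduced board with four pieces and an FNV-1a hash as the position key, and all names (move_t, position_key_after, line_a and so on) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { EMPTY = 0, WP, WN, BP, BN };   /* only the pieces this demo needs */

typedef struct { uint8_t from, to; } move_t;   /* squares a1 = 0 ... h8 = 63 */

/* Three sample lines. The first two reach the same position. */
const move_t line_a[] = { { 6, 21}, {51, 35}, {11, 27} };   /* Nf3 d5 d4  */
const move_t line_b[] = { {11, 27}, {51, 35}, { 6, 21} };   /* d4 d5 Nf3  */
const move_t line_c[] = { {11, 27}, {62, 45}, { 6, 21} };   /* d4 Nf6 Nf3 */

/* Plays a line on a reduced start position and returns a key that
   depends only on the resulting position and the side to move. */
uint32_t position_key_after(const move_t *line, size_t n)
{
    uint8_t board[64] = { 0 };
    board[6]  = WN;   /* g1 */
    board[11] = WP;   /* d2 */
    board[51] = BP;   /* d7 */
    board[62] = BN;   /* g8 */

    for (size_t i = 0; i < n; i++) {
        assert(line[i].from < 64 && line[i].to < 64);
        board[line[i].to] = board[line[i].from];
        board[line[i].from] = EMPTY;
    }

    /* FNV-1a over the board contents plus the side to move. */
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < 64; i++) {
        h ^= board[i];
        h *= 16777619u;
    }
    h ^= (uint8_t)(n & 1u);
    h *= 16777619u;
    return h;
}
```

The lines line_a and line_b use different move orders but produce the same key, so a position-keyed book would treat them as one entry. A complete key would also have to cover castling rights and en passant state, which this sketch leaves out.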
The software had grown to a point where some code refactoring was inevitable with respect to future maintainability, especially in case other people might want to take up the project. Originally, everything was in one single source file of 7,000 lines. Now it is 15 C source files with 33,000 lines.
I constantly reviewed all the software to find possible bugs or design shortcomings. I do not trust fresh source code, not even my own.
Besides code reviews, bottom-up tests helped to prepare the integration tests; since the scope of the low-level tests was more limited, potential bugs were easier to locate. The integration tests were about possible incorrect interaction of correct modules.
CppCheck was used for static source code analysis. Besides, all GCC compiler warnings were enabled because source code should always compile without any warnings.
Coverity Scan provides a sophisticated service for static source code analysis. It is free of charge for open source projects, so the choice was easy.
The sanitiser features of the GCC compiler were useful for runtime code checking of the application part – not only the address sanitiser, but also the one for undefined behaviour. Unfortunately, Cygwin does not yet support this feature, but booting up a Linux live distro is easy enough these days.
Hundreds of test games had to be played manually, alongside constant enhancements. It turned out to be a good idea to first get the application ready, including the RAM reduction, getting to a stable and feature-complete version – and only then to go for the ARM port.
This way, I had the nice debug possibilities of the PC during the complex changes. Mostly, I used the renowned “printf-debugger”; but a few times, the command line operated GNU debugger (GDB) was very helpful. Besides, I could be pretty sure that problems on the target platform would be mainly located in the driver layer.
Concerning debugging, I think this is quite comfortable because it is not uncommon in the embedded world to only have an unused pin where you can solder an LED to, and that is the whole debugger. In this project, I even had the possibility of doing cycle and stack counting on the target while displaying the result in a dialogue box. That felt like pure luxury.
The hardware driver for the ARM version had to be written, mainly as per the datasheets.
The hardware setup itself had to be carefully designed since the Olimex H405 board was not the only component. While I am better at software, I can still work with hardware datasheets. Next, the actual electrical and mechanical build-up also took quite some time.
Originally, the engine was running under Windows using the Winboard protocol. After the CT800 was running, I backported the engine first to Windows, then also to Linux and Android, and the UCI protocol replaced the Winboard one. Testing became even easier.
Code comments are nice for understanding the details, but summary documentation helps to get the bigger picture of a project. The chess application and the opening book compiler have top-level flowcharts, and the hardware documentation explains every component in detail. Besides, the user manual shows how to actually use the CT800.
Last but not least, this website also took some time, despite its minimalism. Speed has been a primary objective; this website is just as much of a hand-coded “high performance hack” as the CT800 that it is about. It works at various resolutions and takes accessibility into account.