This post is about Examachine, the AGI (Artificial General Intelligence) platform of Celestial Intellect Cybernetics. Examachine's core is a "parallel AGI kernel" that presents a generic programming API implementing parallel, incremental, general-purpose machine learning algorithms for heterogeneous supercomputers. The platform automates basic data science tasks such as regression and clustering, with human-level performance as the goal. We call it an AGI Unification Architecture, and it constitutes a relatively new approach to AI. AGI Unification research is an attempt to unify many disparate aspects of AGI, as explained by Alexey Potapov at AGI-2017. Our approach in the Examachine research program is to build an architecture that uses common building blocks and integrates universal induction, deep learning, and data mining approaches. The roadmap in the linked blog entry summarizes the future releases of the platform.
I had initially estimated the required code size to be around 1 million physical source lines of code (SLOC) of OCaml for the entire platform. However, it seems that impressive results can be obtained with far fewer lines. The entire Teramachine MkII platform should be around 10,000 physical lines of code. That figure excludes the libraries and other (advanced) subsystems used, of which there are many and which would inflate the code size by 10x or more. That might turn out to be quite impressive given what it achieves: it has multiple reference machines, a complete memory system, and scalable parallel universal search. That feat would be impossible in C++, of course, but it is still quite interesting to me personally. The current code size of the kernel is around 6,000 OCaml + 1,500 C++ SLOC, which contains some redundancy; the final product will come to around 10,000 OCaml/C++ SLOC. This excludes any interface and application code; a typical finance application with interfaces amounted to around 10K lines of R, Python, and ML code.
On the other hand, I expect this code size to increase substantially for Teramachine MkIII and Petamachine. Teramachine MkIII would go up to 20K lines and Petamachine could reach 50K, because of more extensive support code for GPU and FPGA ports, and more reference machines and algorithms. I had initially estimated Petamachine at 100K. In conclusion, ML might be even more code-efficient than I initially projected. Equivalent C++ code would be almost impossible to write and maintain; C++ is terrible for generic programming, or for any kind of advanced symbolic processing, advanced data structures, and algorithms, anyhow. I suspect only Haskell would be comparably effective for writing code such as this, but then the performance would not be so great. OCaml was certainly the right choice for the job. In later posts about this commercial project, I will try to summarize the programming lessons learned from developing sophisticated generic algorithms with OCaml. If you will excuse my enthusiasm, welcome to the future where choosing OCaml is not merely a matter of preference.