This is the short story of how I became an AI hacker, or rather how I came to realize I had become one. I have heard some people tell varieties of my story as if it were their own, so I thought you might want to know the original. Also, since some people in the startup space ask us to reminisce about a technical feat we are proud of, I suppose this is it, although it is one of the simpler things I did as a kid.
When I first saw a computer program in a book, I was in primary school. I immediately understood what the program did: it printed a triangle out of asterisks. I had learnt it from a single example. I started programming when I was 12. I first learnt BASIC variants very well, and even developed sophisticated applications that included a UI toolkit of my own (it had pretty much everything!). By 16, I was able to write optimized assembly on my trusty Amiga 500, and later Amiga 1200. I was in love with AmigaOS then, and I was coding demo effects. I didn’t have any computer graphics textbooks, or anything that explained how to make advanced computer graphics; I had access only to an OS manual and a processor manual. I did not even know that demo programmers used CS textbooks to develop their programs. I learnt assembly by hacking around and figuring out how it all tied together. I learnt demo coding entirely on my own, occasionally asking some remote friends how they went about this and that, but I learnt about the hardware architecture, how to write at the machine level, how to send instructions to the co-processors, and how to optimize assembly entirely on my own. The constraints of the Amiga architecture required the programmer to innovate constantly.
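For the curious, a first program of that sort looks like this. The book’s listing would have been in some BASIC dialect I no longer remember; what follows is an illustrative C equivalent, not the original:

```c
#include <stdio.h>

/* An illustrative reconstruction, not the book's actual listing:
   print a triangle of asterisks, one more star per row. */
int main(void) {
    int height = 5;                         /* number of rows; arbitrary */
    for (int row = 1; row <= height; ++row) {
        for (int col = 0; col < row; ++col)
            putchar('*');
        putchar('\n');
    }
    return 0;
}
```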
I wrote an interrupt-driven single-tasking OS and a 3D engine all by myself. I did take a look at how others structured their demos, but I didn’t try to reverse engineer them, although at one point I could reliably convert them to source code, even if the result was a little opaque. I marveled at how optimized some of the code written by the best demo coders was; they were achieving quite amazing feats. Although most demo effects were straightforward, some of it was excellently optimized 3D code, and some of it even resembled scientific computing: advanced stuff like ray tracing and so forth, all on the humble MC68000 and 68020. I was in love with the 68020 instruction set; those processors had a true CISC instruction set, and you could express very complicated imperative code compactly. I knew how to use the Blitter, but I did not know at first how to make animated 3D graphics with filled faces. I had to figure out the entire rendering pipeline by myself. I knew only some trigonometry from my high school classes, but that was 2D; I had to extend the 2D operations to 3D by myself, and it somehow worked. It was a little slower than the usual affine transformations you can find in a graphics textbook, but it did work. I did not know how to do the final projection, though. I think I had already figured out how to do culling for my model; I was working on a cube model, of course. I imagined how perspective would work, and suddenly the idea of dividing by the z coordinate occurred to me. I wondered if that was the right way to solve the problem. I had made an inductive inference, I knew, but would it work? I set out to prove that it did, and could easily prove it was the right approach by applying a geometric model of the screen. Then I quickly coded it, and in a few hours I had my first rotating cube. This insight baffled me: how was it that my brain was capable of inferring the formula so quickly? I understood that I had intuitively inferred a (simple) formula that fit what I saw. There, I had glimpsed the true power of inductive inference, and I knew at once that this was the basis of human intelligence. That would be exactly the kind of machine intelligence I would set out to demonstrate much later in my career: single-shot universal learning. I knew I had to reproduce this level of intelligence in a machine. Of course, I did not know of quaternions and many other things about 3D graphics, and would not know them until I took a proper graphics course at the computer science department. Nevertheless, the thrill of accomplishing this on my own was so much better than taking a course. Later, I upgraded that code to display a 96-face “glenz”, a regular geometric shape with transparent faces; this was considered somewhat of an accomplishment on the Amiga.
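For concreteness, here is a minimal sketch of that perspective divide, in modern floating-point C rather than 68020 assembly; the focal length, screen size, and vertex values are illustrative, not those of my original code:

```c
#include <math.h>
#include <stdio.h>

typedef struct { double x, y, z; } Vec3;

/* Rotate a point around the Y axis by angle a (radians). */
static Vec3 rotate_y(Vec3 v, double a) {
    Vec3 r = {  v.x * cos(a) + v.z * sin(a),
                v.y,
               -v.x * sin(a) + v.z * cos(a) };
    return r;
}

/* The insight itself: push the point in front of the camera, then
   divide x and y by depth, scaled by a focal length d. */
static void project(Vec3 v, double d, double *sx, double *sy) {
    double z = v.z + 4.0;             /* keep the cube in front of the eye */
    *sx = 160.0 + d * v.x / z;        /* 320x256 screen, centre (160,128) */
    *sy = 128.0 + d * v.y / z;
}

int main(void) {
    Vec3 corner = { 1.0, 1.0, 1.0 };  /* one corner of a unit cube */
    for (int frame = 0; frame < 4; ++frame) {
        double sx, sy;
        project(rotate_y(corner, frame * 0.1), 256.0, &sx, &sy);
        printf("frame %d: (%.1f, %.1f)\n", frame, sx, sy);
    }
    return 0;
}
```

Everything is in `project`: the farther a point is (the larger its z), the closer it lands to the screen centre, which is all perspective is.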
I remember lying on my bed and imagining things after this incident. How would AI work? I had very few answers, but I knew it was important. I tried to imagine how a parallel computer would work. I imagined a 3D grid of processors, and I could see that they could communicate by sending messages among themselves. But how would they synchronize? What kind of an OS would work on such a machine? These seemed like insurmountable problems, and I did not know enough to tackle them. I was both excited and frustrated. Later on at the university, these two problems would shape my specialization, as I went on to study parallel computing and AI/data mining.
The next challenging project I undertook was a zoomer/rotator in HAM mode, where you could show 4096 colors simultaneously. That is a somewhat difficult kind of real-time coding, as you have to be careful to program the graphics co-processor, the Copper, correctly in real time. I wrote the most ambitious loop of my demo programming career then. It had a lot of tricks, such as loop unrolling, instruction scheduling, register optimizations, look-up tables, vector alignment, and so forth, all so that you could zoom and rotate hypnotically for a demo effect. I made it after days of coding non-stop. For a while I could not seem to reach the required efficiency, and I was mad at myself; after fixing alignment and memory access patterns a little, however, it did work, and wonderfully so.
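To give a flavor of what such a loop computes, here is a sketch in C rather than 68020 assembly. The structure is the classic one: the per-pixel rotation collapses into two fixed-point additions and a texture lookup, with the sine and cosine (pre-scaled by the zoom factor) typically fetched from a precomputed table each frame. The names, texture size, and 16.16 fixed-point format are illustrative assumptions, not my original code:

```c
#include <stdint.h>
#include <stddef.h>

/* Sketch of a zoom-and-rotate inner loop. cos_a and sin_a are 16.16
   fixed-point values, already multiplied by the zoom factor, e.g.
   taken from a precomputed sine look-up table. */

#define SCREEN_W 320
#define SCREEN_H 256
#define TEX_BITS 8                         /* 256x256 texture that wraps */
#define TEX_MASK ((1 << TEX_BITS) - 1)

void rotozoom_frame(uint8_t *screen, const uint8_t *texture,
                    int32_t cos_a, int32_t sin_a)
{
    for (int y = 0; y < SCREEN_H; ++y) {
        int32_t yc = y - SCREEN_H / 2;     /* row relative to the centre */
        /* Texture coordinates of the leftmost pixel of this row. */
        int32_t u = -(SCREEN_W / 2) * cos_a - yc * sin_a;
        int32_t v = -(SCREEN_W / 2) * sin_a + yc * cos_a;
        uint8_t *row = screen + (size_t)y * SCREEN_W;
        for (int x = 0; x < SCREEN_W; ++x) {
            /* Two adds and one table lookup per pixel. */
            row[x] = texture[(((v >> 16) & TEX_MASK) << TEX_BITS)
                             | ((u >> 16) & TEX_MASK)];
            u += cos_a;                    /* step one pixel to the right */
            v += sin_a;
        }
    }
}
```

On the real hardware the inner loop would be unrolled many times over, with values pinned in registers; the alignment and memory access fixes mentioned above were what finally made it fast enough.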
Later on, in the parallel scientific computing course, I would learn that FORTRAN compilers performed automatically all of the optimizations we had performed by hand. I was quite surprised to find that, at only 16, I had attained the program optimization understanding of a graduate-level computer science course at the top CS department in my country.
By the time I entered the computer science department, I already knew BASIC, and both Amiga and C64 assembly, perfectly. I could also understand and write C/C++ code, which meant I was already a better programmer than most CS graduates. Of course, I never told any of this to my possibly autistic peers and instructors at the university; I knew they would not believe me and/or would try to mob me. But I thought I should share this with you, especially with the investors on my feed, so that you might appreciate the kind of love and dedication that makes one a true hacker. I try not to dismiss people who can only write web programs, of course, but to my mind, someone who can’t write advanced code in assembly does not even qualify as a programmer. That stuff is like pop music, and hackers write rockstar stuff. The rumors are true: great programmers must have programming experience that exceeds degree requirements tenfold, and they are so much better because they have evolved sophisticated machine models in their brains over so many years; that is why we can write much more complex code and make it beautiful and super fast at the same time. It’s a rare ability, but it does exist. That’s how I could easily master 17+ programming languages and write advanced code on many OSes and platforms with no difficulty; I was even able to design a few PLs myself.
About the AI part: I later got to take classes from a philosophy genius called Varol Akman at our university department. I then met Marvin Minsky on the internet and founded a mailing list called ai-philosophy with him, because I was so interested in the philosophical aspects. I knew we could not code a human-level AI straight away, but we could inquire very deeply. Marvin had written great books, and those taught us a lot about what capabilities a true AI would have to have. I quickly concluded that only a program that can rewrite itself can have real intelligence; I remember asserting that at a SIGART meeting, and I began recognizing in my own intelligence the ability that allows me to invent optimal algorithms. From then on, I would always pay close attention to how I was accomplishing intelligence-intensive tasks; this allowed me to formulate the cognitive functions required by a human-level AI. After I had explored the space of hypothetical solutions satisfactorily, my early experience with programming and induction showed me clearly what I had to work on: Solomonoff’s theory of universal induction, and the associated field of algorithmic information theory. At one point, I believe I even convinced Minsky, who had initially dismissed this line of research as infeasible, that it was the right path. Unbeknownst to my advisors, who felt they had to give me boring, arduous, and mostly unnecessary research tasks, I began venturing into the vast theoretical space of the cognitive sciences. I read everything I could find in the library and on the internet on philosophy of mind, artificial intelligence, linguistics, neuroscience, and psychology, as it relates to the great challenge of building a human-level AI system. This concentrated autodidactic effort, supported by the machine learning and data mining courses I took, the algorithm design experience I mastered through developing state-of-the-art parallel code, the computer architecture experience I garnered through collaborating with one of the greatest microarchitecture experts in my country, and the official stamp-making data mining research I conducted at the department, led to my infinitely more interesting work on AI. Along the way, I met both Marvin Minsky and Ray Solomonoff in person, and it was most certainly the inspiration I got from Marvin’s mind-opening books, and from Solomonoff’s suggestions for future research, that helped my research path become clear.
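For readers who have not met it: the central object of Solomonoff’s theory is the universal prior, which weights every program that could have produced the observed data by that program’s length, assigning a sequence $x$ the probability

$$M(x) \;=\; \sum_{p \,:\, U(p) = x\ast} 2^{-|p|},$$

where $U$ is a universal prefix machine, $|p|$ is the length of program $p$ in bits, and the sum ranges over programs whose output begins with $x$. Shorter programs dominate the sum: a formal version of the preference for the simplest fitting formula that the rotating-cube episode above illustrates.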
That’s briefly how I became an AI hacker, and that is why I have always been interested in AI methods for synthesizing programs. I would like to show that my machine learning models are capable of the kind of quick learning that I experienced myself when I was younger: acquiring a complex programmatic representation from a single example or very few examples, which I have claimed in my publications to be the core distinguishing feature of human-level AI. I have had other “AI epiphany” cases as well, where I subjectively experienced the inner workings of even more interesting intelligence phenomena, and I will explain their impact on my theoretical development in due time; however, I believe my learning of program semantics from a single input/output example, and my inference of simple perspective projection, are interesting enough to merit mention.