The first “Bayesian Program Learning” paper used a strange term, “Bayesian Program Merging”. A colleague sent me a link to the paper about 2-3 years ago; he had found it while searching for one of my Heuristic Algorithmic Memory papers, which introduce my long-term design for universal machine learning algorithms. Those papers were published at the AGI-10, AGI-11, and AGI-14 conferences. I noticed a great deal of similarity between that paper and a paper I had submitted and published more than 1.5 years before this offending paper appeared on the arxiv. At first, I thought that was not a problem, because the paper seemed like a technical report that had not been published anywhere. I was wrong.
I had invented Heuristic Algorithmic Memory as a response to a challenge that Ray Solomonoff personally posed to me in 2005. He had said:
We can use stochastic context-free grammar to guide the search [in a Solomonoff induction approximation]. How are we going to update it if we already have a grammar?
It was Solomonoff who recommended that I play with stochastic context-free grammars, but he did not tell me how to solve the update problem. I solved the problem during 2006-2007, and I also solved a major theoretical problem which nobody else had noticed and which, by the way, others still do not understand. Solomonoff himself had recommended at AGI-10 using simpler heuristic compression tools like PAQ, PPM, etc., to capture the regularities in the solution corpus. However, during my brief collaboration with Solomonoff, I had already tested that idea and seen that it unfortunately did not work. Heuristic Algorithmic Memory was a partial solution to his challenge. It used a very powerful and fast update logic that captured algorithmic regularities in the solution corpus, and updated the stochastic context-free grammar, which acts as a guiding probability distribution over programs. Combining it with a rigorous Levin search implementation, I obtained a powerful AGI prototype, first called gigamachine (the sequential version) and then teramachine (the parallel version). Gigamachine was tested in early 2010 as part of the research of my government-funded AGI startup, and the results affirmed what Solomonoff was looking for. We could now solve a long training sequence and measure information transfer between problems.
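To make the update problem concrete, here is a minimal sketch of one very simple way to update a stochastic context-free grammar from a corpus of past solutions. It is not one of the HAM update algorithms, which are not specified in this post; it only re-estimates production probabilities from usage counts with additive smoothing. The toy grammar, the function names, and the `smoothing` parameter are all assumptions made for illustration.

```python
from collections import Counter
from math import prod

# Hypothetical toy grammar: nonterminal -> {production (tuple of symbols): probability}.
grammar = {
    "E": {("E", "+", "E"): 0.25, ("E", "*", "E"): 0.25, ("x",): 0.25, ("1",): 0.25},
}

def derivation_probability(grammar, derivation):
    """Probability of a derivation given as a list of (nonterminal, production) steps."""
    return prod(grammar[nt][rhs] for nt, rhs in derivation)

def update_grammar(grammar, solution_derivations, smoothing=1.0):
    """Re-estimate production probabilities from how often each production
    was used in past solutions (additive smoothing keeps unused rules alive)."""
    counts = Counter(step for d in solution_derivations for step in d)
    updated = {}
    for nt, rules in grammar.items():
        total = sum(counts[(nt, rhs)] + smoothing for rhs in rules)
        updated[nt] = {rhs: (counts[(nt, rhs)] + smoothing) / total for rhs in rules}
    return updated

# Derivation of the program "x * x": E -> E * E, then E -> x twice.
square = [("E", ("E", "*", "E")), ("E", ("x",)), ("E", ("x",))]

before = derivation_probability(grammar, square)   # under the uniform prior
updated = update_grammar(grammar, [square])        # learn from one solved problem
after = derivation_probability(updated, square)    # same program, updated prior
```

After the update, the previously solved program has a higher prior probability, so a search that allots effort in proportion to program probability (as Levin search does) would reach it, and programs built from the same productions, sooner. That is the sense in which the grammar carries information from one problem to the next.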
I submitted to the AGI conference because that is where Solomonoff was submitting, and he recommended that I do the same. He reviewed my paper in private, and told me in an e-mail that out of the four update algorithms I proposed, the last two were completely new. However, he sadly passed away before AGI-2010, due to an aneurysm, and was unable to give further feedback to the program committee. My reviews went to others, who tried very hard to kill my paper. My paper was not killed, but it was shortened from ten pages to two, and the program chair Dr. Ben Goertzel told me to link to my 10-page submission as a webpage. I also mistakenly added my advisor as a co-author of the 2-page summary, thinking we had to do that in my PhD program; I had heard stories of people being expelled because they did not list their advisor as a co-author. I was afraid, of course. At any rate, I published at AGI-2010, despite these adversarial reviews. At AGI-2010, my work was greeted with some enthusiasm and I was allowed to present it. I knew I was going to receive adversarial reviews at AGI-2011 as well, so I submitted to arxiv simultaneously, to guard against plagiarism in the future. Some people from Stanford (and I guess MIT) were too interested in my work, I remember. Sometimes, your “rivals” reject your paper with a poor excuse, then scoop the work and appropriate it without giving you proper credit. I had fully anticipated that this would happen.
I did not wish that to happen, however. Unfortunately, that seems to have happened with at least some members of the MIT DARPA PPL project’s group. The paper I found had a different, much simpler pattern inference phase, and a transfer learning phase that was very similar to my algorithm. It was great to see the algorithm work in a simple pattern discovery setting. However, it was not great to see a transfer learning algorithm that is virtually identical to the HAM algorithms, without citation. Noah Goodman, the last author of that paper, which has an uncanny similarity to my transfer learning papers, had given a tutorial session at AGI-2011. Before they published their work on arxiv, he had most likely seen my paper in the proceedings. It would be nigh impossible for him to miss a paper on the same subject as his own work. It also seems rather unreasonable to suggest that they made an adequate literature survey but failed to find my paper. My paper was in the most visible venue for a study on universal induction. That is where luminaries like Marcus Hutter published; I published in exactly the right place to cement my contribution. There it was, a notable improvement over Juergen Schmidhuber’s OOPS that fixes a fundamental shortcoming in it, adds entirely new capabilities, and takes Ray Solomonoff’s work on incremental machine learning one step further. My first paper also boasted that I had achieved one-shot learning as Solomonoff had predicted. Yet, in the end, these three authors would feign ignorance, because they cannot possibly admit that they had seen and gotten “inspired” by my paper. But the similarities are there, and I have cryptographic timestamps for everything.
The paper titles, authors, and dates on arxiv are as follows:
Inducing Probabilistic Programs by Bayesian Program Merging
Irvin Hwang, Andreas Stuhlmüller, Noah D. Goodman
(Submitted on 25 Oct 2011)
Teraflop-scale Incremental Machine Learning
(Submitted on 5 Mar 2011)
This is quite unsettling for me. The transfer learning algorithms are virtually identical. As far as I can tell, only terminology has changed. They used terminology from PL literature instead of my (quite informative) inductive programming / inductive inference nomenclature. Their Bayesian Program Learning / Merging term is inaccurate anyhow, as all learning is Bayesian. There is no such thing as non-Bayesian learning. It is Algorithmic Memory. It solves the incremental machine learning problem, or transfer learning problem and is a model of long-term memory; it is not program learning as usual, as we often refer to the inference task by that, not the transfer learning task. I used the additional term Heuristic, because inductive systems invent heuristic solutions.
The term Bayesian Program Merging is also inaccurate: we are updating the guiding probability distribution. It is called the “update problem”, not program merging. Anyhow, it is mostly a change in terms. They also failed to cite Ray Solomonoff’s seminal 1989 and 2002 papers that introduced the update problem, and Schmidhuber’s 2002-2005 OOPS papers, of which my work is a continuation. This does not just seem like neglecting to cite. It looks like neglecting the whole artificial general intelligence (AGI) and algorithmic information theory (AIT) research community. Therefore, you should know that this situation goes beyond my personal contributions. For one, I do not intend to let anyone else appropriate Solomonoff’s original, ground-breaking research. I am fed up with blatant violations of academic integrity. The authors should not be allowed to “deep learning” their way out of this as others did to Juergen, appropriating research they have not done themselves. It is nigh impossible for them not to know Ray Solomonoff’s aforementioned papers if they did anything on program induction, even if we suppose that they do not like me because they do not know my name, because they do not like my “ethnicity”, or because they dislike the AGI community since we are not the AAAI community and we are not completely US-centered. None of that makes sense, and no such thing could excuse failing to cite a very visible and very relevant paper. I do not believe that their paper would have been accepted as enthusiastically by any venue if they had cited my work, and they have naturally failed to mention the large similarities and tiny differences. The authors should still revise their papers, cite our earlier work, and compare their methods.
The transfer learning algorithm seems about the same; it is merely applied to pattern recognition instead of universal induction, so it is of course much more limited in scope. They did not cite my prior work, and it looks like my algorithms found their way into at least one PhD thesis and several nice publications at coveted venues, but my pioneering contribution was neglected. Why? Because, according to Prof. Tenenbaum, AGI is a “minor” conference (he told me so in a private e-mail to shut down our conversation), and failing to cite relevant work from it is supposedly normal. How can these allegedly “minor” conferences become “major” if everyone neglects the work published therein? This is a logical contradiction, and that argument is invalid. That argument is why I now have to let everyone in the AGI community know about the appropriation of the innovative research published at AGI conferences. I know this alone does not help, but I at least have to get this off my chest, as my work has been treated unjustly.
For the record, HAM is more general than the BPL method, it precedes it by 1.5 years, and it should have been cited by their papers. They cannot say they did not see the paper, because it was on the arxiv 6 months before they put theirs there. That is exactly why I put it on the arxiv in March 2011: I knew that it would appear somewhere else uncannily. I had also published and presented pretty much the same thing at the AGI 2010 conference, and later published it at the AGI 2011 conference. I am not accusing anyone of a particular charge; I do not know what happened, but that does not matter. I think they simply will not cite papers from the AGI conference, which looks to me like a serious ethical error. That attitude does not sit well with my understanding of academic integrity and the peer review process. Even if they forgot, the reviewers should have detected this neglect. Therefore, this situation cannot be excused by either the authors or the publishers. They had 5 years to do their literature survey. Why have they not noticed it in the intervening years, when they could have found it easily on google scholar or arxiv search using common technical terms like “arxiv context-free grammar machine learning”? My paper comes up on the first google results page for many such queries (third for that one). What is the probability of that happening, all else being equal? That is how the “informant” found the paper: he was apparently searching for my paper on arxiv, and found theirs as well. If I did not tell anyone, people would likely keep pretending this did not happen. I am bringing this matter to your attention because some “clever” people might attempt to erase my existence from the net, even if with a small probability (<5%). I am not going to pretend that I am not worried about this.
The papers downloaded from arxiv are archived and timestamped elsewhere in case anyone attempts to delete my arxiv account or destroy my contribution to AGI-10 and AGI-11 proceedings by some means I cannot yet imagine.
What is similar in the two algorithms, and what is different?
- The four update algorithms and their sequence look about the same.
- Even the abstracts read very similarly.
- Their update is used in an evolutionary pattern recognition phase, something much weaker than universal induction, merely pattern space search. They did search program space later, in another paper, though, which is even more similar to HAM circa AGI-2010. HAM is still much more general.
- My algorithms use a stochastic CFG as the guiding distribution; theirs use my updates in a heuristic, mathematically incomplete way to merge patterns.
- The names and terms used in the update algorithms are different. However, their semantics are the same.
- HAM precedes the first BPL paper by 1.5 years and is a mathematically complete model of long-term memory for AGI, whereas their paper is merely an application of the four update algorithms in a pattern search algorithm, which is of course much, much weaker.
- The third author Noah D. Goodman and the MIT PPL team know Moshe Looks and the Google Brain group in person, to whom I explained my algorithms in person during AGI-2010. Goodman also presented a tutorial at AGI-2011.
- The first author Irvin Hwang was a Solomonoff student prize winner at AGI-2011. They likely had other people as attendees in AGI-2010.
Since two of the three authors were definitely at AGI-2011, it is likely they had a keen interest in the subject matter. It might be, then, that they got the idea at AGI-2010 or AGI-2011 that they could scoop some of the work in AGI conferences and nobody would notice, because their professors think it is a “minor conference”. That two out of three authors were at the same conference where I published my teramachine paper, which included the full formulation of the HAM design, cannot be a coincidence. This all happened before they published their pattern search paper. I must stress that my paper is not the only paper that was scooped from the AGI conference by the respectable MIT Computational Cognitive Science group. I believe they simply might not be citing any research that comes from people they do not know or endorse personally. That is a serious violation of academic integrity in itself.
I thought that they would eventually cite me. They had probably just missed my paper! How could they not know my work? It was so obviously similar, and my work preceded theirs by a very long period (for the Internet age!) at the premier conference for general-purpose machine learning! How come? I could not know at the time that the discriminatory biases of some academicians or supervisors might be at play. Who knows, perhaps somebody suggested to them that they should do this, because they could simply disregard the work of a non-American researcher. If it happened to Prof. Juergen Schmidhuber, who is regarded as a genius in the AGI community, it can happen to anyone. And that is something I do take issue with. How can they not cite both Juergen’s and my work? How can they not know Solomonoff, who invented the very concept of machine learning, and the incremental machine learning for which I made a proper algorithm? That is offensive to my sense of academic integrity, and I told Prof. Tenenbaum as much. He told me he had not yet reviewed the two papers, and only suggested that I rush with subsequent publications. However, what good would such publications be if people kept neglecting my work? In the end, they would make their work look more important, although I invented the basic method, and I cannot in good conscience cite their work positively. Currently, I will have to cite them as researchers who likely got inspired by my papers but failed to cite prior work that is essentially identical. How can that look good? That would probably just be used as another excuse to reject my subsequent research, and scoop it as well, which is why I am not keen on publishing much more. If people think they can lift work from conferences they label “minor conferences”, what is the point of publishing anything? If the scientific community has forsaken academic integrity, and has turned into an anti-intellectual sort of celebrity culture (?), what is the point of contributing?
What is the point of authorship? Can they simply erase my research from history? These troubling questions upset me a lot, as you can tell.
I even tried to communicate this politely last year, by mentioning the incredibly high relevance of my work on Heuristic Algorithmic Memory to the sibling PPL group that Dr. Frank Wood leads at Oxford, in a sincere postdoc application to their project. I did that as a friendly gesture, not because I really wanted to work there; I do get offers from prominent faculties. I had assumed that the members of the DARPA PPL group would see my statement letter, and would eventually start citing my papers, and perhaps even make it up to me. That did not happen either, which is why I was finally forced to make my case that the most important contribution in Goodman et al.’s paper seems to be derived from my prior 2010 and 2011 publications, whose existence I can document and prove. Their later publications are even closer to inductive inference papers that incorporate transfer learning as I did in 2010, 2011, and 2014, and their newer journal and conference papers, quite lengthy ones, still fail to cite our work that originally introduced this method (Solomonoff 1989, Schmidhuber 2002, Ozkural 2010). Have a look yourselves, and let me know what you think. Whose work is prior, and whose work is more general? This is quite important to me, because this invention is the cornerstone of my research in the AI field. Everything in my career (my potential academic appointments, the funding of my startup, research grants) depends on the proper recognition of my pioneering work in transfer learning. I have yet to receive any credit for it, and I have already been massively damaged financially as well as temporally because of this willful neglect, not to mention how my motivation might have been hampered.
PS: At the time of writing this tirade, I was emotionally charged and at the beginning of a phase of slightly suicidal depression because of the events that transpired. I had found out that a large group of researchers were systematically plagiarizing my work, and this discouraged me from contributing further to AI. I was pondering quitting Computer Science for good. It is a form of academic mobbing.