Inspired by Hinton’s recent update to the capsule networks model, we shared a research program with SingularityNET a few months ago (end of 2017).

I decided to open the proposal to the public for comments before we revise it and proceed with some actual work. When we last talked, we decided to apply the approach to the image segmentation problem; however, I think there is a lot more to it. I was quite sick for a few months due to my erratic allergic asthma, and I only recovered a few weeks ago; it looks like I should be much more careful with allergens from now on. The general idea is to widen the model space for image processing problems by considering things like a mereological (part-whole) organization, as capsule networks do, and then to apply this to a non-trivial image processing problem such as segmentation, loosely inspired by the actual architecture of the human visual cortex. I believe the general approach to be quite viable, and I was glad to see that Alexey did some intriguing work on the EM matrix capsule model.

Here is the Google Drive link

Any tangible contributions will be acknowledged, and I’m looking forward to contributions that will lead to co-authorship. I am hoping that academic integrity can incentivize even open research. I am quite interested in the idea of open research networks and protocols, so I thought we could give this a shot. 

I am expecting your unhinged comments on the proposal. Why is this so short? Are you even a computer vision researcher? What do you know about seeing? Do you wear glasses? Bring them on. Ben Goertzel asked me if I could put the proposal in a form that anybody can use, so here it is. Note that while Alexey’s results are encouraging, the use of capsule networks alone does not seem to have produced the breakthrough I hoped for, and he did not follow this particular plan either, so there might be considerable challenges to realizing my idea of a brain-like vision architecture. It could very well be worthless, or it could lead to a deep insight about vision. The outcome is in superposition: it’s Heisen-research.

And here is the proposal in its entirety:

Singularity.Net Research Proposal for Capsule Networks


We propose an extensive investigation into advanced applications of modular deep networks based on Hinton’s capsule networks formalism. Whereas usual deep learning applications are based on hierarchical pattern recognizers, capsules can fuse information across different specialized networks. The goal is to apply the modular approach to problems of interest in robotics such as object detection, object tracking, semantic image segmentation and scene description. These tasks usually require an inordinate amount of manually labelled or marked examples, and they might not have great generalization power beyond the domain of training. Although there exist some primitive means of transfer learning by storing trained nets and updating them, a modular network would be harder to fool by adversarial examples, and it would carry over more readily across domains. It might also require fewer examples.
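To make the idea of fusing information across capsules concrete, here is a minimal sketch of routing-by-agreement, the dynamic routing procedure published by Sabour, Frosst and Hinton in 2017. It is an illustration only: it is not the architecture proposed here, and it is not the later EM routing variant. The three lower-level capsules in the example (think of hypothetical shape, texture and edge modules) and their prediction vectors are made-up values, and the code uses plain Python lists rather than a deep learning framework.

```python
import math


def squash(s):
    """Scale vector s so that its length lies in [0, 1), keeping its direction."""
    sq_norm = sum(x * x for x in s)
    if sq_norm == 0.0:
        return [0.0 for _ in s]
    scale = sq_norm / (1.0 + sq_norm) / math.sqrt(sq_norm)
    return [scale * x for x in s]


def softmax(logits):
    """Numerically stable softmax over a list of numbers."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route_by_agreement(u_hat, n_iters=3):
    """u_hat[i][j] is lower capsule i's predicted vector for upper capsule j.

    Returns (v, c): the upper capsule output vectors and the coupling
    coefficients that were used to compute them.
    """
    n_lower, n_upper, dim = len(u_hat), len(u_hat[0]), len(u_hat[0][0])
    b = [[0.0] * n_upper for _ in range(n_lower)]  # routing logits
    for _ in range(n_iters):
        # Each lower capsule distributes its vote over the upper capsules.
        c = [softmax(row) for row in b]
        v = []
        for j in range(n_upper):
            s_j = [sum(c[i][j] * u_hat[i][j][d] for i in range(n_lower))
                   for d in range(dim)]
            v.append(squash(s_j))
        # Raise the logit wherever a prediction agrees with the output.
        for i in range(n_lower):
            for j in range(n_upper):
                b[i][j] += sum(u_hat[i][j][d] * v[j][d] for d in range(dim))
    return v, c


# Hypothetical example: three lower capsules, two upper capsules, 2D vectors.
# All three agree about upper capsule 0 and disagree about upper capsule 1.
example_u_hat = [
    [[1.0, 0.0], [1.0, 0.0]],
    [[1.0, 0.0], [-1.0, 0.0]],
    [[1.0, 0.0], [0.0, 1.0]],
]
outputs, couplings = route_by_agreement(example_u_hat)
for j, vec in enumerate(outputs):
    print("upper capsule", j, "length", math.sqrt(sum(x * x for x in vec)))
```

In this toy setting the upper capsule that all modules agree on ends up with the longer output vector and the larger coupling coefficients, which is the sense in which agreement between specialized modules acts as evidence.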


The desired end goal of the research project is an end-to-end semantic image segmentation system that accepts video input and works in unsupervised mode, producing high-quality segmentations with accurate labelling. The proposed approach is to venture beyond texture recognition and formulate unsupervised approaches for the feature domains relevant to the tasks mentioned. The unsupervised deep learning modules are thereafter bound together in an ensemble model fashioned after capsule networks. The resulting segmentation may be obtained either via energy formulations of the segmentation task, or via supervised learning, which makes the system semi-supervised. Here is the research plan:


  1. A survey of all relevant recent publications is written.
  2. A theoretical description of the models is written in reference to relevant research.
  3. Baseline systems and relevant capsule candidates are implemented.
  4. Initially, the approach will be tried on 2D object recognition and the solution of CAPTCHAs. Three different kinds of capsules are fused. After the effectiveness of the modularity approach is demonstrated, we move to harder problems.
  5. The number of capsule kinds is increased to 5-6, and scalability experiments are run on supervised object recognition benchmarks.
  6. When the scalability is sufficient, we move on to the video domain and add video-relevant capsule kinds, for a total of 10 kinds; the actual number of capsules is much higher. Benchmarks are repeated.
  7. Semi-supervised and unsupervised segmentation tasks are addressed.
  8. Competence on a non-trivial dataset such as a medical imaging dataset is demonstrated.
  9. A sample application for on-line, semi-supervised semantic image segmentation is developed.
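The energy formulations of the segmentation task mentioned before the research plan can take many forms, and the proposal does not commit to one. As a hedged illustration, the sketch below evaluates a standard Potts-style energy on a 4-connected pixel grid: a per-pixel data cost plus a fixed penalty for every pair of neighbouring pixels with different labels. The function name `potts_energy`, the `smoothness` parameter and the toy costs are assumptions made for this example; in the proposed system the data costs would presumably come from the capsule modules.

```python
def potts_energy(unary, labels, smoothness=1.0):
    """Energy of a labelling under a Potts model on a 4-connected grid.

    unary[r][c][k] is the cost of giving pixel (r, c) the label k.
    labels[r][c] is the label chosen for pixel (r, c).
    Each pair of neighbouring pixels with different labels adds `smoothness`.
    """
    rows, cols = len(labels), len(labels[0])
    energy = 0.0
    for r in range(rows):
        for c in range(cols):
            energy += unary[r][c][labels[r][c]]
            # Count each vertical and horizontal neighbour pair once.
            if r + 1 < rows and labels[r][c] != labels[r + 1][c]:
                energy += smoothness
            if c + 1 < cols and labels[r][c] != labels[r][c + 1]:
                energy += smoothness
    return energy


# Toy 2x2 image with two labels. Only the bottom-right pixel mildly
# prefers label 1; every other pixel prefers label 0.
toy_unary = [
    [[0.0, 1.0], [0.0, 1.0]],
    [[0.0, 1.0], [0.5, 0.0]],
]
smooth_labels = [[0, 0], [0, 0]]
noisy_labels = [[0, 0], [0, 1]]
print(potts_energy(toy_unary, smooth_labels))
print(potts_energy(toy_unary, noisy_labels))
```

A segmentation is then a labelling that minimizes this energy. With the toy costs above, the uniform labelling has lower energy than the one that follows the weak preference of a single pixel, which shows how the smoothness term suppresses isolated labels.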


Some additional methods to be trialled, time permitting, include hierarchical organization of capsules, genetic algorithms (GAs) for capsule training, and Hebbian learning approaches.


The researchers consider the proposed research low-hanging fruit; however, it is a highly empirical kind of research. We estimate that completing the first major release of the capsule network based vision architecture should take about a year.


Original proposal to Singularity.Net (12/21/17):

There is something we’ve been designing that might be of interest for vision agents in your framework. We have a new approach to modularity in deep learning that is similar to capsule networks but is an alternative approach. It is not a GAN. We are considering a neuroscience-inspired vision architecture. The architecture has substantial novelties in terms of both architecture and algorithms. It is suitable for robotics, and it should improve performance in both unsupervised and supervised learning.

A colleague of mine from the Celestial Intellect team found the general approach and is applying the method to NLP, but I extended the method and found that it is also applicable to vision. This is research-grade material, which should make for nice NIPS or equivalent publications. Even if our approach fails to produce amazing new results, we would at least have a state-of-the-art vision stack based on capsule networks, which would be a baseline for this kind of architecture. Google is also doing something similar, and we feel it is generally the winning approach. This was mostly confidential work; we’ve disclosed almost nothing about it, and it is still at the design stage. I asked him, and he told me he can contribute if it works out. It can also be merged with other vision code. However, it is not straight-up development work but R&D work: it would require us to do more design and write up the design, and then we can go on to experiments.

The same approach is likely applicable to NLP. Among other things, we can translate expressions to PLN.

As a follow-up to this, I’ve also designed an autonomous ANN agent architecture inspired by Friston’s work. The agent is fairly general-purpose and can work in unsupervised mode. It is similar to some of the more ambitious robotics architectures of the 1980s and 1990s, but using our approach (with much, much larger networks) instead. It could probably be adapted to the SingularityNET kind of decomposition, but I have not checked. The robotics architecture is quite ambitious.

Given that Hinton has already published a new iteration of capsule nets, I thought it might be a good thing to try. I was looking at Hinton’s paper again today, and it looks like the right path to improving computer vision. The basic idea seems to be similar: we need modularity. I’m sure we could improve on the implementation, and even showing something that adds a little to what he has already done would be publishable. His paper is more like a proof of the general principle, and it is analogous to the success of Hawkins’s team (it can handle recognition of occluded, twisted, conflicting objects, and can recognize things like CAPTCHAs). People are already experimenting with capsule nets.

A Research Proposal for Capsule Networks


Eray Özkural obtained his PhD in computer engineering from Bilkent University, Ankara. He has a deep and long-running interest in human-level AI. His name appears in the acknowledgements of Marvin Minsky's The Emotion Machine. He collaborated briefly with Ray Solomonoff, the founder of algorithmic information theory, and in response to a challenge Solomonoff posed, invented Heuristic Algorithmic Memory (HAM), a long-term memory design for general-purpose machine learning. Some other researchers have been inspired by HAM and call the approach "Bayesian Program Learning". He has designed a next-generation general-purpose machine learning architecture. He is the recipient of the 2015 Kurzweil Best AGI Idea Award for his theoretical contributions to universal induction. He previously invented an FPGA virtualization scheme for Global Supercomputing, Inc., which was internationally patented. He has also proposed a cryptocurrency called Cypher, and an energy-based currency which can drive green energy proliferation. You may find his blog at and some of his free software projects at

One thought on “A Research Proposal for Capsule Networks”

  • May 15, 2018 at 10:32 am

    I would be most interested to explore areas related to AI other than mechanisms to achieve certain levels of performance. This I consider the practical delivery of AI agents capable of managing systems.
    Taking note of the tag line, turning Science Fiction to Science Fact, and also taking into account some of the articles written expressing concern about the future impact of AI, what if we could find ways to embed the underlying behaviour found in people whose principles are based on compassion, empathy, caring and societal benefit rather than greed and self-importance?
    Imagine a fiction where AI agents were aligned with being better human beings: how much could we learn and improve? I would be interested to hear arguments against this fiction, because it seems to be one that is both worth doing and within reach of the AI path.

