Inspired by Hinton's recent update to the capsule networks model, we proposed a research program to SingularityNET a few months ago.
I decided to open the proposal to the public for comments before we revise it and proceed with some actual work. 🙂 When we last talked, we decided to apply the approach to the image segmentation problem; however, I think there is a lot more to it. I have been quite sick for a few months due to my erratic allergic asthma, and I only recovered a few weeks ago. It looks like I should be much more careful with allergens in my life from now on. The general idea is to widen the model space for image processing problems by considering things like a mereological (part-whole) organization, as capsule networks do, and then to apply this to a non-trivial image processing problem such as segmentation. I believe the general approach to be quite viable, and I was glad to see that Alexey did some intriguing work on the EM Matrix stuff:
Here is the Google Drive link
I am expecting your unhinged comments on the proposal. Why is this so short? Are you even a computer vision researcher? What do you know about seeing? Bring them on.
And here is the proposal in its entirety:
Singularity.Net Research Proposal for Capsule Networks
We propose an extensive investigation into advanced applications of modular deep networks based on Hinton’s capsule networks formalism. Whereas usual deep learning applications are based on hierarchical pattern recognizers, capsules can fuse information across different specialized networks. The goal is to apply the modular approach to problems of interest in robotics such as object detection, object tracking, semantic image segmentation and scene description. These tasks usually require an inordinate amount of manually labelled or marked examples, and they might not have great generalization power beyond the domain of training. Although there exist some primitive means of transfer learning by storing trained nets and updating them, a modular network would be harder to fool by adversarial examples, and it would carry over more readily across domains. It might also require fewer examples.
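To make the fusion idea a bit more concrete, here is a minimal sketch of routing-by-agreement, the dynamic routing procedure from the earlier capsule paper (Sabour, Frosst and Hinton, 2017). Note that this is the simpler predecessor of the EM routing used in the matrix capsules update, and it is only an illustration, not part of the proposed system. The function names `squash`, `softmax` and `route`, and the toy votes in `u_hat`, are all made up for this example.

```python
import math

def squash(s):
    """Shrink vector s so its length lies in [0, 1) while keeping its direction."""
    norm_sq = sum(x * x for x in s)
    if norm_sq == 0.0:
        return [0.0 for _ in s]
    scale = norm_sq / (1.0 + norm_sq) / math.sqrt(norm_sq)
    return [scale * x for x in s]

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def route(u_hat, iterations=3):
    """Routing-by-agreement.

    u_hat[i][j] is the vote (a list of floats) that lower capsule i casts
    for upper capsule j. Returns (v, c): the output vector of each upper
    capsule and the coupling coefficients used to compute them.
    """
    n_lower, n_upper = len(u_hat), len(u_hat[0])
    dim = len(u_hat[0][0])
    b = [[0.0] * n_upper for _ in range(n_lower)]  # routing logits
    for _ in range(iterations):
        # Each lower capsule distributes its output over the upper capsules.
        c = [softmax(row) for row in b]
        v = []
        for j in range(n_upper):
            s = [sum(c[i][j] * u_hat[i][j][k] for i in range(n_lower))
                 for k in range(dim)]
            v.append(squash(s))
        # Votes that agree with an upper capsule's output get a larger logit.
        for i in range(n_lower):
            for j in range(n_upper):
                b[i][j] += sum(a * w for a, w in zip(u_hat[i][j], v[j]))
    return v, c

# Toy votes (assumed): three lower capsules, two upper capsules, 2-D vectors.
# All three roughly agree on upper capsule 0 and disagree on upper capsule 1.
u_hat = [
    [[1.0, 0.0], [0.5, 0.0]],
    [[1.0, 0.1], [-0.5, 0.0]],
    [[0.9, 0.0], [0.0, 0.5]],
]
v, c = route(u_hat)
lengths = [math.sqrt(sum(x * x for x in vec)) for vec in v]
print(lengths, c)
```

In this toy case the votes for the first upper capsule point in nearly the same direction, so its output vector ends up longer and the coupling coefficients shift towards it. That is the sense in which capsules "fuse" the outputs of different specialized modules.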
The desired end goal of the research project is an end-to-end semantic image segmentation system that accepts video input and works in unsupervised mode, producing high-quality segmentations with accurate labelling. The proposed approach is to venture beyond texture recognition and formulate unsupervised approaches for the feature domains relevant to the tasks mentioned. The unsupervised deep learning modules are then bound together in an ensemble model fashioned after capsule networks. The resulting segmentation may be obtained either via energy formulations of the segmentation task or via supervised learning, the latter making the system semi-supervised. Here is the research plan:
- A survey of all relevant recent publications is written.
- A theoretical description of the models is written in reference to relevant research.
- Baseline systems and relevant capsule candidates are implemented.
- Initially, the approach will be tried on 2D object recognition and the solution of CAPTCHAs. Three different kinds of capsules are fused. After the effectiveness of the modular approach is demonstrated, we move to harder problems.
- The capsule number is increased to 5-6, and scalability experiments are trialled for supervised object recognition benchmarks.
- When the scalability is sufficient, we move on to the video domain and add video-relevant capsule kinds, for a total of 10 kinds; the actual number of capsules is much higher. Benchmarks are repeated.
- Semi-supervised and unsupervised segmentation tasks are addressed.
- Competence on a non-trivial dataset such as a medical imaging dataset is demonstrated.
- A sample application for on-line, semi-supervised semantic image segmentation is developed.
Some additional methods to be trialled, time permitting, include hierarchical organization of capsules, genetic algorithms (GAs) for capsule training, and Hebbian learning approaches.
The researchers consider the proposed research low-hanging fruit; however, it is a highly empirical kind of research. We estimate that the first major release of the capsule-network-based vision architecture should take about a year to complete.