Autoencoders have attracted considerable attention as a building block of deep learning and as a tool for modeling textual data.
The present technology addresses this problem by introducing supervision via the loss function of the autoencoder. In particular, a linear classifier is first trained on the labeled data; a loss for the autoencoder is then defined using the weights learned from that classifier. To reduce the bias introduced by any single classifier, a posterior probability distribution is placed on the classifier weights, and the marginalized loss of the autoencoder is derived with a Laplace approximation. The choice of loss function can be rationalized from the perspective of Bregman divergence, which justifies the soundness of the model. The model's effectiveness is evaluated on six sentiment analysis datasets, where it significantly outperforms all competing methods in classification accuracy. The model also takes advantage of unlabeled data to further improve performance. Finally, it learns highly discriminative feature maps, which explains its superior performance.
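The two-step idea above (train a linear classifier, then let its weights shape the autoencoder's loss) can be illustrated with a minimal sketch. This is a hypothetical simplification, not the patented method: the loss form `supervised_ae_loss`, the penalty weight `lam`, and the toy data are all illustrative assumptions, with the classifier-score discrepancy standing in for the marginalized, Bregman-divergence-based loss described in the text.

```python
import numpy as np

# Toy labeled data: 2 classes, 5-dimensional bag-of-words-like features.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(float)

# Step 1: train a linear (logistic) classifier by gradient descent.
w = np.zeros(5)
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-X @ w))          # predicted probabilities
    w -= 0.1 * X.T @ (p - y) / len(y)         # gradient step on log-loss

# Step 2 (illustrative loss): judge a reconstruction not only by squared
# error but also through the classifier -- x_hat is good if the classifier
# scores w @ x and w @ x_hat agree, so class-relevant directions are
# penalized more than class-irrelevant ones.
def supervised_ae_loss(x, x_hat, w, lam=1.0):
    recon = np.sum((x - x_hat) ** 2)          # plain reconstruction error
    score_gap = (w @ x - w @ x_hat) ** 2      # disagreement under the classifier
    return recon + lam * score_gap

x = X[0]
x_hat = x + rng.normal(scale=0.1, size=5)     # stand-in for a decoder output
print(supervised_ae_loss(x, x_hat, w))
```

A perfect reconstruction drives both terms to zero, while errors along the classifier's weight vector are penalized twice, which is the intuition behind supervising the autoencoder through the learned weights.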
• Faster and more accurate than competing technologies.
• Outperforms “Bag of Words”, a traditional Denoising Autoencoder, and other competing methods.
• Learns highly discriminative feature maps.
• System can learn orthogonal concepts using traditional machine learning technologies.
• System provides integrity, comprehensiveness, and universality (an entire series of applications is accessible).
• Minimal training data needed.