The present technology provides a Bernoulli Process Topic (BPT) model, which models the corpus at two levels: the document level and the citation level.
Each document has two different representations in the latent topic space, one for each of its roles. Moreover, the multilevel hierarchical structure of the citation network is captured by a generative process involving a Bernoulli process. Comparisons against other methods demonstrate promising performance.
• Explicitly differentiates the two roles a document plays in a citation network: as the document itself and as a citation of other documents.
• The model can be used in several data mining tasks that cannot be achieved by alternative technologies, such as literature recommendation, novel research topic detection, and research area trend discovery.
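To make the idea of a generative process involving a Bernoulli process more concrete, the following is a minimal sketch, not the model's actual specification. It assumes that the topic source for a word is chosen by walking the citation graph, with a Bernoulli trial at each level deciding whether to stop at the current document or descend to one of its citations. The names `sample_topic_source`, `empirical_source_weights`, `citations`, and `stop_prob` are hypothetical and introduced only for illustration.

```python
import random
from collections import Counter


def sample_topic_source(doc, citations, stop_prob=0.5, rng=random):
    """Return the document at which a Bernoulli-driven citation walk stops.

    Assumption (illustrative only): at each level a Bernoulli trial with
    success probability `stop_prob` decides whether to stop at the current
    document or move on to one of its not-yet-visited citations.
    """
    current = doc
    visited = {doc}  # guards against cycles in the citation graph
    while True:
        cited = [c for c in citations.get(current, []) if c not in visited]
        # Stop if there is nowhere left to go or the Bernoulli trial succeeds.
        if not cited or rng.random() < stop_prob:
            return current
        current = rng.choice(cited)
        visited.add(current)


def empirical_source_weights(doc, citations, stop_prob, n_samples=10000, seed=0):
    """Estimate how often each document in the citation hierarchy is chosen."""
    rng = random.Random(seed)
    counts = Counter(
        sample_topic_source(doc, citations, stop_prob, rng)
        for _ in range(n_samples)
    )
    return {d: n / n_samples for d, n in counts.items()}


if __name__ == "__main__":
    # Hypothetical toy citation network: A cites B and C, B cites D.
    toy_citations = {"A": ["B", "C"], "B": ["D"]}
    print(empirical_source_weights("A", toy_citations, stop_prob=0.6))
```

Under these assumptions, the resulting weights indicate how much each level of the citation hierarchy would contribute to a document-level topic mixture. A larger `stop_prob` concentrates the weight on the document itself, while a smaller value pushes it toward deeper citations.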