Classifies research papers and predicts publishability using SciBERT, Sentence-BERT, and self-training pipelines
Publication Classifier is a hybrid NLP system that analyzes research papers to determine their publishability and recommend the most suitable conference. It combines modern transformer models like SciBERT and Sentence-BERT with classic machine learning and self-training strategies to generate interpretable, high-quality predictions.
The classifier uses text embeddings from SciBERT to evaluate whether a paper should be published. If the paper is deemed publishable, a semantic similarity approach using Sentence-BERT matches it with top-tier venues such as CVPR, NeurIPS, EMNLP, KDD, and TMLR.
SciBERT for Content Understanding
Generates deep contextual embeddings from full paper content for classification tasks.
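One way to embed a full paper can be sketched as follows. BERT-style encoders such as SciBERT accept at most 512 tokens per input, so a long paper has to be handled in pieces. This sketch splits the text into overlapping windows, embeds each one, and averages the results. It is an assumption about how this could be done, not the project's confirmed method. The names `split_into_windows` and `embed_document`, the window sizes, and the word-based (rather than token-based) windowing are illustrative. The `encode` argument stands in for a SciBERT wrapper that turns a string into a vector.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def split_into_windows(words: Sequence[str], size: int, overlap: int) -> List[List[str]]:
    """Split a word list into overlapping windows of at most `size` words."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    windows: List[List[str]] = []
    for start in range(0, len(words), step):
        windows.append(list(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reaches the end of the text
    return windows


def embed_document(
    text: str,
    encode: Callable[[str], Vector],  # hypothetical encoder, e.g. a SciBERT wrapper
    size: int = 200,                  # assumed window length in words
    overlap: int = 50,                # assumed overlap in words
) -> Vector:
    """Embed every window with `encode` and return the element-wise mean."""
    words = text.split()
    if not words:
        raise ValueError("empty document")
    vectors = [encode(" ".join(w)) for w in split_into_windows(words, size, overlap)]
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
```

Passing the encoder in as a callable keeps the windowing logic independent of any particular model, so a longer-context encoder could be swapped in without changing this code.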
Self-Training Publishability Classifier
Uses a small set of labeled papers to iteratively train on larger unlabeled datasets with pseudo-labels.
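The pseudo-labeling loop described above can be sketched as follows. Each round fits a model on the labeled set, labels the unlabeled points it is confident about, moves them into the training set, and repeats until nothing new qualifies. A nearest-centroid classifier is swapped in here purely to keep the example dependency-free; the project's actual classifier, confidence threshold, and stopping rule are not stated, so `self_train`, `threshold`, and `max_rounds` are assumptions.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def centroids(X: List[Vector], y: List[int]) -> Dict[int, List[float]]:
    """Mean vector per class label (the stand-in 'model')."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, value in enumerate(x):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict_with_confidence(cents: Dict[int, List[float]], x: Vector) -> Tuple[int, float]:
    """Nearest-centroid label plus a softmax confidence over negative distances."""
    dists = {label: math.dist(c, x) for label, c in cents.items()}
    d_min = min(dists.values())
    scores = {label: math.exp(-(d - d_min)) for label, d in dists.items()}
    best = max(scores, key=scores.get)
    return best, scores[best] / sum(scores.values())


def self_train(
    X_lab: List[Vector],
    y_lab: List[int],
    X_unlab: List[Vector],
    threshold: float = 0.8,  # assumed confidence cut-off for pseudo-labels
    max_rounds: int = 10,    # assumed cap on training rounds
) -> Tuple[Dict[int, List[float]], List[Vector]]:
    """Return the final model and the unlabeled points that were never confident."""
    X_train, y_train = list(X_lab), list(y_lab)
    pool = list(X_unlab)
    for _ in range(max_rounds):
        model = centroids(X_train, y_train)
        remaining: List[Vector] = []
        for x in pool:
            label, confidence = predict_with_confidence(model, x)
            if confidence >= threshold:
                X_train.append(x)      # accept the pseudo-label
                y_train.append(label)
            else:
                remaining.append(x)    # keep for a later round
        if len(remaining) == len(pool):
            break  # no new pseudo-labels this round
        pool = remaining
        if not pool:
            break
    return centroids(X_train, y_train), pool
```

In practice the feature vectors would be paper embeddings and the labels would mark publishable versus non-publishable papers. Points left in the returned pool are those the model never labeled with enough confidence.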
Sentence-BERT for Conference Matching
Identifies the most semantically relevant conference based on similarity with topic prototypes.
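A minimal sketch of prototype matching: the paper embedding is compared with one prototype vector per conference using cosine similarity, and the conferences are ranked by score. The function names and the three-dimensional toy vectors in the usage note are placeholders. In the real system the vectors would presumably come from a Sentence-BERT encoder (for example `SentenceTransformer.encode` in the sentence-transformers library), and how the prototypes are built is not specified in the description above.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_conferences(
    paper_vec: Sequence[float],
    prototypes: Dict[str, Sequence[float]],  # conference name -> prototype vector
) -> List[Tuple[str, float]]:
    """Return (conference, similarity) pairs sorted from best to worst match."""
    scored = [(name, cosine_similarity(paper_vec, proto)) for name, proto in prototypes.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

For example, with toy prototypes `{"CVPR": [1, 0, 0], "EMNLP": [0, 1, 0], "KDD": [0, 0, 1]}` and a paper vector `[0.1, 0.9, 0.2]`, the first entry returned by `rank_conferences` is the EMNLP pair, since the paper vector points mostly along that prototype.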
Plug-and-Play with CSV Input
Accepts labeled and unlabeled CSVs and writes results to results/output.csv, ready for evaluation or submission.
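The CSV flow might look roughly like the sketch below: read each paper from an input file, run a prediction function, and write one result row per paper. The column names (`paper_id`, `text`, `publishable`, `conference`) and the `write_results` helper are assumptions made for illustration; the project's actual schema is not given here. The `predict` callable stands in for the full classification and matching pipeline.

```python
import csv
from pathlib import Path
from typing import Callable, Tuple


def write_results(
    input_csv: str,
    output_csv: str,
    predict: Callable[[str], Tuple[bool, str]],  # text -> (publishable, conference)
) -> int:
    """Run `predict` on every input row, write the results, and return the row count."""
    output_path = Path(output_csv)
    output_path.parent.mkdir(parents=True, exist_ok=True)  # e.g. creates results/
    count = 0
    with open(input_csv, newline="", encoding="utf-8") as src, \
            open(output_path, "w", newline="", encoding="utf-8") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=["paper_id", "publishable", "conference"])
        writer.writeheader()
        for row in reader:
            publishable, conference = predict(row["text"])  # assumed input column
            writer.writerow({
                "paper_id": row["paper_id"],                # assumed input column
                "publishable": int(publishable),
                "conference": conference if publishable else "",
            })
            count += 1
    return count
```

A call such as `write_results("unlabeled.csv", "results/output.csv", predict)` would then produce the output file, with the conference column left empty for papers judged non-publishable.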
Expandable & Modular Codebase
Easy to fine-tune, extend with new conferences, or upgrade to other models such as Longformer or GPT.
Developed by Ayush Sharma & Rishabh Kothari.
Check out the full project here.
Check out more projects on GitHub or reach out via LinkedIn.