Publication Classifier

About Publication Classifier

Publication Classifier is a hybrid NLP system that analyzes research papers to determine whether they are publishable and, if so, to recommend the most suitable conference. It combines transformer models (SciBERT and Sentence-BERT) with classical machine learning and a self-training strategy to generate interpretable, high-quality predictions.

The classifier uses SciBERT text embeddings to decide whether a paper is publishable. If it is, a semantic similarity step based on Sentence-BERT matches the paper to a top-tier venue such as CVPR, NeurIPS, EMNLP, KDD, or TMLR (TMLR, Transactions on Machine Learning Research, is a journal rather than a conference).
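A minimal sketch of the two-stage flow described above, assuming the SciBERT and Sentence-BERT embeddings have already been computed and that each conference is represented by a prototype vector. Here a prototype is assumed to be the normalized mean embedding of a few example papers; the project's actual prototype construction is not described. The helper names `build_prototypes`, `recommend_conference`, and `route_paper` are hypothetical, and `clf` stands for any fitted scikit-learn-style binary classifier with a `predict` method.

```python
from __future__ import annotations

import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale vectors to unit length so dot products equal cosine similarity."""
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.clip(norms, 1e-12, None)


def build_prototypes(examples: dict) -> tuple:
    """Average normalized example embeddings per conference (assumed prototype scheme).

    `examples` maps a conference name to an (n_examples, dim) embedding array.
    """
    names = sorted(examples)
    protos = np.stack([l2_normalize(examples[n]).mean(axis=0) for n in names])
    return names, l2_normalize(protos)


def recommend_conference(paper_emb: np.ndarray, names: list, protos: np.ndarray):
    """Return the conference with the highest cosine similarity, plus all scores."""
    sims = protos @ l2_normalize(paper_emb)
    best = int(np.argmax(sims))
    return names[best], dict(zip(names, sims.tolist()))


def route_paper(scibert_emb, sbert_emb, clf, names, protos):
    """Stage 1: publishability from the SciBERT embedding.
    Stage 2: conference matching from the Sentence-BERT embedding, only if publishable."""
    publishable = bool(clf.predict(scibert_emb.reshape(1, -1))[0])
    if not publishable:
        return False, None
    conference, _ = recommend_conference(sbert_emb, names, protos)
    return True, conference
```

With the `sentence-transformers` package, the Sentence-BERT vectors could come from `SentenceTransformer(model_name).encode(texts)`; which checkpoint the project uses is not stated. Under this prototype design, supporting a new venue only requires adding its example embeddings to the dictionary and rebuilding the prototypes, with no classifier retraining.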

SciBERT for Content Understanding
Generates contextual embeddings of paper text for classification. SciBERT accepts at most 512 tokens per input, so full papers must be truncated or split into chunks.
Self-Training Publishability Classifier
Starts from a small set of labeled papers and iteratively trains on a larger unlabeled set by assigning it pseudo-labels.
Sentence-BERT for Conference Matching
Identifies the most semantically relevant conference by comparing the paper's embedding with each conference's topic prototype.
Plug-and-Play with CSV Input
Accepts labeled and unlabeled CSV files and writes results to results/output.csv, ready for evaluation or submission.
Expandable & Modular Codebase
Easy to fine-tune, extend with new conferences, or upgrade to other models such as Longformer (for longer inputs) or GPT.
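Because SciBERT, like BERT, accepts at most 512 tokens per input, embedding a whole paper requires truncating it or splitting it into windows. The sketch below assumes overlapping windows (using the Hugging Face fast tokenizer's `return_overflowing_tokens` and `stride` options), mean pooling over non-padding tokens, and a plain average across windows. These choices, the stride of 64, and the helper names `mean_pool`, `load_scibert`, and `embed_paper` are assumptions, not the project's documented method. `allenai/scibert_scivocab_uncased` is the public SciBERT checkpoint on the Hugging Face Hub.

```python
import numpy as np


def mean_pool(hidden: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Average token vectors per window, ignoring padding positions.

    hidden: (windows, tokens, dim) last-layer states; mask: (windows, tokens) of 0/1.
    """
    m = mask[..., None].astype(hidden.dtype)
    counts = np.clip(m.sum(axis=1), 1e-9, None)  # avoid division by zero
    return (hidden * m).sum(axis=1) / counts


def load_scibert(name: str = "allenai/scibert_scivocab_uncased"):
    """Load tokenizer and encoder once (imported lazily so mean_pool works without transformers)."""
    from transformers import AutoModel, AutoTokenizer

    return AutoTokenizer.from_pretrained(name), AutoModel.from_pretrained(name).eval()


def embed_paper(text: str, tokenizer, model, stride: int = 64) -> np.ndarray:
    """Embed a long paper as the average of its 512-token window embeddings (assumed strategy)."""
    import torch

    enc = tokenizer(
        text,
        truncation=True,
        max_length=512,
        stride=stride,  # number of tokens shared by consecutive windows
        return_overflowing_tokens=True,
        padding=True,
        return_tensors="pt",
    )
    enc.pop("overflow_to_sample_mapping", None)  # bookkeeping field, not a model input
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state.numpy()
    windows = mean_pool(hidden, enc["attention_mask"].numpy())
    return windows.mean(axis=0)
```

Usage, assuming `transformers` and `torch` are installed: `tokenizer, model = load_scibert()` followed by `vec = embed_paper(paper_text, tokenizer, model)` yields one 768-dimensional vector per paper, suitable as classifier input. Averaging windows weights every part of the paper equally; using only the first window (title, abstract, introduction) is a cheaper alternative.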
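The self-training step and the CSV output can be sketched with scikit-learn's `SelfTrainingClassifier`, which treats rows labeled `-1` as unlabeled and, on each iteration, adds predictions whose probability exceeds `threshold` as pseudo-labels. This sketch assumes paper embeddings are already available as arrays. The logistic-regression base model, the 0.9 threshold, the helper names, and the `paper_id`/`publishable`/`conference` column names are illustrative assumptions, since the project's CSV schema is not given here.

```python
import os

import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.semi_supervised import SelfTrainingClassifier


def fit_self_training(X_lab, y_lab, X_unlab, threshold=0.9):
    """Fit on labeled rows plus unlabeled rows marked -1 (scikit-learn's convention)."""
    X = np.vstack([X_lab, X_unlab])
    y = np.concatenate([np.asarray(y_lab, dtype=int), np.full(len(X_unlab), -1)])
    # Base estimator passed positionally: the parameter was renamed from
    # `base_estimator` to `estimator` in recent scikit-learn releases.
    clf = SelfTrainingClassifier(LogisticRegression(max_iter=1000), threshold=threshold)
    return clf.fit(X, y)


def results_frame(paper_ids, publishable, conferences) -> pd.DataFrame:
    """One row per paper; non-publishable papers carry no conference (hypothetical schema)."""
    return pd.DataFrame(
        {
            "paper_id": list(paper_ids),
            "publishable": np.asarray(publishable, dtype=int),
            "conference": list(conferences),
        }
    )


def write_results(df: pd.DataFrame, path: str = "results/output.csv") -> None:
    """Create the output directory if needed and write the CSV without the index column."""
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    df.to_csv(path, index=False)
```

After fitting, `clf.transduction_` holds the final labels (including pseudo-labels) for every training row, which is useful for auditing which unlabeled papers were absorbed into training.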

Tech Stack

Embeddings & Transformers: SciBERT, Sentence-BERT, Hugging Face Transformers
Modeling: scikit-learn, self-training classifier
Data Handling: pandas, numpy
Evaluation: precision, recall, F1, confusion matrix
Execution: Python 3.8+, CLI-compatible scripts
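The evaluation metrics listed above map directly onto `sklearn.metrics`. A minimal sketch for the binary publishability task, assuming label `1` means publishable (the source does not state the encoding); `evaluate` is a hypothetical helper name.

```python
from sklearn.metrics import confusion_matrix, precision_recall_fscore_support


def evaluate(y_true, y_pred):
    """Precision, recall, and F1 for the publishable class, plus a 2x2 confusion matrix."""
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, average="binary", pos_label=1, zero_division=0
    )
    # Rows are true labels and columns are predictions, both ordered [0, 1]:
    # [[TN, FP], [FN, TP]]
    cm = confusion_matrix(y_true, y_pred, labels=[0, 1])
    return {"precision": precision, "recall": recall, "f1": f1, "confusion_matrix": cm}
```

For example, `evaluate([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])` has two true positives, one false positive, one false negative, and one true negative, so precision, recall, and F1 are each 2/3 and the confusion matrix is `[[1, 1], [1, 2]]`.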

Credits

SciBERT: For domain-specific contextual embeddings of scientific content.
Sentence-BERT: For high-quality semantic similarity computation.
scikit-learn: For baseline classification pipelines and metrics.
pandas/numpy: For efficient CSV handling and data preprocessing.

Author

Developed by Ayush Sharma & Rishabh Kothari.
Check out the full project here.
Check out more projects on GitHub or reach out via LinkedIn.