Dataset Category

NLP & Text Datasets for AI Training

Labeled text datasets for sentiment analysis, NER, classification, LLM fine-tuning, and more. PII-scanned, quality-verified, ready to train on.

Browse NLP Datasets → Sell Your Dataset

NLP Tasks Covered

From classic classification tasks to modern LLM instruction datasets, find exactly what your model needs.

💬

Sentiment Analysis

Text labeled positive, negative, or neutral at the sentence and aspect level. Sources include reviews, social media, support tickets, and more.

🏷️

Named Entity Recognition

Token-level span annotations for people, organizations, locations, dates, and custom entity types.
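As a rough illustration of what token-level span annotations look like in practice, the sketch below converts spans into BIO tags, a common input format for NER training. The sentence, the span layout (start, end, label with an exclusive end index), and the helper name `spans_to_bio` are all assumptions made for this example; individual datasets may use a different schema.

```python
def spans_to_bio(tokens, spans):
    """Convert token-index spans into BIO tags.

    tokens: list of token strings
    spans:  list of (start, end, label) tuples in token indices, end exclusive
            (hypothetical layout chosen for this sketch)
    """
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        tags[start] = f"B-{label}"          # first token of the entity
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"          # remaining tokens of the entity
    return tags


# Invented sample sentence, used only to show the shape of the data.
tokens = ["Ada", "Lovelace", "visited", "London", "in", "1842", "."]
spans = [(0, 2, "PER"), (3, 4, "LOC"), (5, 6, "DATE")]
bio_tags = spans_to_bio(tokens, spans)
print(list(zip(tokens, bio_tags)))
```

Tokens outside every span keep the "O" tag, so the output always has one tag per token.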

🤖

LLM Fine-Tuning

Instruction-following, chat, and RLHF preference datasets formatted for GPT, LLaMA, Mistral, and Falcon.

Question Answering

Extractive and abstractive Q&A pairs with context passages. SQuAD-style and conversational formats.
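For extractive Q&A data, one useful sanity check is that each answer's character offset really points at the answer text inside the context passage. The sketch below assumes a SQuAD-style record with `context`, `question`, and an `answers` object holding parallel `text` and `answer_start` lists; the sample record and the helper name `answers_align` are invented for illustration, and a given dataset may name its fields differently.

```python
def answers_align(record):
    """Return True if every answer text appears at its stated character offset."""
    context = record["context"]
    answers = record["answers"]
    return all(
        context[start:start + len(text)] == text
        for text, start in zip(answers["text"], answers["answer_start"])
    )


# Invented sample record in a SQuAD-style layout.
context = "The Eiffel Tower was completed in 1889 in Paris."
record = {
    "context": context,
    "question": "When was the Eiffel Tower completed?",
    "answers": {"text": ["1889"], "answer_start": [context.index("1889")]},
}
aligned = answers_align(record)

# Same record with a deliberately wrong offset, to show the check failing.
bad_record = dict(record, answers={"text": ["1889"], "answer_start": [0]})
misaligned = answers_align(bad_record)
print(aligned, misaligned)
```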

📋

Text Classification

Single-label and multi-label text datasets for topic categorization, spam detection, and intent detection.

🌐

Translation & Summarization

Parallel corpora for machine translation and reference summaries for abstractive summarization training.

Frequently Asked Questions

What types of NLP datasets are available?
Sentiment analysis, NER, text classification, Q&A pairs, summarization datasets, intent detection, dialogue, machine translation pairs, and LLM instruction fine-tuning datasets.

What file formats are supported?
CSV, JSONL (newline-delimited JSON), Parquet, Arrow, and plain JSON. JSONL is the most common format and is natively supported by Hugging Face Datasets, pandas, and most fine-tuning frameworks.

Are datasets scanned for personal information?
Yes. Every dataset goes through automated PII scanning before publication. Datasets that pass display a "PII Scanned" badge. Sellers are required to remove or anonymize personal information before uploading.

Are there datasets for LLM fine-tuning?
Yes. Many sellers offer instruction-following, chat, and domain-specific datasets specifically formatted for fine-tuning open-weight models like LLaMA 3, Mistral, and Phi.

How do I sell a dataset?
Upload your CSV or JSONL file; our pipeline validates the structure and scans for PII. You then set a price, and buyers can purchase instantly. Sellers keep 85% of every sale with no listing fees.
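For sellers preparing a JSONL upload, a local structure check before submitting might look like the sketch below. It is not the platform's validation pipeline, whose actual checks are not described here; the function name `validate_jsonl`, the required field names, and the sample rows are all hypothetical. It uses only the Python standard library.

```python
import json


def validate_jsonl(lines, required_fields):
    """Return a list of (line_number, problem) for rows that fail basic checks."""
    problems = []
    for line_no, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # ignore blank lines
        try:
            row = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append((line_no, f"invalid JSON: {exc.msg}"))
            continue
        if not isinstance(row, dict):
            problems.append((line_no, "row is not a JSON object"))
            continue
        missing = [field for field in required_fields if field not in row]
        if missing:
            problems.append((line_no, f"missing fields: {', '.join(missing)}"))
    return problems


# Invented sample rows: one valid, one missing a label, one truncated.
sample = [
    '{"text": "Great battery life", "label": "positive"}',
    '{"text": "Arrived late"}',
    '{"text": "Broken on arrival", "label": ',
]
problems = validate_jsonl(sample, required_fields=("text", "label"))
for line_no, problem in problems:
    print(f"line {line_no}: {problem}")
```

Because iterating over an open text file yields one line at a time, the same function can be called with a file handle, for example `validate_jsonl(open("train.jsonl", encoding="utf-8"), ("text", "label"))`, where the file name is a placeholder.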

Ready to build better NLP models?

Browse verified NLP datasets, or monetize your text data today.
