NLP

Word Embedding

Dense vector representation of a word in a low-dimensional space (compared with a one-hot vocabulary encoding), capturing semantic and syntactic relationships.
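
The closeness of two embeddings is usually measured with cosine similarity. The sketch below shows this on hypothetical, hand-picked 4-dimensional vectors. They are not trained embeddings; real ones are learned from corpora and typically have hundreds of dimensions.

```python
import math

# Hypothetical toy vectors for illustration only (not learned from data).
EMBEDDINGS = {
    "king":  [0.80, 0.65, 0.10, 0.05],
    "queen": [0.78, 0.70, 0.12, 0.90],
    "apple": [0.05, 0.10, 0.95, 0.20],
}

def cosine_similarity(u, v):
    """Cosine of the angle between u and v: dot(u, v) / (|u| * |v|)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def most_similar(word, embeddings):
    """Return the other word whose vector has the highest cosine similarity to `word`."""
    target = embeddings[word]
    candidates = (w for w in embeddings if w != word)
    return max(candidates, key=lambda w: cosine_similarity(target, embeddings[w]))
```

With these toy values, `most_similar("king", EMBEDDINGS)` returns `"queen"`, because the two vectors point in similar directions while `"apple"` does not.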
Text Classification

Assigning one or more categories to text; includes sentiment analysis, topic classification, spam detection, and intent recognition.
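
One classic baseline for single-label text classification is multinomial Naive Bayes. The sketch below trains it with add-one smoothing on a tiny, hypothetical spam/ham dataset and uses whitespace tokenisation. Modern systems more often use neural classifiers, but the train-then-score structure is the same.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(examples):
    """examples: list of (text, label). Returns label counts, per-label word counts, vocabulary."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        tokens = text.lower().split()
        label_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return label_counts, word_counts, vocab

def classify(text, model):
    """Pick the label maximising log P(label) + sum of log P(token | label)."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, doc_count in label_counts.items():
        score = math.log(doc_count / total_docs)
        total_words = sum(word_counts[label].values())
        for token in text.lower().split():
            if token not in vocab:
                continue  # ignore words never seen in training
            count = word_counts[label][token]
            score += math.log((count + 1) / (total_words + len(vocab)))  # add-one smoothing
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical training data.
TRAIN = [
    ("win a free prize now", "spam"),
    ("free money click now", "spam"),
    ("meeting moved to friday", "ham"),
    ("see you at lunch on friday", "ham"),
]
model = train_naive_bayes(TRAIN)
```

For example, `classify("claim your free prize", model)` returns `"spam"`. Working in log space avoids numerical underflow when many small probabilities are multiplied.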
Sequence-to-Sequence

Encoder-decoder architecture mapping input sequences to output sequences; used for translation, summarisation, and dialogue.
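
The encoder-decoder control flow can be sketched as follows. The input is encoded once, and output tokens are then generated one at a time, each conditioned on the encoder state and on the tokens produced so far (greedy decoding). The `toy_encode` and `toy_decode_step` functions are hypothetical stand-ins that simply reverse the input. In a real model both would be neural networks, such as RNNs or Transformers.

```python
from typing import Callable, List

BOS, EOS = "<s>", "</s>"

def greedy_decode(source: List[str],
                  encode: Callable[[List[str]], object],
                  decode_step: Callable[[object, List[str]], str],
                  max_len: int = 50) -> List[str]:
    """Encode the input once, then emit the single most likely token at each step."""
    state = encode(source)
    output = [BOS]
    for _ in range(max_len):
        token = decode_step(state, output)
        if token == EOS:
            break
        output.append(token)
    return output[1:]  # drop the start symbol

# Hypothetical toy components: the "encoder" stores the input reversed and the
# "decoder" emits the next stored token, then EOS.
def toy_encode(source):
    return list(reversed(source))

def toy_decode_step(state, prefix):
    position = len(prefix) - 1  # number of tokens generated so far
    return state[position] if position < len(state) else EOS
```

Calling `greedy_decode(["a", "b", "c"], toy_encode, toy_decode_step)` yields `["c", "b", "a"]`. Beam search is a common alternative to the greedy loop shown here.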
Sentence Boundary Detection

Identifying sentence boundaries in text; handles ambiguous punctuation (periods in abbreviations, decimal points, URLs) and enables sentence-level processing.
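
A minimal rule-based splitter shows how the ambiguities above can be handled. It only treats `.`, `!` or `?` followed by whitespace as a candidate boundary, so periods inside decimals and URLs never match. It then rejects candidates whose preceding word is a known abbreviation. The abbreviation list here is a small hypothetical one. Practical systems use larger lists or learn abbreviations from data, as in the unsupervised Punkt algorithm.

```python
import re

# Hypothetical, deliberately small abbreviation list (lowercase, without the final period).
ABBREVIATIONS = {"dr", "mr", "mrs", "ms", "prof", "e.g", "i.e", "etc", "vs"}

# Candidate boundary: sentence-final punctuation followed by whitespace.
# Periods in "3.50" or "example.com" are followed by a character, so they never match.
CANDIDATE = re.compile(r"[.!?]+(?=\s)")

def split_sentences(text):
    sentences, start = [], 0
    for match in CANDIDATE.finditer(text):
        # Last whitespace-delimited word before the punctuation, e.g. "Dr" in "Dr."
        preceding = text[start:match.start()].split()
        last_word = preceding[-1].lower() if preceding else ""
        if match.group() == "." and last_word in ABBREVIATIONS:
            continue  # the period ends an abbreviation, not a sentence
        sentences.append(text[start:match.end()].strip())
        start = match.end()
    remainder = text[start:].strip()
    if remainder:
        sentences.append(remainder)  # final sentence, possibly without trailing whitespace
    return sentences
```

For example, `"Dr. Smith paid $3.50 at example.com today. Was it worth it? Yes!"` is split into three sentences, and the abbreviation, decimal point and URL are all left intact.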
RLHF

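Reinforcement Learning from Human Feedback; trains a reward model on human preference comparisons between model outputs, then fine-tunes the language model with reinforcement learning (commonly PPO) to maximise that reward, for safety and alignment.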
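
Two quantities commonly used in RLHF pipelines are sketched below. The first is the pairwise reward-model loss under the Bradley-Terry preference model, -log σ(r_chosen - r_rejected). The second is the KL-penalised reward often optimised in the RL stage, which discourages the policy from drifting far from a reference model. The scalar inputs and the β = 0.1 default are hypothetical. In practice these values come from neural reward and policy models and are averaged over batches.

```python
import math

def pairwise_reward_loss(r_chosen, r_rejected):
    """-log(sigmoid(r_chosen - r_rejected)): small when the human-preferred
    response already receives a much higher reward."""
    margin = r_chosen - r_rejected
    # Numerically stable softplus(-margin) = log(1 + exp(-margin)).
    if margin > 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))

def kl_penalised_reward(reward, logprob_policy, logprob_reference, beta=0.1):
    """Reward minus beta times the per-sample log-ratio between policy and reference."""
    return reward - beta * (logprob_policy - logprob_reference)
```

A reward model that scores both responses equally incurs a loss of log 2 (about 0.693). The loss approaches 0 as the preferred response's reward pulls ahead.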
Part-of-Speech Tagging

Assigning grammatical roles (noun, verb, adjective, etc.) to tokens in text; fundamental for syntax analysis and downstream NLP tasks.
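
A common reference point for POS tagging is the most-frequent-tag baseline. Each word receives the tag it carried most often in training, and unknown words get a default tag. The sketch below uses Universal POS tag names on a tiny hypothetical training set.

```python
from collections import Counter, defaultdict

def train_most_frequent_tag(tagged_sentences):
    """Map each word to the tag it received most often in training."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word.lower()][tag] += 1
    return {word: tag_counts.most_common(1)[0][0] for word, tag_counts in counts.items()}

def tag(tokens, lexicon, default="NOUN"):
    """Tag each token by lexicon lookup; unseen words receive `default`."""
    return [(tok, lexicon.get(tok.lower(), default)) for tok in tokens]

# Hypothetical tagged training data.
TRAIN = [
    [("the", "DET"), ("dog", "NOUN"), ("runs", "VERB")],
    [("a", "DET"), ("quick", "ADJ"), ("run", "NOUN")],
    [("they", "PRON"), ("run", "VERB"), ("daily", "ADV")],
    [("we", "PRON"), ("run", "VERB")],
]
lexicon = train_most_frequent_tag(TRAIN)
```

Because "run" is seen as VERB twice and as NOUN once, this baseline always tags it VERB, even in "a quick run". Resolving such ambiguity from context is what sequence models such as HMMs, CRFs and neural taggers add.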
Named Entity Recognition

Identifying and classifying named entities (persons, locations, organisations) in text; fundamental NLP task for information extraction.
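
NER is often framed as token-level tagging with the BIO scheme (B- begins an entity, I- continues it, O is outside). A model's tags are then converted into entity spans. The sketch below performs only that decoding step. The tags are supplied by hand as a hypothetical model output, and the PER/LOC/ORG labels follow a common convention.

```python
def bio_to_spans(tokens, tags):
    """Collect (entity_text, entity_type) spans from BIO tags such as B-PER, I-PER, O."""
    spans, current_tokens, current_type = [], [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != current_type):
            # Start a new entity; an I- tag with a new type is treated like B-.
            if current_tokens:
                spans.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [token], tag[2:]
        elif tag.startswith("I-"):
            current_tokens.append(token)  # continue the current entity
        else:  # "O": outside any entity
            if current_tokens:
                spans.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [], None
    if current_tokens:
        spans.append((" ".join(current_tokens), current_type))
    return spans
```

For the tokens `["Ada", "Lovelace", "visited", "London", "in", "1842"]` with tags `["B-PER", "I-PER", "O", "B-LOC", "O", "O"]`, the function returns `[("Ada Lovelace", "PER"), ("London", "LOC")]`.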
Language Model

Probability distribution over sequences of tokens; autoregressive models factor it with the chain rule and predict the next token given the preceding context. Foundation of NLP from n-gram models to large language models.
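
The simplest end of that spectrum is a bigram model, which approximates P(w_i | w_1 … w_{i-1}) by P(w_i | w_{i-1}). The sketch below estimates these probabilities from a tiny hypothetical corpus with add-one smoothing. It scores whole sentences with the chain rule and predicts the most likely next token.

```python
import math
from collections import Counter, defaultdict

BOS, EOS = "<s>", "</s>"

class BigramLM:
    """Bigram language model with add-one (Laplace) smoothing."""

    def __init__(self, sentences):
        self.counts = defaultdict(Counter)  # counts[prev][curr] = bigram frequency
        self.vocab = {EOS}                  # tokens that can follow a context
        for sentence in sentences:
            words = sentence.split()
            self.vocab.update(words)
            tokens = [BOS] + words + [EOS]
            for prev, curr in zip(tokens, tokens[1:]):
                self.counts[prev][curr] += 1

    def prob(self, prev, curr):
        """Smoothed P(curr | prev)."""
        context = self.counts.get(prev, Counter())
        return (context[curr] + 1) / (sum(context.values()) + len(self.vocab))

    def sentence_logprob(self, sentence):
        """Chain rule: log P(sentence) = sum of log P(w_i | w_{i-1}), including </s>."""
        tokens = [BOS] + sentence.split() + [EOS]
        return sum(math.log(self.prob(p, c)) for p, c in zip(tokens, tokens[1:]))

    def predict_next(self, prev):
        """Most probable next token after `prev` (sorted for deterministic ties)."""
        return max(sorted(self.vocab), key=lambda w: self.prob(prev, w))

# Hypothetical toy corpus.
lm = BigramLM(["the cat sat", "the cat ran", "the dog sat"])
```

Here `lm.predict_next("the")` returns `"cat"`, and the model assigns a higher log-probability to "the cat sat" than to "sat the cat". Neural language models replace the count table with a learned network but keep the same next-token factorisation.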
Instruction Tuning

Fine-tuning language models on diverse (instruction, response) pairs so that they follow natural-language instructions and generalise to unseen tasks.
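
Each (instruction, response) pair has to be turned into a training sequence. One common choice is to wrap the instruction in a prompt template and compute the loss only on response tokens. The sketch below assumes a hypothetical template and uses whitespace splitting in place of a real subword tokenizer. Actual models each define their own template and tokenizer.

```python
from typing import List, Tuple

# Hypothetical prompt template; real instruction-tuned models each define their own.
TEMPLATE = "### Instruction:\n{instruction}\n\n### Response:\n"

def build_example(instruction: str, response: str) -> Tuple[List[str], List[int]]:
    """Return tokens and a loss mask (1 = train on this token, 0 = prompt only)."""
    prompt_tokens = TEMPLATE.format(instruction=instruction).split()
    response_tokens = response.split() + ["</s>"]
    tokens = prompt_tokens + response_tokens
    # Masking the prompt means the model learns to produce answers,
    # not to reproduce the instructions.
    mask = [0] * len(prompt_tokens) + [1] * len(response_tokens)
    return tokens, mask

def masked_nll(token_logprobs: List[float], mask: List[int]) -> float:
    """Average negative log-likelihood over unmasked (response) tokens only."""
    selected = [-lp for lp, m in zip(token_logprobs, mask) if m]
    return sum(selected) / len(selected)
```

For `build_example("Translate to French: cat", "chat")`, only the last two tokens (`"chat"` and `"</s>"`) are marked for training.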
Dependency Parsing

Analysing grammatical structure by identifying directed dependency relations between tokens; output is a dependency tree.
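
A dependency parse is often stored as a head array: each token points to the index of its head, and 0 marks the root. The sketch below checks that such an array forms a valid tree, with exactly one root and no cycles. It also prints the labelled arcs for a hand-annotated example that uses Universal Dependencies relation names. No parser is involved; the heads are written by hand.

```python
def is_valid_tree(heads):
    """heads[i] is the 1-based head of token i+1 (0 = root).
    Valid if there is exactly one root and following heads never loops."""
    n = len(heads)
    if sum(1 for h in heads if h == 0) != 1:
        return False
    if any(h < 0 or h > n for h in heads):
        return False
    for start in range(1, n + 1):
        seen, node = set(), start
        while node != 0:  # climb head links until reaching the root
            if node in seen:
                return False  # revisited a token: cycle
            seen.add(node)
            node = heads[node - 1]
    return True

def format_arcs(tokens, heads, labels):
    """Render each dependency as relation(head, dependent)."""
    arcs = []
    for tok, head, label in zip(tokens, heads, labels):
        governor = "ROOT" if head == 0 else tokens[head - 1]
        arcs.append(f"{label}({governor}, {tok})")
    return arcs

# Hand-annotated example: "She ate the apple".
tokens = ["She", "ate", "the", "apple"]
heads = [2, 0, 4, 2]
labels = ["nsubj", "root", "det", "obj"]
```

The example yields `nsubj(ate, She)`, `root(ROOT, ate)`, `det(apple, the)` and `obj(ate, apple)`. Transition-based and graph-based parsers both produce output in this head-array form.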
Coreference Resolution

Linking mentions of the same entity across a document; resolves pronouns and nominal references to their antecedents.
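
Many coreference systems first predict pairwise links between mentions and then merge those links into entity clusters. The sketch below shows only the merging step, using union-find. The mentions and links are hypothetical and written by hand; a real system would score candidate antecedents with a learned model.

```python
def cluster_mentions(mentions, links):
    """Merge pairwise coreference links (i, j) into entity clusters via union-find."""
    parent = list(range(len(mentions)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    for i, j in links:
        parent[find(i)] = find(j)  # union the two clusters

    clusters = {}
    for idx, mention in enumerate(mentions):
        clusters.setdefault(find(idx), []).append(mention)
    return list(clusters.values())

# Hypothetical mentions and predicted links (indices into `mentions`).
mentions = ["Marie Curie", "she", "Paris", "the physicist", "her"]
links = [(1, 0), (3, 0), (4, 1)]
```

"her" is linked only to "she", yet it ends up in the same cluster as "Marie Curie" because clustering is transitive. The result is `[["Marie Curie", "she", "the physicist", "her"], ["Paris"]]`.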
Cloze Task

Predicting masked tokens from their surrounding context; used as a self-supervised pre-training objective (masked language modelling, as in BERT) in which randomly chosen tokens are hidden and must be inferred.
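
The sketch below prepares a cloze-style training example following the BERT masking recipe. About 15% of positions are selected. Of those, 80% become `[MASK]`, 10% become a random token and 10% stay unchanged. The model is trained to recover the original token only at the selected positions. The vocabulary and the seed are illustrative.

```python
import random

MASK = "[MASK]"

def mask_tokens(tokens, vocab, mask_prob=0.15, seed=0):
    """Return (inputs, targets): targets[i] holds the original token at selected
    positions and None elsewhere (positions that contribute no loss)."""
    rng = random.Random(seed)
    inputs, targets = list(tokens), [None] * len(tokens)
    n_select = max(1, round(mask_prob * len(tokens)))
    for i in rng.sample(range(len(tokens)), n_select):
        targets[i] = tokens[i]
        roll = rng.random()
        if roll < 0.8:
            inputs[i] = MASK               # 80%: replace with the mask token
        elif roll < 0.9:
            inputs[i] = rng.choice(vocab)  # 10%: replace with a random token
        # remaining 10%: keep the original token unchanged
    return inputs, targets
```

Keeping or randomising some selected tokens means the model cannot assume that every visible token is correct, which narrows the gap between pre-training inputs and real, unmasked text.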
Chunking

Grouping tokens into phrases or chunks; shallow syntactic analysis that segments noun phrases, verb phrases, and prepositional phrases.
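
Noun-phrase chunking is often done with patterns over POS tags. The sketch below greedily matches the pattern DET? ADJ* NOUN+ over Universal POS tags, with PROPN accepted as a noun. The pattern and the tagged input are illustrative. Libraries such as NLTK provide `RegexpParser` for writing such tag-pattern grammars, and chunkers can also be trained as BIO sequence taggers.

```python
def np_chunks(tagged):
    """Greedy noun-phrase chunker for the pattern DET? ADJ* NOUN+.
    `tagged` is a list of (word, tag) pairs using Universal POS tags."""
    chunks, i, n = [], 0, len(tagged)
    while i < n:
        j = i
        if j < n and tagged[j][1] == "DET":
            j += 1                      # optional determiner
        while j < n and tagged[j][1] == "ADJ":
            j += 1                      # any number of adjectives
        k = j
        while k < n and tagged[k][1] in ("NOUN", "PROPN"):
            k += 1                      # one or more nouns
        if k > j:                       # found a noun: emit the chunk tagged[i:k]
            chunks.append(" ".join(word for word, _ in tagged[i:k]))
            i = k
        else:
            i += 1                      # no NP starts here; try the next token
    return chunks
```

On "The quick brown fox jumped over the lazy dog", tagged DET ADJ ADJ NOUN VERB ADP DET ADJ NOUN, it returns `["The quick brown fox", "the lazy dog"]`.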