NLP
-
Word Embedding
Dense, real-valued vector representation of a word in a low-dimensional space, where geometric proximity captures semantic and syntactic relationships.
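A minimal sketch of how closeness between embeddings is commonly measured, using cosine similarity. The three-dimensional vectors below are invented for illustration and do not come from any trained model; real embeddings usually have hundreds of dimensions.

```python
import math

# Toy embeddings; the values are made up purely for illustration.
embeddings = {
    "king": [0.8, 0.6, 0.1],
    "queen": [0.7, 0.7, 0.1],
    "apple": [0.1, 0.2, 0.9],
}

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

sim_royal = cosine_similarity(embeddings["king"], embeddings["queen"])
sim_fruit = cosine_similarity(embeddings["king"], embeddings["apple"])
print(sim_royal > sim_fruit)  # related words should score higher
```

With these toy values the related pair scores higher than the unrelated pair, which is the property a trained embedding space is expected to show.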
-
Text Classification
Assigning one or more categories to text; includes sentiment analysis, topic classification, spam detection, and intent recognition.
-
Sequence-to-Sequence
Encoder-decoder architecture mapping input sequences to output sequences; used for translation, summarisation, and dialogue.
-
Sentence Boundary Detection
Identifying sentence boundaries in text; handles ambiguous punctuation (periods in abbreviations, decimal points, URLs) and enables sentence-level processing.
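A rule-based sketch of the idea, assuming a small hand-written abbreviation list (`ABBREVIATIONS` and `split_sentences` are hypothetical names, not a library API). It treats terminal punctuation as a boundary only when whitespace follows, which leaves decimal points and URLs intact, and it skips known abbreviations. Production systems typically use far larger lists or a trained model.

```python
import re

# Hypothetical, deliberately small abbreviation list; a real system needs far more.
ABBREVIATIONS = {"dr.", "mr.", "mrs.", "ms.", "prof.", "e.g.", "i.e.", "etc."}

def split_sentences(text):
    """Split after ., ! or ? followed by whitespace, unless the token is a known abbreviation."""
    sentences = []
    start = 0
    # Candidate boundary: terminal punctuation followed by whitespace.
    for match in re.finditer(r"[.!?]\s+", text):
        candidate = text[start:match.start() + 1]
        last_token = candidate.split()[-1].lower()
        if last_token in ABBREVIATIONS:
            continue  # the period belongs to an abbreviation, not a sentence end
        sentences.append(candidate.strip())
        start = match.end()
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences

example = "Dr. Smith paid 3.50 dollars. See example.com for details! Was it worth it?"
result = split_sentences(example)
for sentence in result:
    print(sentence)
```

The period in "3.50" and the one in "example.com" are never candidates because no whitespace follows them; the period in "Dr." is a candidate but is rejected by the abbreviation check.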
-
RLHF
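Reinforcement Learning from Human Feedback; typically trains a reward model on human preference comparisons between model outputs, then fine-tunes the language model with reinforcement learning against that reward to improve helpfulness, safety, and alignment.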
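One common way to turn preference comparisons into a training signal is a pairwise (Bradley-Terry style) loss on a reward model's scores. The sketch below assumes the scalar rewards for the preferred and rejected responses are already computed; `preference_loss` is a hypothetical helper, and the later reinforcement learning step is not shown.

```python
import math

def preference_loss(reward_chosen, reward_rejected):
    """Pairwise loss -log(sigmoid(r_chosen - r_rejected)).

    Small when the reward model scores the human-preferred response higher,
    large when it scores the rejected response higher.
    """
    margin = reward_chosen - reward_rejected
    # log(1 + exp(-margin)), written to avoid overflow for large |margin|
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))

print(preference_loss(2.0, 0.0) < preference_loss(0.0, 2.0))
```

Minimising this loss over many labelled comparisons pushes the reward model to rank responses the way the human annotators did.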
-
Part-of-Speech Tagging
Assigning grammatical categories (noun, verb, adjective, etc.) to tokens in text; fundamental for syntactic analysis and downstream NLP tasks.
-
Named Entity Recognition
Identifying and classifying named entities (persons, locations, organisations) in text; fundamental NLP task for information extraction.
-
Language Model
Probability distribution over sequences of tokens; predicts next token given context. Foundation of NLP from n-grams to large language models.
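A minimal sketch of the n-gram end of that spectrum: a bigram model estimated by counting on a three-sentence toy corpus. The corpus, the `<s>` and `</s>` boundary markers, and the function names are illustrative assumptions, and no smoothing is applied, so unseen bigrams receive probability zero.

```python
from collections import Counter, defaultdict

# Toy corpus, invented for illustration.
corpus = [
    ["the", "cat", "sat"],
    ["the", "cat", "ran"],
    ["the", "dog", "sat"],
]

# bigram_counts[prev][nxt] = number of times nxt followed prev
bigram_counts = defaultdict(Counter)
for sentence in corpus:
    tokens = ["<s>"] + sentence + ["</s>"]
    for prev, nxt in zip(tokens, tokens[1:]):
        bigram_counts[prev][nxt] += 1

def next_token_probs(prev):
    """Maximum-likelihood estimate of P(next | prev); no smoothing."""
    counts = bigram_counts.get(prev, Counter())
    total = sum(counts.values())
    return {tok: c / total for tok, c in counts.items()}

def sequence_prob(sentence):
    """Chain-rule probability of a whole sentence under the bigram model."""
    tokens = ["<s>"] + sentence + ["</s>"]
    prob = 1.0
    for prev, nxt in zip(tokens, tokens[1:]):
        prob *= next_token_probs(prev).get(nxt, 0.0)
    return prob

print(next_token_probs("the"))
print(sequence_prob(["the", "cat", "sat"]))
```

Large language models replace the count table with a neural network conditioned on a much longer context, but the interface is the same: a distribution over the next token given what came before.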
-
Instruction Tuning
Fine-tuning language models on diverse (instruction, response) pairs so that they follow natural language instructions and generalise to unseen tasks.
-
Dependency Parsing
Analysing grammatical structure by identifying directed dependency relations between tokens; output is a dependency tree.
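A dependency tree is often stored as one head index per token. The sketch below uses a hand-annotated example sentence with Universal Dependencies style labels (chosen for illustration, not produced by a parser) and checks the two properties that make the structure a tree: exactly one root, and no cycles. `is_valid_tree` is a hypothetical helper that assumes every head index is in range.

```python
tokens = ["She", "reads", "long", "books"]
# heads[i] is the 1-based position of the head of token i+1; 0 marks the root.
heads = [2, 0, 4, 2]
labels = ["nsubj", "root", "amod", "obj"]

def is_valid_tree(heads):
    """True if there is exactly one root and every token reaches it without a cycle."""
    if heads.count(0) != 1:
        return False
    for i in range(1, len(heads) + 1):
        seen = set()
        node = i
        while node != 0:
            if node in seen:
                return False  # revisited a token: the head chain loops
            seen.add(node)
            node = heads[node - 1]
    return True

# Directed arcs as (head word, relation, dependent word).
arcs = [
    (tokens[head - 1] if head else "ROOT", label, token)
    for token, head, label in zip(tokens, heads, labels)
]

print(is_valid_tree(heads))
for arc in arcs:
    print(arc)
```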
-
Coreference Resolution
Linking mentions of the same entity across a document; resolves pronouns and nominal references to their antecedents.
-
Cloze Task
Predicting masked tokens from context; self-supervised pre-training objective in which randomly chosen tokens are hidden and must be inferred from the surrounding text.
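A sketch of how training examples for this objective can be built: hide a random subset of tokens and keep the originals as prediction targets. The function name `mask_tokens`, the `[MASK]` placeholder, and the default masking rate are assumptions for illustration; actual pre-training recipes differ in their details.

```python
import random

def mask_tokens(tokens, mask_prob=0.15, mask_token="[MASK]", seed=0):
    """Return (masked_tokens, targets); targets maps position -> original token."""
    rng = random.Random(seed)  # seeded so the example is reproducible
    masked, targets = [], {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            masked.append(mask_token)
            targets[i] = tok  # the model must recover this token from context
        else:
            masked.append(tok)
    return masked, targets

sentence = ["the", "cat", "sat", "on", "the", "mat"]
masked, targets = mask_tokens(sentence, mask_prob=0.5)
print(masked)
print(targets)
```

No labels are needed beyond the text itself: the hidden tokens serve as the supervision signal.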
-
Chunking
Grouping adjacent tokens into non-overlapping phrases (chunks); shallow syntactic analysis that segments noun phrases, verb phrases, and prepositional phrases without building a full parse tree.
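Chunkers are frequently implemented as token taggers whose output uses a BIO scheme (B- begins a chunk, I- continues it, O is outside any chunk). The sketch below assumes such tags are already available and only shows the decoding step; the tags are hand-written for illustration, and `bio_to_chunks` is a hypothetical helper.

```python
def bio_to_chunks(tokens, tags):
    """Convert BIO tags (B-NP, I-NP, O, ...) into (label, phrase) chunks."""
    chunks = []
    current_label, current_tokens = None, []
    for tok, tag in zip(tokens, tags):
        starts_new = tag.startswith("B-") or (
            tag.startswith("I-") and tag[2:] != current_label
        )
        if starts_new:
            # An I- tag with no matching open chunk is treated like B-.
            if current_tokens:
                chunks.append((current_label, " ".join(current_tokens)))
            current_label, current_tokens = tag[2:], [tok]
        elif tag.startswith("I-"):
            current_tokens.append(tok)
        else:  # "O": outside any chunk, so close whatever is open
            if current_tokens:
                chunks.append((current_label, " ".join(current_tokens)))
            current_label, current_tokens = None, []
    if current_tokens:
        chunks.append((current_label, " ".join(current_tokens)))
    return chunks

tokens = ["The", "quick", "fox", "jumped", "over", "the", "fence", "."]
tags = ["B-NP", "I-NP", "I-NP", "B-VP", "B-PP", "B-NP", "I-NP", "O"]
chunks = bio_to_chunks(tokens, tags)
for label, phrase in chunks:
    print(label, phrase)
```

The result is a flat sequence of labelled spans rather than a nested tree, which is what makes the analysis shallow.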