Information Retrieval, Search and Ranking

Information retrieval (IR) is the science of finding relevant material — usually documents or data records — from a large collection in response to a query. It underpins virtually every product that helps users find things: web search engines, e-commerce product search, recommendation feeds, enterprise knowledge bases, and code search tools.
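As a rough illustration of "finding relevant material in response to a query", the sketch below ranks a toy collection by how often the query terms occur in each document. The `tokenize` and `search` helpers and the sample documents are hypothetical, introduced only for this example. Raw term counting is a deliberately naive stand-in for the weighting schemes (TF-IDF, BM25) covered later in the chapter.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def search(query, documents, k=3):
    """Return up to k (doc_id, score) pairs, best first.

    The score is the number of query-term occurrences in the document
    (an assumption made for illustration, not a production scoring rule).
    Documents that match no query term are dropped.
    """
    query_terms = set(tokenize(query))
    scored = []
    for doc_id, text in documents.items():
        counts = Counter(tokenize(text))
        score = sum(counts[term] for term in query_terms)
        if score > 0:
            scored.append((doc_id, score))
    # Sort by score (descending), then by doc_id so ties are deterministic.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:k]


# Hypothetical three-document collection.
docs = {
    "d1": "How to reset your account password",
    "d2": "Quarterly sales report for the data team",
    "d3": "Password policy: rotate your password every 90 days",
}

results = search("reset password", docs)
print(results)
```

Here `d1` and `d3` tie on raw counts even though `d1` is the better answer to the query, which is the kind of weakness that term weighting and length normalisation are meant to address.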

Why it matters in Applied Data Science

Modern data science teams encounter IR problems constantly:

Internal search — employees querying document stores, wikis, or data catalogs.
Customer-facing search — product discovery, content recommendation, support ticket routing.
RAG pipelines — retrieval-augmented generation systems that fetch context before passing it to a language model.
Ad targeting & ranking — selecting and ordering items from a candidate pool to maximise a business objective.

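Getting retrieval right is often more impactful than tuning the downstream model: a ranker or language model can only work with the candidates it is given, so a relevant item that the retrieval stage misses cannot be recovered later (garbage in, garbage out).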
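One way to make this concrete is to measure how many of the known-relevant items survive the retrieval stage. The sketch below computes recall at a cutoff k. The `recall_at_k` helper and the document ids are hypothetical, and recall@k is used here only as a simple illustration alongside the ranking metrics listed in the chapter outline.

```python
def recall_at_k(retrieved_ids, relevant_ids, k):
    """Fraction of relevant items that appear in the top-k retrieved list."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0  # convention assumed here: no relevant items means recall 0
    hits = relevant.intersection(retrieved_ids[:k])
    return len(hits) / len(relevant)


# Hypothetical retriever output (best first) and ground-truth relevant set.
retrieved = ["d7", "d2", "d9", "d4", "d1"]
relevant = {"d2", "d4", "d8"}

recall_at_3 = recall_at_k(retrieved, relevant, 3)  # only d2 is in the top 3
recall_at_5 = recall_at_k(retrieved, relevant, 5)  # d2 and d4 are in the top 5

print(f"recall@3 = {recall_at_3:.2f}, recall@5 = {recall_at_5:.2f}")
```

In this toy case `d8` is never retrieved, so no amount of downstream re-ranking or prompt tuning can surface it. Recall at the retrieval stage acts as a ceiling on what later stages can achieve.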

Chapter structure

This chapter is organised into three sections:

Section	Topics
Search Fundamentals	Inverted indexes, TF-IDF, BM25, evaluation metrics (NDCG, MAP, MRR)
Semantic Search	Dense retrieval, embeddings, approximate nearest-neighbour search, vector databases
Learning to Rank	Pointwise / pairwise / listwise LTR, LambdaMART, neural rankers, custom loss functions

Each section combines conceptual explanations with practical code examples. Resource articles and notebooks will be added to the information_retrieval/ subfolder and synthesised into the relevant section pages.