Common Functions — Cheat-Sheet

Frequently used functions for NumPy, pandas, Matplotlib, Seaborn, and scikit-learn — with concise examples and what each example is actually doing.

NumPy

Function	Example	What this example does
np.array	`np.array([1, 2, 3])`	Constructs a NumPy 1-D array from a Python list so you can run fast, vectorised numeric operations on the values.
np.arange	`np.arange(0, 10, 2)`	Creates the integers 0, 2, 4, 6, 8; the stop value 10 is excluded. Useful for index ranges, ticks, or generating simple synthetic data.
np.linspace	`np.linspace(0, 1, 5)`	Generates five evenly spaced points between 0 and 1 (inclusive), ideal for smooth sampling on an interval.
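np.zeros / np.ones	`np.zeros((2, 3))` | `np.ones(4)`	Initialises a 2×3 array of 0.0 (placeholders/masks) or a length-4 array of 1.0 (initial weights/biases) without manual loops.
np.random.rand / randn	`np.random.rand(2, 3)` | `np.random.randn(3)`	Draws a 2×3 array of uniform [0, 1) samples or three standard normal samples for simulations, bootstraps, and quick model experiments. Newer code typically uses a seeded generator from `np.random.default_rng()` instead.
np.reshape	`np.reshape(a, (3, 2))`	Reinterprets the six elements of `a` as a 3×2 array. NumPy returns a view (no data copied) when the memory layout allows it, otherwise a copy. The standard way to give data the matrix/tensor shape a function expects.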
a.T / np.transpose	`a.T`	Transposes rows and columns (matrix transpose), commonly used before dot products or plotting.
np.dot	`np.dot(a, b)`	Computes the inner product of 1-D arrays or the matrix product of 2-D arrays (equivalent to `a @ b`), the backbone of linear algebra and ML internals.
np.mean / median / std	`np.mean(a, axis=0)`	Aggregates along the chosen axis (here: one mean per column); often used for quick quality checks or feature scaling.
np.sum / min / max	`np.sum(a, axis=1)`	Sums across the columns, giving one total per row; the min/max variants give bounds for validation and sanity checks.
np.argmax / argmin	`np.argmax(a)`	Returns the index of the maximum value in the flattened array; pass `axis=1` to get one index per row (e.g., the predicted class for each sample from a probability matrix).
np.concatenate	`np.concatenate([a, b], axis=0)`	Stacks arrays along an existing axis (here adds rows), handy when batching data.
np.stack	`np.stack([a, b], axis=0)`	Joins same-shaped arrays along a new axis (here the result has shape (2, …)), useful for building mini-batches.
np.where	`np.where(a > 0, 1, 0)`	Vectorised conditional: converts positives to 1, others to 0, without Python loops.
np.unique	`np.unique(a, return_counts=True)`	Finds unique values and how often they appear — great for class balance checks.
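
Several NumPy rows above depend on the `axis` argument or on whether a view is returned. A minimal sketch, assuming a small made-up 3×2 matrix of class probabilities (three samples, two classes), shows column- vs row-wise aggregation, flattened vs per-row `argmax`, `np.where`, `np.unique`, and `np.shares_memory` as a way to confirm that `reshape` returned a view.

```python
import numpy as np

# Hypothetical class probabilities: 3 samples (rows) x 2 classes (columns).
probs = np.array([[0.9, 0.1],
                  [0.3, 0.7],
                  [0.6, 0.4]])

col_means = np.mean(probs, axis=0)  # one mean per column (class)
row_sums = np.sum(probs, axis=1)    # one total per row (sample)

flat_idx = np.argmax(probs)         # no axis: index into the flattened array
pred = np.argmax(probs, axis=1)     # predicted class for each sample

# Vectorised threshold on the class-1 probability, no Python loop.
is_pos = np.where(probs[:, 1] > 0.5, 1, 0)

# Class balance of the predictions: (values, counts) tuple.
labels, counts = np.unique(pred, return_counts=True)

# Reshaping a contiguous array returns a view that shares a's memory.
a = np.arange(6)
m = np.reshape(a, (3, 2))

print(col_means, row_sums, flat_idx, pred, is_pos, labels, counts)
print("view:", np.shares_memory(a, m))
```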

pandas

Function	Example	What this example does
pd.read_csv	`pd.read_csv("data.csv")`	Loads a CSV into a DataFrame so you can filter, join, aggregate, and visualise tabular data.
pd.DataFrame	`pd.DataFrame({"a":[1,2], "b":[3,4]})`	Constructs a DataFrame from dict/arrays; great for small examples or unit tests.
df.head	`df.head(5)`	Previews the first five rows to quickly verify columns, parsing, and obvious issues after loading.
df.info	`df.info()`	Prints schema: columns, dtypes, non-null counts — your first stop for data quality checks.
df.describe	`df.describe()`	Generates numeric summary stats (count/mean/std/min/percentiles/max) to profile distributions.
df.loc / df.iloc	`df.loc[df["cat"]=="A", ["x","y"]]`	Label-based selection (or position-based with `iloc`) to filter rows and pick columns explicitly.
df.assign	`df.assign(ratio=df["a"] / df["b"])`	Adds a computed column (here, a/b) and returns a new DataFrame without modifying `df`, so it chains cleanly in pipelines.
df.groupby	`df.groupby("cat").agg({"x":"mean", "y":"sum"})`	Aggregates by category (mean of x, sum of y) — classic split-apply-combine pattern.
df.merge	`df.merge(dim, on="id", how="left")`	SQL-style join to attach lookup/dimension data; `how="left"` keeps every row of `df` and fills unmatched lookup columns with NaN.
df.pivot_table	`df.pivot_table(values="sales", index="month", columns="region", aggfunc="sum")`	Summarises long data to a matrix by month × region with summed sales — perfect for dashboards.
df.dropna / df.fillna	`df.dropna(subset=["age"])` | `df.fillna({"age": 0})`	Removes rows with a missing age, or fills missing ages with a specified value; use thoughtfully to avoid bias.
df.astype	`df.astype({"age": "int32"})`	Converts column dtypes for memory/performance or to satisfy model/visualisation requirements. Casting to `int32` fails if the column still contains NaN; fill or drop missing values first, or use the nullable `"Int32"` dtype.
df.sort_values	`df.sort_values(["date","sales"], ascending=[True, False])`	Orders rows by multiple keys (date ascending, then sales descending) for ranked outputs.
pd.to_datetime	`pd.to_datetime(df["timestamp"], utc=True)`	Parses strings to timezone-aware datetimes (converted to UTC) so you can resample, window, and plot time series correctly.
df.set_index / resample	`df.set_index("date").resample("W").sum()`	Promotes the datetime `date` column to the index and sums each column per week (weekly bins end on Sunday by default), ideal for trends. Select the numeric columns first if the frame also holds text.
Series.value_counts	`df["cat"].value_counts(normalize=True)`	Computes the class distribution of column `cat` (`normalize=True` returns proportions instead of counts) to check balance or drift.
df.to_csv	`df.to_csv("out.csv", index=False)`	Writes a clean CSV (without index) for sharing, downstream tools, or Git-tracked artefacts.
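
In practice these pandas calls are chained. A minimal sketch, assuming a tiny made-up sales table (the columns `date`, `id`, `region`, `sales`, `units` and the `dim` lookup table are hypothetical), runs it through `pd.to_datetime`, `assign`, a left `merge`, `groupby`, `pivot_table`, and a weekly `resample`.

```python
import pandas as pd

# Hypothetical raw sales data, as it might come back from pd.read_csv.
df = pd.DataFrame({
    "date": ["2024-01-01", "2024-01-02", "2024-01-08", "2024-01-09"],
    "id": [1, 2, 1, 3],
    "region": ["north", "south", "north", "south"],
    "sales": [100.0, 50.0, 80.0, 20.0],
    "units": [10, 5, 8, 4],
})
df["date"] = pd.to_datetime(df["date"])  # strings -> datetime64

# Hypothetical lookup table; id 3 has no match.
dim = pd.DataFrame({"id": [1, 2], "segment": ["retail", "online"]})

out = (
    df.assign(price=df["sales"] / df["units"])  # new frame; df is not modified
      .merge(dim, on="id", how="left")          # unmatched ids get NaN in "segment"
)

# Split-apply-combine: total sales and mean price per region.
by_region = out.groupby("region").agg({"sales": "sum", "price": "mean"})

# Month x region matrix of summed sales.
pivot = out.assign(month=out["date"].dt.month).pivot_table(
    values="sales", index="month", columns="region", aggfunc="sum"
)

# Weekly totals; selecting "sales" first keeps text columns out of the sum.
weekly = df.set_index("date").resample("W")["sales"].sum()

print(by_region, pivot, weekly, sep="\n\n")
```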

Matplotlib

Function	Example	What this example does
plt.figure / subplots	`fig, ax = plt.subplots(figsize=(6, 4))`	Creates a figure with a single Axes so you can work in the object-oriented (OO) style, which scales best for complex charts.
plt.plot	`ax.plot(x, y, linewidth=2)`	Draws a line series on the axes; thicker line for emphasis in time-series or trend views.
plt.scatter	`ax.scatter(x, y, alpha=.7)`	Plots points with transparency to reveal dense regions; use for relationships/outliers.
plt.bar	`ax.bar(cats, vals)`	Creates a vertical bar chart for categorical comparisons.
plt.hist	`ax.hist(x, bins=30)`	Shows distribution shape; choose bins to reveal modality while avoiding noise.
Styling	`ax.set_title("Sales"); ax.set_xlabel("Month"); ax.grid(True)`	Sets title/labels and enables a grid for readability; central to professional-looking plots.
Layout & export	`fig.tight_layout(); fig.savefig("fig.png", dpi=300)`	Fixes overlaps and exports a crisp image suitable for slides/reports.
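
Putting the Matplotlib rows together in the OO style: a minimal sketch that plots made-up monthly sales on a single Axes and exports it. It assumes a headless environment, so it selects the non-interactive `Agg` backend, and it writes to a temporary directory rather than to the `fig.png` path used above.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend: render to files only
import matplotlib.pyplot as plt
import numpy as np

months = np.arange(1, 13)
sales = 100 + 10 * np.sin(months / 2)  # illustrative data only

fig, ax = plt.subplots(figsize=(6, 4))
ax.plot(months, sales, linewidth=2, label="sales")  # line series
ax.scatter(months, sales, alpha=0.7)                # semi-transparent points
ax.set_title("Sales")
ax.set_xlabel("Month")
ax.set_ylabel("Sales (units)")
ax.grid(True)
ax.legend()

fig.tight_layout()  # avoid overlapping labels
out_path = os.path.join(tempfile.mkdtemp(), "fig.png")
fig.savefig(out_path, dpi=300)
plt.close(fig)  # release the figure once it has been saved
```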

Seaborn

Function	Example	What this example does
sns.scatterplot	`sns.scatterplot(data=df, x="x", y="y", hue="cat")`	Encodes category by colour, making clusters or class separation instantly visible.
sns.lineplot	`sns.lineplot(data=df, x="date", y="value")`	Plots a time series; when several rows share the same x value, it draws their mean with a 95% confidence band by default, ideal for trends with uncertainty.
sns.barplot / countplot	`sns.barplot(data=df, x="cat", y="val")` | `sns.countplot(data=df, x="cat")`	Shows the mean of `val` per category with error bars (a 95% confidence interval by default) in barplot, or the number of rows per category in countplot.
sns.boxplot / violinplot	`sns.boxplot(data=df, x="cat", y="val")`	Compares distributions across categories; violinplot reveals full density shape.
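sns.heatmap	`sns.heatmap(df.corr(numeric_only=True), annot=True, cmap="Blues")`	Visualises correlation structure; `numeric_only=True` skips text columns (pandas 2.x raises an error on them otherwise), and `annot=True` prints each coefficient for quick readouts.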
sns.pairplot	`sns.pairplot(df, hue="species")`	Creates a scatter-matrix to explore relationships and separability across many variables.
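
A minimal sketch combining two of the Seaborn rows on a made-up DataFrame (the columns `x`, `y`, `val`, `cat` and the way they are generated are hypothetical): a hue-coded scatterplot and a correlation heatmap. Passing `ax=` draws each chart onto an Axes created in the Matplotlib OO style, and the `Agg` backend is assumed for a headless run.

```python
import matplotlib
matplotlib.use("Agg")  # headless rendering
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(0)
n = 60
df = pd.DataFrame({
    "x": rng.normal(size=n),
    "cat": np.repeat(["A", "B", "C"], n // 3),  # text column
})
df["y"] = 2 * df["x"] + rng.normal(scale=0.5, size=n)  # strongly correlated with x
df["val"] = rng.normal(size=n)                        # independent noise

# numeric_only=True leaves out the text column "cat".
corr = df.corr(numeric_only=True)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
sns.scatterplot(data=df, x="x", y="y", hue="cat", ax=ax1)
sns.heatmap(corr, annot=True, cmap="Blues", ax=ax2)
fig.tight_layout()
plt.close(fig)
```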

scikit-learn

Function	Example	What this example does
train_test_split	`X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)`	Splits features/labels into training and hold-out sets to evaluate generalisation fairly.
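StandardScaler	`scaler = StandardScaler().fit(X_tr); Xs = scaler.transform(X_tr)`	Centers and scales features to mean 0 and unit variance so many models converge faster and perform better. Fit on the training data only, then reuse `scaler.transform(X_te)` for the test set.
ColumnTransformer	`ct = ColumnTransformer([("num", StandardScaler(), num_cols), ("cat", OneHotEncoder(handle_unknown="ignore"), cat_cols)])`	Applies the right preprocessing to each column subset within one object; `handle_unknown="ignore"` stops categories unseen during fitting from raising an error at prediction time.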
Pipeline	`pipe = Pipeline([("prep", ct), ("clf", LogisticRegression(max_iter=1000))])`	Chains preprocessing and model so CV and grid search treat everything as one unit (no leakage).
RandomForestClassifier	`rf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_tr, y_tr)`	Fits an ensemble of decision trees that reduce variance; strong baseline for tabular data.
GridSearchCV	`GridSearchCV(pipe, {"clf__C":[0.1,1,10]}, cv=5, n_jobs=-1)`	Searches hyperparameters via cross-validation while respecting the full preprocessing pipeline.
classification_report	`print(classification_report(y_te, y_pred))`	Prints precision/recall/F1 per class + macro/weighted averages to understand model trade-offs.
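confusion_matrix / ConfusionMatrixDisplay	`ConfusionMatrixDisplay.from_predictions(y_te, y_pred)`	Plots counts per true/predicted class so you can spot where errors concentrate; `confusion_matrix(y_te, y_pred)` returns the same counts as an array.
mean_squared_error / root_mean_squared_error	`rmse = root_mean_squared_error(y_te, y_hat)`	Computes RMSE for regression (same units as the target), easier to interpret than raw MSE. Requires scikit-learn 1.4 or later; on older versions use `np.sqrt(mean_squared_error(y_te, y_hat))`. The former `squared=False` option was deprecated in 1.4 and removed in 1.6.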
model.score	`pipe.score(X_te, y_te)`	Returns default score (accuracy/R²) for quick checks — still validate with richer metrics above.
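
The scikit-learn rows are designed to compose. A minimal sketch wires them into one workflow on a small synthetic dataset; the columns `num1`, `num2`, `color` and the rule that generates the label are made up for illustration. Because scaling and one-hot encoding live inside the `Pipeline`, `GridSearchCV` refits them on each training fold, so the held-out fold never leaks into preprocessing. `n_jobs=-1` is omitted here to keep the run single-process.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

rng = np.random.default_rng(42)
n = 300
X = pd.DataFrame({
    "num1": rng.normal(size=n),
    "num2": rng.normal(loc=50, scale=10, size=n),
    "color": rng.choice(["red", "green", "blue"], size=n),
})
# Hypothetical label: depends on num1 and on whether color is "red".
y = ((X["num1"] + (X["color"] == "red")) > 0.5).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

ct = ColumnTransformer([
    ("num", StandardScaler(), ["num1", "num2"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["color"]),
])
pipe = Pipeline([("prep", ct), ("clf", LogisticRegression(max_iter=1000))])

# 5-fold CV over C; the whole pipeline is refit inside each fold.
grid = GridSearchCV(pipe, {"clf__C": [0.1, 1, 10]}, cv=5)
grid.fit(X_tr, y_tr)

y_pred = grid.predict(X_te)
print(grid.best_params_)
print(classification_report(y_te, y_pred))
cm = confusion_matrix(y_te, y_pred)  # rows: true class, columns: predicted
acc = grid.score(X_te, y_te)         # accuracy, the default for classifiers
```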