Common Functions (Cheat-Sheet)

Frequently used functions for NumPy, pandas, Matplotlib, Seaborn, and scikit-learn, each with a concise example and a note on what that example does.

NumPy

numpy
| Function | Example | What this example does |
|---|---|---|
| np.array | np.array([1, 2, 3]) | Constructs a 1-D NumPy array from a Python list so you can run fast, vectorised numeric operations on the values. |
| np.arange | np.arange(0, 10, 2) | Creates the integers 0, 2, 4, 6, 8 (start inclusive, stop exclusive, step 2). Useful for index ranges, ticks, or simple synthetic data. |
| np.linspace | np.linspace(0, 1, 5) | Generates five evenly spaced points from 0 to 1 with both endpoints included (0, 0.25, 0.5, 0.75, 1), for uniform sampling on an interval. |
| np.zeros / np.ones | np.zeros((2, 3)); np.ones(4) | Initialises arrays filled with 0s (placeholders, masks) or 1s (initial weights), without manual loops. |
| np.random.rand / randn | np.random.rand(2, 3); np.random.randn(3) | Draws samples from the uniform distribution on [0, 1) or from the standard normal, for simulations, bootstraps, and quick model experiments. New code can use np.random.default_rng() instead of these legacy functions. |
| np.reshape | np.reshape(a, (3, 2)) | Reinterprets the six elements of a as a 3×2 array. NumPy returns a view when the memory layout allows it and a copy otherwise. |
| a.T / np.transpose | a.T | Swaps rows and columns (matrix transpose), commonly needed before dot products or plotting. |
| np.dot | np.dot(a, b) | Computes the inner product of 1-D arrays or the matrix product of 2-D arrays (a @ b is equivalent for 2-D), the core operation of linear algebra and ML internals. |
| np.mean / median / std | np.mean(a, axis=0) | Aggregates along the chosen axis (here, the mean of each column), often for quick quality checks or feature scaling. |
| np.sum / min / max | np.sum(a, axis=1) | Sums each row (that is, across columns); the min/max variants give bounds for validation and sanity checks. |
| np.argmax / argmin | np.argmax(a) | Returns the index of the maximum value in the flattened array; pass axis=1 to get one index per row (e.g., the predicted class from each row of probabilities). |
| np.concatenate | np.concatenate([a, b], axis=0) | Joins arrays along an existing axis (here, appends the rows of b to a), handy when batching data. |
| np.stack | np.stack([a, b], axis=0) | Joins equal-shaped arrays along a new axis (here the result has shape (2, …)), useful for building mini-batches. |
| np.where | np.where(a > 0, 1, 0) | Vectorised conditional: maps positive entries to 1 and all others to 0, without Python loops. |
| np.unique | np.unique(a, return_counts=True) | Returns the sorted unique values and how often each appears, useful for class balance checks. |
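
Several of the NumPy calls above can be combined in one short session. The following is a minimal sketch, assuming a small made-up 3×2 array; the variable names are illustrative and not part of the cheat-sheet.

```python
import numpy as np

# Hypothetical toy data: 3 samples x 2 features.
a = np.array([[1.0, -2.0], [3.0, 4.0], [-5.0, 6.0]])

col_means = np.mean(a, axis=0)          # one mean per column
row_sums = np.sum(a, axis=1)            # one sum per row
signs = np.where(a > 0, 1, 0)           # 1 where positive, else 0
best_col = np.argmax(a, axis=1)         # index of the largest value in each row

b = np.reshape(np.arange(6), (3, 2))    # 0..5 laid out as 3 rows x 2 columns
rows = np.concatenate([a, b], axis=0)   # shape (6, 2): the existing row axis grows
batch = np.stack([a, b], axis=0)        # shape (2, 3, 2): a new leading axis

values, counts = np.unique(signs, return_counts=True)
gram = np.dot(a.T, a)                   # (2, 3) times (3, 2) gives (2, 2)
```

Comparing rows.shape with batch.shape is a quick way to see the difference between np.concatenate (existing axis) and np.stack (new axis).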

pandas

pd
| Function | Example | What this example does |
|---|---|---|
| pd.read_csv | pd.read_csv("data.csv") | Loads a CSV file into a DataFrame so you can filter, join, aggregate, and visualise tabular data. |
| pd.DataFrame | pd.DataFrame({"a":[1,2], "b":[3,4]}) | Constructs a DataFrame from a dict of columns; handy for small examples or unit tests. |
| df.head | df.head(5) | Previews the first five rows to quickly verify columns, parsing, and obvious issues after loading. |
| df.info | df.info() | Prints the schema: columns, dtypes, non-null counts, and memory usage. A good first stop for data quality checks. |
| df.describe | df.describe() | Computes summary statistics for numeric columns (count, mean, std, min, quartiles, max) to profile distributions. |
| df.loc / df.iloc | df.loc[df["cat"]=="A", ["x","y"]] | Selects by label or boolean mask (iloc selects by integer position); here, rows where cat is "A" and only columns x and y. |
| df.assign | df.assign(ratio = df["a"]/df["b"]) | Returns a new DataFrame with a computed column (here, a/b) and leaves the original unchanged, which keeps method chains tidy. |
| df.groupby | df.groupby("cat").agg({"x":"mean", "y":"sum"}) | Aggregates by category (mean of x, sum of y), the classic split-apply-combine pattern. |
| df.merge | df.merge(dim, on="id", how="left") | SQL-style join that attaches lookup/dimension data; how="left" keeps every row of df, and other values of how change which rows are kept. |
| df.pivot_table | df.pivot_table(values="sales", index="month", columns="region", aggfunc="sum") | Reshapes long data into a month × region matrix of summed sales, a common input for dashboards. |
| df.dropna / df.fillna | df.dropna(subset=["age"]); df.fillna({"age":0}) | Drops rows with a missing age, or fills missing ages with a specified value; either choice can bias results, so pick deliberately. |
| df.astype | df.astype({"age":"int32"}) | Converts column dtypes to save memory or to satisfy model/visualisation requirements. Casting to a NumPy integer dtype raises an error if the column contains missing values. |
| df.sort_values | df.sort_values(["date","sales"], ascending=[True, False]) | Orders rows by multiple keys (date ascending, then sales descending within each date) for ranked outputs. |
| pd.to_datetime | pd.to_datetime(df["timestamp"], utc=True) | Parses strings into timezone-aware UTC datetimes so you can resample, window, and plot time series correctly. |
| df.set_index / resample | df.set_index("date").resample("W").sum() | Makes date the index (it must hold datetimes) and sums the values in each weekly bin, useful for trends. |
| df.value_counts | df["cat"].value_counts(normalize=True) | Computes the class distribution of a column (normalize=True returns proportions instead of counts) to check balance or drift. |
| df.to_csv | df.to_csv("out.csv", index=False) | Writes a CSV without the index column, for sharing, downstream tools, or Git-tracked artefacts. |
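
A typical load, clean, aggregate sequence strings several of these pandas calls together. The following is a minimal sketch, assuming a small made-up CSV that is inlined so no file is needed; the column names and the dim lookup table are hypothetical.

```python
import io
import pandas as pd

# Hypothetical CSV content, inlined so the sketch needs no file on disk.
csv_text = """date,cat,sales,id
2024-01-01,A,10,1
2024-01-02,B,20,2
2024-01-08,A,30,1
2024-01-09,B,,2
"""
df = pd.read_csv(io.StringIO(csv_text))
df["date"] = pd.to_datetime(df["date"])

# Impute the missing sales value first, then cast to a compact integer dtype
# (casting before filling would fail because of the missing value).
df = df.fillna({"sales": 0}).astype({"sales": "int32"})

# Split-apply-combine: total sales per category.
per_cat = df.groupby("cat").agg({"sales": "sum"})

# Left join against a small lookup table (hypothetical dimension data).
dim = pd.DataFrame({"id": [1, 2], "region": ["North", "South"]})
joined = df.merge(dim, on="id", how="left")

# Weekly totals: resample needs a datetime index.
weekly = df.set_index("date")["sales"].resample("W").sum()

# Share of rows per category.
shares = df["cat"].value_counts(normalize=True)
```

The order of fillna and astype matters here: the cast to int32 only succeeds once no missing values remain in the column.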

Matplotlib

plt
| Function | Example | What this example does |
|---|---|---|
| plt.figure / subplots | fig, ax = plt.subplots(figsize=(6,4)) | Creates a figure and a single Axes object, following the object-oriented style, which scales best for complex charts. |
| plt.plot | ax.plot(x, y, linewidth=2) | Draws a line series on the axes (ax.plot is the object-oriented counterpart of plt.plot); the thicker line adds emphasis in time-series or trend views. |
| plt.scatter | ax.scatter(x, y, alpha=.7) | Plots points with partial transparency so dense regions show up darker; use for relationships and outliers. |
| plt.bar | ax.bar(cats, vals) | Creates a vertical bar chart for categorical comparisons. |
| plt.hist | ax.hist(x, bins=30) | Shows the shape of a distribution; choose the bin count to reveal modality without amplifying noise. |
| Styling | ax.set_title("Sales"); ax.set_xlabel("Month"); ax.grid(True) | Sets the title and x-axis label and enables a grid for readability. |
| Layout & export | fig.tight_layout(); fig.savefig("fig.png", dpi=300) | Adjusts spacing to reduce overlapping labels and exports a 300-dpi image suitable for slides and reports. |
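
The object-oriented calls above fit together as follows. This is a minimal sketch, assuming made-up monthly data; it selects the non-interactive Agg backend and saves to an in-memory buffer so it runs without a display or a writable directory, both of which are choices made for this illustration.

```python
import io
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical data: a noisy upward trend over 12 months.
rng = np.random.default_rng(0)
x = np.arange(1, 13)
y = 2.0 * x + rng.normal(0, 1.5, size=x.size)

# One figure, two Axes side by side.
fig, (ax_line, ax_hist) = plt.subplots(1, 2, figsize=(8, 4))

ax_line.plot(x, y, linewidth=2)
ax_line.scatter(x, y, alpha=0.7)
ax_line.set_title("Sales")
ax_line.set_xlabel("Month")
ax_line.grid(True)

ax_hist.hist(y, bins=6)
ax_hist.set_title("Distribution of sales")

fig.tight_layout()

# Export to an in-memory buffer; pass a filename such as "fig.png" to write to disk.
buffer = io.BytesIO()
fig.savefig(buffer, format="png", dpi=300)
png_bytes = buffer.getvalue()
plt.close(fig)  # release the figure's memory once it has been exported
```

Keeping a handle on each Axes (ax_line, ax_hist) is what makes the object-oriented style scale: every styling call names the panel it applies to.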

Seaborn

sns
| Function | Example | What this example does |
|---|---|---|
| sns.scatterplot | sns.scatterplot(data=df, x="x", y="y", hue="cat") | Colours points by category, making clusters or class separation easy to see. |
| sns.lineplot | sns.lineplot(data=df, x="date", y="value") | Plots value against date; when an x value has several observations, it draws their mean with a 95% confidence band by default. |
| sns.barplot / countplot | sns.barplot(data=df, x="cat", y="val"); sns.countplot(x="cat", data=df) | Shows category means with error bars (barplot) or the number of rows per category (countplot). |
| sns.boxplot / violinplot | sns.boxplot(data=df, x="cat", y="val") | Compares distributions across categories; violinplot additionally shows a kernel density estimate of each distribution. |
| sns.heatmap | sns.heatmap(df.corr(), annot=True, cmap="Blues") | Visualises the correlation matrix; annot=True prints each coefficient in its cell. df.corr() needs numeric columns, so in pandas 2.x pass numeric_only=True if df also holds non-numeric ones. |
| sns.pairplot | sns.pairplot(df, hue="species") | Creates a scatter-plot matrix (with per-variable distributions on the diagonal) to explore relationships and separability across many variables. |
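
The axes-level Seaborn functions above accept an ax argument, so they can share one Matplotlib figure. The following is a minimal sketch, assuming a made-up DataFrame with columns x, y, and cat; the Agg backend is chosen only so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical data: y is correlated with x, and cat is an arbitrary grouping.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "x": rng.normal(size=60),
    "cat": np.repeat(["A", "B", "C"], 20),
})
df["y"] = 0.8 * df["x"] + rng.normal(scale=0.5, size=60)

fig, axes = plt.subplots(1, 3, figsize=(12, 4))

# Points coloured by category.
sns.scatterplot(data=df, x="x", y="y", hue="cat", ax=axes[0])

# Distribution of y within each category.
sns.boxplot(data=df, x="cat", y="y", ax=axes[1])

# Correlation matrix restricted to the numeric columns.
corr = df[["x", "y"]].corr()
sns.heatmap(corr, annot=True, cmap="Blues", ax=axes[2])

fig.tight_layout()
plt.close(fig)
```

sns.pairplot is figure-level: it builds its own grid of axes and does not take an ax argument, so it is left out of this shared figure.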

scikit-learn

sklearn
| Function | Example | What this example does |
|---|---|---|
| train_test_split | X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42) | Splits features and labels into training and hold-out sets (here 80/20, reproducible via random_state) to evaluate generalisation fairly. |
| StandardScaler | scaler = StandardScaler().fit(X_tr); Xs = scaler.transform(X_tr) | Centres and scales each feature to mean 0 and variance 1 using statistics learned from the training set only; reuse the same fitted scaler on X_te. Helps many models converge faster. |
| ColumnTransformer | ct = ColumnTransformer([("num", StandardScaler(), num_cols), ("cat", OneHotEncoder(), cat_cols)]) | Applies the appropriate preprocessing to each column subset within one reusable object. |
| Pipeline | pipe = Pipeline([("prep", ct), ("clf", LogisticRegression(max_iter=1000))]) | Chains preprocessing and model so cross-validation and grid search refit everything on each fold, which avoids leakage from preprocessing. |
| RandomForestClassifier | rf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_tr, y_tr) | Fits an ensemble of decision trees whose combined predictions reduce variance; a strong baseline for tabular data. |
| GridSearchCV | GridSearchCV(pipe, {"clf__C":[0.1,1,10]}, cv=5, n_jobs=-1) | Searches hyperparameters via 5-fold cross-validation over the full pipeline; the clf__C key targets parameter C of the step named clf. Call .fit to run the search. |
| classification_report | print(classification_report(y_te, y_pred)) | Prints per-class precision, recall, and F1, plus accuracy and macro/weighted averages, to understand model trade-offs. |
| confusion_matrix | ConfusionMatrixDisplay.from_predictions(y_te, y_pred) | Plots the confusion matrix (counts per true/predicted class pair) so you can spot where errors concentrate; confusion_matrix(y_te, y_pred) returns the same counts as an array. |
| mean_squared_error | rmse = np.sqrt(mean_squared_error(y_te, y_hat)) | Computes RMSE for regression (same units as the target), easier to interpret than raw MSE. The older squared=False argument was deprecated in scikit-learn 1.4 in favour of root_mean_squared_error. |
| model.score | pipe.score(X_te, y_te) | Returns the estimator's default score (accuracy for classifiers, R² for regressors) for a quick check; follow up with the richer metrics above. |
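
The pieces above are usually assembled into a single workflow. The following is a minimal sketch, assuming made-up synthetic data with hypothetical columns age, income, and city; handle_unknown="ignore" is an added safeguard, and n_jobs is left at its default to keep the run single-process.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, mean_squared_error
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical synthetic data: two numeric features, one categorical feature.
rng = np.random.default_rng(42)
n = 200
X = pd.DataFrame({
    "age": rng.normal(40, 10, n),
    "income": rng.normal(50, 15, n),
    "city": rng.choice(["A", "B", "C"], n),
})
# The label depends on age plus noise, so the model has signal to find.
y = (X["age"] + rng.normal(0, 5, n) > 40).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

num_cols, cat_cols = ["age", "income"], ["city"]
ct = ColumnTransformer([
    ("num", StandardScaler(), num_cols),
    # handle_unknown="ignore" guards against categories unseen during training.
    ("cat", OneHotEncoder(handle_unknown="ignore"), cat_cols),
])
pipe = Pipeline([("prep", ct), ("clf", LogisticRegression(max_iter=1000))])

# The search refits the whole pipeline on every fold, so scaling and encoding
# never see the validation part of a fold.
search = GridSearchCV(pipe, {"clf__C": [0.1, 1, 10]}, cv=5)
search.fit(X_tr, y_tr)

y_pred = search.predict(X_te)
print(classification_report(y_te, y_pred))
accuracy = search.score(X_te, y_te)  # default score of the best refit pipeline

# RMSE on a tiny made-up regression example, computed in a way that does not
# depend on the scikit-learn version.
y_true_reg = np.array([3.0, 5.0, 7.0])
y_hat = np.array([2.0, 5.0, 9.0])
rmse = np.sqrt(mean_squared_error(y_true_reg, y_hat))
```

Passing the whole pipeline to GridSearchCV, rather than pre-scaled arrays, is the design choice that keeps test-fold statistics out of the preprocessing step.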