Frequently used functions in NumPy, pandas, Matplotlib, Seaborn, and scikit-learn, each with a concise example and a note on what that example does. The tables below follow that order.

| Function | Example | What this example does |
|---|---|---|
| `np.array` | `np.array([1, 2, 3])` | Constructs a 1-D NumPy array from a Python list so you can run fast, vectorised numeric operations on the values. |
| `np.arange` | `np.arange(0, 10, 2)` | Creates the integers 0, 2, 4, 6, 8 (the stop value 10 is excluded). Useful for index ranges, ticks, or simple synthetic data. |
| `np.linspace` | `np.linspace(0, 1, 5)` | Generates five evenly spaced points from 0 to 1 inclusive (0, 0.25, 0.5, 0.75, 1), for sampling an interval uniformly. |
| `np.zeros` / `np.ones` | `np.zeros((2, 3))`; `np.ones(4)` | Initialises arrays with 0s (placeholders, masks) or 1s (initial weights or biases) without manual loops. |
| `np.random.rand` / `randn` | `np.random.rand(2, 3)`; `np.random.randn(3)` | Draws uniform [0, 1) or standard normal samples for simulations, bootstraps, and quick model experiments. |
| `np.reshape` | `np.reshape(a, (3, 2))` | Returns the same six elements arranged as 3×2. The result is a view when the memory layout allows it and a copy otherwise. |
| `a.T` / `np.transpose` | `a.T` | Swaps rows and columns (matrix transpose), commonly needed before dot products or plotting. |
| `np.dot` | `np.dot(a, b)` | Computes the dot product of 1-D arrays or the matrix product of 2-D arrays (the same as `a @ b` in the 2-D case); the core operation of linear algebra and ML internals. |
| `np.mean` / `median` / `std` | `np.mean(a, axis=0)` | Aggregates along the chosen axis (here the mean of each column); handy for quick sanity checks or as an input to feature scaling. |
| `np.sum` / `min` / `max` | `np.sum(a, axis=1)` | Sums each row by collapsing the column axis; `np.min` and `np.max` take the same `axis` argument and give bounds for validation. |
| `np.argmax` / `argmin` | `np.argmax(a)` | Returns the index of the maximum value in the flattened array; pass `axis=1` to get one index per row (e.g., the predicted class for each row of probabilities). |
| `np.concatenate` | `np.concatenate([a, b], axis=0)` | Joins arrays along an existing axis (here the rows of `b` are appended below `a`); handy when batching data. |
| `np.stack` | `np.stack([a, b], axis=0)` | Joins arrays along a new axis (here the shape becomes `(2, ...)`); useful for building mini-batches. |
| `np.where` | `np.where(a > 0, 1, 0)` | Vectorised conditional: maps positive values to 1 and everything else to 0, without Python loops. |
| `np.unique` | `np.unique(a, return_counts=True)` | Returns the sorted unique values and how often each appears; useful for class-balance checks. |
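
Several NumPy rows above depend on which axis is involved or on whether data is copied, which is easier to see on a concrete array. The following is a minimal sketch on a made-up 2×3 array; the variable names are illustrative and not part of NumPy.

```python
import numpy as np

# Hypothetical 2x3 array used throughout this sketch.
a = np.arange(6).reshape(2, 3)            # [[0, 1, 2], [3, 4, 5]]

col_means = a.mean(axis=0)                # one mean per column
row_sums = a.sum(axis=1)                  # one sum per row

# reshape returns a view here because `a` is contiguous in memory,
# so writing through `reshaped` would also change `a`.
reshaped = a.reshape(3, 2)
shares_memory = np.shares_memory(a, reshaped)

# concatenate extends an existing axis; stack creates a new one.
b = np.ones((2, 3), dtype=a.dtype)
joined = np.concatenate([a, b], axis=0)   # shape (4, 3)
stacked = np.stack([a, b], axis=0)        # shape (2, 2, 3)

# argmax without `axis` indexes the flattened array;
# with axis=1 it returns one index per row.
flat_idx = np.argmax(a)
per_row_idx = np.argmax(a, axis=1)

# Vectorised conditional followed by a value count.
flags = np.where(a > 2, 1, 0)
values, counts = np.unique(flags, return_counts=True)

print(col_means, row_sums, joined.shape, stacked.shape)
```

If a non-contiguous array (for example `a.T`) is reshaped instead, NumPy may have to copy, which is why the view behaviour should not be relied on without checking.
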

| Function | Example | What this example does |
|---|---|---|
| `pd.read_csv` | `pd.read_csv("data.csv")` | Loads a CSV into a DataFrame so you can filter, join, aggregate, and visualise tabular data. |
| `pd.DataFrame` | `pd.DataFrame({"a":[1,2], "b":[3,4]})` | Constructs a DataFrame from a dict of columns; convenient for small examples or unit tests. |
| `df.head` | `df.head(5)` | Previews the first five rows to verify columns, parsing, and obvious issues after loading. |
| `df.info` | `df.info()` | Prints the schema (columns, dtypes, non-null counts); a good first stop for data-quality checks. |
| `df.describe` | `df.describe()` | Generates summary statistics (count, mean, std, min, percentiles, max) for the numeric columns to profile distributions. |
| `df.loc` / `df.iloc` | `df.loc[df["cat"]=="A", ["x","y"]]` | Label-based selection: keeps the rows where `cat` equals "A" and only the columns `x` and `y`. `df.iloc` selects by integer position instead. |
| `df.assign` | `df.assign(ratio = df["a"]/df["b"])` | Returns a new DataFrame with a computed column (here a/b) and leaves `df` unchanged, which keeps chained pipelines tidy. |
| `df.groupby` | `df.groupby("cat").agg({"x":"mean", "y":"sum"})` | Aggregates by category (mean of x, sum of y); the classic split-apply-combine pattern. |
| `df.merge` | `df.merge(dim, on="id", how="left")` | SQL-style join that attaches lookup/dimension data; `how` controls which rows are kept (here every row of `df`). |
| `df.pivot_table` | `df.pivot_table(values="sales", index="month", columns="region", aggfunc="sum")` | Reshapes long data into a month × region matrix of summed sales, a common input for reports and dashboards. |
| `df.dropna` / `df.fillna` | `df.dropna(subset=["age"])`; `df.fillna({"age":0})` | Removes rows with a missing age, or fills missing ages with a specified value; either choice can bias results, so apply it deliberately. |
| `df.astype` | `df.astype({"age":"int32"})` | Converts column dtypes for memory/performance or to satisfy model/visualisation requirements. An integer cast raises an error if the column still contains missing values. |
| `df.sort_values` | `df.sort_values(["date","sales"], ascending=[True, False])` | Orders rows by multiple keys (date ascending, then sales descending within each date) for ranked outputs. |
| `pd.to_datetime` | `pd.to_datetime(df["timestamp"], utc=True)` | Parses strings to timezone-aware (UTC) datetimes so you can resample, window, and plot time series correctly. |
| `df.set_index` / `resample` | `df.set_index("date").resample("W").sum()` | Makes the date column the index (it must already be datetime-typed) and sums the values in each weekly bin; useful for trends. |
| `df.value_counts` | `df["cat"].value_counts(normalize=True)` | Computes the class distribution of a column (`normalize=True` returns proportions instead of counts) to check balance or drift. |
| `df.to_csv` | `df.to_csv("out.csv", index=False)` | Writes a CSV without the index column for sharing, downstream tools, or Git-tracked artefacts. |
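
The pandas rows above are usually chained rather than used one at a time. The following is a minimal sketch of such a chain on made-up sales data; the column names (`date`, `region`, `sales`, `units`, `manager`) and the `dim` lookup table are hypothetical and only serve the illustration.

```python
import pandas as pd

# Hypothetical sales data; one row has a missing `sales` value.
df = pd.DataFrame({
    "date": ["2024-01-01", "2024-01-02", "2024-01-08", "2024-01-09"],
    "region": ["north", "south", "north", "south"],
    "sales": [100.0, None, 150.0, 50.0],
    "units": [10, 4, 15, 5],
})

# Hypothetical lookup (dimension) table keyed by region.
dim = pd.DataFrame({"region": ["north", "south"], "manager": ["Ana", "Ben"]})

clean = (
    df.assign(date=pd.to_datetime(df["date"]))          # parse strings to datetimes
      .dropna(subset=["sales"])                         # drop rows with missing sales
      .assign(price=lambda d: d["sales"] / d["units"])  # computed column
      .merge(dim, on="region", how="left")              # attach the manager
)

# Split-apply-combine: total sales and mean units per region.
by_region = clean.groupby("region").agg({"sales": "sum", "units": "mean"})

# Weekly totals; resample needs a datetime index.
weekly = clean.set_index("date")[["sales"]].resample("W").sum()

print(by_region)
print(weekly)
```

Each step returns a new DataFrame, so `df` itself is left untouched and any intermediate result can be inspected by cutting the chain short.
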

| Function | Example | What this example does |
|---|---|---|
| `plt.figure` / `subplots` | `fig, ax = plt.subplots(figsize=(6,4))` | Creates a figure and a single axes object, following the object-oriented style that scales best to complex charts. |
| `plt.plot` | `ax.plot(x, y, linewidth=2)` | Draws a line series on the axes; the thicker line adds emphasis in time-series or trend views. |
| `plt.scatter` | `ax.scatter(x, y, alpha=.7)` | Plots points with partial transparency so dense regions show up darker; use for relationships and outliers. |
| `plt.bar` | `ax.bar(cats, vals)` | Creates a vertical bar chart for categorical comparisons. |
| `plt.hist` | `ax.hist(x, bins=30)` | Shows the shape of a distribution; choose enough bins to reveal modality but not so many that noise dominates. |
| Styling | `ax.set_title("Sales"); ax.set_xlabel("Month"); ax.grid(True)` | Sets the title and x-axis label and enables a grid for readability. |
| Layout & export | `fig.tight_layout(); fig.savefig("fig.png", dpi=300)` | Adjusts spacing so labels do not overlap and exports a 300 dpi image suitable for slides and reports. |
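
The Matplotlib rows above fit together into one short object-oriented workflow: create the axes, draw, style, then export. The following is a minimal sketch with synthetic data. The `Agg` backend and the in-memory buffer are assumptions made so the snippet runs without a display and without writing a file; in a script you would typically pass a filename to `savefig`.

```python
import io

import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend for headless runs
import matplotlib.pyplot as plt
import numpy as np

# Synthetic data for illustration only.
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = np.sin(x)
noise = rng.normal(size=200)

# One figure, two axes side by side.
fig, (ax_line, ax_hist) = plt.subplots(1, 2, figsize=(8, 3))

ax_line.plot(x, y, linewidth=2)
ax_line.set_title("Signal")
ax_line.set_xlabel("x")
ax_line.set_ylabel("sin(x)")
ax_line.grid(True)

ax_hist.hist(noise, bins=30)
ax_hist.set_title("Noise distribution")

# Adjust spacing, then export as PNG into an in-memory buffer.
fig.tight_layout()
buffer = io.BytesIO()
fig.savefig(buffer, format="png", dpi=150)
png_bytes = buffer.getvalue()
plt.close(fig)  # release the figure's memory
```
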

| Function | Example | What this example does |
|---|---|---|
| `sns.scatterplot` | `sns.scatterplot(data=df, x="x", y="y", hue="cat")` | Encodes category by colour, making clusters or class separation visible at a glance. |
| `sns.lineplot` | `sns.lineplot(data=df, x="date", y="value")` | Plots a line over time. When several observations share an x value, it draws their mean with a confidence band (95% by default), which suits trends with uncertainty. |
| `sns.barplot` / `countplot` | `sns.barplot(data=df, x="cat", y="val")`; `sns.countplot(x="cat", data=df)` | Shows category means with error bars (`barplot`) or the number of rows per category (`countplot`). |
| `sns.boxplot` / `violinplot` | `sns.boxplot(data=df, x="cat", y="val")` | Compares distributions across categories; `violinplot` additionally shows the estimated density shape. |
| `sns.heatmap` | `sns.heatmap(df.corr(numeric_only=True), annot=True, cmap="Blues")` | Visualises the correlation matrix of the numeric columns; `annot=True` prints each coefficient in its cell. In pandas 2.0 and later, `df.corr()` without `numeric_only=True` raises an error when non-numeric columns are present. |
| `sns.pairplot` | `sns.pairplot(df, hue="species")` | Creates a scatter-plot matrix to explore pairwise relationships and class separability across many variables. |
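
Seaborn's axes-level functions accept an `ax` argument, so they can be combined with the Matplotlib object-oriented style. The following is a minimal sketch on random data; the DataFrame and its column names are made up for illustration, and the `Agg` backend is an assumption so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend for headless runs
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical data: two numeric columns and one categorical column.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "x": rng.normal(size=60),
    "y": rng.normal(size=60),
    "cat": np.repeat(["A", "B", "C"], 20),
})

fig, (ax_scatter, ax_heat) = plt.subplots(1, 2, figsize=(9, 4))

# Colour the points by category.
sns.scatterplot(data=df, x="x", y="y", hue="cat", ax=ax_scatter)

# Correlate the numeric columns only; "cat" is skipped.
corr = df.corr(numeric_only=True)
sns.heatmap(corr, annot=True, cmap="Blues", ax=ax_heat)

fig.tight_layout()
plt.close(fig)
```

Figure-level functions such as `sns.pairplot` create their own figure and do not take an `ax` argument, so they are not shown in this layout.
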

| Function | Example | What this example does |
|---|---|---|
| `train_test_split` | `X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)` | Splits features and labels into a training set and a 20% hold-out set so generalisation can be evaluated fairly. |
| `StandardScaler` | `scaler = StandardScaler().fit(X_tr); Xs = scaler.transform(X_tr)` | Centres and scales features to mean 0 and variance 1 using statistics learned from the training set only; reuse the same fitted scaler to transform the test set. Many models converge faster and perform better on scaled inputs. |
| `ColumnTransformer` | `ct = ColumnTransformer([("num", StandardScaler(), num_cols), ("cat", OneHotEncoder(), cat_cols)])` | Applies the appropriate preprocessing to each column subset within one reusable object. |
| `Pipeline` | `pipe = Pipeline([("prep", ct), ("clf", LogisticRegression(max_iter=1000))])` | Chains preprocessing and model so that cross-validation and grid search refit everything on each training fold, which prevents leakage. |
| `RandomForestClassifier` | `rf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_tr, y_tr)` | Fits an ensemble of decision trees whose combined predictions reduce variance; a strong baseline for tabular data. |
| `GridSearchCV` | `GridSearchCV(pipe, {"clf__C":[0.1,1,10]}, cv=5, n_jobs=-1)` | Defines a 5-fold cross-validated search over `C` for the `clf` step of the pipeline; call `.fit(X_tr, y_tr)` to run it. |
| `classification_report` | `print(classification_report(y_te, y_pred))` | Prints precision, recall, and F1 per class plus macro and weighted averages to show model trade-offs. |
| `confusion_matrix` | `ConfusionMatrixDisplay.from_predictions(y_te, y_pred)` | Plots the confusion matrix (counts per true/predicted class pair) so you can spot where errors concentrate. |
| `mean_squared_error` | `rmse = np.sqrt(mean_squared_error(y_te, y_hat))` | Computes RMSE for regression, which is in the same units as the target and easier to interpret than raw MSE. The `squared=False` argument is deprecated since scikit-learn 1.4, which adds `root_mean_squared_error` as a direct alternative. |
| `model.score` | `pipe.score(X_te, y_te)` | Returns the estimator's default score (accuracy for classifiers, R² for regressors) as a quick check; still validate with the richer metrics above. |
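
The scikit-learn rows above are designed to compose: the column transformer goes inside the pipeline, and the pipeline goes inside the grid search. The following is a minimal end-to-end sketch on synthetic data. The feature names (`age`, `income`, `city`) and the rule that generates the labels are invented for illustration, and `handle_unknown="ignore"` and `stratify=y` are added here as defensive choices, not something the table prescribes.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic data: two numeric features, one categorical feature.
rng = np.random.default_rng(0)
n = 200
X = pd.DataFrame({
    "age": rng.normal(40, 10, n),
    "income": rng.normal(50, 15, n),
    "city": rng.choice(["A", "B", "C"], n),
})
# Hypothetical label: mostly determined by age, plus noise.
y = (X["age"] + rng.normal(0, 5, n) > 40).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Scale numeric columns, one-hot encode the categorical column.
ct = ColumnTransformer([
    ("num", StandardScaler(), ["age", "income"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["city"]),
])
pipe = Pipeline([("prep", ct), ("clf", LogisticRegression(max_iter=1000))])

# Preprocessing is refit inside every CV fold, so no test-fold
# statistics leak into training.
search = GridSearchCV(pipe, {"clf__C": [0.1, 1, 10]}, cv=5)
search.fit(X_tr, y_tr)

y_pred = search.predict(X_te)
report = classification_report(y_te, y_pred)
test_accuracy = search.score(X_te, y_te)

print(search.best_params_)
print(report)
```

After fitting, `search` behaves like the best pipeline refit on the full training set, so `predict` and `score` apply the same preprocessing to the hold-out data.
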