Python Notes: 100 Commands, Tips & Tricks for Data Analysis

Python is a versatile language with a mature ecosystem of data analysis libraries. Here are 100 commands, tips, and tricks to help you master data analysis with Python.

1. Basic Python Commands

print(): Output data to the console.
type(): Check the data type of a variable.
len(): Get the number of items in a string, list, dictionary, or other container.
input(): Get user input.
int(), float(), str(): Convert between data types.
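
As a minimal sketch of how these built-ins fit together (the sample values are made up for illustration, and a literal string stands in for what input() would return):

```python
# Hypothetical sample value; in an interactive script it could come from input().
raw_age = "42"            # input() always returns a str

age = int(raw_age)        # str -> int
height = float("1.75")    # str -> float
label = str(age)          # int -> str

print(type(raw_age), type(age), type(height))
print(len(label))         # number of characters in the string "42"
```

Note that int() raises a ValueError for text that is not a valid integer, so real user input usually needs a try/except around the conversion.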

2. Lists and Dictionaries

list.append(): Add an item to the end of a list.
list.extend(): Add every item from another iterable to the end of a list.
list.pop(): Remove and return the last item (or the item at a given index).
dict.keys(): Get all keys in a dictionary.
dict.values(): Get all values in a dictionary.
dict.items(): Get all key-value pairs.
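
The list and dictionary methods above can be sketched as follows. The names scores and ages are illustrative, not part of any library:

```python
scores = [88, 92]
scores.append(75)            # one item added to the end
scores.extend([60, 99])      # every item of the iterable added to the end
last = scores.pop()          # removes and returns the last item

ages = {"Ana": 31, "Ben": 27}
names = list(ages.keys())    # keys() returns a view; list() makes a copy
values = list(ages.values())
pairs = list(ages.items())   # list of (key, value) tuples

for name, age in ages.items():
    print(name, age)
```

append() adds its argument as a single element, so scores.append([60, 99]) would nest a list inside the list, while extend() unpacks it.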

3. NumPy Essentials

import numpy as np: Import the NumPy library.
np.array(): Create a NumPy array.
np.zeros(): Create an array of zeros.
np.ones(): Create an array of ones.
np.arange(): Create an array with a range of values.
np.linspace(): Create evenly spaced values over a range.
np.reshape(): Change the shape of an array.
np.mean(): Calculate the mean of an array.
np.median(): Calculate the median.
np.std(): Calculate the standard deviation (population by default; pass ddof=1 for the sample version).
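
A short sketch of the NumPy calls above, using small made-up arrays so the results are easy to check by hand:

```python
import numpy as np

a = np.arange(6)                    # integers 0 through 5
grid = a.reshape(2, 3)              # same data viewed as 2 rows, 3 columns
points = np.linspace(0.0, 1.0, 5)   # 5 evenly spaced values, endpoints included
zeros = np.zeros((2, 2))            # the shape is passed as a tuple

mean = np.mean(a)
median = np.median(a)
std_population = np.std(a)          # ddof=0 by default
std_sample = np.std(a, ddof=1)      # divides by n - 1 instead of n

print(grid)
print(points)
print(mean, median, std_population, std_sample)
```

The ddof distinction matters when comparing against pandas, whose Series.std() uses ddof=1 by default.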

4. Pandas for Data Manipulation

import pandas as pd: Import the Pandas library.
pd.read_csv(): Load a CSV file into a DataFrame.
df.head(): View the first 5 rows of a DataFrame (pass n for a different count).
df.tail(): View the last 5 rows.
df.info(): Summary of the DataFrame.
df.describe(): Statistical summary of numerical columns.
df.shape: Get the number of rows and columns.
df.columns: Get the column names.
df.dtypes: Get the data types of columns.
df['column_name']: Select a single column.
df[['col1', 'col2']]: Select multiple columns.
df.iloc[]: Select rows and columns by integer position.
df.loc[]: Select rows and columns by label.
df.drop(): Remove rows or columns.
df.rename(): Rename columns or index labels.
df.sort_values(): Sort the DataFrame by a column.
df.groupby(): Group data by a column.
df.merge(): Combine two DataFrames (SQL-like join).
pd.concat(): Concatenate a list of DataFrames (a top-level function, not a DataFrame method).
df.isnull(): Check for missing values.
df.fillna(): Fill missing values.
df.dropna(): Remove rows (or columns, with axis=1) that contain missing values.
df.duplicated(): Check for duplicate rows.
df.drop_duplicates(): Remove duplicate rows.
df.apply(): Apply a function to rows or columns.
df['col'].value_counts(): Count how often each unique value occurs in a column.
df['col'].unique(): Get the unique values in a column (a Series method).
df.pivot_table(): Create a pivot table.
df.sample(): Get a random sample of rows.
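
To show how several of these calls combine in a typical cleaning pass, here is a hedged sketch on a small in-memory DataFrame. The column names city and sales and all of the values are made up for illustration:

```python
import pandas as pd

# Hypothetical data with one missing value and one duplicated row.
df = pd.DataFrame({
    "city": ["Oslo", "Oslo", "Rome", "Rome", "Rome"],
    "sales": [10.0, None, 7.0, 7.0, 3.0],
})

print(df.shape)                                   # (rows, columns)
missing = df["sales"].isnull().sum()              # number of missing sales values

filled = df.fillna({"sales": 0.0})                # replace missing sales with 0
deduped = filled.drop_duplicates()                # drop the repeated Rome row
totals = deduped.groupby("city")["sales"].sum()   # one total per city

counts = df["city"].value_counts()                # Series method
cities = df["city"].unique()                      # Series method
combined = pd.concat([df, df], ignore_index=True) # top-level function

print(totals)
```

Each step returns a new object rather than modifying df in place, which is why the intermediate results are bound to new names.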

5. Data Visualization (Matplotlib & Seaborn)

import matplotlib.pyplot as plt: Import Matplotlib.
plt.plot(): Create a line plot.
plt.scatter(): Create a scatter plot.
plt.bar(): Create a bar chart.
plt.hist(): Create a histogram.
plt.xlabel(), plt.ylabel(): Set axis labels.
plt.title(): Set the plot title.
plt.show(): Display the plot.
import seaborn as sns: Import Seaborn.
sns.boxplot(): Create a box plot.
sns.heatmap(): Create a heatmap.
sns.pairplot(): Plot pairwise relationships.
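
A minimal Matplotlib sketch of the plotting calls above, on made-up data. The Agg backend line is an assumption added so the example runs without a display; in a notebook or desktop session you would normally drop it and call plt.show() at the end:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window is opened
import matplotlib.pyplot as plt

# Hypothetical data: the first four squares.
x = [1, 2, 3, 4]
y = [1, 4, 9, 16]

plt.plot(x, y, marker="o")   # line plot with a marker at each point
plt.xlabel("x")
plt.ylabel("x squared")
plt.title("Squares")
# plt.show() would display the figure in an interactive session.
```

Seaborn functions such as sns.boxplot() draw onto the same Matplotlib figure, so the labelling calls shown here work after them as well.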

6. Advanced Tips & Tricks

List Comprehension: [x**2 for x in range(10)].
Lambda Functions: add = lambda x, y: x + y (avoid the name sum, which would shadow the built-in).
Map Function: list(map(str.upper, ['a', 'b'])).
Filter Function: list(filter(lambda x: x > 5, [1, 6, 2, 8])).
Zip Function: Combine two lists: list(zip(names, scores)).
Enumerate: Get index and value: for i, v in enumerate(items): (avoid naming a variable list, which would shadow the built-in).
F-strings: f"Value: {val}" for easy formatting.
Handling Large Files: Use chunksize in pd.read_csv().
SettingWithCopyWarning: Call .copy() on a subset you plan to modify, and assign through .loc rather than chained indexing.
Optimize Memory: Convert columns to category dtype.
Query Method: df.query('age > 25').
String Methods: df['name'].str.lower().
Datetime Conversion: pd.to_datetime(df['date']).
Extract Date Parts: df['date'].dt.year.
Rolling Windows: df['val'].rolling(window=7).mean().
Shift Function: df['val'].shift(1) for lag analysis.
Correlation Matrix: df.corr(numeric_only=True) (in pandas 2.0 and later, non-numeric columns otherwise raise an error).
Save to CSV: df.to_csv('output.csv', index=False).
Save to Excel: df.to_excel('output.xlsx') (requires an Excel engine such as openpyxl).
Using .at and .iat: Faster than .loc and .iloc for reading or writing a single value.
Explode: Turn list-like columns into rows: df.explode('col').
Crosstab: pd.crosstab(df['A'], df['B']).
Pipe Method: Chain operations: df.pipe(func1).pipe(func2).
Isin: Filter by multiple values: df[df['A'].isin([1, 2, 3])].
Where Method: df['A'].where(df['A'] > 0, 0).
Cut and Qcut: Bin data into equal-width bins with pd.cut(df['age'], bins=3), or into equal-sized quantile bins with pd.qcut(df['age'], q=4).
Get Dummies: One-hot encoding: pd.get_dummies(df['col']).
Melt: Unpivot a DataFrame: pd.melt(df).
Stack and Unstack: Reshape by index levels: df.stack(), df.unstack().
Lookup with Map: df['cat'] = df['id'].map(mapping_dict).
Chain Methods: df.dropna().groupby('A').mean(numeric_only=True).
Progress Bar: Use tqdm for long loops.
Profile Your Data: Use ydata-profiling (formerly pandas_profiling).
Interactive Plots: Use plotly for interactive charts.
Style DataFrames: df.style.background_gradient().
Read from SQL: pd.read_sql(query, engine).
Read from Clipboard: pd.read_clipboard().
Help Command: help(pd.DataFrame) to see documentation.
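
Several of the tricks above can be tried together on toy data. The sketch below first exercises the pure-Python helpers and then a few of the pandas one-liners. The names (names, scores, df) and all values are hypothetical, chosen only so the results are easy to verify:

```python
import pandas as pd

# Pure-Python helpers on made-up lists.
names = ["ana", "ben"]
scores = [91, 78]
squares = [x**2 for x in range(5)]                        # list comprehension
upper = list(map(str.upper, names))                       # map
passed = list(filter(lambda s: s > 80, scores))           # filter
paired = list(zip(names, scores))                         # zip
numbered = [f"{i}: {n}" for i, n in enumerate(names)]     # enumerate + f-string

# pandas one-liners on a made-up daily series.
df = pd.DataFrame({
    "date": ["2024-01-01", "2024-01-02", "2024-01-03", "2024-01-04"],
    "val": [10, 20, 30, 40],
})
df["date"] = pd.to_datetime(df["date"])                   # str -> datetime64
df["year"] = df["date"].dt.year                           # extract a date part
df["lag1"] = df["val"].shift(1)                           # previous row's value
df["roll2"] = df["val"].rolling(window=2).mean()          # 2-row moving average
df["floored"] = df["val"].where(df["val"] > 15, 0)        # keep if > 15, else 0
big = df.query("val > 25")                                # rows with val above 25

print(df)
```

shift() and rolling() both leave NaN in the first rows, where there is not yet enough history, so downstream code usually needs a dropna() or fillna() step.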

Conclusion

These 100 commands are just the tip of the iceberg. Python’s ecosystem is vast, and continuous practice is key to becoming a pro at data analysis. Happy coding!