Python Notes: 100 Commands, Tips & Tricks for Data Analysis
Gladson
#python
#data-analysis
#pandas
#numpy
#commands
#tips-and-tricks
Python is a versatile language with a mature ecosystem of libraries for data analysis. Here are 100 commands, tips, and tricks to help you work with data in Python.
1. Basic Python Commands
- `print()`: Output data to the console.
- `type()`: Check the data type of a variable.
- `len()`: Get the length of a string, list, or dictionary.
- `input()`: Get user input.
- `int()`, `float()`, `str()`: Convert between data types.
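As a minimal sketch of how these built-ins fit together, the snippet below converts a piece of text into numbers and back. The variable names and the value `"42"` are made up for illustration; the string stands in for what `input()` would return, since `input()` always gives back text.

```python
# Hypothetical value standing in for text typed by a user via input().
raw = "42"

print(type(raw))           # <class 'str'>

count = int(raw)           # text -> integer
ratio = float(raw) / 8     # text -> float, then ordinary arithmetic
label = str(count) + " rows"  # integer -> text so it can be concatenated

print(count, ratio, label)
print(len(label))          # number of characters in the label
```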
2. Lists and Dictionaries
- `list.append()`: Add an item to the end of a list.
- `list.extend()`: Add multiple items to a list.
- `list.pop()`: Remove and return the last item.
- `dict.keys()`: Get all keys in a dictionary.
- `dict.values()`: Get all values in a dictionary.
- `dict.items()`: Get all key-value pairs.
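The difference between `append()` and `extend()` is easiest to see side by side. Here is a short sketch using made-up data (the `scores` and `ages` names are only for illustration):

```python
scores = [88, 92]
scores.append(75)          # adds one item: [88, 92, 75]
scores.extend([60, 99])    # adds each item: [88, 92, 75, 60, 99]
last = scores.pop()        # removes and returns the last item

# append() with a list adds the list itself as a single nested element.
nested = [1]
nested.append([2, 3])      # [1, [2, 3]]

ages = {"ana": 31, "ben": 27}
names = list(ages.keys())      # keys in insertion order
oldest = max(ages.values())    # largest value
for name, age in ages.items(): # iterate over key-value pairs
    print(f"{name} is {age}")
```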
3. NumPy Essentials
- `import numpy as np`: Import the NumPy library.
- `np.array()`: Create a NumPy array.
- `np.zeros()`: Create an array of zeros.
- `np.ones()`: Create an array of ones.
- `np.arange()`: Create an array with a range of values.
- `np.linspace()`: Create evenly spaced values over a range.
- `np.reshape()`: Change the shape of an array.
- `np.mean()`: Calculate the mean of an array.
- `np.median()`: Calculate the median.
- `np.std()`: Calculate the standard deviation.
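A small sketch of these functions working together on a toy array (the values are arbitrary). One detail worth knowing: `np.std()` computes the population standard deviation by default (`ddof=0`), so pass `ddof=1` if you want the sample version.

```python
import numpy as np

a = np.arange(6)                 # integers 0 through 5
grid = a.reshape(2, 3)           # same data viewed as 2 rows x 3 columns
pts = np.linspace(0.0, 1.0, 5)   # 5 evenly spaced values, endpoints included
blank = np.zeros((2, 3))         # shape is passed as a tuple

mean = np.mean(a)
median = np.median(a)
pop_std = np.std(a)              # population std (ddof=0, the default)
sample_std = np.std(a, ddof=1)   # sample std

col_means = grid.mean(axis=0)    # one mean per column

print(grid)
print(pts)
print(mean, median, pop_std, sample_std)
```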
4. Pandas for Data Manipulation
- `import pandas as pd`: Import the Pandas library.
- `pd.read_csv()`: Load a CSV file into a DataFrame.
- `df.head()`: View the first 5 rows of a DataFrame (the default).
- `df.tail()`: View the last 5 rows.
- `df.info()`: Summary of the DataFrame (column types, non-null counts, memory usage).
- `df.describe()`: Statistical summary of numerical columns.
- `df.shape`: Get the number of rows and columns.
- `df.columns`: Get the column names.
- `df.dtypes`: Get the data types of columns.
- `df['column_name']`: Select a single column.
- `df[['col1', 'col2']]`: Select multiple columns.
- `df.iloc[]`: Select rows and columns by integer position.
- `df.loc[]`: Select rows and columns by label.
- `df.drop()`: Remove rows or columns.
- `df.rename()`: Rename columns.
- `df.sort_values()`: Sort the DataFrame by one or more columns.
- `df.groupby()`: Group data by a column.
- `df.merge()`: Combine two DataFrames (SQL-like join).
- `pd.concat()`: Concatenate DataFrames (a top-level function; there is no `df.concat()`).
- `df.isnull()`: Check for missing values.
- `df.fillna()`: Fill missing values.
- `df.dropna()`: Remove rows with missing values.
- `df.duplicated()`: Check for duplicate rows.
- `df.drop_duplicates()`: Remove duplicate rows.
- `df.apply()`: Apply a function to rows or columns.
- `df['col'].value_counts()`: Count occurrences of each unique value in a column.
- `df['col'].unique()`: Get unique values in a column (a Series method, not a DataFrame one).
- `df.pivot_table()`: Create a pivot table.
- `df.sample()`: Get a random sample of rows.
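To see several of these in one place, here is a rough sketch of a small cleaning workflow. The `city`, `sales`, and `region` columns are invented for illustration, and the DataFrame is built in memory rather than loaded with `pd.read_csv()` so the snippet runs on its own.

```python
import pandas as pd

# Hypothetical data: one duplicate row and one missing value.
df = pd.DataFrame({
    "city": ["Oslo", "Oslo", "Lima", "Lima", "Lima"],
    "sales": [12.0, 12.0, 7.0, None, 3.0],
})

print(df.shape)                          # rows, columns
missing = int(df["sales"].isnull().sum())  # count of missing sales values
counts = df["city"].value_counts()       # rows per city (a Series method)

# Drop the repeated row, then fill the remaining gap with 0.
clean = df.drop_duplicates().fillna({"sales": 0.0})

# iloc is positional, loc uses index labels; after dropping a row they differ.
by_position = clean.iloc[1]["sales"]     # second remaining row
by_label = clean.loc[2, "sales"]         # row whose index label is 2

totals = clean.groupby("city")["sales"].sum()

# SQL-like left join against a hypothetical lookup table.
regions = pd.DataFrame({
    "city": ["Oslo", "Lima"],
    "region": ["Europe", "South America"],
})
merged = clean.merge(regions, on="city", how="left")

# concat is a top-level pandas function, not a DataFrame method.
stacked = pd.concat([clean, clean], ignore_index=True)

print(merged.sort_values("sales", ascending=False))
```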
5. Data Visualization (Matplotlib & Seaborn)
- `import matplotlib.pyplot as plt`: Import Matplotlib.
- `plt.plot()`: Create a line plot.
- `plt.scatter()`: Create a scatter plot.
- `plt.bar()`: Create a bar chart.
- `plt.hist()`: Create a histogram.
- `plt.xlabel()`, `plt.ylabel()`: Set axis labels.
- `plt.title()`: Set the plot title.
- `plt.show()`: Display the plot.
- `import seaborn as sns`: Import Seaborn.
- `sns.boxplot()`: Create a box plot.
- `sns.heatmap()`: Create a heatmap.
- `sns.pairplot()`: Plot pairwise relationships.
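A minimal Matplotlib sketch combining a line plot, a scatter plot, labels, and a title. It covers only the Matplotlib half of the list, and the sine data is made up. Two assumptions are baked in so it runs without a display: the `Agg` backend is selected, and the figure is written to an in-memory buffer with `plt.savefig()` instead of calling `plt.show()`. In an interactive session you would likely drop both and call `plt.show()`.

```python
import io

import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, no window is opened
import matplotlib.pyplot as plt
import numpy as np

# Made-up data for illustration.
x = np.linspace(0, 10, 50)
y = np.sin(x)

plt.plot(x, y, label="sin(x)")                      # line plot
plt.scatter(x[::5], y[::5], color="red",
            label="every 5th point")                # scatter on the same axes
plt.xlabel("x")
plt.ylabel("sin(x)")
plt.title("Line and scatter plot")
plt.legend()

# Render to an in-memory PNG; swap for plt.show() when working interactively.
buf = io.BytesIO()
plt.savefig(buf, format="png")
```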
6. Advanced Tips & Tricks
- List Comprehension: `[x**2 for x in range(10)]`.
- Lambda Functions: `add = lambda x, y: x + y` (avoid naming it `sum`, which shadows the built-in).
- Map Function: `list(map(str.upper, ['a', 'b']))`.
- Filter Function: `list(filter(lambda x: x > 5, [1, 6, 2, 8]))`.
- Zip Function: Combine two lists: `list(zip(names, scores))`.
- Enumerate: Get index and value: `for i, v in enumerate(items):` (avoid `list` as a variable name, since it shadows the built-in type).
- F-strings: `f"Value: {val}"` for easy formatting.
- Handling Large Files: Use `chunksize` in `pd.read_csv()`.
- SettingWithCopyWarning: Call `.copy()` on a filtered subset before modifying it to avoid the warning.
- Optimize Memory: Convert repetitive string columns to `category` dtype.
- Query Method: `df.query('age > 25')`.
- String Methods: `df['name'].str.lower()`.
- Datetime Conversion: `pd.to_datetime(df['date'])`.
- Extract Date Parts: `df['date'].dt.year`.
- Rolling Windows: `df['val'].rolling(window=7).mean()`.
- Shift Function: `df['val'].shift(1)` for lag analysis.
- Correlation Matrix: `df.corr()` (in pandas 2.0 and later, pass `numeric_only=True` if the DataFrame has non-numeric columns).
- Save to CSV: `df.to_csv('output.csv', index=False)`.
- Save to Excel: `df.to_excel('output.xlsx')`.
- Using `.at` and `.iat`: Faster than `.loc` for single values.
- Explode: Turn list-like columns into rows: `df.explode('col')`.
- Crosstab: `pd.crosstab(df['A'], df['B'])`.
- Pipe Method: Chain operations: `df.pipe(func1).pipe(func2)`.
- Isin: Filter by multiple values: `df[df['A'].isin([1, 2, 3])]`.
- Where Method: `df['A'].where(df['A'] > 0, 0)` keeps values where the condition holds and replaces the rest with 0.
- Cut and Qcut: Bin data: `pd.cut(df['age'], bins=3)` for equal-width bins; `pd.qcut()` for quantile-based bins.
- Get Dummies: One-hot encoding: `pd.get_dummies(df['col'])`.
- Melt: Unpivot a DataFrame: `pd.melt(df)`.
- Stack and Unstack: Reshape by index levels.
- Lookup with Map: `df['cat'] = df['id'].map(mapping_dict)`.
- Chain Methods: `df.dropna().groupby('A').mean()`.
- Progress Bar: Use `tqdm` for long loops.
- Profile Your Data: Use `pandas_profiling` (now `ydata-profiling`).
- Interactive Plots: Use `plotly` for interactive charts.
- Style DataFrames: `df.style.background_gradient()`.
- Read from SQL: `pd.read_sql(query, engine)`.
- Read from Clipboard: `pd.read_clipboard()`.
- Help Command: `help(pd.DataFrame)` to see documentation.
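Many of the tips above are one-liners, so it can help to see a few of them run against the same data. The sketch below is only an illustration: the `date`, `val`, and `id` columns and the mapping dictionary are invented, and the rolling window is shortened to 2 so the toy data produces visible results.

```python
import pandas as pd

# --- Core-Python idioms ---
squares = [x**2 for x in range(5)]                   # list comprehension
add = lambda x, y: x + y                             # named add, not sum
upper = list(map(str.upper, ["a", "b"]))             # apply a function to each item
big = list(filter(lambda x: x > 5, [1, 6, 2, 8]))    # keep items passing a test
names, scores = ["ana", "ben"], [90, 85]
pairs = list(zip(names, scores))                     # pair items by position
lines = [f"{i}: {n}" for i, n in enumerate(names)]   # index plus value, f-string

# --- pandas tricks on a hypothetical DataFrame ---
df = pd.DataFrame({
    "date": ["2024-01-01", "2024-01-02", "2024-01-03", "2024-01-04"],
    "val": [1.0, -2.0, 3.0, 4.0],
    "id": [1, 2, 1, 3],
})
df["date"] = pd.to_datetime(df["date"])              # text -> datetime64
df["year"] = df["date"].dt.year                      # extract a date part
df["lag1"] = df["val"].shift(1)                      # previous row's value
df["roll2"] = df["val"].rolling(window=2).mean()     # 2-row moving average
df["clipped"] = df["val"].where(df["val"] > 0, 0)    # non-positive values -> 0
df["cat"] = df["id"].map({1: "a", 2: "b"})           # ids missing from the dict -> NaN

subset = df[df["id"].isin([1, 2])]                   # filter by several values
positive = df.query("val > 0")                       # filter with an expression

print(df)
```

Note that `shift()` and `rolling()` leave `NaN` in the first rows where there is not enough history, and `map()` leaves `NaN` for any key absent from the dictionary, so a `fillna()` or `dropna()` step often follows.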
Conclusion
These 100 commands are just the tip of the iceberg. Python’s ecosystem is vast, and continuous practice is key to becoming a pro at data analysis. Happy coding!