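import os
import unicodedata

import pandas as pd
from nba_api.stats.static import players
from nba_api.stats.endpoints import commonplayerinfo, playerawards, playercareerstats
from requests import ReadTimeout
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report


def strip_accents(name):
    # Some full names in the static player list carry diacritics (e.g. 'Nikola Jokić'),
    # so names are compared in an accent-free form
    return ''.join(c for c in unicodedata.normalize('NFKD', name) if not unicodedata.combining(c))


all_players = players.get_players()
player_names = ['LeBron James', 'Kevin Durant', 'Stephen Curry', 'Giannis Antetokounmpo', 'Kawhi Leonard',
                'Luka Doncic', 'James Harden', 'Anthony Davis', 'Joel Embiid', 'Nikola Jokic',
                'Damian Lillard', 'Kyrie Irving', 'Russell Westbrook', 'Chris Paul', 'Bradley Beal',
                'Donovan Mitchell', 'Zach LaVine', 'Jayson Tatum', 'Devin Booker', 'Trae Young',
                'Jimmy Butler', 'Khris Middleton', 'Paul George', 'Ben Simmons', 'Deandre Ayton']
wanted = {strip_accents(name) for name in player_names}
selected_players = [player for player in all_players if strip_accents(player['full_name']) in wanted]

player_data = []
for player in selected_players:
    player_id = player['id']
    try:
        # get_dict() returns {'headers': ..., 'data': ...}, so unpacking two of those dicts
        # together just overwrites one with the other; get_data_frame() gives real columns
        info = commonplayerinfo.CommonPlayerInfo(player_id=player_id, timeout=10).common_player_info.get_data_frame()
        # Career totals are one row per player, matching one label per player
        career = playercareerstats.PlayerCareerStats(player_id=player_id, timeout=10).career_totals_regular_season.get_data_frame()
        awards = playerawards.PlayerAwards(player_id=player_id, timeout=10).player_awards.get_data_frame()
    except ReadTimeout:
        print(f"Request timed out for {player['full_name']}")
        continue
    player_stats = {**info.iloc[0].to_dict(), **career.iloc[0].to_dict()}
    # Neither endpoint above returns an ALL_STAR column; this label is derived from the awards
    # list, assuming All-Star selections appear with DESCRIPTION == 'NBA All-Star'
    player_stats['ALL_STAR'] = int((awards['DESCRIPTION'] == 'NBA All-Star').any())
    player_data.append(player_stats)

player_df = pd.DataFrame(player_data)
os.makedirs("files", exist_ok=True)  # to_csv fails if the directory does not exist
player_df.to_csv("files/players.csv")

# Features are career regular-season totals
X = player_df[['PTS', 'REB', 'AST', 'STL', 'BLK']]
y = player_df['ALL_STAR']
# Most players in this list are All-Stars, so y is heavily imbalanced; fit() raises an error
# if the training split contains only one class, so add non-All-Stars to player_names
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
# create a logistic regression model and fit it to the training data
lr_model = LogisticRegression(max_iter=1000)
lr_model.fit(X_train, y_train)
y_pred = lr_model.predict(X_test)
print(classification_report(y_test, y_pred))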
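The live API calls can time out, and a hand-picked list of stars gives very few non-All-Star examples, so it can help to try the modeling half of the pipeline offline first. This is only an illustration: it generates synthetic per-game numbers with NumPy (not real NBA stats), labels them with a made-up rule, then runs the same `LogisticRegression` steps, adding `stratify=y` so both classes appear in the train and test splits.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# Synthetic, hypothetical per-game stats: NOT real NBA data
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "PTS": rng.normal(15, 6, n).clip(0),
    "REB": rng.normal(5, 2, n).clip(0),
    "AST": rng.normal(4, 2, n).clip(0),
    "STL": rng.normal(1, 0.4, n).clip(0),
    "BLK": rng.normal(0.6, 0.4, n).clip(0),
})
# Hypothetical labeling rule, only so there is a pattern for the model to learn
df["ALL_STAR"] = (df["PTS"] + df["REB"] + df["AST"] > 30).astype(int)

X = df[["PTS", "REB", "AST", "STL", "BLK"]]
y = df["ALL_STAR"]

# stratify=y keeps the same class ratio in the train and test splits
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

lr_model = LogisticRegression(max_iter=1000)
lr_model.fit(X_train, y_train)
y_pred = lr_model.predict(X_test)
print(classification_report(y_test, y_pred))
```

Swapping the synthetic frame for `player_df` (once it includes non-All-Stars) leaves the remaining steps unchanged.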
Hacks
Early Seed award
- Add this blog to your own blogging site.
- Have all lecture files saved to your files directory before Tech Talk starts. Have data.csv open in VS Code. Don't tell anyone. Show it to the teacher.
AP Prep
- Add this blog to your own blogging site. In the blog, add notes and observations on each code cell.
- In the blog, add College Board practice problems for 2.3.
Over the next 4 weeks, teachers want you to improve your understanding of data. Look at this blog and others on Unit 2. Your goal is to find ways to differentiate your individual College Board project.
- Create or find your own dataset. The suggestion is to use a JSON file; integrating it with your PBL project would be Fambulous.
When choosing a data set, think about the following:
- Does it have a good sample size?
- Is there bias in the data?
- Does the data set need to be cleaned?
- What is the purpose of the data set?
- ...
- Continue this blog, using Pandas to extract information from that dataset (e.g., max, min, mean, median, mode).
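As a starting point for these hacks, here is a minimal sketch of loading JSON into Pandas and pulling summary statistics. The records, column names, and values are hypothetical placeholders; for a real file you would pass a path such as `files/your_data.json` (also hypothetical) to `pd.read_json` instead of the `StringIO` wrapper.

```python
import io

import pandas as pd

# Hypothetical JSON records standing in for your own dataset
json_text = """[
  {"name": "A", "team": "X", "points": 20},
  {"name": "B", "team": "Y", "points": 12},
  {"name": "C", "team": "X", "points": 12},
  {"name": "D", "team": "Z", "points": 30},
  {"name": "E", "team": "Y", "points": 16}
]"""

# orient="records" matches a JSON array of objects; StringIO makes the string file-like
df = pd.read_json(io.StringIO(json_text), orient="records")

points = df["points"]
print("max:", points.max())
print("min:", points.min())
print("mean:", points.mean())
print("median:", points.median())
# mode() returns a Series, because a column can have several most-common values
print("mode:", points.mode().tolist())

# describe() reports count, mean, std, min, quartiles, and max in one table
print(df.describe())
```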
Hack Helpers
Here is how Mort started on this assignment, by asking ChatGPT: "Regarding Python Pandas, what are some data sets that would be good for learning Pandas?"
There are many data sets that are suitable for learning pandas, depending on your interests and the skills you want to develop. Here are some suggestions:
- Titanic Dataset: This is a classic dataset for data analysis and machine learning, and is often used as an introduction to pandas. The dataset contains information about passengers on the Titanic, including their demographics, ticket class, and survival status.
- Iris Dataset: This is another classic dataset that is often used in machine learning and data analysis courses. The dataset contains information about iris flowers, including their sepal length, sepal width, petal length, and petal width.
- NBA Player Stats: This dataset contains information about NBA players, including their stats for various games, their positions, and their salaries. It is a great dataset for learning how to clean and manipulate data using pandas.
- Housing Prices Dataset: This dataset contains information about housing prices in a particular city or region, including variables such as the number of bedrooms, square footage, and price. It is a great dataset for learning how to manipulate and analyze numerical data using pandas.
- COVID-19 Dataset: This dataset contains information about the COVID-19 pandemic, including the number of cases, deaths, and recoveries in various countries and regions. It is a great dataset for learning how to manipulate and analyze time-series data using pandas.
- World Bank Dataset: This dataset contains information about economic indicators from various countries, such as GDP, population, and poverty rates. It is a great dataset for learning how to manipulate and analyze large datasets using pandas.
These are just a few examples of the many datasets that are suitable for learning pandas. The best dataset for you will depend on your interests and the specific skills you want to develop.
Follow-up question: "Where can I find the Titanic data set?"
The Titanic dataset is a popular dataset for data analysis and machine learning, and it can be found on various websites and data repositories. Here are a few places where you can find the Titanic dataset:
- Kaggle: The Titanic dataset is available on Kaggle, which is a popular platform for data scientists and machine learning enthusiasts. You can download the dataset from the Kaggle website after creating an account.
- UCI Machine Learning Repository: The Titanic dataset is also available on the UCI Machine Learning Repository, which is a collection of datasets that are commonly used for machine learning research. You can download the dataset from the UCI Machine Learning Repository website.
- Seaborn library: If you have the Seaborn library installed in your Python environment, you can load the Titanic dataset directly from the library using the following code:
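A minimal sketch of that Seaborn call, assuming seaborn is installed. `sns.load_dataset` downloads the CSV from the online seaborn-data repository and caches it locally, so the first run needs an internet connection.

```python
import seaborn as sns

# Downloads (and caches) the Titanic CSV from the seaborn-data repository
titanic_data = sns.load_dataset("titanic")

print(titanic_data.shape)
print(titanic_data.columns.tolist())
```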
Use Pandas to clean the data. Most analysis, like machine learning or even Pandas in general, works best with data in a standardized format. Preparing data this way is called 'cleaning' it; the cleaned data is what a model is then trained on.
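A minimal sketch of common cleaning steps with Pandas. The rows below are hypothetical, reusing a few column names from Seaborn's Titanic data; the steps (drop a mostly empty column, drop incomplete rows, turn text and booleans into numbers, one-hot encode a category) are typical choices, not the only way to clean this data.

```python
import pandas as pd

# A few hypothetical rows using column names from Seaborn's Titanic data
td = pd.DataFrame({
    "survived": [0, 1, 1, 0, 1],
    "pclass": [3, 1, 3, 1, 2],
    "sex": ["male", "female", "female", "male", "female"],
    "age": [22.0, 38.0, None, 54.0, 14.0],
    "fare": [7.25, 71.28, 7.92, 51.86, 30.07],
    "embarked": ["S", "C", "S", "S", None],
    "alone": [False, False, True, True, False],
    "deck": [None, "C", None, "E", None],
})

# 1. Drop a column that is mostly missing
td = td.drop(columns=["deck"])

# 2. Drop rows that still have missing values, then renumber the index
td = td.dropna().reset_index(drop=True)

# 3. Convert text and booleans to numbers
td["sex"] = td["sex"].map({"male": 1, "female": 0})
td["alone"] = td["alone"].astype(int)

# 4. One-hot encode the port of embarkation into embarked_C, embarked_S, ...
td = pd.get_dummies(td, columns=["embarked"], dtype=int)

print(td)
```

Dropping rows is the simplest way to handle missing values; filling them instead (for example with `fillna` and the column median) keeps more rows but introduces estimated values.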