Suppose an association of real estate professionals has reported home sales for 2011. The table contains the current sales by region and the inventory for existing homes.
Rows: 104
Columns: 4
Sales Price:
The price that the home sold at
Region:
Region of the United States. NE = Northeast, MW = Midwest, S = South, W = West
Home Type:
The type of home described by ownership of the property and land surrounding the property
Inventory:
The amount of money that all the items inside the house are currently valued at
2016 data for jockeys as North American Thoroughbred Racing Starters.
Rows: 1,444
Columns: 11
Jockey Name:
Name of the jockey
Starts:
Number of starts in races by the jockey
1st:
Number of times that the jockey won first place
2nd:
Number of times that the jockey won second place
3rd:
Number of times that the jockey won third place
Total $:
Total amount of money won by the jockey for the year
Per Start $:
Amount of money that the jockey makes per start
Win %:
The percentage of time that the jockey wins the race
Top 3:
The total number of times that the jockey wins first, second, or third
Top 3%:
Percentage of times that the jockey wins first, second, or third from their total number of starts
Racing Year:
Year in which these jockeys’ statistics are applicable
http://www.equibase.com/
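The rate columns in the jockey table can be recomputed from the raw counts. The following is a minimal sketch, assuming that Win % is 1st divided by Starts and that Per Start $ is Total $ divided by Starts (the descriptions above imply these formulas but do not state them); the function name and the sample numbers are hypothetical.

```python
def jockey_rates(starts, first, second, third, total_dollars):
    """Derive the per-start and percentage columns from the raw counts."""
    if starts <= 0:
        raise ValueError("starts must be positive")
    top3 = first + second + third  # Top 3: finishes in first, second, or third
    return {
        "per_start": total_dollars / starts,   # Per Start $ (assumed formula)
        "win_pct": 100 * first / starts,       # Win % (assumed formula)
        "top3": top3,
        "top3_pct": 100 * top3 / starts,       # Top 3%
    }

# Made-up jockey: 200 starts, 40 firsts, 30 seconds, 20 thirds, $1,000,000 won
stats = jockey_rates(200, 40, 30, 20, 1_000_000)
```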
2016 data for trainers as North American Thoroughbred Racing Starters.
Rows: 5,446
Columns: 11
Trainer Name:
Name of the trainer
Starts:
Number of starts in races by the trainer
Starters:
The number of times the trainer was a starter
1st:
Number of times that the trainer won first place
2nd:
Number of times that the trainer won second place
3rd:
Number of times that the trainer won third place
Total $:
Total amount of money won by the trainer for the year
Per Start $:
Amount of money that the trainer makes per start
Win %:
The percentage of time that the trainer wins the race
Top 3%:
Percentage of times that the trainer wins first, second, or third from their total number of starts
Racing Date:
Year in which these trainers’ statistics are applicable
http://www.equibase.com/
2017 data for jockeys as North American Thoroughbred Racing Starters.
Rows: 1,418
Columns: 11
Jockey Name:
Name of the jockey
Starts:
Number of starts in races by the jockey
1st:
Number of times that the jockey won first place
2nd:
Number of times that the jockey won second place
3rd:
Number of times that the jockey won third place
Total $:
Total amount of money won by the jockey for the year
Per Start $:
Amount of money that the jockey makes per start
Win %:
The percentage of time that the jockey wins the race
Top 3:
The total number of times that the jockey wins first, second, or third
Top 3%:
Percentage of times that the jockey wins first, second, or third from their total number of starts
Racing Year:
Year in which these jockeys’ statistics are applicable
http://www.equibase.com/
2017 data for trainers as North American Thoroughbred Racing Starters.
Rows: 5,266
Columns: 11
Trainer Name:
Name of the trainer
Starts:
Number of starts in races by the trainer
Starters:
The number of times the trainer was a starter
1st:
Number of times that the trainer won first place
2nd:
Number of times that the trainer won second place
3rd:
Number of times that the trainer won third place
Total $:
Total amount of money won by the trainer for the year
Per Start $:
Amount of money that the trainer makes per start
Win %:
The percentage of time that the trainer wins the race
Top 3%:
Percentage of times that the trainer wins first, second, or third from their total number of starts
Racing Date:
Year in which these trainers’ statistics are applicable
http://www.equibase.com/
Data regarding the price per share of Amazon stock from its initial public offering (IPO) in May 1997 to September 2017.
Rows: 5,121
Columns: 7
Date:
Date of observation
Open:
Price per share when the stock market opened on the specified date
High:
The maximum price per share reached on the specified date
Low:
The minimum price per share reached on the specified date
Close:
Price per share when the stock market closed on the specified date
Log Close:
The log transformed value of the close variable
Volume:
The number of shares that changed hands on the specified date
http://www.macrotrends.net/stocks/charts/AMZN/prices/amazon-inc-stock-price-history
https://opendatacommons.org/licenses/odbl/1.0/
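The Log Close column can be rebuilt from Close. The sketch below assumes the natural logarithm; the description does not state which base was used, so that choice, the function name, and the sample prices are all assumptions.

```python
import math

def log_close(close_prices):
    """Return the log-transformed closing prices (natural log assumed)."""
    return [math.log(price) for price in close_prices]

# Made-up closing prices, each ten times the previous one
closes = [1.0, 10.0, 100.0]
logged = log_close(closes)
```

On a log scale, equal percentage changes in price appear as equal differences, which is why the transformed series is often easier to read over a long period of growth.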
Information about several canned beers brewed in the U.S. and the breweries where they were brewed.
Rows: 2,397
Columns: 4
id:
The ID number of each individual beer
abv:
Alcohol by volume
ibu:
International Bitterness Units
name:
Name of the beer
style:
Style of the beer
ounces:
Volume of beer in the can, in ounces
brewery id:
The ID number of each brewery
name:
Name of the brewery where the specified beer was brewed
city:
City where the brewery is located
state:
State where the brewery is located
https://www.kaggle.com/nickhould/craft-cans
Data compiled by Dr. Thomas W. Schoener on bill size ratios for hundreds of different bird species.
Rows: 410
Columns: 2
Species:
Species name
Bill Ratio:
Ratio of the largest to smallest bill in each population
https://www.seattlecentral.edu/qelp/sets/034/034.html
Data regarding the distribution of expenditures for the California Department of Developmental Services (DDS).
Rows: 1,000
Columns: 6
Id:
ID number of the consumer
Age Group:
The age range the consumer falls into (6 predefined age groups)
Age:
The age of the consumer
Gender:
The gender of the consumer
Expenditures:
The amount of funding allocated to the consumer from the DDS
Ethnicity:
The ethnicity of the consumer
http://www.amstat.org/publications/jse/v22n1/mickel/paradox_data.csv
Data regarding campus crime.
Rows: 20
Columns: 4
Number of Crimes:
The number of crimes as applicable to the school.
Number of Police:
The number of police officers at the school.
Total Enrollment:
Total number of students enrolled at the school.
Private school:
1 if the school is private, 0 if not.
https://ope.ed.gov/campussafety/#/
Nutritional data for 77 different breakfast cereals.
Note that this is a real data set and contains missing data or cells for some of the variables.
Rows: 77
Columns: 15
Name:
Name of cereal
mfr:
Manufacturer of cereal
A:
American Home Food Products
G:
General Mills
K:
Kelloggs
N:
Nabisco
P:
Post
Q:
Quaker Oats
R:
Ralston Purina
type:
Cold or Hot
calories:
calories per serving
protein:
grams of protein
fat:
grams of fat
sodium:
milligrams of sodium
fiber:
grams of dietary fiber
carbo:
grams of complex carbohydrates
sugars:
grams of sugars
potass:
milligrams of potassium
vitamins:
vitamins and minerals - 0, 25, or 100, indicating the typical percentage of FDA recommended
shelf:
display shelf (1, 2, or 3, counting from the floor)
weight:
weight in ounces of one serving
cups:
number of cups in one serving
https://www.kaggle.com/jeandsantos/breakfast-cereals-data-analysis-and-clustering/data
Information regarding the CO2 emissions in thousands of metric tons from nearly every country for the years 1960—2014.
Rows: 54
Columns: 266
Year:
The year the CO2 emissions were recorded
Country Name:
The name of the country being measured
http://data.un.org/Data.aspx?q=emissions&d=WDI&f=Indicator_Code%3aEN.ATM.CO2E.PC
Data regarding the satisfaction of employees at a company.
Rows: 15,000
Columns: 12
employee_id:
The ID number associated with the individual employee
satisfaction_level:
How satisfied the employee is in their position (scale of 0 to 1)
last_evaluation_score:
How management rated employee performance during the last evaluation (scale of 0 to 1)
number_of_projects:
The number of projects an employee is currently working on
average_monthly_hours:
The average number of hours the employee works in a month
years_spent_at_company:
The number of years the employee has worked at the company
work_accident:
A binary variable that indicates whether the employee experienced an accident at work
left_company:
A binary variable that indicates whether an employee left the company
promotion_in_last_5_years:
A binary variable that indicates whether an employee received a promotion in the last 5 years
department:
The department that the employee works in
salary:
The level of the employee’s salary (low, medium, high)
salary_range:
The dollar range for the salary levels
https://www.kaggle.com/ludobenistant/hr-analytics
https://creativecommons.org/licenses/by-sa/4.0/legalcode
Data for 50 Exchange-Traded Funds (ETF) and their earning factors.
Rows: 50
Columns: 4
ETF:
The ID number for the ETF
Share Price ($):
The cost of one share
Dividend Per Share ($):
Payment that the company will give to shareholders for each share they possess
Dividend Yield (%):
The percentage of the share cost that the shareholder receives (dividend per share/ share price)
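The Dividend Yield (%) column follows from the other two numeric columns. This is a minimal sketch of the stated ratio (dividend per share / share price) expressed as a percentage; the function name and the sample values are hypothetical.

```python
def dividend_yield_pct(dividend_per_share, share_price):
    """Dividend yield as a percentage of the share price."""
    if share_price <= 0:
        raise ValueError("share_price must be positive")
    return 100 * dividend_per_share / share_price

# Made-up ETF: $2.00 dividend per share on a $50.00 share
yield_pct = dividend_yield_pct(2.0, 50.0)
```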
2013-2014 U.S. state data regarding high school completion rates, average teacher salary, student-to-teacher ratio, and state expenditure per student.
Rows: 50
Columns: 5
State:
State in the United States
Completion Rate:
The 4-year ACGR is the number of students who graduate in 4 years with a regular high school diploma divided by the number of students who form the adjusted cohort for the graduating class. This number has been rounded to the nearest whole number.
Average Teacher's Salary ($):
The average salary of teachers in the state for the academic year 2013-2014.
Pupil/Teacher Ratio:
The ratio of total number of students to total number of teachers for the academic year 2013-2014.
Expenditure per Student ($):
The amount of money spent per student during the academic year 2013-2014.
https://nces.ed.gov/ccd/tables/ACGR_RE_and_characteristics_2013-14.asp
Assorted farmland available for purchase from 2013—2018.
Rows: 496
Columns: 7
State:
U.S. state
County:
County where farmland is available
Size of Land:
The amount of land available (in acres)
Sale Price:
Price that the farmland is listed at
Price per Acre (approx.):
The total price of the farmland divided by the number of acres
Land Details:
A description of the available farmland including information such as the quality of the soil and the number of tillable acres
Date:
Month and year that the farmland was available for purchase
https://nces.ed.gov/ccd/tables/ACGR_RE_and_characteristics_2013-14.asp
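The Price per Acre (approx.) column in the farmland table is the sale price divided by the size of the land. A minimal sketch follows; rounding to the nearest whole dollar is an assumption suggested by the "approx." label, and the function name and sample listing are hypothetical.

```python
def price_per_acre(sale_price, acres):
    """Approximate price per acre, rounded to the nearest dollar (assumed)."""
    if acres <= 0:
        raise ValueError("acres must be positive")
    return round(sale_price / acres)

# Made-up listing: 80 acres listed at $480,000
ppa = price_per_acre(480_000, 80)
```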
Averages by year and species for data collected on Darwin's finches on Daphne Major Island.
Rows: 80
Columns: 5
Year
Species:
Species of finch, genus is Geospiza
Beak Length:
Beak length, in millimeters
Beak Depth:
Beak depth, in millimeters
Beak Width:
Beak width, in millimeters
A sample of data collected on Darwin's finches on Daphne Major Island.
Rows: 100
Columns: 11
Band:
Refers to an individual's identity, more specifically, the number on a metal leg band it was given
Species:
Species name
Sex:
Male, female, or unknown. The reason for the "unknown" category is that males start their lives looking like females. After one or more years they molt into a plumage with some black feathering that indicates they are males.
First adult year:
The year after the individual hatched from an egg
Last Year:
The last year of that individual's life
Weight (g):
Weight, in grams
Wing (mm):
Wing length, in millimeters
Tarsus (mm):
Tarsus length (a part of the leg), in millimeters
Beak Length (mm):
Beak length, in millimeters
Beak Depth (mm):
Beak depth, in millimeters
Beak Width (mm):
Beak width, in millimeters
Data regarding high school completion and crime rates across U.S. states in 2014.
Rows: 50
Columns: 5
State:
U.S. state
High School Completion Rate (Adjusted Cohort Graduation Rate):
The 4-year ACGR is the number of students who graduate in 4 years with a regular high school diploma divided by the number of students who form the adjusted cohort for the graduating class. This number has been rounded to the nearest whole number.
Crime Rate (per 100,000):
Rate of violent and property crimes per 100,000 people
Violent crimes (per 100,000):
Rate of violent crimes (murder, rape, robbery, aggravated assault) per 100,000 people
Property crimes (per 100,000):
Rate of property crimes per 100,000
https://www.ucr.fbi.gov/crime-in-the-u.s/2014; https://nces.ed.gov/ccd/
A data set regarding the distribution of housefly wing lengths.
Rows: 100
Columns: 1
Length:
Wing length (in units of 0.1 mm)
https://www.seattlecentral.edu/qelp/sets/057/057.html
Data on vehicle fuel economy for model years 1984—2019.
Rows: 38,693
Columns: 11
city:
city miles per gallon
cylinders:
number of cylinders in engine
displ:
engine displacement in liters
drive:
drive axle type
fuelType:
type of fuel
highway:
highway miles per gallon
make:
manufacturer
model:
model name
trans:
type of transmission
VClass:
vehicle size class
year:
model year
https://www.seattlecentral.edu/qelp/sets/057/057.html
A data set containing selected statistics for Major League Baseball teams 1962—2012.
Rows: 1,232
Columns: 16
Team:
The team name abbreviation
League:
The league of the MLB that the team played in
Year:
The year associated with the statistics
RS:
Runs scored
RA:
Runs Allowed
RD:
Run Differential
W:
Number of wins
OBP:
On-base percentage
SLG:
Slugging percentage
BA:
Batting Average
Playoffs:
A binary variable that indicates whether or not a team made the playoffs
RankSeason:
Team ranking at the end of the regular season
RankPlayoffs:
Team ranking at the end of the post-season
G:
Number of games played
OOBP:
Opponent on-base percentage
OSLG:
Opponent slugging percentage
www.baseball-reference.com
Information about properties for sale in three subdivisions of Mount Pleasant, South Carolina, in the year 2017.
Rows: 245
Columns: 24
ID:
The property ID number
List Price:
The price the owner is selling the property for
Duplex:
Whether the property is a duplex or not
Bedrooms:
The number of bedrooms
Baths – Total:
Total number of bathrooms
Baths – Full:
Number of full bathrooms
Baths – Half:
Number of half bathrooms
Stories:
Number of stories
Subdivision:
The subdivision the property is in
Square Footage:
The estimated floor area inside the house
Year Built:
The year the house was constructed
Acreage:
The size of the lot
New Owned:
Whether the house has been lived in previously
House Style:
The type of property (traditional, condo, ranch, etc.)
Covered Parking Spots:
The number of covered parking spots included with the property
Misc.Exterior:
Miscellaneous exterior features
Has Pool:
Whether the property has a private pool or not
Has Dock:
Whether the property has a private dock or not
Fenced Yard:
Whether the property has a fenced-in yard or not
Screened Porch:
Whether the property has a screened porch or not
Amenities:
Amenities included with the property
Golf Course:
Whether the property is located on a golf course or not
Fireplace:
Whether the property has a fireplace or not
Number of Fireplaces:
The number of fireplaces
Data gathered by the Organisation for Economic Co-operation and Development (OECD) regarding the economic strength and well-being of its 35 member countries as well as 3 prominent non-member countries (Brazil, Russia, and South Africa).
Rows: 38
Columns: 25
country:
Name of the country
percent_of_houses_no_facilities:
Percent of households in the country that lack basic facilities
percent_of_income_spent_on_housing:
Average percent of household income spent on housing
rooms_per_person:
The average number of rooms per person residing in a household
household_net_adj_disposable_income:
The amount a household has to spend after income taxes
household_net_financial_wealth:
The net worth of a household (assets minus liabilities)
percent_labor_market_insecurity:
Expected earnings lost, measured as the percentage of the previous earnings, associated with unemployment.
employment_rate:
Percentage of working age population (15 to 64) that is employed
long_term_unemployment_rate:
Percentage of working age population (15 to 64) that has been unemployed for longer than 27 weeks.
personal_earning_per_year:
The total amount of income earned annually.
quality_of_support:
The quality of the social support network (friends, family, etc.)
percent_of_pop_finish_highschool:
The percentage of the population between the ages of 25 and 64 that holds at least one upper secondary degree
stats.oecd.org
U.S. census data regarding the population in each state and the population change from 2010—2016.
Rows: 51
Columns: 5
Geographic Area:
All 50 states and the District of Columbia
April 1, 2010:
Population estimate for each state in 2010
July 1, 2016:
Population estimate for each state in 2016
Number:
Numerical value for the population change from 2010 to 2016 for each state
Percent:
Percent change from 2010 to 2016 for each state
https://www.census.gov/data/tables/2016/demo/popest/state-total.html#ds
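The Number and Percent columns can be derived from the two population estimates. A minimal sketch, assuming the percent change is measured relative to the April 1, 2010 estimate; the function name and sample figures are hypothetical.

```python
def population_change(pop_2010, pop_2016):
    """Return (numeric change, percent change) from 2010 to 2016."""
    if pop_2010 <= 0:
        raise ValueError("pop_2010 must be positive")
    number = pop_2016 - pop_2010
    percent = 100 * number / pop_2010  # relative to the 2010 estimate (assumed)
    return number, percent

# Made-up state growing from 1,000,000 to 1,050,000 residents
number, percent = population_change(1_000_000, 1_050_000)
```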
A data set that provides employment information regarding employees in the San Francisco area for the year 2014.
Rows: 22,334
Columns: 13
Id:
The ID number assigned to the employee
EmployeeName:
The name of the employee
JobTitle:
The title of the position that the employee holds
BasePay:
The base annual salary (in dollars) that the employee received
OvertimePay:
The amount (in dollars) the employee received in overtime pay
OtherPay:
The amount (in dollars) the employee received in payment from bonuses and other pay
Benefits:
The amount (in dollars) the employee received in the form of company benefits
TotalPay:
The total amount (in dollars) the employee received throughout the year not including benefits
TotalPayBenefits:
The total amount (in dollars) the employee received throughout the year including benefits
LogTotalPayBenefits:
The log transformation of the TotalPayBenefits variable
Year:
The year the data was recorded
Agency:
The location the data was gathered from
Status:
Full-time or part-time
https://www.kaggle.com/kaggle/sf-salaries
https://creativecommons.org/publicdomain/zero/1.0/legalcode
A data set comparing students’ predicted college GPAs from their SAT scores to their actual college GPAs.
Rows: 30
Columns: 8
Student:
The ID Number associated with each student
SAT Verbal:
SAT Score earned on the Verbal portion of the test
SAT Math:
SAT score earned on the Math portion of the test
SAT Total:
Combined score earned on both Verbal and Math portion of the test
College GPA:
Actual college GPA
Predicted GPA:
The college GPA predicted considering the student’s SAT score
Error:
The difference between the actual and predicted GPAs
Error Squared:
The square of this error
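The Error and Error Squared columns can be reproduced from the two GPA columns. The sketch below assumes the error is computed as actual minus predicted (the description does not fix the sign); the function name and the GPA values are made up.

```python
def prediction_errors(actual, predicted):
    """Return the errors (actual - predicted, assumed) and their squares."""
    errors = [a - p for a, p in zip(actual, predicted)]
    squared = [e ** 2 for e in errors]
    return errors, squared

# Made-up GPAs for three students
actual_gpa = [3.5, 2.0, 3.0]
predicted_gpa = [3.0, 2.5, 3.0]
errors, squared = prediction_errors(actual_gpa, predicted_gpa)
sum_of_squared_errors = sum(squared)
```

Squaring keeps positive and negative errors from cancelling, so the sum of the squared column is a common summary of how well the predictions fit.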
This data set considers a sample of shops from a major city and compares each shop's annual return with the number of independent customer households it serves.
Rows: 30
Columns: 4
Shop (The ID number associated with the individual shops)
Location
Annual Return
Number of Households
Data related to the Supplemental Nutrition Assistance Program (SNAP).
Rows: 40
Columns: 3
Shop:
Shop ID Number
Location:
Type of location in the city or surrounding area that the shop is located in
Annual Return (Thousands of Dollars):
The profit that the store receives each year
Number of Households (Thousands):
The number of customers that buy from the shop represented by the number of independent households
Information on the closing prices of four stocks—Amazon, Starbucks, Coca-Cola, and S&P 500—over the years 2000—2017.
Rows: 4,528
Columns: 16
date
close:
the closing price of the stock on that day
Price Change:
difference in closing price compared to the day before
Return:
percent change in closing price compared to the day before
https://www.macrotrends.net/stocks/charts
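Price Change and Return are both computed from consecutive closing prices. A minimal sketch under that reading; the function name and the prices are hypothetical, and the first day is skipped here because it has no previous close to compare against.

```python
def daily_changes(closes):
    """Return (price change, percent return) for each day after the first."""
    rows = []
    for previous, current in zip(closes, closes[1:]):
        change = current - previous                  # Price Change
        rows.append((change, 100 * change / previous))  # Return, in percent
    return rows

# Made-up closing prices on three consecutive trading days
rows = daily_changes([100.0, 110.0, 99.0])
```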
Self-reported health and lifestyle data gathered from 30 college freshmen and sophomores.
Rows: 30
Columns: 5
Sleep:
Reported hours of sleep on a typical weekday
Studying:
Reported hours of studying on a typical weekday
Calories:
Reported calories consumed on a typical weekday
Exercise:
Reported hours of exercise on a typical weekday
Social Media:
Reported hours spent on social media on a typical weekday
Information on football team statistics for every Super Bowl played from the years 1967 to 2017.
Rows: 55
Columns: 38
Date:
The date the game was played on
SB:
The Roman numeral denoting the name and number of The Big Game
Winner:
The team that won The Big Game that year
Winner_Pts:
The number of points the winning team scored in the game
Winner_First Downs:
The number of first downs the winning team earned during the game
Winner_Rush Attempts:
The number of times the winning team tried to run the football during the game
Winner_Rushing Yards:
The number of yards the winning team gained by running the football during the game
Winner_Rushing TDs:
The number of touchdowns the winning team scored by running the football during the game
Winner_Fumbles:
The number of times the winning team fumbled the football
Winner_Fumbles Lost:
The number of times the winning team fumbled the football and turned possession over to their opponent
Winner_Pass Attempts:
The number of times the winning team tried to pass the football during the game
Winner_Passes Completed:
The number of times the winning team passed the ball and made a successful catch
Winner_Passing Yards:
The number of yards the winning team gained by passing the football during the game
Winner_Passing TDs:
The number of touchdowns the winning team scored by passing the football during the game
Winner_Interceptions:
The number of times the winning team intercepted a pass attempt made by their opponent
Winner_Total Yards:
The total number of yards the winning team gained by either rushing or passing the football during the game
Winner_Time of Possession:
The amount of time the game clock ran when the winning team had possession of the football
Loser:
The team that lost the Super Bowl that year
Loser_Pts:
The number of points the losing team scored in the game
Loser_First Downs:
The number of first downs the losing team earned during the game
Loser_Rush Attempts:
The number of times the losing team tried to run the football during the game
Loser_Rushing Yards:
The number of yards the losing team gained by running the football during the game
Loser_Rushing TDs:
The number of touchdowns the losing team scored by running the football during the game
Loser_Fumbles:
The number of times the losing team fumbled the football
Loser_Fumbles Lost:
The number of times the losing team fumbled the football and turned possession over to their opponent
Loser_Pass Attempts:
The number of times the losing team tried to pass the football during the game
Loser_Passes Completed:
The number of times the losing team passed the ball and made a successful catch
Loser_Passing Yards:
The number of yards the losing team gained by passing the football during the game
Loser_Passing TDs:
The number of touchdowns the losing team scored by passing the football during the game
Loser_Interceptions:
The number of times the losing team intercepted a pass attempt made by their opponent
Loser_Total Yards:
The total number of yards the losing team gained by either rushing or passing the football during the game
Loser_Time of Possession:
The amount of time the game clock ran when the losing team had possession of the football
MVP:
The player that was voted the Super Bowl MVP
Stadium:
The stadium the game was played in
City:
The city the game was played in
State:
The state the game was played in
Coin Toss Result:
Whether the coin toss resulted in heads or tails
Coin Toss Winner:
The team that won the coin toss
https://www.pro-football-reference.com/
Comparison of tuition rates for 400 colleges and universities for the 2015—2016 school year.
Rows: 400
Columns: 4
Name:
Name of college or university with city and state
In-state tuition:
cost to attend the institution as a resident of the state in which the school is located
Out-of-state tuition:
cost to attend the institution as a non-resident of the state in which the school is located
Type:
public (primarily funded by the state government) or private (primarily not funded by the state government)
https://www.collegetuitioncompare.com/
A data set containing information regarding nearly every county in the United States for 2010.
Rows: 3,143
Columns: 94
fips:
The FIPS county code
name_16:
The name of the county
County:
The county name and the state it is in
Less.Than.High.School:
The percentage of the population 18-years old or older with less than a high school education
At.Least.High.School.Diploma:
The percentage of the population 18-years old or older with at least a high school diploma or GED
At.Least.Bachelor.s.Degree:
The percentage of the population 25-years old or older with at least a Bachelor’s degree
Graduate.Degree:
The percentage of the population 25-years old or older with at least a Master’s degree
School.Enrollment:
School enrollment percentage for the population 3-years old and older
Median.Earnings.2010.dollars:
The median annual income for an individual normalized to the value of a dollar in 2010
White.not.Latino.Population:
The percentage of the county population that identifies as Caucasian with no Latino heritage
African.American.Population:
The percentage of the county population that identifies as African American
Native.American.Population:
The percentage of the county population that identifies as Native American
Asian.American.Population:
The percentage of the county population that identifies as Asian American
Population.some.other.race.or.races:
The percentage of the county population that identifies with another ethnicity or multiple ethnicities
Latino.Population:
The percentage of the county population that identifies as Latino
Children.Under.6.Living.in.Poverty:
The percentage of children under the age of 6 that are living in poverty
Adult.65.and.Older.Living.in.Poverty:
The percentage of adults aged 65 and older that are living in poverty
Total.Population:
The total county population
Kirkegaard, E. O. W. (2017, April 7). Inequality across US counties: an S factor analysis. Retrieved from osf.io/cknjr
Data set containing the average U.S. gas price (dollars per gallon) from 1991—2015.
Rows: 25
Columns: 4
Year:
Year (from 1991-2015)
Average US Gas Price:
Average US gas price throughout the year
2 Period Moving Average:
The average price of gas over a 2-year period, computed as the sum of the previous year's and the current year's prices divided by two.
3 Period Moving Average:
The average price of gas over a 3-year period, computed as the sum of the previous two years' and the current year's prices divided by three.
Source: www.data.bls.gov
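Both moving-average columns use the same calculation with a different window length. A minimal sketch of a trailing moving average follows; the function name and the prices are hypothetical, and leaving the first `window - 1` years as `None` (they lack enough earlier years) is an assumption about how the table treats them.

```python
def moving_average(values, window):
    """Trailing moving average: mean of the current and previous window-1 values."""
    if window < 1:
        raise ValueError("window must be at least 1")
    averages = [None] * (window - 1)  # not enough history for these years
    for i in range(window - 1, len(values)):
        averages.append(sum(values[i - window + 1 : i + 1]) / window)
    return averages

# Made-up annual gas prices in dollars per gallon
prices = [1.0, 2.0, 3.0, 7.0]
two_period = moving_average(prices, 2)
three_period = moving_average(prices, 3)
```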
Violent crime rates for every state in the U.S. from 1989—2014.
Rows: 26
Columns: 52
Year:
The year the crime rate is associated with
Alabama:
The number of reported offenses (i.e. murder, rape, robbery, and aggravated assault) per 100,000 individuals in Alabama by year
...
Wyoming:
The number of reported offenses (i.e. murder, rape, robbery, and aggravated assault) per 100,000 individuals in Wyoming by year
https://www.ucrdatatool.gov/