Here I continue my Sabermetrics Baseball tutorial. If you missed part one, check it out first because in it I show you how to get all of your tools. It is here Sabermetrics.
I’m going to introduce a few new SQL functions being CONCAT, COALESCE and HAVING.
We will also calculate the top players based on RBIs, HRs, Batting Average, On Base Percentage, and Slugging Percentage. And, I’ll show you how to pull in data on the players position and their salary.
If you like videos like this tell Google [googleplusone]
To make me extra happy, feel free to share it
Code from the Video
TOP 50 RBI (RBI - Runs Credited to a Batter Based on their at bats) select b.playerID, CONCAT(m.nameFirst, " ", m.nameLast) AS NAME, SUM(b.RBI) AS RBI FROM Batting b, Master m WHERE b.playerID = m.playerID GROUP BY playerID ORDER BY RBI DESC LIMIT 50; TOP 50 HR select b.playerID, CONCAT(m.nameFirst, " ", m.nameLast) AS NAME, SUM(b.HR) AS HR FROM Batting b, Master m WHERE b.playerID = m.playerID GROUP BY playerID ORDER BY HR DESC LIMIT 50; JOKE TOP BATTING AVERAGE select b.playerID, CONCAT(m.nameFirst, " ", m.nameLast) AS NAME, (SUM(b.H)/SUM(b.AB)) AS AVG FROM Batting b, Master m WHERE b.playerID = m.playerID GROUP BY playerID ORDER BY AVG DESC LIMIT 50; TOP 50 BATTING AVERAGE (Percentage of ABs that turn into a hit) select b.playerID, CONCAT(m.nameFirst, " ", m.nameLast) AS NAME, SUM(b.AB) AS AB, (SUM(b.H)/SUM(b.AB)) AS AVG FROM Batting b, Master m WHERE b.playerID = m.playerID GROUP BY playerID HAVING AB > 1000 # Makes Sure Player Has 1000 ABs ORDER BY AVG DESC LIMIT 50; TOP 50 OBP (How often a player gets on base) * COALESCE - If the first value equals null use 0 instead. This is done because Sacrifice Flys weren't recorded in the early days of baseball select b.playerID, CONCAT(m.nameFirst, " ", m.nameLast) AS NAME, SUM(b.AB) AS AB, (SUM(b.H)/SUM(b.AB)) AS AVG, (SUM(b.H)+SUM(b.BB)+SUM(b.HBP))/(SUM(b.AB)+SUM(b.BB)+SUM(b.HBP)+COALESCE(SUM(b.SF), 0)) AS OBP FROM Batting b, Master m WHERE b.playerID = m.playerID GROUP BY playerID HAVING AB > 1000 ORDER BY OBP DESC LIMIT 50; TOP 50 OBP WITH POSITION select b.playerID, CONCAT(m.nameFirst, " ", m.nameLast) AS NAME, SUM(b.AB) AS AB, (SUM(b.H)/SUM(b.AB)) AS AVG, (SUM(b.H)+SUM(b.BB)+SUM(b.HBP))/(SUM(b.AB)+SUM(b.BB)+SUM(b.HBP)+COALESCE(SUM(b.SF), 0)) AS OBP, f.POS AS POS FROM Batting b, Master m, Fielding f WHERE b.playerID = m.playerID AND b.playerID = f.playerID GROUP BY playerID HAVING AB > 1000 ORDER BY OBP DESC LIMIT 50; YEARLY OBP WITH POSITION select b.playerID, CONCAT(m.nameFirst, " ", m.nameLast) AS NAME, b.AB AS AB, (b.H/b.AB) AS AVG, (b.H+b.BB+b.HBP)/(b.AB+b.BB+b.HBP+COALESCE(b.SF, 0)) AS OBP, f.POS AS POS, s.salary, b.yearID FROM Batting b, Master m, Fielding f, Salaries s WHERE b.playerID = m.playerID AND b.playerID = f.playerID AND b.yearID = 2002 AND s.playerID = b.playerID AND s.yearID = b.yearID GROUP BY playerID HAVING AB > 300 ORDER BY OBP DESC LIMIT 50; TOP 50 SLG (Measure of the power of a hitter) select b.playerID, CONCAT(m.nameFirst, " ", m.nameLast) AS NAME, SUM(b.AB) AS AB, (SUM(b.H)/SUM(b.AB)) AS AVG, (SUM(b.H)+(SUM(b.2B)*2)+(SUM(b.3B)*3)+(SUM(b.HR)*4))/SUM(b.AB) AS SLG FROM Batting b, Master m WHERE b.playerID = m.playerID GROUP BY playerID HAVING AB > 1000 ORDER BY SLG DESC LIMIT 50;
Another good one. thanks for sharing
I’m glad you liked it. I love to completely change subjects like this while reenforcing my current work. This seems to fit in well as a SQL / MySQL refresher. It will also work right back into the Java Tutorial.
hello derek,
I got the output of below query after 9 min 33.45 secs, obviously because batting has 95k entries and master has 17k entries, how do i optimise the query?
For writing an enterp app, if we are in such situation, what should we do to optimise time?
Select b.playerID, m.nameFirst, sum(b.RBI) FROM Batting b , master m where b.playerID = m.playerID group by playerID order by rbi desc limit 50.
Any ideas why your 51 second query took an hour and 15 minutes on my SSD ultrabook? I duplicated it on another Core i5 machine too with similar results. CPU never broke 40% and disk activity wasn’t high. Plenty of memory available. Is there a setting in mysql that you could point me to?
Sorry, but i’m not sure? Maybe the database has changed?