Sabermetrics 2

Sabermetrics BaseballHere I continue my Sabermetrics Baseball tutorial. If you missed part one, check it out first because in it I show you how to get all of your tools. It is here Sabermetrics.

I’m going to introduce a few new SQL functions being CONCAT, COALESCE and HAVING.

We will also calculate the top players based on RBIs, HRs, Batting Average, On Base Percentage, and Slugging Percentage. And, I’ll show you how to pull in data on the players position and their salary.

If you like videos like this tell Google
To make me extra happy, feel free to share it

Code from the Video

TOP 50 RBI
(RBI - Runs Credited to a Batter Based on their at bats)

select b.playerID, 
CONCAT(m.nameFirst, " ", m.nameLast) AS NAME, 
SUM(b.RBI) AS RBI 
FROM Batting b, Master m 
WHERE b.playerID = m.playerID 
GROUP BY playerID 
ORDER BY RBI DESC 
LIMIT 50;

TOP 50 HR

select b.playerID, 
CONCAT(m.nameFirst, " ", m.nameLast) AS NAME, 
SUM(b.HR) AS HR 
FROM Batting b, Master m 
WHERE b.playerID = m.playerID 
GROUP BY playerID 
ORDER BY HR DESC LIMIT 50;

JOKE TOP BATTING AVERAGE

select b.playerID, 
CONCAT(m.nameFirst, " ", m.nameLast) AS NAME, 
(SUM(b.H)/SUM(b.AB)) AS AVG 
FROM Batting b, Master m 
WHERE b.playerID = m.playerID 
GROUP BY playerID 
ORDER BY AVG DESC 
LIMIT 50;

TOP 50 BATTING AVERAGE
(Percentage of ABs that turn into a hit)

select b.playerID, 
CONCAT(m.nameFirst, " ", m.nameLast) AS NAME, 
SUM(b.AB) AS AB, (SUM(b.H)/SUM(b.AB)) AS AVG 
FROM Batting b, Master m 
WHERE b.playerID = m.playerID 
GROUP BY playerID 
HAVING AB > 1000 # Makes Sure Player Has 1000 ABs
ORDER BY AVG DESC 
LIMIT 50;

TOP 50 OBP
(How often a player gets on base)
* COALESCE - If the first value equals null use 0 instead. This is done
because Sacrifice Flys weren't recorded in the early days of baseball

select b.playerID, 
CONCAT(m.nameFirst, " ", m.nameLast) AS NAME, 
SUM(b.AB) AS AB, (SUM(b.H)/SUM(b.AB)) AS AVG, (SUM(b.H)+SUM(b.BB)+SUM(b.HBP))/(SUM(b.AB)+SUM(b.BB)+SUM(b.HBP)+COALESCE(SUM(b.SF), 0)) AS OBP 
FROM Batting b, Master m 
WHERE b.playerID = m.playerID 
GROUP BY playerID 
HAVING AB > 1000 
ORDER BY OBP 
DESC LIMIT 50;

TOP 50 OBP WITH POSITION

select b.playerID, 
CONCAT(m.nameFirst, " ", m.nameLast) AS NAME, 
SUM(b.AB) AS AB, (SUM(b.H)/SUM(b.AB)) AS AVG, (SUM(b.H)+SUM(b.BB)+SUM(b.HBP))/(SUM(b.AB)+SUM(b.BB)+SUM(b.HBP)+COALESCE(SUM(b.SF), 0)) AS OBP,
f.POS AS POS 
FROM Batting b, Master m, Fielding f 
WHERE b.playerID = m.playerID AND b.playerID = f.playerID
GROUP BY playerID 
HAVING AB > 1000 
ORDER BY OBP 
DESC LIMIT 50;

YEARLY OBP WITH POSITION

select b.playerID, 
CONCAT(m.nameFirst, " ", m.nameLast) AS NAME, 
b.AB AS AB, (b.H/b.AB) AS AVG, (b.H+b.BB+b.HBP)/(b.AB+b.BB+b.HBP+COALESCE(b.SF, 0)) AS OBP,
f.POS AS POS,
s.salary,
b.yearID 
FROM Batting b, Master m, Fielding f, Salaries s 
WHERE b.playerID = m.playerID AND b.playerID = f.playerID AND b.yearID = 2002 AND s.playerID = b.playerID AND s.yearID = b.yearID
GROUP BY playerID 
HAVING AB > 300 
ORDER BY OBP 
DESC LIMIT 50;

TOP 50 SLG
(Measure of the power of a hitter)

select b.playerID, 
CONCAT(m.nameFirst, " ", m.nameLast) AS NAME, 
SUM(b.AB) AS AB, (SUM(b.H)/SUM(b.AB)) AS AVG, (SUM(b.H)+(SUM(b.2B)*2)+(SUM(b.3B)*3)+(SUM(b.HR)*4))/SUM(b.AB) AS SLG 
FROM Batting b, Master m 
WHERE b.playerID = m.playerID 
GROUP BY playerID 
HAVING AB > 1000 
ORDER BY SLG 
DESC LIMIT 50;

5 Responses to “Sabermetrics 2”

  1. mr.roshik says:

    Another good one. thanks for sharing

    • admin says:

      I’m glad you liked it. I love to completely change subjects like this while reenforcing my current work. This seems to fit in well as a SQL / MySQL refresher. It will also work right back into the Java Tutorial.

  2. sham says:

    hello derek,
    I got the output of below query after 9 min 33.45 secs, obviously because batting has 95k entries and master has 17k entries, how do i optimise the query?
    For writing an enterp app, if we are in such situation, what should we do to optimise time?

    Select b.playerID, m.nameFirst, sum(b.RBI) FROM Batting b , master m where b.playerID = m.playerID group by playerID order by rbi desc limit 50.

  3. danny says:

    Any ideas why your 51 second query took an hour and 15 minutes on my SSD ultrabook? I duplicated it on another Core i5 machine too with similar results. CPU never broke 40% and disk activity wasn’t high. Plenty of memory available. Is there a setting in mysql that you could point me to?

Leave a Reply

Your email address will not be published.

Google+