Web Design and Programming Pt 24

In today’s programming tutorial I show you how to grab information from a website using regular expressions in PHP. The information was particularly hard to get for the following reasons:

  • It was plain text
  • There were no tags to search for
  • Data I wanted was laid out in an unorganized way
  • I had to search for odd Unicode characters like ½

I did it, though, using regular expressions, and I’ll show you how.
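Here’s a minimal sketch of the "odd Unicode characters" problem on its own: matching a value like 61½ with \p{N}, the Unicode "any number" property. The function name splitWeight and the sample values are made up for illustration. The sketch also assumes the text is UTF-8, which is why it adds the u modifier; the code from the video does not use that modifier, so the real page may well have used a different encoding.

```php
<?php
// Hypothetical helper: splits a weight such as "61½" into its whole part
// and its optional fraction character.
// Returns null when the text is not two digits plus an optional number character.
function splitWeight(string $text): ?array
{
    // The u modifier makes PCRE treat the pattern and the subject as UTF-8,
    // so \p{N} can match a multi-byte character such as ½ (U+00BD).
    if (!preg_match('/^(\d{2})(\p{N}?)$/u', $text, $m)) {
        return null;
    }
    return [$m[1], $m[2]];
}

// "\u{00BD}" is the ½ character written as a PHP 7+ Unicode escape.
var_dump(splitWeight("61\u{00BD}"));
var_dump(splitWeight("72"));
var_dump(splitWeight("abc"));
```

The second element of the result is an empty string when there is no fraction, so the caller can tell "61" and "61½" apart without a second pattern.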

This is what the data looked like. Actually, it wasn’t even this organized. You can see it here.

The following video will walk you through the code and hopefully teach you more about Regular Expressions and PHP.

All of the code follows the video. If you have any questions or comments, leave them below.

Note: You guys have been asking me what politicians mean when they talk about providing tax cuts to companies that ship jobs overseas. I briefly answer that question as well.

Code From the Video

<html>
<head>
<title><?php echo "Website Scraping"; ?></title>
</head>
<body>
<?php
// Open the results page once. Every line is tested against both patterns.
$file = fopen("http://www.derbylane.com/EntriesResult/SP01-05-2011eRES.HTM", "r");

// fgets() returns false at the end of the file, which ends the loop.
while (($line = fgets($file)) !== false)
{
    // \s+(\d{1}\p{N}?)\s+(\d)\s+(\d{1}\p{N}?)\s+(\d)\s+(\d{1}\p{N}?)\s+(\d{1,2}\.\d{2})\s+(\*?\d{1,2}\.\d{2})\s+

    // Race header line: location, day, date, time of day, grade, length and winning time.
    preg_match_all("%^(Derby Lane)\s+([\w]+?)\s+([\w]{3}\s+[\d]{2}\s+[\d]{4})\s+([\w]+)\s+.+Grade\s+([\w])\s+\((\d{3})\)\s+Time:\s+(.*)$%", $line, $raceinfo, PREG_SET_ORDER);

    // Dog line: name, weight (which may end in a fraction such as ½),
    // three single digits, then the rest of the numbers up to the first capital letter.
    preg_match_all("%^([A-Za-z'\s]+)(\d{2}\p{N}?)\s+(\d)\s+(\d)\s+(\d)\s+([^A-Z]*)[A-Z]{1}.*$%", $line, $doginfo, PREG_SET_ORDER);

    foreach ($raceinfo as $val)
    {
        echo "Match: " . $val[0] . "<br />";
        echo "Location: " . $val[1] . "<br />";
        echo "Day: " . $val[2] . "<br />";
        echo "Date: " . $val[3] . "<br />";
        echo "Time of Day: " . $val[4] . "<br />";
        echo "Grade: " . $val[5] . "<br />";
        echo "Length: " . $val[6] . "<br />";
        echo "Winning Time: " . $val[7] . "<br /><br />";
        echo "<hr /><hr />";
    }

    foreach ($doginfo as $val2)
    {
        // Skip any race header line that slipped through the dog pattern.
        if (strpos($val2[0], "Derby Lane") === false)
        {
            // echo "Dog Info: " . $val2[0] . "<br />";
            echo "Name: " . $val2[1] . "<br />";
            echo "Weight: " . $val2[2] . "<br />";
            echo "Post Position: " . $val2[3] . "<br />";
            echo "Order Leaving Box(off): " . $val2[4] . "<br />";
            echo "Position at Break: " . $val2[5] . "<br />";

            // Collapse runs of whitespace to single spaces, then split the remaining numbers.
            // Note: the count includes any empty element left by trailing whitespace.
            $restOfNums = preg_replace("%\s+%", " ", $val2[6]);
            $restOfNums = explode(" ", $restOfNums);

            if (count($restOfNums) == 6)
            {
                echo "Break Lead: " . $restOfNums[0] . "<br />";
                echo "Position at Stretch: " . $restOfNums[1] . "<br />";
                echo "Position Dog Finished: " . $restOfNums[2] . "<br />";
                echo "Seconds to Complete Race: " . $restOfNums[3] . "<br />";
                echo "Final Odds: " . $restOfNums[4] . "<br /><br />";
                echo "<hr />";
            } else {
                echo "Break Lead: " . $restOfNums[0] . "<br />";
                echo "Position at Stretch: " . $restOfNums[1] . "<br />";
                echo "Stretch Lead: " . $restOfNums[2] . "<br />";
                echo "Position Dog Finished: " . $restOfNums[3] . "<br />";
                echo "Finish Lead: " . $restOfNums[4] . "<br />";
                echo "Seconds to Complete Race: " . $restOfNums[5] . "<br />";
                echo "Final Odds: " . $restOfNums[6] . "<br /><br />";
                echo "<hr />";
            }
        }
    }
}
fclose($file);
?>
</body>
</html>
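
If you want to try the two patterns without hitting the website, here’s a minimal sketch that runs them against single lines. The patterns are the ones from the code above. The function names parseRaceHeader and parseDogLine and both sample lines are invented for illustration, and the sample layout is only a guess at what the real page looked like. The sketch also swaps explode() for preg_split() with PREG_SPLIT_NO_EMPTY, so its element counts will not line up with the count check in the original code.

```php
<?php
// Patterns copied from the tutorial code; the sample lines further down are invented.
const RACE_PATTERN = '%^(Derby Lane)\s+([\w]+?)\s+([\w]{3}\s+[\d]{2}\s+[\d]{4})\s+([\w]+)\s+.+Grade\s+([\w])\s+\((\d{3})\)\s+Time:\s+(.*)$%';
const DOG_PATTERN = '%^([A-Za-z\'\s]+)(\d{2}\p{N}?)\s+(\d)\s+(\d)\s+(\d)\s+([^A-Z]*)[A-Z]{1}.*$%';

// Returns selected race header fields, or null when the line is not a header.
function parseRaceHeader(string $line): ?array
{
    if (!preg_match(RACE_PATTERN, $line, $m)) {
        return null;
    }
    return [
        'day'    => $m[2],
        'date'   => $m[3],
        'grade'  => $m[5],
        'length' => $m[6],
        'time'   => $m[7],
    ];
}

// Returns selected dog fields, or null when the line does not match the dog pattern.
function parseDogLine(string $line): ?array
{
    if (!preg_match(DOG_PATTERN, $line, $m)) {
        return null;
    }
    return [
        // The name group also swallows the spaces before the weight, so trim it.
        'name'   => trim($m[1]),
        'weight' => $m[2],
        'post'   => $m[3],
        // Split the trailing numbers on whitespace and drop empty pieces.
        'rest'   => preg_split('/\s+/', $m[6], -1, PREG_SPLIT_NO_EMPTY),
    ];
}

// Invented sample lines, not real race data.
$race = parseRaceHeader('Derby Lane  Wednesday  Jan 05 2011  Evening  Race 1  Grade  C  (550)  Time:  30.55');
$dog  = parseDogLine('Flying Example    61  3  2  1  2  1  1  30.55  4.50  Held On');

print_r($race);
print_r($dog);
```

Working on one hard-coded line at a time like this makes it much easier to see which capture group grabbed what before you point the script at the live page.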
