RSS parsing in php
hey, could you help me out on making one of those news feed scripts?
i suck at php :(
i'm guessing you either parse the html or you're using some type of php rss client or something... like i said, i suck. help?
zubaz
12-19-2003, 03:03 PM
here's the code I use -- the actual function that I call is getnews(), sending it either "slashdot" or "shacknews" as an argument. I pulled this pretty much verbatim from the php.net page on the xml parser functions (http://us3.php.net/xml), but made a couple of adjustments. This returns 6 headlines, and even if the rss feed is broken for some reason (slashdot has some issues, i don't know if it's my parser or what) it returns 6 blank lines so as not to mess with any formatting.
[code:1]
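<?php
// Called at each opening tag. The parser upper-cases tag names by default.
function StartHandler($Parser, $Elem, $Attr) {
    global $key, $ii, $siter;
    // Text inside this tag is stored under tag name + item counter, e.g. "TITLE2".
    $key = $Elem.$ii;
    $item = '';
    if ($siter=="slashdot") {
        $item = "STORY";
    } elseif ($siter=="shacknews") {
        $item = "ITEM";
    }
    // The counter is bumped after the key is built, so the first headline ends up at index 2.
    if ($Elem==$item) {
        $ii++;
    }
}

// Called for the text inside a tag.
function CharacterHandler($Parser, $Line) {
    global $data, $key;
    if ($key == '') {
        return; // whitespace between tags
    }
    if (!isset($data[$key])) {
        $data[$key] = '';
    }
    // Append instead of overwrite: the parser can deliver one text node in several
    // chunks (for example around &amp;), and overwriting kept only the last chunk.
    $data[$key] .= $Line;
}

// Called at each closing tag.
function EndHandler($Parser, $Elem) {
    global $key;
    $key='';
}

function getnews($site) {
    global $data, $ii, $siter;
    $siter = $site;
    if ($site=="slashdot") {
        $feed = "http://slashdot.org/slashdot.xml";
    } elseif ($site=="shacknews") {
        $feed = "http://www.shacknews.com/shackfeed.xml";
    } else {
        echo "no news";
        return '';
    }

    // Reset the shared state so a second call does not reuse the previous feed's data.
    $data = array();
    $ii = 1;
    $feeddata = '';
    $headlines = '';

    $fd = @fopen($feed, "r");
    if ($fd) {
        while (!feof($fd)) {
            $feeddata .= fread($fd, 4096);
        }
        fclose($fd);
    }

    $sParser = xml_parser_create();
    xml_set_element_handler($sParser, 'StartHandler', 'EndHandler');
    xml_set_character_data_handler($sParser, 'CharacterHandler');
    // Pass the content string to the parser (true = this is the final chunk).
    if ($feeddata == '' || !xml_parse($sParser, $feeddata, true)) {
        $headlines .= "<span class=\"znlink\">feed unavailable</span>";
    }

    // Headlines live at indexes 2..7 (see StartHandler); missing ones come out as empty links.
    $linkprefix = ($siter=="slashdot") ? "URL" : "LINK";
    for ($xx = 2; $xx <= 7; $xx++) {
        $url  = isset($data[$linkprefix.$xx]) ? $data[$linkprefix.$xx] : '';
        $name = isset($data["TITLE".$xx]) ? $data["TITLE".$xx] : '';
        // Escape on output; this replaces the old %p% placeholder trick for &amp;.
        $headlines .= "<a class=\"znlink\" href=\"".htmlspecialchars($url)."\">".htmlspecialchars($name)."</a>";
    }
    return $headlines;
}
?>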
[/code:1]
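If you want to poke at the parsing part without hitting a live feed, here's a rough self-contained sketch of the same idea. It takes the xml as a string and keeps its state in closures instead of globals. The function name parse_rss_items() and the assumption that headlines sit in <item> tags with <title> and <link> children are mine, not part of the code above, so adjust the tag names for feeds like slashdot's.

```php
<?php
// Hypothetical helper: collect title/link pairs from the <item> elements of an rss string.
function parse_rss_items($xml, $max = 6) {
    $items = array();
    $current = null;   // the item being filled in, or null when outside <item>
    $field = '';       // 'TITLE' or 'LINK' while inside one of those tags

    $parser = xml_parser_create();
    xml_set_element_handler(
        $parser,
        // Start tag: names arrive upper-cased because case folding is on by default.
        function ($p, $name, $attrs) use (&$current, &$field) {
            if ($name === 'ITEM') {
                $current = array('TITLE' => '', 'LINK' => '');
            } elseif ($current !== null && isset($current[$name])) {
                $field = $name;
            }
        },
        // End tag: store the finished item and stop collecting text.
        function ($p, $name) use (&$items, &$current, &$field) {
            if ($name === 'ITEM' && $current !== null) {
                $items[] = $current;
                $current = null;
            }
            $field = '';
        }
    );
    xml_set_character_data_handler(
        $parser,
        function ($p, $text) use (&$current, &$field) {
            // Append: one text node can be delivered in several chunks.
            if ($current !== null && $field !== '') {
                $current[$field] .= $text;
            }
        }
    );

    // true marks the string as the final chunk, so broken xml is reported as a failure.
    $ok = xml_parse($parser, $xml, true);
    return $ok ? array_slice($items, 0, $max) : array();
}
```

Each entry comes back as an array with 'TITLE' and 'LINK' keys, and a feed that fails to parse gives an empty array, so the caller decides how to pad the output.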
hey, do you know anything about cronjobs? i was talking to some guy about parsing html (before i learned how to parse rss feeds), and this is how he does it:
Open the target page as a file and go through it line by line, looking for certain strings that are always associated with a headline link. Reuters has special css classes that they use in their code for this, caranddriver has some stuff too; they're all unique. You have your script go through the file looking for those things, and if it finds a line with a headline in it, you parse that shit, get the link and the headline text out of it, then put it back together.
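A minimal sketch of that line-by-line approach, assuming each headline link sits on a single line and uses a double-quoted href. The function names and the marker string are made up for illustration; every site needs its own marker, and this will break whenever the site changes its markup.

```php
<?php
// Hypothetical helper: scan html line by line and pull out headline links.
// $marker is an assumed string (e.g. a css class) that only shows up on headline lines.
function extract_headlines($html, $marker, $max = 6) {
    $headlines = array();
    foreach (preg_split('/\r\n|\r|\n/', $html) as $line) {
        // Skip lines that do not carry the marker.
        if (strpos($line, $marker) === false) {
            continue;
        }
        // Grab the first link on the line: the href value and the text between the tags.
        if (preg_match('/<a\b[^>]*href="([^"]*)"[^>]*>(.*?)<\/a>/i', $line, $m)) {
            $headlines[] = array(
                'url'   => html_entity_decode($m[1]),
                'title' => html_entity_decode(trim(strip_tags($m[2]))),
            );
        }
        if (count($headlines) >= $max) {
            break;
        }
    }
    return $headlines;
}

// Put the pieces back together as your own markup, escaping on the way out.
function render_headlines($headlines) {
    $out = '';
    foreach ($headlines as $h) {
        $out .= '<a class="znlink" href="' . htmlspecialchars($h['url']) . '">'
              . htmlspecialchars($h['title']) . "</a>\n";
    }
    return $out;
}
```

Something like render_headlines(extract_headlines($page, 'class="headline"')) would then give you the formatted list, where $page is the html you already loaded and the class name is whatever that site happens to use.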
If you have it do all this on every load it's going to be slow as shit (loading like 8 different pages at once and parsing them), so I have it run as a cronjob every 5 minutes and output the headlines to a text file. Then the epalla mainpage just reads out the text file with some really simple code (because it's already formatted nicely).
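Roughly, that split could look like the sketch below: one function for the script that cron runs, one for the main page. The function names, the file paths and the crontab line in the comment are assumptions for illustration, not his actual setup.

```php
<?php
// Example crontab entry (assumed paths) that would run the update script every 5 minutes:
//   */5 * * * * /usr/bin/php /path/to/update_headlines.php

// Called by the cron script. Writes to a temp file first and then renames it,
// so a page load never reads a half-written cache.
function write_headline_cache($cachefile, $html) {
    $tmp = $cachefile . '.tmp';
    if (file_put_contents($tmp, $html) === false) {
        return false;
    }
    return rename($tmp, $cachefile);
}

// Called by the main page. The html is already formatted, so it only reads the file.
function read_headline_cache($cachefile, $fallback = '') {
    if (!is_readable($cachefile)) {
        return $fallback;
    }
    $html = file_get_contents($cachefile);
    return ($html === false) ? $fallback : $html;
}
```

The main page then only needs something like echo read_headline_cache('headlines.txt', 'no news'); and never touches the remote sites itself.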
i asked him to explain it a bit more...
it's not actually part of apache or whatever you're using for webserver software; cron is the job scheduler on unix-type servers, and a cronjob is an entry that tells it to execute the script every so often. A lot of hosting places don't give you access to cronjobs, which is why it was one of my main criteria in selecting a new host for my site.
know anything about this? i'm a newbie when it comes to server-side scripting. client-side has always been my thing 8)
btw, if you wanna see his script in action, check http://www.epalla.com
zubaz
12-22-2003, 07:51 PM
sometimes there's a folder on your hosting account called cron, and you can have scripts periodically run at set intervals. That's about all I know about it, I've never had occasion to use it. But yeah, that would be a pretty good way to do it, even if you were just parsing the RSS and not the entire HTML, cause sometimes that shit times out and causes some problems.
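On the timeout problem, here's a rough sketch of fetching with a time limit and only overwriting the saved copy when the fetch worked. The function names are hypothetical; the 'timeout' entry is PHP's http stream context option, and everything else is plain file handling.

```php
<?php
// Hypothetical helper: fetch a feed but give up after $seconds instead of hanging.
function fetch_feed($url, $seconds = 5) {
    $context = stream_context_create(array(
        'http' => array('timeout' => $seconds),
    ));
    $body = @file_get_contents($url, false, $context);
    return ($body === false || $body === '') ? null : $body;
}

// Refresh the saved copy only when the fetch worked,
// so a timeout leaves the last good copy in place.
function refresh_feed_cache($url, $cachefile, $seconds = 5) {
    $body = fetch_feed($url, $seconds);
    if ($body === null) {
        return false;
    }
    return file_put_contents($cachefile, $body) !== false;
}
```

Run from a scheduled script, a slow or dead feed then just means the page keeps showing slightly old headlines.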
all of my scripts run on my own computer. i use apache 1.3.x. any ideas on how i would go about adding this "cron" folder to my server? or would i be better off just googling this stuff?
zubaz
12-22-2003, 09:31 PM
google, that's beyond my knowledge by a good bit.