May 30 2010

Parsing Flickr Feed with PHP tutorial

Category: Php, Spiders & web bots
Giulio Pons @ 10:48 pm

I spent about 30 minutes looking for a JavaScript embed to print a custom list of thumbnails of Flickr photos, but I didn’t find anything clean enough… and I don’t have enough time to read through Flickr’s API docs…
So I looked for the user’s feed and found it at the bottom of Flickr’s pages: I decided to grab the feed and parse it to build my custom gallery.
If you’re looking for the Flickr feed, it’s here, at the bottom of the page:

[Image: flickr feed]

If you click the feed link, the feed opens (how it is displayed depends on your browser). If you look at the URL of the feed you clicked, you can see that it contains the “id” of the feed in Flickr’s database. Here is the id:

[Image: feed id]

Take the id, then use this PHP function to grab the feed and build the thumbs list. The code simply outputs anchors and images, so you have to use CSS to style them as you want:

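function attr($s,$attrname) { // return html attribute
	// preg_quote keeps special characters in the name from altering the pattern;
	// the leading \s avoids matching the tail of a longer attribute name
	if (preg_match('#\s('.preg_quote($attrname,'#').')\s*=\s*["\']([^"\']*)["\']#i', $s, $x)) return $x[2];
	return ""; // attribute not found
}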

// id = id of the feed
// n = number of thumbs
function parseFlickrFeed($id,$n) {
	$url = "http://api.flickr.com/services/feeds/photos_public.gne?id={$id}&lang=it-it&format=rss_200";
	$s = @file_get_contents($url);
	if ($s === false) return ""; // feed not reachable
	preg_match_all('#<item>(.*)</item>#Us', $s, $items);
	$out = "";
	$count = min($n, count($items[1]));
	for($i=0;$i<$count;$i++) {
		$item = $items[1][$i];
		// skip items without a link or a thumbnail
		if (!preg_match('#<link>(.*)</link>#Us', $item, $temp)) continue;
		$link = htmlspecialchars($temp[1], ENT_QUOTES, 'UTF-8', false);
		$title = preg_match('#<title>(.*)</title>#Us', $item, $temp) ? $temp[1] : "";
		// escape quotes so the title can't break the attribute
		$title = htmlspecialchars($title, ENT_QUOTES, 'UTF-8', false);
		if (!preg_match('#<media:thumbnail([^>]*)>#Us', $item, $temp)) continue;
		$thumb = htmlspecialchars(attr($temp[0],"url"), ENT_QUOTES, 'UTF-8', false);
		$out.="<a href='$link' target='_blank' title=\"$title\"><img src='$thumb'/></a>";
	}
	return $out;
}

// usage example:
echo parseFlickrFeed("16664181@N00",9);
// you have to use css to customize it

Like here:
[Image: flickr thumbs css]

This code will be added to the next version of Mini Bots Class.
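
Since the feed is XML, the same thumbs list could probably also be built with PHP's SimpleXML extension instead of regular expressions. What follows is only a minimal sketch of that alternative, not the code from Mini Bots Class: flickrThumbsFromXml() is a hypothetical name, the sample feed is made up, and I'm assuming the thumbnail element uses the "media" prefix as in the regular expression above. To use it on the real feed you would pass in the string returned by file_get_contents.

```php
<?php
// Sketch: build the thumbs list with SimpleXML instead of regular expressions.
// flickrThumbsFromXml() is a hypothetical helper, not part of Mini Bots Class.
function flickrThumbsFromXml($xml, $n) {
	$feed = @simplexml_load_string($xml);
	if ($feed === false) return ""; // not well-formed XML
	$out = "";
	$i = 0;
	foreach ($feed->channel->item as $item) {
		if ($i >= $n) break;
		// media:thumbnail lives in the "media" namespace (looked up by prefix)
		$media = $item->children('media', true);
		if (!isset($media->thumbnail)) continue; // item without thumbnail
		$thumb = (string)$media->thumbnail->attributes()->url;
		// escape everything that ends up inside an HTML attribute
		$link  = htmlspecialchars((string)$item->link, ENT_QUOTES, 'UTF-8');
		$title = htmlspecialchars((string)$item->title, ENT_QUOTES, 'UTF-8');
		$thumb = htmlspecialchars($thumb, ENT_QUOTES, 'UTF-8');
		$out .= "<a href='$link' target='_blank' title=\"$title\"><img src='$thumb'/></a>";
		$i++;
	}
	return $out;
}

// Made-up sample with the same element names the function reads.
$sample = '<rss version="2.0" xmlns:media="http://search.yahoo.com/mrss/">
<channel>
<item><title>First "photo"</title><link>http://example.com/1</link>
<media:thumbnail url="http://example.com/1_s.jpg" height="75" width="75"/></item>
<item><title>Second</title><link>http://example.com/2</link>
<media:thumbnail url="http://example.com/2_s.jpg" height="75" width="75"/></item>
</channel></rss>';

echo flickrThumbsFromXml($sample, 1);
```

The parser takes care of entities and of quotes inside titles, so the output only needs one htmlspecialchars call per value.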



Feb 15 2010

Mixing bots to gain new services

Category: Php, Spiders & web bots
Giulio Pons @ 12:39 pm

Spiders and bots let you use services from other websites. This can be very cool, but it can also become a problem: you are using stuff made by other people, so is it fair? Do they know what you’re doing? Can you cause bandwidth problems for them? Do your bots respect copyright?

Well, let’s set all these problems aside and try to make the spiders’ work even cooler: you can mix two or more spiders to create something new. In this example I’ve mixed a geographic IP lookup bot and a weather (meteo) bot to get a weather service localized for the user who connects to your site.
This is a geolocated weather forecast, like the ones you can find on smartphones.
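
To give an idea of what mixing means in code, here is a minimal sketch where the output of the first bot feeds the second. It is not the actual implementation: localizedWeather() and both stand-in bots are hypothetical names with made-up data, and real bots would query remote services instead of returning fixed values.

```php
<?php
// Sketch: chain two bots, the output of the first one feeds the second one.
// Both bots are passed in as callables, so any implementation can be plugged in.
function localizedWeather($ip, $geoBot, $weatherBot) {
	$city = call_user_func($geoBot, $ip); // e.g. "Milano", or "" if unknown
	if ($city === "" || $city === null) return null; // can't localize this visitor
	return array(
		'city'    => $city,
		'weather' => call_user_func($weatherBot, $city),
	);
}

// Stand-in bots with made-up data (hypothetical, for illustration only).
$fakeGeoBot = function ($ip) {
	$known = array('203.0.113.7' => 'Milano');
	return isset($known[$ip]) ? $known[$ip] : "";
};
$fakeWeatherBot = function ($city) {
	return "sunny in " . $city;
};

print_r(localizedWeather('203.0.113.7', $fakeGeoBot, $fakeWeatherBot));
```

On a real page you would pass $_SERVER['REMOTE_ADDR'] as the IP. Keeping the two bots as separate callables means either source can be swapped when its quality turns out to be poor.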

Do you have any ideas for other mixes? You could grab restaurants and show the ones near the user, or shops… There are many iPhone applications that do this… The question is how good the result is, and that depends on how good the sources are. But these geo mixes already sound old; we have to find new mixes.



Jan 12 2010

Bot that retrieves url meta data and other infos

Category: Php, Spiders & web bots
Giulio Pons @ 2:55 pm

Given a URL, this function retrieves the page title, meta description, keywords, favicon, and an array of up to 5 images to use for links. It calls file_get_contents and then does the rest with regular expressions.

This function is included in the Mini Bots Class.

print_r(getLinksInfo("http://www.rockit.it/articolo/825/nada-studio-report-quando-nasce-una-canzone"));

function getLinksInfo($url) {
	// default values, returned as they are if the page can't be downloaded
	$data = array();
	$data['keywords']="";
	$data['description']="";
	$data['title']="";
	$data['favicon']="";
	$data['images']=array();

	$web_page = @file_get_contents($url);
	if ($web_page === false) return $data;

	// page title (may be missing)
	if (preg_match('#<title([^>]*)?>(.*)</title>#Uis', $web_page, $title_array)) $data['title'] = trim($title_array[2]);
	// meta description and keywords
	preg_match_all('#<meta([^>]*)(.*)>#Uis', $web_page, $meta_array);
	for($i=0;$i<count($meta_array[0]);$i++) {
		$name = strtolower(attr($meta_array[0][$i],"name"));
		if ($name=='description') $data['description'] = attr($meta_array[0][$i],"content");
		if ($name=='keywords') $data['keywords'] = attr($meta_array[0][$i],"content");
	}
	// favicon
	preg_match_all('#<link([^>]*)(.*)>#Uis', $web_page, $link_array);
	for($i=0;$i<count($link_array[0]);$i++) {
		if (strtolower(attr($link_array[0][$i],"rel"))=='shortcut icon') $data['favicon'] = makeabsolute($url,attr($link_array[0][$i],"href"));
	}
	// first 5 images bigger than 15000 bytes
	preg_match_all('#<img([^>]*)(.*)/?>#Uis', $web_page, $imgs_array);
	$imgs = array();
	for($i=0;$i<count($imgs_array[0]);$i++) {
		if ($src = attr($imgs_array[0][$i],"src")) {
			$src = makeabsolute($url,$src);
			if (getRemoteFileSize($src)>15000) array_push($imgs,$src);
		}
		if (count($imgs)>=5) break;
	}
	$data['images']=$imgs;

	return $data;
}

Here is the output:

Array
(
    [keywords] => Nada
    [description] => (Nada e il compagno Gerri Manzoli, foto d archivio) Nada &egrave; al Naural HeadQuarter di Ferrara per la registrazione del suo ultimo album in studio, il ventitreesimo, un nuovo capitolo che segna un ulteriore punto nella sua carriera da musicista, iniziata da giovanissima alla fine dei 60. Il titolo non &egrave; stato ancora scelto, cos&igrave; come la data d uscita. Ma possiamo anticiparvi...
    [title] => Nada Studio report - Quando nasce una canzone
    [favicon] => http://www.rockit.it/favicon.ico
    [images] => Array
        (
            [0] => http://ww2.rockit.it/rockit/immagini/Nadain2.jpg
            [1] => http://ww2.rockit.it/rockit/immagini/NadaIn3.jpg
        )

)

And here are the functions it uses:

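function attr($s,$attrname) {
	// return html attribute
	// preg_quote keeps special characters in the name from altering the pattern;
	// the leading \s avoids matching the tail of a longer attribute name
	if (preg_match('#\s('.preg_quote($attrname,'#').')\s*=\s*["\']([^"\']*)["\']#i', $s, $x)) return $x[2];
	return ""; // attribute not found
}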

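function makeabsolute($url,$link) {
	// already absolute (http or https)
	if (preg_match('#^https?://#i', $link)) return $link;
	$p = parse_url($url);
	$scheme = isset($p['scheme']) ? $p['scheme'] : "http";
	$root = $scheme."://".$p['host'];
	// root-relative link
	if (strpos($link, "/")===0) return $root.$link;
	// relative link: resolve it against the directory of the page
	$path = isset($p['path']) ? $p['path'] : "/";
	return $root.substr($path, 0, strrpos($path, "/")+1).$link;
}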

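function getRemoteFileSize($url) {
	if (substr($url,0,4)=='http') {
		$h = @get_headers($url, 1);
		if ($h === false) return 0; // host not reachable
		$h = array_change_key_case($h, CASE_LOWER);
		if (!isset($h['content-length'])) return 0; // size not declared by the server
		$x = $h['content-length'];
		// after a redirect the header is an array: take the last response
		if (is_array($x)) $x = end($x);
		return (int)$x;
	}
	return (int)@filesize($url);
}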
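
Regular expressions work on most pages, but they can trip over unusual markup (attributes without quotes, for example). As an alternative, the title and meta tags could probably be read with PHP's DOM extension. This is only a minimal sketch under that assumption, not the code included in Mini Bots Class: pageInfoFromHtml() is a hypothetical name and the sample HTML is made up.

```php
<?php
// Sketch: read title, meta description and keywords with DOMDocument
// instead of regular expressions. pageInfoFromHtml() is a hypothetical name.
function pageInfoFromHtml($html) {
	$data = array('title' => "", 'description' => "", 'keywords' => "");
	if (trim($html) === "") return $data; // nothing to parse
	$doc = new DOMDocument();
	// real-world pages are rarely valid: silence the parser warnings
	if (!@$doc->loadHTML($html)) return $data;
	$titles = $doc->getElementsByTagName('title');
	if ($titles->length > 0) $data['title'] = trim($titles->item(0)->textContent);
	foreach ($doc->getElementsByTagName('meta') as $meta) {
		$name = strtolower($meta->getAttribute('name'));
		if ($name == 'description' || $name == 'keywords') {
			// getAttribute returns the value with entities already decoded
			$data[$name] = $meta->getAttribute('content');
		}
	}
	return $data;
}

// Made-up sample page.
$sample = '<html><head><title>Nada Studio report</title>
<meta name="Keywords" content="Nada">
<meta name="description" content="Il titolo non &egrave; stato ancora scelto">
</head><body><p>...</p></body></html>';

print_r(pageInfoFromHtml($sample));
```

A side effect of this approach is that entities such as &egrave; come back already decoded, which may or may not be what you want if you print the description straight into another page.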