Feb 21 2011

Get MySpace events with a PHP function

Category: Php, Spiders & web bots
Giulio Pons @ 12:41 pm

Here is a function that reads the concerts from a MySpace band page. The code retrieves the “shows page” for a given MySpace username, then parses the HTML to find and decode the data.

Since MySpace returns the page in Italian (this probably depends on IP-based geolocation), the function uses an array of Italian month names. You will probably need to change it, or you can try to improve the function by adding a header to the curl request that specifies the page language (I think it’s possible).
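A minimal sketch of that header idea, assuming the server honours the standard Accept-Language request header (not verified against MySpace). The helper names languageHeaders and setCurlLanguage and the "en-US" default are hypothetical, chosen only for this illustration:

```php
<?php
// Hypothetical helper: builds the header list asking the server for a given language.
function languageHeaders($lang = "en-US") {
	// use the primary subtag ("en" from "en-US") as a lower-priority fallback
	$parts = explode("-", $lang);
	$primary = $parts[0];
	if ($primary === $lang) return array("Accept-Language: ".$lang);
	return array("Accept-Language: ".$lang.",".$primary.";q=0.8");
}

// Hypothetical helper: applies the headers to an existing curl handle (nothing is sent here).
function setCurlLanguage($ch, $lang = "en-US") {
	return curl_setopt($ch, CURLOPT_HTTPHEADER, languageHeaders($lang));
}
```

Inside the function you would call setCurlLanguage($ch, "en-US") before curl_exec($ch), and the months array would then have to match whatever language the page actually comes back in.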

You can watch a DEMO here.

function myspaceConcerts($user) {
	$ch = curl_init("http://www.myspace.com/".$user."/shows");
	curl_setopt($ch, CURLOPT_HTTPGET, TRUE);
	curl_setopt($ch, CURLOPT_POST, FALSE);
	curl_setopt($ch, CURLOPT_HEADER, false);
	curl_setopt($ch, CURLOPT_NOBODY, FALSE);
	curl_setopt($ch, CURLOPT_VERBOSE, FALSE);
	curl_setopt($ch, CURLOPT_REFERER, "");
	curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE);
	curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);
	curl_setopt($ch, CURLOPT_MAXREDIRS, 4);
	curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
	curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 6.1; he; rv:1.9.2.8) Gecko/20100722 Firefox/3.6.8");
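	$page = curl_exec($ch);
	curl_close($ch);
	if ($page === false) { return array(); }	// request failed, nothing to parse
	// look for band name
	preg_match_all("#<a class=\"userLink\" href=\"/".preg_quote($user,"#")."\">(.*)</a>#Us", $page, $a);
	$band = isset($a[1][0]) ? trim(strip_tags($a[1][0])) : "";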
	//
	// the months array is in Italian because my web server receives the pages in Italian;
	// you will probably have to change this array to match the response myspace sends you
	$months = array("gen"=>"01","feb"=>"02","mar"=>"03","apr"=>"04","mag"=>"05","giu"=>"06","lug"=>"07","ago"=>"08","set"=>"09","ott"=>"10","nov"=>"11","dic"=>"12");
	$out = array();
	$c=0;	// concerts counter
	$li = preg_split("/<li class=\"moduleItem event( odd| even)?( first| last)? vevent\" ?>/i",$page);
	for($i=0;$i<count($li);$i++) {
		if(stristr($li[$i],"<div class=\"entryDate\">")) {
			// find date
			preg_match_all("#<span class=\"month\">(.*)</span>#Us", $li[$i], $temp);
			$month = $months[strip_tags(trim($temp[1][0]))];
			preg_match_all("#<span class=\"day\">(.*)</span>#Us", $li[$i], $temp);
			$day = str_pad( strip_tags(trim($temp[1][0])), 2, "0", STR_PAD_LEFT);
			$year = date("Y");
			$data = $year."-".$month."-".$day;
			if($data<date("Y-m-d")) { $data = (date("Y")+1)."-".$month."-".$day; }

			// find venue
			preg_match_all("#<h4>(.*)</h4>#Us", $li[$i], $temp);
			$posto = strip_tags(trim($temp[1][0]));

			// find city
			preg_match_all("#<span class=\"locality\">(.*)</span>#Us", $li[$i], $temp);
			$citta = strip_tags(trim($temp[1][0]));

			// find region
			preg_match_all("#<span class=\"region\">(.*)</span>#Us", $li[$i], $temp);
			$region = strip_tags(trim($temp[1][0]));

			// find country
			preg_match_all("#<span class=\"country-name\">(.*)</span>#Us", $li[$i], $temp);
			$stato = strip_tags(trim($temp[1][0]));

			// build output array
			$out[$c]["band"] = $band;
			$out[$c]["date"] = $data;
			//$out[$c]["time"] = ""; not parsed
			$out[$c]["venue"] = $posto;
			//$out[$c]["url"] = ""; not parsed
			$out[$c]["where"] = $citta.",".$region.",".$stato;
			$c++;
		}
	}
	return $out;
}
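The page only prints a month and a day, so the function above has to guess the year. A small sketch of that step in isolation, so it can be tried without fetching anything: the helper name inferEventDate, the $today parameter and the English $monthsEn array are all assumptions made for this example (the English abbreviations in particular are a guess at what an English-language page would print).

```php
<?php
// Hypothetical helper: turns a month label and a day into "Y-m-d",
// assuming the event is upcoming (same rule as the function above).
// $today can be passed in for testing; it defaults to the current date.
function inferEventDate($monthAbbr, $day, $months, $today = null) {
	if ($today === null) $today = date("Y-m-d");
	$key = strtolower(trim($monthAbbr));
	if (!isset($months[$key])) return "";	// unknown month label
	$year = (int)substr($today, 0, 4);
	$md = $months[$key]."-".str_pad((string)(int)$day, 2, "0", STR_PAD_LEFT);
	$date = $year."-".$md;
	// a date that has already passed this year is taken to belong to next year
	if ($date < $today) $date = ($year + 1)."-".$md;
	return $date;
}

// English abbreviations (an assumption, not taken from a real myspace response)
$monthsEn = array("jan"=>"01","feb"=>"02","mar"=>"03","apr"=>"04","may"=>"05","jun"=>"06",
	"jul"=>"07","aug"=>"08","sep"=>"09","oct"=>"10","nov"=>"11","dec"=>"12");
```

Comparing the two "Y-m-d" strings works because that format sorts the same way alphabetically and chronologically.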

This function is included in the Mini Bot Class with many other small spiders.



May 30 2010

Parsing Flickr Feed with PHP tutorial

Category: Php, Spiders & web bots
Giulio Pons @ 10:48 pm

I spent about 30 minutes looking for a JavaScript embed that prints a custom list of thumbnails of Flickr photos, but I didn’t find anything clean enough… and I don’t have enough time to read through Flickr’s API documentation…
So I looked for the user’s feed and found it at the bottom of Flickr’s pages: I decided to grab the feed and parse it to build my custom gallery.
If you’re looking for the Flickr feed, it’s here, at the bottom:

[image: flickr feed]

If you click on the feed link, the feed opens (how it is displayed depends on your browser). If you look at the URL of the feed you clicked, you can see that it contains the “id” of the feed in Flickr’s database. Here is the id:

[image: feed id]
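If you would rather not copy the id by hand, it can also be read from the feed URL in code. This is only a sketch: the helper name flickrFeedId is hypothetical, and it assumes the id always travels in an "id" query parameter, as it does in the feed URL used further down.

```php
<?php
// Hypothetical helper: returns the "id" query parameter of a feed URL, or "" if absent.
function flickrFeedId($feedUrl) {
	$query = parse_url($feedUrl, PHP_URL_QUERY);
	if (!is_string($query)) return "";	// no query string at all
	// links copied from page source may carry "&amp;" instead of "&"
	$query = str_replace("&amp;", "&", $query);
	parse_str($query, $params);
	if (isset($params["id"]) && is_string($params["id"])) return $params["id"];
	return "";
}
```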

Take the id, then use this PHP function to grab the feed and build the list of thumbnails. The code simply outputs anchors and images, so you have to use CSS to style it as you like:

function attr($s,$attrname) { // return the value of an html attribute, or "" if it is missing
	preg_match_all('#\s*('.preg_quote($attrname,'#').')\s*=\s*["\']([^"\']*)["\']\s*#i', $s, $x);
	if (isset($x[2][0])) return $x[2][0]; else return "";
}

// id = id of the feed
// n = number of thumbs
function parseFlickrFeed($id,$n) {
	$url = "http://api.flickr.com/services/feeds/photos_public.gne?id={$id}&lang=it-it&format=rss_200";
	$s = file_get_contents($url);
	if ($s === false) return "";	// feed not reachable, nothing to print
	preg_match_all('#<item>(.*)</item>#Us', $s, $items);
	$out = "";
	for($i=0;$i<count($items[1]);$i++) {
		if($i>=$n) return $out;
		$item = $items[1][$i];
		preg_match_all('#<link>(.*)</link>#Us', $item, $temp);
		$link = $temp[1][0];
		preg_match_all('#<title>(.*)</title>#Us', $item, $temp);
		$title = $temp[1][0];
		preg_match_all('#<media:thumbnail([^>]*)>#Us', $item, $temp);
		$thumb = attr($temp[0][0],"url");
		$out.="<a href='$link' target='_blank' title=\"".str_replace('"','',$title)."\"><img src='$thumb'/></a>";
	}
	return $out;
}

// usage example:
echo parseFlickrFeed("16664181@N00",9);
// you have to use css to customize it

Like here:
[image: flickr thumbs css]
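The function above picks the feed apart with regular expressions. As an alternative, and not what the function above does, the same three fields could be read with PHP's SimpleXML extension. The sketch below assumes the thumbnails live in the Media RSS namespace (http://search.yahoo.com/mrss/); the function name flickrThumbsFromRss is hypothetical. It works on an RSS string that has already been downloaded and returns plain arrays instead of HTML.

```php
<?php
// Hypothetical helper: extracts link, title and thumbnail url of the first $n items.
function flickrThumbsFromRss($rss, $n) {
	$prev = libxml_use_internal_errors(true);	// keep parse warnings quiet
	$xml = simplexml_load_string($rss);
	libxml_use_internal_errors($prev);
	if ($xml === false) return array();	// not well-formed XML
	$out = array();
	foreach ($xml->channel->item as $item) {
		if (count($out) >= $n) break;
		// elements written as <media:...> in the feed
		$media = $item->children("http://search.yahoo.com/mrss/");
		$thumb = isset($media->thumbnail) ? (string)$media->thumbnail->attributes()->url : "";
		$out[] = array(
			"link" => (string)$item->link,
			"title" => (string)$item->title,
			"thumb" => $thumb,
		);
	}
	return $out;
}
```

The returned titles and links are raw text, so pass them through htmlspecialchars() before printing them inside anchors and images.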

This code will be added to the next version of Mini Bots Class.



Mar 10 2010

PHP: parse URLs, mailto links, and also Twitter usernames and hashtags

Category: Php
Giulio Pons @ 9:41 pm

This small function receives text as input and returns HTML with links added wherever the source text contains URLs (http://www… but also ftp://… and any other protocol), email addresses, Twitter usernames (starting with @), and Twitter hashtags (starting with #).
These replacements are done with PHP’s preg_replace function:

function parse_twitter($t) {
	// link URLs
	$t = " ".preg_replace( "/(([[:alnum:]]+:\/\/)|www\.)([^[:space:]]*)".
		"([[:alnum:]#?\/&=])/i", "<a href=\"\\1\\3\\4\" target=\"_blank\">".
		"\\1\\3\\4</a>", $t);

	// link mailtos
	$t = preg_replace( "/(([a-z0-9_]|\\-|\\.)+@([^[:space:]]*)".
		"([[:alnum:]-]))/i", "<a href=\"mailto:\\1\">\\1</a>", $t);

	// link twitter users (at least one character must follow the @)
	$t = preg_replace( "/ +@([a-z0-9_]+) ?/i", " <a href=\"http://twitter.com/\\1\" target=\"_blank\">@\\1</a> ", $t);

	// link twitter hashtags (at least one character must follow the #)
	$t = preg_replace( "/ +#([a-z0-9_]+) ?/i", " <a href=\"http://twitter.com/search?q=%23\\1\" target=\"_blank\">#\\1</a> ", $t);

	// truncates long urls that can cause display problems (optional)
	$t = preg_replace("/>(([[:alnum:]]+:\/\/)|www\.)([^[:space:]]".
		"{30,40})([^[:space:]]*)([^[:space:]]{10,20})([[:alnum:]#?\/&=])".
		"</", ">\\3...\\5\\6<", $t);
	return trim($t);
}
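The username and hashtag steps rely on a space in front of the @ or #, which is why the function prepends one to the text. A variation on those two steps, sketched here under the assumption that a name counts only when it starts the text or follows whitespace, uses a negative lookbehind instead, so the surrounding spaces are left untouched and email addresses are not affected. The function name linkTwitterNames is hypothetical.

```php
<?php
// Hypothetical helper: links @usernames and #hashtags only.
function linkTwitterNames($t) {
	// (?<!\S) means "not preceded by a non-space character":
	// true at the start of the text and right after whitespace
	$t = preg_replace('/(?<!\S)@([a-z0-9_]+)/i',
		'<a href="http://twitter.com/$1" target="_blank">@$1</a>', $t);
	$t = preg_replace('/(?<!\S)#([a-z0-9_]+)/i',
		'<a href="http://twitter.com/search?q=%23$1" target="_blank">#$1</a>', $t);
	return $t;
}
```

Because nothing is consumed around the match, two names separated by a single space are both linked.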