Feb 21 2011

Get MySpace events with a PHP function

Category: Php, Spiders & web bots
Giulio Pons @ 12:41 pm

Here is a function that reads the concerts from a MySpace band page. The code retrieves the “shows page” for a given MySpace username, then parses the HTML to find and decode the data.

Since MySpace returns the page in Italian (this probably depends on the geographic location of the IP address), the function uses an array of Italian month names. You will probably need to change it, or you can try to improve the function by adding a header to the curl request that specifies the page language (I think it’s possible).
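A minimal sketch of the header idea, assuming the server honors the standard Accept-Language request header (this was not verified against MySpace). The helper name languageHeaders and the "someband" username are hypothetical; CURLOPT_HTTPHEADER is the regular curl option for sending extra headers.

```php
<?php
// Hypothetical helper (not part of the original function): build the header
// list that asks for a given language, with the primary language as fallback.
function languageHeaders($lang) {
	$primary = strtok($lang, "-");	// "en-US" gives "en"
	$value = ($primary === $lang) ? $lang : $lang.",".$primary.";q=0.8";
	return array("Accept-Language: ".$value);
}

// Usage, guarded so the snippet still loads where the curl extension is missing.
if (function_exists("curl_init")) {
	$ch = curl_init("http://www.myspace.com/someband/shows");	// placeholder username
	curl_setopt($ch, CURLOPT_HTTPHEADER, languageHeaders("en-US"));
	// ...set the other options and call curl_exec($ch) as in myspaceConcerts()
}
```

If the page does come back in another language, the months array has to be changed to match it.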

You can watch a DEMO here.

function myspaceConcerts($user) {
	$ch = curl_init("http://www.myspace.com/".$user."/shows");
	curl_setopt($ch, CURLOPT_HTTPGET, TRUE);
	curl_setopt($ch, CURLOPT_POST, FALSE);
	curl_setopt($ch, CURLOPT_HEADER, false);
	curl_setopt($ch, CURLOPT_NOBODY, FALSE);
	curl_setopt($ch, CURLOPT_VERBOSE, FALSE);
	curl_setopt($ch, CURLOPT_REFERER, "");
	curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE);
	curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);
	curl_setopt($ch, CURLOPT_MAXREDIRS, 4);
	curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
	curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 6.1; he; rv:1.9.2.8) Gecko/20100722 Firefox/3.6.8");
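	$page = curl_exec($ch);
	if ($page === false) return array();	// request failed: no concerts to report
	// look for band name (the username is escaped before going into the pattern)
	preg_match_all("#<a class=\"userLink\" href=\"/".preg_quote($user, "#")."\">(.*)</a>#Us", $page, $a);
	$band = isset($a[1][0]) ? trim(strip_tags($a[1][0])) : "";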
	//
	// months array is in italian because from my web server pages come in italian
	// probably you have to change this array to match myspace response
	$months = array("gen"=>"01","feb"=>"02","mar"=>"03","apr"=>"04","mag"=>"05","giu"=>"06","lug"=>"07","ago"=>"08","set"=>"09","ott"=>"10","nov"=>"11","dic"=>"12");
	$out = array();
	$c=0;	// concerts counter
	$li = preg_split("/<li class=\"moduleItem event( odd| even)?( first| last)? vevent\" ?>/i",$page);
	for($i=0;$i<count($li);$i++) {
		if(stristr($li[$i],"<div class=\"entryDate\">")) {
			// find date
			preg_match_all("#<span class=\"month\">(.*)</span>#Us", $li[$i], $temp);
			$month = $months[strip_tags(trim($temp[1][0]))];
			preg_match_all("#<span class=\"day\">(.*)</span>#Us", $li[$i], $temp);
			$day = str_pad( strip_tags(trim($temp[1][0])), 2, "0", STR_PAD_LEFT);
			$year = date("Y");
			$data = $year."-".$month."-".$day;
			if($data<date("Y-m-d")) { $data = (date("Y")+1)."-".$month."-".$day; }

			// find venue
			preg_match_all("#<h4>(.*)</h4>#Us", $li[$i], $temp);
			$posto = strip_tags(trim($temp[1][0]));

			// find city
			preg_match_all("#<span class=\"locality\">(.*)</span>#Us", $li[$i], $temp);
			$citta = strip_tags(trim($temp[1][0]));

			// find region
			preg_match_all("#<span class=\"region\">(.*)</span>#Us", $li[$i], $temp);
			$region = strip_tags(trim($temp[1][0]));

			// find country
			preg_match_all("#<span class=\"country-name\">(.*)</span>#Us", $li[$i], $temp);
			$stato = strip_tags(trim($temp[1][0]));

			// build output array
			$out[$c]["band"] = $band;
			$out[$c]["date"] = $data;
			//$out[$c]["time"] = ""; not parsed
			$out[$c]["venue"] = $posto;
			//$out[$c]["url"] = ""; not parsed
			$out[$c]["where"] = $citta.",".$region.",".$stato;
			$c++;
		}
	}
	return $out;
}
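The shows page gives only a month and a day, so the function above guesses the year: it uses the current year, and moves to the next year if that date is already past. Here is the same rule as a small stand-alone sketch. The name inferEventDate and the $today parameter are hypothetical, added only so the rule can be tried with a fixed date.

```php
<?php
// Hypothetical helper: turn a month abbreviation and a day into "YYYY-MM-DD",
// picking the first date that is not before $today (also "YYYY-MM-DD").
function inferEventDate($monthName, $day, $months, $today) {
	$key = strtolower(trim($monthName));
	if (!isset($months[$key])) {
		return null;	// unknown month name: the months array needs another language
	}
	$year = (int)substr($today, 0, 4);
	$date = sprintf("%04d-%s-%02d", $year, $months[$key], (int)$day);
	if ($date < $today) {
		// ISO dates compare correctly as strings; a past date means next year
		$date = sprintf("%04d-%s-%02d", $year + 1, $months[$key], (int)$day);
	}
	return $date;
}

$months = array("gen"=>"01","feb"=>"02","mar"=>"03");
echo inferEventDate("mar", "3", $months, "2011-02-21"), "\n";	// still to come this year
echo inferEventDate("gen", "5", $months, "2011-02-21"), "\n";	// already past, so next year
```

Returning null for an unknown month makes it easier to notice when the page arrives in a language the array does not cover.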

This function is included in the Mini Bot Class with many other small spiders.



Sep 09 2010

How to change twitter status with php and curl without oAuth

Category: Php
Giulio Pons @ 6:43 pm

Twitter api authentication
On August 31, 2010, Twitter made its API more secure by disabling basic authentication calls.
So, if you used basic authentication, you have to change your code and implement the oAuth authentication model (you can read about oAuth on Wikipedia).

The oAuth authentication model
Using oAuth means that you have to register an application on the Twitter developer site, understand the new model, and implement it (here they suggest how to move to oAuth). In the oAuth model, authenticating a user means sending the user to Twitter’s site, where he is recognized and has to grant permission to your application by clicking an “Allow” button. After this step the user is redirected to the application’s web site with a “token” and a “secret”, and the application that calls the API to post or read has to authenticate each call by adding parameters created from the “token” and the “secret”. Before this, a few lines of curl were enough to change the status; now we need a lot of code, and we really need a user who clicks the “Allow” button, at least the first time (then we can store the “token” and “secret” for each user in a database and reuse them for each call).
Well, this process is also much more complex, so it’s not realistic to think: “I’ll do it by myself!”
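To give an idea of the “parameters created with token and secret”, here is a rough sketch of only the signing step, following the HMAC-SHA1 method of OAuth 1.0a. It is not a working client: the function names, the example URL and all the values are made up, and duplicate parameter names are not handled.

```php
<?php
// Signature base string: METHOD & encoded URL & encoded, sorted parameters.
function oauthBaseString($method, $url, $params) {
	$pairs = array();
	foreach ($params as $k => $v) {
		$pairs[rawurlencode($k)] = rawurlencode($v);
	}
	ksort($pairs, SORT_STRING);	// parameters are sorted by encoded name
	$parts = array();
	foreach ($pairs as $k => $v) {
		$parts[] = $k."=".$v;
	}
	return strtoupper($method)."&".rawurlencode($url)."&".rawurlencode(implode("&", $parts));
}

// The signing key joins the application secret and the user's token secret.
function oauthSignature($baseString, $consumerSecret, $tokenSecret) {
	$key = rawurlencode($consumerSecret)."&".rawurlencode($tokenSecret);
	return base64_encode(hash_hmac("sha1", $baseString, $key, true));
}

// Example with invented values (a real call also needs nonce, timestamp, etc.).
$base = oauthBaseString("post", "https://api.example.com/update", array("b" => "2", "a" => "1 x"));
echo $base, "\n";
echo oauthSignature($base, "app-secret", "user-secret"), "\n";
```

Every API call needs a fresh signature like this one, plus the redirect and token exchange described above, which is why an existing class is the easier choice.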

Do It Yourself oAuth
I’m really stubborn, and I spent almost a whole day trying to write my own mini-function to post to Twitter. But I only implemented a small part of the process, that part doesn’t fully work, and it wasn’t “mini” enough to include in my mini PHP spider class.
So, if you want to use oAuth, I suggest you use an existing class, especially if you don’t have a week of free time to spend.

Spider, the rough way
Thus, I did it the rough way: I wrote a bot that calls the Twitter home page, finds the login form, fills it in with my credentials, and posts it. This post returns another page, my home page, so my bot finds the tweet form, fills it in, and sends it. One hour of work, no oAuth. And here is the PHP function:

function twitterSetStatus($user,$pwd,$status) {
	if (!function_exists("curl_init")) die("twitterSetStatus needs CURL module, please install CURL on your php.");
	$ch = curl_init();

	// -------------------------------------------------------
	// get login form and parse it
	curl_setopt($ch, CURLOPT_URL, "https://mobile.twitter.com/session/new");
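	// insecure: certificate checks are off while a password is sent; set this to true if your CA bundle is configured
	curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);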
	curl_setopt($ch, CURLOPT_FAILONERROR, 1);
	curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
	curl_setopt($ch, CURLOPT_RETURNTRANSFER,1);
	curl_setopt($ch, CURLOPT_TIMEOUT, 5);
	curl_setopt($ch, CURLOPT_COOKIEJAR, "my_cookies.txt");
	curl_setopt($ch, CURLOPT_COOKIEFILE, "my_cookies.txt");
	curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (iPhone; U; CPU like Mac OS X; en) AppleWebKit/420+ (KHTML, like Gecko) Version/3.0 Mobile/1A543a Safari/419.3 ");
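	$page = curl_exec($ch);
	if ($page === false) return false;	// request failed
	$page = stristr($page, "<div class='signup-body'>");
	preg_match("/form action=\"(.*?)\"/", $page, $action);
	preg_match("/input name=\"authenticity_token\" type=\"hidden\" value=\"(.*?)\"/", $page, $authenticity_token);
	if (!isset($action[1], $authenticity_token[1])) return false;	// login form not found, the markup probably changed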

	// -------------------------------------------------------
	// make login and get home page
	$strpost = "authenticity_token=".urlencode($authenticity_token[1])."&username=".urlencode($user)."&password=".urlencode($pwd);
	curl_setopt($ch, CURLOPT_URL, $action[1]);
	curl_setopt($ch, CURLOPT_POSTFIELDS, $strpost);
	$page = curl_exec($ch);
	// check if login was ok
	preg_match("/\<div class=\"warning\"\>(.*?)\<\/div\>/", $page, $warning);
	if (isset($warning[1])) return $warning[1];
	$page = stristr($page,"<div class='tweetbox'>");
	preg_match("/form action=\"(.*?)\"/", $page, $action);
	preg_match("/input name=\"authenticity_token\" type=\"hidden\" value=\"(.*?)\"/", $page, $authenticity_token);

	// -------------------------------------------------------
	// send status update
	$tweet['display_coordinates']='';
	$tweet['in_reply_to_status_id']='';
	$tweet['lat']='';
	$tweet['long']='';
	$tweet['place_id']='';
	$tweet['text']=$status;
	$ar = array("authenticity_token" => $authenticity_token[1], "tweet"=>$tweet);
	$data = http_build_query($ar);
	curl_setopt($ch, CURLOPT_URL, $action[1]);
	curl_setopt($ch, CURLOPT_POSTFIELDS, $data);
	$page = curl_exec($ch);

	// with CURLOPT_FAILONERROR set, curl_exec returns false on an HTTP error
	return $page !== false;
}
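The two parsing steps in the function (find the form action, find the hidden authenticity_token) depend on the exact attribute order and quoting of Twitter’s HTML. Here is a slightly more tolerant sketch of that step as a separate helper. The name extractFormFields is hypothetical and the sample HTML is invented, not Twitter’s real markup.

```php
<?php
// Hypothetical helper: return the form action and the hidden token found in
// $html, accepting single or double quotes and any attribute order, or null.
function extractFormFields($html) {
	if (!preg_match('/<form[^>]*\saction=(["\'])(.*?)\1/is', $html, $action)) {
		return null;	// no form with an action attribute
	}
	if (!preg_match('/<input[^>]*\sname=(["\'])authenticity_token\1[^>]*>/is', $html, $input)) {
		return null;	// no token field
	}
	if (!preg_match('/\svalue=(["\'])(.*?)\1/is', $input[0], $value)) {
		return null;	// token field without a value
	}
	return array(
		"action" => html_entity_decode($action[2]),
		"token"  => html_entity_decode($value[2]),
	);
}

// Example with made-up markup
$html = '<form action="/session" method="post"><input name="authenticity_token" type="hidden" value="abc123" /></form>';
$form = extractFormFields($html);
echo $form["action"], " ", $form["token"], "\n";
```

A null result is a cheap way to detect that the page layout changed, instead of posting an empty token.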

Bad things
As you can see from the code, some things are not good: we need to make 3 calls, because logging in requires a hidden authenticity token taken from the login form. This means that the function is slow.
Twitter does not like these methods.
If Twitter changes its HTML, this function could stop working.

Mini bot class
This function will be included in the Mini bot class; you can try a demo here.



Aug 29 2010

PHP bot to get wikipedia definitions

Category: Php, Spiders & web bots
Giulio Pons @ 3:14 pm

Wikipedia, the collaborative and multilingual encyclopedia project, has a lot of useful terms defined in its database: you can find information on artists, cities, medical terms, cars, brands… almost everything.
If you need to add some content to your pages without having that content in your database, you can use the Wikipedia API or a Google define query (there’s probably a Google API too). For example, you may need to automatically add a short description to a city name or a band name, or the definition of some technological terms. You can do all of these things with Wikipedia, since its API lets you do it easily.

The PHP job is simple: we use CURL to call the API, which returns an XML response; we parse it and get the definition.
Here is the code that does it for the Italian Wikipedia; you can modify the URL to match the Wikipedia site for your language:

function wikidefinition($s) {
	$url = "http://it.wikipedia.org/w/api.php?action=opensearch&search=".urlencode($s)."&format=xml&limit=1";
	$ch = curl_init($url);
	curl_setopt($ch, CURLOPT_HTTPGET, TRUE);
	curl_setopt($ch, CURLOPT_POST, FALSE);
	curl_setopt($ch, CURLOPT_HEADER, false);
	curl_setopt($ch, CURLOPT_NOBODY, FALSE);
	curl_setopt($ch, CURLOPT_VERBOSE, FALSE);
	curl_setopt($ch, CURLOPT_REFERER, "");
	curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);
	curl_setopt($ch, CURLOPT_MAXREDIRS, 4);
	curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
	curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 6.1; he; rv:1.9.2.8) Gecko/20100722 Firefox/3.6.8");
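	$page = curl_exec($ch);
	if ($page === false) return "";	// request failed
	$xml = simplexml_load_string($page);	// returns false if the response is not valid XML
	if($xml !== false && (string)$xml->Section->Item->Description) {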
		return array((string)$xml->Section->Item->Text, (string)$xml->Section->Item->Description, (string)$xml->Section->Item->Url);
	} else {
		return "";
	}
}
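Two small sketches around the function above. The first builds the same API URL for any language subdomain instead of the hard-coded "it". The second shows one way to handle the return value, which is an array (title, description, URL) on success and an empty string otherwise. Both names, wikiApiUrl and describeDefinition, are hypothetical.

```php
<?php
// Hypothetical helper: same query as wikidefinition(), with the language
// subdomain as a parameter. Falls back to "it" if the code looks wrong.
function wikiApiUrl($term, $lang = "it") {
	if (!preg_match('/^[a-z]+(-[a-z]+)*$/', $lang)) {
		$lang = "it";
	}
	$query = http_build_query(array(
		"action" => "opensearch",
		"search" => $term,
		"format" => "xml",
		"limit"  => 1,
	), "", "&");
	return "http://".$lang.".wikipedia.org/w/api.php?".$query;
}

// Hypothetical helper: turn the result of wikidefinition() into one line of text.
function describeDefinition($result) {
	if (!is_array($result)) {
		return "No definition found.";	// wikidefinition() returned ""
	}
	list($title, $description, $url) = $result;
	return $title.": ".$description." (".$url.")";
}

echo wikiApiUrl("Roma antica", "en"), "\n";
echo describeDefinition(""), "\n";
```

Checking with is_array() keeps the calling code from indexing into the empty string that signals “nothing found”.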

This code will be added to the MINI BOTS CLASS.


