Mar 01 2010

PHP curl bot to update Facebook status

Category: Php,Spiders & web botsGiulio Pons @ 10:30 pm

I’ve found this great mini bot from Alste blog, and I’ve decided to add it to the mini bot class. This bot uses curl to connect to facebook mobile (m.facebook.com) and perform the login. Then it saves the cookies received from mobile facebook and go to the facebook mobile homepage where it sets the status making a post.
I’ve tried to make the same thing with the normal facebook, but it didn’t work. I think that mobile facebook is simpler and easier to make bots working. I’ve added this bot to the Mini Bot PHP class.

NEW
Today (28/07/2010) I’ve modified the function to handle more variables from facebook forms and it seems to work again.

//
// change Facebook status with curl
// Thanks to Alste (curl stuff inspired by nexdot.net/blog)
function setFacebookStatus($status, $login_email, $login_pass, $debug=false) {
	//CURL stuff
	//This executes the login procedure
	$ch = curl_init();
	curl_setopt($ch, CURLOPT_URL, 'https://login.facebook.com/login.php?m&next=http%3A%2F%2Fm.facebook.com%2Fhome.php');
	curl_setopt($ch, CURLOPT_POSTFIELDS, 'email=' . urlencode($login_email) . '&pass=' . urlencode($login_pass) . '&login=' . urlencode("Log in"));
	curl_setopt($ch, CURLOPT_POST, 1);
	curl_setopt($ch, CURLOPT_HEADER, 0);
	//curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
	curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
	curl_setopt($ch, CURLOPT_COOKIEJAR, "my_cookies.txt");
	curl_setopt($ch, CURLOPT_COOKIEFILE, "my_cookies.txt");
	curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
	//make sure you put a popular web browser here (signature for your web browser can be retrieved with 'echo $_SERVER['HTTP_USER_AGENT'];'
	curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9.0.12) Gecko/2009070611 Firefox/3.0.12");
	curl_exec($ch);

	//This executes the status update
	curl_setopt($ch, CURLOPT_POST, 0);
	curl_setopt($ch, CURLOPT_URL, 'http://m.facebook.com/home.php');
	$page = curl_exec($ch);

	//echo htmlspecialchars($page);

	curl_setopt($ch, CURLOPT_POST, 1);
	//this gets the post_form_id value
	preg_match("/input type=\"hidden\" name=\"post_form_id\" value=\"(.*?)\"/", $page, $form_id);
	preg_match("/input type=\"hidden\" name=\"fb_dtsg\" value=\"(.*?)\"/", $page, $fb_dtsg);
	preg_match("/input type=\"hidden\" name=\"charset_test\" value=\"(.*?)\"/", $page, $charset_test);
	preg_match("/input type=\"submit\" class=\"button\" name=\"update\" value=\"(.*?)\"/", $page, $update);

	//we'll also need the exact name of the form processor page
	//preg_match("/form action=\"(.*?)\"/", $page, $form_num);
	//sometimes doesn't work so we search the correct form action to use
	//since there could be more than one form in the page.
	preg_match_all("#<form([^>]*)>(.*)</form>#Ui", $page, $form_ar);
	for($i=0;$i<count($form_ar[0]);$i++)
		if(stristr($form_ar[0][$i],"post_form_id")) preg_match("/form action=\"(.*?)\"/", $page, $form_num); 	

	$strpost = 'post_form_id=' . $form_id[1] . '&status=' . urlencode($status) . '&update=' . urlencode($update[1]) . '&charset_test=' . urlencode($charset_test[1]) . '&fb_dtsg=' . urlencode($fb_dtsg[1]);
	if($debug) {
		echo "Parameters sent: ".$strpost."<hr>";
	}
	curl_setopt($ch, CURLOPT_POSTFIELDS, $strpost );

	//set url to form processor page
	curl_setopt($ch, CURLOPT_URL, 'http://m.facebook.com' . $form_num[1]);
	curl_exec($ch);

	if ($debug) {
		//show information regarding the request
		print_r(curl_getinfo($ch));
		echo curl_errno($ch) . '-' . curl_error($ch);
		echo "<br><br>Your Facebook status seems to have been updated.";
	}
	//close the connection
	curl_close($ch);
}
Share

Tags: , , , , ,


Mar 01 2010

PHP to get twitter infos and avatar

Category: Php,Spiders & web botsGiulio Pons @ 10:23 pm

I’ve just updated the Mini Bot Php Class with an improved version of the twitterInfo function, here is the code of the new function, I’ve added the avatar image url:

/* this function is part of the Mini Bot Class */

	//
	// get twitter infos from nickname
	// and get avatar url
	public function twitterInfo($nick) {
		$user_agent = "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)";
		$ch = curl_init();    // initialize curl handle
		curl_setopt($ch, CURLOPT_URL, "http://twitter.com/$nick"); // set url to post to
		curl_setopt($ch, CURLOPT_FAILONERROR, 1);              // Fail on errors
		curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);    // allow redirects
		curl_setopt($ch, CURLOPT_RETURNTRANSFER,1); // return into a variable
		curl_setopt($ch, CURLOPT_PORT, 80);            //Set the port number
		curl_setopt($ch, CURLOPT_TIMEOUT, 15); // times out after 15s
		curl_setopt($ch, CURLOPT_USERAGENT, $user_agent);
		$document = curl_exec($ch);
		preg_match_all('#<div class="stats">(.*)</div>#Uis', $document, $stats);
		preg_match_all('#<span[^>]*?>(.*)</span>#Uis', $stats[1][0], $spans);
		$o = array();
		for ($i=0;$i<count($spans[0]);$i++) {
			if ($this->attr($spans[0][$i],"id")=="following_count") $o['following'] = $spans[1][$i];
			if ($this->attr($spans[0][$i],"id")=="follower_count") $o['follower'] = $spans[1][$i];
			if ($this->attr($spans[0][$i],"id")=="lists_count") $o['lists'] = $spans[1][$i];
		}
		$o['avatar'] = "";
		preg_match_all('#<img [^>]*?>#Uis', $document, $t);
		for ($i=0;$i<count($t[0]);$i++) if (attr($t[0][$i],"id")=="profile-image") $o['avatar'] = attr($t[0][$i],"src");
		return $o;
	}
Share

Tags: , , , , ,


Jan 16 2010

PHP Web page to text function

Category: Php,Spiders & web botsGiulio Pons @ 2:55 pm

I’ve found this nice small bot on the www.php.net site, thanks to the author of the script on the preg_replace page.
This bot returns the text content of a url and it could be used to take text from a site and find relevant words to search.

It’s nice because it uses CURL and let us see some nice stuff that CURL does:

  1. hide himself presenting with a specific user agent, to seem a browser and not a spider
  2. follows redirect (this means that you can call this function with a tiny url to retrieve the text in the real page!)
  3. use very powerful regular expression to remove html tags, but also javascript and styles (they remain if you simply do a strip_tags)

This script will be included in the next version of the Mini Bots PHP Class.

function webpage2txt($url) {
	$user_agent = "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)";

	$ch = curl_init();    // initialize curl handle
	curl_setopt($ch, CURLOPT_URL, $url); // set url to post to
	curl_setopt($ch, CURLOPT_FAILONERROR, 1);              // Fail on errors
	curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);    // allow redirects
	curl_setopt($ch, CURLOPT_RETURNTRANSFER,1); // return into a variable
	curl_setopt($ch, CURLOPT_PORT, 80);            //Set the port number
	curl_setopt($ch, CURLOPT_TIMEOUT, 15); // times out after 15s

	curl_setopt($ch, CURLOPT_USERAGENT, $user_agent);

	$document = curl_exec($ch);

	$search = array('@<script[^>]*?>.*?</script>@si',  // Strip out javascript
		'@<style[^>]*?>.*?</style>@siU',    // Strip style tags properly
		'@<[\/\!]*?[^<>]*?>@si',            // Strip out HTML tags
		'@<![\s\S]*?–[ \t\n\r]*>@',         // Strip multi-line comments including CDATA
		'/\s{2,}/',
	);

	$text = preg_replace($search, "\n", html_entity_decode($document));

	$pat[0] = "/^\s+/";
	$pat[2] = "/\s+\$/";
	$rep[0] = "";
	$rep[2] = " ";

	$text = preg_replace($pat, $rep, trim($text));

	return $text;
}

echo webpage2txt("http://www.rockit.it");
Share

Tags: , , , ,


« Previous PageNext Page »