PHP: parse URLs, email addresses, and Twitter usernames and hashtags

This small function takes text as input and returns HTML with links wherever the source text contains URLs (http://www… but also ftp://… and any other protocol), email addresses, Twitter usernames (starting with @), and Twitter hashtags (starting with #).
The replacements are done with PHP’s preg_replace function:

function parse_twitter($t) {
	// link URLs
	$t = " ".preg_replace( "/(([[:alnum:]]+:\/\/)|www\.)([^[:space:]]*)".
		"([[:alnum:]#?\/&=])/i", "<a href=\"\\1\\3\\4\" target=\"_blank\">".
		"\\1\\3\\4</a>", $t);

	// link mailtos
	$t = preg_replace( "/(([a-z0-9_]|\\-|\\.)+@([^[:space:]]*)".
		"([[:alnum:]-]))/i", "<a href=\"mailto:\\1\">\\1</a>", $t);

	// link Twitter usernames (@name)
	$t = preg_replace( "/ +@([a-z0-9_]*) ?/i", " <a href=\"http://twitter.com/\\1\" target=\"_blank\">@\\1</a> ", $t);

	// link Twitter hashtags (#tag)
	$t = preg_replace( "/ +#([a-z0-9_]*) ?/i", " <a href=\"http://twitter.com/search?q=%23\\1\" target=\"_blank\">#\\1</a> ", $t);

	// truncate long URLs that can cause display problems (optional)
	$t = preg_replace("/>(([[:alnum:]]+:\/\/)|www\.)([^[:space:]]".
		"{30,40})([^[:space:]]*)([^[:space:]]{10,20})([[:alnum:]#?\/&=])".
		"</", ">\\3...\\5\\6<", $t);
	return trim($t);
}
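
One thing to be aware of in the URL step: when a match starts with www. instead of a scheme, the generated href is just www.…, which browsers resolve as a relative link. A possible workaround is sketched below, assuming PHP 5.3+ for the closure. The helper name link_urls() is hypothetical and not part of the function above; it reuses the same pattern and only adds http:// to the href when no scheme was matched.

```php
<?php
// Hypothetical variant of the "link URLs" step (not the original code).
function link_urls($text) {
    return preg_replace_callback(
        '/(([[:alnum:]]+:\/\/)|www\.)([^[:space:]]*)([[:alnum:]#?\/&=])/i',
        function ($m) {
            // Same pieces the original replacement uses: \1\3\4
            $url = $m[1] . $m[3] . $m[4];
            // $m[2] holds the scheme; it is empty when the match began with "www."
            $href = ($m[2] === '') ? 'http://' . $url : $url;
            return '<a href="' . $href . '" target="_blank">' . $url . '</a>';
        },
        $text
    );
}

echo link_urls('see www.example.com now'), "\n";
```

The visible link text is left unchanged; only the href gets the assumed http:// prefix.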
  1. How many times a web link has been shared on Twitter
  2. Make square thumbs or cropped thumbs with php
  3. Truncate string preserving some words in PHP
  4. PHP Web page to text function
  5. PHP google images mini bot

6 comments

  1. Aiko

    Another great script, Giulio. I’m just having a few “problems”, and regular expressions always confuse me, so I don’t know how to fix them.

    I made a test page at http://blog.atgp.nl/parsetest.php so you can see what I mean. If I put two or more hashtags or Twitter usernames right after each other, the script seems to skip every second one.

    Maybe it’s just a small issue?

  2. Aiko

    Ah, never mind, I found a solution that works. Change

    "/ +@([a-z0-9_]*) ?/i"

    by removing the space before the ?, so that it becomes

    "/ +@([a-z0-9_]*)?/i"

    Now everything works correctly. I’ve removed the test page.

  3. Giulio Pons

    Mmm, are you sure? I haven’t tested it. That space is followed by a ?, which means the expression matches even if the preceding space is not there. If you remove the space, the ? applies to the group instead, so the expression matches even if the block ([a-z0-9_]*) is not there. Mmm.

  4. Aiko

    As I said, regular expressions are not my thing :-) so I don’t really understand what I’m doing – it’s mainly a case of trial and error.

    But it seems to work with the changes I made. I’ve put the test page back online so you can see for yourself:
    http://blog.atgp.nl/parsetest.php

    I don’t mind an alternative solution :-)
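
The skipped usernames discussed in the comments come from the optional trailing space in the pattern. In "@alice @bob" the first match consumes the space after @alice, so @bob no longer has the leading space the pattern requires. One possible alternative, sketched here rather than taken from the original post, is to check the preceding character with a lookbehind and not consume any whitespace at all. The helper name link_mentions_and_tags() is hypothetical, and the sketch also requires at least one character after @ or #.

```php
<?php
// Hypothetical helper, not part of the original post.
function link_mentions_and_tags($text) {
    // (?<!\S) asserts that the previous character is not a non-space
    // character (start of string or whitespace). It consumes nothing,
    // so the space between "@alice" and "@bob" is left for the next match.
    $text = preg_replace(
        '/(?<!\S)@([a-z0-9_]+)/i',
        '<a href="http://twitter.com/$1" target="_blank">@$1</a>',
        $text
    );
    // Same idea for hashtags.
    return preg_replace(
        '/(?<!\S)#([a-z0-9_]+)/i',
        '<a href="http://twitter.com/search?q=%23$1" target="_blank">#$1</a>',
        $text
    );
}

echo link_mentions_and_tags('@alice @bob #php #regex'), "\n";
```

Because an @ inside an email address is preceded by a non-space character, addresses such as me@example.com are left alone by this step.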
