PHP: parse URLs, email addresses, and Twitter usernames and hashtags

This small function takes a text as input and returns HTML with links wherever the source text contains URLs (http://www… but also ftp://… and any other protocol), email addresses, Twitter usernames (starting with @) and Twitter hashtags (starting with #).
The replacements are done with PHP's preg_replace function:

function parse_twitter($t) {
	// link URLs
	$t = " ".preg_replace( "/(([[:alnum:]]+:\/\/)|www\.)([^[:space:]]*)".
		"([[:alnum:]#?\/&=])/i", "<a href=\"\\1\\3\\4\" target=\"_blank\">".
		"\\1\\3\\4</a>", $t);

	// link mailtos
	$t = preg_replace( "/(([a-z0-9_]|\\-|\\.)+@([^[:space:]]*)".
		"([[:alnum:]-]))/i", "<a href=\"mailto:\\1\">\\1</a>", $t);

	// link Twitter usernames
	$t = preg_replace( "/ +@([a-z0-9_]*) ?/i", " <a href=\"http://twitter.com/\\1\" target=\"_blank\">@\\1</a> ", $t);

	// link Twitter hashtags
	$t = preg_replace( "/ +#([a-z0-9_]*) ?/i", " <a href=\"http://twitter.com/search?q=%23\\1\" target=\"_blank\">#\\1</a> ", $t);

	// truncate long URLs that could cause display problems (optional)
	$t = preg_replace("/>(([[:alnum:]]+:\/\/)|www\.)([^[:space:]]".
		"{30,40})([^[:space:]]*)([^[:space:]]{10,20})([[:alnum:]#?\/&=])".
		"</", ">\\3...\\5\\6<", $t);
	return trim($t);
}
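
One detail of the URL step: when the match starts with www. rather than a scheme, the href is written without a scheme (href="www.example.com"), which a browser resolves relative to the current page. Below is a minimal sketch of one way to handle that, using preg_replace_callback with the same pattern as above. The helper name link_urls is hypothetical and not part of the original function, and the closure syntax assumes PHP 5.3 or later.

```php
<?php
// Hypothetical helper (not from the original post): same URL pattern as
// parse_twitter(), but bare "www." addresses get "http://" in the href.
function link_urls($t) {
    $pattern = '/(([[:alnum:]]+:\/\/)|www\.)([^[:space:]]*)([[:alnum:]#?\/&=])/i';
    return preg_replace_callback($pattern, function ($m) {
        // $m[1] is "scheme://" or "www.", $m[3] and $m[4] are the rest.
        $shown = $m[1] . $m[3] . $m[4];
        // $m[2] holds the "scheme://" part and is empty for "www." matches.
        $href = empty($m[2]) ? 'http://' . $shown : $shown;
        return '<a href="' . $href . '" target="_blank">' . $shown . '</a>';
    }, $t);
}

echo link_urls("see www.example.com/page now");
```

The visible link text stays as typed; only the href changes. Defaulting to http:// is an assumption of this sketch.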

6 comments

  1. Aiko

    Another great script, Giulio. I'm just having a few “problems”, and regular expressions always confuse me, so I don’t know how to fix them.

    I made a test page at http://blog.atgp.nl/parsetest.php so you can see what I mean. If I put two or more hashtags or Twitter usernames right after each other, the script seems to skip every second one.

    Maybe it’s just a small issue?

  2. Aiko

    Ah, never mind, I found a solution that works. Take

    +@([a-z0-9_]*) ?/i"

    and remove the space, so that it becomes

    +@([a-z0-9_]*)?/i"

    Now everything works correctly. I've removed the test page.

  3. Giulio Pons

    Mmm, are you sure? I haven’t tested it. That space is followed by a ?, which means the expression matches even if the space isn’t there. If you remove the space, the ? applies to the block ([a-z0-9_]*) instead, so the expression matches even if that block isn’t there. Mmm.
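
The two readings of ? described above can be checked with preg_match_all. This is only a small sketch; the test string is made up for illustration and is not taken from the test page.

```php
<?php
// Illustrative input (an assumption): three usernames in a row, with a
// leading space like the one parse_twitter() prepends.
$text = ' @alice @bob @carol';

// Original pattern: the optional trailing space is part of each match,
// so it is no longer available as the leading space of the next match.
preg_match_all('/ +@([a-z0-9_]*) ?/i', $text, $original);

// Variant without the space: the ? now applies to the capture group and
// the space after the username is left unmatched.
preg_match_all('/ +@([a-z0-9_]*)?/i', $text, $variant);

print_r($original[1]); // alice, carol
print_r($variant[1]);  // alice, bob, carol
```

So the description of ? holds in both cases. The variant works because the space after each username is no longer consumed, so the next match can still find its leading space.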

  4. Aiko

    As I said, regular expressions are not my thing :-) so I don’t really understand what I’m doing – it’s mainly a case of trial and error.

    But it seems to work with the changes I made. I’ve put the test page back online so you can see for yourself:
    http://blog.atgp.nl/parsetest.php

    I don’t mind an alternative solution :-)
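
As one possible alternative, here is a minimal sketch that captures whatever precedes the @ or # (start of text or one whitespace character) and writes it back unchanged, so nothing after the name is consumed. The helper name link_mentions_and_tags is hypothetical. Unlike the original patterns, it uses + instead of *, so a lone @ or # is not linked.

```php
<?php
// Hypothetical helper, not part of the original function: links
// @usernames and #hashtags while leaving surrounding whitespace as is.
function link_mentions_and_tags($t) {
    // (^|\s) matches the start of the text or one whitespace character
    // and is restored through $1; the match ends with the name itself.
    $t = preg_replace('/(^|\s)@([a-z0-9_]+)/i',
        '$1<a href="http://twitter.com/$2" target="_blank">@$2</a>', $t);
    $t = preg_replace('/(^|\s)#([a-z0-9_]+)/i',
        '$1<a href="http://twitter.com/search?q=%23$2" target="_blank">#$2</a>', $t);
    return $t;
}

echo link_mentions_and_tags("@alice @bob #php #regex");
```

Because the leading whitespace is restored rather than replaced with a fixed space, the text needs no extra space prepended and the output keeps the original spacing. An @ inside an email address is not preceded by whitespace, so it is left alone.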