Extracting Twitter Usertags using Regex

Take the following tweet as an example:

@shahmirj can be found on hello@example.com and @r2d2 but @000 dosent exist as a user

We need to match only @shahmirj and @r2d2, and leave out anything that starts with a number or is part of an email address. To do this we use the following regex:

(?<=^|(?<=[^a-zA-Z0-9-_\.]))@([A-Za-z]+[A-Za-z0-9_]+)

The best way to understand the regex above is to start to the right of the @ sign. Let's look at what @([A-Za-z]+[A-Za-z0-9_]+) means.

We have to make sure that anything we match starts with a letter, hence the [A-Za-z]+. After that the name can carry on with letters, numbers or underscores, hence the following expression [A-Za-z0-9_]+. This makes sure we match usernames such as @r2d2 while skipping @000. Because both parts need at least one character, a one-letter name would not match. We can't leave things there, because if you run this regex as it is you will also end up catching @example from the email address, which is not what we want. This is where the part before the @ sign comes in.
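To see the problem for yourself, here is a quick sketch that runs only the right-hand part of the regex over the example tweet. The variable names $tweet and $naive are made up for this illustration.

```php
<?php
// The example tweet from the top of the post.
$tweet = "@shahmirj can be found on hello@example.com and @r2d2 but @000 dosent exist as a user";

// Only the part to the right of the @ sign, with nothing checking what comes before it.
$naive = '/@([A-Za-z]+[A-Za-z0-9_]+)/';

// $matches[1] collects the captured names without the @ sign.
preg_match_all($naive, $tweet, $matches);

print_r($matches[1]);
```

This should list shahmirj, example and r2d2. @000 is skipped as intended, but example has sneaked in from the email address.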

Let's break (?<=^|(?<=[^a-zA-Z0-9-_\.])) down. Look first at the inner right side of the bracket, (?<=[^a-zA-Z0-9-_\.]). This is a lookbehind: it checks the character just before the @ sign without making it part of the match, and only passes if that character is not a letter, number, hyphen, underscore or dot. So emails or tags such as aaa@bbb are ignored. However, if we only use this part then our first tag @shahmirj disappears, because it has no character before it at all. Therefore we put ^ (the start of the string) in front and combine it all together as (?<=^|(?<=[^a-zA-Z0-9-_\.])), which in plain English translates to: match an @ that is either at the very start of the text or comes straight after a character that can't be part of a name or email, such as a space.
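A small way to check the lookbehind is to throw a few short strings at the full regex and see which ones give up a username. This is only a sketch: the sample strings and the $found array are invented for the test, not taken from real tweets.

```php
<?php
// Full pattern: lookbehind on the left, username capture on the right.
$regex = '/(?<=^|(?<=[^a-zA-Z0-9-_\.]))@([A-Za-z]+[A-Za-z0-9_]+)/';

// Hypothetical inputs, one per case discussed above.
$samples = [
    '@shahmirj at the start', // @ is the first character
    'hello @r2d2',            // a space comes before the @
    'hello@example.com',      // a letter comes before the @ (email)
    'aaa.@bbb',               // a dot comes before the @
    '(@r2d2)',                // a bracket comes before the @
];

$found = [];
foreach ($samples as $sample) {
    preg_match_all($regex, $sample, $matches);
    // Keep the captured names (without the @) for each sample.
    $found[$sample] = $matches[1];
    echo $sample, ' => ', implode(', ', $matches[1]), PHP_EOL;
}
```

The email and the dotted string should come back empty, while the other three should each give one name. Note that a bracket before the @ is allowed, since it is not in the excluded list.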

We can now combine this into our PHP and change the tweet text to highlight the usernames by converting them into <a> tags. An example of this can be seen at http://www.shahmirj.com/twitter

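$string = "@shahmirj can be found on hello@example.com and @r2d2 but @000 dosent exist as a user";
// The _ in the last group lets names such as @shahmirj_ match in full.
$regex = '/(?<=^|(?<=[^a-zA-Z0-9-_\.]))@([A-Za-z]+[A-Za-z0-9_]+)/i';

// $1 is the captured username without the @ sign.
echo preg_replace($regex, '<a href="http://twitter.com/$1">@$1</a>', $string);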

Just a point of note: this can also be used for hashtags, just change the @ symbol to #.

(?<=^|(?<=[^a-zA-Z0-9-_\.]))#([A-Za-z]+[A-Za-z0-9_]+)
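As a rough check of the hashtag version, the sketch below pulls the tags out of a made-up tweet. The tweet text is invented for the example, and the pattern here allows underscores in the tag, the same as the username pattern does.

```php
<?php
// A hypothetical tweet with two real tags and two things that only look like tags.
$tweet = "Learning #regex and #php_7 today, but issue#12 and #2011 are left alone";

// Same pattern as for usernames, with # in place of @.
$regex = '/(?<=^|(?<=[^a-zA-Z0-9-_\.]))#([A-Za-z]+[A-Za-z0-9_]+)/';

// $matches[1] collects the tag names without the # sign.
preg_match_all($regex, $tweet, $matches);

print_r($matches[1]);
```

This should list regex and php_7 only. issue#12 has a letter in front of the #, and #2011 starts with a number, so both are ignored.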

If anyone has a better suggestion or spots something I missed, please leave a comment (Now Working!)

UPDATE

Fixed the issue where it wasn't picking up tags such as @shahmirj_. I needed to add _ to the end of the matching group.

By

18th Jun 2011
© 2011 Shahmir Javaid - http://shahmirj.com/blog/17

Slavisha

2nd Feb 2012

Hey man, this is awesome. Wonderful explanation!

BTW your website rocks as well.

Shahmir Javaid

5th Feb 2012

Thanks @Slavisha, The site is simple and no Photoshop used :D

Jenny

15th Nov 2012

hi Shahmir

Also, can you add how to use regex for getting the Twitter username from any URL?

Like

http://twitter.com/#username
https://twitter.com/username
http://www.twitter.com/@username

Shahmir Javaid

15th Nov 2012

It should be easy to create your own by using the above and prepending some of the examples from the following: http://stackoverflow.com/questions/161738/what-is-the-best-regular-expression-to-check-if-a-string-is-a-valid-url

Mike Stoddart

12th Feb 2014

Any chance you could port this to Python? I can't seem to get the right syntax..


