
Regular expressions with PHP

by Robert Basic on September 22nd, 2008

I just want to write up some real examples. These regexps are for PHP's PCRE library, and always will be, because I plan to write several posts on this topic. Here's a good PHP PCRE cheat sheet; it's an excellent resource for regexps. If you know nothing about regexps, read this Wiki page first.

Regexps for <a> tags

A common case is when you have the source of some web page and you want to parse all the links out of it.
An anchor tag looks something like this:

<a href="http://example.com/" title="Some website">Website</a>

It can also have more attributes, like class, target, etc. Knowing how it's built up, we can start writing a pattern, depending on what we want.
Here are some examples; some explanations are in the comments:

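<?php
// Regexp examples for <a> tags

// Sample input; in practice $string holds the source of the page being parsed.
$string = '<a href="http://example.com/" title="Some website">Website</a>';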

/**
* Different combinations...
* $matches_comb[0] contains the whole <a> tag
* $matches_comb[1] contains what's inside the "href" attribute
* $matches_comb[2] contains what's after <a> and before </a>
* with the "s" modifier, matches <a> tags that are broken across several lines,
* i.e. matches <a> tags with newlines
* without the "s" modifier, matches only <a> tags without a newline
*/
preg_match_all(
    '#<a\s.*href=["\'](.*)["\'].*>(.*)</a>#isxU',
    $string,
    $matches_comb
);

/**
* Match only what's inside the href attributes...
*/
preg_match_all(
    '#<a\s.*href=["\'](.*)["\'].*>.*</a>#isxU',
    $string,
    $matches_href
);

/**
* Match only what's inside the href attributes,
* only when it starts with http://; the captured value includes the http://
* $matches_href_http[0] also contains some trash, never mind that,
* $matches_href_http[1] contains exactly what we need
*/
preg_match_all(
    '#<a\s.*href=["\'](http://.*)["\'].*>.*</a>#isxU',
    $string,
    $matches_href_http
);

/**
* Match all email addresses in mailto: links;
* [^"]* stops at the closing quote, so other attributes are not captured
*/
preg_match_all(
    '#"mailto:([^"]*)"#',
    $string,
    $matches_emails
);

?>
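
To see what actually ends up in the match arrays, here is a small sketch that runs the first pattern from above on a made-up piece of HTML. The sample markup (two links, the second one with its attributes broken across two lines) is my assumption, not something from a real page. The same pattern is run once with and once without the "s" modifier to show the difference described in the comments.

```php
<?php
// Hypothetical sample input: the second <a> tag has a newline between its attributes.
$string = <<<HTML
<p><a href="http://example.com/" title="Some website">Website</a> and
<a class="mail"
   href="mailto:someone@example.com">Mail me</a></p>
HTML;

// The first pattern from the examples above:
// i = case-insensitive, s = dot also matches newlines,
// x = whitespace in the pattern is ignored, U = quantifiers are ungreedy.
$count = preg_match_all(
    '#<a\s.*href=["\'](.*)["\'].*>(.*)</a>#isxU',
    $string,
    $matches_comb
);

// Same pattern without "s": the dot can no longer cross a newline,
// so the second tag is not matched.
$count_no_s = preg_match_all(
    '#<a\s.*href=["\'](.*)["\'].*>(.*)</a>#ixU',
    $string,
    $matches_no_s
);

printf("with s: %d, without s: %d\n", $count, $count_no_s);

// $matches_comb[1] holds the href values, $matches_comb[2] the link texts.
foreach ($matches_comb[1] as $i => $href) {
    printf("%s => %s\n", $href, $matches_comb[2][$i]);
}
// Prints:
// with s: 2, without s: 1
// http://example.com/ => Website
// mailto:someone@example.com => Mail me
?>
```

Because of the "U" modifier each .* grabs as little as possible, which is why the href group stops at the first closing quote instead of running on to the last quote in the tag.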

Play around with these patterns, see what each part does, and experiment; that's the best way to learn regexps.
Do you have some more regexps for links? Some better ones than these?
Happy hacking!

Tags: example, pcre, php, regex, regexp.
Categories: Development, Programming.