Web Design and Web Development | MagicLamp

The HTML’ers Guide to Regular Expressions Part 1: Cleaning Up Content


Over the years, I’ve written thousands of lines of original code in many languages. Somehow, I missed the chapter on Regular Expressions (RegEx), which are pretty scary even for a programmer type. Recently, though, I was forced to learn RegEx to deal with faulty HTML content being returned by a syndication service. An example of a Regular Expression for HTML:

<.*?>


This is the most useful Regular Expression I have in my arsenal for tackling content that has been given to me in an unknown state. Simply put, it matches every HTML tag that starts with “<” and ends with “>”, regardless of styles, attributes, or other text inside the tag; replacing each match with nothing removes the tags. The “?” makes the match lazy, so it stops at the first “>” instead of running on to the last one on the line. If I am using this Expression on an HTML page, I then use my favorite editor to go through and restyle/restructure the stripped document. This is often easier than trying to interpret or correct the existing HTML structure.
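As a rough illustration of the tag-stripping idea, here is a minimal Python sketch using the standard `re` module. The function name `strip_tags` and the sample string are made up for this example; the greedy variant is included only to show why the lazy “?” matters.

```python
import re

# Lazy match: "<", then as few characters as possible, then ">".
TAG_PATTERN = re.compile(r"<.*?>")


def strip_tags(html_text: str) -> str:
    """Remove every substring that looks like an HTML tag."""
    return TAG_PATTERN.sub("", html_text)


# Hypothetical sample content, not taken from a real feed.
sample = '<p class="intro">Hello, <b>world</b>!</p>'

stripped = strip_tags(sample)
print(stripped)  # Hello, world!

# Without the "?", the match is greedy and runs from the first "<"
# to the last ">", swallowing the text in between as well.
greedy = re.sub(r"<.*>", "", sample)
print(repr(greedy))  # ''
```

By default “.” does not match a newline in Python, so a tag that is split across two lines would be left alone unless the pattern is compiled with `re.DOTALL`.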


If I am programmatically manipulating content (e.g. from a database, RSS, or syndication service), I might use something less global:

</?em>


This is more atomic in nature than the first Regular Expression, as it only finds/replaces emphasis tags: the “/?” makes the slash optional, so the same pattern matches <em> and </em>. How many times have you had to do a global search and replace on a document where you first searched for <em> (and replaced it with nothing) and then for </em>? This Regular Expression catches both.
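One way to catch both the opening and the closing emphasis tag in a single pass is a pattern with an optional slash, such as `</?em>`. The Python sketch below assumes that pattern; the helper name `strip_em` and the sample text are invented for illustration.

```python
import re

# "<", an optional "/", the letters "em", then ">".
# IGNORECASE also catches <EM> and </EM> from older markup.
EM_PATTERN = re.compile(r"</?em>", re.IGNORECASE)


def strip_em(html_text: str) -> str:
    """Remove <em> and </em> tags but keep the text they wrap."""
    return EM_PATTERN.sub("", html_text)


# Hypothetical sample content.
sample = "<p>This is <em>very</em> important, <EM>really</EM>.</p>"
print(strip_em(sample))  # <p>This is very important, really.</p>

# Tags that merely start with "em" are left alone.
print(strip_em('<embed src="clip.swf">'))  # <embed src="clip.swf">
```

Because the pattern requires “>” immediately after “em”, it will not match an emphasis tag that carries attributes; that case would need a wider pattern.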


Side Note: I tend to preserve HTML entities (like &raquo;, which renders as »), because once an entity is stripped it is not obvious that anything is missing from the document. HTML tags are different: they are not part of the content. They describe the document structure and, through the use of CSS, the presentation.
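To make the side note concrete, the following Python sketch strips tags with the `<.*?>` pattern and shows that the entities survive untouched. The sample string is invented, and the final decoding step with `html.unescape` is an optional extra that only makes sense when plain text, rather than HTML, is the target.

```python
import html
import re

TAG_PATTERN = re.compile(r"<.*?>")

# Hypothetical breadcrumb-style content containing two entities.
sample = "<p>Home &raquo; <a href='/blog'>Blog</a> &amp; News</p>"

# Tags are removed; entities are not tags, so they are left in place.
text_with_entities = TAG_PATTERN.sub("", sample)
print(text_with_entities)  # Home &raquo; Blog &amp; News

# Optional: decode entities to characters for plain-text output.
decoded = html.unescape(text_with_entities)
print(decoded)  # Home » Blog & News
```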


Here are some other useful expressions for dealing with HTML formatting:


Regular Expressions for manipulating content and HTML in content:



Sources for Helpful Regular Expressions:

http://regexlib.com/Search.aspx?k=HTML

http://www.regular-expressions.info/examples.html


RegEx Tools

These tools are helpful in learning or debugging Regular Expressions:

RegEx Coach (PC) http://weitz.de/regex-coach/

RegEx Buddy (PC) http://www.regexbuddy.com/library.html

RegExWidget (OS-X) http://robrohan.com/projects/widgets/


Text Editors with RegEx support

Having used many different editors over the years, I have definitely built up a bias for certain solutions. I find that the built-in widget of EditPadPro offers the easiest and most efficient use of RegEx in day-to-day HTML editing. Most text editors and applications with RegEx support implement it in the Search-and-Replace (SnR) window, which is useful, but from a usability perspective you find yourself typing and clicking more with a dialog-style interface. Nowadays, I use RegEx so much in my work that the extra typing and clicking is a big deal. You may not care about this at all.


Lots of applications support RegEx:

Macromedia Dreamweaver (Find/SnR Dialog)

Microsoft Frontpage (Find/SnR Dialog)

Microsoft Visual Studio

SlickEdit (Find/SnR Dialog)

e/TextMate (Find/SnR Dialog)

EditPadPro (has a great SnR w/RegEx widget in the main edit window)

VIM (Find/SnR Dialog)

Eclipse (Find/SnR Dialog)

Aptana (widget in the main edit window)


Programming & Scripting Languages that support Regular Expressions

In fact, most programming languages support RegEx through various libraries or objects. We call out this list so that if you happen to use one (or many) of these, you know you have a simple way to try out RegEx in a familiar environment.

PERL*

JavaScript

PHP

Python

Ruby

C, C++

VB, VBScript, VB.NET, C#, VBA


* Some would say PERL and Regular Expressions are too closely knit to consider RegEx a “part” of PERL. The truth is that PERL can be thought of as an extension of RegEx.

©2022 MagicLamp Inc. -- All rights reserved
