Regex To Match Content Between HTML Tags

This regular expression matches the text content between an opening and a closing HTML tag, whether that content sits on a single line or spans several lines.

Don’t forget to replace the HTML tag name (div in this example) according to your actual needs before you start using this regular expression.

/(?<=(<div>))(\w|\d|\n|[().,\-:;@#$%^&*\[\]"'+–/\/®°⁰!?{}|`~]| )+?(?=(<\/div>))/g

Details:

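It matches a run of word characters, digits, spaces, line breaks, and the punctuation and special symbols listed in the character class, provided the run is enclosed by a <div> and </div> HTML tag pair. Characters outside that set, such as < and >, are not matched, so content that contains nested tags will not match.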

The positive lookbehind assertion (?<=(<div>)) ensures that the pattern is preceded by the <div> tag, while the positive lookahead assertion (?=(</div>)) ensures that the pattern is followed by the </div> tag.

The +? quantifier matches one or more characters in a non-greedy manner, meaning it will match the shortest possible string that satisfies the pattern.

The g flag performs a global search for all matches within a string.
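As a minimal sketch of how the pieces described above behave together, the snippet below runs the pattern against a sample string in JavaScript. The variable names and the sample string are illustrative assumptions, not part of the original pattern. The / in the closing tag is written as \/ here, since an unescaped slash would end a JavaScript regex literal.

```javascript
// The pattern from this page, with the "/" of the closing tag escaped
// so that it is valid inside a JavaScript regex literal.
const contentBetweenDivs =
  /(?<=(<div>))(\w|\d|\n|[().,\-:;@#$%^&*\[\]"'+–/\/®°⁰!?{}|`~]| )+?(?=(<\/div>))/g;

// Illustrative input: one single-line div and one div containing a line break.
const html = "<div>Hello, world!</div><div>Line one\nLine two</div>";

// With the g flag, match() returns every full match as an array
// (capture groups are not included in the result).
const contents = html.match(contentBetweenDivs);

console.log(contents);
// → [ 'Hello, world!', 'Line one\nLine two' ]
```

Because the lookbehind and lookahead only assert that the tags are present, the tags themselves are not part of the returned strings.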

Matches:

  • <div>RegexPattern</div>
  • <div>RegexPattern</div><div>RegexPattern</div>

Non-matches:

  • <div>RegexPattern<div>
  • <p>RegexPattern</p>
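
To swap the tag name without editing the pattern by hand, one option is to build it with the RegExp constructor. The helper below, contentBetweenTags, is a hypothetical name introduced for illustration; it assumes the tag name is a plain word such as p or span that needs no regex escaping.

```javascript
// Hypothetical helper (not part of the original page): builds the same
// pattern for an arbitrary tag name.
function contentBetweenTags(tagName) {
  // Reuse the allowed-character group from the pattern via .source,
  // which returns the pattern text of a regex literal.
  const allowedChars =
    /(\w|\d|\n|[().,\-:;@#$%^&*\[\]"'+–/\/®°⁰!?{}|`~]| )+?/.source;

  // Assumption: tagName contains no regex metacharacters.
  // In a RegExp constructor string, "/" does not need escaping.
  return new RegExp(
    `(?<=(<${tagName}>))${allowedChars}(?=(</${tagName}>))`,
    "g"
  );
}

const paragraphs = "<p>First</p><div>Skip</div><p>Second one</p>"
  .match(contentBetweenTags("p"));
console.log(paragraphs); // → [ 'First', 'Second one' ]

// A missing closing tag produces no match, so match() returns null.
const unclosed = "<p>RegexPattern<p>".match(contentBetweenTags("p"));
console.log(unclosed); // → null
```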

Note that the regex above matches only the content between the HTML tags. To include the wrapping tags in the match, use this regex instead:

/<div[^>]*>.*?<\/div>/ig

Matches:

  • <div>RegexPattern</div>
  • <div>RegexPattern</div><div>RegexPattern</div>
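
A short sketch of the tag-inclusive variant in JavaScript follows. It uses [^>]* inside the opening tag so that adjacent elements come back as separate matches and attributes are tolerated. The sample markup and variable names are assumptions made for illustration.

```javascript
// Tag-inclusive variant: [^>]* allows attributes inside the opening tag
// without running past its closing ">".
const divWithTags = /<div[^>]*>.*?<\/div>/gi;

// Illustrative input: two adjacent divs, one with an attribute.
const markup = '<div>RegexPattern</div><div class="note">RegexPattern</div>';

const elements = markup.match(divWithTags);
console.log(elements);
// → [ '<div>RegexPattern</div>', '<div class="note">RegexPattern</div>' ]

// Note: "." does not match line breaks unless the s flag is added,
// so this variant only handles content on a single line as written.
```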
