Class HTML

java.lang.Object
net.sf.basedb.clients.web.util.HTML

public class HTML
extends Object
This class contains a set of static methods that may be useful in a web application for handling text/HTML strings.
Version:
2.0
Author:
Nicklas
  • Field Details

    • SMART_IMAGES

      public static final int SMART_IMAGES
      This flag is used in scanForLinks(String,int) when you want to create links for images with an <img> tag instead of an <a> tag.
      See Also:
      Constant Field Values
    • EMAIL_REGEXP

      public static final Pattern EMAIL_REGEXP
      This pattern can be used to find email addresses. It checks that an @ symbol is present, that it is preceded by at least one character, and that it is followed by at least one subdomain and one top-level domain. The pattern allows any Unicode letters, digits, underscores and hyphens in the address. It does not check that the domain or email address actually exists.
    • URL_REGEXP

      public static final Pattern URL_REGEXP
      This pattern can be used to find URLs. It looks for sequences starting with http://, https://, ftp:// or www. followed by at least one subdomain and one top-level domain, then an optional port number and an optional path including query information.
    • LINEBREAKS_REGEXP

      public static final Pattern LINEBREAKS_REGEXP
      This pattern can be used to find line breaks. It will match any combination of carriage return and linefeed characters as well as some Unicode line separator characters.
    • TAG_REGEXP

      public static final Pattern TAG_REGEXP
      This pattern can be used to find HTML tags. It will match both start and end tags. The entire tag with attributes is put in the $1 group, the tag name in the $2 group and the attributes in the $3 group.
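      The group layout described above can be illustrated with a small sketch. The regular expression below is a hypothetical stand-in written from the description, not the actual TAG_REGEXP used by BASE, and the class and method names are invented for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TagGroupsSketch {
    // Hypothetical stand-in for TAG_REGEXP (an assumption, not the BASE pattern).
    // Group 1: entire tag, group 2: tag name, group 3: attributes.
    static final Pattern TAG =
        Pattern.compile("(</?([a-zA-Z][a-zA-Z0-9]*)([^>]*)>)");

    // Returns the name of the first tag found in the input, or null if there is none.
    static String firstTagName(String in) {
        Matcher m = TAG.matcher(in);
        return m.find() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        Matcher m = TAG.matcher("<a href=\"x\">link</a>");
        while (m.find()) {
            // Print the three groups for every start and end tag.
            System.out.println(m.group(1) + " | " + m.group(2) + " | " + m.group(3));
        }
    }
}
```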
    • SAFE_TAGS

      public static final Pattern SAFE_TAGS
      This pattern is a list of HTML tags considered "safe".
    • MARKUP

      public static final Pattern MARKUP
      Pattern that matches everything inside an HTML tag.
      Since:
      2.10
    • LEADING_TRAILING_LINEBREAKS

      private static final Pattern LEADING_TRAILING_LINEBREAKS
    • AT_OR_DOT

      private static final Pattern AT_OR_DOT
    • IMAGE_EXTENSION

      private static final Pattern IMAGE_EXTENSION
    • AMP

      private static final Pattern AMP
    • LT

      private static final Pattern LT
    • GT

      private static final Pattern GT
    • QUOTE

      private static final Pattern QUOTE
    • PERCENT

      private static final Pattern PERCENT
    • PLUS

      private static final Pattern PLUS
    • SPACE

      private static final Pattern SPACE
    • HASH

      private static final Pattern HASH
    • AMPERSAND

      private static final Pattern AMPERSAND
    • BACKSLASH

      private static final Pattern BACKSLASH
    • NEWLINE

      private static final Pattern NEWLINE
    • SINGLE_QUOTE

      private static final Pattern SINGLE_QUOTE
    • DOUBLE_QUOTE

      private static final Pattern DOUBLE_QUOTE
  • Constructor Details

    • HTML

      public HTML()
  • Method Details

    • isValidEmail

      public static boolean isValidEmail​(String email)
      Checks if the given string looks like an email address. This is done by trying to match it against the EMAIL_REGEXP pattern.
      Parameters:
      email - The string to check
      Returns:
      TRUE if the string matches the pattern, FALSE otherwise
      See Also:
      EMAIL_REGEXP
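      As an illustration of the matching approach, here is a minimal sketch. The pattern is an approximation built from the EMAIL_REGEXP description (Unicode letters, digits, underscore and hyphen, with at least one subdomain and one top-level domain); the real pattern may differ. Allowing a dot in the local part and returning false for null are assumptions.

```java
import java.util.regex.Pattern;

class EmailCheckSketch {
    // Hypothetical approximation of EMAIL_REGEXP; not the pattern used by BASE.
    static final Pattern EMAIL = Pattern.compile(
        "[\\p{L}\\p{N}_\\-.]+@[\\p{L}\\p{N}_\\-]+(\\.[\\p{L}\\p{N}_\\-]+)+");

    // The whole string must match, not just a part of it.
    static boolean looksLikeEmail(String email) {
        return email != null && EMAIL.matcher(email).matches();
    }

    public static void main(String[] args) {
        System.out.println(looksLikeEmail("nicklas@example.org"));
        System.out.println(looksLikeEmail("nicklas@localhost"));
    }
}
```

      As with the documented pattern, a positive result only means the string has the shape of an address, not that the address exists.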
    • isValidUrl

      public static boolean isValidUrl​(String url)
      Checks if the given string looks like a URL. This is done by trying to match it against the URL_REGEXP pattern.
      Parameters:
      url - The string to check
      Returns:
      TRUE if the string matches the pattern, FALSE otherwise
      See Also:
      URL_REGEXP
    • formatLineBreaks

      public static String formatLineBreaks​(String in)
      Finds all linebreaks in a string and replaces them with a <br> tag, except that leading and trailing linebreaks will be removed.
      Parameters:
      in - The string to search
      Returns:
      The new string, or an empty string if NULL was passed
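      The behavior described above could be sketched as follows. Both patterns are assumed stand-ins for LINEBREAKS_REGEXP and the private LEADING_TRAILING_LINEBREAKS field, whose actual definitions are not shown here, and the class name is hypothetical.

```java
import java.util.regex.Pattern;

class LineBreakSketch {
    // Assumed stand-in for LINEBREAKS_REGEXP: CRLF, lone CR or LF, and two Unicode separators.
    static final Pattern LINEBREAK =
        Pattern.compile("\\r\\n|\\r|\\n|\\u2028|\\u2029");

    // Assumed stand-in for LEADING_TRAILING_LINEBREAKS: runs at the very start or very end.
    static final Pattern LEADING_TRAILING =
        Pattern.compile("\\A[\\r\\n\\u2028\\u2029]+|[\\r\\n\\u2028\\u2029]+\\z");

    static String formatLineBreaks(String in) {
        if (in == null) return "";
        // Drop leading and trailing line breaks first, then convert the rest.
        String trimmed = LEADING_TRAILING.matcher(in).replaceAll("");
        return LINEBREAK.matcher(trimmed).replaceAll("<br>");
    }

    public static void main(String[] args) {
        System.out.println(formatLineBreaks("\nline1\r\nline2\n"));
    }
}
```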
    • scanForLinks

      public static String scanForLinks​(String in)
      Scans a string for email addresses and URLs and replaces them with <a href="---"> tags. Using this method is equivalent to: scanForLinks(in, LINK_URL+LINK_EMAIL+SMART_IMAGES, null);
      Parameters:
      in - The string to search
      Returns:
      The new string, or an empty string if NULL was passed
    • scanForLinks

      public static String scanForLinks​(String in, int flags)
      Scans a string for email addresses and URLs and replaces them with <a href="---"> tags. Using this method is equivalent to: scanForLinks(in, flags, null);
      Parameters:
      in - The string to search
      flags - A combination of the following flags to indicate what we should search for: LINK_URL, LINK_EMAIL, SMART_IMAGES
      Returns:
      The new string, or an empty string if NULL was passed
    • scanForLinks

      public static String scanForLinks​(String in, String target)
      Scans a string for email addresses and URLs and replaces them with <a href="---"> tags. Using this method is equivalent to: scanForLinks(in, LINK_URL+LINK_EMAIL+SMART_IMAGES, target);
      Parameters:
      in - The string to search
      target - The name of the target window in which the link should be opened
      Returns:
      The new string, or an empty string if NULL was passed
    • scanForLinks

      public static String scanForLinks​(String in, int flags, String target)
      Scans a string for email addresses and URLs and replaces them with <a href="---"> tags.
      Parameters:
      in - The string to search
      flags - A combination of the following flags to indicate what we should search for: LINK_URL, LINK_EMAIL, SMART_IMAGES
      target - The name of the target window in which the link should be opened
      Returns:
      The new string, or an empty string if NULL was passed
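      A rough sketch of flag-driven link scanning is shown below. The flag values, the simplified patterns, the mailto: prefix for email links and the http:// prefix added to www. addresses are all assumptions made for illustration; the real method may behave differently, and SMART_IMAGES handling is left out.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LinkScanSketch {
    // Hypothetical flag values; the real constants are defined by the HTML class.
    static final int LINK_URL = 1;
    static final int LINK_EMAIL = 2;

    // Simplified stand-ins for URL_REGEXP and EMAIL_REGEXP.
    static final Pattern URL = Pattern.compile(
        "(?:(?:https?|ftp)://|www\\.)[\\w-]+(?:\\.[\\w-]+)+(?::\\d+)?(?:/[^\\s<>\"]*)?");
    static final Pattern EMAIL = Pattern.compile("[\\w.-]+@[\\w-]+(?:\\.[\\w-]+)+");

    static String scanForLinks(String in, int flags, String target) {
        if (in == null) return "";
        String out = in;
        // Flags are bit values, so they can be combined with + or |.
        if ((flags & LINK_URL) != 0) out = link(out, URL, false, target);
        if ((flags & LINK_EMAIL) != 0) out = link(out, EMAIL, true, target);
        return out;
    }

    // Wraps every match of the pattern in an <a> tag.
    private static String link(String in, Pattern p, boolean email, String target) {
        String targetAttr = target == null ? "" : " target=\"" + target + "\"";
        Matcher m = p.matcher(in);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String text = m.group();
            String href = email ? "mailto:" + text
                : (text.startsWith("www.") ? "http://" + text : text);
            String a = "<a href=\"" + href + "\"" + targetAttr + ">" + text + "</a>";
            // quoteReplacement keeps $ and \ in the matched text literal.
            m.appendReplacement(sb, Matcher.quoteReplacement(a));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(scanForLinks(
            "Visit www.example.org or mail nicklas@example.org",
            LINK_URL + LINK_EMAIL, "_blank"));
    }
}
```

      The sketch ignores edge cases such as trailing punctuation after a URL path or an email address whose domain starts with www.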
    • encodeTags

      public static String encodeTags​(String in)
      Scans a string for HTML tags and replaces all & with &amp;, < with &lt;, > with &gt; and all " with &quot;.
      Parameters:
      in - The string to search
      Returns:
      The new string, or an empty string if NULL was passed
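      A minimal sketch of this kind of escaping follows, assuming the double quote is replaced with &quot;. The class name is hypothetical and the real implementation may use precompiled patterns instead of plain string replacement. The ampersand is handled first so that the entities produced by the later steps are not escaped a second time.

```java
class EncodeTagsSketch {
    static String encodeTags(String in) {
        if (in == null) return "";
        return in.replace("&", "&amp;")   // must run first, see the note above
                 .replace("<", "&lt;")
                 .replace(">", "&gt;")
                 .replace("\"", "&quot;");
    }

    public static void main(String[] args) {
        System.out.println(encodeTags("<a href=\"x\">R&D</a>"));
    }
}
```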
    • encodeTags

      public static String encodeTags​(String in, String safeTags)
      Scans a string for HTML tags and replaces all < and > for tags not found in the safeTags pattern with &lt; and &gt; respectively. Tags that are found in the safeTags pattern are not modified. Using this method is equivalent to encodeTags(in, Pattern.compile(safeTags));
      Parameters:
      in - The string to search
      safeTags - A regular expression that should match all safe tags
      Returns:
      The new string, or an empty string if NULL was passed
      See Also:
      SAFE_TAGS
    • encodeTags

      public static String encodeTags​(String in, Pattern safeTags)
      Scans a string for HTML tags and replaces all < and > for tags not matching the safeTags pattern with &lt; and &gt; respectively. Tags that match the safeTags pattern are not modified.
      Parameters:
      in - The string to search
      safeTags - A regular expression pattern that matches all safe tags
      Returns:
      The new string, or an empty string if NULL was passed
      See Also:
      SAFE_TAGS
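      One way to picture the safe-tag filtering is the sketch below. It assumes that the safeTags pattern is matched against the tag name only, and it uses a hypothetical tag-finding pattern; how the real method applies safeTags is not stated in this documentation. Characters outside of tags are left untouched.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SafeTagsSketch {
    // Hypothetical tag finder: group 1 is the tag name.
    static final Pattern TAG = Pattern.compile("</?([a-zA-Z][a-zA-Z0-9]*)[^<>]*>");

    static String encodeTags(String in, Pattern safeTags) {
        if (in == null) return "";
        Matcher m = TAG.matcher(in);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(in, last, m.start());          // text before the tag
            String tag = m.group();
            if (safeTags.matcher(m.group(1)).matches()) {
                out.append(tag);                      // safe tag: keep as-is
            } else {
                // unsafe tag: encode only the surrounding angle brackets
                out.append("&lt;").append(tag, 1, tag.length() - 1).append("&gt;");
            }
            last = m.end();
        }
        out.append(in, last, in.length());            // text after the last tag
        return out.toString();
    }

    public static void main(String[] args) {
        Pattern safe = Pattern.compile("b|i|br");     // example pattern, not SAFE_TAGS
        System.out.println(encodeTags("<b>bold</b> <script>x</script>", safe));
    }
}
```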
    • urlEncode

      public static String urlEncode​(String in)
      Encode URL-unsafe characters in a string. See URLEncoder for more information.
      Parameters:
      in - The string to encode
      Returns:
      The encoded string, or an empty string if NULL was passed
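      For reference, a call to java.net.URLEncoder with the documented null handling might look like the sketch below. The use of UTF-8 is an assumption, since the character encoding used by urlEncode is not stated here.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class UrlEncodeSketch {
    static String urlEncode(String in) {
        if (in == null) return "";
        try {
            // Assumed encoding: UTF-8.
            return URLEncoder.encode(in, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required on every Java platform, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(urlEncode("a b&c=d"));
    }
}
```

      URLEncoder produces application/x-www-form-urlencoded output, so a space becomes + rather than %20.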
    • javaScriptEncode

      public static String javaScriptEncode​(String in)
      Escape a string to make it safe for use in a JavaScript statement. Replaces \ with \\, newline with \n, ' with \' and " with \".
      Parameters:
      in - String to escape.
      Returns:
      A String object, ready to be used in JavaScript.
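      The four replacements listed above can be sketched as follows. The class name is hypothetical, and returning an empty string for null is an assumption because the documentation does not say what happens in that case. The backslash is replaced first so that the backslashes added by the later steps are not doubled.

```java
class JsEncodeSketch {
    static String javaScriptEncode(String in) {
        if (in == null) return "";            // assumption, not documented
        return in.replace("\\", "\\\\")       // \  becomes \\  (must run first)
                 .replace("\n", "\\n")        // newline becomes the two characters \n
                 .replace("'", "\\'")         // '  becomes \'
                 .replace("\"", "\\\"");      // "  becomes \"
    }

    public static void main(String[] args) {
        System.out.println("alert('" + javaScriptEncode("It's a \"test\"") + "');");
    }
}
```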
    • niceFormat

      public static String niceFormat​(String in)
      A convenience method for doing encodeTags(in, SAFE_TAGS), scanForLinks(in, "_blank") and formatLineBreaks(in) in a single operation.
      Parameters:
      in - The string to format
      Returns:
      The result after calling the three methods mentioned above, in that order, or an empty string if NULL was passed
    • niceFormat

      public static String niceFormat​(String in, int flags)
      A convenience method for doing encodeTags(in, SAFE_TAGS), scanForLinks(in, flags, "_blank") and formatLineBreaks(in) in a single operation.
      Parameters:
      in - The string to format
      flags - Flags to be used in the call to scanForLinks
      Returns:
      The result after calling the three methods mentioned above, in that order, or an empty string if NULL was passed
    • niceFormat

      public static String niceFormat​(String in, String linkTarget)
      A convenience method for doing encodeTags(in, SAFE_TAGS), scanForLinks(in, linkTarget) and formatLineBreaks(in) in a single operation.
      Parameters:
      in - The string to format
      linkTarget - The target parameter to be used in the call to scanForLinks(String,String)
      Returns:
      The result after calling the three methods mentioned above, in that order, or an empty string if NULL was passed
    • niceFormat

      public static String niceFormat​(String in, int flags, String linkTarget)
      A convenience method for doing encodeTags(in, SAFE_TAGS), scanForLinks(in, flags, linkTarget) and formatLineBreaks(in) in a single operation.
      Parameters:
      in - The string to format
      flags - Flags to be used in the call to scanForLinks
      linkTarget - The target parameter to be used in the call to scanForLinks(String,int,String)
      Returns:
      The result after calling the three methods mentioned above, in that order, or an empty string if NULL was passed
    • stripMarkup

      public static String stripMarkup​(CharSequence in)
      Remove all HTML markup in a string and return what is left.
      Parameters:
      in - The string to strip from HTML
      Returns:
      The resulting string, or null if the input is null
      Since:
      2.10
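      A minimal sketch of markup stripping is shown below, assuming a pattern that matches anything between < and >. The pattern is a hypothetical stand-in for the MARKUP field, whose definition is not shown here.

```java
import java.util.regex.Pattern;

class StripMarkupSketch {
    // Hypothetical stand-in for MARKUP: a '<', any characters except '>', then '>'.
    static final Pattern MARKUP = Pattern.compile("<[^>]*>");

    static String stripMarkup(CharSequence in) {
        if (in == null) return null;   // null in, null out, as documented
        return MARKUP.matcher(in).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(stripMarkup("<b>Hello</b> <i>world</i>"));
    }
}
```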
    • textLength

      public static int textLength​(CharSequence html)
      Counts the length of a string, ignoring all characters in HTML markup tags. The result is approximately the length of the string that is displayed on screen by a browser. This method counts all characters that are not inside < and >. The actual number of characters displayed by a browser may be less because of escaped sequences, e.g. &amp;, and white-space that is collapsed.
      Parameters:
      html - The HTML string
      Returns:
      The number of characters that are not HTML markup
      Since:
      2.10
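      The counting rule can be illustrated with a short sketch. It assumes that the angle brackets themselves are not counted and that a null input gives 0; neither detail is stated in the documentation, and the class name is hypothetical.

```java
class TextLengthSketch {
    static int textLength(CharSequence html) {
        if (html == null) return 0;           // assumption, not documented
        int count = 0;
        boolean inTag = false;
        for (int i = 0; i < html.length(); i++) {
            char c = html.charAt(i);
            if (c == '<') {
                inTag = true;                 // start of markup
            } else if (c == '>') {
                inTag = false;                // end of markup
            } else if (!inTag) {
                count++;                      // visible character
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // The entity &amp; counts as five characters although a browser shows one.
        System.out.println(textLength("<b>R&amp;D</b>"));
    }
}
```

      In this example the sketch counts 7 characters for "R&amp;D" while a browser would display 3, which is the kind of difference the description warns about.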