Class HTML


  • public class HTML
    extends Object
    This class contains a set of static methods that may be useful in a web application for handling text/HTML strings.
    Version:
    2.0
    Author:
    Nicklas
    • Field Detail

      • EMAIL_REGEXP

        public static final Pattern EMAIL_REGEXP
        This pattern can be used to find email addresses. The pattern checks that an @ symbol is present, that it is preceded by at least one character and that it is followed by at least one subdomain and one top-level domain. The pattern allows any Unicode letters, digits, underscores and hyphens in the address. It does not check that the domain or email address actually exists.
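        The rules described above can be approximated with a short regular expression. The following is only a sketch: the class name, the helper method and the expression itself are assumptions made for illustration, and the actual EMAIL_REGEXP may differ (for example, allowing dots in the local part is a choice made here).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EmailPatternSketch {
    // One "label": Unicode letters, digits, underscore or hyphen.
    private static final String LABEL = "[\\p{L}\\p{N}_-]+";

    // Hypothetical pattern, NOT the real EMAIL_REGEXP: at least one character
    // before the '@', then at least two dot-separated labels after it
    // (one subdomain and one top-level domain).
    static final Pattern EMAIL = Pattern.compile(
        LABEL + "(?:\\." + LABEL + ")*@" + LABEL + "(?:\\." + LABEL + ")+");

    // Collects every substring of the text that looks like an email address.
    static List<String> findEmails(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = EMAIL.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }
}
```

        With this sketch, an address such as admin@localhost is not reported because nothing that looks like a top-level domain follows the host name.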
      • URL_REGEXP

        public static final Pattern URL_REGEXP
        This pattern can be used to find URLs. It looks for sequences starting with http://, https://, ftp:// or www. followed by at least one subdomain and one top-level domain, then an optional port number and an optional path including query information.
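        As an illustration of the structure described above (prefix, domain labels, optional port, optional path), here is a minimal sketch. The expression and the names UrlPatternSketch and looksLikeUrl are assumptions; the real URL_REGEXP is not shown on this page and may accept a different set of characters.

```java
import java.util.regex.Pattern;

class UrlPatternSketch {
    // One domain label: Unicode letters, digits, underscore or hyphen.
    private static final String LABEL = "[\\p{L}\\p{N}_-]+";

    // Hypothetical pattern, NOT the real URL_REGEXP.
    static final Pattern URL = Pattern.compile(
        "(?:https?://|ftp://|www\\.)"          // required prefix
        + LABEL + "(?:\\." + LABEL + ")+"      // subdomain(s) and top-level domain
        + "(?::\\d+)?"                         // optional port number
        + "(?:/[^\\s<>\"]*)?");                // optional path and query string

    // True if the whole string matches the sketch pattern.
    static boolean looksLikeUrl(String s) {
        return s != null && URL.matcher(s).matches();
    }
}
```

        In this sketch http://localhost is rejected because no top-level domain follows the host name, which mirrors the "at least one subdomain and one top-level domain" rule.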
      • LINEBREAKS_REGEXP

        public static final Pattern LINEBREAKS_REGEXP
        This pattern can be used to find line breaks. It will match any combination of carriage return and linefeed characters as well as some Unicode line separator characters.
      • TAG_REGEXP

        public static final Pattern TAG_REGEXP
        This pattern can be used to find HTML tags. It will match both start and end tags. The entire tag with attributes is put in the $1 group, the tag name in the $2 group and the attributes in the $3 group.
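        The three capture groups can be pictured with the sketch below. The expression is an assumption, not the real TAG_REGEXP. In particular, this page does not say whether the angle brackets or the leading slash of an end tag belong to group 1; the sketch keeps the brackets outside and the slash inside.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TagPatternSketch {
    // Hypothetical pattern: group 1 = whole tag content, group 2 = tag name,
    // group 3 = attributes (possibly empty).
    static final Pattern TAG = Pattern.compile("<(/?(\\w+)([^>]*))>");

    // Returns the name of every start and end tag, in document order.
    static List<String> tagNames(String html) {
        List<String> names = new ArrayList<>();
        Matcher m = TAG.matcher(html);
        while (m.find()) {
            names.add(m.group(2));
        }
        return names;
    }

    // Returns the attribute part (group 3) of the first tag, or null if none.
    static String firstTagAttributes(String html) {
        Matcher m = TAG.matcher(html);
        return m.find() ? m.group(3) : null;
    }
}
```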
      • SAFE_TAGS

        public static final Pattern SAFE_TAGS
        This pattern is a list of HTML tags considered "safe".
      • MARKUP

        public static final Pattern MARKUP
        Pattern that matches everything inside an HTML tag.
        Since:
        2.10
      • LEADING_TRAILING_LINEBREAKS

        private static final Pattern LEADING_TRAILING_LINEBREAKS
      • AT_OR_DOT

        private static final Pattern AT_OR_DOT
      • IMAGE_EXTENSION

        private static final Pattern IMAGE_EXTENSION
      • AMP

        private static final Pattern AMP
      • LT

        private static final Pattern LT
      • GT

        private static final Pattern GT
      • QUOTE

        private static final Pattern QUOTE
      • PERCENT

        private static final Pattern PERCENT
      • PLUS

        private static final Pattern PLUS
      • SPACE

        private static final Pattern SPACE
      • HASH

        private static final Pattern HASH
      • AMPERSAND

        private static final Pattern AMPERSAND
      • BACKSLASH

        private static final Pattern BACKSLASH
      • NEWLINE

        private static final Pattern NEWLINE
      • SINGLE_QUOTE

        private static final Pattern SINGLE_QUOTE
      • DOUBLE_QUOTE

        private static final Pattern DOUBLE_QUOTE
    • Constructor Detail

      • HTML

        public HTML()
    • Method Detail

      • isValidEmail

        public static boolean isValidEmail​(String email)
        Checks if the given string looks like an email address. This is done by trying to match it against the EMAIL_REGEXP pattern.
        Parameters:
        email - The string to check
        Returns:
        TRUE if the string matches the pattern, FALSE otherwise
        See Also:
        EMAIL_REGEXP
      • isValidUrl

        public static boolean isValidUrl​(String url)
        Checks if the given string looks like a URL. This is done by trying to match it against the URL_REGEXP pattern.
        Parameters:
        url - The string to check
        Returns:
        TRUE if the string matches the pattern, FALSE otherwise
        See Also:
        URL_REGEXP
      • formatLineBreaks

        public static String formatLineBreaks​(String in)
        Finds all line breaks in a string and replaces each of them with a <br> tag, except for leading and trailing line breaks, which are removed.
        Parameters:
        in - The string to search
        Returns:
        The new string, or an empty string if NULL was passed
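        A minimal sketch of this behaviour, assuming Java's built-in \R line break matcher as a stand-in for LINEBREAKS_REGEXP and LEADING_TRAILING_LINEBREAKS (the real patterns are not shown here and may match a different set of characters):

```java
import java.util.regex.Pattern;

class FormatLineBreaksSketch {
    // Assumption: \R (any Unicode line break sequence, with CR+LF as one unit)
    // stands in for the class's own line break patterns.
    private static final Pattern LEADING_TRAILING = Pattern.compile("\\A\\R+|\\R+\\z");
    private static final Pattern LINEBREAK = Pattern.compile("\\R");

    static String formatLineBreaks(String in) {
        if (in == null) {
            return "";
        }
        // Drop leading and trailing line breaks first, then convert the rest.
        String trimmed = LEADING_TRAILING.matcher(in).replaceAll("");
        return LINEBREAK.matcher(trimmed).replaceAll("<br>");
    }
}
```

        Trimming before replacing keeps the result free of <br> tags at either end of the string.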
      • scanForLinks

        public static String scanForLinks​(String in)
        Scans a string for email addresses and URLs and replaces them with <a href="---"> tags. Using this method is equivalent to: scanForLinks(in, LINK_URL+LINK_EMAIL+SMART_IMAGES, null);
        Parameters:
        in - The string to search
        Returns:
        The new string, or an empty string if NULL was passed
      • scanForLinks

        public static String scanForLinks​(String in,
                                          int flags)
        Scans a string for email addresses and URLs and replaces them with <a href="---"> tags. Using this method is equivalent to: scanForLinks(in, flags, null);
        Parameters:
        in - The string to search
        flags - A combination of the following flags to indicate what we should search for: LINK_URL, LINK_EMAIL, SMART_IMAGES
        Returns:
        The new string, or an empty string if NULL was passed
      • scanForLinks

        public static String scanForLinks​(String in,
                                          String target)
        Scans a string for email addresses and URLs and replaces them with <a href="---"> tags. Using this method is equivalent to: scanForLinks(in, LINK_URL+LINK_EMAIL+SMART_IMAGES, target);
        Parameters:
        in - The string to search
        target - The name of the target window in which the link should be opened
        Returns:
        The new string, or an empty string if NULL was passed
      • scanForLinks

        public static String scanForLinks​(String in,
                                          int flags,
                                          String target)
        Scans a string for email addresses and URLs and replaces them with <a href="---"> tags.
        Parameters:
        in - The string to search
        flags - A combination of the following flags to indicate what we should search for: LINK_URL, LINK_EMAIL, SMART_IMAGES
        target - The name of the target window in which the link should be opened
        Returns:
        The new string, or an empty string if NULL was passed
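        The way flags and target interact can be sketched as follows. Everything here is an assumption made for illustration: the numeric flag values, the matching expression, the choice to prefix www. links with http:// and the choice to apply target only to URL links. SMART_IMAGES is left out because its behaviour is not described on this page.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ScanForLinksSketch {
    static final int LINK_URL = 1;   // hypothetical value
    static final int LINK_EMAIL = 2; // hypothetical value

    private static final String LABEL = "[\\p{L}\\p{N}_-]+";
    // Hypothetical combined pattern so that the text is scanned only once.
    private static final Pattern LINK = Pattern.compile(
        "(?<url>(?:https?://|ftp://|www\\.)" + LABEL + "(?:\\." + LABEL + ")+"
        + "(?::\\d+)?(?:/[^\\s<>\"]*)?)"
        + "|(?<email>" + LABEL + "(?:\\." + LABEL + ")*@" + LABEL + "(?:\\." + LABEL + ")+)");

    static String scanForLinks(String in, int flags, String target) {
        if (in == null) {
            return "";
        }
        String targetAttr = target == null ? "" : " target=\"" + target + "\"";
        Matcher m = LINK.matcher(in);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String url = m.group("url");
            String email = m.group("email");
            String replacement = m.group(); // unchanged unless a flag applies
            if (url != null && (flags & LINK_URL) != 0) {
                String href = url.startsWith("www.") ? "http://" + url : url;
                replacement = "<a href=\"" + href + "\"" + targetAttr + ">" + url + "</a>";
            } else if (email != null && (flags & LINK_EMAIL) != 0) {
                replacement = "<a href=\"mailto:" + email + "\">" + email + "</a>";
            }
            // quoteReplacement keeps '$' and '\' in the text from being interpreted.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

        Scanning with one combined pattern avoids a second pass matching an address inside an href attribute that the first pass just produced.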
      • encodeTags

        public static String encodeTags​(String in)
        Scans a string for HTML tags and replaces all & with &amp;, < with &lt;, > with &gt; and all " with &quot;.
        Parameters:
        in - The string to search
        Returns:
        The new string, or an empty string if NULL was passed
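        The order of the replacements matters: the ampersand has to be handled first, otherwise the ampersands introduced by the other entities would be encoded a second time. A minimal sketch, assuming the double quote is encoded as &quot; and using plain String.replace rather than the class's private patterns:

```java
class EncodeTagsSketch {
    static String encodeTags(String in) {
        if (in == null) {
            return "";
        }
        return in.replace("&", "&amp;")   // must come first
                 .replace("<", "&lt;")
                 .replace(">", "&gt;")
                 .replace("\"", "&quot;");
    }
}
```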
      • encodeTags

        public static String encodeTags​(String in,
                                        String safeTags)
        Scans a string for HTML tags and replaces the < and > of every tag that does not match the safeTags pattern with &lt; and &gt; respectively. Tags that match the safeTags pattern are not modified. Using this method is equivalent to encodeTags(in, Pattern.compile(safeTags));
        Parameters:
        in - The string to search
        safeTags - A regular expression that should match all safe tags
        Returns:
        The new string, or an empty string if NULL was passed
        See Also:
        SAFE_TAGS
      • encodeTags

        public static String encodeTags​(String in,
                                        Pattern safeTags)
        Scans a string for HTML tags and replaces the < and > of every tag that does not match the safeTags pattern with &lt; and &gt; respectively. Tags that match the safeTags pattern are not modified.
        Parameters:
        in - The string to search
        safeTags - A regular expression pattern that matches all safe tags
        Returns:
        The new string, or an empty string if NULL was passed
        See Also:
        SAFE_TAGS
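        A sketch of the selective encoding is given below. It assumes that the safe pattern is tested against the tag name only. How the real method applies safeTags (to the name, or to the whole tag) is not stated on this page, and the tag-finding expression is hypothetical as well.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EncodeSafeTagsSketch {
    // Hypothetical tag finder: group 2 is the tag name.
    private static final Pattern TAG = Pattern.compile("<(/?)(\\w+)([^>]*)>");

    static String encodeTags(String in, Pattern safeTags) {
        if (in == null) {
            return "";
        }
        Matcher m = TAG.matcher(in);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String tag = m.group();
            // Assumption: a tag is safe if its NAME matches the pattern.
            boolean safe = safeTags.matcher(m.group(2)).matches();
            String replacement = safe
                ? tag
                : "&lt;" + tag.substring(1, tag.length() - 1) + "&gt;";
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

        Called with Pattern.compile("b|i|u"), the sketch leaves <b> and </b> untouched and encodes the brackets of a <script> tag.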
      • urlEncode

        public static String urlEncode​(String in)
        Encode URL-unsafe characters in a string. Replaces % with %25, + with %2B, space with +, # with %23 and & with %26.
        Parameters:
        in - The string to encode
        Returns:
        The encoded string, or an empty string if NULL was passed
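        The replacements listed above are order-sensitive: % must be encoded before any new %xx sequences are introduced, and + must be encoded before spaces are turned into +. A minimal sketch using String.replace (the real method presumably uses the private PERCENT, PLUS, SPACE, HASH and AMPERSAND patterns, which is an assumption):

```java
class UrlEncodeSketch {
    static String urlEncode(String in) {
        if (in == null) {
            return "";
        }
        return in.replace("%", "%25")  // first, so later %xx sequences survive
                 .replace("+", "%2B")  // before spaces become '+'
                 .replace(" ", "+")
                 .replace("#", "%23")
                 .replace("&", "%26");
    }
}
```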
      • javaScriptEncode

        public static String javaScriptEncode​(String in)
        Escape a string to make it safe for use in a JavaScript statement. Replaces \ with \\, newline with \n, ' with \' and " with \".
        Parameters:
        in - String to escape.
        Returns:
        The escaped string, ready to be used in JavaScript code
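        The four replacements can be sketched as below. The backslash is escaped first so that the backslashes added by the later steps are not doubled. Returning an empty string for null is an assumption made by analogy with the other methods; the null behaviour is not documented here.

```java
class JavaScriptEncodeSketch {
    static String javaScriptEncode(String in) {
        if (in == null) {
            return ""; // assumption: null handling is not documented
        }
        return in.replace("\\", "\\\\")  // \  becomes \\  (must come first)
                 .replace("\n", "\\n")   // newline becomes the two characters \n
                 .replace("'", "\\'")    // '  becomes \'
                 .replace("\"", "\\\""); // "  becomes \"
    }
}
```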
      • stripMarkup

        public static String stripMarkup​(CharSequence in)
        Remove all HTML markup in a string and return what is left.
        Parameters:
        in - The string to strip HTML markup from
        Returns:
        The resulting string, or null if the input is null
        Since:
        2.10
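        A minimal sketch of the idea, assuming a simple expression as a stand-in for the MARKUP pattern (the real pattern is not shown on this page):

```java
import java.util.regex.Pattern;

class StripMarkupSketch {
    // Hypothetical stand-in for MARKUP: '<', anything up to the next '>', '>'.
    private static final Pattern MARKUP = Pattern.compile("<[^>]*>");

    static String stripMarkup(CharSequence in) {
        if (in == null) {
            return null; // unlike most methods here, null stays null
        }
        return MARKUP.matcher(in).replaceAll("");
    }
}
```

        Entities such as &amp; are left as they are in this sketch; only the tags are removed.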
      • textLength

        public static int textLength​(CharSequence html)
        Counts the length of a string, ignoring all characters in HTML markup tags. The result is approximately the length of the string that is displayed on screen by a browser. This method counts all characters that are not inside < and >. The actual number of characters displayed by a browser may be smaller because of escaped sequences, e.g. &amp;, and white-space that is collapsed.
        Parameters:
        html - The HTML string
        Returns:
        The number of characters that are not HTML markup
        Since:
        2.10
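        The counting rule can be sketched with a single pass over the characters. Returning 0 for null is an assumption, since the null behaviour is not documented here, and the real method may be implemented with the MARKUP pattern instead of a loop.

```java
class TextLengthSketch {
    static int textLength(CharSequence html) {
        if (html == null) {
            return 0; // assumption: null handling is not documented
        }
        int length = 0;
        boolean insideTag = false;
        for (int i = 0; i < html.length(); i++) {
            char c = html.charAt(i);
            if (c == '<') {
                insideTag = true;      // start of markup, not counted
            } else if (c == '>') {
                insideTag = false;     // end of markup, not counted
            } else if (!insideTag) {
                length++;              // visible text character
            }
        }
        return length;
    }
}
```

        For the input <b>R&amp;D</b> the sketch returns 7, while a browser displays only the three characters R&D. This is the kind of difference the description warns about.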