Class HtmlFileListParser


  • public class HtmlFileListParser
    extends java.lang.Object
    Parser for HTML file listings.
    • Field Summary

      Fields 
      Modifier and Type Field Description
      private static java.util.regex.Pattern APACHE_INDEX_SKIP  
      private static java.util.regex.Pattern MAILTO_URLS  
      private static java.util.regex.Pattern[] SKIPS  
      private static java.util.regex.Pattern URLS_TO_PARENT  
      private static java.util.regex.Pattern URLS_WITH_PATHS  
    • Method Summary

      All Methods Static Methods Concrete Methods 
      Modifier and Type Method Description
      private static java.lang.String cleanLink(java.net.URI baseURI, java.lang.String link)  
      private static boolean isAcceptableLink(java.lang.String link)  
      static java.util.List<java.lang.String> parseFileList(java.lang.String baseurl, java.io.InputStream stream)
      Reads raw HTML from the provided InputStream, parses it, and returns the file list.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Field Detail

      • APACHE_INDEX_SKIP

        private static final java.util.regex.Pattern APACHE_INDEX_SKIP
      • URLS_WITH_PATHS

        private static final java.util.regex.Pattern URLS_WITH_PATHS
      • URLS_TO_PARENT

        private static final java.util.regex.Pattern URLS_TO_PARENT
      • MAILTO_URLS

        private static final java.util.regex.Pattern MAILTO_URLS
      • SKIPS

        private static final java.util.regex.Pattern[] SKIPS
    • Constructor Detail

      • HtmlFileListParser

        public HtmlFileListParser()
    • Method Detail

      • parseFileList

        public static java.util.List<java.lang.String> parseFileList(java.lang.String baseurl,
                                                                     java.io.InputStream stream)
                                                              throws TransferFailedException
        Reads raw HTML from the provided InputStream, parses it, and returns the file list.
        Parameters:
        baseurl - the base URL of the HTML listing.
        stream - the input stream.
        Returns:
        the file list.
        Throws:
        TransferFailedException - if there was a problem fetching the raw HTML.
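
        The general idea behind parseFileList (read an HTML index page, collect its links, drop navigation links, and return the remaining names) can be illustrated with a minimal, self-contained sketch. This is not the implementation of HtmlFileListParser: the class FileListSketch, the method listFiles, the HREF pattern, and the filtering rules are all assumptions made for illustration, since the values of APACHE_INDEX_SKIP, URLS_TO_PARENT, MAILTO_URLS, and SKIPS are not shown on this page. The sketch also wraps I/O errors in an UncheckedIOException, whereas the documented method reports TransferFailedException. It requires Java 9 or later for InputStream.readAllBytes.

        ```java
        import java.io.ByteArrayInputStream;
        import java.io.IOException;
        import java.io.InputStream;
        import java.io.UncheckedIOException;
        import java.net.URI;
        import java.nio.charset.StandardCharsets;
        import java.util.ArrayList;
        import java.util.List;
        import java.util.regex.Matcher;
        import java.util.regex.Pattern;

        // Illustrative stand-in only; NOT the HtmlFileListParser implementation.
        final class FileListSketch {

            // Assumed pattern: captures double-quoted href attribute values.
            private static final Pattern HREF =
                    Pattern.compile("href\\s*=\\s*\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);

            // Hypothetical counterpart of parseFileList(baseurl, stream).
            static List<String> listFiles(String baseUrl, InputStream stream) {
                String html;
                try {
                    html = new String(stream.readAllBytes(), StandardCharsets.UTF_8);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }

                // Treat the base URL as a directory so relative links resolve beneath it.
                URI base = URI.create(baseUrl.endsWith("/") ? baseUrl : baseUrl + "/");
                List<String> files = new ArrayList<>();
                Matcher matcher = HREF.matcher(html);

                while (matcher.find()) {
                    String link = matcher.group(1).trim();

                    // Assumed skip rules: sort links ("?C=N;O=D"), fragments, and mailto links.
                    if (link.isEmpty() || link.startsWith("?") || link.startsWith("#")
                            || link.startsWith("mailto:")) {
                        continue;
                    }

                    URI relative;
                    try {
                        // Resolve against the base, then express the result relative to it.
                        relative = base.relativize(base.resolve(link));
                    } catch (IllegalArgumentException e) {
                        continue; // the link is not a syntactically valid URI
                    }

                    // relativize returns the URI unchanged (still absolute) when the target
                    // is not under the base, which covers parent links and other hosts.
                    String name = relative.toString();
                    if (relative.isAbsolute() || name.isEmpty() || files.contains(name)) {
                        continue;
                    }
                    files.add(name);
                }
                return files;
            }

            public static void main(String[] args) {
                String html = "<a href=\"?C=N;O=D\">Name</a>"
                        + "<a href=\"../\">Parent Directory</a>"
                        + "<a href=\"artifact-1.0.jar\">artifact-1.0.jar</a>"
                        + "<a href=\"sub/\">sub/</a>";
                InputStream in = new ByteArrayInputStream(html.getBytes(StandardCharsets.UTF_8));
                System.out.println(listFiles("http://example.org/repo", in));
            }
        }
        ```

        A regular expression keeps the sketch short but only handles double-quoted href attributes. Delegating the parent-directory check to URI.relativize avoids hand-written string comparisons on paths.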
      • cleanLink

        private static java.lang.String cleanLink(java.net.URI baseURI,
                                                  java.lang.String link)
      • isAcceptableLink

        private static boolean isAcceptableLink(java.lang.String link)