Opened 13 years ago

Closed 13 years ago

#1582 closed enhancement (fixed)

Extended support for external files

Reported by: Nicklas Nordborg
Owned by: Nicklas Nordborg
Priority: critical
Milestone: BASE 3.0
Component: web
Version:
Keywords:
Cc:

Description

The current support for external files, introduced in BASE 2.16 by #1485, is limited to http and https, with support for Basic and Digest authentication and client-side certificates. We would like to generalize this support. Here are a few examples:

This ticket is about adding support in the BASE core API for various kinds of external file access. We hope to be able to do this in a somewhat pluggable fashion, although it may not work for all cases (e.g. if a case needs more configuration options). The actual support for S3 and Hadoop will be added as separate tickets.

Change History (7)

comment:1 by Nicklas Nordborg, 13 years ago

For HDFS there already exists a sub-project, 'HDFS proxy'. This is an https proxy server that can be run inside Tomcat and that enables regular https access to the files in the HDFS file system. It appears to come with complete support for client-side certificates, which means that everything should work with the current version of BASE.

Pros

  • The proxy solution means that the BASE server doesn't need access to all nodes in the Hadoop file system. The internals of the file system can be completely hidden.

Cons

  • The downside is that file access is slower since all communication needs to go through the proxy server instead of directly to the data nodes.
  • It may not be obvious to users what the URL of the file is.
  • If the Hadoop cluster is also going to be used for computations we need to use the internal Hadoop paths and not the proxy URLs. How can we convert between them? How do we handle the case when the file is actually on a different external server or on the BASE internal file system?

More information about HDFS proxy at: http://hadoop.apache.org/hdfs/docs/current/hdfsproxy.html

comment:2 by Nicklas Nordborg, 13 years ago

Owner: changed from everyone to Nicklas Nordborg
Status: new → assigned

comment:3 by Nicklas Nordborg, 13 years ago

(In [5582]) References #1582: Extended support for external files

Created a framework for external file support. A registry (ConnectionManagerRegistry) keeps track of registered factories (ConnectionManagerFactory). Typically, a specific protocol is handled by a single factory.
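The registry/factory arrangement described above can be illustrated with a minimal sketch. The types below (SketchConnectionFactory, SchemeFactory, SketchRegistry) and their methods are hypothetical stand-ins invented for illustration. They are not the actual ConnectionManagerRegistry or ConnectionManagerFactory classes from BASE, whose signatures are not shown in this ticket.

```java
import java.net.URI;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for a factory that handles one protocol. */
interface SketchConnectionFactory {
    /** Identifier of this factory, here simply the protocol name. */
    String id();

    /** Returns true if this factory can handle the given URI. */
    boolean accepts(URI uri);
}

/** Assumed example factory that accepts URIs with one specific scheme. */
final class SchemeFactory implements SketchConnectionFactory {
    private final String scheme;

    SchemeFactory(String scheme) {
        this.scheme = scheme.toLowerCase(Locale.ROOT);
    }

    @Override
    public String id() {
        return scheme;
    }

    @Override
    public boolean accepts(URI uri) {
        String s = uri.getScheme();
        return s != null && scheme.equals(s.toLowerCase(Locale.ROOT));
    }
}

/** Hypothetical registry that keeps track of registered factories. */
final class SketchRegistry {
    private final Map<String, SketchConnectionFactory> factories = new ConcurrentHashMap<>();

    void register(SketchConnectionFactory factory) {
        factories.put(factory.id(), factory);
    }

    /** Automatic selection: the first registered factory that accepts the URI. */
    Optional<SketchConnectionFactory> findFor(URI uri) {
        return factories.values().stream().filter(f -> f.accepts(uri)).findFirst();
    }
}
```

In this sketch each scheme is accepted by at most one factory, so automatic selection is unambiguous. A URI with an unregistered scheme yields an empty Optional.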

An implementation for http and https has been created. I also have code for HDFS, but I think that should be made as an extension. To do this we first need #1593 and parts of #1592.

The current implementation only supports automatic selection of a connection factory (based on the protocol). This will probably not work with, for example, Amazon S3, since it also uses http (plus a custom authentication procedure). We need a way to manually select which factory to use. This is probably best done at the FileServer level.

comment:4 by Nicklas Nordborg, 13 years ago

(In [5594]) References #1582: Extended support for external files

Removed import statement of HDFS class that is not yet checked in to subversion.

comment:5 by Nicklas Nordborg, 13 years ago

(In [5599]) References #1582: Extended support for external files

Defined an extension point for external file support ('net.sf.basedb.core.uri.connection-manager').

Added the possibility to set a specific connection manager factory on a FileServer. If not set, automatic detection is used as before.
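As a rough illustration of the selection rule (explicit factory on the file server first, automatic detection otherwise), here is a minimal sketch. SketchFileServer, FactorySelector and the string ids are hypothetical and do not reflect the real FileServer API in BASE. Treating registration order as detection priority is likewise an assumption of this sketch.

```java
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Predicate;

/** Hypothetical file server record holding an optional explicit factory id. */
final class SketchFileServer {
    private final String factoryId; // null means "detect automatically"

    SketchFileServer(String factoryId) {
        this.factoryId = factoryId;
    }

    String factoryId() {
        return factoryId;
    }
}

/** Hypothetical selector: explicit choice first, then automatic detection. */
final class FactorySelector {
    // Insertion order doubles as detection priority (an assumption of this sketch).
    private final Map<String, Predicate<URI>> factories = new LinkedHashMap<>();

    void register(String id, Predicate<URI> accepts) {
        factories.put(id, accepts);
    }

    /** Returns the id of the factory to use, or null if none applies. */
    String select(SketchFileServer server, URI uri) {
        if (server != null && server.factoryId() != null) {
            // An explicit setting wins, provided that factory is registered.
            return factories.containsKey(server.factoryId()) ? server.factoryId() : null;
        }
        // No explicit setting: fall back to the first factory that accepts the URI.
        for (Map.Entry<String, Predicate<URI>> entry : factories.entrySet()) {
            if (entry.getValue().test(uri)) {
                return entry.getKey();
            }
        }
        return null;
    }
}
```

With two factories that both accept https URIs (for example a plain one and an S3-style one), automatic detection would pick whichever was registered first, while a file server configured with an explicit id would get that factory regardless of order.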

comment:6 by Nicklas Nordborg, 13 years ago

(In [5614]) References #1582: Extended support for external files

Fixed a NullPointerException found while testing. Store latest robots.txt.

comment:7 by Nicklas Nordborg, 13 years ago

Resolution: fixed
Status: assigned → closed