
Plugins Guide

Plugins are a way to extend the basic DataHub functionality in a custom manner.

Currently, DataHub formally supports two types of plugins: Authentication and Authorization.

Authentication

Note: This feature is in BETA.

We recommend that you do not use it unless you really know what you are doing.

A custom authentication plugin makes it possible to authenticate DataHub users against any Identity Management System. Choose your Identity Management System and write a custom authentication plugin as described in this section.

Currently, custom authenticators cannot be used to authenticate users of DataHub's web UI. This is because the DataHub web app expects two special cookies, PLAY_SESSION and actor, which are explicitly set by the server when a login action is performed. Instead, custom authenticators are useful for authenticating API requests to DataHub's backend (GMS), and can be used in addition to DataHub's default authentication, which is based on DataHub-minted access tokens.

The sample authenticator implementation can be found at Authenticator Sample.

Implementing an Authentication Plugin

  1. Add datahub-auth-api as a compileOnly dependency: The Maven coordinates of datahub-auth-api can be found at Maven.

    An example Gradle dependency is given below.

    dependencies {
        def auth_api = 'io.acryl:datahub-auth-api:0.9.3-3rc3'
        compileOnly "${auth_api}"
        testImplementation "${auth_api}"
    }
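  2. Implement the Authenticator interface: Refer to the Authenticator Sample.

    A sample class that implements the Authenticator interface:

    // Imports assume the datahub-auth-api 0.9.x package layout; verify them against your version.
    import com.datahub.authentication.Authentication;
    import com.datahub.authentication.AuthenticationException;
    import com.datahub.authentication.AuthenticationRequest;
    import com.datahub.authentication.AuthenticatorContext;
    import com.datahub.plugins.auth.authentication.Authenticator;
    import java.util.Map;
    import javax.annotation.Nonnull;
    import javax.annotation.Nullable;

    public class GoogleAuthenticator implements Authenticator {

        @Override
        public void init(@Nonnull Map<String, Object> authenticatorConfig, @Nullable AuthenticatorContext context) {
            // Plugin initialization code goes here.
            // DataHub calls this method at boot time.
        }

        @Nullable
        @Override
        public Authentication authenticate(@Nonnull AuthenticationRequest authenticationRequest)
                throws AuthenticationException {
            // DataHub calls this method whenever an authentication decision needs to be made.
            // Authenticate the request and return an Authentication.
            throw new UnsupportedOperationException("TODO: implement authentication");
        }
    }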
  3. Use getResourceAsStream to read files: If your plugin reads any configuration file, such as a properties, YAML, JSON, or XML file, use this.getClass().getClassLoader().getResourceAsStream("<file-name>") to read that file from the DataHub GMS plugin's class-path. For the DataHub GMS resource look-up behavior, refer to the Plugin Installation section. Sample code using getResourceAsStream is available in the sample Authenticator plugin, TestAuthenticator.java.

  4. Bundle your Jar: Use the com.github.johnrengelman.shadow Gradle plugin to create an uber jar.

    For an example of building an uber jar, see the build.gradle file of the Apache Ranger Plugin.

    Exclude signature files as shown in the shadowJar task below.

    apply plugin: 'com.github.johnrengelman.shadow'

    shadowJar {
        // Exclude files related to jar signatures
        exclude "META-INF/*.RSA", "META-INF/*.SF", "META-INF/*.DSA"
    }
  5. Install the Plugin: Refer to the Plugin Installation section for plugin installation in the DataHub environment.
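The GoogleAuthenticator skeleton above leaves the actual check open. The following standalone sketch shows one possible shape for it, assuming a hypothetical static-token scheme: the init counterpart reads a token from the configs map and otherwise falls back to a hypothetical static-token.properties resource loaded with getResourceAsStream, and the authenticate counterpart accepts only an "Authorization: Bearer <token>" header. The class StaticTokenCheck and the resource name are illustrative assumptions, not part of datahub-auth-api; the sketch uses plain maps instead of DataHub types so that it runs on its own.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.Map;
import java.util.Properties;

// Hypothetical standalone sketch; StaticTokenCheck is not part of datahub-auth-api.
final class StaticTokenCheck {
    private static final String BEARER_PREFIX = "Bearer ";
    private final String expectedToken;

    private StaticTokenCheck(String expectedToken) {
        this.expectedToken = expectedToken;
    }

    // Counterpart of init(): prefer configs.token, otherwise read "token" from a
    // hypothetical static-token.properties resource on the plugin class-path.
    static StaticTokenCheck fromConfig(Map<String, ?> configs) {
        Object token = configs.get("token");
        if (token != null) {
            return new StaticTokenCheck(token.toString());
        }
        Properties props = new Properties();
        try (InputStream in = StaticTokenCheck.class.getClassLoader()
                .getResourceAsStream("static-token.properties")) {
            if (in != null) { // getResourceAsStream returns null when nothing is found
                props.load(in);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String fromFile = props.getProperty("token");
        if (fromFile == null) {
            throw new IllegalStateException("No token in configs or static-token.properties");
        }
        return new StaticTokenCheck(fromFile);
    }

    // Counterpart of authenticate(): accept only "Authorization: Bearer <expected token>".
    boolean isAuthenticated(Map<String, String> headers) {
        String header = headers.get("Authorization");
        if (header == null || !header.startsWith(BEARER_PREFIX)) {
            return false;
        }
        byte[] presented = header.substring(BEARER_PREFIX.length()).getBytes(StandardCharsets.UTF_8);
        byte[] expected = expectedToken.getBytes(StandardCharsets.UTF_8);
        // MessageDigest.isEqual's running time does not depend on where the bytes differ.
        return MessageDigest.isEqual(presented, expected);
    }
}
```

Note that the header lookup here is case-sensitive while HTTP header names are not; a real plugin should normalize header names, or rely on however the incoming AuthenticationRequest exposes them.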

Enable GMS Authentication

By default, authentication is disabled in DataHub GMS.

Follow the steps below to enable GMS authentication:

  1. Download docker-compose.quickstart.yml: Download the Docker Compose file docker-compose.quickstart.yml.

  2. Set the environment variable: Set the METADATA_SERVICE_AUTH_ENABLED environment variable to true.

  3. Redeploy DataHub GMS: Use the quickstart command below to redeploy DataHub GMS:

    datahub docker quickstart -f docker-compose.quickstart.yml
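Once authentication is enabled, API requests to GMS need credentials that one of the configured authenticators accepts. A minimal sketch using the JDK's java.net.http client (Java 11+), assuming token-based authentication with an "Authorization: Bearer <token>" header, as used for DataHub-minted access tokens. The base URL (for example http://localhost:8080, the port the quickstart usually exposes GMS on) and the request path are caller-supplied assumptions; GmsClientSketch is a hypothetical name.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical helper for calling GMS once authentication is enabled.
final class GmsClientSketch {
    // Builds a GET request that carries a bearer token; base URL, path, and token are caller-supplied.
    static HttpRequest buildRequest(String gmsBaseUrl, String path, String token) {
        return HttpRequest.newBuilder(URI.create(gmsBaseUrl + path))
                .header("Authorization", "Bearer " + token)
                .GET()
                .build();
    }

    // Sends the request and returns the HTTP status code (401 suggests the token was rejected).
    static int send(HttpRequest request) throws IOException, InterruptedException {
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        return response.statusCode();
    }
}
```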

Authorization

Note: This feature is in BETA.

We recommend that you do not use it unless you really know what you are doing.

A custom authorization plugin makes it possible to authorize DataHub users against any Access Management System. Choose your Access Management System and write a custom authorization plugin as described in this section.

The sample authorizer implementation can be found at Authorizer Sample.

Implementing an Authorization Plugin

  1. Add datahub-auth-api as a compileOnly dependency: The Maven coordinates of datahub-auth-api can be found at Maven.

    An example Gradle dependency is given below.

    dependencies {
        def auth_api = 'io.acryl:datahub-auth-api:0.9.3-3rc3'
        compileOnly "${auth_api}"
        testImplementation "${auth_api}"
    }
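  2. Implement the Authorizer interface: Refer to the Authorizer Sample.

    A sample class that implements the Authorizer interface:

    // Imports assume the datahub-auth-api 0.9.x package layout; verify them against your version.
    import com.datahub.authorization.AuthorizationRequest;
    import com.datahub.authorization.AuthorizationResult;
    import com.datahub.authorization.AuthorizedActors;
    import com.datahub.authorization.AuthorizerContext;
    import com.datahub.authorization.ResourceSpec;
    import com.datahub.plugins.auth.authorization.Authorizer;
    import java.util.Map;
    import java.util.Optional;
    import javax.annotation.Nonnull;

    public class ApacheRangerAuthorizer implements Authorizer {
        @Override
        public void init(@Nonnull Map<String, Object> authorizerConfig, @Nonnull AuthorizerContext ctx) {
            // Plugin initialization code goes here.
            // DataHub calls this method at boot time.
        }

        @Override
        public AuthorizationResult authorize(@Nonnull AuthorizationRequest request) {
            // DataHub calls this method whenever an authorization decision needs to be made.
            // Authorize the request and return an AuthorizationResult.
            throw new UnsupportedOperationException("TODO: implement authorization");
        }

        @Override
        public AuthorizedActors authorizedActors(String privilege, Optional<ResourceSpec> resourceSpec) {
            // Return the actors that hold the given privilege, optionally scoped to a resource.
            throw new UnsupportedOperationException("TODO: implement authorizedActors");
        }
    }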
  3. Use getResourceAsStream to read files: If your plugin reads any configuration file, such as a properties, YAML, JSON, or XML file, use this.getClass().getClassLoader().getResourceAsStream("<file-name>") to read that file from the DataHub GMS plugin's class-path. For the DataHub GMS resource look-up behavior, refer to the Plugin Installation section. Sample code using getResourceAsStream is available in the sample Authenticator plugin, TestAuthenticator.java.

  4. Bundle your Jar: Use the com.github.johnrengelman.shadow Gradle plugin to create an uber jar.

    For an example of building an uber jar, see the build.gradle file of the Apache Ranger Plugin.

    Exclude signature files as shown in the shadowJar task below.

    apply plugin: 'com.github.johnrengelman.shadow'

    shadowJar {
        // Exclude files related to jar signatures
        exclude "META-INF/*.RSA", "META-INF/*.SF", "META-INF/*.DSA"
    }
  5. Install the Plugin: Refer to the Plugin Installation section for plugin installation in the DataHub environment.
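The ApacheRangerAuthorizer skeleton above leaves both decision methods open. A minimal standalone sketch of the two decisions, assuming a hypothetical in-memory grant table (privilege to set of actor URNs) that init could build from the plugin's configs. PrivilegeTable and its Decision enum are illustrative stand-ins for AuthorizationResult and AuthorizedActors, and the sketch ignores the optional resource specification for brevity.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;

// Hypothetical standalone sketch; these types are not part of datahub-auth-api.
final class PrivilegeTable {
    enum Decision { ALLOW, DENY }

    // privilege -> actor URNs granted that privilege (for example, built from configs in init)
    private final Map<String, Set<String>> grants;

    PrivilegeTable(Map<String, Set<String>> grants) {
        this.grants = Map.copyOf(grants);
    }

    // Counterpart of authorize(): deny anything that is not explicitly granted.
    Decision authorize(String actorUrn, String privilege) {
        return authorizedActors(privilege).contains(actorUrn) ? Decision.ALLOW : Decision.DENY;
    }

    // Counterpart of authorizedActors(): every actor granted the privilege.
    Set<String> authorizedActors(String privilege) {
        return grants.getOrDefault(privilege, Collections.emptySet());
    }
}
```

Defaulting to DENY keeps the plugin closed by default: a privilege that the grant table does not mention is never allowed.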

Plugin Installation

DataHub's GMS service searches for plugins in the container's local directory /etc/datahub/plugins/auth/. This location will be referred to as plugin-base-directory hereafter.

For Docker, docker-compose is configured to mount the ${HOME}/.datahub directory to the /etc/datahub directory within the GMS container.

Docker

Follow the steps below to install plugins:

Let's assume you have created an uber jar for an authorizer plugin named apache-ranger-authorizer.jar, and that the class com.abc.RangerAuthorizer implements the Authorizer interface.

  1. Create a plugin configuration file: Create a config.yml file at ${HOME}/.datahub/plugins/auth/. For more detail on the configuration, refer to the Config Detail section.

  2. Create a plugin directory: Create a plugin directory named apache-ranger-authorizer. This directory will be referred to as plugin-home hereafter.

     mkdir -p ${HOME}/.datahub/plugins/auth/apache-ranger-authorizer
  3. Copy the plugin jar to plugin-home: Copy apache-ranger-authorizer.jar to plugin-home.

     cp apache-ranger-authorizer.jar ${HOME}/.datahub/plugins/auth/apache-ranger-authorizer
  4. Update the plugin configuration file: Add the entry below to config.yml. A plugin can take arbitrary configuration under the configs block; in this example, it is a username and a password.

    plugins:
      - name: "apache-ranger-authorizer"
        type: "authorizer"
        enabled: "true"
        params:
          className: "com.abc.RangerAuthorizer"
          configs:
            username: "foo"
            password: "fake"

  5. Restart the datahub-gms container:

    On startup, the DataHub GMS service performs the following steps:

    1. Load config.yml.
    2. Prepare the list of plugins whose enabled field is set to true.
    3. Look for a directory matching the plugin name in plugin-base-directory. In this case it is /etc/datahub/plugins/auth/apache-ranger-authorizer/; this directory becomes plugin-home.
    4. Look for the params.jarFileName attribute; if it is absent, look for a jar named <plugin-name>.jar. In this case it is /etc/datahub/plugins/auth/apache-ranger-authorizer/apache-ranger-authorizer.jar.
    5. Load the class given in the plugin's params.className attribute from the jar. Here, it loads com.abc.RangerAuthorizer from apache-ranger-authorizer.jar.
    6. Call the plugin's init method.

    When the plugin calls `getResourceAsStream`, the DataHub GMS service looks for the resource in the following order:

    1. Look for the requested resource in the plugin jar file. If found, return it as an InputStream.
    2. Look for the requested resource in the `plugin-home` directory. If found, return it as an InputStream.
    3. Look for the requested resource in the application class-loader. If found, return it as an InputStream.
    4. Return `null`, since the requested resource was not found.
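The resource lookup order above can be illustrated with a small standalone sketch. It is a hypothetical model, not the GMS implementation: PluginResourceLookup is an assumed name, the plugin jar is represented by a URLClassLoader with no parent, and URLClassLoader.findResource is used because it searches only that loader's own URLs, so step 1 does not silently fall through to the application class-path.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical illustration of the documented lookup order; not the GMS implementation.
final class PluginResourceLookup {
    private final URLClassLoader pluginJarLoader; // covers only the plugin jar
    private final Path pluginHome;
    private final ClassLoader applicationLoader;

    PluginResourceLookup(URLClassLoader pluginJarLoader, Path pluginHome, ClassLoader applicationLoader) {
        this.pluginJarLoader = pluginJarLoader;
        this.pluginHome = pluginHome;
        this.applicationLoader = applicationLoader;
    }

    InputStream getResourceAsStream(String name) {
        try {
            // 1. Plugin jar: findResource searches only this loader's own URLs, not its parent.
            URL inJar = pluginJarLoader.findResource(name);
            if (inJar != null) {
                return inJar.openStream();
            }
            // 2. plugin-home directory.
            Path onDisk = pluginHome.resolve(name);
            if (Files.isRegularFile(onDisk)) {
                return Files.newInputStream(onDisk);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // 3. Application class-loader; 4. it returns null if the resource is not found there either.
        return applicationLoader.getResourceAsStream(name);
    }
}
```

In this model, the plugin jar loader would be created as new URLClassLoader(new URL[] { jarPath.toUri().toURL() }, null), where jarPath is the jar resolved in step 4 of the startup sequence.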

By default, authentication is disabled in DataHub GMS. Please follow the Enable GMS Authentication section to enable authentication.

Kubernetes

Helm support is coming soon.

Config Detail

A sample config.yml can be found at config.yml.

config.yml structure:

| Field | Required | Type | Default | Description |
|---|---|---|---|---|
| plugins[].name | Yes | string | | Name of the plugin |
| plugins[].type | Yes | enum[authenticator, authorizer] | | Type of plugin; possible values are authenticator or authorizer |
| plugins[].enabled | Yes | boolean | | Whether this plugin is enabled or disabled. DataHub GMS does not process disabled plugins |
| plugins[].params.className | Yes | string | | Fully qualified class name of the Authenticator or Authorizer implementation |
| plugins[].params.jarFileName | No | string | <plugins[].name>.jar | Jar file name in plugin-home |
| plugins[].params.configs | No | map<string,object> | Empty map | Runtime configuration required by the plugin |

plugins[] is an array of plugins, in which you can define multiple authenticator and authorizer plugins. Plugin names must be unique within the plugins array.
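The rules in the table above (required fields, allowed type values, jarFileName and configs defaults, unique names, skipping disabled plugins) can be captured in a small model. A minimal sketch, assuming config.yml has already been parsed into Java objects; the record PluginEntry and its methods are hypothetical names, not DataHub classes, and records need Java 16 or later.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical in-memory model of one plugins[] entry; field names follow the config.yml table.
record PluginEntry(String name, String type, boolean enabled, String className,
                   String jarFileName, Map<String, Object> configs) {

    // params.jarFileName defaults to "<name>.jar".
    String effectiveJarFileName() {
        return jarFileName != null ? jarFileName : name + ".jar";
    }

    // params.configs defaults to an empty map.
    Map<String, Object> effectiveConfigs() {
        return configs != null ? configs : Map.of();
    }

    // Checks required fields, the type enum, and name uniqueness, then keeps enabled entries.
    static List<PluginEntry> validateAndFilter(List<PluginEntry> plugins) {
        Set<String> seen = new HashSet<>();
        for (PluginEntry p : plugins) {
            if (p.name() == null || p.type() == null || p.className() == null) {
                throw new IllegalArgumentException("name, type and params.className are required");
            }
            if (!Set.of("authenticator", "authorizer").contains(p.type())) {
                throw new IllegalArgumentException("Invalid type for plugin " + p.name() + ": " + p.type());
            }
            if (!seen.add(p.name())) {
                throw new IllegalArgumentException("Duplicate plugin name: " + p.name());
            }
        }
        return plugins.stream().filter(PluginEntry::enabled).collect(Collectors.toList());
    }
}
```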

Plugin Permissions

Adhere to the following plugin access controls to keep your plugin forward compatible.

  • A plugin should read and write files only within its plugin-home directory. Refer to step 2 of Plugin Installation for the definition of plugin-home.
  • A plugin should only access port 80, port 443, or ports higher than 1024.

All other access is forbidden for the plugin.

Disclaimer: In the BETA version, your plugin can access any port and read or write any location on the file system. However, you should implement the plugin according to the access permissions above to keep it compatible with upcoming releases of DataHub.
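Because these rules are not enforced in BETA, a plugin can check itself against them. A minimal sketch of two hypothetical self-check helpers (PluginAccessRules is an assumed name, not a DataHub API): one for the port rule and one that tests whether a path stays inside plugin-home after "." and ".." segments are removed. It does not resolve symbolic links, so a link inside plugin-home that points elsewhere would still pass.

```java
import java.nio.file.Path;

// Hypothetical self-check helpers mirroring the access rules; GMS does not enforce them in BETA.
final class PluginAccessRules {
    // Port 80, port 443, or any valid port above 1024.
    static boolean isAllowedPort(int port) {
        return port == 80 || port == 443 || (port > 1024 && port <= 65535);
    }

    // True if target (relative to pluginHome, or absolute) normalizes to a path inside pluginHome.
    // Path.startsWith compares whole name elements, so "foo-evil" does not match "foo".
    static boolean isInsidePluginHome(Path pluginHome, Path target) {
        Path home = pluginHome.toAbsolutePath().normalize();
        Path resolved = home.resolve(target).normalize();
        return resolved.startsWith(home);
    }
}
```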

Migration Of Plugins From application.yml

If you have any custom Authentication or Authorization plugin defined in the authorization or authentication section of application.yml, migrate it as follows.

  1. Implement the Plugin: For an Authentication plugin, follow the steps in Implementing an Authentication Plugin; for an Authorization plugin, follow the steps in Implementing an Authorization Plugin.

  2. Install the Plugin: Install the plugin as described in Plugin Installation. You need to map the configuration from application.yml to config.yml; this mapping is described below.

    Mapping for Authenticators

    a. In config.yml set plugins[].type to authenticator

    b. authentication.authenticators[].type is mapped to plugins[].params.className

    c. authentication.authenticators[].configs is mapped to plugins[].params.configs

    Example Authenticator Plugin configuration in config.yml

    plugins:
      - name: "apache-ranger-authenticator"
        type: "authenticator"
        enabled: "true"
        params:
          className: "com.abc.RangerAuthenticator"
          configs:
            username: "foo"
            password: "fake"

    Mapping for Authorizers

    a. In config.yml set plugins[].type to authorizer

    b. authorization.authorizers[].type is mapped to plugins[].params.className

    c. authorization.authorizers[].configs is mapped to plugins[].params.configs

    Example Authorizer Plugin configuration in config.yml

    plugins:
      - name: "apache-ranger-authorizer"
        type: "authorizer"
        enabled: "true"
        params:
          className: "com.abc.RangerAuthorizer"
          configs:
            username: "foo"
            password: "fake"

  3. Move any other configuration files of your plugin to the plugin-home directory. Details about plugin-home are given in the Plugin Installation section.
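The mapping from application.yml to config.yml is mechanical, so a migration script could apply it. A minimal sketch, assuming the YAML has already been parsed into plain maps (the parsing step is left out); PluginConfigMigration and toPluginEntry are hypothetical names. application.yml has no plugin name, so the caller supplies one; it must be unique and should match the plugin-home directory name.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: maps one application.yml authenticators[]/authorizers[] entry
// (keys "type" and optional "configs") to a config.yml plugins[] entry.
final class PluginConfigMigration {
    static Map<String, Object> toPluginEntry(String pluginName, String pluginType, Map<String, ?> legacyEntry) {
        if (!pluginType.equals("authenticator") && !pluginType.equals("authorizer")) {
            throw new IllegalArgumentException("pluginType must be authenticator or authorizer");
        }
        Object className = legacyEntry.get("type");
        if (className == null) {
            throw new IllegalArgumentException("Legacy entry has no type");
        }

        Map<String, Object> params = new LinkedHashMap<>();
        params.put("className", className);       // ...[].type    -> plugins[].params.className
        Object configs = legacyEntry.get("configs");
        if (configs != null) {
            params.put("configs", configs);        // ...[].configs -> plugins[].params.configs
        }

        Map<String, Object> entry = new LinkedHashMap<>();
        entry.put("name", pluginName);             // not present in application.yml
        entry.put("type", pluginType);             // authenticator or authorizer
        entry.put("enabled", "true");              // quoted, as in the config.yml examples
        entry.put("params", params);
        return entry;
    }
}
```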