
OK, this time a recent issue I was tasked to look into: how to set up Airflow in an Azure Kubernetes environment while securing its access through Active Directory.
Well, according to the Airflow documentation this should be possible, so let’s get to work.

First let me tell you which versions I used for a successful installation:

Steps to create the right Docker image:

  1. Make sure to build with the “[azure]” option added: pip install apache-airflow[azure]
  2. Add werkzeug: pip install werkzeug==0.16.*
  3. Add flask_oauthlib: pip install flask_oauthlib
  4. Install the right version of requests-oauthlib: pip install requests-oauthlib==1.1.0
    Otherwise you’ll receive the following error: ImportError: cannot import name ‘bytes_type’ from ‘oauthlib.common’
  5. Make a webserver_config.py (see below) and copy it to ${AIRFLOW_HOME}/webserver_config.py
    Replace the {client_id}, {client_secret} and {tenant_id} with your own.
    Also take a close look at AUTH_USER_REGISTRATION_ROLE = “Admin”; you might want to change this to User or even lower. For testing this is fine, but with this setting every newly registered user gets admin rights, which is not something you want in production.
# -*- coding: utf-8 -*-
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
#   http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied.  See the License for the
# specific language governing permissions and limitations
# under the License.
import os
from flask_appbuilder.security.manager import AUTH_OAUTH

from airflow.configuration import conf

basedir = os.path.abspath(os.path.dirname(__file__))

# The SQLAlchemy connection string.
SQLALCHEMY_DATABASE_URI = conf.get('core', 'SQL_ALCHEMY_CONN')

# Flask-WTF flag for CSRF
CSRF_ENABLED = True

AUTH_TYPE = AUTH_OAUTH

# Full admin role name
AUTH_ROLE_ADMIN = 'Admin'

# Public role name, no authentication needed
AUTH_ROLE_PUBLIC = 'Public'

# Will allow user self registration
AUTH_USER_REGISTRATION = True

# The default user self registration role
AUTH_USER_REGISTRATION_ROLE = "Admin"

# OAuth provider(s) info
# Azure Active Directory example:
OAUTH_PROVIDERS = [
    {
        "name": "azure",
        "icon": "fa-windows",
        "token_key": "access_token",
        "remote_app": {
            "consumer_key": "{client_id}",
            "consumer_secret": "{client_secret}",
            "base_url": "https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token",
            "request_token_params": {
                "scope": "https://graph.microsoft.com/User.Read openid profile email offline_access"
            },
            "access_token_method": "POST",
            "authorize_url": "https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/authorize"
        }
    }
]
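
If you would rather not bake the client secret into the image, the provider entry can be built from environment variables instead. The following is only a sketch of that idea, not something from the original setup: the variable names AZURE_CLIENT_ID, AZURE_CLIENT_SECRET and AZURE_TENANT_ID and both helper functions are made up for illustration. The keys and URLs are the same as in the config above.

```python
import os

# Hypothetical environment variable names; pick whatever fits your deployment.
REQUIRED_ENV_VARS = ("AZURE_CLIENT_ID", "AZURE_CLIENT_SECRET", "AZURE_TENANT_ID")


def build_azure_provider(client_id, client_secret, tenant_id):
    """Build one OAUTH_PROVIDERS entry with the same keys as the config above."""
    authority = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0"
    return {
        "name": "azure",
        "icon": "fa-windows",
        "token_key": "access_token",
        "remote_app": {
            "consumer_key": client_id,
            "consumer_secret": client_secret,
            "base_url": f"{authority}/token",
            "request_token_params": {
                "scope": "https://graph.microsoft.com/User.Read openid profile email offline_access"
            },
            "access_token_method": "POST",
            "authorize_url": f"{authority}/authorize",
        },
    }


def provider_from_env(env=None):
    """Read the three values from the environment and fail early if one is missing."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_ENV_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return build_azure_provider(
        env["AZURE_CLIENT_ID"], env["AZURE_CLIENT_SECRET"], env["AZURE_TENANT_ID"]
    )
```

In webserver_config.py the literal list would then become OAUTH_PROVIDERS = [provider_from_env()], with the three variables supplied by, for example, a Kubernetes secret.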

Now that the Docker preparations are done, go forth and deploy your image on your cluster.
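
Since the pinned versions from the steps above are what make or break this setup, it can be worth checking inside the image that they really ended up installed. Below is a minimal sketch of such a check, assuming Python 3.8 or newer (for importlib.metadata). The helper names matches_pin and check_pins are my own, and the pin matching only covers the two simple forms used in this post, not full pip version specifiers.

```python
from importlib import metadata

# Pins taken from the install steps; a trailing ".*" means "any patch release".
EXPECTED_PINS = {
    "werkzeug": "0.16.*",
    "requests-oauthlib": "1.1.0",
}


def matches_pin(version, pin):
    """Return True if a version satisfies a simple pin like 1.1.0 or 0.16.*."""
    if pin.endswith(".*"):
        prefix = pin[:-1]  # "0.16.*" -> "0.16."
        return version.startswith(prefix)
    return version == pin


def check_pins(pins=None):
    """Return {package: problem} for every pin that is not satisfied."""
    pins = EXPECTED_PINS if pins is None else pins
    problems = {}
    for package, pin in pins.items():
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            problems[package] = "not installed"
            continue
        if not matches_pin(installed, pin):
            problems[package] = f"found {installed}, expected {pin}"
    return problems


if __name__ == "__main__":
    # Prints nothing when all pins are satisfied.
    for package, problem in check_pins().items():
        print(f"{package}: {problem}")
```

Running this in the container before rolling it out gives a quicker signal than waiting for the ImportError to show up in the webserver.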

After deploying you need to make some changes in Azure, so go to the Azure portal: https://portal.azure.com/#home
Log in and go to “App registrations”, where you select the app that belongs to your Airflow deployment.

Now go to the “Token configuration” and add three optional claims with “Token type” ID:

  • family_name
  • given_name
  • upn

If you don’t do this, you will see errors in the Airflow webserver log stating that those values are missing.
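
To see whether the optional claims actually arrive, you can inspect the payload of an ID token by hand. The sketch below assumes you have a raw ID token string at hand (for example from a debug log). It decodes the payload without verifying the signature, so it is only suitable for debugging, and the function names are made up for this post.

```python
import base64
import json

# The three optional claims added under "Token configuration".
REQUIRED_CLAIMS = ("family_name", "given_name", "upn")


def decode_jwt_payload(token):
    """Decode the payload part of a JWT WITHOUT verifying it (debugging only)."""
    payload_b64 = token.split(".")[1]
    # JWTs use unpadded base64url, so restore the padding before decoding.
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


def missing_claims(payload, required=REQUIRED_CLAIMS):
    """Return the required claims that are absent or empty in the payload."""
    return [claim for claim in required if not payload.get(claim)]
```

An empty list from missing_claims(decode_jwt_payload(token)) means all three claims are present. Anything else tells you which claim still has to be added in the portal.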

Now go to the “API permissions” and add a new permission:

  • “Microsoft Graph” delegated permission on User.Read

When you have set all this, go to your Airflow web URL. If all went right, you should see a login prompt.
After logging in successfully you should see the account name appear in the top right corner.
One word of caution: if you let accounts from outside your tenant have access, you should check their email addresses, because they will be set to the UPN value, which isn’t a valid email address.
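
To find the accounts this affects, you could scan the registered users for guest-style UPNs. The sketch below assumes the usual Azure AD guest format, where an external user such as alice@contoso.com gets a UPN like alice_contoso.com#EXT#@yourtenant.onmicrosoft.com. Both function names are my own, and the recovered address is only a guess that should be checked by hand.

```python
GUEST_MARKER = "#ext#"


def is_guest_upn(upn):
    """Return True if the UPN carries the guest marker used for external accounts."""
    return GUEST_MARKER in upn.lower()


def guess_email_from_guest_upn(upn):
    """Heuristically turn a guest UPN back into an email address.

    Returns the input unchanged for non-guest UPNs, and None when the part
    before the marker cannot be split into a name and a domain.
    """
    marker_index = upn.lower().find(GUEST_MARKER)
    if marker_index == -1:
        return upn
    original = upn[:marker_index]
    # The "@" of the original address was replaced by the last underscore.
    name, separator, domain = original.rpartition("_")
    if not separator or not name or not domain:
        return None
    return f"{name}@{domain}"
```

For example, guess_email_from_guest_upn("alice_contoso.com#EXT#@fabrikam.onmicrosoft.com") gives "alice@contoso.com", which you can then compare with the address stored for that user in Airflow.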

Thanks to Eneco for letting me share this out into the world.