Jobs API

This module provides functionality to interact with the Amorphic ETL jobs API.

Usage

  1. Initialize the Amorphic wrapper

from amorphicutils.api.amorphic import Amorphic

url = "https://bw7rwkd87f.execute-api.us-east-1.amazonaws.com"
environment = "master"
role_id = "admin-role-535343eb-0g44-4h34-g5df-766u87ed5ded"

amorphic_api = Amorphic(url, environment, role_id)

  2. Create a pythonshell job

import os

script_path = os.path.join(os.path.dirname(os.path.abspath(__file__)),
                           'assets/test.py')
payload = {
    'JobName': 'python_test_job',
    'ETLJobType': 'pythonshell',
    'ScriptPath': script_path
}

response = amorphic_api.jobs.create_job(**payload)

Implementation

class amorphicutils.api.models.jobs.Jobs(api_wrapper)

Class to call ETL job related APIs

add_job_libs(JobName, PythonLibs)

Adds external Python libraries to an ETL job

Parameters
  • JobName – Name of the job.

  • PythonLibs – List of Python libraries to be added.

Returns
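
For example, attaching libraries to the job created in the Usage section (the job name and package names are illustrative):

response = amorphic_api.jobs.add_job_libs(
    JobName='python_test_job',
    PythonLibs=['requests', 'pandas']
)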

create_common_libs(LibraryName, PythonLibs, LibraryDescription='Created via Amorphicutils', Keywords=None)

Creates a common library of Python packages that ETL jobs can share

Parameters
  • LibraryName – Name of the library

  • PythonLibs – List of Python libraries to be added

  • LibraryDescription – Description of the library

  • Keywords – Keywords for the package

Returns
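
A minimal sketch, assuming the amorphic_api wrapper from the Usage section; the library name and packages are placeholders:

response = amorphic_api.jobs.create_common_libs(
    LibraryName='shared_utils',
    PythonLibs=['requests', 'pyyaml'],
    LibraryDescription='Shared helpers for ETL jobs',
    Keywords=['etl', 'shared']
)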

create_job(JobName, ETLJobType, ScriptPath, Description=None, PythonVersion='3', GlueVersion='1.0', AllocatedCapacity=None, WorkerType=None, NumberOfWorkers=None, Keywords=None, NetworkConfiguration='general-public-network', MaxRetries=1, Timeout=None, MaxConcurrentRuns=None, DatasetAccess=None, SharedLibraries=None, ParameterAccess=None, DefaultArguments=None, **kwargs)

Create ETL job in Amorphic

Parameters
  • JobName – Name of the job

  • ETLJobType – Type of the job, one of ['pythonshell', 'spark']

  • ScriptPath – Local path of the main script

  • Description – Description of the job

  • PythonVersion – Python version for the job, can be 2 or 3, Default: 3

  • GlueVersion – Glue version, can be 0.9 or 1.0, Default: 1.0

  • AllocatedCapacity – Allocated capacity for the job, depending on job type: pythonshell accepts 0.0625 or 1; spark requires a value greater than 2

  • WorkerType – Worker type for special workloads, one of ['G.1X', 'G.2X', 'Standard', None], Default: None

  • NumberOfWorkers – Number of workers; must be greater than 2, Default: None

  • Keywords – Keywords for the job

  • NetworkConfiguration – Network configuration for the job, one of ['general-public-network', 'app-public-network', 'app-private-network'], Default: general-public-network

  • MaxRetries – Maximum retries by job, Default: 1

  • Timeout – Timeout for the job

  • MaxConcurrentRuns – Maximum concurrent runs of the job

  • DatasetAccess

    Dataset access to the job. The format must be:

    {
        'Owner': [{'DatasetName': '<DatasetName>', 'DatasetId': '<DatasetId>'}],
        'ReadOnly': [{'DatasetName': '<DatasetName>', 'DatasetId': '<DatasetId>'}]
    }

  • SharedLibraries – Names of the shared libraries

  • DefaultArguments – Default arguments for the job, in dictionary format

  • ParameterAccess – List of ParameterAccess

  • kwargs – Extra arguments

Returns
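
The Usage section above shows a pythonshell job; a spark job might be created as follows (job name, script path, and capacity are illustrative):

response = amorphic_api.jobs.create_job(
    JobName='spark_test_job',
    ETLJobType='spark',
    ScriptPath='assets/spark_job.py',
    GlueVersion='1.0',
    AllocatedCapacity=3,  # spark jobs require a capacity greater than 2
    MaxRetries=1
)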

delete_common_libs(LibraryName)

Deletes the common library from Amorphic

Parameters

LibraryName – Name of the library to delete

Returns
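
For example, deleting the illustrative library created above:

response = amorphic_api.jobs.delete_common_libs(LibraryName='shared_utils')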

delete_job(JobName=None, JobId=None)

Deletes an ETL job by ID or name

Parameters
  • JobName – ETL job name

  • JobId – ETL job id

Returns
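
For example, deleting by name (passing JobId instead works the same way):

response = amorphic_api.jobs.delete_job(JobName='python_test_job')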

get_all_common_libs()

Returns a list of all common libraries

Returns
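
For example:

common_libs = amorphic_api.jobs.get_all_common_libs()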

get_all_jobs(jobs_list=None)

Returns a list of all ETL jobs

Parameters

jobs_list

Returns
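
For example (the jobs_list argument is optional and can be omitted):

jobs = amorphic_api.jobs.get_all_jobs()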

get_common_libs(LibraryName)

Returns details of the library

Parameters

LibraryName – Name of the library

Returns
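
For example, using the illustrative library name from above:

details = amorphic_api.jobs.get_common_libs(LibraryName='shared_utils')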

get_job(JobName)

Returns job details

Parameters

JobName – ETL job name

Returns
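
For example:

job = amorphic_api.jobs.get_job(JobName='python_test_job')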

remove_job_libs(JobName, PythonLibs=None, RemoveAll=False)

Removes external Python libraries from an ETL job

Parameters
  • JobName – Name of the job.

  • PythonLibs – List of Python libraries to be removed.

  • RemoveAll – True to remove all libs, Default: False

Returns
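
For example, removing one library, or all of them (job and library names are illustrative):

response = amorphic_api.jobs.remove_job_libs(
    JobName='python_test_job',
    PythonLibs=['requests']
)

# Or drop every attached library at once
response = amorphic_api.jobs.remove_job_libs(
    JobName='python_test_job',
    RemoveAll=True
)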

update_common_libs(LibraryName, PythonLibs)

Update the common library

Parameters
  • LibraryName – Name of the library

  • PythonLibs – List of python libraries

Returns
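
For example, updating the package list of the illustrative library from above:

response = amorphic_api.jobs.update_common_libs(
    LibraryName='shared_utils',
    PythonLibs=['requests', 'pyyaml', 'boto3']
)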

update_extra_resource(JobName=None, JobId=None, DatasetAccess=None, ParameterAccess=None, SharedLibraries=None, **kwargs)

Update the job’s extra resource access

Parameters
  • JobName – Name of the job

  • JobId – Id of the job

  • DatasetAccess

    Dataset access to the job. The format must be:

    {
        'Owner': [{'DatasetName': '<DatasetName>', 'DatasetId': '<DatasetId>'}],
        'ReadOnly': [{'DatasetName': '<DatasetName>', 'DatasetId': '<DatasetId>'}]
    }

  • ParameterAccess – List of ParameterAccess

  • SharedLibraries – Name of the shared libraries

Returns
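
A sketch granting read-only access to a dataset; the job name, dataset name, and the <DatasetId> placeholder are illustrative:

dataset_access = {
    'Owner': [],
    'ReadOnly': [{'DatasetName': 'sales_raw', 'DatasetId': '<DatasetId>'}]
}
response = amorphic_api.jobs.update_extra_resource(
    JobName='python_test_job',
    DatasetAccess=dataset_access
)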

update_job(JobName, ScriptPath=None, Description=None, PythonVersion=None, GlueVersion=None, AllocatedCapacity=None, WorkerType=None, NumberOfWorkers=None, Keywords=None, NetworkConfiguration=None, MaxRetries=None, Timeout=None, MaxConcurrentRuns=None, DatasetAccess=None, SharedLibraries=None, DefaultArguments=None, ParameterAccess=None, **kwargs)

Update an existing ETL job in Amorphic

Parameters
  • JobName – Name of the job

  • ScriptPath – Local path of the main script

  • Description – Description of the job

  • PythonVersion – Python version for the job, can be 2 or 3

  • GlueVersion – Glue version, can be 0.9 or 1.0

  • AllocatedCapacity – Allocated capacity for the job, depending on job type: pythonshell accepts 0.0625 or 1; spark requires a value greater than 2

  • WorkerType – Worker type for special workloads, one of ['G.1X', 'G.2X', 'Standard', None], Default: None

  • NumberOfWorkers – Number of workers; must be greater than 2, Default: None

  • Keywords – Keywords for the job

  • NetworkConfiguration – Network configuration for the job, one of ['general-public-network', 'app-public-network', 'app-private-network']

  • MaxRetries – Maximum retries by job

  • Timeout – Timeout for the job

  • MaxConcurrentRuns – Maximum concurrent runs of the job

  • DatasetAccess

    Dataset access to the job. The format must be:

    {
        'Owner': [{'DatasetName': '<DatasetName>', 'DatasetId': '<DatasetId>'}],
        'ReadOnly': [{'DatasetName': '<DatasetName>', 'DatasetId': '<DatasetId>'}]
    }

  • SharedLibraries – Names of the shared libraries

  • DefaultArguments – Default arguments for the job, in dictionary format

  • ParameterAccess – List of ParameterAccess

  • kwargs – Extra arguments

Returns
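
For example, changing only a couple of settings on an existing job (parameters left at None are presumably left unchanged):

response = amorphic_api.jobs.update_job(
    JobName='python_test_job',
    Description='Nightly load, updated via amorphicutils',
    MaxRetries=2
)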

update_script(ScriptPath, JobName=None, JobId=None)

Updates the ETL script

Parameters
  • ScriptPath – Local script path

  • JobName – ETL job name

  • JobId – ETL job id

Returns
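
For example, pointing the job at a new local script (the path is illustrative):

response = amorphic_api.jobs.update_script(
    ScriptPath='assets/test_v2.py',
    JobName='python_test_job'
)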