Azure Data Factory - The basic

Exploring how to use and deploy Azure Data Factory

Posted by RITVN on December 02, 2017

Azure Data Factory - The basic

Nguyễn Hoàng Vũ, a senior, passionate developer of Rangers team, recently published an excellent series of blog posts about Azure Data Factory We are please to cross-post them here. His own blog has plenty of good posts that is worth to follow.

Azure Data Factory is the service that helps to copy or transform data from data source to another data source, and some use cases that we can use Azure Data Factory Azure are

  • Restore data from a production database to development database every day.
  • Transfer data from on-premises SQL server database to Azure SQL server database.
  • Backup data from production SQL server database to CSV files every night at 1 AM

There are many use cases that we can use Data Factory on, but I will introduce the first use case that I had been working on it last week.

Three basic things

To use Data Factory, we have to prepare 3 basic things that Data Factory needs to work.

1. Linked services

Linked services are much like connection strings, which define the information that’s needed for Data Factory to connect to external resources. For example, the connectionString in below JSON string describes the connection information to connect to a SQL server database.

{
    "name": "ProdSQLLinkedService",
    "properties": {
        "type": "AzureSqlDatabase",
        "typeProperties": {
            "connectionString":
                "Data Source=production.database.windows.net;Initial Catalog=ProdDB;Integrated Security=False;User ID=Username;Password=Password;"
        }
    }
}

So in this use case, we will have 2 linked services, one is for the production database, and one is for the development database.

  • ProdSQLLinkedService
  • DevSQLLinkedService

2. Datasets

Datasets represent data structures within the data stores, such as the table structure we want to restore.

{
    "name": "Input_ProductionUsersTable",
    "properties": {
        "type": "AzureSqlTable",
        "linkedServiceName": "ProdSQLLinkedService",
        "structure": [
            {
                "name": "UserId",
                "type": "Guid"
            },
            {
                "name": "Username",
                "type": "String"
            },
            {
                "name": "DateOfBirth",
                "type": "Datetime"
            }
        ],
        "typeProperties": {
            "tableName": "Users"
        },
        "availability": {
            "frequency": "Day",
            "interval": 1
        },
        "external": true,
        "policy": {}
    }
}

So in this use case, we also will have 2 datasets, one is for the table in production database, and one is for the table in development database.

  • Input_ProdUsersTable
  • Output_DevUsersTable

3. Pipeline

A pipeline contains one or more activities that together perform a task. An activity could be a copy data action, or a transform data action or also a custom action that is anything we want to do on our data. So in my use case, I will put a copy activity to the pipeline.

{
    "name": "CopyPipeline",
    "properties":
    {
        "description": "Copy data in table User from Prod database to Dev database for checking",
        "activities": [
            {
                "name": "CopyFromSQLToSQL",
                "type": "Copy",
                "inputs": [
                    {
                        "name": "Input_ProdUsersTable"
                    }
                ],
                "outputs": [
                    {
                        "name": "Output_DevUsersTable"
                    }
                ],
                "typeProperties": {
                    "source": {
                        "type": "SqlSource"
                    },
                    "sink": {
                        "type": "SqlSink",
                        "writeBatchSize": 10000,
                        "writeBatchTimeout": "60:00:00"
                    }
                },
                "policy": {
                    "concurrency": 1,
                    "retry": 0,
                    "timeout": "01:00:00"
                }
            }
        ],
        "start": "2016-07-12T00:00:00Z",
        "end": "2099-12-31T00:00:00Z"
    }
}

Finally, we will have 2 linked services, 2 datasets and 1 pipeline with a copy activity inside.

20171202-datafactory-3-basic-things

On the next blog, I will continue with how to deploy them to Data Factory through Azure portal.

Summary

The Azure Data Factory supports many ways to do with the data, but three things above always need to understand at the first step when you walk into this world 😊.

We are hiring

Would you like to work with the coolest IT guys in Vietnam? Apply to one of the following jobs:

C# / .Net Developer
Senior / Specialist
1000 – 3000 USD

Job requires a high level of technical skills and experience within Microsoft technologies and offers high salary, exciting projects, and constant challenges in terms of technology and design.

Tell me more
Xamarin Developer
Experienced
800 – 1500 USD

Want to have fun developing innovative Xamarin products? We are developing a number of exciting games and social applications of our own as well as supporting third party clients.

Tell me more
Technical Writer
Advanced English
2000 – 4000 USD

Job requires both good English as well as the ability to understand complex technical subjects and systems. You will mainly be writing SEO articles and guidelines for our many products.

Tell me more
Front-End Developers
Experienced
1000 – 2500 USD

We are looking to fill Developer positions with a new team that uses JavaScript, TypeScript, HTML5, AngularJS.
Knowing Ionic framework or NodeJs is a plus, but is not mandatory.

Tell me more
.NET/WPF Developers
Advanced English
1000 – 3000 USD

2 Senior Developer positions in an Offshore Development Center team. You will work directly with a Danish Project Manager at our Vietnam office. The initial project is for a multinational French company.

Tell me more
Ruby/Cucumber QA/Test engineering
Advanced English
1000 – 2000 USD

1 Senior QA/Test Engineer in an Offshore Development Center team. You will work directly with a Danish Project Manager at our Vietnam office. The initial project is for a multinational French company.

Tell me more