Mage is an open-source tool designed as a modern alternative to Airflow for building, running, and managing data pipelines. It facilitates the integration and transformation of data by allowing users to develop pipelines in Python, SQL, or R.
It emphasises ease of use, efficient pipeline development and testing, and scalability without requiring a large dedicated team. It supports deployment on major cloud platforms (AWS, GCP, Azure) and offers features such as real-time and batch processing, interactive code previews, and comprehensive observability for operational management.
Use Cases
- Scenario 1: Suppose you want to try out Mage for the first time. Launching Mage with the project name myproject creates a corresponding folder and sets up a user management database with a default owner user. After logging in as the default owner user, update the initial credentials promptly (see section below). Additional Mage users can then be managed via the user interface, with all data tied to the myproject project.
- Scenario 2: Suppose you want to enhance the capabilities of your existing Mage deployment, currently associated with myproject. This involves stopping the current instance and starting a new one with upgraded settings. If you use the same project name, the new deployment automatically connects to your existing project data, so you must log in with the credentials that were previously set up.
- Scenario 3: Suppose you want to create a new Mage deployment and associate it with a new project. Creating a new project, such as myproject2, triggers the same setup process described in Scenario 1, so initial access requires the default owner's credentials.
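
The three scenarios come down to one rule: the project name decides whether a fresh project is set up or existing data is reused. The sketch below models that rule in plain Python. It is not Mage's implementation: the `users.json` file (a stand-in for the user management database), the `default-owner` username, and the function names are all hypothetical and only illustrate the behaviour described above.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical placeholder; the real default owner account is defined by Mage.
DEFAULT_OWNER = {"username": "default-owner", "role": "owner"}


def start_deployment(base_dir: Path, project_name: str) -> dict:
    """Model what starting a deployment with a given project name does."""
    project_dir = base_dir / project_name
    users_file = project_dir / "users.json"  # stand-in for the user database

    if users_file.exists():
        # Scenario 2: same project name, so the existing data is reused
        # and previously configured credentials still apply.
        return {"created": False, "users": json.loads(users_file.read_text())}

    # Scenarios 1 and 3: unknown project name, so a new folder and a user
    # database containing only the default owner are created.
    project_dir.mkdir(parents=True, exist_ok=True)
    users = [dict(DEFAULT_OWNER)]
    users_file.write_text(json.dumps(users))
    return {"created": True, "users": users}


def add_user(base_dir: Path, project_name: str, username: str) -> None:
    """Model adding a user through the UI; the user is tied to the project."""
    users_file = base_dir / project_name / "users.json"
    users = json.loads(users_file.read_text())
    users.append({"username": username, "role": "editor"})
    users_file.write_text(json.dumps(users))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        base = Path(tmp)
        print(start_deployment(base, "myproject")["created"])   # Scenario 1
        add_user(base, "myproject", "alice")
        print(start_deployment(base, "myproject")["created"])   # Scenario 2
        print(start_deployment(base, "myproject2")["created"])  # Scenario 3
```

In this model, restarting with myproject finds the previously added user again, while myproject2 starts with only the default owner.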
Libraries Reference
In addition to the base image, the following Mage add-ons are included:
Package | Description |
---|---|
azure | Related packages for Azure |
clickhouse | Data import or export with ClickHouse |
dbt | Packages for dbt |
google-cloud-storage | Data import or export with Google Cloud Storage |
hdf5 | Processing of data in HDF5 file format |
mysql | Data import or export with MySQL |
postgres | Data import or export with PostgreSQL |
redshift | Data import or export with Redshift |
s3 | Data import or export with S3 |
snowflake | Data import or export with Snowflake |
spark | Integration of Spark (EMR) in Mage pipelines |
streaming | Streaming pipelines |
Resources
Find below some interesting links providing more information on Mage: