If you prefer watching videos to reading, take a look at this webcast: Windows Azure Storage. I recorded this webcast for those of you who do not like reading long blog posts.
Storage Types In Windows Azure
If you want to store data in Windows Azure you can choose from four different data stores:
- Windows Azure Queues
- SQL Azure
- Blob Storage
- Table Storage
Windows Azure Queues
The first one is easy to explain. I am sure that every developer is used to the concept of FIFO queues. Azure queues can be used to communicate between different applications or application roles. Additionally, Azure queues offer some quite unique features that are extremely handy whenever you use them to hand off work from an Azure web role to an Azure worker role. I want to draw your attention especially to the following two:
Auto-reappearance of messages
If a receiver takes out a message from the queue and crashes when handling it, it is likely that the receiver will not be able to reschedule the work before dying. To handle such situations Azure queues let you specify a time span when getting an element out of the queue. If you do not delete the received message within that time span Azure will automatically add the message to the queue again so that another instance can pick it up.
Dequeue count
The dequeue count is closely related to the previously mentioned auto-reappearance feature. It can help you detect "poisoned" messages. Imagine an invalid message that kills the process that has received it. Because of auto-reappearance another instance will pick up the message - and will also be killed. After some time all your workers will be busy dying and restarting. The dequeue counter tells you how often the message has already been taken out of the queue. If it exceeds a certain number you could remove the message without further processing (logging it would be a good idea in such a situation).
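To make the two features more tangible, here is a minimal in-memory sketch of a queue with a visibility timeout and a dequeue counter. It is a simulation, not the Azure SDK: `SimulatedQueue`, `handle_next` and the threshold `MAX_DEQUEUE_COUNT` are hypothetical names chosen for illustration, and time is passed in explicitly so the behavior is easy to follow.

```python
import itertools


class SimulatedQueue:
    """In-memory stand-in for a queue with visibility timeouts (not the Azure SDK)."""

    def __init__(self):
        # Each entry: dict with id, body, visible_at and dequeue_count.
        self._messages = []
        self._ids = itertools.count(1)

    def add(self, body, now=0.0):
        self._messages.append(
            {"id": next(self._ids), "body": body, "visible_at": now, "dequeue_count": 0}
        )

    def get(self, visibility_timeout, now):
        """Return the first visible message and hide it for visibility_timeout seconds."""
        for msg in self._messages:
            if msg["visible_at"] <= now:
                msg["visible_at"] = now + visibility_timeout
                msg["dequeue_count"] += 1
                return msg
        return None

    def delete(self, msg):
        self._messages = [m for m in self._messages if m["id"] != msg["id"]]


MAX_DEQUEUE_COUNT = 3  # hypothetical threshold for treating a message as poisoned


def handle_next(queue, process, now, visibility_timeout=30.0):
    """Process one message; drop it as poisoned once it has been dequeued too often."""
    msg = queue.get(visibility_timeout, now)
    if msg is None:
        return "empty"
    if msg["dequeue_count"] > MAX_DEQUEUE_COUNT:
        queue.delete(msg)        # a real system would log or park the message here
        return "poisoned"
    process(msg["body"])         # if this raises, the message is never deleted ...
    queue.delete(msg)            # ... and becomes visible again after the timeout
    return "done"
```

If `process` crashes, the message is not deleted and reappears once the timeout has elapsed. After `MAX_DEQUEUE_COUNT` failed attempts the next receiver removes it instead of crashing again.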
Before we move to the next type of storage mechanism in Azure let me give you some tips & tricks concerning queues:
- Azure queues have not been built to transport large messages (message size must not be larger than 8KB). Therefore you should not include the message payload in the queue messages. Store the payload in one of the other stores (see below) and use the queue to pass a reference.
- Write applications that are tolerant of system failures, and therefore make your message processing idempotent.
- Do not rely on a certain message delivery order.
- If you need really high throughput, package multiple logical messages (e.g. tasks) into a single physical Azure queue message or use multiple queues in parallel.
- Add poisoned message handling (see description above).
- If you use your Azure queues to pass work from your web roles to your worker roles, write some monitoring code that checks the queue length. If it gets too long you could implement a mechanism to automatically start new worker instances. Similarly, you can shut down instances if your queue remains empty or short for a longer period of time.
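The last tip, scaling workers by queue length, boils down to a small decision function. The following is only a sketch under assumptions: the function name, the figure of 100 messages per worker, and the limits of 1 and 10 instances are all made up for illustration, and how you read the queue length or start instances is left out entirely.

```python
def desired_worker_count(queue_length, current_workers,
                         messages_per_worker=100, min_workers=1, max_workers=10):
    """Suggest a worker instance count for the observed queue length."""
    # Ceiling division: workers needed so that each one handles
    # at most messages_per_worker messages.
    needed = -(-queue_length // messages_per_worker)
    target = max(min_workers, min(max_workers, needed))
    # Scale down only one instance at a time so that a briefly short
    # queue does not shut down most of the farm at once.
    if target < current_workers:
        return current_workers - 1
    return target
```

Scaling up jumps straight to the target while scaling down is gradual. That is a design choice for this sketch, matching the idea that instances should only go away when the queue stays short for a while.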
SQL Azure
Yes, SQL Azure is a SQL Server in the cloud. No, SQL Azure is not just another SQL Server in the cloud. When creating SQL Azure, Microsoft did much more than buy some servers, put Hyper-V on them, and let the virtual machines run SQL Server 2008 R2. It is correct that behind the scenes SQL Server is doing the heavy lifting deep down in the dark corners of Azure's data centers. However, a lot of things happen before you get access to your server.
The first important thing to note is that SQL Azure comes with a firewall/load balancer that you can configure e.g. through Azure's management portal. You can configure which IP addresses should be able to establish a connection to your SQL Azure instance.
Once you have passed the first firewall you are connected to SQL Azure's Gateway Layer. I will not go into all details about the gateways because this is not a SQL Azure deep dive. The gateway layer is on the one hand a proxy (it finds the SQL Server nodes that are dedicated to your SQL Azure account) and on the other hand a stateful firewall. "Stateful firewall" means that the gateway understands TDS (Tabular Data Stream, SQL Server's native communication protocol) and checks TDS packets before they hit the underlying SQL Servers. Only if the gateway layer finds everything OK with the TDS packets (e.g. right order, user and password OK, encrypted, etc.) are your requests handed over to the SQL Servers.
The beauty of SQL Azure is that you as a developer can work with SQL Azure just like you work with the SQL Server that stands in your own data center. SQL Azure supports the majority of programming features that you are used to. You can access it using ADO.NET, Entity Framework or any other data access technology that you like. However, there are some limitations to SQL Azure for security and scalability reasons. Please check MSDN for details about the restrictions.
Again some tips & tricks that could help when you start working with SQL Azure:
- Use SQL Server Management Studio 2008 R2 in order to be able to manage your SQL Azure instances in your Object Explorer.
- Never forget that SQL Azure is always a database cluster behind the scenes (you get three nodes for every database). Therefore you have to follow all Microsoft guidelines for working with database clusters (e.g. implement auto-reconnect in case of failures, auto-retry, etc.; check MSDN for details).
- Don't forget to estimate costs for SQL Azure before you start to use it. SQL Azure can be extremely cost-efficient for your applications. There are situations (especially if you have very large databases or a lot of very small ones) in which SQL Azure can get expensive.
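The auto-retry advice from the tips above can be sketched generically. This is not ADO.NET code and not an official Microsoft guideline, just an illustration of retrying with exponential backoff. `TransientError` is a hypothetical placeholder for whatever exception your data access layer raises on a dropped connection or a failover.

```python
import time


class TransientError(Exception):
    """Placeholder for a dropped connection or failover (hypothetical name)."""


def with_retry(operation, attempts=4, base_delay=0.5, sleep=time.sleep):
    """Call operation(), retrying transient failures with exponential backoff."""
    for attempt in range(attempts):
        try:
            return operation()
        except TransientError:
            if attempt == attempts - 1:
                raise                          # out of attempts: surface the error
            sleep(base_delay * (2 ** attempt))  # wait 0.5s, 1s, 2s, ...
```

Retrying only makes sense if `operation` opens a fresh connection on every call and is safe to run more than once.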
Windows Azure Blob Storage
Windows Azure has been built to scale. Therefore typical Azure applications consist of many instances (e.g. web farm, farm of worker machines, etc.). As a consequence there is a need for a kind of file system that can be shared by all computers participating in a certain system (clients and servers!). Azure Blob Storage is the solution for that.
Natively, Azure Blob Storage speaks a REST-based protocol. If you want to read or write data from and to blobs you have to send HTTP requests. Don't worry, you do not have to deal with all the nasty REST details. The Windows Azure SDK hides them from you.
Similarly to SQL Azure I will not go into all details of Azure Blob Storage here. You will see how to access blobs in the example shown below. Let me just give you the following tips & tricks about what you can do with Azure Blobs:
- Azure Blob Storage has been built to store massive amounts of data. Don't be afraid of storing terabytes in your blob store if you need to. Even a single blob can hold up to 1TB (page blobs).
- Azure distinguishes between block blobs (streaming + commit-based writes) and page blobs (random read/write). Maybe I should write a blog post about the differences... Until then please check MSDN for details.
- Blobs are organized into containers. All the blobs in a container can be structured in a kind of directory system similar to the directory system that you know from your on-premise disk storage. You can specify access permissions on container and blob level.
- You can programmatically ask for a shared access signature (i.e. signed URL) for any blob in your Azure Blob store. With this URL a user can directly access the blob's content (if necessary you can restrict how long the URL will be valid). Therefore you can e.g. generate a confirmation document, put it into the blob store and send the user a direct link to it without having to write a single line of code for serving its content (btw - this also means less load on your web roles).
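The idea behind a signed, time-limited URL can be illustrated with a few lines of code. Note that this is explicitly not the format Azure uses for shared access signatures. It is a generic HMAC sketch, and the function names and query parameters `expires` and `sig` are assumptions made for illustration.

```python
import hashlib
import hmac


def sign_url(path, expires_at, secret):
    """Return path plus an expiry time and an HMAC signature over both."""
    payload = f"{path}\n{expires_at}".encode("utf-8")
    signature = hmac.new(secret, payload, hashlib.sha256).hexdigest()
    return f"{path}?expires={expires_at}&sig={signature}"


def is_valid(path, expires_at, signature, secret, now):
    """Check that the signature matches and the link has not expired."""
    payload = f"{path}\n{expires_at}".encode("utf-8")
    expected = hmac.new(secret, payload, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(expected, signature) and now <= expires_at
```

Because the expiry time is part of the signed payload, a user cannot extend the lifetime of a link by editing the URL. The storage service can verify the link without your web role being involved.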
Windows Azure Table Storage
Azure Table Storage is not your father's database. It is a NoSQL data store. Just like with Azure Blob Storage you have to use REST to access Azure tables (if you use the Windows Azure SDK you use WCF Data Services to access Table Storage).
Every row in an Azure table consists of the following parts:
- Partition key: The partition key is similar to the table name in an RDBMS like SQL Server. However, every record can consist of a different set of properties even if the records have the same partition key (i.e. no fixed schema, just storing key/value pairs).
- Row key: The row key identifies a single row inside a partition. The combination of partition key and row key has to be unique within a table.
- Timestamp: Used to implement optimistic locking.
At the time of writing this article Azure Table Storage supports the following data types: String, binary, bool, DateTime, GUID, int, int64 and double.
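The row structure described above can be modeled in a few lines. The following is an in-memory sketch, not the Table Storage API: `SimulatedTable` and `ConflictError` are hypothetical names, and a simple version counter stands in for the timestamp that the real service uses for optimistic locking.

```python
class ConflictError(Exception):
    """Raised when a row changed since the caller read it."""


class SimulatedTable:
    """In-memory model of a schemaless table keyed by (partition key, row key)."""

    def __init__(self):
        self._rows = {}      # (partition_key, row_key) -> (version, properties)

    def insert(self, partition_key, row_key, **properties):
        key = (partition_key, row_key)
        if key in self._rows:
            raise KeyError(f"row {key} already exists")
        self._rows[key] = (1, dict(properties))

    def get(self, partition_key, row_key):
        version, properties = self._rows[(partition_key, row_key)]
        return version, dict(properties)

    def update(self, partition_key, row_key, expected_version, **properties):
        """Replace the row only if nobody changed it since it was read."""
        key = (partition_key, row_key)
        version, _ = self._rows[key]
        if version != expected_version:
            raise ConflictError(f"row {key} is at version {version}")
        self._rows[key] = (version + 1, dict(properties))
```

Two rows in the same partition may carry completely different properties. An update based on a stale read fails instead of silently overwriting someone else's change.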
So when to use what - SQL Azure or Azure Tables? Here are some guidelines that could help you to choose what's right for your application:
- In SQL Azure storage is quite expensive while transactions are free. In Azure tables storage is very cheap but you have to pay for every single transaction. So if you have small data that is frequently accessed use SQL Azure; if you have large amounts of data that have to be stored but are seldom accessed use Azure tables. If you find both scenarios in your application you could combine both storage technologies (this is what we do in our program time cockpit).
- At the time of writing SQL Azure offers only a single (rather small) machine size for databases. Because of this SQL Azure does not really scale. If you need more performance you have to build your own scaling mechanisms (e.g. distribute data across multiple SQL Azure databases using for instance Sync Framework). This is different for Azure tables. They scale very well. Azure will store different partitions (remember the partition key I mentioned before) on different servers in case of heavy load. This is done automatically! If you need and want automatic scaling you should prefer Azure tables over SQL Azure.
- Azure Table Storage is not good when it comes to complex queries. If you need and want all the great features that T-SQL offers you, you should stick to SQL Azure instead of Azure tables.
- The amount of data you can store in SQL Azure is limited whereas Azure tables have been built to store terabytes of data.
Azure Storage In Action
First Solution: Very Simple (Too Simple?)
Enough theory, let's build an example that uses all the different types of storage. Our sample scenario looks like this:
We have to write a website that can be used to upload customer orders in the form of CSV files. Every file can contain multiple orders.
Let's think about which Azure storage technology we should use in this case. The first one is easy: At the end of the import process it makes sense to store the resulting orders in SQL Azure in order to be able to do e.g. reporting. I want to keep the noise level low and therefore we just use plain old ADO.NET + a stored proc for database access. In practice it is likely that you would add a data access layer based on e.g. Entity Framework.
So here is the T-SQL script that we use to create the order table and the stored proc that creates orders in the table. Note that our order table is very simple and small. To make runtime results a little more realistic I added a WAITFOR DELAY statement to the procedure. You can take this T-SQL code and run it against your personal SQL Azure database.
Next we have to set up our solution. We need an Azure project with a single web role. To create the solution perform the following steps:
- Create a new Cloud project in Visual Studio (you have to have Windows Azure SDK installed for that)
- Do not create a web role during the creation of the cloud project
- Add an empty ASP.NET web application project to your solution
- Add the ASP.NET application to the cloud project. You can add the ASP.NET project to your cloud project by right-clicking on the Roles folder in the cloud-project.
Now we can implement the website that can be used to upload customer orders. The HTML part is extremely simple (FileUploadPage.aspx):
Here is the implementation of the upload page (FileUploadPage.aspx.cs). Note that you have to add references to Microsoft.WindowsAzure.ServiceRuntime and Microsoft.WindowsAzure.StorageClient in order to be able to build the solution. You can find these assemblies in the installation folder of the Windows Azure SDK (usually C:\Program Files\Windows Azure SDK\v1.2\ref).
Last but not least you have to add the connection string to your Azure configuration files (ServiceDefinition.csdef and ServiceConfiguration.cscfg):
You can try your program by just hitting F5. You will see the Windows Azure Development Fabric come up and you can debug the application. Here is a sample .CSV file that you can use to test the program:
If you like you can also change the connection string in your configuration file so that it points to your SQL Azure database. Because SQL Azure speaks TDS you do not have to change a single line of code to cloud-enable the ADO.NET code. If you have a Windows Azure computing account you can also try this version of the program running in a Windows Azure web role.
Note that you can use SQL Azure although your application runs locally on your computer. Often I get asked whether it is possible to use just parts of Windows Azure. The answer is of course yes. This is especially true for Azure storage.
Second Solution: Ready To Scale
If you test the application with a large .CSV file you will notice that the HTTP request for FileUploadPage.aspx takes quite a while. What could we do to make our application more scalable? We want to be able to accept uploads as fast as possible and process them in the background. In Azure you typically implement such a pattern by separating work into a web role and a worker role (this separation enables scalability but does not guarantee it; we will come back to this later). Both roles communicate using a queue. In our case a customer order is so small that we could write the whole order into the queue. However, in practice you have to deal with larger amounts of data and therefore our sample separates the queue message (ID of order) from the payload (customer data, amount, etc.). We could write the order payload directly to SQL Azure but this would not lead to a smart solution. The problem is that - as mentioned before - SQL Azure does not scale very well under high load. Table storage is a much better option. So we end up with the following architecture:
- Web role accepts web request, writes request data into table store and adds message to the queue.
- Worker role listens to the queue, pulls payload from table store and processes the order request.
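The division of labor between the two roles can be sketched independently of the Azure SDK. In the following simulation a dictionary stands in for table storage, a `deque` for the Azure queue and a list for the SQL Azure order table. All names are hypothetical and only illustrate the pattern of passing a reference through the queue while the payload lives elsewhere.

```python
import uuid
from collections import deque

orders_table = {}        # stand-in for table storage: order id -> payload
work_queue = deque()     # stand-in for the Azure queue: holds order ids only
processed = []           # stand-in for the SQL Azure order table


def accept_order(customer, amount):
    """Web role: store the payload, then enqueue only a reference to it."""
    order_id = str(uuid.uuid4())
    orders_table[order_id] = {"customer": customer, "amount": amount}
    work_queue.append(order_id)
    return order_id


def process_next_order():
    """Worker role: take one reference, load its payload, and process it."""
    if not work_queue:
        return None
    order_id = work_queue.popleft()
    payload = orders_table[order_id]
    processed.append((order_id, payload["customer"], payload["amount"]))
    return order_id
```

The payload is written before the message is queued. Otherwise a fast worker could pick up a reference to an order that does not exist yet.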
To handle the connection to Windows Azure Storage we write a small helper class that takes care of authentication (CloudStorageConnection). Note that it is quite easy to establish a connection to Azure Storage. All you need to do is to provide a method that fetches the storage connection string from the configuration file and call CloudStorageAccount.FromConfigurationSetting.
The code shown above reads the Azure Storage connection string using RoleEnvironment.GetConfigurationSettingValue. Therefore we need the connection string in our Azure configuration files (ServiceDefinition.csdef and ServiceConfiguration.cscfg):
In order to be able to write to our Azure table we need one additional class. This class acts as the "data model" for our table (Order.cs). Please pay close attention to the comments in the following implementation! If you change the name of even one of the first three properties you will end up spending hours and hours looking for strange HTTP errors.
Now we are ready to change the implementation of our website. We can throw away the access to SQL Azure and replace it with the insert operation to the Azure table and the code necessary to add the message to the queue:
The application is ready to be tested. You can either use Development Storage or - if you have your own Windows Azure account - you can change the connection strings so that they point into the cloud. If you run this version of the program it will add all orders to a table and a queue. I recommend that you get one of the numerous explorer tools with which you can look into Azure tables, queues and blob stores. I personally prefer Cerebrata's Cloud Storage Studio; it is worth every single dollar it costs. If you don't want to spend money on Cerebrata you can also use Visual Studio's explorer tools (new in the Windows Azure SDK 1.2; see Server Explorer in Visual Studio). At the time of writing this article the Visual Studio tools could not match the toolset that Cerebrata provides.
Our current application version has a slight problem: It sends messages but no one cares. Therefore the next step is to write a worker role that monitors the queue and does the work. You can add a worker to your Azure project by right-clicking on the Roles folder in the cloud-project. The following steps are necessary to get ready to implement the worker:
- Add references to the Azure SDK assemblies Microsoft.WindowsAzure.ServiceRuntime and Microsoft.WindowsAzure.StorageClient.
- Add links to the files CloudStorageConnection.cs and Order.cs (you find them in the web role; see above for details).
- Last but not least we have to copy the configuration settings in ServiceDefinition.csdef and ServiceConfiguration.cscfg because the worker role needs the same connection strings as the web role:
That's it, we are ready to write the worker. Here is a sample implementation. You will be familiar with most of the concepts because we already used them when implementing the web role. The new part is the query that retrieves data from our Azure table. As you can see you use WCF Data Services; if you are already familiar with that technology Azure tables will not be something new for you.
Hit F5 and watch how your worker picks up messages from the queue, reads order payload from your Azure table and generates rows in the SQL Azure database.
Third Step: Using Windows Azure Blob Store
In order to demonstrate all the different storage mechanisms of the Windows Azure Platform we have to include blob storage. In our sample a good application for blob storage could be a confirmation function. After we have successfully processed an order we could send the user a confirmation document. To keep it simple we are just going to create a simple text file (i.e. blob) that says Order xxx accepted. We just have to extend our helper class CloudStorageConnection a little bit:
With these changes we have a reference to the Azure blob container that should receive our blobs and we can create the blob inside our worker process (I will not repeat the whole worker class here, just the necessary lines for creating the blob):
Recap And Summary
The sample showed how to build an application that uses all parts of Windows Azure Storage. The application itself can run in Azure but it does not have to.
We used a queue and table storage to separate order processing (worker) from receiving orders (web). This scales better than doing all the work in the web role. However, do we really get the most from what Azure is offering in terms of scalability and performance? No, our sample just scratches the surface. Here are a few tips for you if you want to make our sample even more scalable (possibly I will write another blog post about that some time later):
- Currently we just use one single partition key in table storage. In order to enable automatic distribution of work across multiple table storage servers we would have to distribute order data across multiple partitions (i.e. partition keys). The same is true for queues; you could use multiple queue names, too.
- The bottleneck for both worker and web role is not the CPU in our current implementation. It is quite likely that the web role will spend most of the time waiting for queues and table storage. The worker role will mainly wait for the database. If you want to get the maximum out of your Azure roles you should parallelize the algorithms in both web and worker role.
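As a starting point for the first tip, here is one possible way to spread orders across several partition keys. It is only a sketch under assumptions: the function name, the `orders-NN` key format and the count of 16 partitions are made up for illustration, and a good partitioning scheme depends on how you query the data later.

```python
import zlib


def partition_key_for(order_id, partition_count=16):
    """Map an order id to one of partition_count partition keys."""
    # zlib.crc32 is used instead of hash() because hash() of a str is
    # randomized per process; web and worker roles must agree on the key.
    bucket = zlib.crc32(order_id.encode("utf-8")) % partition_count
    return f"orders-{bucket:02d}"
```

Because the key is derived from the order ID alone, the worker can compute the partition key from the ID it receives in the queue message and read the row directly.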
Enjoy playing with the sample, please send me feedback and don't forget to check out our product time cockpit.