Custom OData Provider for Windows Azure
16 February 2011 - Azure
Besides working on time cockpit I also do some consulting work regarding .NET in general and the Microsoft Windows Azure platform in particular. In that context I had the chance to work as a coach in an Azure evaluation project at Austria's leading real estate search engine. Based on the research we did in this project I came up with the idea to build a custom OData provider that optimizes the way real estate search requests are handled. It shows how sharding can be used in Windows Azure to massively improve performance while raising costs only moderately. In this blog post I would like to show you the architecture of the solution. You will see how I built the provider and how the possibilities of the Windows Azure platform helped me create an elastic solution that can handle high loads.
The content of this blog article was presented at the VSOne 2011 conference in Munich in February 2011. Here is the abstract of the session (translated from the original German and English versions):

With OData, Microsoft offers a data exchange format that is increasingly becoming a de facto industry standard: OData = SOA without the overhead of SOAP. Implementations are now available on various platforms, from Microsoft and other vendors. In this session Rainer Stropek demonstrates how to implement a custom OData provider that makes your own data structures accessible in the OData format, tailored to specific needs.
- Visual Studio 2010
- Locally installed SQL Server 2008 R2 (Express Edition is ok)
- Windows Azure SDK 1.3 with Windows Azure Tools for Visual Studio 1.3
- Create at least three databases locally (PerfTest, PerfTest01, PerfTest02, etc.). The first one should be at least 10 GB for data and 5 GB for log; the other ones should be at least 1 GB for data and 1 GB for log.
- Create at least three databases in SQL Azure (HighVolumeServiceTest, HighVolumeServiceTest01, HighVolumeServiceTest02, etc.). The first one should be at least 10 GB; the other ones should be at least 1 GB.
Copy the solution CustomODataSample.sln from directory Begin into a working directory. Start Visual Studio as administrator and open the solution from there.
Adjust connection settings in the following config files according to your specific setup (see above):
Use the sample tool DemoDataGenerator to fill your databases (local and cloud) with demo data. With these two T-SQL statements you can check the number of rows and the size of your demo databases:
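The statements themselves are not reproduced here; a sketch of what they could look like, assuming the demo table is called RealEstate (on SQL Azure the size check may need sys.dm_db_partition_stats instead of sys.database_files):

```sql
-- Number of rows in the demo table (table name is an assumption)
SELECT COUNT(*) FROM RealEstate;

-- Size of the data and log files in MB
SELECT name, size * 8 / 1024 AS SizeInMB
FROM sys.database_files;
```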
Your large demo database in the cloud should contain approx. 12.4 million rows.
Add Default OData Service
Add a WCF Data Service to the service project CustomODataService:
Select Add new item/WCF Data Service and call it DefaultRealEstateService.
This is what the implementation should look like:
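The service code is not included in this post, but a standard read-only WCF Data Service over the Entity Framework model looks roughly like this (the context name RealEstateEntities is an assumption):

```csharp
using System.Data.Services;
using System.Data.Services.Common;

// Read-only OData service on top of the Entity Framework model
// (context name RealEstateEntities is an assumption).
public class DefaultRealEstateService : DataService<RealEstateEntities>
{
    public static void InitializeService(DataServiceConfiguration config)
    {
        // Expose the RealEstate entity set for reading only.
        config.SetEntitySetAccessRule("RealEstate", EntitySetRights.AllRead);
        config.DataServiceBehavior.MaxProtocolVersion = DataServiceProtocolVersion.V2;
    }
}
```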
Right-click DefaultRealEstateService.svc and select View in browser. You should see the list of resource types that the service supports (RealEstate). Now try a query, e.g. using the following URL: http://localhost:<YourPort>/DefaultRealEstateService.svc/RealEstate?$top=25&$filter=SizeOfParcel ge 200 and SizeOfParcel lt 1000 and HasBalcony eq true&$orderby=SizeOfBuildingArea desc.
Try the same for your local database and for your cloud database (change connection strings in your config files).
Add Custom Provider
The first step in creating a custom OData service is to implement a custom context object. The sample solution contains a base class that makes it easier to implement such a context object: CustomDataServiceContext. In the session I discussed this base class in more detail (see also the slide deck; link at the beginning of this article). This is what the implementation could look like (notice that the only job of the context object is to provide a queryable that the OData service can operate on):
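A minimal sketch of such a context object; the exact member names of the CustomDataServiceContext base class are assumptions (the real ones are in the sample download):

```csharp
using System.Linq;

// Context object for the custom OData service; its only job is to
// provide a queryable the service can operate on (member names of
// the CustomDataServiceContext base class are assumptions).
public class RealEstateContext : CustomDataServiceContext
{
    public IQueryable<RealEstate> RealEstates
    {
        get { return this.GetQueryable<RealEstate>(); }
    }

    protected virtual IQueryable<T> GetQueryable<T>() where T : class
    {
        // For now simply hand out the Entity Framework object set;
        // later this is swapped for the sharding-aware Query<T>.
        var context = new RealEstateEntities();
        return context.CreateObjectSet<T>();
    }
}
```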
The second step is the creation of the custom provider. The sample solution contains a base class that makes it easier to implement the custom provider: CustomDataService<T>. In the session I discussed this base class in more detail (see also the slide deck; link at the beginning of this article). This is what the implementation could look like:
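A sketch of the service class itself; the hook through which CustomDataService<T> asks for metadata, and its return type, are assumptions based on the description in the next paragraph:

```csharp
using System.Data.Services;
using System.Data.Services.Common;

// Custom OData service built on the sample's CustomDataService<T>
// base class (metadata hook name and return type are assumptions).
public class CustomRealEstateDataService : CustomDataService<RealEstateContext>
{
    public static void InitializeService(DataServiceConfiguration config)
    {
        config.SetEntitySetAccessRule("*", EntitySetRights.AllRead);
        config.DataServiceBehavior.MaxProtocolVersion = DataServiceProtocolVersion.V2;
    }

    protected override CustomMetadata BuildMetadata()
    {
        // Inspect the CLR type via reflection and generate the
        // corresponding OData resource sets and resource types.
        return BuildMetadataForEntityFrameworkEntity<RealEstate>();
    }
}
```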
As you can see, the custom provider metadata is built using the helper function BuildMetadataForEntityFrameworkEntity<T>. This function uses reflection to inspect the given type and generates all the necessary OData resource sets and types. In practice you could add additional intelligence here (e.g. provide different metadata for different use cases, or generate metadata manually if you do not implement a strongly typed provider).
Is that it? We have created a custom read-only provider, and we could add additional features like write support, relations, etc. If you are interested in these things I recommend reading this excellent blog post series: Custom Data Service Providers by Alex James, a Program Manager working on the Data Services team at Microsoft. I will not repeat his descriptions here. As described in the slides (see top of this blog article) my goal is to create a custom implementation of IQueryable and use it in the OData service. The IQueryable implementation should hide all the complexity of sharding.
In order to see how the implemented OData providers perform, I used LoadStorm to simulate some load. I defined three reference queries and let 10 to 50 concurrent users (step-up load testing scenario; each of them firing six queries per minute) run them. The result was as expected: because of the large database on the single SQL Azure server, the solution does not really scale. The first chart shows the number of users and the throughput. In the second chart you can see that beyond a certain number of concurrent users the response time gets greater than 35 seconds; as a result we see a lot of HTTP errors.
The poor performance does not come from OData. I also created a unit test that runs a LINQ to Entities query directly - same results:
Custom LINQ Provider - First Steps
If you want to implement IQueryable you should really consider using the IQToolkit. It offers a base class QueryProvider from which you can derive your custom query provider. In our case we will implement the class ShardingProvider. It should be able to send a single LINQ to Entities query to multiple databases in parallel and consolidate the partial results afterwards. The declaration of our new class looks like this:
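A sketch of the declaration, assuming a constructor that receives the shard connection strings (QueryProvider's abstract members GetQueryText and Execute come from the IQToolkit):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;
using IQToolkit;

// Custom LINQ provider that fans a query out to multiple sharding
// databases (constructor signature is an assumption).
public class ShardingProvider : QueryProvider
{
    private readonly string[] connectionStrings;

    public ShardingProvider(IEnumerable<string> shardConnectionStrings)
    {
        this.connectionStrings = shardConnectionStrings.ToArray();
    }

    public override string GetQueryText(Expression expression)
    {
        return expression.ToString();
    }

    public override object Execute(Expression expression)
    {
        // Fan-out and merge logic follows later in this article.
        throw new NotImplementedException();
    }
}
```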
To try our custom LINQ provider we can add a second unit test. This time the target queryable is Query<T> (part of IQToolkit). Query<T> needs a LINQ provider - our custom ShardingProvider. As you can see, the LINQ queries to Entity Framework and to our sharding provider are identical. The only additional code that is necessary is the code for building the connection strings to our sharding databases. Here is the code you have to add to LinqProviderTest.cs in order to try the custom LINQ provider:
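A sketch of such a unit test; the connection string name, the shard count, and the reference query are assumptions modeled on the OData URL shown earlier:

```csharp
using System.Configuration;
using System.Linq;
using IQToolkit;
using Microsoft.VisualStudio.TestTools.UnitTesting;

[TestClass]
public class LinqProviderTest
{
    [TestMethod]
    public void QueryShardsWithCustomLinqProvider()
    {
        // Build one connection string per sharding database
        // (config entry name "ShardTemplate" is an assumption).
        var connectionStrings = Enumerable.Range(1, 2)
            .Select(i => string.Format(
                ConfigurationManager.ConnectionStrings["ShardTemplate"].ConnectionString, i));

        var provider = new ShardingProvider(connectionStrings);
        var realEstates = new Query<RealEstate>(provider);

        // The LINQ query is identical to the LINQ to Entities version.
        var result = realEstates
            .Where(r => r.SizeOfParcel >= 200 && r.SizeOfParcel < 1000 && r.HasBalcony)
            .OrderByDescending(r => r.SizeOfBuildingArea)
            .Take(25)
            .ToList();

        Assert.AreEqual(25, result.Count);
    }
}
```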
Before we finish our LINQ provider we want to link our custom OData service with it. The only thing we have to do to achieve this is to use the code shown above (which creates the Query<T> instance) in the existing RealEstateContext. Here is the new code for RealEstateContext.cs (notice that GetQueryable now returns Query<T>):
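A sketch of the updated context; GetQueryable now hands out the sharding Query<T> instead of the Entity Framework object set (member and helper names are assumptions):

```csharp
using System.Linq;
using IQToolkit;

// Updated context object: the OData service now operates on the
// sharding-aware Query<T> (helper names are assumptions).
public class RealEstateContext : CustomDataServiceContext
{
    public IQueryable<RealEstate> RealEstates
    {
        get { return this.GetQueryable<RealEstate>(); }
    }

    protected virtual IQueryable<T> GetQueryable<T>() where T : class
    {
        var provider = new ShardingProvider(GetShardConnectionStrings());
        return new Query<T>(provider);
    }
}
```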
Want to see it work? Set a breakpoint in ShardingProvider.Execute and run an OData query against CustomRealEstateDataService.svc. You will see the query's expression tree received by the Execute method. The following two images show this: the first one shows the query in the browser, the second one the resulting expression tree in the LINQ provider.
Implementing The Custom LINQ Provider
The implementation of the custom LINQ provider has to perform the following steps:

- Make sure that the query is valid (e.g. it must be sorted, must contain a top clause, etc.; business rules defined by the customer in the project mentioned at the beginning of this blog article)
- Loop over all connections to the sharding databases in parallel:
  - Open an Entity Framework connection to the sharding database
  - Replace Query<T> in the expression tree with the connection to the sharding database
  - Execute the query and return the partial result
- Combine the partial results by sorting them and applying the top clause
Here is the implementation of the ShardingProvider class that does this (notice that I do not include the visitor classes here; they are in the sample code download):
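A condensed sketch of the Execute method following the steps above; the validation helper, the expression visitor that replaces the query root, and the hard-coded merge for the reference query are simplifications (the real visitor classes are in the sample code download):

```csharp
using System.Linq;
using System.Linq.Expressions;

// Fan the query out to all shards in parallel, then merge the
// partial results (helper names are assumptions).
public override object Execute(Expression expression)
{
    // 1. Validate the query (must be ordered and contain a top clause).
    ValidateQuery(expression);

    // 2. Run the query against every shard in parallel.
    var partialResults = this.connectionStrings
        .AsParallel()
        .SelectMany(connectionString =>
        {
            using (var context = new RealEstateEntities(connectionString))
            {
                // Replace the Query<T> root of the expression tree with
                // the shard's object set and execute the query there.
                var shardExpression = ReplaceQueryRoot(expression, context);
                return context.CreateObjectSet<RealEstate>()
                    .Provider
                    .CreateQuery<RealEstate>(shardExpression)
                    .ToList();
            }
        })
        .ToList();

    // 3. Combine: sort the union of the partial results and re-apply
    //    the top clause (hard-coded here for the reference query).
    return partialResults
        .OrderByDescending(r => r.SizeOfBuildingArea)
        .Take(25)
        .ToList();
}
```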
Tip: Don't forget to raise the minimum number of threads in the thread pool to unlock the full potential of PLINQ with asynchronous database I/O.
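One way to do this at startup; the factor of four per processor is an assumption, tune it for your workload:

```csharp
using System;
using System.Threading;

// Raising the thread pool minimum avoids the pool's slow ramp-up when
// many shard queries block on database I/O at the same time.
int workerThreads, completionPortThreads;
ThreadPool.GetMinThreads(out workerThreads, out completionPortThreads);
ThreadPool.SetMinThreads(
    Math.Max(workerThreads, Environment.ProcessorCount * 4),
    Math.Max(completionPortThreads, Environment.ProcessorCount * 4));
```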
Now that our sharding provider is completely implemented, I used LoadStorm again to simulate the same load as shown before with the standard OData provider. The results look very different - of course: no errors and an average response time of approx. 3 seconds instead of more than 10 :-)