
Brain Dump

Revision History:
Version 1 (7/25/11, Michael Smith): Initial Outline and notes
Version 2 (8/01/11, Michael Smith): Add some deployment instructions
Version 3 (8/08/11, Michael Smith): Additions in product functionality and product infrastructure, a bulk of which came from old postings to my personal internal development blog
Version 4 (8/11/11, Michael Smith): Add instructions for generating help content, more details about Search implementation (keyword handling, paging/caching), messaging database design document, some information on logging initialization, Task Engine design & implementation document along with some early design considerations, instructions on locating bounce messages, and some other minor updates and additions
Version 5 (8/17/11, Michael Smith): Lots more content added. Compare to version 4 for full difference list.
Version 6 (8/19/11, Michael Smith): Load Generation Tools; add information on email retries on bkauto

Meta Information
This document records aspects of the Guru product and its development that may be known only to Michael Smith. The intention is not to teach or educate on his general experience and approach to problems, but to document as much as possible of his specific knowledge about the product: how it works, how it was built, and how it needs to be maintained. This document may include login user names, but does not include any specific credentials.

Contents
Product Functionality
  Search (for Freelancers or Projects)
    Keywords
    Paging and Caching
    Potential Cleanup
  Messaging
  Landing Pages
  Help
  Project Feedback
  Certifications and Partnerships
Product Infrastructure
  Service Locator
  UnitOfWork
  Lifecycle
  Configuration
    Logging
  Exception Handling
  Task Engine
    Early Design Considerations
    Executing a Task Engine
    Task Engine Configuration
    Engine GUID vs. ID?
  NHibernate
    Gotchas
    Caching
    Session Management
  Miscellaneous Utilities
    Base 30
Development Practices
  Merges/branches
  Database schema dumps
  Visual Studio
    Editor configuration for CFM files
    Web site worker process has been terminated by IIS
Deployment Instructions
  Preparing a Build for Deployment
  Performing a Rolling Outage on Web Servers
  Upgrading the Task Engine
Periodic Tasks / How To
  Misc. Important Tasks
  How to Update Generated Help Content in TFS
  How to access Production Email Accounts
  How to update the Help URL Reference Spreadsheet
  How to locate original bounce messages
SQL Queries
  Task Engine
    Clean up failed engines
    View recent Task Engine queue
    Task Execution Performance Reporting
    Manually Queue Tasks for Execution
  Email/Messaging
    Details for a specific Conversation
    Reset incoming customer support email processing
    View incoming email accounts (with configured passwords)
    Bounce messages processed per day
    Bounce details for specific email address
  Freelancer Search
    Query Performance and Execution Counts by Query Input
    Recent Query Executions
    Misc.
Bookmarks
  Libraries
  Development Tools
  External Bugs/Issues/Questions/etc.
Future Improvements
  Infrastructure
  Email processing
  Task Engine
  Freelancer/Project Search
  Web Services
Random Thoughts


Product Functionality
Search (for Freelancers or Projects)
Keywords

Keywords used in search criteria adhere to the following rules:

- Keywords are case insensitive.
- Leading and trailing whitespace characters are ignored.
- Individual keywords are separated by any whitespace or control characters. The exact characters included here are defined by the Unicode specification. Commas are treated as whitespace (see Guru.Utils.TextSearchUtility.SimpleSearchStringParser).
- Some keywords are considered noise and are ignored, since matching on them would likely not have a meaningful effect on the results. The current list of noise words can be found in NoiseWords.data.txt. Note that some letters on their own are considered noise words (e.g. "A") while others (e.g. "C") are not. Generally, if a single letter is the name of a programming language (as determined using Wikipedia's list of programming languages) it is not considered a noise word. All other single letters on their own are considered noise words.
- Keywords can be grouped into phrases, where two otherwise separate keywords must match together (consecutively), by surrounding those keywords with ASCII double quotation marks ("). Note: angled or curly quotation marks are different and will not cause a grouping of keywords.
- Punctuation characters other than the comma and the ASCII double quotation mark remain as-is and are treated as part of the keyword (i.e. they are not handled any differently than letters or numbers by our application).

The matching of an individual keyword with values in project or freelancer descriptions or other fields depends on the database's handling of those keywords with regard to word breakers and stemmers. We have two current special cases, for C# and C++, to ensure searches for them function properly. There is some information here (admittedly very technical) on how the database handles things (including the C# and C++ special cases). Handling of the double quote character and the plus sign (e.g. in "c++") has been particularly troublesome in the past, especially when these characters end up in URLs, since they need to be properly encoded to function appropriately.

To address some issues of keyword parsing and using those keywords for full text database queries (e.g. Bug 8330), SQL Server should be used to parse keywords, which could change the above rules. See Task 8383.
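To make the rules above concrete, here is a small illustrative parser. It is a sketch only, not the body of Guru.Utils.TextSearchUtility.SimpleSearchStringParser: the class name, the tiny noise-word set, and the exact phrase handling are assumptions.

// Illustrative sketch of the keyword rules described above; not the actual
// SimpleSearchStringParser implementation. The noise-word set is assumed.
using System;
using System.Collections.Generic;
using System.Linq;

public static class KeywordParserSketch
{
    // Hypothetical noise words, standing in for NoiseWords.data.txt.
    private static readonly HashSet<string> NoiseWords =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "a", "the", "and" };

    public static IList<string> ParseKeywords(string input)
    {
        var results = new List<string>();
        if (string.IsNullOrWhiteSpace(input)) return results;

        // Commas are treated as whitespace; keywords are case insensitive.
        input = input.Replace(',', ' ').Trim().ToLowerInvariant();

        // Split on ASCII double quotes: odd-indexed segments are quoted phrases.
        var segments = input.Split('"');
        for (int i = 0; i < segments.Length; i++)
        {
            var tokens = segments[i].Split(
                (char[])null, StringSplitOptions.RemoveEmptyEntries); // any whitespace
            if (i % 2 == 1)
            {
                // Inside quotes: keep the tokens together as a single phrase.
                if (tokens.Length > 0) results.Add(string.Join(" ", tokens));
            }
            else
            {
                // Outside quotes: individual keywords, minus noise words.
                results.AddRange(tokens.Where(t => !NoiseWords.Contains(t)));
            }
        }
        return results;
    }
}

For example, under this sketch the input senior, "c# developer" yields the keyword "senior" and the phrase "c# developer".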

Paging and Caching

If the data layer only retrieves the data necessary for the single page load (which is usually ideal), there is a reasonably high likelihood of inconsistencies when traversing from page to page if the data changes frequently. This is more noticeable for project searches (ordered by most recently approved, so every time a new project is approved the data changes) than it is for freelancer searches, but applies equally to both.

To account for this, and to allow next and previous buttons on drill-down pages that are also consistent with the original search, we do not simply re-execute the search with the appropriate paging information on each page load to get only the data on the page. Instead, the search page generates the identifiers for all results (without loading any of the details) to lock in the results of the search and maintain consistency. Then, on each page load, the set of identifiers is used to load the proper data for that particular page. A new search generates a new list of identifiers, but paging within a single set of results will only load new data based on the previous search results.

To implement this, we did not want to store the search results in the server's memory (e.g. in the user's session) or the database due to the load it would place on our limited resources and potential issues with timeouts (i.e. when do we no longer need to maintain the result data?). Instead, we chose to encode the information and store it on the client side, at the expense of additional bandwidth usage, since the encoded list must be presented by the client with each request for the search data. See also: http://programmers.stackexchange.com/questions/28042.

Since the list of projects or profiles could be long and their identifiers are typically 6 digits long these days, the list of all projects in the result set could amount to a significant quantity of data going back to the server with each next/previous click. We can reduce this quantity of data, however, with a simple encoding scheme.

The encoded list of result ids is composed of the difference between one result id in the list and the next, since the difference between one result and another is likely smaller and more consistent than the ids themselves. These integer values are then combined as byte arrays, compressed using gzip, and encoded using base64 to make sure the transferred value does not have special characters that cannot be represented in HTML or JavaScript.

We can reduce the amount of data even further by limiting the actual number of identifiers stored for the results. It's unlikely that a user will flip through all the pages of data, so if we store enough for the first 20 pages, we'll most likely be fine. If someone does happen to page past the end of the cached data, the search can be re-executed with the risk of some inconsistency. Since that use case is so rare, the consequences are acceptable.

Our implementation for this is broken down into a few different classes. At the most basic level, we provide the interface and basic functionality for any sort of result set that supports paging in our Guru.Business.PagedResults class. Inheritors of this class need only implement one trivial method (for duplicating the instance) and a method that retrieves the data for the current page (using the available record indexes), along with making sure that the initial paging information is set up properly (e.g. setting the initial total number of records, page size, and initial page number). Our project messaging and project alert search results do not use encoding or caching (query times are much shorter, result sizes are more manageable, and the most up-to-date information is important) and thus inherit directly from this base paged results class (see Guru.Business.Activities.Messaging.FreelancerProjectAlertResults as an example).

Inheriting from PagedResults is Guru.Business.EncodablePagedResults, which extends the basic paged result set to one that can be encoded and reconstructed. It handles compressing and encoding the paging information, any arbitrary data that implementers may need to store and retrieve for their custom result sets, and a content hash to ensure data integrity. Along with implementing the methods for PagedResults, subclasses must also implement the encoding and decoding of the data that needs to be encoded (whether that's the raw search query refinements for re-execution of the search or the results of a previous search).

We extend EncodablePagedResults with one more generic result set class that provides facilities for caching keys for the results, whether it's the entire set of results, just a smaller subset of contiguous results, or something else entirely. This class is Guru.Business.CachedSearchResults. It allows you to specify the maximum number of result keys to cache and the number of result keys to actually load at a time when results are needed that aren't previously cached. It uses this information to load data and add it to the cache either incrementally or en masse. Again, though, the complete set of data is only loaded for the current page; the cached values are just keys used to efficiently look up that data without needing to re-execute a more expensive search.

It's from CachedSearchResults that we extend to construct ProjectLeadResults (used for project searches, invites, project matches, and favorites) and ProfileSearchResults, based on their specialized needs.
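The delta/gzip/base64 idea described above can be sketched as follows. The class and method names are illustrative; the real EncodablePagedResults also folds in paging metadata and a content hash, which this sketch omits.

// Illustrative sketch of the client-side result-id encoding described above:
// delta-encode the ids, gzip the bytes, then base64 the result. Names are
// hypothetical; the real EncodablePagedResults also adds a content hash.
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;

public static class ResultIdEncoderSketch
{
    public static string Encode(IList<int> resultIds)
    {
        using (var buffer = new MemoryStream())
        {
            using (var gzip = new GZipStream(buffer, CompressionMode.Compress, leaveOpen: true))
            using (var writer = new BinaryWriter(gzip))
            {
                int previous = 0;
                foreach (int id in resultIds)
                {
                    // Store the difference from the previous id; deltas are
                    // typically much smaller than the ids themselves.
                    writer.Write(id - previous);
                    previous = id;
                }
            }
            return Convert.ToBase64String(buffer.ToArray());
        }
    }

    public static IList<int> Decode(string encoded)
    {
        var ids = new List<int>();
        using (var buffer = new MemoryStream(Convert.FromBase64String(encoded)))
        using (var gzip = new GZipStream(buffer, CompressionMode.Decompress))
        using (var reader = new BinaryReader(gzip))
        {
            int current = 0;
            while (true)
            {
                int delta;
                try { delta = reader.ReadInt32(); }
                catch (EndOfStreamException) { break; }
                current += delta;
                ids.Add(current);
            }
        }
        return ids;
    }
}

Decode(Encode(ids)) round-trips the original list; because consecutive result ids tend to be close together, the deltas compress well and the base64 token stays reasonably short.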
Potential Cleanup

There is probably a lot of obsolete code in searching for freelancers, since searching for freelancers by way of their work samples has been removed (only searching by profiles remains). The ProfileSearch activity could probably be merged into the FreelancerSearch activity; the FreelancerSearchOrder and ProfileSearchOrder enums could be merged as well. There is probably potential for additional refactoring to clean things up a bit.

Messaging
The design of messaging was intended to be able to replace all messaging within the application, including customer support interactions. There have been some changes to the API from the design document to handle some unanticipated usages and/or for performance reasons, but the design document should still provide reasonably accurate information regarding how things were designed. In general, I believe the code should be relatively clean.

One potential design defect is that different conversations may require different properties for business reasons. In the future, it may be worth considering separate tables for such extensions. For example, the PostingLocked status flag was put into the tConversationParticipant table with the intention that it could, in theory, be used across conversation types. Ultimately, however, it's only used by a single conversation type and would more appropriately be stored at the tConversation level for that conversation type. Another type of posting locked flag is also needed for potentially the same conversations, by way of an override on the dispute status of a conversation to allow posting even when under dispute (tGuruApply.IsPrivateDBLocked; see also bug 9676). The posting locked flag cannot easily be re-used for that situation. Each represents a separate business use case and cannot be forced to share the same flag. If, however, there were separate associated tables for that data, the business layer could use those tables as necessary based on the conversation type and the business rules in place.

Here is a Visio document with the latest database schema for the messaging data:

Messaging Database Design v3.vsd

Landing Pages
The business entities created for landing pages include the following (in the Guru.Business.Entities.Marketing namespace):

- LandingPageBase: contains the primary text display information for a landing page (also includes the keyword and category refinement information).
- LandingPageLocation: contains the location-based display information for a landing page (and includes the location refinement information, currently limited to closest city, province, or country, but could also be expanded to include world region and zip proximity if deemed necessary in the future).
- LandingPageLink: contains a combination of a base landing page, an optional location, and a canonical custom URL value for SEO purposes. The canonical URL stored in the landing page link can be changed to include different keywords without affecting previously generated links (the old links will still point to the same spot). This is possible by the way links are constructed.

Landing Page URLs have the following forms (note: the URL will generally be prefixed with some other identifier to mark it as a landing page, e.g. /Find-Freelancers/):
<KeywordUrlKey>/0KK-CCCCCC
<KeywordUrlKey>/1KKLLL-CCCCCC/<LocationUrlKey>

Eventually we may add support for industry-specific landing pages. Those would have a similar format:
<KeywordUrlKey>/2KKII-CCCCCC/<IndustryUrlKey>
<KeywordUrlKey>/3KKLLLII-CCCCCC/<IndustryUrlKey>/<LocationUrlKey>

An alternate form (which is also not currently generated for use) is to use a pre-defined link id:
<KeywordUrlKey>/4SSSSSSS-CCCCCC/<ArbitraryUrlKey>

The first character of the id code (appearing after the <KeywordUrlKey>) is an enumerated value that defines the format for the remaining portion of the ID in the URL, up to the hyphen:
0 = Keyword (2 chars)
1 = Keyword (2 chars) + Location (3 chars)
2 = Keyword (2 chars) + Industry (2 chars)
3 = Keyword (2 chars) + Location (3 chars) + Industry (2 chars)
4 = Link (1-7 chars)

For URL formats 0-3, characters 2-3 are the Keyword Term ID in little endian base 30. For URL formats 1 and 3, characters 4-6 are the LandingPageLocation ID in little endian base 30. For URL format 2, characters 4-5 are the Industry ID in little endian base 30. For URL format 3, characters 7-8 are the Industry ID in little endian base 30. For URL format 4, there are up to 7 characters before the hyphen that represent the link id in little endian base 30. While appropriate padding is used for the values in formats 0-3, no padding is applied to format 4.

In each of the existing URL formats, the hyphen is followed by a six character checksum that is used to verify the integrity of the URL. This allows the canonical URL to change while still letting us verify that the URL presented is one that was actually constructed by the application and not by someone playing around with URLs. As the arbitrary URL key changes for SEO purposes, the checksum will change as well.

The <KeywordUrlKey>, <LocationUrlKey>, <IndustryUrlKey>, and <ArbitraryUrlKey> are arbitrary URL-safe "slugs". Slugs can consist of letters, numbers, and hyphens, and sometimes slashes. <KeywordUrlKey> does not allow slashes. However, <LocationUrlKey>, <IndustryUrlKey>, and <ArbitraryUrlKey> are allowed to have slashes, since their slashes do not impact parsing requirements (the <KeywordUrlKey> cannot have slashes so that we know exactly where to look for the format and identification information). These URL keys allow for SEO optimization of URLs.
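As a rough illustration of the little endian base 30 segments used in these ids, here is a sketch. The actual digit alphabet and padding rules are defined by the Base 30 utility described under Miscellaneous Utilities; the alphabet below is an assumption for illustration only.

// Rough sketch of little endian base 30 id encoding used in landing page URLs.
// The actual alphabet and padding rules live in the Guru Base 30 utility; the
// digit set below is an assumption for illustration only.
using System;
using System.Text;

public static class Base30Sketch
{
    // Assumed 30-character, URL-safe alphabet.
    private const string Digits = "0123456789bcdfghjklmnpqrstvwxz";

    // Encode a non-negative id, least significant digit first, padded to a width.
    public static string Encode(int value, int width)
    {
        var sb = new StringBuilder();
        do
        {
            sb.Append(Digits[value % 30]);
            value /= 30;
        } while (value > 0);
        while (sb.Length < width) sb.Append(Digits[0]); // pad the high-order end
        return sb.ToString();
    }

    public static int Decode(string encoded)
    {
        int value = 0;
        // Little endian: the first character is the least significant digit.
        for (int i = encoded.Length - 1; i >= 0; i--)
        {
            value = value * 30 + Digits.IndexOf(encoded[i]);
        }
        return value;
    }
}

With this assumed alphabet, Encode(47, 2) yields "k1" and Decode("k1") returns 47.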

Help
The help files are maintained in MadCap Software's Flare product. The HTML output we use on the site is generated from Flare and then synchronized with source control. See How to Update Generated Help Content in TFS for instructions on how this process is done to ensure the most up-to-date help content for deployments. The raw Flare source files are also in source control (in the KnowledgeBase project rather than GEM), but maintenance of that is left to the help content maintainer (Stacy) and the TFS source code control plug-in for Flare.

Project Feedback
On the surface, project feedback seems like it should be relatively simple. In reality, however, there are a large number of variables that can be combined in enough ways to make the management of project feedback more complex than an initial instinct or a cursory view may suggest. Here, I try to provide an overview of project feedback and its complexities. Some assumptions are made regarding familiarity with the business rules involved. To start, let's define the business entities involved with the feedback system:
- ProjectInvoice: represents a payment made for a project. It can be associated with a release from escrow or a payment of an invoice. It is the base unit with which feedback can be associated, since feedback can only be left when there is a payment, and that feedback is associated with that particular payment.
- InvoiceFeedback: a common base class representing feedback on a ProjectInvoice, whether it's feedback left by a freelancer for an employer or by an employer for a freelancer. The possible states for a feedback record are Default (which is largely the same as the feedback not actually existing yet, i.e. no feedback has been given), Pending (for feedback that has been given, but not yet published or blocked by the receiver of the feedback), Published (for feedback received that has been explicitly or implicitly approved and published), or Blocked (for feedback that has been explicitly blocked by the receiving user).
- ProjectFeedback: represents the specific instance of an InvoiceFeedback for feedback left by an employer for a freelancer.
- CompanyFeedback: represents the specific instance of an InvoiceFeedback for feedback left by a freelancer for an employer.

When managing feedback from the user interface, there are six separate views that primarily focus on the ProjectInvoice rather than any specific feedback details. Actual feedback for the ProjectInvoice need not exist yet, and for those that do, there can be up to two actual feedback records associated with the invoice, representing feedback given from employer to freelancer and freelancer to employer (ProjectFeedback or CompanyFeedback). Each of the six views focuses on different ProjectInvoices for different purposes, and the underlying entities used depend on the user context (i.e. whether the user is a freelancer or employer). Here's a brief rundown:

"All Feedback": this name is a little misleading or confusing in that it represents all ProjectInvoice records (i.e. all project payments) and not actually all feedback records. That said, however, since all ProjectInvoice records are represented along with any associated feedback, you do end up seeing all related feedback. This view is largely the same whether you are the freelancer or employer; it returns all records that you are associated with except those which you have marked as archived. "Feedback Received": this view filters the ProjectInvoice records to show only those for which the user has received feedback from their counterpart on the project (excluding those that are marked as archived). That is, freelancers will see all non-archived ProjectInvoice records for which there is an associated ProjectFeedback record that is not in the Default state and employers will see all non-archived ProjectInvoice records for which there is an associated CompanyFeedback record that is not in the Default state. This is the primary view a user would go to for reviewing and either publishing or blocking feedback, however it also includes feedback that has already been reviewed and published or blocked. "Leave Feedback": this view filters the ProjectInvoice records to show only those for which the user has not yet left feedback for their counterpart on the project (excluding those that are archived). That is, freelancers will see all non-archived ProjectInvoice records for which there is not an associated CompanyFeedback record or for which the CompanyFeedback record has a state of Default and employers will see all nonarchived ProjectInvoice records for which there is not an associated ProjectFeedback record or for which the ProjectFeedback record has a state of Default. "Feedback Left": this view filters the ProjectInvoice records to show only those for which the user has already left feedback for their project counterpart (excluding those that are marked as archived). That is, freelancers will see all non-archived ProjectInvoice records for which there is not an associated CompanyFeedback record or for which the CompanyFeedback record has a state of Default and employers will see all non-archived ProjectInvoice records for which there is not an associated ProjectFeedback record or for which the ProjectFeedback record has a state of Default. "Archived Feedback": this view filters the ProjectInvoice records to show only those that the freelancer or employer has archived (and thus not showing in any of the other views). There are no requirements as to whether any feedbackgiven or receivedis associated with the ProjectInvoice. "Blocked Feedback": this view filters the ProjectInvoice records to show only where the user has blocked received feedback (except those that are marked as archived). That is, freelancers will see ProjectInvoice records where the freelancer left feedback for the employer but the employer has blocked that feedback, and respectively, employers will see ProjectInvoice records where the employer left feedback for the freelancer but the freelancer has blocked that feedback.

Now, let's re-examine these views to summarize whether they are filtered using information about employer-to-freelancer feedback (E→F), freelancer-to-employer feedback (F→E), or neither (n/a), and we'll examine them separately for the employer context (where the employer is viewing the records) and the freelancer context. For now, we ignore additional constraints (existence or not, archived or not, etc.).

View                Employer Context    Freelancer Context
All Feedback        n/a                 n/a
Feedback Received   F→E                 E→F
Leave Feedback      E→F                 F→E
Feedback Left       E→F                 F→E
Archived Feedback   n/a                 n/a
Blocked Feedback    F→E                 E→F


One possible method of mapping these six views to data retrieval methods is to construct separate methods for each use case. That is, create methods such as GetAllFeedbackForFreelancer(GuruAccount), GetFeedbackReceivedForEmployer(CompanyAccount), etc. That would result in 12 separate data retrieval methods that the business layer could use to fulfill the data requirements of the views. This, however, arbitrarily ties the business layer and data layer to the data requirements of the current views without much possibility for re-use if a slightly different view is desired in the future (e.g. a view for those ProjectInvoice records that require action, i.e. a combination of "Leave Feedback" and the ProjectInvoice records in "Feedback Received" that haven't yet been published or blocked). It would be desirable to have the data layer be generic enough to support additional views and business requirements without requiring changes to the data layer.

So, how can these be combined into one or more generic retrieval methods? Let's look at the individual differences between each of these views and parameterize them. First, we need a user context; that is, which freelancer or employer is making the request? Next, we need a direction for the feedback we are filtering against (F→E, E→F, or n/a). And finally, we need information on the desired state(s) for the feedback record (i.e. Default, Pending, Published, or Blocked, and whether it's archived or not). We combine these parameters into a search query object (FeedbackSearchQuery) that we use along with paging (SearchPage) and sorting information (ProjectInvoiceSortOrder) to construct our data retrieval method.
public enum FeedbackType { EmployerToFreelancer, FreelancerToEmployer }

public enum FeedbackState { Default, Pending, Published, Blocked }

public class FeedbackSearchQuery
{
    public CompanyAccount Employer { get; set; }
    public GuruAccount Freelancer { get; set; }
    public bool? IsArchived { get; set; }
    public FeedbackType? AssociatedFeedbackType { get; set; }
    public FeedbackState[] AllowedFeedbackStates { get; set; }
}

public class ProjectInvoiceProvider : IProjectInvoiceProvider
{
    public IList<ProjectInvoice> ExecuteSearch(
        FeedbackSearchQuery query, ProjectInvoiceSortOrder order, SearchPage page)
    {
        ...
    }
}

With this data provider API, our views would use queries such as these:

All Feedback
  Employer context:
    new FeedbackSearchQuery() { Employer = companyAccount, IsArchived = false }
  Freelancer context:
    new FeedbackSearchQuery() { Freelancer = guruAccount, IsArchived = false }

Feedback Received
  Employer context:
    new FeedbackSearchQuery() { Employer = companyAccount, IsArchived = false,
        AssociatedFeedbackType = FeedbackType.FreelancerToEmployer,
        AllowedFeedbackStates = new[] { FeedbackState.Pending, FeedbackState.Published, FeedbackState.Blocked } }
  Freelancer context:
    new FeedbackSearchQuery() { Freelancer = guruAccount, IsArchived = false,
        AssociatedFeedbackType = FeedbackType.EmployerToFreelancer,
        AllowedFeedbackStates = new[] { FeedbackState.Pending, FeedbackState.Published, FeedbackState.Blocked } }

Leave Feedback
  Employer context:
    new FeedbackSearchQuery() { Employer = companyAccount, IsArchived = false,
        AssociatedFeedbackType = FeedbackType.EmployerToFreelancer,
        AllowedFeedbackStates = new[] { FeedbackState.Default } }
  Freelancer context:
    new FeedbackSearchQuery() { Freelancer = guruAccount, IsArchived = false,
        AssociatedFeedbackType = FeedbackType.FreelancerToEmployer,
        AllowedFeedbackStates = new[] { FeedbackState.Default } }

Feedback Left
  Employer context:
    new FeedbackSearchQuery() { Employer = companyAccount, IsArchived = false,
        AssociatedFeedbackType = FeedbackType.EmployerToFreelancer,
        AllowedFeedbackStates = new[] { FeedbackState.Pending, FeedbackState.Published, FeedbackState.Blocked } }
  Freelancer context:
    new FeedbackSearchQuery() { Freelancer = guruAccount, IsArchived = false,
        AssociatedFeedbackType = FeedbackType.FreelancerToEmployer,
        AllowedFeedbackStates = new[] { FeedbackState.Pending, FeedbackState.Published, FeedbackState.Blocked } }

Archived Feedback
  Employer context:
    new FeedbackSearchQuery() { Employer = companyAccount, IsArchived = true }
  Freelancer context:
    new FeedbackSearchQuery() { Freelancer = guruAccount, IsArchived = true }

Blocked Feedback
  Employer context:
    new FeedbackSearchQuery() { Employer = companyAccount, IsArchived = false,
        AssociatedFeedbackType = FeedbackType.FreelancerToEmployer,
        AllowedFeedbackStates = new[] { FeedbackState.Blocked } }
  Freelancer context:
    new FeedbackSearchQuery() { Freelancer = guruAccount, IsArchived = false,
        AssociatedFeedbackType = FeedbackType.EmployerToFreelancer,
        AllowedFeedbackStates = new[] { FeedbackState.Blocked } }

As it turns out, things are even more complicated than this. The "Leave Feedback" view must also restrict records to those that are not locked or in dispute. While the feedback record may be in the Default state, feedback cannot be left due to a temporary hold on the feedback record, so it should not be included in that result. Thus, additional query criteria must be included to determine whether these additional state fields (which are usually on the invoice itself rather than the feedback record) come into play.

Filtering on whether the feedback is archived or not is a little non-intuitive. There need to be two flags, one for the employer and one for the freelancer, so they can archive the record independently. The archive process really applies to the invoice and not the actual feedback (since you can archive an invoice that you haven't given feedback for), so it would be appropriate to have the two flags directly on the invoice. The original development, however, added a field to each of the feedback record types. I think that means that the employer's archived flag is on the ProjectFeedback record and that the freelancer's archived flag is on the CompanyFeedback record (though I may have that backward). So, for a freelancer, when checking IsArchived, you're looking at one field in one feedback table (CompanyFeedback), but for an employer, you'd be looking at a different field on the other table (ProjectFeedback). If the feedback record doesn't exist, the invoice has not been archived.

Certifications and Partnerships


Note: This feature does not yet exist in the product. It was evaluated at a business level and we reached the technical feasibility stage before it was pushed out of the queue of active projects. The following are some thoughts about that project and the possible implementation.

We want to allow freelancers to define partnerships they have with third parties and/or certifications they have received from vendors, but we do not know specifics about how we will need to verify these associations or other details about specific situations. This leads me to believe that we must have a flexible system in order to accommodate the unknowns, yet we want to limit the complexity to ensure a minimal development timeframe, simpler maintenance overhead, and high performance in the production environment.

My initial thought was that we'd use code instead of data to manage the potential variations of partners and certifications that we wish to support. This, however, does not necessarily meet the goal of a minimal development timeframe if we assume we'll be continuously adding partnership options and certifications (since each company/institution handles these situations differently, they all need to be added manually). On the other hand, verification of the partnerships will likely vary greatly and cannot be performed solely using a data-driven approach. After some thought, I believe the answer here is to segregate the verification process from the data while providing for data storage specific to the verification method. Then we can add new associations using a completely data-driven approach (e.g. an administrative web interface) and only need to do code updates when we add new specific verification methods (which should occur much less frequently, since most of the associations are likely to provide only for a pre-built manual verification method).

Let's start by listing data that we expect to be shared for all associations, whether they be formal corporate partnerships, a vendor's technical certifications, or a number of other similar themes:

- Name/Title: a brief title for the association.
- Description: a more detailed description of the association that may help to disambiguate it from other, similarly-named associations.
- Grouping hierarchy: partnerships and certifications typically have a hierarchy to categorize them. For example, a top level grouping of Microsoft would have sub-levels for Gold Partners, Silver Partners, Certified IT Professionals, Certified Professional Developers, etc., and certifications may be further grouped by category within those top level certification programs before you get to the actual certification (i.e. the association that the freelancer has). An association would need to belong to at least one group within the hierarchy, but the depth of the hierarchy above that is unspecified. It may also be possible that the same association could belong to multiple groups within the hierarchy, since there may be multiple ways to categorize individual programs. Note: each grouping level in the hierarchy would label its child associations with an enumerated type for the association (as fixed by Guru): certification, partner, membership, endorsement, etc. When projects are limited by associations or employers search for freelancers with specific associations, the inputs would specify either a specific association or an association grouping that could be at any level in this grouping hierarchy. Hierarchy branches may be limited to specific categories or category disciplines for easier navigation as the tree becomes large.
- Verification method(s): specifies which of an enumerated set of verification methods is to be used to verify that a freelancer has this association. The most basic of these, and probably the only one we would have to start, would be manual substantiation, whereby a freelancer uploads a scanned document of some kind that indicates the appropriate association is valid. These uploads would be reviewed by administrative personnel to perform the actual verification (i.e. the act of uploading doesn't verify; the review process does). An alternative may be a process that accepts a URL input (instead of a file upload) that can be manually reviewed to perform the verification (e.g. a URL to a list of partners on a vendor's website). There may be multiple methods allowed for performing verification for a given association (e.g. if a verification allows different sets of credentials to be used, we may have different verification methods for each available set of credentials).

When one of these associations is made with a freelancer, we add some additional data for the specific relationship:

- Association start: the date this association was created for the freelancer. Note: this is not the date that the record would be added in the Guru system, but when the association was actually made by the institution providing the association; e.g. the date the partnership began or the date the freelancer was awarded a certification.
- Expiration date: after this date, the association has expired and should no longer be considered a current association. When not specified, the association is considered perpetual with no expiration date. This field can be used as a hint on when an association should be re-verified (i.e. shortly before the expiration date in case the association has been renewed).
- Verification date: the last time that this association was verified. When not specified, the association is unverified and may not be eligible for qualifying the freelancer for restricted projects or other benefits. It may be appropriate for some associations to be re-verified at regular intervals. In such cases, this date is updated if the verification succeeds, cleared to indicate the association is no longer verified (likely having expired), or left unchanged if the verification failed but not in a way that indicates the association is no longer valid (e.g. we were unable to contact a third-party web service to perform the verification).
- Verification method: how was the last verification performed? Since an association can potentially be verified in multiple ways, this specifically defines which verification method was used. This also allows us to modify an association's available verification methods without requiring us to remove verification on previous associations, though in many (if not most) cases, new verification methods are likely to be more reliable and the historical verifications should be rescinded, or at least marked for an earlier expiration with a notification to the freelancer that they should re-verify using the newer method to avoid having their association removed.
- Verification data: an arbitrary blob of data that is used by the referenced verification method to perform a verification or re-verification. Population and interpretation of this data is left up to the verification method. It may be the contents of an uploaded document (or a pointer to the uploaded document), it may be a URL, or it may be some encrypted XML containing user names and passwords, or anything else needed to perform the verification.
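Since the feature was never built, there is no real schema to point at, but a hypothetical entity shape for the field lists above might look like the following sketch; every type and member name here is an assumption.

// Hypothetical entity sketch for the association feature described above.
// The feature was never implemented; all names and types are assumptions.
using System;

public enum VerificationMethod { ManualDocumentReview, ManualUrlReview }

public class Association
{
    public string Name { get; set; }                   // brief title
    public string Description { get; set; }            // disambiguating detail
    public AssociationGroup Group { get; set; }         // node in the grouping hierarchy
    public VerificationMethod[] AllowedVerificationMethods { get; set; }
}

public class AssociationGroup
{
    public string Name { get; set; }
    public AssociationGroup Parent { get; set; }         // simple adjacency list; see the
                                                          // nested set / path discussion below
}

public class FreelancerAssociation
{
    public Association Association { get; set; }
    public DateTime AssociationStart { get; set; }       // when the institution granted it
    public DateTime? ExpirationDate { get; set; }        // null = perpetual
    public DateTime? VerificationDate { get; set; }      // null = unverified
    public VerificationMethod? LastVerificationMethod { get; set; }
    public byte[] VerificationData { get; set; }         // opaque blob interpreted by the method
}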

We'd probably also want to maintain history/audit information for when verifications are attempted and what their results were (which may have method-specific details in them, like error codes or user IDs for who performed the verification), but I'll defer any details on that for now.

Before getting into additional thoughts on implementing validation, I thought I'd touch on a potential implementation detail for the association organization hierarchy. Since the depth of this organization tree could easily exceed 6 levels, a simple adjacency list (i.e. a link from one level to its parent) could easily result in a large number of queries to construct an appropriate tree hierarchy, sub-tree, or breadcrumb path. There are alternative approaches, however, that would allow single queries to return the appropriate hierarchy information, minimizing the potential performance impact: specifically, an approach called the "Nested Set Model" (sometimes known as "Modified Preorder Tree Traversal"), as popularized by Joe Celko in his book SQL for Smarties (and later in other books), or another approach called "Path Enumeration" (sometimes known as "Materialized Path"). A comparison between these methods is available here. Alternatively, database-specific Common Table Expressions (CTEs) may make it possible to perform the recursive queries in the database itself without additional round trips. In other words, there are implementation options for representing the data that would allow for a performant deep hierarchy for organizing freelancer associations. Manipulating the hierarchy using NHibernate may still be a challenge, however.

Now, for validation of the associations, we are trying to use a completely offline/asynchronous process so that the data is not tied to the custom validation processes. This should allow us to support a variety of mechanisms, from manual user validation to an automated task that checks a web service provided by a third party vendor. With some modifications and/or additions to the Task Engine, the automated processes could even be performed in near real time. The challenges will be with the two main data integration points. The first integration point is the user collection of validation data. The user interface could quite possibly be different for each validation method, with different sets of data requirements. The second integration point is viewing the substantiation for each validation method. This could be as simple as providing a single image for all verification methods saying "Verified by Guru.com", but it may be more realistic to provide additional specifics about the association. For example, for a certification, we may want to display the score the freelancer got on the certification exam. Even assuming the latter, I believe that if we address the first integration point, we can use a similar approach for the second without loss of generality.

Given a particular verification method, the UI must determine what and how to display the appropriate fields for the validation method and process those fields to construct a single consolidated data blob for storage with the freelancer's association for later verification. There are two main options from here. The first is a static, code-driven approach and the second is a dynamic, data-driven approach.
More specifically, the UI can either instantiate specific user controls for specific validation methods (as specified in code), or it can try to dynamically construct appropriate fields using descriptions of such fields defined in data (example tutorial; also called metadata-driven interfaces). While my initial instinct is to shy away from a dynamic interface due to the inherent complexities involved, the two are not necessarily mutually exclusive. By implementing a code-driven approach first, we could always build a particular code-driven verification method that dynamically constructs its user interface without impacting other code-driven approaches. Taking this approach would allow us to build a few code-driven interfaces first to evaluate the true requirements of the validation inputs and see how to best customize the static displays with dynamic data. If, however, we find little commonality, it may be difficult to build such a dynamic interface, and we're no worse off. On the other hand, if we start with the dynamic approach, we would be primarily guessing at what it might need to handle and could end up building extra capabilities that aren't actually useful and/or having to re-build parts that we thought would handle future cases but don't in fact handle them.

On a slightly related topic, it may not have been obvious in the previous post, but an individual association is not necessarily restricted to being created internally within Guru. We could allow users to create their own associations and automatically place them in a special branch of the categorization hierarchy for user-created associations. We'd likely want to add a flag to indicate whether it was user-created or not (or maybe even an enumerated value that indicates the workflow state for the association, if we add approval states to indicate whether a user-generated association is approved for display without being verified). User-created associations could be re-categorized at a later time when a more formal structure is introduced for the user-provided association, and the original user-created ones could be reclassified under the formal structure. The concern here would be that users would not look through the list of existing associations and instead always create their own, requiring our resources to re-map these user-created ones to existing ones. One potential way around that may be to have these be separate "suggestions" from users to create new associations that, while possibly stored in the same place, would not retain the same functionality as provided by those that were defined internally.

Product Infrastructure
Service Locator
Guru.Business.Service.ServiceLocator is a core piece of infrastructure used by the business layer. It is a custom-built implementation that provides services similar to an "Inversion of Control" or "Dependency Injection" container, though the pattern differs slightly. Martin Fowler describes the Service Locator design pattern in his article on dependency injection. That article provides a very good overview of the concepts involved.

The ServiceLocator is a central location to manage the lifecycle of various components (activities, data providers, etc.), which we'll term "business services" for our purposes here. When a piece of code needs to use some other business service, it can use the service locator to locate the proper object reference. The ServiceLocator, then, has the responsibility of determining an appropriate implementation for the business service and providing it. Further, the ServiceLocator will ensure that the service is initialized and cached within appropriate contexts. Some business services may want to be used like a singleton, where only one instance exists in the entire system, while others may want unique objects for different execution contexts. The ServiceLocator has the responsibility of returning the appropriate instance.

One of the main benefits of using such a thing is to facilitate unit testing. In a unit test, when testing a component, you don't really want to require all the dependencies to be set up and functional in order to test your component. So, using the service locator, you can tell it to use different instances of the dependencies (custom-built mocks or stubs or whatever) rather than the actual objects. Since your component gets its dependencies from the ServiceLocator, it will end up getting these replacement objects, which makes your test more reliable since it's not dependent on external systems.

Within our system, we generally use the BusinessFacadeFactory to construct and manage a single ServiceLocator for the application. The factory registers all internal business services and initializes data provider implementations using a configuration file. One such service is the BusinessFacade, which provides "public" access to the business layer functionality and which the factory exposes via a public static method. The implementation of the BusinessFacade is basically just pass-through lookups to the service locator to expose business services that should be available publicly. Together, the BusinessFacadeFactory and BusinessFacade provide convenient access to business services for applications. The UnitOfWork concept also comes into play here, but there is some work that needs to be done to properly integrate these concepts (see UnitOfWork, below).


In theory, the BusinessFacade should not be used within the business layer itself. Instead, a business layer component should accept a ServiceLocator instance as a constructor parameter and use the locator directly for looking up dependencies (doing so facilitates proper unit testing). For activities, this pattern is defined within the BaseActivity class that can be extended by activity implementations; the base provides convenient access to services that are used across many activities. Unfortunately, however, not all business components can be registered and constructed by the ServiceLocator, and thus some business components (like some persistent entities that are constructed by NHibernate) will need to use the BusinessFacadeFactory for accessing the BusinessFacade, which provides access to a ServiceLocator. With unit tests, however, this ServiceLocator is likely not the custom ServiceLocator instance constructed for the specific test. This issue remains open. Fortunately, NHibernate does provide facilities to construct instances of entities without requiring a no-argument constructor. This would allow us to construct the entities using specialized constructors that take a ServiceLocator instance when such entities depend on other business services. Using this feature, however, has not yet been implemented.
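To illustrate the constructor-injection pattern and the unit-testing benefit described above, here is a hedged sketch. The locator shown is a simplified stand-in, and Register/Resolve, IEmailProvider, and InvoiceReminderActivity are made-up names rather than the real ServiceLocator API.

// Simplified stand-in for the real Guru.Business.Service.ServiceLocator; the
// actual registration and lookup methods may differ.
using System;
using System.Collections.Generic;

public class ServiceLocatorSketch
{
    private readonly Dictionary<Type, object> _services = new Dictionary<Type, object>();
    public void Register<T>(T instance) { _services[typeof(T)] = instance; }
    public T Resolve<T>() { return (T)_services[typeof(T)]; }
}

public interface IEmailProvider { void Send(string to, string subject, string body); }

// A hypothetical activity receives the locator through its constructor (as
// BaseActivity-style components do), so a test can hand it a locator populated
// with fakes instead of the real dependencies.
public class InvoiceReminderActivity
{
    private readonly IEmailProvider _email;

    public InvoiceReminderActivity(ServiceLocatorSketch locator)
    {
        _email = locator.Resolve<IEmailProvider>();
    }

    public void Remind(string address)
    {
        _email.Send(address, "Invoice reminder", "Your invoice is due soon.");
    }
}

// In a unit test, a fake provider would be registered on a test-specific locator:
//   var locator = new ServiceLocatorSketch();
//   locator.Register<IEmailProvider>(new FakeEmailProvider());
//   new InvoiceReminderActivity(locator).Remind("user@example.com");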

UnitOfWork
The BusinessFacade and BusinessFacadeFactory were introduced to provide simple access to business methods. As a requirement to use this access mechanism, one must also use the business facade's Lifecycle to ensure there is an active unit of work. This is non-obvious and a little convoluted, but it hasn't been an issue since the proper lifecycle management has been performed for the Marketplace website using an IHttpModule named GuruBusinessModule.

With the introduction of the task engine and its related tools, this lifecycle management became more difficult. The task engine required the ability to create a nested business unit of work so that tasks can be executed in pseudo-isolation from the engine itself. Another complication was that the engine's lifecycle (long-running) differs substantially from the lifecycle of a given web request (short-lived). These challenges caused some evolution of the lifecycle management and resulted in a new class, UnitOfWork, that encapsulates much of it. After making the changes to use this new UnitOfWork in lieu of direct calls to Lifecycle for demarcating the beginning and end of a unit of work, I realized that the UnitOfWork could be combined with the BusinessFacade and BusinessFacadeFactory to ensure that a facade is not accessed or used outside a unit of work. The construction and destruction of the UnitOfWork class handles the lifecycle aspects of the business layer, while the BusinessFacade and BusinessFacadeFactory handle access to the functionality aspect of the business layer. They are also both intertwined with a ServiceLocator.
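A sketch of the intended usage follows. It assumes the Guru.Business types and that UnitOfWork is disposable; only BusinessFacadeFactory.Acquire() and the facade.Freelancer.Messaging path are mentioned elsewhere in this document, and the method call itself is illustrative.

using Guru.Business; // assumed home of UnitOfWork and BusinessFacadeFactory

public static class MessagingExample
{
    public static void SendMessage(int senderId, int recipientId, string body)
    {
        // Constructing the UnitOfWork starts the business lifecycle; disposing it
        // ends the unit of work (assumption: UnitOfWork implements IDisposable).
        using (var unitOfWork = new UnitOfWork())
        {
            var facade = BusinessFacadeFactory.Acquire();

            // Illustrative call; the actual facade organization and signatures differ.
            facade.Freelancer.Messaging.SendMessage(senderId, recipientId, body);
        }
    }
}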

Lifecycle
Regret: Lifecycle should separate authentication from transaction lifecycle. Unfortunately, the web authentication model makes this a little difficult since authentication information is not available until after our business transaction (the business lifecycle) has started, yet authentication should, in theory, be relatively consistent over the course of a business transaction.

Configuration
Logging
While log4net is tightly coupled to our codebase (we directly use its logger interface and factory for finding loggers), it doesn't make much sense to replicate log4net configuration in every application. For this reason, the configuration of log4net has been abstracted into the logging configuration class, Guru.Configuration.Logging.LoggingConfig. Applications that wish to ensure the logging subsystem is initialized should make the following call:
Guru.Configuration.Logging.LoggingConfig.InitializeLogging();

Alternatively, a using directive can shorten the call:

using Guru.Configuration.Logging;
...
LoggingConfig.InitializeLogging();

The business layer will automatically initialize the logging subsystem if it is used through its public API (i.e. Guru.Business.BusinessFacadeFactory.Acquire()), but if logging must be initialized before the business layer or if the business layer is not explicitly being used (e.g. for a testing tool), this initialization method should make configuration of logging a little simpler.

Exception Handling
For business methods that are exposed publicly, only well-defined exceptions should be thrown. That is, if a business method is using some third-party library or accessing the database, rather than requiring the code that calls into the business layer to "guess" what underlying exceptions may be thrown, the business layer should define specific business layer exceptions that wrap any potential underlying exceptions. This is the reason that some exceptions are re-wrapped in other exceptions.

As for whether to throw an exception or return null, it depends on the circumstances. My general rule is to ask whether the situation is an "error" condition. For example, if you're looking up a particular entity but you're not sure whether it exists, it's not necessarily an error for that entity to not exist. In such cases where this isn't really an error, we generally use a non-exception mechanism (e.g. return null). On the other hand, if we expect a result (e.g. looking up by an ID that should be valid), then an exception is appropriate if such a result isn't found. In most cases, an exception could be considered an error of some kind, whether caused by bad/unexpected user input or an actual bug. There are probably a lot of inconsistencies in this, as things may differ on a case-by-case basis where the guidelines are not followed for one reason or another.

One other note: exceptions are generally only caught if the code catching the exception can handle and recover from the error. Otherwise, the exception should be propagated to a higher level (and in such cases, it is sometimes appropriate to wrap the underlying exception within another exception).
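A sketch of the convention described above; the exception type, entity, and method names are illustrative and not the actual business layer API.

using System;

// Illustrative exception type; the real business layer defines its own exceptions.
public class BusinessDataException : Exception
{
    public BusinessDataException(string message, Exception inner)
        : base(message, inner)
    {
    }
}

public class Profile
{
}

public class ProfileActivity
{
    // "Not found" is a normal outcome here, so the method returns null rather than throwing.
    public Profile FindProfileOrNull(int profileId)
    {
        try
        {
            return QueryProfile(profileId); // may legitimately return null
        }
        catch (Exception ex)
        {
            // Wrap the underlying (e.g. data access) exception so that callers
            // only ever see well-defined business layer exception types.
            throw new BusinessDataException("Failed to look up profile " + profileId, ex);
        }
    }

    private Profile QueryProfile(int profileId)
    {
        // Data access elided in this sketch.
        return null;
    }
}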

Back End Auto


Sending email, especially for Project Matches, periodically fails en masse. To address this, there is a Windows scheduled task that moves the files from the failed email directory back to the spool directory so those emails are retried. This workaround will not work if Blue Dragon is upgraded, since the latest version of Blue Dragon does not monitor the spool directory and send all files in it; it only sends files that it knows it queued (which also means that a restart could cause a loss of emails).
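The scheduled task is essentially just a file move. A minimal sketch of the equivalent operation is below; the directory paths are placeholders, and the real mechanism is a Windows scheduled task rather than this code.

using System.IO;

public static class FailedEmailRetry
{
    // Placeholder paths; the real directories are whatever Blue Dragon is configured to use.
    private const string FailedDir = @"C:\BlueDragon\mail\failed";
    private const string SpoolDir = @"C:\BlueDragon\mail\spool";

    public static void RequeueFailedEmails()
    {
        foreach (string file in Directory.GetFiles(FailedDir))
        {
            // Moving a file back into the spool directory causes it to be re-sent,
            // provided Blue Dragon still scans the spool directory (see the caveat above).
            File.Move(file, Path.Combine(SpoolDir, Path.GetFileName(file)));
        }
    }
}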

Task Engine
Design document:

Task Engine Framework.docx

Early Design Considerations
Should tasks executed by the task engine be consumers of the business layer (i.e. contain no business logic themselves, instead just invoking business activity methods), or should tasks be considered an integral part of the business layer itself? In other words, should a task run by the task engine execute specialized business activities, or should the tasks themselves contain all the business logic for the operations they perform?


The first thing to consider is how intrinsically related the tasks are to the business layer. For this, I question whether task implementations should be in the Guru.Business namespace and assembly, or whether they could be pulled out into a separate assembly and exist independent of the basic business layer. The tasks implemented so far (refreshing time zone data from the Windows registry, creating search data for full text indexing) and the anticipated future tasks (updating published status of landing page links, refreshing available skill tests from expert rating, batch email processing) are all core business functionality, and thus the implementation would certainly be appropriately located somewhere within Guru.Business. That said, in theory, the logic could exist in Guru.Business as fat business methods that are called by a very thin task layer, where the task exists outside the main Guru.Business namespace and assembly. I'm not sure what the point of doing this would be, but intentions aside, there's another difficulty. Tasks are assumed to be non-trivial operations, so there is a feedback mechanism for task implementations to report feedback and progress of the task execution and for task executions to be interrupted gracefully. This mechanism is currently specific to tasks, so any feedback and progress related items would need to be on the task side of the equation. Further, in order to take advantage of bulk processing, where caching and other similar techniques can be used to improve task performance and system impact, more logic would necessarily need to be on the task side than in the fat business methods. With all that in mind, I would find it hard to believe that you could actually have a proper thin task layer that executes a fat business method that's not specific to task executions. This argues for building the implementation directly into the task itself rather than into some separate business activity. At this point, we could consider that a task is really just a specialized form of a business activity. Whereas our existing business activities are relatively small business functions exercised by the presentation layer within the marketplace website, tasks are heavier business activities executed internally with the task engine.

Ok, so tasks go somewhere in the Guru.Business namespace and assembly. But where? We want all business logic related to particular functions or entities to be grouped together; the closer the relevant code can be, the easier it is to maintain when changes are required in the future, since there's a narrower scope to search for possible impacts of such a change. Initially, tasks were in a separate namespace (Guru.Business.Tasks). The hierarchy used there mimicked the tree already defined under Guru.Business.Activities, and future tasks would likely share a similar categorization. To reduce the number of locations where the business logic would be located (whether for a task or for other business activity), tasks were merged into the Guru.Business.Activities namespace. After all, we just labeled tasks as a specialized form of activities, so why not locate them there?

That was a little bit of a tangent, and there's still an issue to address with this new task: the task implementation may depend on the internal implementation of another activity.
It's not really possible to merge them into a single class, since the task must inherit from the base Task class (though we could implement the task as an inner class, I believe that could get very ugly very quickly). So, do we duplicate that shared code? Or is there some way to cleanly re-use it from the general business activity within the business task? I should note that the code in question is not a publicly accessible business activity method; it's a helper method used within the business activity methods. Lo and behold, there's a very simple solution to this: the shared code can be marked internal! So, create the shared helper methods in the activity as internal methods. They're only usable by the business layer, which includes both general activities and the tasks. Perfect!

Conclusions:
- Tasks are special forms of activities and should live in the same namespace tree with the existing activities.
- When tasks and activities have common code that shouldn't be publicly available in the business API, it should be defined in the activity with an internal access modifier.
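A sketch of that arrangement; the class and method names are illustrative, and the base classes are omitted.

// Both classes live in Guru.Business, so the internal helper is visible to each.
public class LandingPageActivity // derives from BaseActivity in the real codebase
{
    // Shared logic that should not be part of the public business API.
    internal void RefreshPublishedStatus(int landingPageId)
    {
        // Implementation elided.
    }

    public void PublishLandingPage(int landingPageId)
    {
        RefreshPublishedStatus(landingPageId);
    }
}

public class RefreshLandingPagesTask // derives from the base Task class in the real codebase
{
    private readonly LandingPageActivity activity;

    public RefreshLandingPagesTask(LandingPageActivity activity)
    {
        this.activity = activity;
    }

    public void Execute(int landingPageId)
    {
        // The task reuses the internal helper without exposing it publicly.
        activity.RefreshPublishedStatus(landingPageId);
    }
}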

Executing a Task Engine
There are two current methods for executing task engines.

First is a Windows service, which is what we will use for production. This is the TaskEngineService project in the Tools folder of the Everything solution. The easiest/best way to install this service is to use the installer named TaskEngineServiceInstaller found in the Installers/Tools folder of the Everything solution. This installer is configured to build only with a release build; however, you can manually select it and build it in debug mode (you may need to explicitly build the Tools/InstallSvc project as well; even though that's marked as a dependency, Visual Studio sometimes fails to build it first). After installation, you will need to create an appropriate Guru.Configuration.dll.config configuration file. On the last page of the installation wizard, there's a note reminding you about this that provides the path where the file must reside. See below for additional information about configuring a task engine.

A command-line engine testing tool can be useful for running an engine on a temporary basis without incurring the overhead of building the installer and performing the installation. This project is called TaskEngineTester and is found in the Tools folder of the Everything solution. After building the project, you will need to create an appropriate Guru.Configuration.dll.config configuration file that must be stored in the bin/Debug or bin/Release directory of the project (for debug mode builds or release mode builds, respectively). See below for additional information about configuring a task engine.

Task Engine Configuration
Configuring a task engine is performed in the same way that the marketplace website is configured (with one caveat, see below): you create a Guru.Configuration.dll.config file (or symbolic link) in the same directory as the Guru.Configuration.dll file, and it's best practice to save this configuration in the Configs/Environments directory using a unique name so your specific environment's configuration is preserved for future use. My personal environment configuration for the task engine on my machine can be found at Configs/Environments/Environment.msmith-TaskEngine.config.

One major caveat: you do not want the task engine to use the same log files as other applications that may be running on the machine, since they would conflict with each other and you could lose logging information. To account for this, it's highly recommended that you use a different log4net configuration file (one that targets a different file location) than the log4net configuration you may be frequently using. Note the differences between my personal environment configuration file for the task engine (Environment.msmith-trunk-TaskEngine.config) and my general environment configuration file which I use for the Marketplace (Environment.msmith-trunk.config). To aid with this logging file location issue and to establish a standard for logging locations, there are two new log4net configuration files that are pre-defined for use with task engine tools: log4net.c-logs-TaskEngine.config and log4net.c-logs-TaskEngine-debug.config. These will log all information to c:\logs\TaskEngine.log, which will not conflict with the existing log files created for the Marketplace application. Alternatively, for command-line tools like the TaskEngineTester, you can use the log4net.console.config or log4net.console-debug.config configuration files, which send log4net output to the console. These configurations are not useful for websites or Windows desktop applications, since there is no console for those types of applications, but for command-line tools they can be significantly easier than having to open the log file alongside the running command-line application to monitor its activity.

Engine GUID vs. ID?
What is the purpose of Guru.Business.TaskEngine.Engine's Guid property?
Q: A task engine has both an ID and a GUID in the Engine class. What is the purpose of the GUID and why can't the ID be used instead?
A: This is an artifact of the way the Engine is simultaneously a thread of execution performing the duties of a task engine and a data entity representing the Engine's persistent state. At initial creation, the Engine is not persistent and thus does not have an ID assigned, but a GUID is generated for it to ensure there is a unique identifier for the Engine that can be used within the thread of execution (the implementation uses this GUID as part of the thread name). An ID is not assigned for the Engine until the entity is persisted to the database, which occurs within the thread of execution, after the thread of execution must already have an identifier for itself. Another way of looking at this is that the ID is the persistence identifier while the GUID is the thread-of-execution identifier. Both are persisted to allow correlation when necessary. In theory, the GUID could also be used as the persistence identifier; however, this design choice was rejected because the Engine's primary key is referenced from the TaskQueueEntry, there will likely be a substantial number of queue entries, and using a simple integral ID for the primary key thus has substantially lower storage requirements.
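A sketch of the relationship between the two identifiers; the property shapes and thread naming are assumptions, not the actual Guru.Business.TaskEngine.Engine implementation.

using System;
using System.Threading;

// Illustrative only; the real Engine class differs in detail.
public class Engine
{
    // Execution identifier: available immediately and used to name the worker thread.
    public Guid Guid { get; private set; }

    // Persistence identifier: assigned by the database when the entity is first saved.
    public virtual int Id { get; protected set; }

    public Engine()
    {
        Guid = System.Guid.NewGuid();
    }

    public void Start()
    {
        // The thread must be named before the entity is persisted, so only the
        // GUID can be used here; the integer Id does not exist yet.
        var worker = new Thread(Run) { Name = "TaskEngine-" + Guid };
        worker.Start();
    }

    private void Run()
    {
        // The engine persists itself from within this thread, at which point the
        // database assigns the integer Id. Both identifiers are stored so they
        // can be correlated later.
    }
}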

NHibernate
Gotchas
Using not-found=ignore in a mapping file will force eager loading of the target object, since NHibernate must check whether the object exists to determine whether a proxy should be created (and in that case, it doesn't even bother creating a proxy since it's loading the object anyway). See bug 6958.

Nullable database fields that are mapped to non-nullable properties (or where the mapped entity will automatically convert the null value to something else, like an empty string) will cause "ghost updates" where a read operation could modify the object and thus have it written at unexpected times. The worst consequence of this would be replacing something like a null date with a date value of 0 (i.e. some really old and incorrect date). See task 5807 for analysis on type mismatches that likely need to be corrected.

NHibernate defaults to using nvarchar for string parameters. When string-matching, if there's an index on a varchar column, SQL Server won't use that index with the nvarchar parameter because the collation sets are different (it needs to convert the data on each record for proper comparison on each query). To avoid the performance hit, when using a varchar data type in the database with an index and matching on that column in a query, you must explicitly specify the database type AnsiString in the mapping for that particular object (e.g. see RegisteredUser.hbm.xml). I do not know whether this somehow may apply to full text catalogs and full text searching. For more details, see: http://blog.brianhartsock.com/2008/12/14/nvarchar-vs-varchar-in-sql-server-beware/

NHibernate's Future, which allows you to defer query execution so that queries can be combined in a single batch, does not properly utilize the query cache, so using a future will never use a cached result. Further, it does not seem to support a single object lookup using Get (which may just pull the object from the session). These items have prevented us from further exploring the usefulness of this capability.

Caching
NHibernate's second-level query cache does not automatically cache the resulting entities of the query. This can lead to query execution problems in certain circumstances if the underlying entities could be removed from the query result set. Here's a specific hypothetical example: when searching for profiles, the query is marked for query caching in a cache region that has an expiration of 60 seconds. The resulting entities from this query, the ProfileSearchResult entity, are not set up for caching. Now, the following actions occur:
1. A query is executed in session 1.
   a. NHibernate searches its query cache and finds no matching queries.
   b. The query is sent to the database.
   c. The resulting entity identifiers are cached with the query in the query cache.
2. The query is re-executed in session 2 while the query is still cached in the query cache (within the 60 seconds).
   a. NHibernate searches its query cache and finds a matching query.
   b. NHibernate looks in the entity cache for entities matching the identifiers.
   c. Since no cached entities are found, NHibernate queries the database to populate the entities.
3. In session 3, one of those results is then changed such that it would no longer exist in the results (e.g. the profile is hidden).

4. Now, in session 4, while the query is still cached in the query cache (still within the 60 seconds), the query is once again re-executed.
   a. NHibernate searches its query cache and finds a matching query.
   b. NHibernate looks in the entity cache for entities matching the identifiers.
   c. Since no cached entities are found, NHibernate queries the database to populate the entities.
   d. The database, however, does not find the desired entity that was removed.
   e. NHibernate throws an exception.

Now, one might ask: why doesn't NHibernate remove the query from the cache with the changes made in session 3, since the underlying data has changed and the query results may no longer be valid? Well, NHibernate does do this, so this scenario doesn't actually happen in most circumstances. In this case, however, the query and the query results are made on a view while the modifications are made on the underlying table. NHibernate does not know that the vSearchableProfile view is based on the tProfile table, so when a profile is hidden by marking the record in tProfile, NHibernate is unable to explicitly flush the cached query. This situation can only be avoided in one of two ways: removing caching altogether or ensuring that the entities are cached for longer than the query itself. Since caching is necessary to meet performance goals, we must use the entity cache. With the entity cache enabled, we see the following behavior (using a query cache of 60 seconds and an entity cache of 75 seconds):
1. A query is executed in session 1.
   a. NHibernate searches its query cache and finds no matching queries.
   b. The query is sent to the database.
   c. The resulting entity identifiers are cached with the query in the query cache.
   d. The resulting entities are cached in the entity cache.
2. The query is re-executed in session 2 while the query is still cached in the query cache (within the 60 seconds).
   a. NHibernate searches its query cache and finds a matching query.
   b. NHibernate looks in the entity cache for entities matching the identifiers.
   c. The matching entities are returned.
3. In session 3, one of those results is then changed such that it would no longer exist in the results (e.g. the profile is hidden).
4. The query is re-executed in session 4 while the query is still cached in the query cache (within the 60 seconds).
   a. NHibernate searches its query cache and finds a matching query.
   b. NHibernate looks in the entity cache for entities matching the identifiers.
   c. The matching entities are returned, no differently than in session 2.
5. Now, in session 5, the query is re-executed, but after the query has expired from the query cache (e.g. after 60 seconds).
   a. NHibernate searches its query cache and finds no matching queries.
   b. The query is sent to the database.
   c. The resulting entity identifiers are cached with the query in the query cache.
   d. The resulting entities are cached in the entity cache, overriding previously cached values (if any) and resetting their timeout.


Since the entity cache is re-populated when a query is re-executed after it expires from the query cache, it is impossible for the entities to not exist in the entity cache while the query does exist in the query cache, as long as the entity timeout is greater than the query timeout. If, however, the entity timeout is less than the query timeout, then there is a window (between the entity timeout and the query timeout) where we are vulnerable to the original scenario described here (consider that the original scenario is equivalent to using an entity timeout of 0). So, if an entity can be removed from the underlying table (or, equivalently from NHibernate's perspective, excluded using a WHERE clause on the NHibernate mapping) and a query cache is used for queries, the entities themselves must also be cached to avoid potential issues with out-of-band modifications, and that cache region must have an expiration greater than the cache region used for the query cache.
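A sketch of the query-side setup this implies, using the standard NHibernate criteria API. The region name and the example entity are illustrative; the entity cache region and its longer expiration (e.g. 75 seconds vs. 60) are configured in the class mapping and cache provider configuration, which are not shown here.

using System.Collections.Generic;
using NHibernate;

public class ProfileSearchQueries
{
    public IList<ProfileSearchResult> Search(ISession session)
    {
        // Cache the query itself in a short-lived region; the ProfileSearchResult
        // class must also be cached in a region that outlives this one.
        return session.CreateCriteria<ProfileSearchResult>()
            .SetCacheable(true)
            .SetCacheRegion("ProfileSearchQueryCache")
            .List<ProfileSearchResult>();
    }
}

public class ProfileSearchResult
{
}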
Session Management
Since early December 2009, I was wrestling with issues related to our usage of business transactions and transparent persistence, specifically with NHibernate's Session (its unit of work concept) and its ability to automatically detect and flush changes to data. Here are some basic facts, background, and a description of the problem:
- We want our business layer to (eventually) manage the enforcement of all business rules, including data validation and workflow operations.
- Some business rules can be enforced immediately and directly (e.g. a setter on an entity can enforce a validation rule on the format and length of a particular property).
- Some business rules must have enforcement deferred when the rules depend on multiple data items that are set separately (e.g. when modifying an address, if you want to validate that the zip code is in the appropriate state, you won't be able to use the zip code and state property setters since setting one would make the address invalid; you'd need a separate method that takes both simultaneously, but having separate methods like this for validation isn't always convenient, so you'd want to defer validation until both properties are set and then validate).
- Some business rules could be context sensitive: the rules depend on the context of the workflow being executed (e.g. when you're modifying your basic profile information, there shouldn't be any changes to feedback records; that's a separate business workflow/rule/process).
- When a business rule fails to validate, we do not want to persist the changes (i.e. we want to abort our business transaction).
- When a workflow is in progress, we only want to allow changes related to the specific workflow.
- NHibernate can automatically determine values that have changed and persist the changes without other intervention (i.e. a call to ISession.SaveOrUpdate is not necessary, even though most of the existing codebase seems to think that it is).

My primary concerns are as follows:

There seems to be no way to prevent a malicious or ignorant developer from acquiring one piece of data (e.g. an invoice), modifying it within the bounds of basic data integrity (i.e. setter validation is ok), but then, instead of using the business process/workflow to perform (and fully validate) that update, performing some other, unrelated business process which causes all changes to be persisted: those for the actual business processing and those for the unrelated entity. So, for example, if we want to send emails to someone when a change happens or queue a background operation, such a change may end up bypassing that part of the workflow, since the change wasn't made in the context of the appropriate workflow.

The current method of validating objects (along with some basic property setter Check.Require calls) is to use an interceptor in the data layer to validate the objects before they actually hit the database. This has been sufficient so far, but the validation has no context and thus cannot validate for the current business process/workflow. Further, it is significantly more difficult to handle validation errors in a user-friendly way (and will thus result in a pink error box or a full IIS error page).

Three options were considered to address these concerns:

Option 1: Have all communication with the business layer use data transfer objects, simple objects used only to transport information from the presentation layer to the business layer, so that the transparently-persistent business entities can only be used within the business layer where their use could be more specifically defined according to business rules.

Option 2: Have the business layer expose a service interface (e.g. using web services) that must be used for all business layer access. This is similar to option 1, but instead of manually defining the objects passed to and from the business layer with each business method, a service contract is defined and the objects are automatically generated based on that contract. Such code generation is largely pre-built with web services in .NET.

Option 3: Enforce with policy and assume that the presentation layer will not improperly invoke business methods outside appropriate use-cases.

Ultimately, the decision reached at an internal discussion evaluating these options was to pursue the use of web services for the business layer interface (option 2), largely in order to support a long-term vision of supporting alternate third-party user interfaces that would need the same interface contracts and a standardized business layer invocation mechanism (e.g. SOAP). The next long-term steps are to define an appropriate migration path and a basic framework for a web-service interface to the business layer that will be able to appropriately handle authentication and proper isolation of business transactions (aka units of work).

While not intending to change this decision for the long term, there is another option that was not considered in that discussion and may come in handy for implementing isolated business transactions in the short term. Rather than taking the limited view that there can be only a single active business transaction at any given time, consider that the application could instead use nested business transactions. The presentation layer, at the top of the application stack, would open a business unit of work for presentation-related items. Business methods invoked from the presentation layer would construct their own units of work, isolated from the presentation layer, for performing actual business processes.

More specifically, the top-level transaction would be no different than what exists now. Entities can be loaded and modified for whatever reason. But we would ensure these modifications are not persisted to the database by making sure the underlying NHibernate ISession is set to have a FlushMode of Never. This ensures that the session will not automagically persist any changes made to the entities. We still end up with the potential hole where a business layer operation invokes FlushChanges on the session, which would allow those improper changes to be persisted. That's where the nested transactions come into play. Any business operations that are invoked would create a nested business transaction: they would open another NHibernate session. Entities that were expected to be modified for the business operation could be attached to the nested session (after being detached from the outer session) without needing to re-query the database for the entities. The business operation would then only call FlushChanges on the nested transaction, which contains only those entities it has specifically blessed as appropriate per the business operation.
After the business operation is complete, it could re-associate the entities with the outer session and return control back to the presentation layer where, at the end of the request, any excess changes in the session would be discarded. This approach, however, has not been implemented or pursued further (the above was written in early 2010). We are generally following the Option 3 approach described above. Nested units of work are, however, possible by using the UnitOfWork class. It was implemented for the Task Engine, which needed to segregate units of work from one task to another while maintaining its own unit of work for self-management of the engines and queues.
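A sketch of the nested-transaction idea described above, written against the plain NHibernate API. The session factory and entity parameter are placeholders, and this is not how the current code works; it only illustrates the mechanics of re-associating a detached entity with an inner session and flushing only that session.

using NHibernate;

public class NestedUnitOfWorkSketch
{
    // Assumes the entity was already evicted (detached) from the outer session,
    // which itself would be running with FlushMode.Never.
    public void RunIsolatedBusinessOperation(ISessionFactory sessionFactory, object detachedEntity)
    {
        using (ISession inner = sessionFactory.OpenSession())
        using (ITransaction tx = inner.BeginTransaction())
        {
            // Re-associate the entity with the nested session without re-querying.
            inner.Lock(detachedEntity, LockMode.None);

            // ... perform and validate the specific business operation here ...

            // Only the changes blessed within this nested session are written.
            inner.Flush();
            tx.Commit();
        }
    }
}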

Miscellaneous Utilities
Base 30
Little-endian base 30 uses the ten digits 0 through 9, followed by the capital letters of the English alphabet excluding the vowels (a, e, i, o, and u) and y, arranged in little-endian order. That is, the digits used are:


0 1 2 3 4 5 6 7 8 9 B C D F G H J K L M N P Q R S T V W X Z

For a two-digit base 30 little-endian number, one is represented as '10', ten as 'B0', and thirty as '01'. For additional details, see Guru.Utils.NumberUtility. Base 30 is used to compact the length of numerical values when they need to be represented as textual strings that cannot use binary formats or include special characters. The selection of digits chosen for base 30 prevents the formation of potentially offensive words by excluding any and all vowels.
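A sketch of the encoding described above. The real implementation is Guru.Utils.NumberUtility and may differ in details such as padding and negative handling; the examples above are padded to two digits, while this sketch emits only as many digits as needed.

using System;
using System.Text;

public static class Base30
{
    // The 30 symbols listed above: digits plus consonants, excluding vowels and Y.
    private const string Digits = "0123456789BCDFGHJKLMNPQRSTVWXZ";

    public static string Encode(long value)
    {
        if (value < 0)
        {
            throw new ArgumentOutOfRangeException("value");
        }
        if (value == 0)
        {
            return "0";
        }

        var result = new StringBuilder();
        while (value > 0)
        {
            result.Append(Digits[(int)(value % 30)]); // least significant digit first
            value /= 30;
        }
        return result.ToString(); // e.g. 1 -> "1", 10 -> "B", 30 -> "01", 31 -> "11"
    }
}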

Development Practices
Merges/branches
The Expires headers in the web.config files found in the js, css, images, and AppThemes subdirectories of Websites/Marketplace should use the real date in the release branch and an old date in Trunk so that forward development doesn't require cache reloads (it'll get the old date).

Database schema dumps


- Stored in /Database/Schema/
- The purpose is to more easily determine differences between schema versions (i.e. what's changed). A side benefit is that the schema can be retrieved from source control and searched using file-based text search tools (findstr, grep, etc.), which is useful for checking for usages of a column, view, or stored procedure which you believe is obsolete (yet may be used within other views or stored procedures).
- Updated using instructions defined in /Database/SchemaUpdateInstructions.docx
- The intention was that each change made to the code that involved a change to the database would have a script to make the change on the current production database (and maybe another to update a dev/qa database that already has partial updates), along with the updated schema files as generated by the script. Such use would allow you to more easily see the old version of the schema along with the new version and the associated code changes. In other words, you would have a more complete view of the code and database changes at a glance, all within a single changeset.
- Uptake of the schema dump tool and its use has been pretty much non-existent, largely through lack of communication about the tool and how to use it. I've periodically updated things in source control to keep it from being too out of date.

Visual Studio
Editor configuration for CFM files
I've been bothered with warnings and many unnecessary and incorrect autocompletions when editing ColdFusion (cfm) files. I wanted to change Visual Studio to open the files as plain text files, but there was no direct way to do this. I could associate the extension with "Source Code (Text) Editor with Encoding", but this caused an annoying dialog each time a file was opened to select the proper file encoding (for which I used "auto" pretty much all the time). It also caused issues when the file was re-opened (e.g. double clicking in the solution explorer) where it doesn't recognize that the file's already open, gives some weird dialog along with the encoding prompt, and then opens a new window alongside the old one. I tried other editors, but they all seemed to have issues. Why couldn't I just use the regular text editor that's used when you open a text file? Well, you can. Here's how:
1. Open the registry editor.


2. Search for all values named "ExcludeInFileExtnMapping". Generally these values will fall under a key named "{8B382828-6202-11d1-8870-0000F87579D2}" (which is the key associated with the plain text editor factory) with a (Default) value having data of "Source Code (Text) Editor".
3. Change the data for "ExcludeInFileExtnMapping" from 1 to 0 for each instance in the registry (I'm not sure which is the right one to edit).
4. Restart Visual Studio.
5. Now, in the Tools > Text Editor > File Extension panel, type in the extension (cfm), select the now-visible "Source Code (Text) Editor", and click "Add".

Unfortunately, you lose all semblance of syntax highlighting, but I'll manage. This is still better than the alternatives, in my opinion.

Web site worker process has been terminated by IIS
Sam shared this helpful hint via email regarding debugging web applications running in IIS using the Visual Studio debugger: I don't know if anyone else has had this problem, but sometimes while stepping through breakpoints in the code your local IIS times out and you have to start all over again. The article below talks about the settings in the Application Pool that will prevent that from happening: http://msdn.microsoft.com/en-us/library/bb763108(VS.90).aspx

Load Generation for Testing


There are three load generation tools. Each generates load in a specific area: freelancer search, project search, and login. Each of these is configured in the normal manner by creating a Guru.Configuration.dll.config file alongside the Guru.Configuration.dll file (which should be in the same directory as the exe file). When creating the configuration file, the log file to configure should be a console-based logger target (e.g. log4net.console.config) rather than a file-based logger target (e.g. log4net.c-logs.config). Since you're running the exe from the command line, it's useful to see the logs right there in the command line, and run this way it cannot conflict with any other applications running on the machine. See Environment.msmith-trunk-console.config for an example.

FreelancerSearchLoadTester
Selects common freelancer search queries and executes searches using those queries. Queries are "common" if they're defined as landing pages, so this is really load generation on the landing page search queries. Outputs indicate how long the queries took to execute. These times, however, should be taken with a grain of salt unless there are only a few concurrent users, since loading with a large number of users will bottleneck the CPU and network and artificially inflate the execution times. Usage:
FreelancerSearchLoadTester [options]

With any of the following options:

-users number: specifies the number of simulated users making search requests. This is the number of concurrent queries that will be executed.
-requests number: specifies the number of queries made per user (the number of requests per thread).
-zeros true/false: alters the output to display or not display zero counts for certain query times. Generally, the output includes a time range and the number of queries that took that time range. With -zeros true, all time ranges are displayed even if no queries fell into that time range (displaying a zero). When false, time ranges into which no queries fell are not displayed.
-seedwith number: the number of queries to execute per user prior to executing the timed queries. This ensures some level of cache initialization for loading dependent objects such as cities and countries.

ProjectSearchLoadTester
Generates arbitrary project search queries and executes searches using those queries. Specific refinements are added based on random numbers; the code can be tweaked to more or less frequently include a specific option as necessary. The ratios currently in the code were based on a random sample of project searches performed in the production environment; however, query patterns can vary and may have changed since they were last updated. Outputs indicate how long the queries took to execute. These times, however, should be taken with a grain of salt unless there are only a few concurrent users, since loading with a large number of users will bottleneck the CPU and network and artificially inflate the execution times. Usage:
ProjectSearchLoadTester [options]

With any of the following options:

-users number: specifies the number of simulated users making search requests. This is the number of concurrent queries that will be executed.
-requests number: specifies the number of queries made per user (the number of requests per thread).
-zeros true/false: alters the output to display or not display zero counts for certain query times. Generally, the output includes a time range and the number of queries that took that time range. With -zeros true, all time ranges are displayed even if no queries fell into that time range (displaying a zero). When false, time ranges into which no queries fell are not displayed.
-seedwith number: the number of queries to execute per user prior to executing the timed queries. This ensures some level of cache initialization for loading dependent objects such as cities and countries.

LoginLoadTester
This load generation program generates simulated login requests for a set of 1000 randomly selected users. The login requests aren't actually real logins; they are just a lookup by user name combined with an update to the last login time. This is largely used to test the impact of updates to the registered users (which happen relatively frequently on login due to the last login time). Usage:
LoginLoadTester [options]

With any of the following options:

-threads number: specifies the number of simulated threads executing login requests.
-logins number: specifies the number of logins made per thread.
-mindelay number: specifies the minimum number of milliseconds for a thread to wait before performing another login.
-maxdelay number: specifies the maximum number of milliseconds for a thread to wait before performing another login.
-zeros true/false: alters the output to display or not display zero counts for certain query times. Generally, the output includes a time range and the number of queries that took that time range. With -zeros true, all time ranges are displayed even if no queries fell into that time range (displaying a zero). When false, time ranges into which no queries fell are not displayed.

Deployment Instructions
Preparing a Build for Deployment
1. Queue a release build using the appropriate TFS build definition and wait for it to complete successfully.
2. Open a remote desktop session to 10.100.76.30 using your personal guru-ecom domain account.
3. Open a Windows Explorer window at D:\Release_Archive
4. Ensure there is a subdirectory for the major release version (e.g. 3.23.x). If there isn't one (e.g. for a new major release), create one using the following steps:
   a. Create an appropriately named subdirectory for the new release. Replace the minor release number with an x to indicate the directory can contain a number of hot fix releases for the same major release.
   b. Copy the deployprep.bat batch file from the previous major release's directory into the newly created directory. This script is used to prepare a build for actual deployment. Currently, it doesn't do too much other than move and copy some directories to make it easier to find the right files to deploy to the servers.
   c. If there are any changes to the deployment process or if there is a desire to automate more of the manual processes, make sure to update the preparation script (and update these instructions as necessary).
5. From the D:\Release_Archive\TFSBuilds directory, copy the appropriate build directory into the specific release directory (e.g. copy D:\Release_Archive\TFSBuilds\3.23.x_20110728.1 to D:\Release_Archive\3.23.x).
6. Rename the directory created by the copy in step 5 (e.g. D:\Release_Archive\3.23.x\3.23.x_20110728.1) to populate the actual release number into the directory name (e.g. rename 3.23.x_20110728.1 to 3.23.1_20110728.1). Do not rename the parent directory used to group the minor releases (i.e. the 3.23.x).

7. Execute the deployprep.bat batch file using the absolute path to the renamed directory as a parameter. The easiest method of doing this is to drag the directory onto the batch file from the major release directory. This will re-organize the raw build files into a format more appropriate for deployment (and perform any other pre-deployment steps that are added to the batch file).
8. The release files directory should now contain top-level folders corresponding to the target machine where the build will be deployed. For example, the Admin directory is for deployment to the admin.guru.com machine, whereas the Marketplace directory is for deployment to the main web servers.
9. If a deployment is being made to the web servers, follow these steps to generate the site map files; otherwise, skip to step #10.
   a. Open a command prompt window and change directory to the ExeTools directory for the release.
   b. Execute LandingPageURLs.exe. Note: currently, the application complains on startup about not being able to find System.Core for .NET 3.5. The app still seems to run properly despite that.
   c. Hit G to generate the landing page site map file. The results will be in a newly created Output subdirectory of the ExeTools directory.
   d. Check the Output directory to see the number of site map files generated.
   e. If the number of sitemap files does not equal the number referred to in the sitemap_index.xml file in the root of Marketplace, you will either need to adjust the sitemap_index.xml file to point to the appropriate number of sitemap files, or you can adjust the settings in the LandingPageURLs tool and re-generate.
   f. Copy the resulting sitemap_lpages_x.xml files from the Output directory into the Marketplace directory that is a peer to the ExeTools directory.

10. For patch and hot fix releases, remove any directories that are not relevant for the release. This just keeps things clean and could help identify what was actually deployed for that patch or hot fix release. If need be, they can be recovered from the TFSBuilds directory (see step #5).
11. Your build is now prepared for deployment. The first part of deploying will be to get your build to the appropriate servers. In general, this is done by copying from the release directory to a new folder on the target machine. For example, copying the Marketplace folder to c:\new on each of the web servers.

Performing a Rolling Outage on Web Servers


1. Prepare the build as detailed in Preparing a Build for Deployment.
2. Open a web browser on your local machine and load the F5 BIG-IP Control Panel: https://10.25.0.245/
3. Login as the admin user (or other user as designated by IT).
4. Click on Local Traffic in the left sidebar to open the Local Traffic sub menu.
5. Click on Nodes nested under Virtual Servers in the Local Traffic sub menu. Here, you will be able to see the IP addresses and names of the machines behind the F5 load balancer. For a rolling outage we are only concerned about the machines in the production web pool (10.100.74.15/web1, 10.100.74.22/web3, 10.100.74.23/web4, and 10.100.74.28/web5).
6. Choose a machine to take out of the pool. Generally, you can go in any order for the servers if you are only taking one server out of the pool at a time, but it's best to alternate between servers from one virtual host to the other (web1 and web5 are on GURU-VH10; web3 and web4 are on GURU-VH11) so that load does not balance unnaturally to a single physical machine.
7. Click the machine's IP address.
8. Toggle the State from All traffic allowed to Only active connections allowed.
9. Click the Update button to save the node state.
10. Open a remote desktop session to 10.100.76.30 using your personal guru-ecom domain account.
11. From the remote desktop session opened in step #10, open a nested remote desktop session to the machine chosen in step #6 using the 2kboss user account.
12. Allow time for the current users to transition off of the machine. You can monitor the number of active connections in three ways:
   a. Reload the URL reached in step #7 (note: if you try to reload after submitting the form by clicking the Update button in step #9, your browser will ask about re-submitting the form. You do not want to do this repeatedly, though it likely doesn't matter too much. Instead, use the back button to view the page prior to submitting the form, where the server's IP is shown in the URL as a query parameter, and reload from there). From this page, you can watch the Current Connections property.
   b. In the F5 BIG-IP Control Panel, click on the Nodes menu option nested under Virtual Servers in the Local Traffic sub menu (which likely is already expanded due to step #4, above). In the content area, above the table listing the server IPs and names, click on the Statistics menu tab. This will show you the number of connections to the server in the Current column under the Connections banner. This page has an auto-refresh capability.
   c. On the server removed from the pool (via the nested remote desktop connection opened in step #11), open a command prompt and execute netstat -an | findstr EST. This will list all established socket connections on the machine. The number of connections visible on the F5 should be equal to the number of entries listed connecting to local ports :80 and :443 (the first column of IP:Ports is the local address). This is my personal preferred method when the connection counts are low. It allows me to more easily review the specific HTTP requests being made on the connection, since this provides the IP addresses of those with open connections, which can be correlated with the log files. See step #13 on why this is useful.
13. Frequently, there will be stubborn connections that linger far longer than any other connections. These usually fall into four categories, and their requests (as seen in the web log files) must be evaluated to determine the appropriate action. The log files can be found at c:\inetpub\logs\LogFiles\W3SVC1 on each of the web server machines. Only the most recent log entries from the most recent log file should be reviewed.
   a. RSS consumers: connections for RSS feeds that use a keep-alive connection that refreshes frequently can prevent the connection from ever closing, thus keeping the connection alive on the server after it is removed from the pool. Log file entries in this case will appear as requests for /pro/ProjectResults.aspx. Connections for IPs making such requests can be ignored.
   b. Automated browser refreshes: some browsers or other software may check for changes to a particular page in an automated manner that keeps a connection open (similar to the RSS feed requests detailed above). In general, such requests (usually for myadmin.cfm) that are made repeatedly and at regular intervals are not actual usage by a user, and the connections for the IPs making such requests can be ignored.
   c. Spiders: most web spiders will close and reopen connections after a fixed low number of requests. Some misbehaving spiders, however, will continue to crawl the website well after the server has been removed from the pool because they always reuse the existing connection. These requests generally appear in the logs as repeated requests for arbitrary landing pages. Connections by such spiders can be ignored.
   d. Legitimate traffic: a user may be using the site where they make requests for restricted pages and perform POST operations or other seemingly legitimate traffic with no extended periods of inactivity that would result in their browser connection being closed. This most frequently happens with users going through a long list of projects and submitting bids for many of the projects, but any traffic that looks like it might be a real user interacting with the site cannot be ignored. You must wait for these users' connections to terminate gracefully to ensure you do not interrupt their activity mid-process.
14. Once there are no connections with legitimate traffic (see #13.d, above), shut down IIS by executing iisreset /stop from a command line on the target machine (using the remote desktop connection opened in step #11).
15. Rename c:\Marketplace on the target machine to a unique name by combining the name with the previous release version. For example, when deploying 3.22.2 to replace 3.22.1, you would rename to c:\Marketplace_3.22.1.
16. Create the new Marketplace directory at c:\Marketplace by moving the Marketplace directory from c:\new or from the prepared build directory on .30 (see Preparing a Build for Deployment for instructions).
17. On the remote desktop connection to .30 opened in step #10, open a Windows Explorer window to D:\LogArchive

18. Open the sub-folder associated with the target machine (e.g. when deploying to 10.100.74.15/web1, the folder name is web1 (.15)).
19. Execute the clear-x-logs.bat script, where x is the target machine name (e.g. clear-web1-logs.bat).
20. If the batch file reports you are not authenticated, you must open a Windows Explorer window to the target machine's administrative share for its C drive and provide the credentials for the 2kboss user. For example, web1's location would be \\10.100.74.15\c$
21. This script will archive log files for the machine into a dated subfolder.
22. After moving log files, the script will open a new Windows Explorer window rooted at the target machine's c:\Windows\Temp directory. All temporary files from this directory should be deleted. The hidden/system files (Cookies, History, and Temporary Internet Files) do not need to be deleted, although it doesn't hurt anything to do so. There may be two files that are locked and cannot be removed (FXSAPIDebugLogFile.txt and FXSTIFFDebugLogFile.txt). You can leave these two files alone.
23. After purging the temporary files, close the Explorer window and return to the command prompt window where the script is running. Hit Enter to indicate the temporary files have been purged. The script will then open another Windows Explorer window rooted at the target machine's IIS web log directory (e.g. \\10.100.74.15\c$\inetpub\logs\LogFiles\W3SVC1). Delete the access log files older than one week.
24. After removing the old access logs, close the Explorer window and return to the command prompt window where the script is running. Hit Enter to indicate the logs have been deleted, and then hit Enter again to exit the script.
25. Return to the remote desktop connection to the target machine opened in step #11. In a command prompt window (probably the same one used in step #14), restart IIS using iisreset /start.
26. On the target machine, open Internet Explorer to http://www.guru.com (this should be the home page set for the browser). The page should take a while to load as IIS initializes the application. Continue to browse to and load the employer home page, project search page, freelancer home page, and search for freelancers page. Login to ensure proper https functionality. It's very unlikely you will encounter issues here, but it is still useful to pre-seed caches to ensure the site responds quickly when users first hit it.
27. Deployment to the target machine is now complete. Return to the F5 BIG-IP Console and follow similar instructions as when the server was removed from the pool, but instead of changing the radio button as described in step #8, you will be returning the state from Only active connections allowed to All traffic allowed.
28. After a few seconds, check the number of connections to the server (see step #12 for checking the number of connections). It should increase above zero as traffic is routed to the server.
29. Deployment to the target machine is complete. Repeat from step #6 for the next target server. When traffic levels are low, you could remove the next target from the pool prior to completing the deployment on the previous server, as long as the current and next servers are not on the same virtual host.

Upgrading the Task Engine


1. Prepare the build as detailed in Preparing a Build for Deployment.
2. Open a remote desktop session to 10.100.76.30 using your personal guru-ecom domain account.
3. From the remote desktop session opened in step #2, open a nested remote desktop session to the 10.100.74.30/web-services machine using the 2kboss user account.
4. Open the Services control panel.
5. If the Guru.com Task Engine service exists, stop it. If there is an error, try stopping it again. Usually this will reset the service to a state where it looks stopped in the UI (i.e. it no longer lists the state as Started and the Start button is once again enabled).
6. Open the Programs and Features control panel.
7. Find the Guru.com Task Engine application and, if it exists, choose to uninstall it.
8. Open a Windows Explorer window at \\10.100.76.30\d$\Release_Archive and navigate to the prepared release deployment directory (see Preparing a Build for Deployment).
9. From the TaskEngine subdirectory, execute setup.exe or TaskEngineServiceInstaller.msi to begin the installer.
10. There are no options in the installer, so proceed through the installation wizard until complete.


11. If the installer gives an error similar to the following, then the previous uninstallation (in step #7) did not succeed completely, and a reboot is required to get the machine to a known state to allow re-installation. Choose Exit Installation, then reboot and repeat from step #6.

12. If this is the first time the task engine is installed on the machine, you will need to create an appropriate configuration file. For ease in future deployment, this is done using a symbolic link so that the underlying file can be updated in source control without needing to update the link on the target task engine machine. Follow these steps to create the link:
   a. Open a command prompt window and change to the Task Engine installation directory (c:\Program Files\Guru.com\Guru.com Task Engine).
   b. Execute mklink Guru.Configuration.dll.config Configs\Environments\Environment.xxx.config where xxx is the task engine environment configuration file appropriate for the environment (e.g. Environment.productionTaskEngine.config).
13. In the Services control panel, start the service.

Periodic Tasks / How To


Misc. Important Tasks
Check the business façade API for consistency and organization. It should be obvious how to access business activities from the façade, and if there's more than one logical way to get to an activity, it should be easy to support both without needing to duplicate the activity itself. For example, the FreelancerMessaging activity can be accessed using facade.Freelancer.Messaging and facade.Messaging.Freelancer. There is plenty of reorganization that's already possible because things were not added in an organized manner.
Check search performance by running reports on the aggregated search data and/or the live search data (a minimal daily roll-up query appears after this list). Here are some spreadsheets to help out:
Freelancer Search - Freelancer Search Aggregated Query Performance.xlsx
Query Performance - 48hours - 144 twenty minute blocks - v1.xlsx
Clean out mailboxes (undelivered processed, privatedb processed) to avoid disk space issues on the Exchange server and to avoid the performance issues that may occur when trying to open and look through the Exchange folders for specific emails. See also How to access Production Email Accounts. A few months of bounce messages is probably plenty. For processed project messages (privatedb), before the Messages release in July 2011 we weren't retaining any of the messages, so keeping a month of emails is probably more than sufficient, at least until we can be absolutely positive the messages are processing 100% accurately and there is no need to retain them after processing.
Review log files! This isn't done enough during QA.
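As a rough starting point for the search performance check above, here is a minimal sketch that rolls up the live search data by day. The table and column names (tExecutedFreelancerSearch, DateCreated, QueryTime, ResultCount) are taken from the Freelancer Search queries later in this document; the aggregated-data spreadsheets above are probably better for longer ranges.
select left(convert(varchar, s.DateCreated, 120), 10) as 'Day',
    count(*) as Executions,
    avg(s.QueryTime) as AvgQueryTime,
    max(s.QueryTime) as MaxQueryTime,
    avg(s.ResultCount) as AvgResultCount
from tExecutedFreelancerSearch s
group by left(convert(varchar, s.DateCreated, 120), 10)
order by 'Day' desc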


How to Update Generated Help Content in TFS


1. Make sure you have Perl installed (http://www.activestate.com/activeperl/downloads).
2. Make sure your TFS workspace is clean (i.e. you have no items in your pending changes list).
3. Make sure tf.exe is on your PATH (usually in C:\Program Files (x86)\Microsoft Visual Studio 9.0\Common7\IDE).
4. Open a command prompt and change directory to the Tools\HelpScripts directory within the branch where you would like to update the help content.
5. Execute SyncHelp.pl. The script will output the changes to files that it detects. Along with the actual content changes, there are a number of other files that support the help system that regularly change, including search indexing files (e.g. Search_Chunk1.xml) and other miscellaneous files (e.g. catapult.log.zip).
6. After the script completes (which may take quite a while), review the changes in your pending changes list, checking to ensure the changes make sense. In the past, freelancer content had accidentally been published to the employer directory (or vice-versa), which would need to be corrected before checking in the changes.
7. If/when the changes appear reasonable, commit the changeset to TFS.

How to access Production Email Accounts


1. Open the Account Settings dialog:


2. On the Email tab, click Change with your main Outlook email account selected.

3. Click More Settings

4. On the Advanced tab, click Add

5. Type in the name of the mailboxes you wish to have access to. The production accounts include the following: info, privatedb, contactus, welcome, dispute, disputes_out, safepay, projects, contactout, and contactcopy.

6. The mailbox will appear below your account in the navigation pane of Outlook.
7. While you can add any mailbox, you will need permission to be able to view the contents of that mailbox. If you get the following when trying to expand or open the mailbox, then you do not have the appropriate permissions and will need to request permissions from the Exchange administrator.

How to update the Help URL Reference Spreadsheet


In order to create deep links into the help content, MadCap Flare allows the creation of aliases for, anchors for, or a mapping to specific help content (exact terminology seems to vary). To help the support group determine what URL to provide in requirements documents when a link to a help page is desired, we generate a spreadsheet containing the help URL (and a JavaScript form that opens the help in a new window). Whenever aliases/anchors/mappings are added or removed (i.e. help pages are created or removed), the spreadsheet needs to be updated.
The reference spreadsheet is checked into TFS at Websites/Marketplace/Admin/HelpUrlReference.csv and is linked from the admin site for download. Next to that file are two supporting files that act as inputs to generate the spreadsheet, HelpUrlReferenceTopicAliasMap.txt and HelpUrlReferenceTopicNavigationMap.txt. The former is the text of a report that is run in Flare (how to generate this report is unknown; Stacy provides the output) and then copy/pasted into the text document. The latter is a manually curated list of mappings on how to reach a particular page within help. The first text on each line in that file is the content page (within Flare) and any associated HTML anchor. The rest of the line is how one would manually reach that content if they were navigating through help. This is a manual process and has not been well maintained.
To update the .csv, follow these steps:
1. Make sure you have Perl installed (http://www.activestate.com/activeperl/downloads).
2. Make sure your TFS workspace is clean (i.e. you have no items in your pending changes list).
3. Make sure tf.exe is on your PATH (usually in C:\Program Files (x86)\Microsoft Visual Studio 9.0\Common7\IDE).
4. Check out the HelpUrlReferenceTopicAliasMap.txt file.
5. Edit it with the latest contents provided by Stacy or whoever is maintaining the help content.
6. Open a command prompt and change directory to the Tools\HelpScripts directory within the branch where you would like to update the help content.
7. Execute UpdateHelpUrlReference.pl. The script will output INFO lines if any of the help content does not have corresponding entries in HelpUrlReferenceTopicNavigationMap.txt. For those content pages, the script will guess how to get to the page by using the name of the content file itself (which is not always accurate). To indicate that the navigation path was a guess, the script will insert three question marks into the navigation path in the .csv.
8. After the script completes, review the changes in your pending changes list, checking to ensure the changes make sense.
9. If/when the changes appear reasonable, commit the changeset to TFS.

How to locate original bounce messages


Sometimes, a customer may need the actual bounce message received in order to investigate email delivery issues with their provider. The admin site provides something like the following regarding the bounce:
S/R: Tue, Aug 02, 2011 12:17:50 AM
P: Tue, Aug 02, 2011 12:20:40 AM
M: Exchange
T: HardBounce
Item class: REPORT.IPM.Note.NDR
Message ID: <ddu5qE8Dz0000363f@smtp.guru.com>
Final-recipient: RFC822; GeoSheehy@GASIII.com
Action: failed
Status: 5.5.0
X-Supplementary-Info: < #5.5.0 smtp;554 The message was rejected because it contains prohibited virus or spam content>

From this, you can find the actual received bounce message using the following process.
1. Make sure you have access to the production application email accounts (see How to access Production Email Accounts).
2. Under the inbox of the production mail accounts, there is a subfolder named Undeliverable. This folder contains subfolders for the different processing methods depending on the bounce type, as indicated by the M value above, using the naming pattern ProcessedByMethod where Method is the processing method. In this example case (and in most cases), this is a bounce message that is processed using the Exchange method and thus the subfolder would be ProcessedByExchange.
3. While the first account to check should be the info account, since most messages are sent using that from address, it's possible that the bounce could be in other production accounts that are used as the sender address (contactus, dispute, welcome, safepay, and privatedb).
4. Use the received date (labeled S/R above) to help you find the message. The message ID shown above can be checked against message headers to validate that you found the right message if there are similar looking messages. Unfortunately, Outlook's search capabilities are limited, so you cannot search message headers for this message ID, though you may have some luck searching for the recipient's email address or some of the bounce details (e.g. the X-Supplementary-Info value).
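If you want to confirm the bounce record (and get the exact receive date) before digging through Outlook, a minimal lookup against tEmailAutoReplyHistory can help; the table and columns are the same ones used by the bounce queries in the SQL Queries section below. Whether the stored MessageId includes the angle brackets shown in the admin display is an assumption worth verifying, so a fallback filter on EmailAddress is included.
declare @messageId varchar(100)
set @messageId = '<ddu5qE8Dz0000363f@smtp.guru.com>' -- message ID from the admin display; stored format may differ

select ID, EmailAddress, AutoReplyType, ParseMethod, ItemClass, ReceiveDate, SentDate, ProcessDate, ParseDetails
from tEmailAutoReplyHistory
where MessageId = @messageId
   or EmailAddress = 'GeoSheehy@GASIII.com' -- fallback: the recipient shown in the bounce details
order by ID desc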

SQL Queries
Task Engine
Clean up failed engines
Used when there's an exception in the main engine thread (usually due to a DB outage) that prevents the engine from properly updating its status in the database, so it appears as though there's a task engine running per the data, yet there isn't actually any running engine.
update tTaskEngines
set version = version + 1, Stopped = getDate()
where stopped is null
  and isnull(LastPing, Started) < dateadd(mi, -15, getDate())

update tTaskQueueEntry
set tTaskQueueEntry.QueueStateID = 10,
    tTaskQueueEntry.ExecutionEnd = tTaskEngines.Stopped
from tTaskQueueEntry
inner join tTaskEngines
    on tTaskQueueEntry.ExecutingEngine = tTaskEngines.ID
    and tTaskEngines.Stopped IS NOT NULL
where tTaskQueueEntry.QueueStateID in (2, 5, 6)

View recent Task Engine queue


select top 50
    tTaskQueueEntry.id, tTaskQueueEntry.version, tTaskQueueEntry.DateQueued,
    tTaskQueueState.title as 'State',
    tTasks.Name + ' (' + cast(tTasks.ID as varchar) + ')' as 'Task',
    isnull(tTaskTrigger.Name + ' (' + cast(tTaskTrigger.ID as varchar) + ')', '<MANUAL>') as 'Trigger',
    tTaskQueueEntry.ExecutionStart, tTaskQueueEntry.ExecutionEnd,
    tTaskQueueEntry.StatusMessage, tTaskQueueEntry.PercentComplete,
    tTaskEngines.ServerName, tTaskEngines.InstanceGUID
from tTaskQueueEntry
left join tTasks on tTaskQueueEntry.TaskID = tTasks.ID
left join tTaskQueueState on tTaskQueueEntry.QueueStateID = tTaskQueueState.ID
left join tTaskEngines on tTaskEngines.ID = tTaskQueueEntry.ExecutingEngine
left join tTaskTrigger on tTaskTrigger.ID = tTaskQueueEntry.TriggerId
order by id desc

Task Execution Performance Reporting


select
    t.Name + ' (' + cast(t.ID as varchar) + ')' as 'Task',
    COUNT(distinct e.ID) as ExecutionCount,
    min(datediff(ss, e.ExecutionStart, e.ExecutionEnd)) as MinExecutionTime,
    avg(datediff(ss, e.ExecutionStart, e.ExecutionEnd)) as AvgExecutionTime,
    max(datediff(ss, e.ExecutionStart, e.ExecutionEnd)) as MaxExecutionTime,
    min(datediff(ss, e.DateQueued, e.ExecutionStart)) as MinWaitTime,
    avg(datediff(ss, e.DateQueued, e.ExecutionStart)) as AvgWaitTime,
    max(datediff(ss, e.DateQueued, e.ExecutionStart)) as MaxWaitTime
from tTaskQueueEntry e
left join tTasks t on e.TaskID = t.ID
where QueueStateID = 9 -- only successfully completed tasks
group by t.Name, t.ID
order by t.ID

Manually Queue Tasks for Execution


-- update timezones from registry
insert into tTaskQueueEntry (TaskId, version, QueueStateID) values (1, 0, 1)
-- update profile search data
insert into tTaskQueueEntry (TaskId, version, QueueStateID) values (2, 0, 1)
-- update "published" landing page links
insert into tTaskQueueEntry (TaskId, version, QueueStateID) values (3, 0, 1)
-- update landing page link URLs
insert into tTaskQueueEntry (TaskId, version, QueueStateID) values (4, 0, 1)
-- skill test list refresh
insert into tTaskQueueEntry (TaskId, version, QueueStateID) values (5, 0, 1)
-- process undeliverable email
insert into tTaskQueueEntry (TaskId, version, QueueStateID) values (6, 0, 1)
-- Membership expiration reminders
insert into tTaskQueueEntry (TaskId, version, QueueStateID) values (7, 0, 1)
-- Membership autorenewal
insert into tTaskQueueEntry (TaskId, version, QueueStateID) values (8, 0, 1)
-- Incoming project message processing
insert into tTaskQueueEntry (TaskId, version, QueueStateID) values (9, 0, 1)
-- Project Message outgoing notifications (realtime)
insert into tTaskQueueEntry (TaskId, version, QueueStateID) values (10, 0, 1)
-- Project Announcement outgoing notifications
insert into tTaskQueueEntry (TaskId, version, QueueStateID) values (11, 0, 1)
-- Project questions outgoing notifications
insert into tTaskQueueEntry (TaskId, version, QueueStateID) values (12, 0, 1)
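Before queuing a task manually, it can be worth double-checking the task IDs against the tTasks table, since the comments above can drift from what is actually registered. This is a minimal check using the same ID and Name columns referenced by the queue and performance queries above.
select ID, Name from tTasks order by ID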

Email/Messaging
Details for a specific Conversation
declare @conversationID int
--select top 10 conversationid from tConversation order by conversationid desc
set @conversationID = 1--(select max(conversationid) from tConversation)
--select @conversationID = conversationid from tConversationmessage where subject = 'important update' and messageBody like '1- no one is allowed%' and conversationtypeid = 4

select ct.Description, c.ProjectID, c.CompanyID, c.GuruID, c.ProfileID,
    isnull(count(distinct cp.ConversationParticipantID), 0) as 'ParticipantCount',
    case isnull(count(distinct cp.UserType), 0)
        when 0 then 'n/a'
        when 1 then case max(UserType) when 0 then 'Profile' when 1 then 'Employer' else 'Unknown' end
        when 2 then case max(UserType) when 0 then 'Profile' when 1 then 'Employer' else 'Unknown' end
                  + ' and '
                  + case min(UserType) when 0 then 'Profile' when 1 then 'Employer' else 'Unknown' end
        else 'Unknown'
    end as ParticipantsInvolved,
    isnull(count(distinct cm.messageID), 0) as '# Msgs',
    isnull(count(distinct case when cm.ModerationStatusID = 1 then cm.messageID else null end), 0) as '# Msgs Hidden',
    isnull(count(distinct case when cm.ModerationStatusID = 2 then cm.messageID else null end), 0) as '# Msgs Visible',
    isnull(count(distinct case when cm.ModerationStatusID = 3 then cm.messageID else null end), 0) as '# Msgs Blocked',
    isnull(count(distinct case when cm.ModerationStatusID = 4 then cm.messageID else null end), 0) as '# Msgs Approved',
    isnull(convert(varchar, min(coalesce(cm.DateModerated, cm.DateCreated)), 120), 'n/a') as 'Earliest message',
    isnull(convert(varchar, max(coalesce(cm.DateModerated, cm.DateCreated)), 120), 'n/a') as 'Latest message'
from tConversation c
inner join tConversationType ct on ct.ConversationTypeID = c.ConversationTypeID
left join tConversationParticipant cp on cp.ConversationID = c.ConversationID
left join tConversationmessage cm on cm.ConversationID = c.ConversationID
where c.conversationID = @conversationID
Group by ct.Description, c.ProjectID, c.CompanyID, c.GuruID, c.ProfileID

select cm.Messageid,
    case sender.UserType when 0 then 'Profile' when 1 then 'Employer' else 'Unknown' end
        + ' (' + cast(sender.UserID as varchar) + ')' as Sender,
    cast(isnull(recipient.RecipientCount, 0) as varchar) + ' '
        + cast(isnull(recipient.RecipientType, 'n/a') as varchar) as Recipients,
    DateCreated, DateModerated, ModeratorAccountID,
    cmms.Description as 'ModerationStatus',
    recipient.PendingNotifications,
    case
        when isnull(attachment.Total, 0) = 0 then 'No attachments'
        when isnull(attachment.Downloadable, 0) > 0 and attachment.Downloadable = attachment.Total
            then cast(attachment.Downloadable as varchar) + ' downloadable attachment(s)'
        when isnull(attachment.Downloadable, 0) > 0
            then cast(attachment.Downloadable as varchar) + ' of ' + cast(attachment.Total as varchar) + ' downloadable attachment(s)'
    end as 'Attachments',
    isnull(attachment.TotalSize, 0) as AttachmentSize,
    Subject, MessageBody
from tConversationMessage cm
left join tConversationMessageModerationStatus cmms on cmms.ModerationStatusID = cm.ModerationStatusID
left join (
    select cmp.MessageID, cp.UserType, cp.UserID
    from tConversationMessageParticipant cmp
    inner join tConversationParticipant cp on cmp.ConversationParticipantID = cp.ConversationParticipantID
    where cmp.RelationshipID = 1
) sender on sender.MessageID = cm.MessageID
left join (
    select cmp.MessageID,
        count(distinct cast(cp.UserType as varchar) + '_' + cast(cp.UserId as varchar)) as RecipientCount,
        case isnull(count(distinct cp.UserType), 0)
            when 0 then 'n/a'
            when 1 then case max(UserType) when 0 then 'Profile' when 1 then 'Employer' else 'Unknown' end
            when 2 then case max(UserType) when 0 then 'Profile' when 1 then 'Employer' else 'Unknown' end
                      + ' and '
                      + case min(UserType) when 0 then 'Profile' when 1 then 'Employer' else 'Unknown' end
            else 'Unknown'
        end as RecipientType,
        sum(case when cmp.NewMessageNotificationRequired = 1 then 1 else 0 end) as PendingNotifications
    from tConversationMessageParticipant cmp
    inner join tConversationParticipant cp on cmp.ConversationParticipantID = cp.ConversationParticipantID
    where cmp.RelationshipID = 2
    group by cmp.Messageid
) recipient on recipient.MessageID = cm.MessageID
left join (
    select MessageID, count(*) as Total,
        sum(case when FileStatus = 3 then 1 else 0 end) as Downloadable,
        SUM(FileSize) as TotalSize
    from tConversationMessageAttachment
    group by MessageID
) attachment on attachment.MessageID = cm.MessageID
where cm.ConversationID = @conversationID
order by MessageID desc

Reset incoming customer support email processing
Usually this would be in response to an error email being sent, which already includes a query for performing this, but this is a general one that does all accounts.
UPDATE tPrivateDB_emailaccount SET status=0, process_start=null WHERE status >= 12
UPDATE tDispute_emailaccount SET status=0, process_start=null WHERE status >= 12
UPDATE tCSemail_account SET status=0, process_start=null WHERE status >= 12

View incoming email accounts (with configured passwords)


select 'tPrivateDB_emailaccount', id, username, status, process_start, password
from tPrivateDB_emailaccount
union all
select 'tCSemail_account', id, username, status, process_start, password
from tCSemail_account
where disabled = 0
union all
select 'tDispute_emailaccount', id, username, status, process_start, password
from tDispute_emailaccount
where disabled = 0


Bounce messages processed per day


select left(convert(varchar, processdate, 120), 10) as 'Day',
    count(distinct MessageId) as BounceMessages,
    count(distinct EmailAddress) as EmailAddressesBounced
from tEmailAutoReplyHistory
Group by left(convert(varchar, processdate, 120), 10)
order by 'Day' desc

Bounce details for a specific email address
This functionality is also available from the admin site: https://admin.guru.com/admin/EmailAutoReplyHistory.cfm
declare @email varchar(150)
set @email = 'staceybspencerf@comcast.net'

declare @id int
declare @processDate datetime
declare @parseMethod varchar(50)
declare @autoReplyType varchar(50)
declare @itemClass varchar(50)
declare @messageId varchar(100)
declare @receiveDate datetime
declare @sentDate datetime
declare @parseDetails varchar(max)

declare cur cursor for
    select top 100 ID, ProcessDate, ParseMethod, AutoReplyType, ItemClass, MessageId, ReceiveDate, SentDate, ParseDetails
    from tEmailAutoReplyHistory
    where EmailAddress = @email
    order by id desc

open cur
fetch next from cur into @id, @processDate, @parseMethod, @autoReplyType, @itemClass, @messageId, @receiveDate, @sentDate, @parseDetails

while @@fetch_status = 0
begin
    print '----------------------------------------------------------------------'
    print 'Bounce received ' + convert(varchar, @receiveDate, 120)
        + ' and processed ' + convert(varchar, @processDate, 120)
        + ' using ' + @parseMethod + ' parse method and determined to be ' + @autoReplyType + '. Details:'
    print @parseDetails

    fetch next from cur into @id, @processDate, @parseMethod, @autoReplyType, @itemClass, @messageId, @receiveDate, @sentDate, @parseDetails
end

print '----------------------------------------------------------------------'
close cur
deallocate cur

Freelancer Search
Query Performance and Execution Counts by Query Input
set transaction isolation level read uncommitted

select top 1000
    q.ID, q.KeywordCount as '# Keywords', q.Keywords,
    q1.Description as 'Category', q2.Description as 'Industry',
    q3.CityName as 'ClosestCity', q4.ProvinceName as 'Province',
    q5.CountryName as 'Country', q6.RegionName as 'WorldRegion',
    q.ZipCode, q.RadiusInMiles, q.PriceMinimum, q.PriceMaximum,
    s.Minimum, s.Average, s.Maximum, s.Executions
from (
    select *,
        (select k.Keyword + ' '
         from tKeywords k
         inner join tFreelancerSearchQuery_Keywords qk on qk.KeywordID = k.ID
         where qk.FreelancerSearchQueryID = tFreelancerSearchQuery.ID
         for xml path('')) as Keywords
    from tFreelancerSearchQuery
) q
left join tCategory q1 on q.CategoryID = q1.CategoryID
left join tIndustry q2 on q.IndustryID = q2.IndustryID
left join tWorldCities q3 on q.ClosestCityID = q3.CityID
left join tCountryProvinces q4 on q.ProvinceID = q4.ProvinceID
left join tCountries q5 on q.CountryID = q5.ID
left join tWorldRegions q6 on q.WorldRegionID = q6.WorldRegionID
left join (
    select FreelancerSearchQueryID,
        min(QueryTime) as Minimum, avg(QueryTime) as Average, max(QueryTime) as Maximum,
        count(*) as Executions
    from tExecutedFreelancerSearch
    Group by FreelancerSearchQueryID
) s on s.FreelancerSearchQueryID = q.ID
where 1=1
-- and q.ID in (...)
order by s.Executions desc

Recent Query Executions


set transaction isolation level read uncommitted

select top 2000
    s.ID, s.DateCreated, s.FreelancerSearchQueryID,
    case when datediff(s, q.datecreated, s.DateCreated) < 10 then 'Initial' else 'Repeat' end as 'query',
    t.Name + ' (' + cast(s.SearchType as varchar) + ')' as 'SearchType',
    s.PageRequested, s.PageSize,
    o.Name + ' (' + cast(s.SortUsed as varchar) + ')' as 'SortUsed',
    s.ResultCount, s.QueryTime,
    u.username + ' (' + cast(s.SearcherID as varchar) + ')' as 'Searcher',
    s.HostName, s.HostAddress,
    q.KeywordCount as '# Keywords',
    q1.Description as 'Category', q2.Description as 'Industry',
    q3.CityName as 'ClosestCity', q4.ProvinceName as 'Province',
    q5.CountryName as 'Country', q6.RegionName as 'WorldRegion',
    q.ZipCode, q.RadiusInMiles, q.PriceMinimum, q.PriceMaximum
from tExecutedFreelancerSearch s
left join tFreelancerSearchType t on s.SearchType = t.Id
left join tFreelancerSearchOrder o on s.SortUsed = o.Id
left join tAuthority u on s.SearcherID = u.AuthorityID
left join tFreelancerSearchQuery q on s.FreelancerSearchQueryID = q.ID
left join tCategory q1 on q.CategoryID = q1.CategoryID
left join tIndustry q2 on q.IndustryID = q2.IndustryID
left join tWorldCities q3 on q.ClosestCityID = q3.CityID
left join tCountryProvinces q4 on q.ProvinceID = q4.ProvinceID
left join tCountries q5 on q.CountryID = q5.ID
left join tWorldRegions q6 on q.WorldRegionID = q6.WorldRegionID
where 1=1
--and hostname <> '127.0.0.1' and hostname <> '::1'
--and u.username='msmith-guru-employer'
--and s.HostAddress='122.164.234.223'
--and (hostaddress like '208.40.131.15_' or HostAddress = '173.13.52.17')
--and freelancersearchqueryid=1 --224023
--and querytime > 30
order by s.ID DESC

Misc.
Additional random queries (possibly including the above) are in this file:

Diagnostics Freelancer Search.sql


Bookmarks
Libraries
NHibernate Reference Documentation: http://nhforge.org/doc/nh/en/index.html
NHibernate: Difference between ISession.Get and ISession.Load: http://ayende.com/blog/3988/nhibernate-the-difference-between-get-load-and-querying-by-id
NVelocity template reference: http://velocity.apache.org/engine/releases/velocity-1.5/vtl-reference-guide.html
Castle project's NVelocity extensions: http://www.castleproject.org/others/nvelocity/improvements.html

Development Tools
Diff/Merge tool integration in Visual Studio: http://blogs.msdn.com/jmanning/articles/535573.aspx
SQL formatter: http://www.dpriver.com/pp/sqlformat.htm
.NET regular expression tester: http://derekslager.com/blog/posts/2007/09/a-better-dotnet-regular-expressiontester.ashx
Editing work item templates (TFS): http://geekswithblogs.net/KirstinJ/archive/2008/05/23/editing-work-itemtemplates-in-team-system-2008.aspx (saved this for the witexport/witimport commands)

External Bugs/Issues/Questions/etc.
Bug 6318: BCC field exposed to mail recipients sending mail using BlueDragon: http://www.newatlanta.com/c/auth/support/bluedragon/bugtracking/detail?bugId=3339
http://stackoverflow.com/questions/6233298
http://stackoverflow.com/questions/1627707
http://stackoverflow.com/questions/1711205
http://stackoverflow.com/questions/1111336

Future Improvements
Infrastructure
Create F5 iRules for manually choosing a server using a specially crafted URL parameter (allow checking individual servers by Monitis and internally).
Upgrade to BD 7.1.1 (or another later version).

Email processing
Add a new trigger type that uses notifications from the mail server to signal the arrival of new messages, rather than using a time-based approach for looking in mailboxes. This would likely use a push subscription, like the SubscribeToStreamingNotifications method on the ExchangeService class of the Exchange Web Services Managed API. Exact details are unknown.
Like receiving notifications from Exchange when new mail is received, some tasks (like the project message notification sending task) could be explicitly triggered to run when a new message is processed that likely needs a notification. Care would need to be taken here to ensure that the notification isn't consumed without actually sending the notification (which means that just queuing another task execution would not necessarily work, since it could be coalesced with an existing execution without that existing execution processing the additional email, due to a race condition between its last check for new notifications and the coalescing of the new task).

Email reading is performed by a single task in the task engine; there is no concurrent processing, in order to avoid duplicate processing of a single email. It may be possible to reserve processing of a single email to a particular queue entry using a tracking table in the database (e.g. use the message ID; the first engine to insert a record for the message ID will process it). A sketch of that idea follows.
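This is a minimal sketch of that reservation idea, assuming a new claim table would be added. The table and column names here (tIncomingEmailClaim, ClaimedByEngine, DateClaimed) are hypothetical and do not exist in the current schema; only tTaskEngines.ID is real.
-- Hypothetical claim table: the first engine to insert a row for a given message ID "owns" that email.
create table tIncomingEmailClaim (
    MessageId varchar(255) not null primary key, -- mail Message-ID header
    ClaimedByEngine int not null,                -- tTaskEngines.ID of the claiming engine
    DateClaimed datetime not null default getdate()
)

-- A primary key violation here means another engine already claimed the message and this one should skip it.
insert into tIncomingEmailClaim (MessageId, ClaimedByEngine)
values ('<ddu5qE8Dz0000363f@smtp.guru.com>', 1)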

Task Engine
Purge tTaskQueueEntry entries older than a month or two; the historical records likely provide no long-term value (a sketch of such a purge is below).
Add online-group membership checking so that failed engines do not prevent proper operation. This is noted in the Task Engine design document. See Task Engine, above.
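A minimal sketch of that purge, assuming two months of history is enough and that only successfully completed entries should be removed (QueueStateID = 9, per the comment in the performance report query above). Whether other terminal states should also be purged needs to be confirmed before running this against production.
delete from tTaskQueueEntry
where QueueStateID = 9 -- successfully completed
  and DateQueued < dateadd(mm, -2, getdate()) -- keep roughly the last two months of history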

Freelancer/Project Search
Rather than using SQL Server to perform searches on the live operational data, create search indexes that denormalize the searchable information. Possibly use a tool like Lucene.NET and run the indexes on multiple machines. This should provide speed improvements (more machines == faster performance; a la Google search).

Web Services
Our vision is to move towards a web service-based interface to the business layer. The following enumerates some initial thoughts on the requirements of our web service-based business layer. These requirements are not yet set in stone and are provided only as the basis for an initial discussion. With an eventual possible goal of exposing the business layer to third parties via the web service, there are a number of important considerations. For example, bolting on appropriate security infrastructure after an initial rollout would be significantly more difficult than imposing rules from the start. So, let's begin by defining some rudimentary requirements:
Authentication & Authorization

Must be able to support existing user identities for Freelancers, Employers, and Administrators.
With an eye towards enhancing user management functionality, must be able to support a more fine-grained role-based permission model to accommodate potential future user identities that can act as subsets of a Freelancer, Employer, or Admin. Actually, administrative users already have varying permissions depending on the individual administrative user, so the permission model must be able to accommodate the existing administrative permission sets.
For auditing purposes, we must be able to track and authenticate the specific application accessing the service (e.g. if we sign on a third party that wants to expose their own interface to our platform, we may want to restrict which specific business functionality they can access). This would also be useful to track which third-party integrators are actually using the service and, with some additional tracking, to what degree they are using the service (i.e. frequency of API calls).
Clients of the service must not need to provide their original credentials with every request; the service must utilize some sort of ticket-based authentication model where an initial authentication provides a ticket that is valid for a certain amount of time and can be used in lieu of the actual credentials. This must be supported because individual applications (the clients of the service) will not necessarily always have access to the original credentials, nor should they maintain references to those credentials. The clients will likely only have the original credentials at the time of the underlying user authentication. Note: this applies for both the application authentication and the end-user authentication.
Within the business layer, authorization checks should be indelible. Business methods should not need to change as new roles are added or capabilities are changed for users of a particular role.

Method Granularity
Since marshaling input and output parameters can be expensive (especially if invoking two separate methods results in the same entity data being returned from each), the API methods should be as coarse-grained as possible without sacrificing the ability to re-use methods for multiple usage scenarios that are very similar in nature. In other words, it would not be ideal for every page on our website to call its own unique business method custom tailored for that page. On the other hand, we don't want to overload individual methods to be so large and encompassing (in the name of re-use) that there is a substantial amount of excess work being performed that is unneeded for the use case. In general, service methods should align with actual business use cases.

The web service method implementations should be as lightweight as possible, essentially devoid of any actual business logic. Instead, service methods should only be responsible for marshaling the request, invoking appropriate business methods to perform the appropriate operations and retrieve the information necessary for processing the request, and marshaling the results back to the service client. This includes potential exception handling if a service response has error results defined. The motivation here is that the access mechanism may change over time; even now, there may be multiple service APIs: SOAP, REST, JSON, etc. By limiting the actual service method to a thin layer over actual business methods, a new API interface could more easily be created as technology or API requirements change.
Fine-grained methods may still be needed for partial updates, post-backs, and similar supporting use cases.

Versioning & Deployment

Change in features and functionality is inevitable. We must support some level of backwards compatibility in the API so that new versions of the business service can be released without requiring simultaneous releases to the clients of the service. While this may not seem critical until we have third parties that are integrating with us using our API (and thus we can't control the specific timing of their upgrades to the latest versions), this is required even internally in order to support any sort of rolling outage for rolling out new product versions without an absolute outage of the service or the potential for user-visible errors.
When strict backwards compatibility of an upgrade is not possible (for whatever reason), we should be able to support maintenance of multiple API versions. In a sense, adding backwards compatibility by maintaining both the old version and new version side-by-side for a period of time to allow appropriate client upgrades before the old version is decommissioned.
Given backwards compatibility as described above, the web service backend should be upgradeable independently from the website front end (or other applications). This should allow less frequent downtime or reduction in capacity for the web service API during upgrades, since the API will likely not need to be upgraded as frequently as the front end applications.

Documentation & Naming


Clear, concise, self-documenting method, parameter, and entity naming
Additional API documentation for methods and parameters as necessary
API usage documentation with examples that demonstrate exemplary usage
Changelog. All changes between versions in the API must be documented.

Resources

http://mollyrocket.com/873
http://www.youtube.com/watch?v=aAb7hSCtvGw, http://www.sbvb.com.br/pdf/APIkeynoteGoogle.pdf, and http://www.infoq.com/articles/API-Design-Joshua-Bloch
http://www.bserban.org/2009/02/web-api-best-practices/

Random Thoughts
Refactoring business layer gotcha: refactored code could be used in .cfm pages and not detected by ReSharper's refactoring tools.
Sending a privatedb message and CCing the other person's email address could allow the other person to respond as the original sender, since they're getting the original message with the other person's code in it.

File system redesign stuff.
Have a database per environment. Use DB backups/restores to manage schema differences for projects, i.e. a pinky database for the pinky environment. This allows configuration and build definitions to remain unchanged (which is problematic since both .NET and BD need to point to the same place, and they invariably get messed up). Only a DB swap would be required when changing an environment from running on one DB schema to another.
Qa_integration_test should be named Trunk. It doesn't really have anything to do with QA.
ProviderHelper probably should be merged into GenericEntityProvider.

