Patterns of Enterprise Application Architecture
What is Architecture?
- high level breakdown of system into parts
- decisions that are hard to change
In the end, architecture boils down to the important stuff – whatever that is. (2)
- most widely used technique is to break (decompose) the application into layers and define how these layers work together
Enterprise Applications
- they involve persistent data that sticks around across multiple runs of an application
- lots of data
- many users access data concurrently which involves potential access issues
- lots of user interface screens to interact with data
- need to integrate with other apps in a variety of different languages/stacks
- conceptual dissonance between technology and data – needs to be read, munged, & written in a variety of syntactic and semantic flavors
- complex business “illogic” to handle domain complexity
Performance
- any performance advice shouldn’t be treated as fact until actually tested
- basic, universal advice - minimize remote calls
- significant changes to config will invalidate assumptions about performance
- some performance vocab
- response time - time it takes for the system to process a request from the outside
- responsiveness - how quickly system acknowledges requests as opposed to processing
- latency - minimum time required to get any form of response
- throughput - how much stuff you can do in a given time
- load - how much stress a system is under
- load sensitivity - how response time varies under load (a system degrades under load)
- efficiency - performance divided by resources
- capacity - max effective throughput or load
- scalability - how additional resources affect performance and can either be horizontal (scaling out, adding new machines) or vertical (scaling up, adding memory, CPU, storage to a machine)
Pattern
Each pattern describes a problem which occurs over and over again in our environment, and then describes the core of the solution to that problem, in a way that you can use this solution a million times over, without ever doing it the same way twice. (9)
- focus of a pattern is a common and effective solution to a particular problem
- don’t read all the details of each pattern, just enough to know what to look up
- never apply a solution blindly
- value isn’t giving you the idea of a pattern but rather helping you communicate it
Layers
- way to decompose complicated software systems
- high level layer uses lower level services, but lower level should remain unaware of higher level
- benefits:
- single layer as a coherent whole without knowing much about other layers
- you can substitute layers
- minimize dependencies between layers
- layers are good places for standardization
- lower layers can be reused for multiple higher level services
- downsides:
- you can’t encapsulate everything (think of adding a UI element that requires database changes and corresponding changes to all intermediary layers)
- performance hit for layering
3 Principal Layers
- presentation - handles interaction between user and software - interprets actions/commands and displays info to user
- data source - communicates with other systems that carry out task on behalf of the application
- domain logic - business logic, i.e., the work the application does for the domain you’re working in
Layer | Responsibilities |
---|---|
Presentation | Provision of services, display of information (e.g., in Windows or HTML), handling of user requests (mouse clicks, keyboard hits), HTTP requests, command-line invocations, batch API |
Domain Logic | Logic that is the real point of the system |
Data Source | Communication with databases, messaging systems, transaction managers, other packages |
- hexagonal architecture - any system as a core surrounded by interfaces to external systems – this is a symmetrical view that doesn’t distinguish between services you provide and services you consume
- choose the most appropriate form of separation for the problem, but make sure there is separation
- domain and data source should never be dependent on presentation
- generally, we are talking about logical layers – typically physical layers break down to client vs. server
- don’t separate layers into discrete processes unless needed.
- complexity boosters: they all come at a cost. e.g., extreme performance requirements, explicit multi-threading, distribution, paradigm chasms
Domain Logic
3 patterns to organize domain logic
- transaction script
- domain model
- table module
Transaction Script
- transaction here is used in the sense of a business operation, not an ACID-compliant database transaction
- single procedure for each action a user might want to do
- useful for very simple domains, since as logic increases in complexity, duplication increases, and application code becomes hard to untangle
- all behavior for action is within the transaction script
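A minimal sketch of a Transaction Script, assuming a made-up "transfer funds" action and a hypothetical `accounts` table in SQLite (neither comes from the book). The point is only that validation, reads, and writes for the action all sit in one procedure.

```python
import sqlite3


def transfer_funds(conn, from_id, to_id, amount):
    """One procedure holds every step of the 'transfer funds' business action."""
    if amount <= 0:
        raise ValueError("amount must be positive")
    with conn:  # commits on success, rolls back if an exception escapes
        (balance,) = conn.execute(
            "SELECT balance FROM accounts WHERE id = ?", (from_id,)
        ).fetchone()
        if balance < amount:
            raise ValueError("insufficient funds")
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, from_id)
        )
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, to_id)
        )


def make_demo_db():
    """Hypothetical schema and data, used only to exercise the script."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL)")
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 50)])
    conn.commit()
    return conn
```

A second action (say, "close account") would get its own procedure, which is where the duplication mentioned above tends to creep in.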
Domain Model
- build a model of the domain around the nouns in the domain
- behavior then is in the interactions between objects
- moving to domain model involves a paradigm shift to object-oriented thinking
- have to deal with more complex mapping to database
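A small Domain Model sketch, assuming an invented ordering domain (`Product`, `OrderLine`, `Order` are illustrative names). Each noun gets a class, and the total falls out of objects asking each other for their part of the answer.

```python
from dataclasses import dataclass, field


@dataclass
class Product:
    name: str
    unit_price: int  # in cents, to avoid float rounding


@dataclass
class OrderLine:
    product: Product
    quantity: int

    def subtotal(self):
        # The line knows how to price itself.
        return self.product.unit_price * self.quantity


@dataclass
class Order:
    lines: list = field(default_factory=list)

    def add(self, product, quantity):
        self.lines.append(OrderLine(product, quantity))

    def total(self):
        # The order delegates to its lines instead of reaching into them.
        return sum(line.subtotal() for line in self.lines)
```

Nothing here knows about tables or SQL, which is why the mapping to the database becomes a separate problem.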
Table Module
- looks like domain model, but instead of object for all nouns, you get an object for every table in the database
- pulls all records as a Record Set from the database, and in order to work on 1 you’d pass in an `id` to that Record Set
- there needs to be special tooling for these Record Sets
Record Set
- in memory representation of tabular data
- looks exactly like the result of a SQL query but can be manipulated by other parts of the system
- can easily be manipulated by domain logic
- typically a list of maps: `[{...}, {...} ...]`
- can have an implicit or explicit interface (think `person['name']` vs `person.name`)
- can be connected (need active connection to database), or disconnected (can be manipulated offline)
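A rough sketch of a Table Module working over a disconnected Record Set, assuming a hypothetical `person` table represented as a list of dicts. `PersonTable` and its column names are made up for illustration.

```python
class PersonTable:
    """Table Module: one object holds the logic for every row of `person`."""

    def __init__(self, record_set):
        # Index the Record Set by id so single rows can be addressed.
        self._rows = {row["id"]: row for row in record_set}

    def full_name(self, person_id):
        row = self._rows[person_id]
        return f"{row['first_name']} {row['last_name']}"

    def adults(self, minimum_age=18):
        return [row["id"] for row in self._rows.values() if row["age"] >= minimum_age]


# A disconnected Record Set with an implicit interface (row["age"]).
record_set = [
    {"id": 1, "first_name": "Ada", "last_name": "Lovelace", "age": 36},
    {"id": 2, "first_name": "Tim", "last_name": "Small", "age": 9},
]
```

Note that there is a single `PersonTable` instance for the whole table, and callers pass an id to pick the row they mean.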
Service Layer
- defines application boundary with a layer of services that establish a set of available actions (API) & coordinates the application’s response to each operation
- layered over a Domain Model or a Table Module, it provides a clean API & a good spot to put things like transaction control and security
- minimal case is to make it a Façade, maximal is to put business logic in it
- controller entity - have logic and behavior exclusive to use case or transaction in a separate Transaction Script called a controller or service type
Gateway (Base Pattern)
- an object that encapsulates access to an external resource or system
- in reality, this is a simple wrapper pattern
- good spot to apply service stub
- some overlap with Gang of Four patterns Façade and Adapter
- useful for encapsulating an awkward interface for something rather than letting it affect rest of code
- if you need to decouple subsystems, another choice is Mapper, but this is more complicated
Mapper (Base Pattern)
- an object that sets up communication between two independent objects
- similar to Mediator
- useful when you want neither subsystem to have dependency on their interaction, like with a database (Data Mapper)
Service Stub
- removes dependence on problematic services while testing
- in extreme programming, this is called a “mock object”
- should be as simple as possible
- replace service with service stub that runs in memory and locally
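A sketch of how a Gateway interface and a Service Stub fit together, assuming a hypothetical external tax service. `TaxService`, `FlatRateTaxStub`, and `price_with_tax` are illustrative names, not from the book.

```python
from typing import Protocol


class TaxService(Protocol):
    """Gateway interface: the only thing application code knows about the service."""

    def rate_for(self, region: str) -> float: ...


class FlatRateTaxStub:
    """Service Stub: runs locally and in memory, and is deliberately trivial."""

    def __init__(self, rate=0.25):
        self.rate = rate

    def rate_for(self, region):
        return self.rate


def price_with_tax(net_cents, region, tax_service):
    # Depends on the gateway interface, so a real client or the stub both work.
    return round(net_cents * (1 + tax_service.rate_for(region)))
```

In tests, `price_with_tax(1000, "anywhere", FlatRateTaxStub())` exercises the pricing logic without touching the remote service.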
Managing Database Connections
- connections act as the link between application code and the database
- expensive to create, so preferable to use a pool (although verify a pool helps performance as with some modern data source tooling it doesn’t matter – best to use a connection manager to encapsulate the connection entirely)
- connections must be closed as soon as you are done, and might be done in two ways:
- rely on garbage collection to close, but garbage collection isn’t immediate
- explicit closing, which is riskier and more prone to forgetting
- a good approach is to tie connection open and close with a transaction
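One way to tie a connection's lifetime to a transaction, sketched with the standard library `sqlite3` module and a hypothetical `transaction` helper. A pooled setup would borrow and return a connection where this opens and closes one.

```python
import sqlite3
from contextlib import contextmanager


@contextmanager
def transaction(db_path):
    """Open a connection, run one transaction, and always close the connection."""
    conn = sqlite3.connect(db_path)
    try:
        yield conn
        conn.commit()      # normal exit: make the work permanent
    except Exception:
        conn.rollback()    # any error: undo the work
        raise
    finally:
        conn.close()       # explicit close, no reliance on garbage collection
```

Callers write `with transaction(path) as conn: ...` and cannot forget the close, which addresses the main risk of explicit closing.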
Table Data Gateway
- an object that acts as a gateway to a table – one object handles all rows in a table
- way to stop mixing SQL into application logic
- simple interface with several find, update, insert, delete methods
- useful to think of as a wrapper for SQL statements
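A minimal Table Data Gateway sketch, assuming a hypothetical `person` table with `id` and `name` columns. All SQL for the table lives in this one class, and rows come back as plain tuples.

```python
import sqlite3


class PersonGateway:
    """One object fronts every row of the `person` table."""

    def __init__(self, conn):
        self.conn = conn

    def find(self, person_id):
        return self.conn.execute(
            "SELECT id, name FROM person WHERE id = ?", (person_id,)
        ).fetchone()

    def insert(self, name):
        cur = self.conn.execute("INSERT INTO person (name) VALUES (?)", (name,))
        return cur.lastrowid

    def update(self, person_id, name):
        self.conn.execute("UPDATE person SET name = ? WHERE id = ?", (name, person_id))

    def delete(self, person_id):
        self.conn.execute("DELETE FROM person WHERE id = ?", (person_id,))
```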
Row Data Gateway
- an object that acts as a gateway to a single row in the database – one instance per row
- exactly mimics a single record
- if there is any domain logic in this object, then it is Active Record
- useful with Transaction Script and less so with Domain Model
Active Record
- an object that wraps a row in a database table or view, encapsulates database access, and adds domain logic to that object
- each Active Record is responsible for saving to and loading from the database, and also for any behavior (domain logic) that acts on that object
- should exactly match the database
- good for domain logic that isn’t that complex – create, read, update, delete operations
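An Active Record sketch, assuming a hypothetical `person` table whose columns match the object's fields one to one. The exemption amounts are made up; they only show domain logic living next to the persistence code.

```python
import sqlite3


class Person:
    """Active Record: one instance wraps one row of the `person` table."""

    def __init__(self, conn, name, dependents=0, person_id=None):
        self.conn = conn
        self.id = person_id
        self.name = name
        self.dependents = dependents

    @classmethod
    def find(cls, conn, person_id):
        row = conn.execute(
            "SELECT id, name, dependents FROM person WHERE id = ?", (person_id,)
        ).fetchone()
        if row is None:
            return None
        return cls(conn, name=row[1], dependents=row[2], person_id=row[0])

    def save(self):
        if self.id is None:  # not stored yet: insert and remember the key
            cur = self.conn.execute(
                "INSERT INTO person (name, dependents) VALUES (?, ?)",
                (self.name, self.dependents),
            )
            self.id = cur.lastrowid
        else:
            self.conn.execute(
                "UPDATE person SET name = ?, dependents = ? WHERE id = ?",
                (self.name, self.dependents, self.id),
            )
        self.conn.commit()

    def exemption(self):
        # Domain logic on the record itself (illustrative amounts).
        return 1500 + 750 * self.dependents
```

Strip `exemption()` out and what remains is essentially a Row Data Gateway.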
Data Mapper
- a layer of mappers that moves data between objects and a database while keeping them independent of each other and the mapper itself
- separate in memory objects from the database via a layer of software
- primary occasion to use is when you want the database schema and the object model to evolve independently
- price is the extra layer required to maintain
- more complicated business logic leads to Domain Model, and with it Data Mapper
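A Data Mapper sketch, assuming a hypothetical `people` table whose column names (`fname`, `lname`) deliberately differ from the object's attributes. The domain class has no idea a database exists; only the mapper knows both sides.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional


@dataclass
class Person:
    """Plain domain object: no SQL, no connection, no base class."""
    first_name: str
    last_name: str
    id: Optional[int] = None


class PersonMapper:
    """Moves data between Person objects and the `people` table."""

    def __init__(self, conn):
        self.conn = conn

    def insert(self, person):
        cur = self.conn.execute(
            "INSERT INTO people (fname, lname) VALUES (?, ?)",
            (person.first_name, person.last_name),
        )
        person.id = cur.lastrowid

    def update(self, person):
        self.conn.execute(
            "UPDATE people SET fname = ?, lname = ? WHERE id = ?",
            (person.first_name, person.last_name, person.id),
        )

    def find(self, person_id):
        row = self.conn.execute(
            "SELECT id, fname, lname FROM people WHERE id = ?", (person_id,)
        ).fetchone()
        if row is None:
            return None
        return Person(first_name=row[1], last_name=row[2], id=row[0])
```

Renaming a column now touches only `PersonMapper`, which is the independence the pattern is paying for.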
Unit of Work
- maintains a list of objects affected by a business transaction and coordinates the writing out of changes and the resolution of concurrency problems
- when it comes time to commit, the Unit of Work decides what to do – i.e., open a transaction, concurrency checks, writes changes to a database
- keeps track of all objects you modify, so only the altered objects need to be written when syncing in-memory data to the database
- great strength is that it keeps all info in one place
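A stripped-down Unit of Work sketch. `RecordingMapper` is a hypothetical stand-in for a real mapper that just logs what it is asked to do; transaction handling and concurrency checks are left out to keep the bookkeeping visible.

```python
class RecordingMapper:
    """Hypothetical stand-in for a real mapper; records calls instead of running SQL."""

    def __init__(self):
        self.log = []

    def insert(self, obj):
        self.log.append(("insert", obj))

    def update(self, obj):
        self.log.append(("update", obj))

    def delete(self, obj):
        self.log.append(("delete", obj))


class UnitOfWork:
    def __init__(self, mapper):
        self.mapper = mapper
        self._new, self._dirty, self._removed = [], [], []

    def register_new(self, obj):
        self._new.append(obj)

    def register_dirty(self, obj):
        # New objects are inserted with their latest state anyway.
        if obj not in self._new and obj not in self._dirty:
            self._dirty.append(obj)

    def register_removed(self, obj):
        if obj in self._new:
            self._new.remove(obj)  # never reached the database, nothing to delete
            return
        if obj in self._dirty:
            self._dirty.remove(obj)
        self._removed.append(obj)

    def commit(self):
        # A real implementation would open a transaction around these writes.
        for obj in self._new:
            self.mapper.insert(obj)
        for obj in self._dirty:
            self.mapper.update(obj)
        for obj in self._removed:
            self.mapper.delete(obj)
        self._new, self._dirty, self._removed = [], [], []
```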
Identity Map
- ensures that every object only gets loaded once, and keeps each object in a map – looks up objects in the map when referring to them
- primary key or surrogate key (any column or set of columns that can be used as a PK instead of natural key e.g., incrementing integer)
- either explicit (each data point has access method) or generic (one access method for all data)
- useful if any data needs to be pulled from the database and modified
- also useful as transactional cache
- won’t help for cross-session concurrency protections, as this is meant for single session
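A generic Identity Map sketch, assuming a hypothetical `loader` callable that stands in for the database read. The map guarantees one in-memory object per key within a session.

```python
class IdentityMap:
    """Generic Identity Map: one access method for every kind of key."""

    def __init__(self, loader):
        self._loader = loader   # called only on a miss, e.g. a database read
        self._objects = {}

    def get(self, key):
        if key not in self._objects:
            self._objects[key] = self._loader(key)
        return self._objects[key]
```

Because repeated lookups return the same object, two parts of a session cannot end up editing different copies of one row.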
Lazy Load
- an object that doesn’t contain all the data you need but knows where to get it
- lazy initialization - use `NULL` to signal field hasn’t been loaded yet
- value holder - object that wraps other object – ask the value holder for data and it goes to the data source (useful for avoiding identity problems of virtual proxy)
- virtual proxy - object that looks like the object that should be in the field but doesn’t contain anything – only loads data when one of its methods is called
- ghost - a real object in a partial state
- ripple loading - when you cause more database accesses than needed, hurting performance
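Two of the Lazy Load variants above, sketched with hypothetical names (`Supplier`, `load_products`). The first uses `None` as Python's counterpart to the `NULL` marker; the second is a value holder with a private sentinel so that a legitimately empty value is not reloaded.

```python
class Supplier:
    """Lazy initialization: None marks a field that has not been loaded yet."""

    def __init__(self, name, load_products):
        self.name = name
        self._load_products = load_products  # stands in for a database query
        self._products = None

    @property
    def products(self):
        if self._products is None:
            self._products = self._load_products()
        return self._products


class ValueHolder:
    """Value holder: wraps the real value and fetches it on first request."""

    _UNLOADED = object()  # sentinel, so a loaded None is still "loaded"

    def __init__(self, loader):
        self._loader = loader
        self._value = ValueHolder._UNLOADED

    def get(self):
        if self._value is ValueHolder._UNLOADED:
            self._value = self._loader()
        return self._value
```

Looping over many suppliers and touching `products` on each one triggers one query per supplier, which is the ripple loading problem in miniature.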
Web Server
- web server’s job to interpret URL of request and hand over control to a web server program
- 2 ways of structuring program in a web server - as a script or as a server page
- script works best for interpreting the request, server page for formatting a response
- this leads to Model-View-Controller pattern
Model-View-Controller
- request comes to input controller
- input controller pulls info from request
- forwards business logic to model object
- model object talks to data source and does what is indicated in request, as well as gathers info for response
- control passed to input controller from model, which decides what view to select
- control is passed to view along with response data for display
- most important reason for MVC is to separate models from presentation
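The request flow above can be sketched without any web framework. Everything here is hypothetical: the request is a plain dict, the model is backed by an in-memory dict, and the views return HTML strings.

```python
class GreetingModel:
    """Model: owns the data access and knows nothing about HTML."""

    def __init__(self, names_by_id):
        self._names = names_by_id

    def find_name(self, user_id):
        return self._names.get(user_id)


def greeting_view(name):
    """View: formats a response from data handed to it."""
    return f"<h1>Hello, {name}!</h1>"


def not_found_view(user_id):
    return f"<h1>No user {user_id}</h1>"


class InputController:
    """Input controller: reads the request, calls the model, picks a view."""

    def __init__(self, model):
        self.model = model

    def handle(self, request):
        user_id = int(request["params"]["id"])  # pull info from the request
        name = self.model.find_name(user_id)    # hand the work to the model
        if name is None:                        # decide which view to show
            return not_found_view(user_id)
        return greeting_view(name)
```

Swapping the HTML views for JSON ones would not touch `GreetingModel`, which is the separation the last bullet is about.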
Application Controller
- different from input controller in MVC – controls the flow of the application
- good rule of thumb is if machine is in control of screen flow, it is useful, and if the user is in control, it is not
Offline Concurrency
- concurrency control for data that’s manipulated during multiple database transactions
Lost Updates
- when a second person starts an update after the first person & finishes before them, the first person’s later write will wipe out the second person’s changes
Inconsistent Reads
- when a transaction reads object `x` twice and `x` has different values – between the two reads, someone modified `x`
Correctness vs. Liveness
- correctness (or safety, consistency) vs. liveness (how much concurrent activity can go on)
- these concepts are in contention (think CAP theorem)
Execution Contexts
- request - a single call from the outside world that the software works on and potentially sends a response back to
- session - a long-running interaction between client and service in which multiple requests can happen
- process - a heavyweight execution context that provides isolation for its internal data
- thread - a lighter-weight active agent; multiple threads run in a single process, but memory is often shared, so there can be concurrency problems
- isolated threads - threads that don’t share memory
- transaction - pulls together several requests the client wants to treat as a single request
Isolation
- partition data so it can only be accessed by one active agent, e.g., OS memory or file locks
Immutability
- if data can’t be modified there is no concurrency problem
Optimistic vs Pessimistic Concurrency Control
- optimistic allows multiple users to edit data and only reconciles differences on save
- pessimistic allows one agent to edit data and gives others read only access
- optimistic is conflict detection while pessimistic is conflict prevention
- pessimistic reduces concurrency while optimistic makes conflict resolution tricky
- choice comes down to frequency and severity of conflicts
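A sketch of optimistic control using a version column, one common way to detect conflicts on save. The `docs` table, `read_doc`, `save_doc`, and `ConflictError` are all illustrative names.

```python
import sqlite3


class ConflictError(Exception):
    """Raised when someone else saved the row after we read it."""


def read_doc(conn, doc_id):
    # Returns (body, version); the caller keeps the version for the later save.
    return conn.execute(
        "SELECT body, version FROM docs WHERE id = ?", (doc_id,)
    ).fetchone()


def save_doc(conn, doc_id, new_body, expected_version):
    with conn:  # commit on success, roll back on error
        cur = conn.execute(
            "UPDATE docs SET body = ?, version = version + 1 "
            "WHERE id = ? AND version = ?",
            (new_body, doc_id, expected_version),
        )
        if cur.rowcount == 0:  # version moved on: conflict detected at save time
            raise ConflictError(f"doc {doc_id} changed since version {expected_version}")
```

Both editors are allowed to read and edit freely; the loser only finds out when saving, and the application still has to decide how to reconcile the two edits.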
Deadlocks
- occurs with pessimistic control when 2 or more processes need to acquire locks the others are holding
- can pick a victim when a deadlock is detected: the process that has to give up its locks and lose its work
- also can use timeouts, or enforce all locks are acquired at the beginning
- deadlocks are easy to get wrong so simple, conservative schemes work best
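A conservative scheme along the lines above, sketched with in-process `threading.Lock` objects rather than database locks: take every lock up front, and give up on a timeout instead of waiting forever. The fixed acquisition order is an extra safeguard added in this sketch; `acquire_all` and `release_all` are hypothetical helpers.

```python
import threading


def acquire_all(locks, timeout=1.0):
    """Take every lock at the beginning; back out completely on a timeout."""
    taken = []
    for lock in sorted(locks, key=id):  # same order for every caller
        if lock.acquire(timeout=timeout):
            taken.append(lock)
        else:
            for held in reversed(taken):  # give back what we hold, then report failure
                held.release()
            return False
    return True


def release_all(locks):
    for lock in locks:
        lock.release()
```

A caller that gets `False` back holds nothing, so it can retry later or report the failure without leaving other agents blocked.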
Transaction
- a transaction is a bounded sequence of work with the start and end well-defined
- all participating resources in a consistent state at the beginning and end
- must complete on all or nothing basis
Transaction Isolation
Isolation Level | Dirty read | Non-repeatable read | Phantom Read |
---|---|---|---|
READ UNCOMMITTED | Possible | Possible | Possible |
READ COMMITTED | Not Possible | Possible | Possible |
REPEATABLE READ | Not Possible | Not Possible | Possible |
SERIALIZABLE | Not Possible | Not Possible | Not Possible |
ACID
- atomicity - all steps in sequence must complete or be rolled back
- consistency - system’s resources must be in a consistent, non-corrupt state
- isolation - results of transaction must not be visible to any other transaction until the transaction is complete
- durability - any results must be made permanent
Process Per Session
- one way of handling concurrency is to have a single process per session
- avoids all the problems of multi-threading, and is equally isolated memory-wise
- downside is processes are expensive, so you can use pooled process-per-request
- need to ensure all resources are released at end of request
- thread-per-request if further performance is needed – this has a fair bit of multi-threading overhead, so process-per-request is often sufficient
Session State
- state (data) retained in between requests or across business transactions
- different from record data, which is long-term persistent data held in the database and visible to all sessions; session state becomes record data once it is committed
- session state might not be consistent (ACID consistent) at any point
- biggest problem with session state is isolation
- three ways to store session state
- client session state - stored on client, e.g., cookies, encoding in URL, hidden form fields
- server session state - store on server e.g., in memory, or more durably as a serialized object
- database session state - stored in a database
- with client, data needs to be transferred over the wire, so ideal for smaller payloads
- also need to deal with security and encryption, since you must presume any data held client-side is visible to the user
Session Migration vs. Server Affinity
- session migration allows a session to move from server to server as it handles a request
- server affinity forces one server to handle all requests
- might be problematic if all requests are clustered and go to same server
Other Session State Concerns
- what happens when a user cancels requests or leaves? cleaning up state might be tricky on the server or the database side
- development effort – database and client side are the heaviest lifts
- if server session data stored so it can survive a crash, this might be ideal
Fine-Grained Interface
- separate setters and getters for each property
- optimized for future extensibility (OO principle)
- not useful for objects used remotely because of number of calls
Coarse-Grained Interface
- grouped setters and getters
- minimize calls - optimized for remote calls
- lose flexibility and extensibility
Distributed Object Design
- Don’t distribute objects - a procedure call within a process is fast, across 2 processes is slower, and processes running on separate machines is slower still
- Minimize distribution boundaries and use clustering as much as possible - sometimes there are needs for boundaries e.g., client-server, app-db
- use remote facade pattern - use coarse-grained at the distribution boundaries, fine-grained internally
- this advice based on Remote Procedure Call synchronous architecture, and message-based async might be preferable
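A sketch of the coarse-grained versus fine-grained split, assuming a hypothetical `Address` domain object and an `AddressFacade` that would sit at the distribution boundary. No real remoting is involved; the dict stands in for the data sent over the wire.

```python
class Address:
    """Fine-grained domain object, used only inside the process."""

    def __init__(self, street, city, zip_code):
        self.street = street
        self.city = city
        self.zip_code = zip_code


class AddressFacade:
    """Remote Facade: one call moves the whole address across the boundary."""

    def __init__(self, addresses):
        self._addresses = addresses  # id -> Address

    def get_address_data(self, address_id):
        a = self._addresses[address_id]
        return {"street": a.street, "city": a.city, "zip_code": a.zip_code}

    def set_address_data(self, address_id, data):
        a = self._addresses[address_id]
        # One remote call fans out into fine-grained updates locally.
        a.street = data["street"]
        a.city = data["city"]
        a.zip_code = data["zip_code"]
```

A remote client pays for one round trip per address instead of one per property, while code inside the process keeps using `Address` directly.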