When writing fixtures, it's currently possible to define associations that don't exist, even if a foreign key exists. For example:
```yml
george:
name: "Curious George"
pirate: redbeard
blackbeard:
name: "Blackbeard"
```
When the fixtures are created, `parrots(:george).pirate` will be nil, but it's not immediately clear why. This can make it hard to debug tests and can give false confidence in passing ones.
This can happen because Rails [disables referential integrity](f263530bf7/activerecord/lib/active_record/connection_adapters/abstract/database_statements.rb (L407)) when inserting fixtures. This makes the fixtures algorithm much simpler - it can just create the fixtures in alphabetical order and assume that the other side of a foreign key constraint will *eventually* be added.
Ideally we would check foreign keys once all fixtures have been loaded, so that we can be sure that the foreign key constraints were met. This PR introduces that. To enable it:
```ruby
config.active_record.verify_foreign_keys_for_fixtures = true
```
I'm proposing we enable this in 7.0 for new apps and have added it to new framework defaults. When run against our app, it found 3 fixture files with unmet FK constraints - turns out all those fixtures weren't being used and were safe to delete.
Otherwise you can't change `config.active_record.*` from `config/initializers`.
Until now it somewhat worked because the config would be assigned
when `ActiveRecord::Base` would be first referenced. But it was quite
brittle anyway.
Ref: https://github.com/rails/rails/pull/42237
Ref: https://bugs.ruby-lang.org/issues/17763
Ruby class variables are slow, and get slower the more ancestors the class has.
And it doesn't seem likely to get faster any time soon. Overall I'm under the
impression that ruby-core consider them more or less deprecated.
But in many cases where `cattr_accessor` is used, a class instance variable
could work just fine, and would be much faster.
For Active Record this means most of the `cattr_accessor` on `ActiveRecord::Base`
could actually be `attr_accessor` on `ActiveRecord`.
I started with `legacy_connection_handling` as a proof of concept.
In a multiple database application, associations can't join across
databases. When set, this option tells Rails to make 2 or more queries
rather than using joins for associations.
Set the option on a has many through association:
```ruby
class Dog
has_many :treats, through: :humans, disable_joins: true
has_many :humans
end
```
Then instead of generating join SQL, two queries are used for `@dog.treats`:
```
SELECT "humans"."id" FROM "humans" WHERE "humans"."dog_id" = ? [["dog_id", 1]]
SELECT "treats".* FROM "treats" WHERE "treats"."human_id" IN (?, ?, ?) [["human_id", 1], ["human_id", 2], ["human_id", 3]]
```
This code is extracted from a gem we use internally at GitHub which
means the implementation here is used in production daily and isn't
experimental.
I often get the question "why can't Rails do this automatically" so I
figured I'd include the answer in the commit. Rails can't do this
automatically because associations are lazily loaded. `dog.treats` needs
to load `Dog`, then `Human` and then `Treats`. When `dog.treats` is
called Rails pre-generates the SQL that will be run and puts that
information into a reflection object. Because the SQL parts are pre-generated,
as soon as `dog.treats` is loaded it's too late to skip a join. The join
is already available on the object and that join is what's run to load
`treats` from `dog` through `humans`. I think the only way to avoid setting
an option on the association is to rewrite how and when the SQL is
generated for associations which is a large undertaking. Basically the
way that Active Record associations are designed, it is currently
impossible to have Rails figure out to not join (loading the association
will cause the join to occur, and that join will raise an error if the
models don't live in the same db).
The original implementation was written by me and Aaron. Lee helped port
over tests, and I refactored the extraction to better match Rails style.
Co-authored-by: Lee Quarella <leequarella@gmail.com>
Co-authored-by: Aaron Patterson <aaron@rubyonrails.org>
Sometimes a controller or a job has to perform multiple independent queries, e.g.:
```
def index
@posts = Post.published
@categories = Category.active
end
```
Since these two queries are totally independent, ideally you could
execute them in parallel, so that assuming that each take 50ms, the
total query time would be 50ms rather than 100ms.
A very naive way to do this is to simply call `Relation#to_a` in a
background thread, the problem is that most Rails applications, and
even Rails itself rely on thread local state (`PerThreadRegistry`,
`CurrentAttributes`, etc). So executing such a high level interface
from another thread is likely to lead to many context loss problems
or even thread safety issues.
What we can do instead, is to schedule a much lower level operation
(`Adapter#select_all`) in a thread pool, and return a future/promise.
This way we kepp most of the risky code on the main thread, but perform
the slow IO in background, with very little chance of executing some
code that rely on state stored in thread local storage.
Also since most users are on MRI, only the IO can really be parallelized,
so scheduling more code to be executed in background wouldn't lead
to better performance.
Sometimes cascading association deletions can cause timeouts due to
an IO issue. Perhaps a model has associations that are destroyed on
deletion which in turn trigger other deletions and this can continue
down a complex tree. Along this tree you may also hit other IO
operations. Such deep deletions can lead to server timeouts while
awaiting completion and really the user may not notice all the
changes on their side immediately making them wait unnecesarially or
worse causing a timeout during the operation.
We now allow associations supporting the `dependent:` key to take `:destroy_async`,
which schedules a background job to destroy associations.
Co-authored-by: Adrianna Chang <adrianna.chang@shopify.com>
Co-authored-by: Rafael Mendonça França <rafael@franca.dev>
Co-authored-by: Cory Gwin @gwincr11 <gwincr11@github.com>
Removes the use of `ActiveRecord::AdvisoryLockBase` since it inherits
from `ActiveRecord::Base` and hence share module attributes that are defined in `ActiveRecord::Base`.
This is problematic because establishing connections through
`ActiveRecord::AdvisoryLockBase` can end up changing state of the default
connection handler of `ActiveRecord::Base` leading to unexpected
behaviors in a Rails application.
In the case of https://github.com/rails/rails/issues/39157,
Running migrations with `rails db:migrate:primary_shard_one` was not working as
the application itself defined the following
```
class ApplicationRecord < ActiveRecord::Base
self.abstract_class = true
connects_to shards: {
default: { writing: :primary },
shard_one: { writing: :primary_shard_one }
}
end
```
In the database migrate rake task, the default connection was
established with the database config of `primary_shard_one`. However,
the default connection was altered to that of `primary` because
`ActiveRecord::AdvisoryLockBase.establish_connection` ended up loading
`ApplicationRecord` which calls `connects_to shards:`. Since all we
really need here is just a normal database connection, we can avoid
accidentally altering the default connection handler state during the migration
by creating a custom connection handler used for retrieving a connection.
We had removed the dedicated `MysqlDateTime`, `MysqlJson`, and
`OID::Json` classes in the past (f1a0fa9, #29666), so legacy YAML
loading has no longer always perfectly compatiblity.
Fortunately, v2 (Rails 5.1 style) YAML doesn't contain type information
in almost all cases (unless serializing object using custom select), so
usually removing dedicated type affects to legacy YAML older than v1
(Rails 5.0 style) YAML only.
This restores legacy YAML compatibility for MySQL with v1 format by
adding the class alias in YAML for `MysqlString` which is most recently
reported about compatibility concern. It also affects to legacy Rails
4.2 style YAML, but 4.2 style YAML had already broken by removing
`MysqlDateTime` over 4 years ago.
Add support for finding records based on signed ids, which are tamper-proof, verified ids that can be set to expire and scoped with a purpose. This is particularly useful for things like password reset or email verification, where you want the bearer of the signed id to be able to interact with the underlying record, but usually only within a certain time period.
This PR moves advisory lock to it's own connection instead of
`ActiveRecord::Base` to fix#37748. As a note the issue is present on
both mysql and postgres. We don't see it on sqlite3 because sqlite3
doesn't support advisory locks.
The underlying problem only appears if:
1) the app is using multiple databases, and therefore establishing a new
connetion in the abstract models
2) the app has a migration that loads a model (ex `Post.update_all`)
which causes that new connection to get established.
This is because when Rails runs migrations the default connections are
established, the lock is taken out on the `ActiveRecord::Base`
connection. When the migration that calls a model is loaded, a new
connection will be established and the lock will automatically be
released.
When Rails goes to release the lock in the ensure block it will find
that the connection has been closed. Even if the connection wasn't
closed the lock would no longer exist on that connection.
We originally considered checking if the connection was active, but
ultimately that would hide that the advisory locks weren't working
correctly because there'd be no lock to release.
We also considered making the lock more granular - that it only blocked
on each migration individually instead of all the migrations for that
connection. This might be the right move going forward, but right now
multi-db migrations that load models are very broken in Rails 6.0 and
master.
John and I don't love this fix, it requires a bit too much knowledge of
internals and how Rails picks up connections. However, it does fix the
issue, makes the lock more global, and makes the lock more resilient to
changing connections.
Co-authored-by: John Crepezzi <john.crepezzi@gmail.com>
In 7254d23764f7abe8023f3daeb07d99ea1c8e777a, an autoload for
`ConnectionAdapters::AbstractAdapter` was added to `active_record.rb`.
Later in d6b923adbdfc9a4df20132f741bbfb43db12113c, a manual require for
that class was added to `active_record/base.rb` as some constants under
`ConnectionAdapters` weren't defined until `AbstractAdapter` was loaded.
In 1efd88283ef68d912df215125951a87526768a51, the require was removed and
replaced with an autoload in `active_record.rb`, above the previous one.
Because the first autoload was for the `ConnectionAdapters` constant and
the second one tried to create it, the autoload would fire immediately.
Rather than fixing the autoload problem, the require had effectively
just been moved from `active_record/base.rb` to `active_record.rb`.
Instead of defining autoloads for constants under `ConnectionAdapters`
in the `abstract_adapter.rb` file, we can create a separate, autoloaded
`connection_adapters.rb` file for this purpose.
To avoid a "circular require considered harmful" warning from Ruby, we
have to fix the module nesting in `schema_creation.rb`, as a followup to
e4108fc619e0f1c28cdec6049d31f2db01d56dfd.
`AbstractAdapter` loads many other dependencies, so making it autoload
properly has a noticeable impact on the load time of `active_record.rb`.
Benchmark:
$ cat test.rb
require "bundler/setup"
before = ObjectSpace.each_object(Module).count
start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
require "active_record"
finish = Process.clock_gettime(Process::CLOCK_MONOTONIC)
after = ObjectSpace.each_object(Module).count
puts "took #{finish - start} and created #{after - before} modules"
Before:
$ ruby test.rb
took 0.47532399999909103 and created 901 modules
After:
$ ruby test.rb
took 0.3299509999342263 and created 608 modules
The implementation of `Relation#cache_key` depends on some internal
relation methods (e.g. `apply_join_dependency`, `build_subquery`), but
somehow that implementation exists on the model class
(`collection_cache_key`), it sometimes bothers to me.
This refactors that implementation moves to `Relation#cache_key`, then
we can avoid `send` to call internal methods.
The following PR adds behavior to Rails to allow an application to
automatically switch it's connection from the primary to the replica.
A request will be sent to the replica if:
* The request is a read request (`GET` or `HEAD`)
* AND It's been 2 seconds since the last write to the database (because
we don't want to send a user to a replica if the write hasn't made it
to the replica yet)
A request will be sent to the primary if:
* It's not a GET/HEAD request (ie is a POST, PATCH, etc)
* Has been less than 2 seconds since the last write to the database
The implementation that decides when to switch reads (the 2 seconds) is
"safe" to use in production but not recommended without adequate testing
with your infrastructure. At GitHub in addition to the a 5 second delay
we have a curcuit breaker that checks the replication delay
and will send the query to a replica before the 5 seconds has passed.
This is specific to our application and therefore not something Rails
should be doing for you. You'll need to test and implement more robust
handling of when to switch based on your infrastructure. The auto
switcher in Rails is meant to be a basic implementation / API that acts
as a guide for how to implement autoswitching.
The impementation here is meant to be strict enough that you know how to
implement your own resolver and operations classes but flexible enough
that we're not telling you how to do it.
The middleware is not included automatically and can be installed in
your application with the classes you want to use for the resolver and
operations passed in. If you don't pass any classes into the middleware
the Rails default Resolver and Session classes will be used.
The Resolver decides what parameters define when to
switch, Operations sets timestamps for the Resolver to read from. For
example you may want to use cookies instead of a session so you'd
implement a Resolver::Cookies class and pass that into the middleware
via configuration options.
```
config.active_record.database_selector = { delay: 2.seconds }
config.active_record.database_resolver = MyResolver
config.active_record.database_operations = MyResolver::MyCookies
```
Your classes can inherit from the existing classes and reimplment the
methods (or implement more methods) that you need to do the switching.
You only need to implement methods that you want to change. For example
if you wanted to set the session token for the last read from a replica
you would reimplement the `read_from_replica` method in your resolver
class and implement a method that updates a new timestamp in your
operations class.
While the three-tier config makes it easier to define databases for
multiple database applications, it quickly became clear to offer full
support for multiple databases we need to change the way the connections
hash was handled.
A three-tier config means that when Rails needed to choose a default
configuration (in the case a user doesn't ask for a specific
configuration) it wasn't clear to Rails which the default was. I
[bandaid fixed this so the rake tasks could work](#32271) but that fix
wasn't correct because it actually doubled up the configuration hashes.
Instead of attemping to manipulate the hashes @tenderlove and I decided
that it made more sense if we converted the hashes to objects so we can
easily ask those object questions. In a three tier config like this:
```
development:
primary:
database: "my_primary_db"
animals:
database; "my_animals_db"
```
We end up with an object like this:
```
@configurations=[
#<ActiveRecord::DatabaseConfigurations::HashConfig:0x00007fd1acbded10
@env_name="development",@spec_name="primary",
@config={"adapter"=>"sqlite3", "database"=>"db/development.sqlite3"}>,
#<ActiveRecord::DatabaseConfigurations::HashConfig:0x00007fd1acbdea90
@env_name="development",@spec_name="animals",
@config={"adapter"=>"sqlite3", "database"=>"db/development.sqlite3"}>
]>
```
The configurations setter takes the database configuration set by your
application and turns them into an
`ActiveRecord::DatabaseConfigurations` object that has one getter -
`@configurations` which is an array of all the database objects.
The configurations getter returns this object by default since it acts
like a hash in most of the cases we need. For example if you need to
access the default `development` database we can simply request it as we
did before:
```
ActiveRecord::Base.configurations["development"]
```
This will return primary development database configuration hash:
```
{ "database" => "my_primary_db" }
```
Internally all of Active Record has been converted to use the new
objects. I've built this to be backwards compatible but allow for
accessing the hash if needed for a deprecation period. To get the
original hash instead of the object you can either add `to_h` on the
configurations call or pass `legacy: true` to `configurations.
```
ActiveRecord::Base.configurations.to_h
=> { "development => { "database" => "my_primary_db" } }
ActiveRecord::Base.configurations(legacy: true)
=> { "development => { "database" => "my_primary_db" } }
```
The new configurations object allows us to iterate over the Active
Record configurations without losing the known environment or
specification name for that configuration. You can also select all the
configs for an env or env and spec. With this we can always ask
any object what environment it belongs to:
```
db_configs = ActiveRecord::Base.configurations.configurations_for("development")
=> #<ActiveRecord::DatabaseConfigurations:0x00007fd1acbdf800
@configurations=[
#<ActiveRecord::DatabaseConfigurations::HashConfig:0x00007fd1acbded10
@env_name="development",@spec_name="primary",
@config={"adapter"=>"sqlite3", "database"=>"db/development.sqlite3"}>,
#<ActiveRecord::DatabaseConfigurations::HashConfig:0x00007fd1acbdea90
@env_name="development",@spec_name="animals",
@config={"adapter"=>"sqlite3", "database"=>"db/development.sqlite3"}>
]>
db_config.env_name
=> "development"
db_config.spec_name
=> "primary"
db_config.config
=> { "adapter"=>"sqlite3", "database"=>"db/development.sqlite3" }
```
The configurations object is more flexible than the configurations hash
and will allow us to build on top of the connection management in order
to add support for primary/replica connections, sharding, and
constructing queries for associations that live in multiple databases.
Moves the configs_for and DatabaseConfig struct into it's own file. I
was considering doing this in a future refactoring but our set up forced
me to move it now. You see there are `mattr_accessor`'s on the Core
module that have default settings. For example the `schema_format`
defaults to Ruby. So if I call `configs_for` or any methods in the Core
module it will reset the `schema_format` to `:ruby`. By moving it to
it's own class we can keep the logic contained and avoid this
unfortunate issue.
The second change here does a double loop over the yaml files. Bear with
me...
Our tests dictate that we need to load an environment before our rake
tasks because we could have something in an environment that the
database.yml depends on. There are side-effects to this and I think
there's a deeper bug that needs to be fixed but that's for another
issue. The gist of the problem is when I was creating the dynamic rake
tasks if the yaml that that rake task is calling evaluates code (like
erb) that calls the environment configs the code will blow up because
the environment is not loaded yet.
To avoid this issue we added a new method that simply loads the yaml and
does not evaluate the erb or anything in it. We then use that yaml to
create the task name. Inside the task name we can then call
`load_config` and load the real config to actually call the code
internal to the task. I admit, this is gross, but refactoring can't all
be pretty all the time and I'm working hard with `@tenderlove` to
refactor much more of this code to get to a better place re connection
management and rake tasks.
When dumping the database schema, Rails will dump foreign key names only
if those names were not generate by Rails. Currently this is determined
by checking if the foreign key name is `fk_rails_` followed by
a 10-character hash.
At [Cookpad](https://github.com/cookpad), we use
[Departure](https://github.com/departurerb/departure) (Percona's
pt-online-schema-change runner for ActiveRecord migrations) to run migrations.
Often, `pt-osc` will make a copy of a table in order to run a long migration
without blocking it. In this copy process, foreign keys are copied too,
but [their name is prefixed with an underscore to prevent name collision
](https://www.percona.com/doc/percona-toolkit/LATEST/pt-online-schema-change.html#cmdoption-pt-online-schema-change-alter-foreign-keys-method).
In the process described above, we often end up with a development
database that contains foreign keys which name starts with `_fk_rails_`.
That name does not match the ignore pattern, so next time Rails dumps
the database schema (eg. when running `rake db:migrate`), our
`db/schema.rb` file ends up containing those unwanted foreign key names.
This also produces an unwanted git diff that we'd prefer not to commit.
In this PR, I'd like to suggest a way to expose the foreign key name
ignore pattern to the Rails configuration, so that individual projects
can decide on a different pattern of foreign keys that will not get
their names dumped in `schema.rb`.
Provides both a forked process and threaded parallelization options. To
use add `parallelize` to your test suite.
Takes a `workers` argument that controls how many times the process
is forked. For each process a new database will be created suffixed
with the worker number; test-database-0 and test-database-1
respectively.
If `ENV["PARALLEL_WORKERS"]` is set the workers argument will be ignored
and the environment variable will be used instead. This is useful for CI
environments, or other environments where you may need more workers than
you do for local testing.
If the number of workers is set to `1` or fewer, the tests will not be
parallelized.
The default parallelization method is to fork processes. If you'd like to
use threads instead you can pass `with: :threads` to the `parallelize`
method. Note the threaded parallelization does not create multiple
database and will not work with system tests at this time.
parallelize(workers: 2, with: :threads)
The threaded parallelization uses Minitest's parallel exector directly.
The processes paralleliztion uses a Ruby Drb server.
For parallelization via threads a setup hook and cleanup hook are
provided.
```
class ActiveSupport::TestCase
parallelize_setup do |worker|
# setup databases
end
parallelize_teardown do |worker|
# cleanup database
end
parallelize(workers: 2)
end
```
[Eileen M. Uchitelle, Aaron Patterson]
Fixes a bug that can occur when ActiveJob tries to access ActiveRecord.
Specifically, I had an Active Job process fail on Sidekiq with this error:
```
ActiveJob::DeserializationError: Error while trying to deserialize arguments:
uninitialized constant ActiveRecord::Core::ClassMethods::TableMetadata
Did you mean? ActiveRecord::TableMetadata
```
raised by these lines of code:
```
[GEM_ROOT]/gems/activerecord-5.0.0.1/lib/active_record/core.rb:300 :in `table_metadata`
298
299 def table_metadata # :nodoc:
300 TableMetadata.new(self, arel_table)
301 end
302 end
[GEM_ROOT]/gems/activerecord-5.0.0.1/lib/active_record/core.rb:273 :in `predicate_builder`
[GEM_ROOT]/gems/activerecord-5.0.0.1/lib/active_record/core.rb:290 :in `relation`
```
The problem seems to be that, inside ActiveRecord::Core, the `TableMetadata`
class has not been loaded and, therefore, Rails tries to access the constant
`ActiveRecord::Core::ClassMethods::TableMetadata` which does not exist.
Eager loading `ActiveRecord::TableMetadata` should fix the issue.
@rafaelfranca -- see our Campfire discussion