Matching Model
Matching models module.
- This module contains functionality for instantiating, training, and evaluating deep
learning and neural-symbolic matching models.
- class neer_match.matching_model.DLMatchingModel(similarity_map, initial_feature_width_scales=10, feature_depths=2, initial_record_width_scale=10, record_depth=4, **kwargs)
A deep learning matching model class.
Inherits tensorflow.keras.Model and automates deep-learning-based entity matching using the similarity map supplied by the user.
- record_pair_network
The record pair network.
- Type:
- __init__(similarity_map, initial_feature_width_scales=10, feature_depths=2, initial_record_width_scale=10, record_depth=4, **kwargs)
Initialize a deep learning matching model.
Generate a record pair network from the passed similarity map. The input arguments are passed to the record pair network (see RecordPairNetwork).
- Parameters:
  - similarity_map (SimilarityMap) – A similarity map object.
  - initial_feature_width_scales (Union[int, List[int]]) – The initial width scales of the feature networks.
  - feature_depths (Union[int, List[int]]) – The depths of the feature networks.
  - initial_record_width_scale (int) – The initial width scale of the record network.
  - record_depth (int) – The depth of the record network.
  - **kwargs – Additional keyword arguments passed to the parent class (tensorflow.keras.Model).
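The Union[int, List[int]] arguments suggest that a single integer applies to every feature network, while a list supplies one value per feature network. The following is a minimal sketch of that normalization under this assumption; `expand_per_feature` is a hypothetical helper written for illustration and is not part of neer_match, whose own validation may differ.

```python
from typing import List, Union


def expand_per_feature(value: Union[int, List[int]], n_features: int) -> List[int]:
    """Return one positive integer per feature network (hypothetical helper)."""
    # A scalar is repeated for every feature network; a list is used as given.
    if isinstance(value, int):
        values = [value] * n_features
    else:
        values = list(value)
    if len(values) != n_features:
        raise ValueError(
            f"expected {n_features} values, got {len(values)}"
        )
    if any(not isinstance(v, int) or v < 1 for v in values):
        raise ValueError("all values must be positive integers")
    return values


# One scale shared by three feature networks.
width_scales = expand_per_feature(10, 3)
# One depth per feature network.
depths = expand_per_feature([2, 3, 2], 3)
```

With three feature networks, the scalar `10` expands to three equal scales, while the list of depths is kept as passed.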
- build(input_shapes)
Build the model.
- Return type:
None
- call(inputs)
Call the model on inputs.
- Return type:
Tensor
- evaluate(left, right, matches, **kwargs)
Evaluate the model.
Construct a data generator from the input data frames using the similarity map with which the model was initialized and evaluate the model. The model is evaluated by calling the tensorflow.keras.Model.evaluate() method.
- Parameters:
  - left (DataFrame) – The left data frame.
  - right (DataFrame) – The right data frame.
  - matches (DataFrame) – The matches data frame.
  - **kwargs – Additional keyword arguments passed to the parent class method (tensorflow.keras.Model.evaluate()).
- Return type:
dict
- fit(left, right, matches, batch_size=16, mismatch_share=0.1, shuffle=True, **kwargs)
Fit the model.
Construct a data generator from the input data frames using the similarity map with which the model was initialized and fit the model. The model is trained by calling the tensorflow.keras.Model.fit() method.
- Parameters:
  - left (DataFrame) – The left data frame.
  - right (DataFrame) – The right data frame.
  - matches (DataFrame) – The matches data frame.
  - batch_size (int) – Batch size.
  - mismatch_share (float) – Mismatch share.
  - shuffle (bool) – Shuffle flag.
  - **kwargs – Additional keyword arguments passed to the parent class method (tensorflow.keras.Model.fit()).
- Return type:
None
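The mismatch_share parameter is described only briefly above. One plausible reading, sketched below in plain Python, is that all matching pairs are kept while only a share of the non-matching pairs is sampled for training. The function `sample_training_pairs`, its arguments, and the global (rather than per-batch) sampling are assumptions made for illustration; the library's data generator may work differently.

```python
import random
from typing import List, Set, Tuple

Pair = Tuple[int, int]


def sample_training_pairs(
    n_left: int,
    n_right: int,
    matches: Set[Pair],
    mismatch_share: float,
    seed: int = 0,
) -> List[Pair]:
    """Keep every match and a random share of the mismatches (illustrative)."""
    if not 0.0 <= mismatch_share <= 1.0:
        raise ValueError("mismatch_share must lie in [0, 1]")
    rng = random.Random(seed)
    # Every left-right combination that is not a known match is a mismatch.
    mismatches = [
        (i, j)
        for i in range(n_left)
        for j in range(n_right)
        if (i, j) not in matches
    ]
    n_keep = int(round(mismatch_share * len(mismatches)))
    kept = rng.sample(mismatches, n_keep)
    pairs = sorted(matches) + kept
    rng.shuffle(pairs)  # corresponds to shuffle=True
    return pairs


# Ten records on each side, matched one-to-one on the diagonal.
known_matches = {(i, i) for i in range(10)}
pairs = sample_training_pairs(10, 10, known_matches, mismatch_share=0.1)
```

Under this reading, the default share of 0.1 keeps the ten matches and samples nine of the ninety mismatches, which limits class imbalance during training.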
- predict(left, right, batch_size=16, **kwargs)
Generate model predictions.
Construct a data generator from the input data frames using the similarity map with which the model was initialized and generate predictions.
- Parameters:
  - left (DataFrame) – The left data frame.
  - right (DataFrame) – The right data frame.
  - batch_size (int) – Batch size.
  - **kwargs – Additional keyword arguments passed to the parent class method (tensorflow.keras.Model.predict()).
- Return type:
Tensor
- predict_from_generator(generator, **kwargs)
Generate model predictions from a generator.
- Parameters:
  - generator (DataGenerator) – The data generator.
  - **kwargs – Additional keyword arguments passed to the parent class method (tensorflow.keras.Model.predict()).
- Return type:
Tensor
- property similarity_map: SimilarityMap
The similarity map of the model.
- suggest(left, right, count, batch_size=16, **kwargs)
Generate model suggestions.
Construct a data generator from the input data frames using the similarity map with which the model was initialized and generate suggestions.
- Parameters:
  - left (DataFrame) – The left data frame.
  - right (DataFrame) – The right data frame.
  - count (int) – The number of suggestions to generate.
  - batch_size (int) – Batch size.
  - **kwargs – Additional keyword arguments passed to the suggest function.
- Return type:
DataFrame
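The description above does not spell out how suggestions are selected. A common approach, sketched here as an assumption rather than as the library's method, is to rank the predicted scores of each left record against all right records and keep the `count` best candidates. The helper `top_suggestions` and the nested-list score layout are hypothetical.

```python
from typing import List, Tuple


def top_suggestions(
    scores: List[List[float]], count: int
) -> List[Tuple[int, int, float]]:
    """For each left index, return the `count` highest-scoring right indices."""
    suggestions = []
    for left_idx, row in enumerate(scores):
        # Rank the right-hand candidates of this left record by score.
        ranked = sorted(enumerate(row), key=lambda item: item[1], reverse=True)
        for right_idx, score in ranked[:count]:
            suggestions.append((left_idx, right_idx, score))
    return suggestions


# scores[i][j] is the predicted match score of left record i and right record j.
scores = [
    [0.9, 0.2, 0.4],
    [0.1, 0.8, 0.7],
]
suggestions = top_suggestions(scores, count=2)
```

Each returned triple holds a left index, a right index, and the score, which maps naturally onto the rows of a suggestions data frame.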
- class neer_match.matching_model.NSMatchingModel(similarity_map, initial_feature_width_scales=10, feature_depths=2, initial_record_width_scale=10, record_depth=4)
A neural-symbolic matching model class.
- record_pair_network
The record pair network.
- Type:
- bce
The training loss function (binary cross-entropy, see tensorflow.keras.losses.BinaryCrossentropy()).
- Type:
tensorflow.keras.losses.Loss
- optimizer
The optimizer used for training.
- Type:
tensorflow.keras.optimizers.Optimizer
- __init__(similarity_map, initial_feature_width_scales=10, feature_depths=2, initial_record_width_scale=10, record_depth=4)
Initialize a neural-symbolic matching model.
Generate a record pair network from the passed similarity map. The input arguments are passed to the record pair network (see RecordPairNetwork).
The class uses a custom training loop with a neural-symbolic (or hybrid) loss function. It does not inherit from tensorflow.keras.Model, but it implements the same methods to provide an interface consistent with the deep learning matching model.
- Parameters:
  - similarity_map (SimilarityMap) – A similarity map object.
  - initial_feature_width_scales (Union[int, List[int]]) – The initial width scales of the feature networks.
  - feature_depths (Union[int, List[int]]) – The depths of the feature networks.
  - initial_record_width_scale (int) – The initial width scale of the record network.
  - record_depth (int) – The depth of the record network.
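Because both model classes expose the same method names, calling code can be written once and used with either. The sketch below relies only on call shapes documented in this module: `fit(left, right, matches, epochs=...)`, `evaluate(left, right, matches)`, and `suggest(left, right, count)`. The function `run_matching` and the `_StubModel` stand-in are hypothetical, and the model passed in is assumed to be compiled already.

```python
from typing import Any, Dict, List, Tuple


def run_matching(
    model: Any, left: Any, right: Any, matches: Any, epochs: int, count: int
) -> Dict[str, Any]:
    """Train, evaluate, and query an already compiled matching model."""
    # epochs is an explicit argument of NSMatchingModel.fit and is forwarded
    # through **kwargs to tensorflow.keras.Model.fit by DLMatchingModel.fit.
    model.fit(left, right, matches, epochs=epochs)
    metrics = model.evaluate(left, right, matches)
    suggestions = model.suggest(left, right, count)
    return {"metrics": metrics, "suggestions": suggestions}


class _StubModel:
    """Stand-in that records calls; it is not a neer_match class."""

    def __init__(self) -> None:
        self.calls: List[Tuple] = []

    def fit(self, left, right, matches, epochs):
        self.calls.append(("fit", epochs))

    def evaluate(self, left, right, matches):
        self.calls.append(("evaluate",))
        return {"loss": 0.0}

    def suggest(self, left, right, count):
        self.calls.append(("suggest", count))
        return []


stub = _StubModel()
result = run_matching(stub, left=None, right=None, matches=None, epochs=5, count=3)
```

In real use, the stub would be replaced by a compiled DLMatchingModel or NSMatchingModel instance and the `None` placeholders by data frames.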
- compile(optimizer=<keras.src.optimizers.adam.Adam object>)
Compile the model.
- Parameters:
  - optimizer (Optimizer) – The optimizer used for training.
- Return type:
None
- evaluate(left, right, matches, batch_size=16, mismatch_share=1.0, satisfiability_weight=1.0)
Evaluate the model.
Construct a data generator from the input data frames using the similarity map with which the model was initialized and evaluate the model. It returns a dictionary with evaluation metrics.
- Parameters:
  - left (DataFrame) – The left data frame.
  - right (DataFrame) – The right data frame.
  - matches (DataFrame) – The matches data frame.
  - batch_size (int) – Batch size.
  - mismatch_share (float) – The mismatch share.
  - satisfiability_weight (float) – The weight of the satisfiability loss.
- Return type:
dict
- fit(left, right, matches, epochs, mismatch_share=0.1, satisfiability_weight=1.0, verbose=1, log_mod_n=1, **kwargs)
Fit the model.
Construct a data generator from the input data frames using the similarity map with which the model was initialized and fit the model.
The model is trained using a custom training loop. The loss is either defined purely by fuzzy logic axioms (the default, with a satisfiability weight of 1.0) or as a weighted sum of the binary cross-entropy and the satisfiability loss (when the satisfiability weight is set to a value between 0 and 1).
- Parameters:
  - left (DataFrame) – The left data frame.
  - right (DataFrame) – The right data frame.
  - matches (DataFrame) – The matches data frame.
  - epochs (int) – The number of epochs to train.
  - mismatch_share (float) – The mismatch share.
  - satisfiability_weight (float) – The weight of the satisfiability loss.
  - verbose (int) – The verbosity level.
  - log_mod_n (int) – The log modulo.
  - **kwargs – Additional keyword arguments passed to the data generator.
- Return type:
None
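The weighted sum described for the training loss can be illustrated numerically. The sketch below assumes a convex combination in which satisfiability_weight multiplies the satisfiability loss and its complement multiplies the binary cross-entropy, and it assumes the satisfiability loss is one minus a satisfiability degree in [0, 1]. Both choices, and the helper names, are assumptions for illustration; the exact form used by the library is not stated above.

```python
import math
from typing import Sequence


def binary_cross_entropy(
    labels: Sequence[int], predictions: Sequence[float], eps: float = 1e-7
) -> float:
    """Mean binary cross-entropy over label and prediction pairs."""
    total = 0.0
    for y, p in zip(labels, predictions):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)


def hybrid_loss(
    bce: float, satisfiability: float, satisfiability_weight: float
) -> float:
    """Combine both losses; satisfiability of 1.0 means all axioms hold."""
    if not 0.0 <= satisfiability_weight <= 1.0:
        raise ValueError("satisfiability_weight must lie in [0, 1]")
    satisfiability_loss = 1.0 - satisfiability
    return (
        satisfiability_weight * satisfiability_loss
        + (1.0 - satisfiability_weight) * bce
    )


bce = binary_cross_entropy([1, 0], [0.9, 0.2])
# Default case: the loss is defined by the axioms alone.
pure_logic_loss = hybrid_loss(bce, satisfiability=0.8, satisfiability_weight=1.0)
# Mixed case: both terms contribute equally.
mixed_loss = hybrid_loss(bce, satisfiability=0.8, satisfiability_weight=0.5)
```

With a weight of 1.0 the cross-entropy term drops out entirely, which matches the default case described above.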
- predict(left, right, batch_size=16)
Generate model predictions.
Construct a data generator from the input data frames using the similarity map with which the model was initialized and generate predictions.
- Parameters:
  - left (DataFrame) – The left data frame.
  - right (DataFrame) – The right data frame.
  - batch_size (int) – Batch size.
- Return type:
Tensor
- predict_from_generator(generator)
Generate model predictions from a generator.
- Parameters:
  - generator (DataGenerator) – The data generator.
- Return type:
Tensor
- property similarity_map: SimilarityMap
The similarity map of the model.
- suggest(left, right, count, batch_size=16)
Generate model suggestions.
Construct a data generator from the input data frames using the similarity map with which the model was initialized and generate suggestions.
- Parameters:
  - left (DataFrame) – The left data frame.
  - right (DataFrame) – The right data frame.
  - count (int) – The number of suggestions to generate.
  - batch_size (int) – Batch size.
- Return type:
DataFrame