Matching Model

Matching models module.

This module contains functionality for instantiating, training, and evaluating deep learning and neural-symbolic matching models.

class neer_match.matching_model.DLMatchingModel(similarity_map, initial_feature_width_scales=10, feature_depths=2, initial_record_width_scale=10, record_depth=4, **kwargs)

A deep learning matching model class.

Inherits tensorflow.keras.Model and automates deep-learning-based entity matching using the similarity map supplied by the user.

record_pair_network

The record pair network.

Type:

RecordPairNetwork

__init__(similarity_map, initial_feature_width_scales=10, feature_depths=2, initial_record_width_scale=10, record_depth=4, **kwargs)

Initialize a deep learning matching model.

Generate a record pair network from the passed similarity map. The input arguments are passed to the record pair network (see RecordPairNetwork).

Parameters:
  • similarity_map (SimilarityMap) – A similarity map object.

  • initial_feature_width_scales (Union[int, List[int]]) – The initial width scales of the feature networks.

  • feature_depths (Union[int, List[int]]) – The depths of the feature networks.

  • initial_record_width_scale (int) – The initial width scale of the record network.

  • record_depth (int) – The depth of the record network.

  • **kwargs – Additional keyword arguments passed to parent class (tensorflow.keras.Model).
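Because initial_feature_width_scales and feature_depths accept either a single integer or a list, one plausible reading is that a scalar applies to every feature network while a list supplies one value per network. The helper below, expand_per_feature, is a hypothetical illustration of that convention and is not part of neer_match; the actual handling inside RecordPairNetwork may differ.

```python
from typing import List, Union


def expand_per_feature(value: Union[int, List[int]], n_features: int) -> List[int]:
    """Return one setting per feature network (hypothetical helper)."""
    if isinstance(value, int):
        # A scalar is repeated for every feature network.
        return [value] * n_features
    if len(value) != n_features:
        raise ValueError(f"expected {n_features} values, got {len(value)}")
    # A list is taken as-is, one entry per feature network.
    return list(value)


scalar_scales = expand_per_feature(10, 3)
listed_depths = expand_per_feature([2, 3, 1], 3)
```

Under this assumption, passing 10 for three feature networks is equivalent to passing [10, 10, 10].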

build(input_shapes)

Build the model.

Return type:

None

call(inputs)

Call the model on inputs.

Return type:

Tensor

evaluate(left, right, matches, **kwargs)

Evaluate the model.

Construct a data generator from the input data frames using the similarity map with which the model was initialized and evaluate the model. The model is evaluated by calling the tensorflow.keras.Model.evaluate() method.

Parameters:
  • left (DataFrame) – The left data frame.

  • right (DataFrame) – The right data frame.

  • matches (DataFrame) – The matches data frame.

  • **kwargs – Additional keyword arguments passed to parent class (tensorflow.keras.Model.evaluate()).

Return type:

dict

fit(left, right, matches, batch_size=16, mismatch_share=0.1, shuffle=True, **kwargs)

Fit the model.

Construct a data generator from the input data frames using the similarity map with which the model was initialized and fit the model. The model is trained by calling the tensorflow.keras.Model.fit() method.

Parameters:
  • left (DataFrame) – The left data frame.

  • right (DataFrame) – The right data frame.

  • matches (DataFrame) – The matches data frame.

  • batch_size (int) – Batch size.

  • mismatch_share (float) – Mismatch share.

  • shuffle (bool) – Shuffle flag.

  • **kwargs – Additional keyword arguments passed to parent class (tensorflow.keras.Model.fit()).

Return type:

None
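The role of mismatch_share can be sketched in plain Python, assuming it is the fraction of non-matching record pairs that the data generator keeps alongside all matching pairs. That interpretation, the function name sample_training_pairs, and the use of row positions as pair identifiers are assumptions made for illustration; the library's generator may sample differently.

```python
import random
from itertools import product


def sample_training_pairs(n_left, n_right, matches, mismatch_share, seed=0):
    """Keep all matches and a random share of the non-matching pairs."""
    match_set = set(matches)
    mismatches = [
        pair
        for pair in product(range(n_left), range(n_right))
        if pair not in match_set
    ]
    n_keep = int(round(mismatch_share * len(mismatches)))
    rng = random.Random(seed)
    kept = rng.sample(mismatches, n_keep)
    # Label 1.0 marks a match, 0.0 a sampled mismatch.
    return [(p, 1.0) for p in matches] + [(p, 0.0) for p in kept]


matches = [(0, 0), (1, 1), (2, 2), (3, 3)]
pairs = sample_training_pairs(4, 4, matches, mismatch_share=0.5)
```

With 4 x 4 records and 4 matches there are 12 non-matching pairs, so a share of 0.5 keeps 6 of them and the sketch yields 10 labeled pairs in total.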

predict(left, right, batch_size=16, **kwargs)

Generate model predictions.

Construct a data generator from the input data frames using the similarity map with which the model was initialized and generate predictions.

Parameters:
  • left (DataFrame) – The left data frame.

  • right (DataFrame) – The right data frame.

  • batch_size (int) – Batch size.

  • **kwargs – Additional keyword arguments passed to parent class (tensorflow.keras.Model.predict()).

Return type:

Tensor

predict_from_generator(generator, **kwargs)

Generate model predictions from a generator.

Parameters:
  • generator (DataGenerator) – The data generator.

  • **kwargs – Additional keyword arguments passed to parent class (tensorflow.keras.Model.predict()).

Return type:

Tensor

property similarity_map: SimilarityMap

The similarity map of the model.

suggest(left, right, count, batch_size=16, **kwargs)

Generate model suggestions.

Construct a data generator from the input data frames using the similarity map with which the model was initialized and generate suggestions.

Parameters:
  • left (DataFrame) – The left data frame.

  • right (DataFrame) – The right data frame.

  • count (int) – The number of suggestions to generate.

  • batch_size (int) – Batch size.

  • **kwargs – Additional keyword arguments passed to the suggest function.

Return type:

DataFrame
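As a rough illustration of how suggestions relate to predictions, the sketch below ranks right records by score for each left record and keeps the best count of them. It assumes that suggestions are selected per left record and that scores are available as a left-by-right matrix; both points, along with the name top_suggestions, are assumptions and not taken from the library.

```python
def top_suggestions(scores, count):
    """For each left record, return the `count` best-scoring right records."""
    suggestions = []
    for left_idx, row in enumerate(scores):
        # Order right-record indices from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        for right_idx in ranked[:count]:
            suggestions.append((left_idx, right_idx, row[right_idx]))
    return suggestions


scores = [
    [0.9, 0.2, 0.4],
    [0.1, 0.8, 0.7],
]
best = top_suggestions(scores, count=2)
```

Each tuple holds a left index, a right index, and the corresponding score, which mirrors the kind of rows a suggestions data frame could contain.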

class neer_match.matching_model.NSMatchingModel(similarity_map, initial_feature_width_scales=10, feature_depths=2, initial_record_width_scale=10, record_depth=4)

A neural-symbolic matching model class.

record_pair_network

The record pair network.

Type:

RecordPairNetwork

bce

The training loss function (binary cross-entropy, see tensorflow.keras.losses.BinaryCrossentropy()).

Type:

tf.keras.losses.Loss

optimizer

The optimizer used for training.

Type:

tensorflow.keras.optimizers.Optimizer

__init__(similarity_map, initial_feature_width_scales=10, feature_depths=2, initial_record_width_scale=10, record_depth=4)

Initialize a neural-symbolic matching model.

Generate a record pair network from the passed similarity map. The input arguments are passed to the record pair network (see RecordPairNetwork).

The class uses a custom training loop with a neural-symbolic (or hybrid) loss function. Although it does not inherit from tensorflow.keras.Model, it implements the same methods to provide an interface consistent with the deep learning matching model.

Parameters:
  • similarity_map (SimilarityMap) – A similarity map object.

  • initial_feature_width_scales (Union[int, List[int]]) – The initial width scales of the feature networks.

  • feature_depths (Union[int, List[int]]) – The depths of the feature networks.

  • initial_record_width_scale (int) – The initial width scale of the record network.

  • record_depth (int) – The depth of the record network.

compile(optimizer=<keras.src.optimizers.adam.Adam object>)

Compile the model.

Parameters:

optimizer (Optimizer) – The optimizer used for training.

Return type:

None

evaluate(left, right, matches, batch_size=16, mismatch_share=1.0, satisfiability_weight=1.0)

Evaluate the model.

Construct a data generator from the input data frames using the similarity map with which the model was initialized and evaluate the model. It returns a dictionary with evaluation metrics.

Parameters:
  • left (DataFrame) – The left data frame.

  • right (DataFrame) – The right data frame.

  • matches (DataFrame) – The matches data frame.

  • batch_size (int) – Batch size.

  • mismatch_share (float) – The mismatch share.

  • satisfiability_weight (float) – The weight of the satisfiability loss.

Return type:

dict
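The text does not list the keys of the returned dictionary. The sketch below shows one common way to derive such metrics from predicted scores and true labels; the key names, the 0.5 threshold, and the function classification_metrics are illustrative assumptions and not the library's actual output.

```python
def classification_metrics(predictions, labels, threshold=0.5):
    """Summarize match predictions as a dictionary of metrics."""
    tp = fp = tn = fn = 0
    for score, label in zip(predictions, labels):
        predicted_match = score >= threshold
        if predicted_match and label == 1:
            tp += 1
        elif predicted_match:
            fp += 1
        elif label == 1:
            fn += 1
        else:
            tn += 1
    # Guard against empty denominators when a class is never predicted.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return {
        "accuracy": (tp + tn) / len(labels),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


metrics = classification_metrics([0.9, 0.8, 0.3, 0.6, 0.1], [1, 1, 1, 0, 0])
```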

fit(left, right, matches, epochs, mismatch_share=0.1, satisfiability_weight=1.0, verbose=1, log_mod_n=1, **kwargs)

Fit the model.

Construct a data generator from the input data frames using the similarity map with which the model was initialized and fit the model.

The model is trained using a custom training loop. The loss can be defined either purely through fuzzy logic axioms (the default, with a satisfiability weight of 1.0) or as a weighted sum of binary cross-entropy and satisfiability loss (by setting the satisfiability weight to a value between 0 and 1).

Parameters:
  • left (DataFrame) – The left data frame.

  • right (DataFrame) – The right data frame.

  • matches (DataFrame) – The matches data frame.

  • epochs (int) – The number of epochs to train.

  • mismatch_share (float) – The mismatch share.

  • satisfiability_weight (float) – The weight of the satisfiability loss.

  • verbose (int) – The verbosity level.

  • log_mod_n (int) – The log modulo.

  • **kwargs – Additional keyword arguments passed to the data generator.

Return type:

None
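The effect of satisfiability_weight on the training loss can be sketched as follows. The sketch assumes a convex combination, with weight w on the satisfiability loss and 1 - w on binary cross-entropy, and it assumes the satisfiability loss equals one minus the satisfiability of the axioms. The description above only states that the loss is a weighted sum, so the exact form and the function names here are assumptions.

```python
import math


def binary_cross_entropy(predictions, labels, eps=1e-7):
    """Mean binary cross-entropy over predicted probabilities."""
    total = 0.0
    for p, y in zip(predictions, labels):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)


def hybrid_loss(bce, satisfiability, satisfiability_weight):
    """Blend BCE with a satisfiability loss (assumed convex combination)."""
    sat_loss = 1.0 - satisfiability  # higher satisfiability, lower loss
    return (1.0 - satisfiability_weight) * bce + satisfiability_weight * sat_loss


bce = binary_cross_entropy([0.9, 0.2], [1, 0])
pure_logic = hybrid_loss(bce, satisfiability=0.75, satisfiability_weight=1.0)
blended = hybrid_loss(bce, satisfiability=0.75, satisfiability_weight=0.5)
```

With a weight of 1.0 the cross-entropy term drops out entirely, which corresponds to the default case of a loss defined purely by the axioms.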

predict(left, right, batch_size=16)

Generate model predictions.

Construct a data generator from the input data frames using the similarity map with which the model was initialized and generate predictions.

Parameters:
  • left (DataFrame) – The left data frame.

  • right (DataFrame) – The right data frame.

  • batch_size (int) – Batch size.

Return type:

Tensor

predict_from_generator(generator)

Generate model predictions from a generator.

Parameters:

generator (DataGenerator) – The data generator.

Return type:

Tensor

property similarity_map: SimilarityMap

The similarity map of the model.

suggest(left, right, count, batch_size=16)

Generate model suggestions.

Construct a data generator from the input data frames using the similarity map with which the model was initialized and generate suggestions.

Parameters:
  • left (DataFrame) – The left data frame.

  • right (DataFrame) – The right data frame.

  • count (int) – The number of suggestions to generate.

  • batch_size (int) – Batch size.

Return type:

DataFrame