The BLEU, chrF++, and METEOR metrics, though widely used, have notable limitations. BLEU does not measure meaning; it rewards surface n-gram overlap, so semantically faithful translations can still score poorly. It also heavily penalises synonyms that are not listed in the reference translations, or treats them as unknown words if they occur fewer than twice in the test set. chrF++ can rate nonsensical translations as precise, or as close to human translations, whenever the word bigrams it rewards happen to appear in the references. Moreover, chrF++ has not been exhaustively tested on Indian languages, which are exceptionally morphologically rich; this raises concerns about its reliability for translation systems targeting these languages. METEOR, though more sophisticated, is computationally expensive, requires language-specific resources, and can be overly lenient with word order.
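The synonym penalty in BLEU can be illustrated with a minimal sketch of its modified n-gram precision. The code below is a simplified toy reimplementation for illustration only (brevity penalty and multi-reference support omitted; the sentences and the `bleu_sketch` helper are hypothetical): substituting one synonym into an otherwise exact hypothesis lowers the score even though the meaning is unchanged.

```python
from collections import Counter
import math

def ngrams(tokens, n):
    # all contiguous n-grams of the token list
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def modified_precision(hyp, ref, n):
    # BLEU-style clipped precision: each hypothesis n-gram is credited
    # at most as many times as it appears in the reference
    hyp_counts = Counter(ngrams(hyp, n))
    ref_counts = Counter(ngrams(ref, n))
    clipped = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
    total = sum(hyp_counts.values())
    return clipped / total if total else 0.0

def bleu_sketch(hyp, ref, max_n=2):
    # geometric mean of n-gram precisions (brevity penalty omitted)
    precisions = [modified_precision(hyp, ref, n) for n in range(1, max_n + 1)]
    if min(precisions) == 0:
        return 0.0
    return math.exp(sum(math.log(p) for p in precisions) / max_n)

ref = "the quick car sped away".split()
hyp_exact = "the quick car sped away".split()
hyp_synonym = "the fast car sped away".split()  # "fast" = synonym of "quick"

print(bleu_sketch(hyp_exact, ref))    # 1.0
print(bleu_sketch(hyp_synonym, ref))  # noticeably lower, despite same meaning
```

Because `fast` never appears in the reference, both the unigram match and every bigram containing it are lost, so the score drops from 1.0 to roughly 0.63 for a semantically identical sentence.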