llmclean: 'LLM'-Assisted Data Cleaning with Multi-Provider Support
Detects and suggests fixes for semantic inconsistencies in data
frames by calling large language models (LLMs) through a unified,
provider-agnostic interface. Supported providers include 'OpenAI'
('GPT-4o', 'GPT-4o-mini'), 'Anthropic' ('Claude'), 'Google' ('Gemini'),
'Groq' (free-tier 'LLaMA' and 'Mixtral'), and local 'Ollama' models.
The package identifies issues that rule-based tools cannot detect:
abbreviation variants, typographic errors, case inconsistencies, and
malformed values. Results are returned as tidy data frames that record
the column, row index, detected value, issue type, suggested fix, and
confidence score of each issue. An offline fallback using statistical and fuzzy-matching methods
is provided for use without any API key. Interactive fix application
with human review is supported via 'apply_fixes()'. Methods follow
de Jonge and van der Loo (2013)
<https://cran.r-project.org/doc/contrib/de_Jonge+van_der_Loo-Introduction_to_data_cleaning_with_R.pdf>
and Chaudhuri et al. (2003) <doi:10.1145/872757.872796>.
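
The offline fuzzy-matching idea described above can be illustrated with a minimal sketch. This is not the package's implementation or API: it is written in Python purely for illustration, the function name `suggest_fixes`, the `min_count` and `threshold` parameters, and the issue labels are all hypothetical, and the similarity measure (`difflib.SequenceMatcher`) is an assumed stand-in for whatever method the package uses. The output fields mirror the result columns listed in the description.

```python
from collections import Counter
from difflib import SequenceMatcher


def suggest_fixes(column, values, min_count=2, threshold=0.8):
    """Flag rare values that closely resemble a frequent value in the same column.

    Hypothetical helper for illustration only; not part of llmclean.
    """
    counts = Counter(values)
    # Assumption: values seen at least `min_count` times are trusted canonical forms.
    canonical = [v for v, n in counts.items() if n >= min_count]
    issues = []
    for row, value in enumerate(values):  # row is 0-based here
        if value in canonical:
            continue
        # Find the canonical form most similar to this value, ignoring case.
        best, best_score = None, 0.0
        for cand in canonical:
            score = SequenceMatcher(None, value.lower(), cand.lower()).ratio()
            if score > best_score:
                best, best_score = cand, score
        if best is not None and best_score >= threshold:
            issue_type = "case" if value.lower() == best.lower() else "typo"
            issues.append({
                "column": column,
                "row": row,
                "value": value,
                "issue_type": issue_type,
                "suggested_fix": best,
                "confidence": round(best_score, 2),
            })
    return issues


cities = ["Boston", "Boston", "boston", "Bostn", "Chicago", "Chicago", "Denver"]
issues = suggest_fixes("city", cities)
for issue in issues:
    print(issue)
```

In this toy column, "boston" and "Bostn" are flagged with "Boston" as the suggested fix, while "Denver" is left alone because it resembles no frequent value. A value that is rare but legitimate can still be flagged by a heuristic like this, which is one reason a human review step before applying fixes is useful.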
Linking:
Please use the canonical form
https://CRAN.R-project.org/package=llmclean
to link to this page.