MCP.Tools
get_data_info
def get_data_info(data_path: str | Path,
vars_list: List[str] | None = None,
encoding: str = "utf-8") -> str:
...
Input Parameters:
- data_path: Absolute filesystem path or URL to data file (required)
- vars_list: Optional variable subset specification for selective analysis (default: null, all variables)
- encoding: Character encoding for text-based formats (default: UTF-8, ignored for .dta)
Return Structure: Serialized JSON string containing multi-layered metadata:
{
"overview": {"source": <path>, "obs": <int>, "var_numbers": <int>, "var_list": [<array>]},
"info_config": {"metrics": [<array>], "max_display": <int>, "decimal_places": <int>},
"vars_detail": {<variable_name>: {"var": <str>, "type": <str>, "summary": {...}}},
"saved_path": <cache_file_path>
}
Operational Examples:
# Local file analysis
get_data_info("/data/econometrics/survey.dta")
get_data_info("~/Documents/exports/quarterly.csv", vars_list=["gdp", "inflation", "unemployment"])
# Remote data ingestion
get_data_info("https://repository.org/datasets/panel_data.xlsx")
# Encoded source handling
get_data_info("/data/legacy/latin1_data.csv", encoding="latin1")
Supported Formats:
- Stata: .dta
- CSV/Text: .csv, .tsv, .psv
- Excel: .xlsx, .xls
- SPSS: .sav, .zsav
Implementation Architecture:
The tool operates through a multi-layered abstraction cascade. At the foundation lies a polymorphic class hierarchy where DataInfoBase defines the abstract interface for format-specific handlers (DtaDataInfo, CsvDataInfo, ExcelDataInfo, SpssDataInfo). Content integrity verification employs MD5 hashing with configurable suffix length for cache identification. Configuration propagation follows a precedence chain: runtime parameters override environment variables (STATA_MCP_DATA_INFO_DECIMAL_PLACES, STATA_MCP_DATA_INFO_STRING_KEEP_NUMBER), which in turn override TOML-based configuration at ~/.statamcp/config.toml.
Statistical computation leverages pandas DataFrame operations with NumPy backend. The metrics system implements a configurable computation pipeline where default metrics (obs, mean, stderr, min, max) can be extended through configuration to include quartiles (q1, q3) and distribution shape measures (skewness, kurtosis). Type dispatch separates string variables (observation counting with unique value sampling under max_display threshold) from numeric variables (central tendency, dispersion, and distribution shape computation with decimal_places precision rounding).
Caching strategy employs content-addressable storage where hash computation determines cache file naming: data_info__<name>_<ext>__hash_<suffix>.json. Cache resolution occurs at invocation time, with automatic regeneration on content hash divergence. The cache directory defaults to ~/.statamcp/.cache/ but can be overridden to project-specific stata-mcp-tmp/ locations through the cache_dir parameter.
stata_do
def stata_do(dofile_path: str,
log_file_name: str | None = None,
is_read_log: bool = True) -> Dict[str, Union[str, None]]:
...
Input Parameters:
- dofile_path: Absolute or relative path to target .do file (required)
- log_file_name: Custom log filename without timestamp (optional, auto-generated if null)
- is_read_log: Boolean flag for log content retrieval (default: true)
Return Structure: Dictionary containing execution metadata and optional log payload:
{
"log_file_path": "<absolute_path_to_stata_log>",
"log_content": "<full_log_text_or_'Not_read_log'>"
}
Error condition returns: {"error": "<exception_message>"}
Operational Examples:
# Standard execution with log retrieval
stata_do("/Users/project/stata-mcp-dofile/20250104153045.do")
# Custom log naming
stata_do("~/analysis/regression_pipeline.do", log_file_name="quarterly_results")
# Execution without log reading
stata_do("/tmp/estimation.do", is_read_log=False)
Implementation Architecture:
The tool encapsulates the StataDo executor class which implements platform-specific command invocation strategies. Cross-platform abstraction abstracts Stata executable location through the StataFinder class: macOS probes /Applications/Stata/ hierarchy, Windows interrogates Program Files registry, and Linux queries system PATH for stata-mp. The execution pipeline involves do-file staging, Stata CLI invocation with -b batch mode flag, log file redirection, and exit code monitoring.
Log file management operates within the stata-mcp-log/ directory structure with automatic timestamp generation when log_file_name is omitted. The executor implements differential log handling based on is_read_log flag: when enabled, performs file read operation and returns content; when disabled, returns placeholder string to minimize I/O overhead.
Exception handling categorizes failures into three tiers: FileNotFoundError for missing do-file artifacts, RuntimeError for Stata execution failures or log generation issues, and PermissionError for insufficient execution or write permissions. Error conditions return dictionary with "error" key rather than raising exceptions to maintain MCP protocol compatibility.
write_dofile
⚠️ Disabled by Default: This tool requires
ENABLE_WRITE_DOFILE=trueconfiguration.Modern AI agents have native file writing capabilities, making this tool redundant. To enable, set
STATA_MCP__ENABLE_WRITE_DOFILE=trueor add[BETA] ENABLE_WRITE_DOFILE = trueto your config.
def write_dofile(content: str,
encoding: str | None = None) -> str:
...
Input Parameters:
- content: Stata command sequence to persist (required)
- encoding: Character encoding for file output (optional, defaults to UTF-8)
Return Structure: String containing absolute POSIX-compliant path to generated do-file
Operational Examples:
# Basic regression analysis
write_dofile("""
use "/data/survey.dta", clear
regress income age education experience
outreg2 using "`output_path'/results.doc", replace
""")
# Time series analysis with encoding specification
write_dofile("""
tsset date
arima gdp, ar(1) ma(1)
predict forecast
""", encoding="latin1")
# Data transformation pipeline
write_dofile("""
gen log_gdp = ln(gdp)
gen diff_income = d(income)
xtset country_id year
xtreg diff_income log_gdp, fe
""")
Implementation Architecture:
The tool implements atomic file creation within the stata-mcp-dofile/ directory hierarchy. File naming employs ISO 8601 basic format timestamp generation (YYYYMMDDHHMMSS.do) ensuring temporal uniqueness and chronological sorting. The write operation utilizes Python's built-in open() function with mode "w" and specified encoding parameter, performing implicit file creation and truncation for atomic write semantics.
Integration with output redirection commands (outreg2, esttab) requires coordination with the results_doc_path prompt to establish output directory paths prior to do-file generation. This separation of concerns enables deterministic output path management across multiple Stata execution cycles.
The tool does not perform syntactic validation or semantic analysis of the Stata code content. Code correctness, command sequencing, and macro expansion validity remain the responsibility of the calling context. Error handling wraps file I/O operations in try-except blocks with structured logging for success/failure tracking.
⚠️ Deprecation Notice: This tool is disabled by default and will be removed in a future version. Modern AI agents have native file writing capabilities - use those instead.
read_log
def read_log(file_path: str,
encoding: str = "utf-8",
is_beta: bool = False,
*,
output_format: Literal["full", "core", "dict"] = "dict") -> str:
...
Input Parameters:
- file_path: Absolute path to target log file (required, .log or .smcl)
- encoding: Character encoding for text decoding (optional, defaults to UTF-8)
- is_beta: Enable structured log parsing with StataLog module (optional, default: false)
- macOS/Linux only - Windows users should use default behavior
- Recommended for .smcl files with dict format
- output_format: Output format when is_beta=true (optional, default: "dict")
- full: Raw log content without processing
- core: Cleaned content with framework lines removed
- dict: Structured command-result pairs (recommended)
Return Structure:
- Default mode (is_beta=false): Raw string content of the file
- Beta mode (is_beta=true): Depends on output_format:
- full: Plain text log content
- core: Log content without framework (headers, footers, log commands)
- dict: String representation of command-result list
Operational Examples:
# Read log file (default mode)
read_log("/Users/project/stata-mcp-log/20250104153045.log")
# Read SMCL log with structured parsing (macOS/Linux)
read_log("/Users/project/stata-mcp-log/20250104153045.smcl",
is_beta=True,
output_format="dict")
# Get cleaned log content without framework
read_log("~/stata-mcp-log/session.log",
is_beta=True,
output_format="core")
# Read with custom encoding
read_log("~/analysis/tables/results.txt", encoding="utf-8")
Implementation Architecture:
The tool implements dual-mode log reading: traditional file reading and structured parsing via the StataLog module.
Traditional Mode (is_beta=false): Generic file reading via Python's open() function with mode "r". Path validation checks file existence through Path.exists(). Content reading uses single file.read() operation for complete file retrieval.
Structured Parsing Mode (is_beta=true, Unix only): Leverages the stata_log module which provides:
- StataLogTEXT: Parser for .log (plain text) files
- StataLogSMCL: Parser for .smcl (Stata Markup and Control Language) files
- StataLogInfo: Dataclass containing command_result_list with structured command-output pairs
The StataLog factory class (from_path() method) automatically detects file extension and returns the appropriate parser. Framework removal eliminates log headers/footers, log using/close commands, and do-file execution markers, preserving only actual Stata commands and their outputs.
Output Format Details:
- full: Equivalent to read_plain_text() - raw file content
- core: Equivalent to read_without_framework() - cleaned content
- dict: Returns str(log_info.read_as_dict()) - structured mapping
Error handling covers: FileNotFoundError for missing files, IOError for I/O failures, ValueError for invalid output_format, and UnicodeDecodeError for encoding mismatches.
ado_package_install
def ado_package_install(package: str,
source: str = "ssc",
is_replace: bool = True,
package_source_from: str | None = None) -> str:
...
Input Parameters:
- package: Package identifier (required)
- SSC: package name (e.g., "outreg2")
- GitHub: "username/reponame" format (e.g., "sepinetam/texiv")
- net: package name with package_source_from specifying source
- source: Distribution source (optional, default: "ssc")
- Options: "ssc", "github", "net"
- is_replace: Force replacement flag (optional, default: true)
- package_source_from: Source URL or directory for 'net' installations (optional)
Return Structure: String containing complete Stata execution log from installation operation
Operational Examples:
# SSC package installation
ado_package_install("outreg2", source="ssc")
# GitHub package installation
ado_package_install("sepinetam/texiv", source="github")
# Network installation
ado_package_install("custompkg", source="net", package_source_from="https://example.com/stata/")
# Force reinstall
ado_package_install("estout", source="ssc", is_replace=True)
Implementation Architecture:
The tool implements platform-divergent installation strategies. Unix systems (macOS/Linux) execute through specialized installer classes inheriting from base installer interface: SSC_Install invokes ssc install <package>, replace via Stata CLI; GITHUB_Install executes github install <username/reponame>, replace; NET_Install runs net install <package> from(<source>), replace. Windows systems bypass direct installation, instead generating temporary do-file via write_dofile and delegating to stata_do execution.
Installation verification occurs through message parsing where installer classes examine Stata output for success indicators. The check_installed_from_msg() method performs regex or substring matching to identify successful installation patterns. Failed installations trigger error logging with full message capture via debug-level logging.
Performance considerations advise against unnecessary invocations due to network latency, repository lookup overhead, and redundant installation attempts when packages already exist in Stata's ado directory. The tool implements no local installation cache—each invocation queries remote repositories or filesystem.
help
macOS and Linux only
def help(cmd: str) -> str:
...
Input Parameters:
- cmd: Stata command name (required, e.g., "regress", "describe", "xtset")
Return Structure: String containing Stata help text output with optional cache status prefix (e.g., "Cached result for regress: ...")
Operational Examples:
# Regression command help
help("regress")
# Panel data commands
help("xtset")
help("xtreg")
# Data management
help("merge")
help("reshape")
Implementation Architecture:
The tool implements Stata command documentation retrieval through CLI invocation with caching layer. Documentation requests execute Stata in batch mode with help <cmd> command, capturing stdout for return value. The StataHelp class manages invocation through platform-specific Stata CLI paths detected by StataFinder.
Caching architecture maintains help text cache at ~/.statamcp/help/ directory with file-based storage keyed by command name. Cache behavior controllable via environment variables: STATA_MCP_CACHE_HELP (default: true) enables/disables caching; STATA_MCP_SAVE_HELP controls cache persistence. Cached results include prefix message indicating cache status: "Cached result for {cmd}: ..." versus live help text.
Dual decoration pattern registers tool as both MCP resource and executable function. Resource URI pattern help://stata/{cmd} enables URI-based access through MCP resource protocol, while function decorator @stata_mcp.tool() enables direct invocation. This dual registration provides flexible access patterns for different MCP client implementations.
Cache invalidation requires manual deletion of cache files or environment variable configuration; no TTL-based expiration exists. Help text language depends on Stata installation locale; multilingual support requires separate Stata installations or locale reconfiguration.