Domain & Models
Q1: What fields does PatientRecord capture and how does it infer categoria?
The dataclass in src/core/models.py stores id_paciente, nombre, fecha_nacimiento, edad, sexo, email, telefono, ciudad, and categoria; PatientRecord.from_dict calls categorize_patient_age to compute categoria from the provided age or birthdate.
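A minimal sketch of that shape; field types and defaults are assumptions, and categorize_patient_age is stubbed here (its full logic is sketched under Q4 below):

```python
from dataclasses import dataclass
from typing import Any, Optional


def categorize_patient_age(edad: Optional[int], fecha_nacimiento: Optional[str]) -> str:
    """Stub for this sketch; see the Q4 sketch for the real bucketing logic."""
    return "unknown"


@dataclass
class PatientRecord:
    id_paciente: str
    nombre: str
    fecha_nacimiento: Optional[str]
    edad: Optional[int]
    sexo: Optional[str]
    email: Optional[str]
    telefono: Optional[str]
    ciudad: Optional[str]
    categoria: str

    @classmethod
    def from_dict(cls, raw: dict[str, Any]) -> "PatientRecord":
        # categoria is derived from edad / fecha_nacimiento, never read from the payload.
        return cls(
            id_paciente=str(raw.get("id_paciente", "")),
            nombre=raw.get("nombre", ""),
            fecha_nacimiento=raw.get("fecha_nacimiento"),
            edad=raw.get("edad"),
            sexo=raw.get("sexo"),
            email=raw.get("email"),
            telefono=raw.get("telefono"),
            ciudad=raw.get("ciudad"),
            categoria=categorize_patient_age(raw.get("edad"), raw.get("fecha_nacimiento")),
        )
```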
Q2: Which values does PatientCategory expose and when does code pick UNKNOWN?
The enum lists CHILD, ADULT, SENIOR, and UNKNOWN; categorize_patient_age returns UNKNOWN when neither an integer age nor a parsable fecha_nacimiento is present.
Q3: What strategy does categorize_patient_age apply when given an age versus a birthdate?
If edad is an int, it delegates to categorize_by_value; otherwise it tries to parse the ISO-formatted fecha_nacimiento, subtracts the birth year from the current UTC year, and defaults to UNKNOWN when parsing fails.
Q4: How does categorize_by_value bucket ages?
Values below 18 map to CHILD, values below 65 map to ADULT, and all others become SENIOR.
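Putting Q2 through Q4 together, a sketch with assumed enum string values and the UTC-year subtraction described in Q3:

```python
from datetime import date, datetime, timezone
from enum import Enum
from typing import Optional


class PatientCategory(str, Enum):
    CHILD = "child"      # string values are assumptions
    ADULT = "adult"
    SENIOR = "senior"
    UNKNOWN = "unknown"


def categorize_by_value(age: int) -> PatientCategory:
    # Buckets per Q4: <18 CHILD, <65 ADULT, otherwise SENIOR.
    if age < 18:
        return PatientCategory.CHILD
    if age < 65:
        return PatientCategory.ADULT
    return PatientCategory.SENIOR


def categorize_patient_age(
    edad: Optional[int], fecha_nacimiento: Optional[str]
) -> PatientCategory:
    if isinstance(edad, int):
        return categorize_by_value(edad)
    if fecha_nacimiento:
        try:
            born = date.fromisoformat(fecha_nacimiento)
        except ValueError:
            return PatientCategory.UNKNOWN
        # Rough age per Q3: current UTC year minus birth year.
        return categorize_by_value(datetime.now(timezone.utc).year - born.year)
    return PatientCategory.UNKNOWN
```

Subtracting only the years overshoots by one for patients whose birthday has not yet occurred this year, which is acceptable for coarse bucketing.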
Q5: What metrics does CompletenessMetric track for each field?
It records the field, the total and missing counters, the completeness ratio, and the per_city_missing and per_category_missing dictionaries, so stakeholders can see missing-data percentages per city and per patient category.
Q6: What information does ImputationPlan describe?
The dataclass bundles a target field, the chosen strategy, and a rationale string describing the recommended fix for missing data.
Q7: What does AgeCorrectionLogEntry record for inconsistencies?
Each entry logs id_paciente, nombre, fecha_nacimiento, the registered and calculated ages, an action such as inconsistent_age or imputed_age, and a descriptive note.
Q8: How does AppointmentRecord.from_dict normalize the ciudad value?
It prefers ciudad and falls back to ciudad_cita so each appointment retains a city even when only the appointment location is supplied.
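The fallback reduces to a single expression; a sketch with a hypothetical payload:

```python
raw = {"id_cita": "C-001", "ciudad_cita": "Cuenca"}  # no top-level ciudad

# Prefer the patient-level city; fall back to the appointment location.
ciudad = raw.get("ciudad") or raw.get("ciudad_cita")
assert ciudad == "Cuenca"
```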
Q9: Which bucket metadata does AppointmentIndicatorEntry expose?
The entry captures period_type, period_value, especialidad, estado_cita, medico, and the aggregated count.
Q10: What summary elements live inside CostAuditReport?
The report stores total_records, analyzed_records, a list of summaries (SpecialtyCostSummary) with averages and deviations, and a list of anomalies (CostAnomalyEntry).
Q11: What does PatientTravelEntry log for each traveler?
It records id_paciente, nombre, residence, the travel_cities set, travel_count, the computed severity, and last_travel_dates.
Q12: How is the business rule catalog modeled?
A BusinessRule provides an id, title, description, and flexible details; BusinessRulesCatalog aggregates those rules with a created_at timestamp.
Repositories & ETL
Q13: What abstract operations do PatientRepository and AppointmentRepository export?
Each port defines a single abstract method, list_patients or list_appointments, so the core services depend on interfaces instead of concrete adapters.
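A sketch of the two ports, assuming they yield the domain records from Q1 and Q8:

```python
from abc import ABC, abstractmethod
from typing import Iterable


class PatientRepository(ABC):
    @abstractmethod
    def list_patients(self) -> Iterable["PatientRecord"]:
        """Yield every patient in the backing store."""


class AppointmentRepository(ABC):
    @abstractmethod
    def list_appointments(self) -> Iterable["AppointmentRecord"]:
        """Yield every appointment in the backing store."""
```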
Q14: How do the JSON repositories load records from disk?
They open the dataset path, parse JSON, fetch the pacientes or citas_medicas array, and yield domain objects through PatientRecord.from_dict or AppointmentRecord.from_dict.
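The patient-side adapter could look like this (the appointment variant is symmetric over citas_medicas); the class name is hypothetical and PatientRecord refers to the Q1 sketch:

```python
import json
from pathlib import Path
from typing import Iterator


class JsonPatientRepository:
    def __init__(self, dataset_path: Path) -> None:
        self._path = dataset_path

    def list_patients(self) -> Iterator["PatientRecord"]:
        with self._path.open(encoding="utf-8") as fh:
            data = json.load(fh)
        # Top-level key per Q14; a missing key simply yields nothing.
        for raw in data.get("pacientes", []):
            yield PatientRecord.from_dict(raw)
```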
Q15: Which dataset file does backend/app.py point to by default?
The module sets DATASET to BASE_PATH / "dataset_hospital 2 AWS.json", so every service call uses that JSON file unless it is replaced.
Q16: What steps does ETLPipelineService.run() perform?
It records a start timestamp, calls extract, transform, and load, enriches the summary with patient and appointment counts plus start/end/duration, persists metrics, and returns that summary dictionary.
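A sketch of that orchestration as a free function over injected phase callables (the real code is a method); the summary key names are assumptions:

```python
import time
from typing import Callable

import pandas as pd

Frames = tuple[pd.DataFrame, pd.DataFrame]


def run_pipeline(
    extract: Callable[[], Frames],
    transform: Callable[[pd.DataFrame, pd.DataFrame], tuple[pd.DataFrame, pd.DataFrame, dict]],
    load: Callable[[pd.DataFrame, pd.DataFrame], None],
    persist_metrics: Callable[[dict], None],
) -> dict:
    start = time.time()
    patients, appointments = extract()
    patients, appointments, summary = transform(patients, appointments)
    load(patients, appointments)
    end = time.time()
    summary.update(
        patient_count=len(patients),
        appointment_count=len(appointments),
        start=start,
        end=end,
        duration_seconds=end - start,
    )
    persist_metrics(summary)
    return summary
```

Injecting the phases keeps the orchestration testable without touching the filesystem.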
Q17: How does transform handle orphan appointments?
After cleaning both dataframes, it applies a mask that drops appointment rows whose id_paciente is absent from the cleaned patient frame and records the dropped id_cita values as orphans.
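A minimal pandas sketch of that mask, with hypothetical frames:

```python
import pandas as pd

patients = pd.DataFrame({"id_paciente": ["P1", "P2"]})
appointments = pd.DataFrame(
    {"id_cita": ["C1", "C2", "C3"], "id_paciente": ["P1", "P9", "P2"]}
)

# Keep only appointments whose patient survived cleaning.
mask = appointments["id_paciente"].isin(patients["id_paciente"])
orphan_ids = appointments.loc[~mask, "id_cita"].tolist()  # ["C2"]
appointments = appointments[mask]
```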
Q18: Where are the cleaned tables written?
The load step writes pacientes_cleaned.csv/.parquet and citas_cleaned.csv/.parquet under reports/etl.
Q19: What happens if parquet support is unavailable?
The helper catches the ImportError raised by df.to_parquet and writes an empty UTF-8 placeholder in its place, so the CSV remains the authoritative copy of the table.
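A sketch of that fallback, assuming a helper of this shape (the name is hypothetical):

```python
from pathlib import Path

import pandas as pd


def write_table(df: pd.DataFrame, out_dir: Path, stem: str) -> None:
    """Always write CSV; degrade to a placeholder if parquet support is missing."""
    out_dir.mkdir(parents=True, exist_ok=True)
    df.to_csv(out_dir / f"{stem}.csv", index=False)
    parquet_path = out_dir / f"{stem}.parquet"
    try:
        df.to_parquet(parquet_path)
    except ImportError:
        # No pyarrow/fastparquet installed: leave an empty marker file.
        parquet_path.write_text("", encoding="utf-8")
```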
Q20: How does _persist_metrics keep track of ETL runs?
It appends an entry to reports/etl/etl_metrics.json with start/end times, duration, exported counts, and orphan total, creating the file when it does not exist.
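A sketch of the append-or-create behavior in free-function form:

```python
import json
from pathlib import Path


def persist_metrics(entry: dict, metrics_path: Path) -> None:
    """Append one run entry to the metrics history, creating it on first use."""
    if metrics_path.exists():
        history = json.loads(metrics_path.read_text(encoding="utf-8"))
    else:
        history = []
    history.append(entry)
    metrics_path.parent.mkdir(parents=True, exist_ok=True)
    metrics_path.write_text(json.dumps(history, indent=2), encoding="utf-8")
```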
Q21: How does the backend derive dataset and report directories?
It computes BASE_PATH as two levels above app.py and defines REPORT_DIR and SCRIPTS_DIR relative to that root.
Q22: How are automatable scripts discovered for the API?
The loader looks for scripts/run_*.py, reads each docstring, normalizes the key, and registers descriptors used by the /scripts endpoints.
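A sketch of the discovery loop; the key-normalization scheme (strip the run_ prefix, hyphenate underscores) is an assumption:

```python
import ast
from pathlib import Path

SCRIPTS_DIR = Path("scripts")  # resolved relative to the project root per Q21


def discover_scripts() -> dict[str, dict]:
    registry: dict[str, dict] = {}
    for path in sorted(SCRIPTS_DIR.glob("run_*.py")):
        # The module docstring becomes the human-readable description.
        doc = ast.get_docstring(ast.parse(path.read_text(encoding="utf-8"))) or ""
        key = path.stem.removeprefix("run_").replace("_", "-")
        registry[key] = {"path": str(path), "description": doc.strip()}
    return registry
```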
TextNormalizationService
Q109: Which normalization steps does TextNormalizationService apply?
It strips whitespace, lowercases, and removes diacritics via unicodedata normalization; these steps are what the NORMALIZATION_METHOD constant advertises.
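A sketch of those three steps, using NFKD decomposition to strip combining marks:

```python
import unicodedata


def normalize_text(value: str) -> str:
    """Strip, lowercase, and drop diacritics (NFKD decomposition)."""
    value = value.strip().lower()
    decomposed = unicodedata.normalize("NFKD", value)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


assert normalize_text("  Médico ") == "medico"
```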
Q110: Which fields are normalized per record?
It normalizes the nombre and ciudad fields of each record.
Q111: What does TextNormalizationReport include?
It returns total_records, a normalized_fields count, and the normalization log_entries.
Backend API & Scripts
Q115: What does GET /reports return?
It lists all JSON report filenames under reports by reading REPORT_DIR.glob("*.json").
Q116: How are HTML reports delivered?
GET /reports/{name}/html returns a FileResponse that streams the corresponding HTML file after checking that the file exists.
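A sketch of both report routes, assuming FastAPI signatures the answers imply but do not spell out:

```python
from pathlib import Path

from fastapi import FastAPI, HTTPException
from fastapi.responses import FileResponse

app = FastAPI()
REPORT_DIR = Path("reports")  # derived from the project root per Q21


@app.get("/reports")
def list_reports() -> list[str]:
    return sorted(p.name for p in REPORT_DIR.glob("*.json"))


@app.get("/reports/{name}/html")
def report_html(name: str) -> FileResponse:
    path = REPORT_DIR / f"{name}.html"
    if not path.exists():
        raise HTTPException(status_code=404, detail="report not found")
    return FileResponse(path, media_type="text/html")
```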
Q117: How does POST /run/{case} execute built-in services?
The endpoint calls _service_runner with the requested case, runs the selected service (doctor notifications, utilization, patient travel, management KPIs, or ETL), and returns the summary dictionary or report.to_dict().
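A sketch of the dispatch, with a hypothetical registry standing in for _service_runner's case table:

```python
from typing import Any, Callable

# Hypothetical registry; the real cases per Q117 are doctor notifications,
# utilization, patient travel, management KPIs, and ETL.
_RUNNERS: dict[str, Callable[[], Any]] = {
    "etl": lambda: {"summary": "..."},
}


def _service_runner(case: str) -> dict:
    try:
        result = _RUNNERS[case]()
    except KeyError:
        raise ValueError(f"unknown case: {case}") from None
    # Some services return a report object rather than a plain dict.
    return result.to_dict() if hasattr(result, "to_dict") else result
```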
Q118: What metadata does GET /scripts return?
It returns ScriptInfo objects for each discovered script, including the normalized key, module path, description, and relative path.
Q119: What validations does /dataset/upload perform?
It ensures the uploaded file has content type application/json and is not empty before overwriting the dataset file.
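A sketch of the endpoint; the status codes and response shape are assumptions:

```python
from pathlib import Path

from fastapi import FastAPI, HTTPException, UploadFile

app = FastAPI()
DATASET = Path("dataset_hospital 2 AWS.json")  # per Q15


@app.post("/dataset/upload")
async def upload_dataset(file: UploadFile) -> dict:
    if file.content_type != "application/json":
        raise HTTPException(status_code=400, detail="expected application/json")
    payload = await file.read()
    if not payload:
        raise HTTPException(status_code=400, detail="uploaded file is empty")
    DATASET.write_bytes(payload)  # replaces the active dataset
    return {"status": "ok", "dataset": str(DATASET)}
```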
Q120: What does GET /health provide?
A JSON payload with status ok and the dataset path string, so callers can confirm both that the API is up and which dataset it reads.